Extended Thinking Configuration

Enable extended thinking (reasoning) modes for AI models that support them. This feature allows a model to "think through" a complex problem before producing its response.

Overview

NeuroLink supports extended thinking/reasoning configuration for models that provide this capability. Extended thinking enables models to perform more thorough reasoning, particularly useful for complex tasks like mathematical proofs, coding problems, and multi-step analysis.

Supported Models

Gemini 3 Models (Google Vertex AI / AI Studio)

  • gemini-3.1-pro - Full thinking support with high token budgets (up to 100,000)
  • gemini-3-flash-preview - Fast thinking with support for "minimal" level (up to 50,000)

Gemini 2.5 Models (Google Vertex AI / AI Studio)

  • gemini-2.5-pro - Supports thinking configuration (up to 32,000 tokens)
  • gemini-2.5-flash - Supports thinking configuration (up to 32,000 tokens)

Claude Models (Anthropic)

All Claude 4.0+ models support extended thinking via budget tokens:

  • claude-sonnet-4-20250514 (Claude Sonnet 4)
  • claude-opus-4-20250514 (Claude Opus 4)
  • claude-opus-4-1-20250805 (Claude Opus 4.1)
  • claude-sonnet-4-5-20250929 (Claude Sonnet 4.5)
  • claude-opus-4-5-20251101 (Claude Opus 4.5)
  • claude-haiku-4-5-20251001 (Claude Haiku 4.5)
  • claude-sonnet-4-6 (Claude Sonnet 4.6)
  • claude-opus-4-6 (Claude Opus 4.6)

Quick Start

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.generate({
  input: { text: "Prove that the square root of 2 is irrational" },
  provider: "google-ai",
  model: "gemini-2.5-flash",
  thinkingConfig: { thinkingLevel: "high" },
});

console.log(result.content);

Gemini 3 Thinking Configuration

For Gemini 3 models, use thinkingLevel to control reasoning depth:

const response = await neurolink.generate({
  input: { text: "Prove that the square root of 2 is irrational" },
  provider: "vertex",
  model: "gemini-3-flash-preview",
  thinkingConfig: {
    thinkingLevel: "high", // 'minimal' | 'low' | 'medium' | 'high'
  },
});

Thinking Levels

Level     Description                             Best For
minimal   Near-zero thinking (Flash models only)  Simple queries requiring speed
low       Fast reasoning for simple tasks         Quick analysis, summaries
medium    Balanced reasoning/latency trade-off    General-purpose tasks
high      Maximum reasoning depth                 Complex reasoning, math, coding

Maximum Token Budgets by Model

Model                  Max Thinking Budget
gemini-3-pro-*         100,000 tokens
gemini-3-flash-*       50,000 tokens
gemini-2.5-*           32,000 tokens
claude-opus-4-6        100,000 tokens
claude-sonnet-4-6      100,000 tokens
claude-opus-4-5-*      100,000 tokens
claude-sonnet-4-5-*    100,000 tokens
claude-haiku-4-5-*     100,000 tokens
claude-opus-4-1-*      100,000 tokens
claude-opus-4-*        100,000 tokens
claude-sonnet-4-*      100,000 tokens
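The table above can also be mirrored in application code. The helper below is an illustrative sketch only (NeuroLink's own getMaxThinkingBudgetTokens() utility, shown later, is the supported way to do this): it resolves a model ID to its maximum thinking budget by prefix matching.

```typescript
// Illustrative sketch: prefix-match lookup mirroring the budget table above.
// Not part of the NeuroLink API; prefer getMaxThinkingBudgetTokens() in real code.
const BUDGET_TABLE: Array<[prefix: string, maxTokens: number]> = [
  ["gemini-3-pro", 100_000],
  ["gemini-3-flash", 50_000],
  ["gemini-2.5", 32_000],
  ["claude-opus-4", 100_000],
  ["claude-sonnet-4", 100_000],
  ["claude-haiku-4-5", 100_000],
];

function maxThinkingBudgetFor(model: string): number | undefined {
  // First matching prefix wins; unknown models return undefined.
  const entry = BUDGET_TABLE.find(([prefix]) => model.startsWith(prefix));
  return entry?.[1];
}
```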

Anthropic Claude Thinking Configuration

For Claude models, use budgetTokens to set the thinking token budget:

const response = await neurolink.generate({
  input: { text: "Solve this complex math problem step by step..." },
  provider: "anthropic",
  model: "claude-sonnet-4-6",
  thinkingConfig: {
    enabled: true,
    budgetTokens: 10000, // Range: 5000-100000
  },
});

Budget Token Guidelines

  • Minimum: 5,000 tokens
  • Maximum: 100,000 tokens
  • Recommended for simple tasks: 5,000-10,000 tokens
  • Recommended for complex reasoning: 20,000-50,000 tokens
  • Maximum depth: 50,000-100,000 tokens
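These bounds can be enforced before a request is sent. The function below is a hypothetical guard, not a NeuroLink export, assuming the 5,000-100,000 range documented above:

```typescript
// Sketch: clamp a requested budget into the supported 5,000-100,000 range.
const MIN_BUDGET = 5_000;
const MAX_BUDGET = 100_000;

function clampThinkingBudget(requested: number): number {
  // Round down fractional values, then clamp to the documented bounds.
  return Math.min(MAX_BUDGET, Math.max(MIN_BUDGET, Math.floor(requested)));
}
```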

Configuration Options

The thinkingConfig object supports the following options:

thinkingConfig: {
  enabled?: boolean; // Enable/disable thinking
  type?: "enabled" | "disabled"; // Alternative enable/disable
  budgetTokens?: number; // Token budget (Anthropic models)
  thinkingLevel?: "minimal" | "low" | "medium" | "high"; // Thinking level (Gemini models)
}
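The shape above can be written out as a TypeScript interface. This is a local sketch based only on the fields documented here; NeuroLink may export its own, richer type:

```typescript
// Sketch of the documented thinkingConfig shape (not the library's published type).
interface ThinkingConfig {
  enabled?: boolean;
  type?: "enabled" | "disabled";
  budgetTokens?: number; // Anthropic models
  thinkingLevel?: "minimal" | "low" | "medium" | "high"; // Gemini models
}

// Anthropic-style configuration:
const claudeConfig: ThinkingConfig = { enabled: true, budgetTokens: 20_000 };

// Gemini-style configuration:
const geminiConfig: ThinkingConfig = { thinkingLevel: "high" };
```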

CLI Usage

Extended thinking is also available via the CLI:

# Enable thinking with default settings
neurolink generate "Solve this problem" --thinking

# Set thinking budget for Anthropic
neurolink generate "Complex problem" --provider anthropic --thinking --thinkingBudget 20000

# Set thinking level for Gemini 3
neurolink generate "Complex problem" --provider vertex --model gemini-3-pro-preview --thinkingLevel high

CLI Options

Option            Description                                            Default
--thinking        Enable extended thinking                               false
--thinkingBudget  Token budget (Anthropic: 5000-100000)                  10000
--thinkingLevel   Thinking level (Gemini 3: minimal, low, medium, high)  medium

Best Practices

When to Use High Thinking

  • Complex mathematical proofs and calculations
  • Multi-step coding problems and debugging
  • Detailed analysis requiring multiple considerations
  • Tasks where accuracy is more important than speed

When to Use Low/Minimal Thinking

  • Simple queries where speed matters
  • Straightforward information retrieval
  • Quick summaries and formatting tasks
  • High-volume, latency-sensitive applications

General Guidelines

  1. Start with medium: Use medium as your default and adjust based on results
  2. Match model to task: Use Pro models for complex tasks, Flash for speed
  3. Monitor token usage: Higher thinking levels consume more tokens
  4. Test performance: Compare response quality vs. latency for your use case
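Guidelines 1 and 2 can be codified as a default-with-overrides helper. The task categories below are hypothetical labels for illustration, not a NeuroLink concept:

```typescript
// Sketch: default to "medium" and adjust per the guidelines above.
type ThinkingLevel = "minimal" | "low" | "medium" | "high";

type TaskKind = "math" | "coding" | "summary" | "lookup" | "general";

function pickThinkingLevel(task: TaskKind): ThinkingLevel {
  switch (task) {
    case "math":
    case "coding":
      return "high"; // accuracy matters more than speed
    case "summary":
      return "low"; // quick summaries and formatting
    case "lookup":
      return "minimal"; // latency-sensitive retrieval (Flash models only)
    default:
      return "medium"; // recommended starting point
  }
}
```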

Example: Complex Reasoning Task

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Complex coding problem with high reasoning
const result = await neurolink.generate({
  input: {
    text: `
      Design an optimal algorithm to find the longest palindromic subsequence
      in a string. Explain your approach, prove its correctness, and analyze
      the time and space complexity.
    `,
  },
  provider: "vertex",
  model: "gemini-3-pro-preview",
  thinkingConfig: {
    thinkingLevel: "high",
  },
  maxTokens: 4000,
});

console.log(result.content);

Model Detection Utilities

NeuroLink provides utilities to check thinking support:

import {
  supportsThinkingConfig,
  getMaxThinkingBudgetTokens,
} from "@juspay/neurolink";

// Check if a model supports thinking
const supports = supportsThinkingConfig("gemini-3-pro-preview"); // true

// Get maximum budget for a model
const maxBudget = getMaxThinkingBudgetTokens("gemini-3-flash-preview"); // 50000
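A common pattern is to attach thinkingConfig only when the target model supports it. The sketch below takes the result of supportsThinkingConfig() as a plain boolean so the guard logic stands on its own; buildGenerateOptions is a hypothetical helper, not part of the library:

```typescript
// Sketch: include thinkingConfig only for models that support it.
// `supportsThinking` would come from NeuroLink's supportsThinkingConfig(model).
interface GenerateOptions {
  input: { text: string };
  model: string;
  thinkingConfig?: { thinkingLevel: "minimal" | "low" | "medium" | "high" };
}

function buildGenerateOptions(
  model: string,
  text: string,
  supportsThinking: boolean,
): GenerateOptions {
  const options: GenerateOptions = { input: { text }, model };
  if (supportsThinking) {
    // Only attach the config when the model can honor it.
    options.thinkingConfig = { thinkingLevel: "high" };
  }
  return options;
}
```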

Important Notes

  • Provider compatibility: Thinking configuration is provider-specific: Gemini models use thinkingLevel, while Claude models use budgetTokens
  • Token consumption: Extended thinking uses additional tokens beyond the response
  • Latency impact: Higher thinking levels increase response time
  • Not all models support thinking: Check supportsThinkingConfig() before enabling
  • Streaming support: Thinking configuration works with both generate() and stream()

See Also