⚙️ Provider Configuration Guide
NeuroLink supports multiple AI providers with flexible authentication methods. This guide covers complete setup for all supported providers.
Supported Providers
- OpenAI - GPT-4o, GPT-4o-mini, GPT-4-turbo
- Amazon Bedrock - Claude 3.7 Sonnet, Claude 3.5 Sonnet, Claude 3 Haiku
- Amazon SageMaker - Custom models deployed on SageMaker endpoints
- Google Vertex AI - Gemini 3 Flash/Pro (preview), Gemini 2.5 Flash, Claude 4.0 Sonnet
- Google AI Studio - Gemini 1.5 Pro, Gemini 2.0 Flash, Gemini 1.5 Flash
- Anthropic - Claude 4.5 Opus/Sonnet/Haiku, Claude 4.0 Opus/Sonnet, Claude 3.7 Sonnet
- Azure OpenAI - GPT-4, GPT-3.5-Turbo
- LiteLLM - 100+ models from all providers via proxy server
- Hugging Face - 100,000+ open source models including DialoGPT, GPT-2, GPT-Neo
- Ollama - Local AI models including Llama 2, Code Llama, Mistral, Vicuna
- Mistral AI - Mistral Tiny, Small, Medium, and Large models
- DeepSeek - deepseek-chat (V3) and deepseek-reasoner (R1)
- NVIDIA NIM - Llama 3.3 70B and 400+ catalog models via NVIDIA-hosted or self-hosted NIM
- LM Studio - Any model loaded in LM Studio desktop app (local, no API key required)
- llama.cpp - Any GGUF model served by llama-server (local, no API key required)
💰 Model Availability & Cost Considerations
Important Notes:
- Model Availability: Specific models may not be available in all regions or require special access
- Cost Variations: Pricing differs significantly between providers and models (e.g., Claude 3.5 Sonnet vs GPT-4o)
- Rate Limits: Each provider has different rate limits and quota restrictions
- Local vs Cloud: Ollama (local) has no per-request cost but requires hardware resources
- Enterprise Tiers: AWS Bedrock, Google Vertex AI, and Azure typically offer enterprise pricing
Best Practices:
- Use `new NeuroLink()` with automatic provider selection for cost-optimized routing (see the sketch after this list)
- Monitor usage through built-in analytics to track costs
- Consider local models (Ollama) for development and testing
- Check provider documentation for current pricing and availability
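A minimal sketch of automatic provider selection (assuming at least one provider's credentials are configured; omitting `provider` lets NeuroLink choose):
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
// No provider specified: NeuroLink routes to the best available configured provider
const result = await neurolink.generate({
  input: { text: "Summarize our quarterly sales data" },
});
console.log(result.content);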
🏢 Enterprise Proxy Support
All providers support corporate proxy environments automatically. Simply set environment variables:
export HTTPS_PROXY=http://your-corporate-proxy:port
export HTTP_PROXY=http://your-corporate-proxy:port
No code changes required - NeuroLink automatically detects and uses proxy settings.
For detailed proxy setup → See Enterprise & Proxy Setup Guide
OpenAI Configuration
Basic Setup
export OPENAI_API_KEY="sk-your-openai-api-key"
Optional Configuration
export OPENAI_MODEL="gpt-4o" # Default model to use
Supported Models
- `gpt-4o` (default) - Latest multimodal model
- `gpt-4o-mini` - Cost-effective variant
- `gpt-4-turbo` - High-performance model
Usage Example
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: { text: "Explain machine learning" },
provider: "openai",
model: "gpt-4o",
temperature: 0.7,
maxTokens: 500,
timeout: "30s", // Optional: Override default 30s timeout
});
Timeout Configuration
- Default Timeout: 30 seconds
- Supported Formats: Milliseconds (`30000`) or human-readable (`'30s'`, `'1m'`, `'5m'`); both forms appear in the sketch below
- Environment Variable: `OPENAI_TIMEOUT='45s'` (optional)
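Both forms can be passed as the `timeout` option; a brief sketch reusing the example above:
// Milliseconds
const quick = await neurolink.generate({
  input: { text: "Give a one-line answer" },
  provider: "openai",
  timeout: 30000,
});
// Human-readable string
const slow = await neurolink.generate({
  input: { text: "Write a long-form analysis" },
  provider: "openai",
  timeout: "1m",
});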
Amazon Bedrock Configuration
🚨 Critical Setup Requirements
⚠️ IMPORTANT: Anthropic Models Require Inference Profile ARN
For Anthropic Claude models in Bedrock, you MUST use the full inference profile ARN, not simple model names:
# ✅ CORRECT: Use full inference profile ARN
export BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
# ❌ WRONG: Simple model names cause "not authorized to invoke this API" errors
# export BEDROCK_MODEL="anthropic.claude-3-sonnet-20240229-v1:0"
Basic AWS Credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-2"
Session Token Support (Development)
For temporary credentials (common in development environments):
export AWS_SESSION_TOKEN="your-session-token" # Required for temporary credentials
Available Inference Profile ARNs
Replace <account_id> with your AWS account ID:
# Claude 3.7 Sonnet (Latest - Recommended)
BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
# Claude 3.5 Sonnet
BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-5-sonnet-20241022-v2:0"
# Claude 3 Haiku
BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
Why Inference Profiles?
- Cross-Region Access: Faster access across AWS regions
- Better Performance: Optimized routing and response times
- Higher Availability: Improved model availability and reliability
- Different Permissions: Separate permission model from base models
Complete Bedrock Configuration
# Required AWS credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-2"
# Optional: Session token for temporary credentials
export AWS_SESSION_TOKEN="your-session-token"
# Required: Inference profile ARN (not simple model name)
export BEDROCK_MODEL="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
# Alternative environment variable names (backward compatibility)
export BEDROCK_MODEL_ID="arn:aws:bedrock:us-east-2:<account_id>:inference-profile/us.anthropic.claude-3-7-sonnet-20250219-v1:0"
Usage Example
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: { text: "Write a haiku about AI" },
provider: "bedrock",
temperature: 0.8,
maxTokens: 100,
timeout: "45s", // Optional: Override default 45s timeout
});
Timeout Configuration
- Default Timeout: 45 seconds (longer due to cold starts)
- Supported Formats: Milliseconds (`45000`) or human-readable (`'45s'`, `'1m'`, `'2m'`)
- Environment Variable: `BEDROCK_TIMEOUT='1m'` (optional)
Account Setup Requirements
To use AWS Bedrock, ensure your AWS account has:
- Bedrock Service Access: Enable Bedrock in your AWS region
- Model Access: Request access to Anthropic Claude models
- IAM Permissions: Your credentials need `bedrock:InvokeModel` permissions
- Inference Profile Access: Access to the specific inference profiles
IAM Policy Example
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:InvokeModel",
"bedrock:InvokeModelWithResponseStream"
],
"Resource": ["arn:aws:bedrock:*:*:inference-profile/us.anthropic.*"]
}
]
}
Amazon SageMaker Configuration
Amazon SageMaker allows you to use your own custom models deployed on SageMaker endpoints. This provider is perfect for:
- Custom Model Hosting - Deploy your fine-tuned models
- Enterprise Compliance - Full control over model infrastructure
- Cost Optimization - Pay only for inference usage
- Performance - Dedicated compute resources
Basic AWS Credentials
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_REGION="us-east-1" # Your SageMaker region
SageMaker-Specific Configuration
# Required: Your SageMaker endpoint name
export SAGEMAKER_DEFAULT_ENDPOINT="your-endpoint-name"
# Optional: Timeout and retry settings
export SAGEMAKER_TIMEOUT="30000" # 30 seconds (default)
export SAGEMAKER_MAX_RETRIES="3" # Retry attempts (default)
Advanced Model Configuration
# Optional: Model-specific settings
export SAGEMAKER_MODEL="custom-model-name" # Model identifier
export SAGEMAKER_MODEL_TYPE="custom" # Model type
export SAGEMAKER_CONTENT_TYPE="application/json"
export SAGEMAKER_ACCEPT="application/json"
Session Token Support (for IAM Roles)
export AWS_SESSION_TOKEN="your-session-token" # For temporary credentials
Complete SageMaker Configuration
# AWS Credentials
export AWS_ACCESS_KEY_ID="AKIAIOSFODNN7EXAMPLE"
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
export AWS_REGION="us-east-1"
# SageMaker Settings
export SAGEMAKER_DEFAULT_ENDPOINT="my-model-endpoint-2024"
export SAGEMAKER_TIMEOUT="45000"
export SAGEMAKER_MAX_RETRIES="5"
Usage Example
# Test SageMaker endpoint
npx @juspay/neurolink sagemaker test my-endpoint
# Generate text with SageMaker
npx @juspay/neurolink generate "Analyze this data" --provider sagemaker
# Interactive setup
npx @juspay/neurolink sagemaker setup
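For programmatic use, a minimal SDK sketch (assumptions: the `generate()` options match the other providers, and the endpoint comes from `SAGEMAKER_DEFAULT_ENDPOINT`):
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "Analyze this data" },
  provider: "sagemaker", // endpoint resolved from SAGEMAKER_DEFAULT_ENDPOINT (assumed)
  maxTokens: 500,
});
console.log(result.content);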
CLI Commands
# Check SageMaker configuration
npx @juspay/neurolink sagemaker status
# Validate connection
npx @juspay/neurolink sagemaker validate
# Show current configuration
npx @juspay/neurolink sagemaker config
# Performance benchmark
npx @juspay/neurolink sagemaker benchmark my-endpoint
# List available endpoints (requires AWS CLI)
npx @juspay/neurolink sagemaker list-endpoints
Timeout Configuration
Configure request timeouts for SageMaker endpoints:
export SAGEMAKER_TIMEOUT="60000" # 60 seconds for large models
Prerequisites
- SageMaker Endpoint: Deploy a model to SageMaker and get the endpoint name
- AWS IAM Permissions: Ensure your credentials have `sagemaker:InvokeEndpoint` permission
- Endpoint Status: Endpoint must be in "InService" status
IAM Policy Example
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["sagemaker:InvokeEndpoint"],
"Resource": "arn:aws:sagemaker:*:*:endpoint/*"
}
]
}
Environment Variables Reference
| Variable | Required | Default | Description |
|---|---|---|---|
| `AWS_ACCESS_KEY_ID` | ✅ | - | AWS access key |
| `AWS_SECRET_ACCESS_KEY` | ✅ | - | AWS secret key |
| `AWS_REGION` | ✅ | `us-east-1` | AWS region |
| `SAGEMAKER_DEFAULT_ENDPOINT` | ✅ | - | SageMaker endpoint name |
| `SAGEMAKER_TIMEOUT` | ❌ | `30000` | Request timeout (ms) |
| `SAGEMAKER_MAX_RETRIES` | ❌ | `3` | Retry attempts |
| `AWS_SESSION_TOKEN` | ❌ | - | For temporary credentials |
📖 Complete SageMaker Guide
For comprehensive SageMaker setup, advanced features, and production deployment, see the 📖 Complete SageMaker Integration Guide. It includes:
- Model deployment examples
- Cost optimization strategies
- Enterprise security patterns
- Multi-model endpoint management
- Performance testing and monitoring
- Troubleshooting and debugging
Google Vertex AI Configuration
NeuroLink supports three authentication methods for Google Vertex AI to accommodate different deployment environments:
Method 1: Service Account File (Recommended for Production)
Best for production environments where you can store service account files securely.
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
Setup Steps:
- Create a service account in Google Cloud Console
- Download the service account JSON file
- Set the file path in `GOOGLE_APPLICATION_CREDENTIALS`
Method 2: Service Account JSON String (Good for Containers/Cloud)
Best for containerized environments where file storage is limited.
export GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project",...}'
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
Setup Steps:
- Copy the entire contents of your service account JSON file
- Set it as a single-line string in `GOOGLE_SERVICE_ACCOUNT_KEY` (one way to produce it is sketched below)
- NeuroLink will automatically create a temporary file for authentication
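One way to produce that single-line string from an existing key file (a convenience sketch; the path is a placeholder, and it assumes the variable is set before NeuroLink is constructed):
import { readFileSync } from "node:fs";
// Collapse the pretty-printed key file into one line suitable for an env var
const key = JSON.parse(readFileSync("/path/to/service-account.json", "utf8"));
process.env.GOOGLE_SERVICE_ACCOUNT_KEY = JSON.stringify(key);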
Method 3: Individual Environment Variables (Good for CI/CD)
Best for CI/CD pipelines where individual secrets are managed separately.
export GOOGLE_AUTH_CLIENT_EMAIL="[email protected]"
export GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nMIIE..."
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
Setup Steps:
- Extract `client_email` and `private_key` from your service account JSON
- Set them as individual environment variables
- NeuroLink will automatically assemble them into a temporary service account file
Authentication Detection
NeuroLink automatically detects and uses the best available authentication method in this order:
- File Path (`GOOGLE_APPLICATION_CREDENTIALS`) - if the file exists
- JSON String (`GOOGLE_SERVICE_ACCOUNT_KEY`) - if provided
- Individual Variables (`GOOGLE_AUTH_CLIENT_EMAIL` + `GOOGLE_AUTH_PRIVATE_KEY`) - if both are provided
Complete Vertex AI Configuration
# Required for all methods
export GOOGLE_VERTEX_PROJECT="your-gcp-project-id"
# Optional
export GOOGLE_VERTEX_LOCATION="us-east5" # Default: us-east5
export VERTEX_MODEL_ID="claude-sonnet-4@20250514" # Default model
# Choose ONE authentication method:
# Method 1: Service Account File
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account.json"
# Method 2: Service Account JSON String
export GOOGLE_SERVICE_ACCOUNT_KEY='{"type":"service_account","project_id":"your-project","private_key_id":"...","private_key":"-----BEGIN PRIVATE KEY-----\n...","client_email":"...","client_id":"...","auth_uri":"https://accounts.google.com/o/oauth2/auth","token_uri":"https://oauth2.googleapis.com/token","auth_provider_x509_cert_url":"https://www.googleapis.com/oauth2/v1/certs","client_x509_cert_url":"..."}'
# Method 3: Individual Environment Variables
export GOOGLE_AUTH_CLIENT_EMAIL="[email protected]"
export GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----\nMIIEvQIBADANBgkqhkiG9w0BAQEFAASCBKcwggSjAgEAAoIBAQC...\n-----END PRIVATE KEY-----"
Usage Example
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: { text: "Explain quantum computing" },
provider: "vertex",
model: "gemini-2.5-flash",
temperature: 0.6,
maxTokens: 800,
timeout: "1m", // Optional: Override default 60s timeout
});
Timeout Configuration
- Default Timeout: 60 seconds (longer due to GCP initialization)
- Supported Formats: Milliseconds (`60000`) or human-readable (`'60s'`, `'1m'`, `'2m'`)
- Environment Variable: `VERTEX_TIMEOUT='90s'` (optional)
Supported Models
Gemini 3 (Preview):
- `gemini-3-flash-preview` - Latest Gemini 3 Flash with extended thinking support
- `gemini-3-pro-preview` - Latest Gemini 3 Pro with extended thinking support
Gemini 2.x:
- `gemini-2.5-flash` (default) - Fast, efficient model
Anthropic Models:
- `claude-sonnet-4@20250514` - High-quality reasoning (Anthropic via Vertex AI)
Video Generation:
- `veo-3.1` / `veo-3.1-generate-001` - Video generation from image + text prompt (8-second videos with audio)
Video Generation: Use `output.mode: "video"` with Veo 3.1 to generate videos (see the sketch below). See Video Generation Guide.
PPT Generation: Use `output.mode: "ppt"` with supported providers (Vertex AI, Google AI, OpenAI, Anthropic, Azure OpenAI, or Bedrock) and compatible text models to generate PowerPoint presentations. See PPT Generation Guide.
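A minimal video-mode sketch (the prompt is illustrative; `output.mode: "ppt"` follows the same pattern with a compatible text model):
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
// Veo 3.1 produces an 8-second video with audio from a text prompt
const video = await neurolink.generate({
  input: { text: "A time-lapse of a city skyline at sunset" },
  provider: "vertex",
  model: "veo-3.1",
  output: { mode: "video" },
});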
Gemini 3 Extended Thinking Configuration
Gemini 3 models support extended thinking (also known as "thinking mode"), which allows the model to reason more deeply before providing responses. This is particularly useful for complex reasoning tasks, math problems, and multi-step analysis.
Environment Variables for Gemini 3
# Required: Google Vertex AI credentials (same as above)
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-central1"
# Gemini 3 model selection
export VERTEX_MODEL_ID="gemini-3-flash-preview" # or gemini-3-pro-preview
Extended Thinking Configuration
Configure thinking level to control how much reasoning the model performs:
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
// Enable extended thinking with thinkingLevel configuration
const result = await neurolink.generate({
input: { text: "Solve this complex math problem step by step: ..." },
provider: "vertex",
model: "gemini-3-flash-preview",
temperature: 0.7,
maxTokens: 4000,
// Gemini 3 extended thinking configuration
thinkingLevel: "medium", // Options: "minimal", "low", "medium", "high"
});
Thinking Levels
| Level | Description | Best For |
|---|---|---|
| `minimal` | No extended thinking, fastest responses | Simple queries, quick answers |
| `low` | Brief reasoning before responding | Moderate complexity tasks |
| `medium` | Balanced reasoning depth (recommended) | Most use cases |
| `high` | Deep reasoning, thorough analysis | Complex math, multi-step problems |
Usage Example with Extended Thinking
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
// Complex reasoning task with high thinking level
const result = await neurolink.generate({
input: {
text: "Analyze the following business scenario and provide strategic recommendations...",
},
provider: "vertex",
model: "gemini-3-pro-preview",
thinkingLevel: "high",
maxTokens: 8000,
timeout: "2m", // Extended timeout for deep thinking
});
console.log(result.content);
CLI Usage with Gemini 3
# Generate with Gemini 3 Flash
npx @juspay/neurolink generate "Explain quantum computing" --provider vertex --model gemini-3-flash-preview
# Stream with Gemini 3 Pro
npx @juspay/neurolink stream "Write a detailed analysis" --provider vertex --model gemini-3-pro-preview
Claude Sonnet 4 via Vertex AI Configuration
NeuroLink provides first-class support for Claude Sonnet 4 through Google Vertex AI. This configuration has been tested and verified to work.
Working Configuration Example
# ✅ VERIFIED WORKING CONFIGURATION
export GOOGLE_VERTEX_PROJECT="your-project-id"
export GOOGLE_VERTEX_LOCATION="us-east5"
export GOOGLE_AUTH_CLIENT_EMAIL="[email protected]"
export GOOGLE_AUTH_PRIVATE_KEY="-----BEGIN PRIVATE KEY-----
[Your private key content here]
-----END PRIVATE KEY-----"
Performance Metrics (Verified)
- Generation Response: ~2.6 seconds
- Health Check: Working status detection
- Streaming: Fully functional
- Tool Integration: Ready for MCP tools
Usage Examples
# Generation test
node dist/cli/index.js generate "test" --provider vertex --model claude-sonnet-4@20250514
# Streaming test
node dist/cli/index.js stream "Write a short poem" --provider vertex --model claude-sonnet-4@20250514
# Health check
node dist/cli/index.js status
# Expected: vertex: ✅ Working (2599ms)
Google Cloud Setup Requirements
To use Google Vertex AI, ensure your Google Cloud project has:
- Vertex AI API Enabled: Enable the Vertex AI API in your project
- Service Account: Create a service account with Vertex AI permissions
- Model Access: Ensure access to the models you want to use
- Billing Enabled: Vertex AI requires an active billing account
Service Account Permissions
Your service account needs these IAM roles:
- `Vertex AI User` or `Vertex AI Admin`
- `Service Account Token Creator` (if using impersonation)
Google AI Studio Configuration
Google AI Studio provides direct access to Google's Gemini models with a simple API key authentication.
Basic Setup
export GOOGLE_AI_API_KEY="AIza-your-google-ai-api-key"
Optional Configuration
export GOOGLE_AI_MODEL="gemini-2.5-pro" # Default model to use
Supported Models
- `gemini-2.5-pro` - Comprehensive, detailed responses for complex tasks
- `gemini-2.5-flash` (recommended) - Fast, efficient responses for most tasks
Usage Example
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: { text: "Explain the future of AI" },
provider: "google-ai",
model: "gemini-2.5-flash",
temperature: 0.7,
maxTokens: 1000,
timeout: "30s", // Optional: Override default 30s timeout
});
Timeout Configuration
- Default Timeout: 30 seconds
- Supported Formats: Milliseconds (`30000`) or human-readable (`'30s'`, `'1m'`, `'5m'`)
- Environment Variable: `GOOGLE_AI_TIMEOUT='45s'` (optional)
How to Get Google AI Studio API Key
- Visit Google AI Studio: Go to aistudio.google.com
- Sign In: Use your Google account credentials
- Create API Key:
- Navigate to the API Keys section
- Click Create API Key
- Copy the generated key (starts with `AIza`)
- Set Environment: Add to your `.env` file or export directly
Google AI Studio vs Vertex AI
| Feature | Google AI Studio | Google Vertex AI |
|---|---|---|
| Setup Complexity | 🟢 Simple (API key only) | 🟡 Complex (Service account) |
| Authentication | API key | Service account JSON |
| Free Tier | ✅ Generous free limits | ❌ Pay-per-use only |
| Enterprise Features | ❌ Limited | ✅ Full enterprise support |
| Model Selection | 🎯 Latest Gemini models | 🔄 Broader model catalog |
| Best For | Prototyping, small projects | Production, enterprise apps |
Complete Google AI Studio Configuration
# Required: API key from Google AI Studio (choose one)
export GOOGLE_AI_API_KEY="AIza-your-google-ai-api-key"
# OR
export GOOGLE_GENERATIVE_AI_API_KEY="AIza-your-google-ai-api-key"
# Optional: Default model selection
export GOOGLE_AI_MODEL="gemini-2.5-pro"
Rate Limits and Quotas
Google AI Studio includes generous free tier limits:
- Free Tier: 15 requests per minute, 1,500 requests per day
- Paid Usage: Higher limits available with billing enabled
- Model-Specific: Different models may have different rate limits
Error Handling for Google AI Studio
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
try {
const result = await neurolink.generate({
input: { text: "Generate a creative story" },
provider: "google-ai",
temperature: 0.8,
maxTokens: 500,
});
console.log(result.content);
} catch (error) {
if (error.message.includes("API_KEY_INVALID")) {
console.error(
"Invalid Google AI API key. Check your GOOGLE_AI_API_KEY environment variable.",
);
} else if (error.message.includes("QUOTA_EXCEEDED")) {
console.error("Rate limit exceeded. Wait before making more requests.");
} else {
console.error("Google AI Studio error:", error.message);
}
}
Security Considerations
- API Key Security: Treat API keys as sensitive credentials
- Environment Variables: Never commit API keys to version control
- Rate Limiting: Implement client-side rate limiting for production apps (a simple approach is sketched after this list)
- Monitoring: Monitor usage to avoid unexpected charges
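A simple client-side throttle matching the free-tier limit above (a sketch only; production apps may prefer a token-bucket library):
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const intervalMs = 4000; // ~15 requests/minute (free tier)
let lastCall = 0;
async function throttledGenerate(text: string) {
  // Wait until at least intervalMs has elapsed since the previous call
  const wait = Math.max(0, lastCall + intervalMs - Date.now());
  if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
  lastCall = Date.now();
  return neurolink.generate({ input: { text }, provider: "google-ai" });
}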
LiteLLM Configuration
LiteLLM provides access to 100+ models through a unified proxy server, allowing you to use any AI provider through a single interface.
Prerequisites
- Install LiteLLM:
pip install litellm
- Start LiteLLM proxy server:
# Basic usage
litellm --port 4000
# With configuration file (recommended)
litellm --config litellm_config.yaml --port 4000
Basic Setup
export LITELLM_BASE_URL="http://localhost:4000"
export LITELLM_API_KEY="sk-anything" # Optional, any value works
Optional Configuration
export LITELLM_MODEL="openai/gpt-4o-mini" # Default model to use
Supported Model Formats
LiteLLM uses the `provider/model` format:
# OpenAI models
openai/gpt-4o
openai/gpt-4o-mini
openai/gpt-4
# Anthropic models
anthropic/claude-3-5-sonnet
anthropic/claude-3-haiku
# Google models
google/gemini-2.0-flash
vertex_ai/gemini-pro
# Mistral models
mistral/mistral-large
mistral/mixtral-8x7b
# And many more...
LiteLLM Configuration File (Optional)
Create litellm_config.yaml for advanced configuration:
model_list:
- model_name: openai/gpt-4o
litellm_params:
model: gpt-4o
api_key: os.environ/OPENAI_API_KEY
- model_name: anthropic/claude-3-5-sonnet
litellm_params:
model: claude-3-5-sonnet-20241022
api_key: os.environ/ANTHROPIC_API_KEY
- model_name: google/gemini-2.0-flash
litellm_params:
model: gemini-2.0-flash
api_key: os.environ/GOOGLE_AI_API_KEY
Usage Example
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
// Use LiteLLM provider with specific model
const result = await neurolink.generate({
input: { text: "Explain quantum computing" },
provider: "litellm",
model: "openai/gpt-4o",
temperature: 0.7,
});
console.log(result.content);
Advanced Features
- Cost Tracking: Built-in usage and cost monitoring
- Load Balancing: Automatic failover between providers
- Rate Limiting: Built-in rate limiting and retry logic
- Caching: Optional response caching for efficiency
Production Considerations
- Deployment: Run LiteLLM proxy as a separate service
- Security: Configure authentication for production environments
- Scaling: Use Docker/Kubernetes for high-availability deployments
- Monitoring: Enable logging and metrics collection
Hugging Face Configuration
Basic Setup
export HUGGINGFACE_API_KEY="hf_your_token_here"
Optional Configuration
export HUGGINGFACE_MODEL="microsoft/DialoGPT-medium" # Default model
Model Selection Strategy
Hugging Face hosts 100,000+ models. Choose based on:
- Task: text-generation, conversational, code
- Size: Larger models = better quality but slower
- License: Check model licenses for commercial use
Rate Limiting
- Free tier: Limited requests
- PRO tier: Higher limits
- Handle 503 errors (model loading) with retry logic (see the sketch below)
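A retry sketch for the cold-start case (it assumes the surfaced error message contains the 503 status; adjust the check to what your errors actually contain):
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
async function generateWithRetry(text: string, retries = 3) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await neurolink.generate({ input: { text }, provider: "huggingface" });
    } catch (error) {
      // 503 means the model is still loading; back off and try again
      const message = error instanceof Error ? error.message : String(error);
      if (!message.includes("503") || attempt === retries) throw error;
      await new Promise((resolve) => setTimeout(resolve, attempt * 5000));
    }
  }
}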
Usage Example
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: { text: "Explain machine learning" },
provider: "huggingface",
model: "gpt2",
temperature: 0.8,
maxTokens: 200,
timeout: "45s", // Optional: Override default 30s timeout
});
Timeout Configuration
- Default Timeout: 30 seconds
- Supported Formats: Milliseconds (`30000`) or human-readable (`'30s'`, `'1m'`, `'5m'`)
- Environment Variable: `HUGGINGFACE_TIMEOUT='45s'` (optional)
- Note: Model loading may take additional time on first request
Popular Models
- `microsoft/DialoGPT-medium` (default) - Conversational AI
- `gpt2` - Classic GPT-2
- `distilgpt2` - Lightweight GPT-2
- `EleutherAI/gpt-neo-2.7B` - Large open model
- `bigscience/bloom-560m` - Multilingual model