OpenAI Compatible Provider Guide

Connect to any OpenAI-compatible API: OpenRouter, vLLM, LocalAI, and more


Overview

The OpenAI Compatible provider enables NeuroLink to work with any service that implements the OpenAI API specification. This includes third-party aggregators like OpenRouter, self-hosted solutions like vLLM, and custom OpenAI-compatible endpoints.

Key Benefits

  • 🌐 Universal Compatibility: Works with any OpenAI-compatible endpoint
  • 🔄 Provider Aggregation: Access multiple providers through one endpoint (OpenRouter)
  • 🏠 Self-Hosted: Run your own models with vLLM, LocalAI
  • 💰 Cost Optimization: Compare pricing across providers
  • 🔧 Custom Endpoints: Integrate proprietary AI services
  • 📊 Auto-Discovery: Automatic model detection via the /v1/models endpoint

Supported Services

| Service | Description | Best For |
| --- | --- | --- |
| OpenRouter | AI provider aggregator (100+ models) | Multi-provider access |
| vLLM | High-performance inference server | Self-hosted models |
| LocalAI | Local OpenAI alternative | Privacy, offline usage |
| Text Generation WebUI | Community inference server | Local LLMs |
| Custom APIs | Your own OpenAI-compatible service | Proprietary models |

Quick Start

Option 1: OpenRouter

OpenRouter provides access to 100+ models from multiple providers through a single API.

1. Get OpenRouter API Key

  1. Visit OpenRouter.ai
  2. Sign up for free account
  3. Go to Keys
  4. Create new key
  5. Add credits ($5 minimum)

2. Configure Environment

# Add to .env
OPENAI_COMPATIBLE_BASE_URL=https://openrouter.ai/api/v1
OPENAI_COMPATIBLE_API_KEY=sk-or-v1-your-key-here

3. Test Setup

# Auto-discover available models
npx @juspay/neurolink models --provider openai-compatible

# Generate with specific model
npx @juspay/neurolink generate "Hello from OpenRouter!" \
--provider openai-compatible \
--model "anthropic/claude-3.5-sonnet"

Option 2: vLLM (Self-Hosted)

vLLM is a high-performance inference server for running models locally.

1. Install vLLM

# Install vLLM
pip install vllm

# Start server with a model
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mistral-7B-Instruct-v0.2 \
--port 8000

2. Configure Environment

# Add to .env
OPENAI_COMPATIBLE_BASE_URL=http://localhost:8000/v1
OPENAI_COMPATIBLE_API_KEY=none # vLLM doesn't require a key

3. Test Setup

npx @juspay/neurolink generate "Hello from vLLM!" \
--provider openai-compatible

Option 3: LocalAI (Privacy-Focused)

LocalAI runs completely offline for maximum privacy.

1. Install LocalAI

# Using Docker
docker run -p 8080:8080 \
-v $PWD/models:/models \
localai/localai:latest

# Or install directly
curl https://localai.io/install.sh | sh

2. Configure Environment

# Add to .env
OPENAI_COMPATIBLE_BASE_URL=http://localhost:8080/v1
OPENAI_COMPATIBLE_API_KEY=none
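
To confirm the endpoint responds, here is a minimal SDK check (a sketch that assumes the variables above are loaded and at least one model is installed in LocalAI):

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Generate against the local LocalAI endpoint using its default model
const result = await ai.generate({
  input: { text: "Hello from LocalAI!" },
  provider: "openai-compatible",
});
console.log(result.content);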

Model Auto-Discovery

NeuroLink automatically discovers available models through the /v1/models endpoint.

Discover Available Models

# List all models from endpoint
npx @juspay/neurolink models --provider openai-compatible
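
Auto-discovery queries the standard /v1/models route on the configured base URL. If you want to inspect the raw response yourself, here is a minimal sketch using fetch (assuming Node 18+ and the OPENAI_COMPATIBLE_* variables from your .env; OpenAI-style endpoints return a list object whose data array carries the model entries):

// Query the models route directly and print the advertised model IDs
const response = await fetch(`${process.env.OPENAI_COMPATIBLE_BASE_URL}/models`, {
  headers: { Authorization: `Bearer ${process.env.OPENAI_COMPATIBLE_API_KEY}` },
});
const { data } = await response.json();
console.log(data.map((m: { id: string }) => m.id));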

SDK Auto-Discovery

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Discover models programmatically
const models = await ai.listModels("openai-compatible");
console.log("Available models:", models);

// Use discovered model
const result = await ai.generate({
  input: { text: "Hello!" },
  provider: "openai-compatible",
  model: models[0].id, // Use first available model
});

OpenRouter Integration

OpenRouter aggregates 100+ models from multiple providers.

Available Models on OpenRouter

# List all OpenRouter models
npx @juspay/neurolink models --provider openai-compatible

# Popular models available:
# - anthropic/claude-3.5-sonnet
# - openai/gpt-4-turbo
# - google/gemini-pro-1.5
# - meta-llama/llama-3-70b-instruct
# - mistralai/mistral-large

Model Selection by Provider

// Use Claude through OpenRouter
const claude = await ai.generate({
  input: { text: "Explain quantum computing" },
  provider: "openai-compatible",
  model: "anthropic/claude-3.5-sonnet",
});

// Use GPT-4 through OpenRouter
const gpt4 = await ai.generate({
  input: { text: "Write a poem" },
  provider: "openai-compatible",
  model: "openai/gpt-4-turbo",
});

// Use Gemini through OpenRouter
const gemini = await ai.generate({
  input: { text: "Analyze this data" },
  provider: "openai-compatible",
  model: "google/gemini-pro-1.5",
});

OpenRouter Features

// Cost tracking (OpenRouter provides in response)
const result = await ai.generate({
  input: { text: "Your prompt" },
  provider: "openai-compatible",
  model: "anthropic/claude-3.5-sonnet",
  enableAnalytics: true,
});

console.log("Tokens used:", result.usage.totalTokens);
console.log("Cost:", result.cost); // OpenRouter returns actual cost

// Provider selection preferences
const routed = await ai.generate({
  input: { text: "Your prompt" },
  provider: "openai-compatible",
  model: "openai/gpt-4",
  headers: {
    "X-Provider-Preferences": "order:cost", // Cheapest first
  },
});

vLLM Integration

vLLM provides high-performance inference for self-hosted models.

Starting vLLM Server

# Basic setup
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mistral-7B-Instruct-v0.2 \
--port 8000

# With GPU optimization (tensor parallelism across multiple GPUs)
python -m vllm.entrypoints.openai.api_server \
--model mistralai/Mistral-7B-Instruct-v0.2 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9 \
--port 8000

# With quantization for lower memory
python -m vllm.entrypoints.openai.api_server \
--model TheBloke/Mistral-7B-Instruct-v0.2-AWQ \
--quantization awq \
--port 8000

Configuring NeuroLink

const ai = new NeuroLink({
  providers: [
    {
      name: "openai-compatible",
      config: {
        baseUrl: "http://localhost:8000/v1",
        apiKey: "none", // vLLM doesn't require authentication
        defaultModel: "mistralai/Mistral-7B-Instruct-v0.2",
      },
    },
  ],
});

// Use vLLM-hosted model
const result = await ai.generate({
  input: { text: "Explain Docker containers" },
  provider: "openai-compatible",
});

Multiple vLLM Instances

// Load balance across multiple vLLM servers
const ai = new NeuroLink({
  providers: [
    {
      name: "openai-compatible-1",
      config: {
        baseUrl: "http://server1:8000/v1",
        apiKey: "none",
      },
      priority: 1,
    },
    {
      name: "openai-compatible-2",
      config: {
        baseUrl: "http://server2:8000/v1",
        apiKey: "none",
      },
      priority: 1,
    },
  ],
  loadBalancing: "round-robin",
});

SDK Integration

Basic Usage

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

// Simple generation
const result = await ai.generate({
  input: { text: "Hello from OpenAI Compatible!" },
  provider: "openai-compatible",
});

console.log(result.content);

With Model Selection

// Specify exact model (OpenRouter format)
const result = await ai.generate({
  input: { text: "Explain blockchain" },
  provider: "openai-compatible",
  model: "anthropic/claude-3.5-sonnet",
});

// Or use an auto-discovered model
const models = await ai.listModels("openai-compatible");
const discovered = await ai.generate({
  input: { text: "Your prompt" },
  provider: "openai-compatible",
  model: models[0].id,
});

Streaming

// Stream responses for better UX
for await (const chunk of ai.stream({
  input: { text: "Write a long story" },
  provider: "openai-compatible",
  model: "anthropic/claude-3.5-sonnet",
})) {
  process.stdout.write(chunk.content);
}

Custom Headers

// Pass custom headers (e.g., for OpenRouter)
const result = await ai.generate({
  input: { text: "Your prompt" },
  provider: "openai-compatible",
  headers: {
    "HTTP-Referer": "https://your-app.com",
    "X-Title": "YourApp",
    "X-Provider-Preferences": "order:cost",
  },
});

Error Handling

try {
  const result = await ai.generate({
    input: { text: "Your prompt" },
    provider: "openai-compatible",
    model: "non-existent-model",
  });
} catch (error) {
  if (error.message.includes("model not found")) {
    // List available models
    const models = await ai.listModels("openai-compatible");
    console.log(
      "Available models:",
      models.map((m) => m.id),
    );
  } else if (error.message.includes("connection")) {
    console.error("Cannot connect to endpoint");
  } else {
    throw error;
  }
}

CLI Usage

Basic Commands

# Generate with default model
npx @juspay/neurolink generate "Hello world" --provider openai-compatible

# Use specific model
npx @juspay/neurolink gen "Write code" \
--provider openai-compatible \
--model "anthropic/claude-3.5-sonnet"

# Stream response
npx @juspay/neurolink stream "Tell a story" \
--provider openai-compatible

# List available models
npx @juspay/neurolink models --provider openai-compatible

OpenRouter-Specific Commands

# Use cheap models for cost optimization
npx @juspay/neurolink gen "Customer support query" \
--provider openai-compatible \
--model "meta-llama/llama-3-8b-instruct" # Cheap

# Use premium models for complex tasks
npx @juspay/neurolink gen "Complex analysis task" \
--provider openai-compatible \
--model "anthropic/claude-3-opus" # Premium

Configuration Options

Environment Variables

# Required
OPENAI_COMPATIBLE_BASE_URL=https://openrouter.ai/api/v1
OPENAI_COMPATIBLE_API_KEY=sk-or-v1-your-key

# Optional
OPENAI_COMPATIBLE_MODEL=anthropic/claude-3.5-sonnet # Default model
OPENAI_COMPATIBLE_TIMEOUT=60000 # Timeout (ms)
OPENAI_COMPATIBLE_VERIFY_SSL=true # SSL verification

Programmatic Configuration

const ai = new NeuroLink({
  providers: [
    {
      name: "openai-compatible",
      config: {
        baseUrl: process.env.OPENAI_COMPATIBLE_BASE_URL,
        apiKey: process.env.OPENAI_COMPATIBLE_API_KEY,
        defaultModel: "anthropic/claude-3.5-sonnet",
        timeout: 60000,
        headers: {
          "HTTP-Referer": "https://yourapp.com",
          "X-Title": "YourApp",
        },
      },
    },
  ],
});

Use Cases

1. Multi-Provider Access via OpenRouter

// Access multiple providers through one endpoint
const providers = {
  claude: "anthropic/claude-3.5-sonnet",
  gpt4: "openai/gpt-4-turbo",
  gemini: "google/gemini-pro-1.5",
  llama: "meta-llama/llama-3-70b-instruct",
};

for (const [name, model] of Object.entries(providers)) {
  const result = await ai.generate({
    input: { text: "Explain quantum computing in one sentence" },
    provider: "openai-compatible",
    model,
  });
  console.log(`${name}: ${result.content}`);
}
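
If latency matters when comparing models, the same requests can run concurrently; a sketch using Promise.all with the ai instance and providers map from above:

// Fire all comparison requests at once and print results in the original order
const entries = Object.entries(providers);
const results = await Promise.all(
  entries.map(([, model]) =>
    ai.generate({
      input: { text: "Explain quantum computing in one sentence" },
      provider: "openai-compatible",
      model,
    }),
  ),
);
entries.forEach(([name], i) => console.log(`${name}: ${results[i].content}`));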

2. Self-Hosted Private Models

// Complete privacy with local vLLM
const privateAI = new NeuroLink({
  providers: [
    {
      name: "openai-compatible",
      config: {
        baseUrl: "http://localhost:8000/v1",
        apiKey: "none",
      },
    },
  ],
});

// Process sensitive data locally
const result = await privateAI.generate({
  input: { text: sensitiveData },
  provider: "openai-compatible",
});
// Data never leaves your infrastructure

3. Cost Optimization

// Compare costs across providers via OpenRouter
async function generateCheapest(prompt: string) {
  const models = [
    {
      name: "llama-3-8b",
      model: "meta-llama/llama-3-8b-instruct",
      costPer1M: 0.2,
    },
    {
      name: "mistral-7b",
      model: "mistralai/mistral-7b-instruct",
      costPer1M: 0.15,
    },
    { name: "gemma-7b", model: "google/gemma-7b-it", costPer1M: 0.1 },
  ];

  // Sort by cost
  models.sort((a, b) => a.costPer1M - b.costPer1M);

  // Try cheapest first
  for (const { model } of models) {
    try {
      return await ai.generate({
        input: { text: prompt },
        provider: "openai-compatible",
        model,
      });
    } catch (error) {
      continue; // Try next model
    }
  }

  // All candidates failed
  throw new Error("All models failed for this prompt");
}
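
Calling the helper is a single await; the prompt shown is illustrative:

const answer = await generateCheapest("Summarize this support ticket in two sentences");
console.log(answer.content);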

Troubleshooting

Common Issues

1. "Connection refused"

Problem: Endpoint is not accessible.

Solution:

# Test endpoint manually (local development)
curl http://localhost:8000/v1/models

# Test endpoint manually (production - always use HTTPS)
curl https://your-production-endpoint.com/v1/models

# Check if server is running
ps aux | grep vllm

# Verify firewall allows connection
telnet localhost 8000

2. "Model not found"

Problem: Model ID is incorrect or not available.

Solution:

# List available models first
npx @juspay/neurolink models --provider openai-compatible

# Use exact model ID from list
npx @juspay/neurolink gen "test" \
--provider openai-compatible \
--model "exact-model-id-from-list"

3. "Invalid API key"

Problem: API key format is incorrect (OpenRouter).

Solution:

# OpenRouter keys start with sk-or-v1-
OPENAI_COMPATIBLE_API_KEY=sk-or-v1-your-key # ✅ Correct

# For local servers, use 'none' or empty string
OPENAI_COMPATIBLE_API_KEY=none # ✅ For vLLM

Best Practices

1. Model Discovery

// ✅ Good: Auto-discover models on startup
const models = await ai.listModels("openai-compatible");
console.log(
  "Available models:",
  models.map((m) => m.id),
);

// Cache model list
const modelCache = new Map();
modelCache.set("openai-compatible", models);
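
To keep that cache from going stale, one option is a small time-based refresh wrapper; a sketch with a hypothetical 10-minute TTL:

// Re-discover models only when the cached list is older than the TTL
const MODEL_TTL_MS = 10 * 60 * 1000;
let cachedAt = 0;
let cachedModels: Awaited<ReturnType<typeof ai.listModels>> = [];

async function getModels() {
  if (Date.now() - cachedAt > MODEL_TTL_MS) {
    cachedModels = await ai.listModels("openai-compatible");
    cachedAt = Date.now();
  }
  return cachedModels;
}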

2. Endpoint Health Checks

// ✅ Good: Verify endpoint before use
async function healthCheck() {
  try {
    const models = await ai.listModels("openai-compatible");
    return models.length > 0;
  } catch (error) {
    return false;
  }
}

if (await healthCheck()) {
  // Use provider
} else {
  // Fall back to alternative
}

3. Cost Tracking

// ✅ Good: Track usage with OpenRouter
const result = await ai.generate({
  input: { text: prompt },
  provider: "openai-compatible",
  enableAnalytics: true,
});

// costTracker is your application's own usage store, not part of NeuroLink
await costTracker.record({
  provider: "openrouter",
  model: result.model,
  tokens: result.usage.totalTokens,
  cost: result.cost,
});


Additional Resources


Need Help? Join our GitHub Discussions or open an issue.