AI Provider Guides
Complete setup guides for all supported AI providers.
🆓 Free Tier Providers
Start with zero cost using these free-tier options:
Hugging Face
100,000+ open-source models
- ✅ Free inference API
- 🌍 Largest model collection
- 🔓 Fully open source
- 📊 Models by task: chat, classification, NER, summarization
Google AI Studio
Gemini models with generous free tier
- ✅ 1,500 requests/day free
- ⚡ Fast Gemini 2.0 Flash
- 🎯 15 requests/minute
- 💰 Pay-as-you-go option
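The free-tier limits listed above (15 requests/minute, 1,500 requests/day) can be respected with simple client-side pacing. The sketch below is illustrative only: `FreeTierPacer` and its parameters are hypothetical names, not part of any Google SDK, and it does not reset the daily counter. Actual quotas are enforced server-side.

```python
import time


class FreeTierPacer:
    """Client-side pacing for a per-minute and per-day request quota.

    Hypothetical helper: the default limits come from the list above;
    the class itself is not part of any SDK.
    """

    def __init__(self, per_minute=15, per_day=1500, clock=time.monotonic):
        self.min_interval = 60.0 / per_minute  # 15/min -> one request every 4 s
        self.per_day = per_day
        self.clock = clock
        self.sent_today = 0
        self.last_sent = None

    def seconds_until_next(self):
        """Return the wait before the next request, or None if the daily quota is spent."""
        if self.sent_today >= self.per_day:
            return None
        if self.last_sent is None:
            return 0.0
        elapsed = self.clock() - self.last_sent
        return max(0.0, self.min_interval - elapsed)

    def record_request(self):
        """Call right after sending a request."""
        self.sent_today += 1
        self.last_sent = self.clock()
```

A caller would sleep for `seconds_until_next()` before each request and stop, or switch to pay-as-you-go, once it returns `None`.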
🤖 Direct AI Providers
Access leading AI models directly from their creators:
Anthropic
Claude models with API key or OAuth authentication
- 🧠 Claude 4.5 Opus/Sonnet/Haiku, Claude 4.0 Opus/Sonnet
- 🔐 API key or OAuth (Pro/Max subscription)
- 💭 Extended thinking for deep reasoning
- 📄 200K context window, multimodal support
🏢 Enterprise Providers
Production-grade providers for enterprise deployments:
Azure OpenAI
Enterprise AI with Microsoft Azure
- 🔒 SOC2, HIPAA, ISO 27001 compliant
- 🌍 Multi-region deployment (30+ regions)
- 🛡️ Private endpoints with VNet
- 💼 Enterprise SLAs
Google Vertex AI
Google Cloud ML platform
- ☁️ GCP integration
- 🔐 IAM, VPC, service accounts
- 🌏 Global deployment
- 🎯 Gemini, PaLM, Codey models
AWS Bedrock
Serverless AI on AWS
- 📦 13 foundation models (Claude, Llama, Mistral)
- 🔐 IAM, VPC integration
- 🌍 Multi-region (us-east-1, eu-west-1, ap-southeast-1)
- 💰 Pay-per-use pricing
🌍 Compliance-Focused
Providers with specific compliance certifications:
Mistral AI
European AI with GDPR compliance
- 🇪🇺 EU data residency
- ✅ GDPR compliant by default
- 🔓 Open source models
- 💰 Cost-effective
🧑‍💻 Hosted Inference Providers
Access frontier models via hosted cloud inference APIs:
DeepSeek
deepseek-chat (V3) and deepseek-reasoner (R1)
- 🧠 deepseek-chat — high-quality general chat at low cost
- 💭 deepseek-reasoner — R1 chain-of-thought reasoning model
- 🔑 API key from platform.deepseek.com
- 🔄 Aliases: `ds`
NVIDIA NIM
400+ models via NVIDIA's hosted and self-hosted inference platform
- 🚀 Llama 3.3 70B Instruct (default), Mistral, Nemotron, and 400+ catalog models
- 🔧 NIM-specific extras: top_k, min_p, repetition_penalty, reasoning_budget
- 🔑 API key from build.nvidia.com
- 🖥️ Also supports self-hosted NIM endpoints via `NVIDIA_NIM_BASE_URL`
- 🔄 Aliases: `nim`, `nvidia`
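Switching between the hosted service and a self-hosted NIM endpoint comes down to which base URL the client uses. A minimal sketch of that resolution, assuming `NVIDIA_NIM_BASE_URL` simply overrides a hosted default: the function name `resolve_nim_base_url` is hypothetical, and the hosted URL shown is an assumption to verify against build.nvidia.com.

```python
import os

# Assumption: NVIDIA's hosted OpenAI-compatible endpoint; verify before relying on it.
HOSTED_NIM_BASE_URL = "https://integrate.api.nvidia.com/v1"


def resolve_nim_base_url(env=None):
    """Prefer a self-hosted endpoint from NVIDIA_NIM_BASE_URL, else the hosted default."""
    env = os.environ if env is None else env
    override = env.get("NVIDIA_NIM_BASE_URL", "").strip()
    # Strip a trailing slash so later path joins do not produce "//".
    return override.rstrip("/") if override else HOSTED_NIM_BASE_URL
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.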
💻 Local Providers
Run models entirely on your own hardware — no API key or internet required for inference:
LM Studio
Run any supported model locally with a GUI app
- 🖥️ Download and run models via the LM Studio desktop application
- 🔍 Auto-discovers the loaded model from `/v1/models` (no model name required)
- 🌐 OpenAI-compatible API at `http://localhost:1234/v1` by default
- 🆓 No API key needed for local use (key optional for reverse-proxy setups)
- 🔄 Aliases: `lmstudio`, `lms`
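Auto-discovery relies on the model list that an OpenAI-compatible server exposes at `/v1/models`. The sketch below shows one way such a lookup could work, assuming the usual OpenAI-style response shape (`{"data": [{"id": ...}]}`) and assuming the first listed entry is the one to use. `pick_loaded_model` and `discover_model` are hypothetical names, not the actual implementation.

```python
import json
import urllib.request


def pick_loaded_model(models_payload):
    """Return the id of the first model in an OpenAI-style /v1/models response, or None."""
    data = models_payload.get("data") or []
    for entry in data:
        model_id = entry.get("id")
        if model_id:
            return model_id
    return None


def discover_model(base_url="http://localhost:1234/v1", timeout=5.0):
    """Query GET {base_url}/models on a running local server and pick a model."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
        return pick_loaded_model(json.load(resp))
```

`discover_model()` needs a local server to be running; `pick_loaded_model` can be exercised on its own with a sample payload.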
llama.cpp
High-performance local inference via llama-server
- ⚡ Run GGUF models with llama-server at `http://localhost:8080/v1` by default
- 🔍 Auto-discovers the loaded model from `/v1/models`
- 🛠️ Tool support requires the `--jinja` flag when starting llama-server
- 🆓 No API key needed for local use (key optional for reverse-proxy setups)
- 🔄 Aliases: `llama.cpp`
🔌 Aggregators & Proxies
Access multiple providers through unified interfaces:
OpenRouter
300+ models from 60+ providers
- 🌐 Single API for all major providers (Anthropic, OpenAI, Google, Meta, etc.)
- ⚡ Automatic failover and routing
- 💰 Competitive pricing with cost optimization
- 🎯 Zero lock-in - switch models instantly
- 📊 Usage tracking dashboard
- 🆓 Free models available
OpenAI Compatible
OpenRouter, vLLM, LocalAI, and more
- 🌐 100+ models through OpenRouter
- 💻 Local deployment with vLLM
- 🔓 Self-hosted with LocalAI
- 🔄 Drop-in OpenAI replacement
LiteLLM
100+ providers through proxy
- 🔄 Unified API for 100+ providers
- 📊 Load balancing and fallbacks
- 💰 Cost tracking
- 🎯 Model routing
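The fallback behavior mentioned above can be pictured as trying providers in priority order until one succeeds. This is a generic sketch of that pattern, not LiteLLM's own API: `call_with_fallback` and the `(name, callable)` provider pairs are hypothetical.

```python
def call_with_fallback(providers, prompt):
    """Try each (name, callable) in order; return (name, result) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real router would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the result makes it possible to log which backend actually served a request, which is useful for cost tracking.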
🎙️ Voice Providers
Synthesize speech, transcribe audio, or run live voice sessions. Voice providers are separate from LLM providers — they handle audio I/O rather than text generation.
Text-to-Speech (TTS)
OpenAI TTS
Highest-quality text-to-speech
- 🎙️ Voices: alloy, echo, fable, onyx, nova, shimmer
- 🎵 Models: tts-1 (fast) and tts-1-hd (high quality)
- 🎼 Formats: MP3, WAV, OGG, Opus
- 🔑 Auth: API Key (`OPENAI_API_KEY`)
ElevenLabs
Best multilingual and voice-cloning TTS
- 🌍 Supports 30+ languages with natural prosody
- 🎭 Custom voice cloning from short audio samples
- 🎼 Formats: MP3, WAV (raw PCM, surfaced as `pcm16`), Opus (Ogg container)
- 🔑 Auth: API Key (`ELEVENLABS_API_KEY`)
Google TTS
1M characters/month free tier
- 💰 Generous free tier for standard voices
- 🌍 380+ voices across 50+ languages
- 🎼 Formats: MP3, WAV, OGG
- 🔑 Auth: Service Account
Azure TTS
Enterprise TTS with full SSML support
- 🏢 Fine-grained prosody control via SSML
- 🌍 400+ neural voices, 140+ languages
- 🎼 Formats: MP3, WAV (PCM), Opus (Ogg container)
- 🔑 Auth: API Key + Region
Speech-to-Text (STT)
Whisper (OpenAI)
Highest transcription accuracy
- 🎯 Best-in-class accuracy on diverse audio
- 🌍 Multilingual with automatic language detection
- 🎼 Formats: WAV, MP3, M4A, FLAC, OGG, OPUS, WEBM, MP4, MPEG, MPGA
- 🔑 Auth: API Key (`OPENAI_API_KEY`)
Deepgram
Real-time streaming transcription via WebSocket
- ⚡ Sub-300 ms word-level results over WebSocket
- 🌊 REST batch and WebSocket streaming modes
- 🎼 Formats: WAV, MP3, OGG, FLAC
- 🔑 Auth: API Key (`DEEPGRAM_API_KEY`)
Google STT
125+ languages with speaker diarization
- 🌍 Best fit for existing Google Cloud users
- 👥 Speaker diarization and multi-channel audio
- 🎼 Formats: WAV, FLAC, MP3, OGG
- 🔑 Auth: API Key (`GOOGLE_AI_API_KEY`/`GEMINI_API_KEY`) or Service Account (`GOOGLE_APPLICATION_CREDENTIALS`)
Azure STT
Enterprise STT with custom model training
- 🏢 Batch transcription and custom model support
- 🔒 Compliance controls for regulated industries
- 🎼 Formats: WAV (PCM), Ogg/Opus — convert MP3 to WAV first
- 🔑 Auth: API Key + Region
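Since MP3 input has to be converted to WAV before it is sent to Azure STT, a small pre-processing step is usually needed. The sketch below shells out to `ffmpeg` (which must be installed separately); the helper names are hypothetical, and the 16 kHz mono 16-bit PCM target is an assumption based on common speech-recognition settings, not a documented requirement.

```python
import subprocess


def ffmpeg_mp3_to_wav_cmd(src, dst, sample_rate=16000):
    """Build an ffmpeg command that decodes src to 16-bit mono PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it exists
        "-i", src,                # input file
        "-ac", "1",               # mono
        "-ar", str(sample_rate),  # sample rate in Hz
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        dst,
    ]


def convert_mp3_to_wav(src, dst, sample_rate=16000):
    """Run the conversion; requires ffmpeg on PATH."""
    subprocess.run(ffmpeg_mp3_to_wav_cmd(src, dst, sample_rate), check=True)
```

Building the command separately from running it keeps the argument list inspectable and testable without invoking `ffmpeg`.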
Realtime Voice
Realtime providers maintain a persistent bidirectional WebSocket connection, enabling low-latency spoken conversation with the AI model.
OpenAI Realtime
Low-latency bidirectional voice over WebSocket
- ⚡ Full-duplex audio stream with GPT-4o
- 🎵 Voice activity detection (VAD) built-in
- 🎼 Formats: WAV, Opus
- 🔑 Auth: API Key (`OPENAI_API_KEY`)
Gemini Live
Google's native realtime voice API
- ⚡ Native multimodal realtime session with Gemini
- 🎵 Supports audio + video input simultaneously
- 🎼 Formats: WAV, Opus
- 🔑 Auth: API Key (`GOOGLE_AI_API_KEY` or `GEMINI_API_KEY`)
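When a provider accepts either of two environment variables, the client has to decide which one wins. A minimal sketch of that lookup follows; `first_api_key` is a hypothetical helper, and the precedence shown (`GOOGLE_AI_API_KEY` before `GEMINI_API_KEY`) is an assumption taken from the order in which the variables are listed, not confirmed behavior.

```python
import os

# Assumed precedence: the order in which the variables are listed above.
GEMINI_KEY_VARS = ("GOOGLE_AI_API_KEY", "GEMINI_API_KEY")


def first_api_key(names, env=None):
    """Return (name, value) for the first non-empty variable in names, else None."""
    env = os.environ if env is None else env
    for name in names:
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None
```

Returning the variable name as well as the value helps produce clear error messages when neither variable is set or the wrong one is picked up.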
Quick Comparison
| Provider | Free Tier | Enterprise | GDPR | Latency | Best For |
|---|---|---|---|---|---|
| Anthropic | Limited | ✅ | ✅ | Low | Reasoning, coding, Claude |
| Hugging Face | ✅ | ❌ | ✅ | Medium | Open source, experimentation |
| Google AI | ✅ | ✅ | ✅ | Low | Free tier, Gemini |
| Mistral AI | ❌ | ✅ | ✅ | Low | EU compliance, cost |
| OpenRouter | ✅ | ✅ | Varies | Low | Multi-model, automatic failover |
| OpenAI Compatible | Varies | ✅ | Varies | Varies | Flexibility, local deployment |
| LiteLLM | ❌ | ✅ | Varies | Low | Multi-provider, unified API |
| Azure OpenAI | ❌ | ✅ | ✅ | Low | Enterprise, Microsoft ecosystem |
| Vertex AI | ❌ | ✅ | ✅ | Low | Enterprise, GCP ecosystem |
| AWS Bedrock |