Replicate Provider Guide
One auth token, five modalities: LLMs, image, video, avatar, and music under a single REPLICATE_API_TOKEN
Overview
Replicate is a universal hosted-model gateway. NeuroLink wraps it as a multi-modal provider so a single token gets you:
| Modality | How | Default model |
|---|---|---|
| LLM | provider: "replicate" chat / streaming | meta/meta-llama-3.1-70b-instruct |
| Image gen | provider: "replicate" with a model id matching IMAGE_GENERATION_MODELS | black-forest-labs/flux-1.1-pro |
| Video | output: { mode: "video", video: { provider: "replicate" } } | atonamy/wan-alpha |
| Avatar | output: { mode: "avatar", avatar: { provider: "replicate" } } | lucataco/musetalk |
| Music | output: { mode: "music", music: { provider: "replicate" } } | meta/musicgen |
Architectural detail: see docs/provider-integration/22-adding-multimodal-provider.md — Replicate is the canonical worked example.
Key Facts
- Protocol: Async prediction lifecycle (POST /v1/predictions → poll until succeeded → fetch output). NeuroLink uses Prefer: wait=60 so short jobs complete in the initial POST and skip polling entirely.
- Default base URL: https://api.replicate.com
- Auth: Authorization: Token $REPLICATE_API_TOKEN
- Pricing: Per compute-second (not per-token). NeuroLink reports a symbolic per-token rate so cost dashboards stay populated, but real billing is via Replicate's invoice.
- Streaming: Synthetic single-chunk stream from the predict result (true SSE streaming planned for a follow-up)
- Tool calling: Not supported — Replicate predictions are stateless
- Reasoning trace: Model-dependent (e.g., DeepSeek R1 on Replicate exposes its reasoning trace in the output array)
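The lifecycle described in the Protocol bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not NeuroLink's actual lifecycle helper: `runPrediction`, `HttpFn`, `pollMs`, and `maxPolls` are hypothetical names introduced here. The endpoint path, the `Prefer: wait=60` header, and the `Authorization: Token` scheme come from the facts above; the `status`, `urls.get`, and `output` fields follow Replicate's public prediction object. The HTTP call is injected so the sketch can run against a stub.

```typescript
// Hypothetical sketch of the prediction lifecycle; not NeuroLink's real helper.
type Prediction = {
  id: string;
  status: "starting" | "processing" | "succeeded" | "failed" | "canceled";
  output?: unknown;
  error?: string | null;
  urls?: { get?: string };
};

// Minimal HTTP abstraction (an assumption) so a stub or a fetch wrapper can be plugged in.
type HttpFn = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<Prediction>;

async function runPrediction(
  http: HttpFn,
  token: string,
  body: { version: string; input: Record<string, unknown> },
  opts: { baseUrl?: string; pollMs?: number; maxPolls?: number } = {},
): Promise<unknown> {
  const baseUrl = opts.baseUrl ?? "https://api.replicate.com";
  const pollMs = opts.pollMs ?? 1000;
  const maxPolls = opts.maxPolls ?? 300; // 300 polls at 1 s is roughly a 5-minute budget
  const headers = {
    Authorization: `Token ${token}`,
    "Content-Type": "application/json",
  };

  // Step 1: create the prediction. Prefer: wait=60 lets short jobs finish in this call.
  let prediction = await http(`${baseUrl}/v1/predictions`, {
    method: "POST",
    headers: { ...headers, Prefer: "wait=60" },
    body: JSON.stringify(body),
  });

  // Step 2: poll only while the prediction has not reached a terminal state.
  let polls = 0;
  while (true) {
    if (prediction.status === "succeeded") return prediction.output;
    if (prediction.status === "failed" || prediction.status === "canceled") {
      throw new Error(`Prediction ${prediction.status}: ${prediction.error ?? "no error message"}`);
    }
    if (polls >= maxPolls) {
      throw new Error("Prediction did not finish within the polling budget");
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
    const pollUrl = prediction.urls?.get ?? `${baseUrl}/v1/predictions/${prediction.id}`;
    prediction = await http(pollUrl, { method: "GET", headers });
    polls += 1;
  }
}
```

When the initial POST already returns `succeeded`, the loop exits on its first check and no GET request is made, which is the "skip polling entirely" path mentioned above.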
Quick Start
1. Get an API Token
Sign up at https://replicate.com/ and create an API token at https://replicate.com/account/api-tokens.
2. Configure Environment
# Required
REPLICATE_API_TOKEN=r8_...
# Optional: override the default LLM model
REPLICATE_MODEL=meta/meta-llama-3.1-70b-instruct
# Optional: override the base URL
# REPLICATE_BASE_URL=https://api.replicate.com
3. Generate Your First Response
import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();
const result = await ai.generate({
  provider: "replicate",
  input: { text: "Explain how a transformer's attention mechanism works." },
});
console.log(result.content);
SDK Usage by Modality
LLM (chat / streaming)
const result = await ai.generate({
  provider: "replicate",
  model: "meta/meta-llama-3.1-405b-instruct",
  input: { text: "Write Python that calculates compound interest." },
});
Streaming:
const stream = await ai.stream({
  provider: "replicate",
  model: "meta/meta-llama-3.1-70b-instruct",
  input: { text: "Tell me a story" },
});
for await (const chunk of stream.stream) {
  if ("content" in chunk) process.stdout.write(chunk.content);
}
Image Generation
import { writeFileSync } from "node:fs";

const result = await ai.generate({
  provider: "replicate",
  model: "black-forest-labs/flux-1.1-pro",
  input: { text: "A serene mountain lake at sunrise, photorealistic" },
});
// imageOutput.base64 holds the generated image as a base64 string
const buffer = Buffer.from(result.imageOutput.base64, "base64");
writeFileSync("./output.png", buffer);
Other supported image models on Replicate (pass via model:):
- black-forest-labs/flux-1.1-pro (default)
- black-forest-labs/flux-schnell
- stability-ai/stable-diffusion-3.5-large
- stability-ai/stable-diffusion-3.5-large-turbo
- playgroundai/playground-v2.5-1024px-aesthetic
- ideogram-ai/ideogram-v3
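Because image generation is selected by model id rather than by an output mode, the routing decision can be pictured as a simple list lookup. The sketch below is an assumption about how such a check might look: the array only mirrors the models named above (the real IMAGE_GENERATION_MODELS constant in NeuroLink may contain more or different entries), and `routeReplicateModel` is a hypothetical name.

```typescript
// Assumption: mirrors the models listed in this guide, not NeuroLink's actual constant.
const IMAGE_GENERATION_MODELS: readonly string[] = [
  "black-forest-labs/flux-1.1-pro",
  "black-forest-labs/flux-schnell",
  "stability-ai/stable-diffusion-3.5-large",
  "stability-ai/stable-diffusion-3.5-large-turbo",
  "playgroundai/playground-v2.5-1024px-aesthetic",
  "ideogram-ai/ideogram-v3",
];

type ReplicateRoute = "image" | "llm";

// Hypothetical router: drop an optional ":version" suffix, then look the id up.
function routeReplicateModel(modelId: string): ReplicateRoute {
  const colon = modelId.indexOf(":");
  const name = colon === -1 ? modelId : modelId.slice(0, colon);
  return IMAGE_GENERATION_MODELS.indexOf(name) !== -1 ? "image" : "llm";
}
```

Under this sketch, any id that is not in the list falls through to the LLM path, so a typo in an image model id would surface as a chat request rather than an image request.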
Video Generation
import { readFileSync, writeFileSync } from "node:fs";

// The source image is used as the start frame of the clip
const sourceImage = readFileSync("./input.jpg");
const result = await ai.generate({
  input: { text: "smooth zoom out", images: [sourceImage] },
  output: {
    mode: "video",
    video: {
      provider: "replicate",
      model: "atonamy/wan-alpha",
      length: 4,
      aspectRatio: "16:9",
    },
  },
});
writeFileSync("./output.mp4", result.video.data);
Avatar (MuseTalk)
import { readFileSync, writeFileSync } from "node:fs";

// A still portrait plus a narration track drive the lip-synced video
const portrait = readFileSync("./portrait.jpg");
const audio = readFileSync("./narration.mp3");
const result = await ai.generate({
  output: {
    mode: "avatar",
    avatar: {
      provider: "replicate", // or "musetalk" alias
      image: portrait,
      audio,
    },
  },
});
writeFileSync("./avatar.mp4", result.avatar.buffer);
Music Generation (MusicGen)
import { writeFileSync } from "node:fs";

const result = await ai.generate({
  output: {
    mode: "music",
    music: {
      provider: "replicate", // or "musicgen" alias
      prompt: "Lo-fi hip-hop beat with vinyl crackle",
      duration: 8,
      tempo: 80,
    },
  },
});
writeFileSync("./track.mp3", result.music.buffer);
CLI Usage
# LLM
pnpm run cli generate "Hello" --provider replicate

# Image gen
pnpm run cli generate "A red panda" --provider replicate \
  --model black-forest-labs/flux-1.1-pro --imageOutput ./panda.png

# Video gen
pnpm run cli generate "smooth pan" --image ./input.jpg \
  --outputMode video --videoProvider replicate \
  --videoOutput ./out.mp4

# Avatar
pnpm run cli generate --outputMode avatar \
  --avatarProvider replicate \
  --avatarImage ./portrait.jpg \
  --avatarAudio ./narration.mp3 \
  --avatarOutput ./avatar.mp4

# Music
pnpm run cli generate "Lo-fi beat" \
  --outputMode music --musicProvider replicate \
  --musicTempo 80 --musicDuration 8 --musicOutput ./track.mp3
Configuration Reference
| Environment Variable | Required | Default | Description |
|---|---|---|---|
| REPLICATE_API_TOKEN | Yes | (none) | Replicate API token (r8_...) |
| REPLICATE_MODEL | No | meta/meta-llama-3.1-70b-instruct | Default LLM model |
| REPLICATE_BASE_URL | No | https://api.replicate.com | Base URL |
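To make the defaults in the table concrete, here is a minimal sketch of how the three variables might be resolved. `resolveReplicateConfig` and `ReplicateConfig` are hypothetical names used for illustration; NeuroLink's own configuration loading may behave differently (for example in how it trims values or handles trailing slashes, both of which are assumptions here).

```typescript
type ReplicateConfig = { apiToken: string; model: string; baseUrl: string };

// Hypothetical resolver: applies the defaults from the table above.
function resolveReplicateConfig(env: Record<string, string | undefined>): ReplicateConfig {
  const apiToken = env["REPLICATE_API_TOKEN"]?.trim();
  if (!apiToken) {
    // The token is the only required variable.
    throw new Error("REPLICATE_API_TOKEN is required");
  }
  const model = env["REPLICATE_MODEL"]?.trim() || "meta/meta-llama-3.1-70b-instruct";
  const rawBaseUrl = env["REPLICATE_BASE_URL"]?.trim() || "https://api.replicate.com";
  return {
    apiToken,
    model,
    // Assumption: strip trailing slashes so paths can be appended safely.
    baseUrl: rawBaseUrl.replace(/\/+$/, ""),
  };
}
```

In a Node process this would typically be called as `resolveReplicateConfig(process.env)`.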
Feature Support Matrix
| Feature | LLM | Image | Video | Avatar | Music |
|---|---|---|---|---|---|
| Streaming | Synthetic (single chunk) | N/A | N/A | N/A | N/A |
| Tool calling | No | N/A | N/A | N/A | N/A |
| Structured output | Limited | N/A | N/A | N/A | N/A |
| Vision input | Model-dependent | Yes (img2img) | Yes (start frame) | Yes | No |
Cost Notes
Replicate bills by compute seconds, not by tokens. NeuroLink reports a symbolic per-token rate so cost-attribution dashboards have non-zero values, but the authoritative billing is from Replicate's own pricing dashboard.
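For a rough compute-second estimate that is independent of the symbolic per-token rate, the arithmetic is simply compute time multiplied by the hardware's per-second price. The sketch below assumes you have the raw prediction object, whose `metrics.predict_time` field reports compute seconds. `estimateCostUsd` is a hypothetical helper, and the per-second price is a placeholder you would look up on Replicate's pricing page for the hardware your model runs on.

```typescript
// Subset of the metrics object found on a finished Replicate prediction.
type PredictionMetrics = { predict_time?: number };

// Hypothetical helper: cost = compute seconds x price per second.
function estimateCostUsd(metrics: PredictionMetrics, pricePerSecondUsd: number): number {
  const seconds = metrics.predict_time ?? 0; // missing metrics are treated as zero compute
  if (seconds < 0 || pricePerSecondUsd < 0) {
    throw new RangeError("seconds and price must be non-negative");
  }
  return seconds * pricePerSecondUsd;
}
```

Treat the result as an estimate only; Replicate's invoice remains the authoritative figure.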
Troubleshooting
"Invalid Replicate API token"
echo $REPLICATE_API_TOKEN
export REPLICATE_API_TOKEN=r8_...
Get / rotate at https://replicate.com/account/api-tokens.
"Replicate model 'X' not found"
Use the owner/name or owner/name:version format. Browse the catalog
at https://replicate.com/explore.
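A quick local check of the identifier format can catch typos before a request is sent. The following is a sketch only: `parseModelRef` is a hypothetical helper, and the allowed character set in the regular expression is an assumption rather than Replicate's documented grammar.

```typescript
type ModelRef = { owner: string; name: string; version?: string };

// Hypothetical parser for "owner/name" or "owner/name:version".
// Assumption: owner and name use letters, digits, ".", "_" and "-"; version is alphanumeric.
function parseModelRef(id: string): ModelRef {
  const match = /^([A-Za-z0-9_.-]+)\/([A-Za-z0-9_.-]+)(?::([A-Za-z0-9]+))?$/.exec(id.trim());
  if (!match) {
    throw new Error(`Expected "owner/name" or "owner/name:version", got "${id}"`);
  }
  const owner = match[1] ?? "";
  const name = match[2] ?? "";
  const version = match[3];
  return version === undefined ? { owner, name } : { owner, name, version };
}
```

A bare name such as `flux-1.1-pro` (no owner) is rejected by this check, which is the most common cause of the "not found" error described above.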
Cold-start delays
First-call latency on rarely used models can spike while the inference
container warms up. Subsequent calls reuse the warm container. NeuroLink
caps polling at 5 minutes by default. If you regularly hit that cap, raise
the polling timeout in the lifecycle helper, next to its
REPLICATE_BASE_URL and Prefer: wait=60 configuration.
Streaming feels chunky
The current implementation runs the prediction synchronously and emits a single chunk. True SSE streaming is planned — for now use OpenAI / xAI / Groq for low-latency token streaming.
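The behavior described above can be pictured as an async generator that waits for the whole prediction and then yields once. This is a sketch of the idea, not NeuroLink's implementation: `syntheticStream`, `Predict`, and `TextChunk` are hypothetical names.

```typescript
type TextChunk = { content: string };

// Hypothetical stand-in for the provider call that returns the full completion.
type Predict = (prompt: string) => Promise<string>;

// Synthetic stream: await the complete result, then emit it as a single chunk.
async function* syntheticStream(predict: Predict, prompt: string): AsyncGenerator<TextChunk> {
  const text = await predict(prompt); // nothing is emitted until the prediction finishes
  yield { content: text };
}
```

A `for await` consumer therefore sees no output until the prediction completes and then receives the entire text in one iteration, which is why the stream feels chunky.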
Output is a URL, not base64
NeuroLink downloads the URL and converts to base64 to keep the
imageOutput contract uniform. If you see a raw URL in the result, the
download failed — check network access and Replicate's CDN status.
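If you want to fail fast on that case before decoding, a small guard can distinguish a raw URL from a base64 payload. `isRawUrl` and `requireBase64` are hypothetical helpers written for this guide; they only inspect the string and make no network calls.

```typescript
// Returns true when the value looks like an http(s) URL instead of base64 data.
function isRawUrl(value: string): boolean {
  return /^https?:\/\//i.test(value.trim());
}

// Hypothetical guard: throw a descriptive error instead of decoding a URL as base64.
function requireBase64(value: string): string {
  if (isRawUrl(value)) {
    throw new Error(`Expected base64 image data but received a URL: ${value}`);
  }
  return value;
}
```

Calling `requireBase64(result.imageOutput.base64)` before `Buffer.from(..., "base64")` would turn a silent corrupt-file write into an explicit error.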
See Also
- Adding a multi-modal provider — Replicate as the canonical example
- Adding a new modality — how Avatar / Music categories were built
- Video Generation — feature page covering Vertex / Kling / Runway / Replicate
- docs/provider-integration/22-adding-multimodal-provider.md: implementation notes
Need Help? Open a GitHub Discussion or issue.