
Replicate Provider Guide

One auth token, five modalities — LLMs + image + video + avatar + music under a single REPLICATE_API_TOKEN


Overview

Replicate is a universal hosted-model gateway. NeuroLink wraps it as a multi-modal provider so a single token gets you:

| Modality | How | Default model |
| --- | --- | --- |
| LLM | provider: "replicate" chat / streaming | meta/meta-llama-3.1-70b-instruct |
| Image gen | provider: "replicate" with a model id matching IMAGE_GENERATION_MODELS | black-forest-labs/flux-1.1-pro |
| Video | output: { mode: "video", video: { provider: "replicate" } } | atonamy/wan-alpha |
| Avatar | output: { mode: "avatar", avatar: { provider: "replicate" } } | lucataco/musetalk |
| Music | output: { mode: "music", music: { provider: "replicate" } } | meta/musicgen |

Architectural detail: see docs/provider-integration/22-adding-multimodal-provider.md — Replicate is the canonical worked example.

Key Facts

  • Protocol: Async prediction lifecycle — POST /v1/predictions → poll until succeeded → fetch output. NeuroLink uses Prefer: wait=60 so short jobs complete in the initial POST and skip polling entirely.
  • Default base URL: https://api.replicate.com
  • Auth: Authorization: Token $REPLICATE_API_TOKEN
  • Pricing: Per compute-second (not per-token) — NeuroLink reports a symbolic per-token rate so cost dashboards stay populated, but real billing is via Replicate's invoice
  • Streaming: Synthetic single-chunk stream from the predict result (true SSE streaming planned for a follow-up)
  • Tool calling: Not supported — Replicate predictions are stateless
  • Reasoning trace: Model-dependent (e.g., DeepSeek R1 on Replicate exposes its reasoning trace in the output array)
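
The prediction lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not NeuroLink's actual lifecycle helper: the names runPrediction, Transport, and fetchTransport are hypothetical, the 1-second poll interval is an assumption, and the 5-minute cap simply mirrors the default mentioned under Troubleshooting. The transport is injectable so the loop can be exercised without a network.

```typescript
// Shape of the fields this sketch reads from a Replicate prediction response.
type Prediction = {
  id: string;
  status: "starting" | "processing" | "succeeded" | "failed" | "canceled";
  output?: unknown;
  error?: string | null;
  urls?: { get?: string };
};

// Hypothetical transport signature (an assumption, not a NeuroLink type).
type Transport = (
  url: string,
  init: { method: "GET" | "POST"; headers: Record<string, string>; body?: string },
) => Promise<Prediction>;

const TERMINAL_STATUSES = new Set(["succeeded", "failed", "canceled"]);

async function runPrediction(
  transport: Transport,
  token: string,
  body: { version: string; input: Record<string, unknown> },
  opts: { baseUrl?: string; pollMs?: number; timeoutMs?: number } = {},
): Promise<unknown> {
  const baseUrl = opts.baseUrl ?? "https://api.replicate.com";
  const pollMs = opts.pollMs ?? 1000; // assumed poll interval
  const timeoutMs = opts.timeoutMs ?? 5 * 60 * 1000; // 5-minute cap
  const headers = {
    Authorization: `Token ${token}`,
    "Content-Type": "application/json",
  };

  // Prefer: wait=60 lets short jobs finish inside the initial POST.
  let prediction = await transport(`${baseUrl}/v1/predictions`, {
    method: "POST",
    headers: { ...headers, Prefer: "wait=60" },
    body: JSON.stringify(body),
  });

  // Poll only if the POST returned before the prediction reached a terminal state.
  const deadline = Date.now() + timeoutMs;
  while (!TERMINAL_STATUSES.has(prediction.status)) {
    if (Date.now() > deadline) {
      throw new Error(`Prediction ${prediction.id} timed out`);
    }
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    const pollUrl = prediction.urls?.get ?? `${baseUrl}/v1/predictions/${prediction.id}`;
    prediction = await transport(pollUrl, { method: "GET", headers });
  }

  if (prediction.status !== "succeeded") {
    throw new Error(`Prediction ${prediction.id} ${prediction.status}: ${prediction.error ?? "no error message"}`);
  }
  return prediction.output;
}

// One possible transport built on the global fetch available in Node 18+.
const fetchTransport: Transport = async (url, init) => {
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Replicate returned HTTP ${res.status}`);
  return (await res.json()) as Prediction;
};
```

Under these assumptions, a call would look like runPrediction(fetchTransport, token, { version, input }), where version is a model version id taken from the Replicate catalog.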

Quick Start

1. Get an API Token

Sign up at https://replicate.com/ and create an API token at https://replicate.com/account/api-tokens.

2. Configure Environment

# Required
REPLICATE_API_TOKEN=r8_...

# Optional: override the default LLM model
REPLICATE_MODEL=meta/meta-llama-3.1-70b-instruct

# Optional: override the base URL
# REPLICATE_BASE_URL=https://api.replicate.com

3. Generate Your First Response

import { NeuroLink } from "@juspay/neurolink";

const ai = new NeuroLink();

const result = await ai.generate({
  provider: "replicate",
  input: { text: "Explain how a transformer's attention mechanism works." },
});

console.log(result.content);

SDK Usage by Modality

LLM (chat / streaming)

const result = await ai.generate({
  provider: "replicate",
  model: "meta/meta-llama-3.1-405b-instruct",
  input: { text: "Write Python that calculates compound interest." },
});

Streaming:

const stream = await ai.stream({
  provider: "replicate",
  model: "meta/meta-llama-3.1-70b-instruct",
  input: { text: "Tell me a story" },
});
for await (const chunk of stream.stream) {
  if ("content" in chunk) process.stdout.write(chunk.content);
}

Image Generation

import { writeFileSync } from "node:fs";

const result = await ai.generate({
  provider: "replicate",
  model: "black-forest-labs/flux-1.1-pro",
  input: { text: "A serene mountain lake at sunrise, photorealistic" },
});

// imageOutput.base64 holds the generated image; decode it before writing to disk.
const buffer = Buffer.from(result.imageOutput.base64, "base64");
writeFileSync("./output.png", buffer);

Other supported image models on Replicate (pass via model:):

  • black-forest-labs/flux-1.1-pro (default)
  • black-forest-labs/flux-schnell
  • stability-ai/stable-diffusion-3.5-large
  • stability-ai/stable-diffusion-3.5-large-turbo
  • playgroundai/playground-v2.5-1024px-aesthetic
  • ideogram-ai/ideogram-v3
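
To illustrate how a model id can decide between the LLM and image paths, here is a minimal sketch. It assumes IMAGE_GENERATION_MODELS behaves like a set of owner/name ids; the set below is a stand-in built from the list above, and resolveModality is a hypothetical helper, so the real matching logic in NeuroLink may differ.

```typescript
// Stand-in for NeuroLink's IMAGE_GENERATION_MODELS (assumption: the real list may differ).
const IMAGE_GENERATION_MODELS: ReadonlySet<string> = new Set([
  "black-forest-labs/flux-1.1-pro",
  "black-forest-labs/flux-schnell",
  "stability-ai/stable-diffusion-3.5-large",
  "stability-ai/stable-diffusion-3.5-large-turbo",
  "playgroundai/playground-v2.5-1024px-aesthetic",
  "ideogram-ai/ideogram-v3",
]);

type Modality = "image" | "llm";

// Hypothetical helper: route a model id to the image path if it is in the set,
// otherwise treat it as an LLM.
function resolveModality(model: string): Modality {
  // Drop an optional ":version" suffix so pinned versions still match.
  const name = model.split(":")[0];
  return IMAGE_GENERATION_MODELS.has(name) ? "image" : "llm";
}
```

With this sketch, resolveModality("black-forest-labs/flux-schnell") selects the image path, while any id outside the set falls through to chat.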

Video Generation

import { readFileSync, writeFileSync } from "node:fs";

// The source image is used as the starting frame for the video.
const sourceImage = readFileSync("./input.jpg");

const result = await ai.generate({
  input: { text: "smooth zoom out", images: [sourceImage] },
  output: {
    mode: "video",
    video: {
      provider: "replicate",
      model: "atonamy/wan-alpha",
      length: 4,
      aspectRatio: "16:9",
    },
  },
});

writeFileSync("./output.mp4", result.video.data);

Avatar (MuseTalk)

import { readFileSync, writeFileSync } from "node:fs";

const portrait = readFileSync("./portrait.jpg");
const audio = readFileSync("./narration.mp3");

const result = await ai.generate({
  output: {
    mode: "avatar",
    avatar: {
      provider: "replicate", // or "musetalk" alias
      image: portrait,
      audio,
    },
  },
});

writeFileSync("./avatar.mp4", result.avatar.buffer);

Music Generation (MusicGen)

import { writeFileSync } from "node:fs";

const result = await ai.generate({
  output: {
    mode: "music",
    music: {
      provider: "replicate", // or "musicgen" alias
      prompt: "Lo-fi hip-hop beat with vinyl crackle",
      duration: 8,
      tempo: 80,
    },
  },
});

writeFileSync("./track.mp3", result.music.buffer);

CLI Usage

# LLM
pnpm run cli generate "Hello" --provider replicate

# Image gen
pnpm run cli generate "A red panda" --provider replicate \
  --model black-forest-labs/flux-1.1-pro --imageOutput ./panda.png

# Video gen
pnpm run cli generate "smooth pan" --image ./input.jpg \
  --outputMode video --videoProvider replicate \
  --videoOutput ./out.mp4

# Avatar
pnpm run cli generate --outputMode avatar \
  --avatarProvider replicate \
  --avatarImage ./portrait.jpg \
  --avatarAudio ./narration.mp3 \
  --avatarOutput ./avatar.mp4

# Music
pnpm run cli generate "Lo-fi beat" \
  --outputMode music --musicProvider replicate \
  --musicTempo 80 --musicDuration 8 --musicOutput ./track.mp3

Configuration Reference

| Environment Variable | Required | Default | Description |
| --- | --- | --- | --- |
| REPLICATE_API_TOKEN | Yes | (none) | Replicate API token (r8_...) |
| REPLICATE_MODEL | No | meta/meta-llama-3.1-70b-instruct | Default LLM model |
| REPLICATE_BASE_URL | No | https://api.replicate.com | Base URL |
Feature Support Matrix

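| Feature | LLM | Image | Video | Avatar | Music |
| --- | --- | --- | --- | --- | --- |
| Streaming | Synthetic (single chunk) | N/A | N/A | N/A | N/A |
| Tool calling | No | N/A | N/A | N/A | N/A |
| Structured output | Limited | N/A | N/A | N/A | N/A |
| Vision input | Model-dependent | Yes (img2img) | Yes (start frame) | Yes | No |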

Cost Notes

Replicate bills by compute seconds, not by tokens. NeuroLink reports a symbolic per-token rate so cost-attribution dashboards have non-zero values, but the authoritative billing is from Replicate's own pricing dashboard.
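
For a rough sense of compute-second billing, the arithmetic is run time multiplied by a per-second rate. The sketch below assumes you read the run time from the metrics.predict_time field of a finished prediction and look up the per-second rate for the model's hardware on Replicate's pricing page; estimateCostUsd is a hypothetical helper and the rate in the example is made up.

```typescript
// Hypothetical helper: estimated cost = compute seconds x price per second.
function estimateCostUsd(predictTimeSeconds: number, usdPerSecond: number): number {
  if (predictTimeSeconds < 0 || usdPerSecond < 0) {
    throw new RangeError("Seconds and rate must be non-negative");
  }
  return predictTimeSeconds * usdPerSecond;
}

// Example with an illustrative (not real) rate of $0.001 per second:
// a 12.5-second prediction comes to roughly $0.0125.
const exampleCost = estimateCostUsd(12.5, 0.001);
```

Treat the result as an estimate only; the invoice from Replicate remains authoritative.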


Troubleshooting

"Invalid Replicate API token"

# Check that the token is set in the current shell
echo $REPLICATE_API_TOKEN

# Set it if it is missing or wrong
export REPLICATE_API_TOKEN=r8_...

Create or rotate a token at https://replicate.com/account/api-tokens.

"Replicate model 'X' not found"

Use the owner/name or owner/name:version format. Browse the catalog at https://replicate.com/explore.
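
A quick way to catch a malformed id before it reaches the API is to validate it against the two accepted shapes. The sketch below is one possible check; parseModelRef and ModelRef are hypothetical names, and the exact character rules Replicate enforces may be stricter than this pattern.

```typescript
type ModelRef = { owner: string; name: string; version?: string };

// Hypothetical helper: accept "owner/name" or "owner/name:version", reject anything else.
function parseModelRef(ref: string): ModelRef {
  const match = /^([^\/:\s]+)\/([^\/:\s]+)(?::([^\/:\s]+))?$/.exec(ref.trim());
  if (!match) {
    throw new Error(`Expected "owner/name" or "owner/name:version", got "${ref}"`);
  }
  const [, owner, name, version] = match;
  // Only include the version key when a version was actually supplied.
  return version ? { owner, name, version } : { owner, name };
}
```

For example, parseModelRef("meta/musicgen") yields owner "meta" and name "musicgen", while a bare "musicgen" is rejected with a descriptive error.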

Cold-start delays

First-call latency on rarely used models can spike while the inference container warms up. Subsequent calls reuse the warm container. NeuroLink caps polling at 5 minutes by default. If you regularly hit this cap, adjust the polling timeout and the Prefer: wait=60 setting in the lifecycle helper. REPLICATE_BASE_URL only changes the API endpoint and does not affect timeouts.

Streaming feels chunky

The current implementation runs the prediction synchronously and emits a single chunk. True SSE streaming is planned — for now use OpenAI / xAI / Groq for low-latency token streaming.
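
The single-chunk behavior can be pictured as an async generator that waits for the whole prediction and then yields once. This is a sketch of the idea rather than NeuroLink's code: syntheticStream and StreamChunk are hypothetical names, and it assumes the LLM output arrives either as one string or as an array of string fragments.

```typescript
type StreamChunk = { content: string };

// Hypothetical helper: run the prediction to completion, then emit one chunk.
async function* syntheticStream(
  predict: () => Promise<string | string[]>,
): AsyncGenerator<StreamChunk> {
  const output = await predict(); // blocks until the prediction has finished
  const content = Array.isArray(output) ? output.join("") : output;
  yield { content }; // the only chunk the consumer will see
}
```

A consumer loop such as for await (const chunk of syntheticStream(run)) therefore executes its body exactly once, which is why output appears all at once rather than token by token.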

Output is a URL, not base64

NeuroLink downloads the URL and converts to base64 to keep the imageOutput contract uniform. If you see a raw URL in the result, the download failed — check network access and Replicate's CDN status.
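
The download-and-convert step can be sketched as below, which may help when reproducing a failed download by hand. urlToBase64, Download, and fetchBytes are hypothetical names for this example; the downloader is injectable so the conversion can be checked without network access, and the code assumes a Node runtime (Buffer, plus global fetch in Node 18+).

```typescript
// Hypothetical downloader signature: fetch the bytes behind a URL.
type Download = (url: string) => Promise<Uint8Array>;

// Hypothetical helper: download an output URL and encode the bytes as base64.
async function urlToBase64(url: string, download: Download): Promise<string> {
  if (!/^https?:\/\//i.test(url)) {
    throw new Error(`Not an http(s) URL: ${url}`);
  }
  const bytes = await download(url);
  return Buffer.from(bytes).toString("base64");
}

// One possible downloader built on the global fetch available in Node 18+.
const fetchBytes: Download = async (url) => {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Download failed: ${res.status} ${res.statusText}`);
  }
  return new Uint8Array(await res.arrayBuffer());
};
```

Calling urlToBase64(outputUrl, fetchBytes) against the raw URL from a result should surface the underlying network or CDN error directly.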


Need Help? Open a GitHub Discussion or issue.