22 · Adding a Multi-Modal Provider (Replicate-style) — Exhaustive Guide
This guide covers a special case: a single upstream that spans multiple modalities (LLM + image + video + avatar + music + …) under one auth token and one prediction lifecycle.
The canonical example is Replicate, which hosts thousands of community models across categories. Adding Replicate as 5 separate providers is duplicative; adding it once as a multi-modal provider lets a single auth path serve every modality.
This guide also applies to similar gateways:
- Replicate — universal hosted-model gateway (FLUX, Wan-Alpha, MuseTalk, …)
- Together AI — open-model hosting (Llama variants, Mistral, …)
- Fireworks AI — open-model hosting
- Hugging Face Inference Endpoints — already partially modeled but cross-modality story is incomplete
Architectural insight
A multi-modal provider has:
- One auth identity (REPLICATE_API_TOKEN)
- One prediction lifecycle (POST /v1/predictions → poll /v1/predictions/{id})
- N modality outputs (text completion, image binary, video binary, audio binary, …)
- Model-driven dispatch (the model: "owner/name:version" string determines what kind of output you get)
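Because the model string drives both dispatch and endpoint selection, it can help to parse it once instead of splitting it ad hoc in each handler. A minimal sketch, assuming a hypothetical parseModelRef helper and ModelRef type (neither name exists in the codebase):

```typescript
// Hypothetical helper: splits a Replicate model reference into its parts.
export type ModelRef = {
  owner: string;
  name: string;
  version?: string; // present only when the caller pinned a version
};

export function parseModelRef(ref: string): ModelRef {
  // Split on the first ":" so everything after it is treated as the version.
  const colon = ref.indexOf(":");
  const path = colon === -1 ? ref : ref.slice(0, colon);
  const version = colon === -1 ? undefined : ref.slice(colon + 1);

  const [owner, name, ...rest] = path.split("/");
  if (!owner || !name || rest.length > 0) {
    throw new Error(
      `Expected "owner/name" or "owner/name:version", got "${ref}"`,
    );
  }
  if (version !== undefined && (version === "" || version.includes(":"))) {
    throw new Error(`Empty or malformed version in "${ref}"`);
  }
  return version === undefined ? { owner, name } : { owner, name, version };
}
```

Rejecting malformed references up front (for example a trailing colon with no version) surfaces typos before a request is sent, rather than silently falling back to the unpinned endpoint.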
The right shape is:
src/lib/adapters/replicate/
├── predictionLifecycle.ts # Shared async-job helper
├── auth.ts # Shared auth + base URL
└── replicateClient.ts # Optional: shared low-level client
src/lib/providers/
├── replicate.ts # LLM (BaseProvider subclass)
src/lib/adapters/video/
├── replicateVideoHandler.ts # VideoHandler implementation
src/lib/avatar/providers/
├── ReplicateAvatar.ts # AvatarHandler implementation
src/lib/music/providers/
├── ReplicateMusic.ts # MusicHandler implementation (when music modality exists)
# image-gen has no separate handler — Replicate LLM provider's
# executeImageGeneration override handles model: "<owner>/flux-1.1-pro:..."
Each handler is a thin adapter that calls the same predict(auth, { model, input }) helper from predictionLifecycle.ts with a different model slug. The auth and polling logic lives in one place.
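One way to picture the thin-adapter idea: each modality contributes only a default model slug and an input object, and everything else is shared. The sketch below is illustrative only. Modality, DEFAULT_MODELS and buildRequest are hypothetical names, and the slugs are the unpinned examples used elsewhere in this guide, so treat them as placeholders.

```typescript
type Modality = "llm" | "image" | "video" | "avatar";

// Example slugs from this guide; real deployments would pin versions.
const DEFAULT_MODELS: Record<Modality, string> = {
  llm: "meta/llama-3.1-70b-instruct",
  image: "black-forest-labs/flux-1.1-pro",
  video: "atonamy/wan-alpha",
  avatar: "lucataco/musetalk",
};

// The only thing an adapter hands to the shared lifecycle helper.
type PredictionRequest = { model: string; input: Record<string, unknown> };

function buildRequest(
  modality: Modality,
  input: Record<string, unknown>,
  modelOverride?: string,
): PredictionRequest {
  // A caller-supplied model wins; otherwise fall back to the modality default.
  return { model: modelOverride ?? DEFAULT_MODELS[modality], input };
}
```

The request object has the same shape for every modality, which is why a single lifecycle helper can serve all of them.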
Prerequisites
Before adding the multi-modal provider, the target modalities must already exist as registries:
- LLM: exists via ProviderFactory / ProviderRegistry (always available)
- TTS / STT / Realtime: exist via TTSProcessor / STTProcessor / RealtimeProcessor (post 27a31c32)
- Video: requires §A of 19-adding-video-provider.md to introduce VideoHandler / VideoProcessor
- Avatar / Music: require 21-adding-new-modality.md for each new category
Land the modality infrastructure first; multi-modal providers consume those registries.
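To make the dependency concrete, the sketch below shows the minimal contract a multi-modal provider relies on from each registry: register a handler under a name, then look it up only if it is configured. HandlerRegistry and ModalityHandler are hypothetical stand-ins; the real VideoProcessor / AvatarProcessor APIs are defined in the guides referenced above and may differ.

```typescript
// Hypothetical minimal registry, for illustration only.
interface ModalityHandler {
  isConfigured(): boolean;
}

class HandlerRegistry<H extends ModalityHandler> {
  private readonly handlers = new Map<string, H>();

  registerHandler(name: string, handler: H): void {
    // Normalise names so "Replicate" and "replicate" resolve to one entry.
    this.handlers.set(name.toLowerCase(), handler);
  }

  // Returns the handler only if it exists and reports usable credentials.
  getConfiguredHandler(name: string): H | undefined {
    const handler = this.handlers.get(name.toLowerCase());
    return handler?.isConfigured() ? handler : undefined;
  }

  listProviders(): string[] {
    return Array.from(this.handlers.keys());
  }
}
```

A registry of this shape must exist for a modality before Steps 4 to 6 have anything to register against.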
Step-by-step (using Replicate as the worked example)
Step 1 — Shared prediction lifecycle helper
File: src/lib/adapters/replicate/predictionLifecycle.ts — NEW.
import { logger } from "../../utils/logger.js";
export type ReplicateAuth = {
apiToken: string;
baseUrl?: string;
};
export type Prediction = {
id: string;
status: "starting" | "processing" | "succeeded" | "failed" | "canceled";
output?: unknown; // model-specific shape (string URL, array, base64, etc.)
error?: string;
metrics?: { predict_time?: number };
urls?: { get: string; cancel: string };
};
export type CreatePredictionInput = {
model: string; // "owner/name" or "owner/name:version"
input: Record<string, unknown>;
webhook?: string;
};
const DEFAULT_BASE_URL = "https://api.replicate.com";
const POLL_INTERVAL_MS = 2_000;
const DEFAULT_TIMEOUT_MS = 5 * 60_000;
export async function createPrediction(
auth: ReplicateAuth,
input: CreatePredictionInput,
): Promise<Prediction> {
const baseUrl = auth.baseUrl ?? DEFAULT_BASE_URL;
const [modelPath, version] = input.model.split(":");
const endpoint = version
? `${baseUrl}/v1/predictions`
: `${baseUrl}/v1/models/${modelPath}/predictions`; // latest version
const body = version
? { version, input: input.input, webhook: input.webhook }
: { input: input.input, webhook: input.webhook };
const resp = await fetch(endpoint, {
method: "POST",
headers: {
Authorization: `Token ${auth.apiToken}`,
"Content-Type": "application/json",
Prefer: "wait=60", // Lets Replicate hold the request up to 60s waiting
// for completion before falling back to async polling
},
body: JSON.stringify(body),
});
if (!resp.ok) {
const errorText = await resp.text();
throw new Error(
`Replicate predictions submit failed: ${resp.status} — ${errorText}`,
);
}
return resp.json();
}
export async function pollPrediction(
auth: ReplicateAuth,
predictionId: string,
options: {
timeoutMs?: number;
pollIntervalMs?: number;
abortSignal?: AbortSignal;
} = {},
): Promise<Prediction> {
const baseUrl = auth.baseUrl ?? DEFAULT_BASE_URL;
const startTime = Date.now();
const totalTimeout = options.timeoutMs ?? DEFAULT_TIMEOUT_MS;
const pollInterval = options.pollIntervalMs ?? POLL_INTERVAL_MS;
while (Date.now() - startTime < totalTimeout) {
if (options.abortSignal?.aborted) {
throw new Error("Replicate poll aborted");
}
const resp = await fetch(`${baseUrl}/v1/predictions/${predictionId}`, {
headers: { Authorization: `Token ${auth.apiToken}` },
});
if (!resp.ok) {
throw new Error(
`Replicate poll failed: ${resp.status} — ${await resp.text()}`,
);
}
const pred = (await resp.json()) as Prediction;
if (pred.status === "succeeded") return pred;
if (pred.status === "failed" || pred.status === "canceled") {
throw new Error(
`Replicate prediction ${pred.status}: ${pred.error ?? "unknown"}`,
);
}
await new Promise((r) => setTimeout(r, pollInterval));
}
throw new Error(
`Replicate prediction ${predictionId} timed out after ${totalTimeout}ms`,
);
}
/**
* Submit a prediction and wait for completion. Combines createPrediction + pollPrediction.
* Uses the `Prefer: wait=60` hint so short jobs complete in the initial POST and
* skip polling entirely.
*/
export async function predict(
auth: ReplicateAuth,
input: CreatePredictionInput,
options: {
timeoutMs?: number;
pollIntervalMs?: number;
abortSignal?: AbortSignal;
} = {},
): Promise<Prediction> {
const submitted = await createPrediction(auth, input);
if (submitted.status === "succeeded") return submitted; // wait=60 hit
if (submitted.status === "failed" || submitted.status === "canceled") {
throw new Error(
`Replicate immediate failure: ${submitted.error ?? "unknown"}`,
);
}
return pollPrediction(auth, submitted.id, options);
}
/**
* Download the binary output from a Replicate prediction.
* Replicate returns either a single URL string or an array of URLs.
*/
export async function downloadPredictionOutput(
prediction: Prediction,
): Promise<Buffer> {
const output = prediction.output;
const url = Array.isArray(output) ? output[0] : output;
if (typeof url !== "string") {
throw new Error(
`Replicate prediction output is not a URL: ${typeof output}`,
);
}
const resp = await fetch(url);
if (!resp.ok) {
throw new Error(`Failed to download Replicate output: ${resp.status}`);
}
return Buffer.from(await resp.arrayBuffer());
}
This is the common bottom half: every Replicate-backed handler calls predict(), and handlers that return binary media also call downloadPredictionOutput().
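Note that downloadPredictionOutput reads only the first URL, so extra results are dropped when a model returns several (for example when num_outputs is greater than 1). A sketch of a companion helper that returns every URL follows. extractOutputUrls is a hypothetical name, and the sketch assumes binary outputs arrive as http(s) URL strings, as described above.

```typescript
// Hypothetical companion to downloadPredictionOutput: collects every output URL.
export function extractOutputUrls(output: unknown): string[] {
  // Normalise the two documented shapes (single string, array) into one array.
  const candidates: unknown[] = Array.isArray(output) ? output : [output];
  const urls = candidates.filter(
    (item): item is string =>
      typeof item === "string" && /^https?:\/\//.test(item),
  );
  if (urls.length === 0) {
    throw new Error(
      `Replicate prediction output contains no URL (got ${typeof output})`,
    );
  }
  return urls;
}
```

This is meant for image, video and audio outputs only. LLM predictions return text chunks rather than URLs and would be rejected here.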
Step 2 — Shared auth helper
File: src/lib/adapters/replicate/auth.ts — NEW.
import type { ReplicateAuth } from "./predictionLifecycle.js";
export function getReplicateAuth(
override?: Partial<ReplicateAuth>,
): ReplicateAuth | null {
const apiToken = (
override?.apiToken ??
process.env.REPLICATE_API_TOKEN ??
""
).trim();
if (!apiToken) return null;
return {
apiToken,
baseUrl: override?.baseUrl ?? process.env.REPLICATE_BASE_URL,
};
}
Used by every Replicate handler. Returns null when REPLICATE_API_TOKEN is missing — handlers' isConfigured() calls this and returns auth !== null.
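The precedence rules are easier to verify when the environment is passed in rather than read from process.env. The sketch below mirrors getReplicateAuth under that assumption; resolveReplicateAuth and the Env type are hypothetical names introduced only for illustration.

```typescript
type ReplicateAuth = { apiToken: string; baseUrl?: string };
type Env = Record<string, string | undefined>;

// Same precedence as getReplicateAuth: explicit override first, then environment.
function resolveReplicateAuth(
  env: Env,
  override?: Partial<ReplicateAuth>,
): ReplicateAuth | null {
  const apiToken = (
    override?.apiToken ??
    env["REPLICATE_API_TOKEN"] ??
    ""
  ).trim();
  if (!apiToken) return null; // missing or whitespace-only token
  return {
    apiToken,
    baseUrl: override?.baseUrl ?? env["REPLICATE_BASE_URL"],
  };
}
```

One consequence of using ?? is worth knowing: an override with an empty-string apiToken masks the environment token and yields null, because ?? only falls through on null or undefined.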
Step 3 — LLM provider
File: src/lib/providers/replicate.ts — NEW.
Standard BaseProvider subclass per 15-adding-llm-provider.md. Replicate's LLM models (Llama, Qwen, Mistral, etc.) are accessible via the prediction API:
import { BaseProvider } from "../core/baseProvider.js";
import { AIProviderName } from "../constants/enums.js";
import { getReplicateAuth } from "../adapters/replicate/auth.js";
import {
predict,
downloadPredictionOutput,
} from "../adapters/replicate/predictionLifecycle.js";
export class ReplicateProvider extends BaseProvider {
// ... constructor / abstract overrides per 15-adding-llm-provider.md ...
protected async executeStream(options: StreamOptions): Promise<StreamResult> {
const auth = getReplicateAuth(this.credentials);
if (!auth) {
throw new NeuroLinkError({
code: "REPLICATE_NOT_CONFIGURED",
message: "REPLICATE_API_TOKEN not set",
// ...
});
}
// Replicate streaming uses Server-Sent Events on a separate endpoint
// when `stream: true` is passed in input. For simpler handlers, fall
// back to predict-then-stream the result.
const prediction = await predict(auth, {
model: this.modelName,
input: {
prompt: this.buildSinglePrompt(options),
max_tokens: options.maxTokens,
temperature: options.temperature,
// Other model-specific params
},
});
// Stream the buffered output as if it were chunks
return this.synthesizeStreamFromText(
typeof prediction.output === "string"
? prediction.output
: (prediction.output as string[]).join(""),
);
}
// For image-gen models, route through executeImageGeneration:
protected async executeImageGeneration(options: TextGenerationOptions) {
const auth = getReplicateAuth(this.credentials);
if (!auth) throw new NeuroLinkError(/* ... */);
const prediction = await predict(auth, {
model: this.modelName, // e.g., "black-forest-labs/flux-1.1-pro"
input: {
prompt: options.prompt ?? options.input?.text,
aspect_ratio: options.aspectRatio,
num_outputs: options.numberOfImages ?? 1,
},
});
const imageBuffer = await downloadPredictionOutput(prediction);
return {
content: options.prompt ?? "",
provider: this.providerName,
model: this.modelName,
usage: { input: 0, output: 0, total: 0 },
imageOutput: {
imageBuffer,
base64: imageBuffer.toString("base64"),
mimeType: "image/png", // Assumes PNG output; confirm against the model's actual output format
},
} as EnhancedGenerateResult;
}
}
Add to IMAGE_GENERATION_MODELS constant: a prefix that matches Replicate image models, e.g., "flux", "stability", "sdxl".
Per 15-adding-llm-provider.md, also touch:
- AIProviderName.REPLICATE = "replicate" enum entry
- NeurolinkCredentials.replicate?: { apiToken?: string; baseUrl?: string }
- providerRegistry.ts registration block
- providerConfig.ts helper (createReplicateConfig())
- commandFactory.ts provider choices
- .env.example (REPLICATE_API_TOKEN)
- pricing.ts (Replicate has per-model pricing; most models charge per second of compute, so default to a generic rate)
- Tests in test/continuous-test-suite-providers.ts and test/continuous-test-suite-new-providers.ts
Step 4 — Video handler
File: src/lib/adapters/video/replicateVideoHandler.ts — NEW.
Implements VideoHandler (defined in §A of 19-adding-video-provider.md):
import type {
VideoHandler,
VideoOutputOptions,
VideoGenerationResult,
} from "../../types/index.js";
import {
predict,
downloadPredictionOutput,
} from "../replicate/predictionLifecycle.js";
import { getReplicateAuth } from "../replicate/auth.js";
import { VideoError, VIDEO_ERROR_CODES } from "../../utils/videoProcessor.js";
import { ErrorCategory, ErrorSeverity } from "../../constants/enums.js";
export class ReplicateVideoHandler implements VideoHandler {
// Provider-agnostic — capabilities depend on the specific model id used.
// Don't claim specific durations or resolutions.
private readonly defaultModel: string;
constructor(opts: { defaultModel?: string } = {}) {
this.defaultModel = opts.defaultModel ?? "atonamy/wan-alpha";
}
isConfigured(): boolean {
return getReplicateAuth() !== null;
}
async generate(
image: Buffer,
prompt: string,
options: VideoOutputOptions,
): Promise<VideoGenerationResult> {
const auth = getReplicateAuth();
if (!auth) {
throw new VideoError({
code: VIDEO_ERROR_CODES.PROVIDER_NOT_CONFIGURED,
message: "REPLICATE_API_TOKEN not set",
category: ErrorCategory.CONFIGURATION,
severity: ErrorSeverity.HIGH,
retriable: false,
});
}
// Replicate video models vary in input shape. Common shape for image-to-video:
// input: { image: <data-uri-or-url>, prompt, num_frames, fps, ... }
const startTime = Date.now();
const dataUri = `data:image/png;base64,${image.toString("base64")}`;
// Resolve the model once so the request and the returned metadata agree.
const model = (options as { model?: string }).model ?? this.defaultModel;
const prediction = await predict(auth, {
model,
input: {
image: dataUri,
prompt,
num_frames: (options.length ?? 4) * 24, // 24 FPS assumption
fps: 24,
aspect_ratio: options.aspectRatio,
},
});
const videoBuffer = await downloadPredictionOutput(prediction);
return {
data: videoBuffer,
mediaType: "video/mp4",
metadata: {
duration: options.length ?? 4,
dimensions: this.calculateDimensions(options),
model, // the model actually used, which may differ from defaultModel
provider: "replicate",
aspectRatio: options.aspectRatio ?? "16:9",
audioEnabled: false, // Most Replicate video models are silent
processingTime: Date.now() - startTime,
},
};
}
private calculateDimensions(opts: VideoOutputOptions): {
width: number;
height: number;
} {
// Same shape as vertexVideoHandler's calculateDimensions
return { width: 1280, height: 720 };
}
}
Register in providerRegistry.ts:
try {
const { VideoProcessor } = await import("../utils/videoProcessor.js");
const { ReplicateVideoHandler } =
await import("../adapters/video/replicateVideoHandler.js");
VideoProcessor.registerHandler("replicate", new ReplicateVideoHandler());
} catch (err) {
logger.debug(
`[ProviderRegistry] replicate video registration skipped: ${err instanceof Error ? err.message : String(err)}`,
);
}
Now nl.generate({ output: { mode: "video", video: { provider: "replicate", model: "atonamy/wan-alpha:..." } } }) works.
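The calculateDimensions stub above returns a fixed 1280 by 720 regardless of aspectRatio. If the reported metadata should track the requested ratio, one option is to derive it. A sketch, assuming a hypothetical dimensionsForAspectRatio helper and a 1280-pixel long edge; the model, not this helper, decides the real output size, so the result is only an estimate for metadata.

```typescript
type Dimensions = { width: number; height: number };

// Hypothetical helper: derives metadata dimensions from a "W:H" ratio string.
function dimensionsForAspectRatio(
  aspectRatio = "16:9",
  longEdge = 1280,
): Dimensions {
  const match = /^(\d+):(\d+)$/.exec(aspectRatio);
  if (!match) {
    throw new Error(`Unsupported aspect ratio "${aspectRatio}"`);
  }
  const w = Number(match[1]);
  const h = Number(match[2]);
  if (w === 0 || h === 0) {
    throw new Error(`Aspect ratio terms must be non-zero: "${aspectRatio}"`);
  }
  // Video encoders generally expect even dimensions, so round to the nearest even value.
  const toEven = (n: number): number => Math.max(2, Math.round(n / 2) * 2);
  return w >= h
    ? { width: longEdge, height: toEven((longEdge * h) / w) }
    : { width: toEven((longEdge * w) / h), height: longEdge };
}
```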
Step 5 — Avatar handler
File: src/lib/avatar/providers/ReplicateAvatar.ts — NEW.
Implements AvatarHandler (defined in 21-adding-new-modality.md). MuseTalk model id: <owner>/musetalk:<version> — submit image + audio, poll, download.
import type {
AvatarHandler,
AvatarOptions,
AvatarResult,
} from "../../types/index.js";
import {
predict,
downloadPredictionOutput,
} from "../../adapters/replicate/predictionLifecycle.js";
import { getReplicateAuth } from "../../adapters/replicate/auth.js";
import {
AvatarError,
AVATAR_ERROR_CODES,
} from "../../utils/avatarProcessor.js";
export class ReplicateAvatar implements AvatarHandler {
// Default to MuseTalk; callers can pass `model:` for other lip-sync models.
private readonly defaultModel =
"lucataco/musetalk:c3a2f4d7e1b5d9a8f6e2b7c3a8d4e9f1b6c7a2e8d3b9f5c1a4e7b8d2f9c6a3e5";
isConfigured(): boolean {
return getReplicateAuth() !== null;
}
async generate(options: AvatarOptions): Promise<AvatarResult> {
const auth = getReplicateAuth();
if (!auth) {
throw new AvatarError({
code: AVATAR_ERROR_CODES.PROVIDER_NOT_CONFIGURED,
message: "REPLICATE_API_TOKEN not set",
});
}
const startTime = Date.now();
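// Size caps are illustrative assumptions; tune them to the model's documented limits.
const MAX_IMAGE_BYTES = 10 * 1024 * 1024;
const MAX_AUDIO_BYTES = 25 * 1024 * 1024;
// Fail fast on missing audio before fetching anything.
if (!options.audio) {
throw new AvatarError({
code: AVATAR_ERROR_CODES.INVALID_INPUT,
message:
"Replicate avatar handler requires audio. Pass options.audio (Buffer or HTTPS URL).",
retriable: false,
});
}
// resolveBuffer takes (input, maxBytes, label); all three arguments are required.
const imageBuffer = await this.resolveBuffer(
options.image,
MAX_IMAGE_BYTES,
"image",
);
const audioBuffer = await this.resolveBuffer(
options.audio,
MAX_AUDIO_BYTES,
"audio",
);
// Resolve the model once so the request and the returned metadata agree.
const model = (options as { model?: string }).model ?? this.defaultModel;
const prediction = await predict(auth, {
model,
input: {
image: `data:image/png;base64,${imageBuffer.toString("base64")}`,
audio: `data:audio/mp3;base64,${audioBuffer.toString("base64")}`,
// MuseTalk-specific extras
bbox_shift: 0,
fps: 25,
},
});
const videoBuffer = await downloadPredictionOutput(prediction);
return {
buffer: videoBuffer,
format: "mp4",
size: videoBuffer.length,
provider: "replicate",
metadata: {
latency: Date.now() - startTime,
provider: "replicate",
model, // the model actually used, which may differ from defaultModel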
},
};
}
private async resolveBuffer(
input: Buffer | string,
maxBytes: number,
label: string,
): Promise<Buffer> {
if (Buffer.isBuffer(input)) {
if (input.length > maxBytes) {
throw new Error(
`${label} buffer too large: ${input.length} bytes (max ${maxBytes})`,
);
}
return input;
}
// Reject local file paths — only Buffer or HTTPS URLs are accepted.
if (!/^https:\/\//.test(input)) {
throw new Error(
`Invalid input: expected Buffer or HTTPS URL, got "${input}". Local file reads are not supported.`,
);
}
// SSRF guard: validate the URL resolves to a public IP before fetching.
const { assertSafeUrl } = await import("../../utils/ssrfGuard.js");
await assertSafeUrl(input);
const FETCH_TIMEOUT_MS = 60_000;
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), FETCH_TIMEOUT_MS);
let r: Response;
try {
r = await fetch(input, { signal: controller.signal });
} catch (err) {
if (err instanceof Error && err.name === "AbortError") {
throw new Error(
`Fetch of ${label} timed out after ${FETCH_TIMEOUT_MS / 1000}s`,
);
}
throw err;
} finally {
clearTimeout(timeoutId);
}
if (!r.ok) {
throw new Error(`Failed to fetch ${label}: HTTP ${r.status}`);
}
// Bounded read: rejects responses that exceed the size cap.
const { readBoundedBuffer } = await import("../../utils/sizeGuard.js");
return readBoundedBuffer(r, maxBytes, label);
}
}
Register in providerRegistry.ts:
try {
const { AvatarProcessor } = await import("../utils/avatarProcessor.js");
const { ReplicateAvatar } =
await import("../avatar/providers/ReplicateAvatar.js");
AvatarProcessor.registerHandler("replicate", new ReplicateAvatar());
// Aliases for explicit model targeting:
AvatarProcessor.registerHandler("musetalk", new ReplicateAvatar());
} catch (err) {
logger.debug(
`[ProviderRegistry] replicate avatar registration skipped: ${err instanceof Error ? err.message : String(err)}`,
);
}
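The video and avatar handlers hard-code image/png and audio/mp3 in their data URIs, while the test fixtures later in this guide are JPEG and MP3 files. A hedged sketch of building the data URI from the buffer's magic bytes instead; sniffMimeType and toDataUri are hypothetical helpers that cover only the few formats mentioned in this guide.

```typescript
import { Buffer } from "node:buffer";

// Magic-byte sniffing for a handful of formats; returns undefined when unsure.
function sniffMimeType(data: Buffer): string | undefined {
  const startsWith = (bytes: number[], offset = 0): boolean =>
    bytes.every((b, i) => data[offset + i] === b);

  if (startsWith([0x89, 0x50, 0x4e, 0x47])) return "image/png";
  if (startsWith([0xff, 0xd8, 0xff])) return "image/jpeg";
  // WAV: "RIFF" at offset 0 and "WAVE" at offset 8
  if (
    startsWith([0x52, 0x49, 0x46, 0x46]) &&
    startsWith([0x57, 0x41, 0x56, 0x45], 8)
  ) {
    return "audio/wav";
  }
  // MP3: an ID3v2 tag ("ID3") or an MPEG frame sync (0xFF then top three bits set)
  if (startsWith([0x49, 0x44, 0x33])) return "audio/mpeg";
  if (data.length >= 2 && data[0] === 0xff && ((data[1] ?? 0) & 0xe0) === 0xe0) {
    return "audio/mpeg";
  }
  return undefined;
}

function toDataUri(data: Buffer, fallbackMime: string): string {
  const mime = sniffMimeType(data) ?? fallbackMime;
  return `data:${mime};base64,${data.toString("base64")}`;
}
```

Whether a given Replicate model inspects the declared MIME type at all is model-specific, so this is a defensive measure rather than a requirement.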
Step 6 — Music handler (when Music modality exists)
File: src/lib/music/providers/ReplicateMusic.ts — NEW.
Implements MusicHandler. Same shape as ReplicateAvatar with audio-only output.
Replicate music models include:
- meta/musicgen: Meta's MusicGen
- riffusion/riffusion: Riffusion (image-to-music)
- mtg/audiogen: Sound effects + ambient
Step 7 — Image-gen via the LLM provider's executeImageGeneration
The LLM provider (Step 3) already handles this case. Add prefixes to IMAGE_GENERATION_MODELS:
export const IMAGE_GENERATION_MODELS = [
"dall-e",
"imagen",
"stable-diffusion",
+ "flux",
+ "sdxl",
+ "playground-v2",
+ "ideogram",
];
Now nl.generate({ provider: "replicate", model: "black-forest-labs/flux-1.1-pro:..." }) routes through executeImageGeneration automatically.
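Replicate model ids carry an owner prefix and an optional version suffix, so a plain startsWith check on the full string would not match "flux" against "black-forest-labs/flux-1.1-pro:...". How the real routing matches entries is not shown in this guide; the sketch below assumes, purely for illustration, that matching happens on the model name alone. isImageGenerationModel is a hypothetical helper.

```typescript
const IMAGE_GENERATION_MODELS = [
  "dall-e",
  "imagen",
  "stable-diffusion",
  "flux",
  "sdxl",
  "playground-v2",
  "ideogram",
];

// Assumption: match on the model name, ignoring "owner/" and ":version".
function isImageGenerationModel(model: string): boolean {
  const withoutVersion = model.split(":")[0] ?? model;
  const name = withoutVersion
    .slice(withoutVersion.lastIndexOf("/") + 1)
    .toLowerCase();
  return IMAGE_GENERATION_MODELS.some((prefix) => name.startsWith(prefix));
}
```

Short prefixes can produce false positives for any model whose name happens to begin with the same letters, so longer or more specific prefixes are the safer choice.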
Calling pattern from the consumer's perspective
After all four flavors are wired:
const nl = new NeuroLink();
// LLM via Replicate
await nl.generate({
provider: "replicate",
model: "meta/llama-3.1-70b-instruct",
input: { text: "Explain quantum entanglement" },
});
// Image gen via Replicate
await nl.generate({
provider: "replicate",
model: "black-forest-labs/flux-1.1-pro",
input: { text: "A serene mountain landscape" },
});
// Video gen via Replicate
await nl.generate({
provider: "replicate", // The LLM provider, used as routing default
input: { text: "smooth zoom-out", images: [imageBuffer] },
output: {
mode: "video",
video: {
provider: "replicate",
model: "atonamy/wan-alpha:...",
length: 4,
},
},
});
// Avatar via Replicate
await nl.generate({
provider: "replicate",
output: {
mode: "avatar",
avatar: {
provider: "replicate",
image: portraitBuffer,
audio: narrationBuffer,
},
},
});
One auth token (REPLICATE_API_TOKEN), four modalities, four registered handlers.
Pricing nuance
Replicate charges per second of compute, not per token. The pricing table (src/lib/utils/pricing.ts) is keyed on tokens. For multi-modal providers, you have two options:
Option A — symbolic per-token rate
"replicate": {
_default: { input: 0.0001, output: 0.0002 },
},
Cost attribution shows non-zero values but doesn't reflect actual Replicate billing. Acceptable for ops-dashboard purposes.
Option B — separate compute-time pricing
Extend the pricing module to support compute-second billing for providers that use it. This is a wider change (touches pricing.ts, telemetry, dashboards). Discuss with maintainers.
The handlers already record elapsed time in metadata (processingTime in the video handler, latency in the avatar handler); future cost attribution for compute-time providers can derive billing from those fields.
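As a rough illustration of Option B, compute-time cost is elapsed seconds multiplied by a per-second rate. The sketch below prefers the prediction's own metrics.predict_time (assumed here to be in seconds) and falls back to wall-clock time. estimateComputeCostUsd and every rate in the table are hypothetical; real rates depend on the hardware each model runs on.

```typescript
type PredictionMetrics = { predict_time?: number }; // assumed to be seconds

// Hypothetical per-second USD rates; not actual Replicate pricing.
const RATE_PER_SECOND_USD: Record<string, number> = {
  _default: 0.001,
  "example-owner/example-model": 0.002,
};

function estimateComputeCostUsd(
  model: string,
  metrics: PredictionMetrics | undefined,
  fallbackElapsedMs = 0,
): number {
  // Rates are keyed by "owner/name"; drop any ":version" suffix first.
  const modelPath = model.split(":")[0] ?? model;
  const rate =
    RATE_PER_SECOND_USD[modelPath] ?? RATE_PER_SECOND_USD["_default"] ?? 0;
  // Prefer the provider-reported compute time over wall-clock time.
  const seconds = metrics?.predict_time ?? fallbackElapsedMs / 1000;
  return seconds * rate;
}
```

Wall-clock time includes queueing and cold starts, so the fallback tends to overestimate compared with the provider-reported figure.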
Testing
Cross-modality test suite
File: test/continuous-test-suite-replicate.ts — NEW.
import fs from "node:fs/promises"; // used by the video and avatar fixture reads below
const tests = [
// LLM
{
name: "Replicate LLM — Llama 3.1 70B generates response",
fn: async () => {
if (!process.env.REPLICATE_API_TOKEN) return true;
const nl = new NeuroLink();
const result = await nl.generate({
provider: "replicate",
model: "meta/llama-3.1-70b-instruct",
input: { text: "Hello" },
});
return result.content.length > 0;
},
},
// Image
{
name: "Replicate image — FLUX generates PNG",
fn: async () => {
if (!process.env.REPLICATE_API_TOKEN) return true;
const result = await new NeuroLink().generate({
provider: "replicate",
model: "black-forest-labs/flux-1.1-pro",
input: { text: "A red circle" },
});
return (result.imageOutput?.imageBuffer.length ?? 0) > 1000;
},
},
// Video
{
name: "Replicate video — Wan-Alpha generates MP4",
fn: async () => {
if (!process.env.REPLICATE_API_TOKEN) return true;
const image = await fs.readFile("test/fixtures/test-image.jpg");
const result = await new NeuroLink().generate({
input: { text: "slow zoom", images: [image] },
output: {
mode: "video",
video: { provider: "replicate", model: "atonamy/wan-alpha:..." },
},
});
return (result.video?.data.length ?? 0) > 100_000;
},
},
// Avatar
{
name: "Replicate avatar — MuseTalk produces talking-head",
fn: async () => {
if (!process.env.REPLICATE_API_TOKEN) return true;
const image = await fs.readFile("test/fixtures/portrait.jpg");
const audio = await fs.readFile("test/fixtures/narration.mp3");
const result = await new NeuroLink().generate({
output: {
mode: "avatar",
avatar: { provider: "replicate", image, audio },
},
});
return (result.avatar?.buffer.length ?? 0) > 100_000;
},
},
// Shared lifecycle test
{
name: "Replicate prediction lifecycle — handles wait=60",
fn: async () => {
if (!process.env.REPLICATE_API_TOKEN) return true;
const { predict } =
await import("../src/lib/adapters/replicate/predictionLifecycle.js");
const auth = { apiToken: process.env.REPLICATE_API_TOKEN! };
const result = await predict(auth, {
model: "stability-ai/sdxl",
input: { prompt: "test", num_inference_steps: 1 },
});
return result.status === "succeeded";
},
},
];
Add test:replicate script to package.json.
Documentation
docs/getting-started/providers/replicate.md — NEW
Cover all four flavors:
- Overview — what Replicate is, the universal-gateway pattern
- Quick start — get token, run any of the 4 modalities
- Supported modalities — table mapping each modality to the example model
- Model selection — how to find / pin model versions on Replicate's catalog
- Pricing — link to Replicate's per-model pricing
- Auth scoping — production vs sandbox tokens
- Troubleshooting: version not found, rate limit, cold start delays
docs/provider-integration/<NN>-replicate-integration.md — NEW
Implementation journal documenting:
- Why one provider, four registrations (the multi-modal architecture)
- The shared prediction lifecycle (Prefer: wait=60 optimisation, polling cadence, abort handling)
- Per-modality input shapes (image-to-video, audio-to-avatar, etc.)
- Trade-offs (pinning model versions vs accepting "latest")
Cross-references
| File | Update |
|---|---|
| docs/features/index.md | Add Replicate to "Supported Providers" |
| docs/reference/provider-comparison.md | Add a Replicate row in each modality section |
| docs/getting-started/providers/index.md | Card for Replicate |
| docs/features/video-generation.md | Mention Replicate as a route to Wan-Alpha + others |
| docs/features/image-generation-streaming.md | Mention Replicate as a route to FLUX + others |
Validation gates
pnpm run check
pnpm run lint
pnpm run build
pnpm run test:replicate # cross-modality suite
pnpm run test:providers # LLM-only sanity
pnpm run test:media # video / image / avatar
# Real API smoke (each modality):
export REPLICATE_API_TOKEN=...
pnpm run cli generate "Hello" --provider replicate --model meta/llama-3.1-70b-instruct
pnpm run cli generate "A cat" --provider replicate --model black-forest-labs/flux-1.1-pro