Streaming Guide

Since: v8.0.0 | Status: Stable | Availability: SDK + CLI

Provider Defaults: When --provider (CLI) or provider (SDK) is not specified, NeuroLink defaults to Vertex AI with gemini-2.5-flash. Set the NEUROLINK_PROVIDER or AI_PROVIDER environment variable to change the default provider.

Overview

Streaming lets you receive AI-generated text incrementally -- token by token -- instead of waiting for the entire response. This is the same mechanism behind the "typing" effect you see in ChatGPT and other chat interfaces.

Why use streaming?

  • Faster time-to-first-token -- Users see output within milliseconds rather than waiting seconds for a complete response.
  • Better UX -- Progressive rendering feels more interactive and responsive.
  • Lower memory footprint -- Process tokens as they arrive instead of buffering the full response.
  • Early cancellation -- Stop generation as soon as you have what you need.

Quick Start

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

const result = await neurolink.stream({
  input: { text: "Explain how TCP works in two paragraphs" },
});

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

That is the simplest possible streaming call. The sections below cover every option in detail.

SDK API

neurolink.stream(options): Promise<StreamResult>

The stream() method accepts a StreamOptions object and returns a StreamResult.

StreamOptions (Key Parameters)

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| input | { text: string, ... } | Yes | The prompt and optional multimodal inputs (images, PDFs, files, audio) |
| provider | string | No | AI provider name ("openai", "anthropic", "google-ai", "vertex", etc.) |
| model | string | No | Specific model ("gpt-4o", "claude-3-5-sonnet", "gemini-2.5-flash") |
| temperature | number | No | Randomness (0.0 = deterministic, 2.0 = creative). Default varies by provider |
| maxTokens | number | No | Maximum tokens in the response |
| systemPrompt | string | No | System message to control AI behavior |
| tools | Record<string, Tool> | No | Custom tools the model can invoke during generation |
| rag | RAGConfig | No | RAG configuration -- pass { files: [...] } for automatic retrieval |
| timeout | number \| string | No | Request timeout in milliseconds |
| abortSignal | AbortSignal | No | External cancellation signal |
| maxSteps | number | No | Maximum tool execution steps (default: 5) |
| disableTools | boolean | No | Set true to disable all tool usage |
| tts | TTSOptions | No | Enable text-to-speech audio alongside text |
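Putting several of these parameters together, a typical options object might look like the sketch below. The field names follow the table above; the provider, model, and values are illustrative, not recommendations:

```typescript
// Sketch of a StreamOptions object combining common parameters.
// Field names are from the table above; values are illustrative.
const options = {
  input: { text: "Summarize the latest release notes" },
  provider: "openai", // any supported provider name
  model: "gpt-4o",
  temperature: 0.3, // low randomness for factual output
  maxTokens: 512,
  systemPrompt: "You are a concise technical writer.",
  timeout: 30_000, // request timeout in milliseconds
};

console.log(options.provider, options.temperature);
```

You would pass this object directly to `neurolink.stream(options)`.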

input Object

The input field is the only required parameter. At minimum it needs a text property:

// Text only
input: { text: "Your prompt here" }

// Text + images
input: {
  text: "What is in this image?",
  images: [Buffer.from(pngData), "https://example.com/photo.jpg"]
}

// Text + PDF files
input: {
  text: "Summarize this document",
  pdfFiles: ["./report.pdf"]
}

// Text + auto-detected files
input: {
  text: "Review this code",
  files: ["./src/app.ts"]
}

StreamResult

Calling stream() returns a StreamResult object. The response itself arrives through the .stream async iterable, while metadata fields resolve once the stream completes.

| Field | Type | Description |
| --- | --- | --- |
| stream | AsyncIterable<StreamChunk> | The async iterable you consume with for await |
| provider | string | Name of the provider that served the request |
| model | string | Model that was used |
| usage | TokenUsage | Token usage (prompt, completion, total) |
| finishReason | string | Why generation stopped ("stop", "length", "tool-calls") |
| toolCalls | ToolCall[] | Tool calls made during generation |
| toolResults | ToolResult[] | Results from tool execution |
| toolExecutions | ToolExecutionSummary[] | Detailed summary of all tool executions |
| metadata | object | Stream metadata (streamId, startTime, totalChunks, responseTime) |
| analytics | AnalyticsData \| Promise<AnalyticsData> | Usage analytics (when enableAnalytics: true) |
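Because `analytics` may be either a plain value or a `Promise`, the simplest pattern is to `await` it unconditionally; `await` on a non-promise just returns the value. A minimal sketch against a mocked result object (the real one comes from `neurolink.stream()`):

```typescript
// Mock of the metadata portion of a StreamResult; the real object
// comes from neurolink.stream(). The analytics field may be a plain
// value or a Promise, so `await` handles both cases uniformly.
const result = {
  provider: "openai",
  usage: { promptTokens: 12, completionTokens: 48, totalTokens: 60 },
  analytics: Promise.resolve({ requests: 1 }),
};

async function report() {
  const analytics = await result.analytics; // works for value or Promise
  console.log(result.usage.totalTokens, analytics.requests);
  return analytics;
}
```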

Stream Chunks

Each chunk yielded by result.stream is a discriminated union:

Text Chunks

The most common chunk type. Contains a content string with the next piece of generated text.

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

You can also check the type discriminator:

for await (const chunk of result.stream) {
  if (chunk.type === "text") {
    process.stdout.write(chunk.content);
  }
}
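The discriminated-union pattern is easy to exercise against a mock stream. The chunk shapes below are an assumption inferred from the examples in this guide (a `type` field plus `content` or `audio`), not the library's exact type definitions:

```typescript
// Hypothetical chunk union, inferred from the examples in this guide.
type StreamChunk =
  | { type: "text"; content: string }
  | { type: "audio"; audio: { data: Buffer } };

// A mock stream: any async iterable of chunks works with for-await.
async function* mockStream(): AsyncIterable<StreamChunk> {
  yield { type: "text", content: "Hello, " };
  yield { type: "text", content: "world!" };
}

// Accumulate only the text chunks, skipping any other variants.
async function collectText(stream: AsyncIterable<StreamChunk>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    if (chunk.type === "text") {
      text += chunk.content;
    }
  }
  return text;
}
```

Narrowing on `chunk.type` gives you full type safety inside each branch.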

Audio Chunks (TTS)

When TTS is enabled, the stream interleaves text and audio chunks:

const result = await neurolink.stream({
  input: { text: "Tell me a story" },
  provider: "google-ai",
  tts: { enabled: true, voice: "en-US-Neural2-C" },
});

const audioBuffers: Buffer[] = [];

for await (const chunk of result.stream) {
  switch (chunk.type) {
    case "text":
      process.stdout.write(chunk.content);
      break;
    case "audio":
      audioBuffers.push(chunk.audio.data);
      break;
  }
}

See the TTS Guide for full audio streaming details.
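Once the stream completes, the collected audio chunks can be joined and written to disk. A minimal sketch; note that the container and encoding of the bytes depend on your TTS configuration, so the file extension you choose is an assumption:

```typescript
import { writeFileSync } from "fs";

// Join the per-chunk audio buffers collected during streaming
// into one contiguous buffer, then write it to disk.
function saveAudio(buffers: Buffer[], path: string): number {
  const combined = Buffer.concat(buffers);
  writeFileSync(path, combined);
  return combined.length; // total bytes written
}
```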

Collecting the Full Response

If you need the complete text after streaming finishes, accumulate chunks into a string:

const result = await neurolink.stream({
  input: { text: "Write a haiku about programming" },
});

let fullText = "";

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    fullText += chunk.content;
  }
}

console.log("Complete response:", fullText);
console.log("Tokens used:", result.usage);
console.log("Finish reason:", result.finishReason);

Streaming with Tools

Tools work transparently during streaming. The model calls tools mid-stream, receives results, and continues generating. You consume the stream exactly the same way -- tool execution happens behind the scenes.

import { NeuroLink } from "@juspay/neurolink";
import { tool } from "ai";
import { z } from "zod";

const neurolink = new NeuroLink();

const weatherTool = tool({
  description: "Get current weather for a city",
  parameters: z.object({ city: z.string() }),
  execute: async ({ city }) => {
    return { temperature: 22, condition: "sunny", city };
  },
});

const result = await neurolink.stream({
  input: { text: "What is the weather like in Tokyo right now?" },
  tools: { getWeather: weatherTool },
  maxSteps: 3,
});

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

// After stream completes, inspect tool activity
console.log("Tool calls:", result.toolCalls);
console.log("Tool results:", result.toolResults);

Streaming with RAG

Pass rag: { files: [...] } to automatically index documents and give the model a search tool. The model decides when to search during generation.

const result = await neurolink.stream({
  input: { text: "What deployment strategies does the guide recommend?" },
  rag: {
    files: ["./docs/deployment-guide.md"],
    strategy: "markdown",
    chunkSize: 512,
    topK: 5,
  },
});

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

See the RAG Guide for configuration details and advanced usage.

Streaming with Multimodal Input

Stream responses that analyze images, PDFs, or other files:

import { readFileSync } from "fs";

// Stream with image input
const result = await neurolink.stream({
  input: {
    text: "Describe what you see in detail",
    images: [readFileSync("./photo.png")],
  },
  provider: "openai",
  model: "gpt-4o",
});

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

Cancellation with AbortSignal

Use an AbortSignal to cancel a stream from outside:

const controller = new AbortController();

// Cancel after 5 seconds
setTimeout(() => controller.abort(), 5000);

const result = await neurolink.stream({
  input: { text: "Write a very long story" },
  abortSignal: controller.signal,
});

try {
  for await (const chunk of result.stream) {
    if ("content" in chunk) {
      process.stdout.write(chunk.content);
    }
  }
} catch (error) {
  if (error.name === "AbortError") {
    console.log("\nStream cancelled.");
  }
}
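You can also cancel from inside the loop: breaking out of a `for await` loop invokes the iterator's `return()` method, which closes the underlying stream. A sketch against a mock stream standing in for `result.stream`:

```typescript
// Mock async iterable standing in for result.stream.
async function* mockStream() {
  for (const piece of ["one ", "two ", "three ", "four "]) {
    yield { content: piece };
  }
}

// Stop as soon as we have collected enough text.
async function takeUntil(limit: number): Promise<string> {
  let text = "";
  for await (const chunk of mockStream()) {
    text += chunk.content;
    if (text.length >= limit) {
      break; // closes the iterator via its return() method
    }
  }
  return text;
}
```

This is the "early cancellation" benefit listed in the overview: no `AbortController` is needed when the consumer itself decides to stop.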

CLI Streaming

The NeuroLink CLI streams by default with the stream command:

# Basic streaming
neurolink stream "Explain quantum computing"

# With provider and model
neurolink stream "Write a poem" --provider openai --model gpt-4o

# With temperature
neurolink stream "Creative story about robots" --temperature 0.9

# With RAG
neurolink stream "Summarize the docs" --rag-files ./docs/guide.md

# With system prompt
neurolink stream "Translate to French" --system "You are a professional translator"

Error Handling

Errors can occur either when initiating the stream or while consuming chunks. Handle both cases:

try {
  const result = await neurolink.stream({
    input: { text: "Hello" },
    provider: "openai",
    timeout: 10000,
  });

  try {
    for await (const chunk of result.stream) {
      if ("content" in chunk) {
        process.stdout.write(chunk.content);
      }
    }
  } catch (streamError) {
    // Error during streaming (network drop, provider error mid-stream)
    console.error("Stream interrupted:", streamError.message);
  }
} catch (initError) {
  // Error before streaming starts (auth failure, invalid model, budget exceeded)
  console.error("Failed to start stream:", initError.message);
}

Common Errors

| Error | Cause | Solution |
| --- | --- | --- |
| SESSION_BUDGET_EXCEEDED | Session cost exceeded maxBudgetUsd limit | Increase budget or start a new session |
| PROVIDER_AUTH_ERROR | Missing or invalid API key | Set the provider's API key environment variable |
| TIMEOUT | Request exceeded timeout | Increase timeout or use abortSignal for control |
| MODEL_NOT_FOUND | Invalid model name | Check provider docs for supported model names |

Provider Support

All NeuroLink providers support streaming:

| Provider | Streaming | Notes |
| --- | --- | --- |
| OpenAI | Yes | Full streaming with tool support |
| Anthropic | Yes | Full streaming with tool support |
| Google AI Studio | Yes | Full streaming with tool support |
| Google Vertex AI | Yes | Full streaming with tool support |
| Amazon Bedrock | Yes | Full streaming with tool support |
| Azure OpenAI | Yes | Full streaming with tool support |
| Mistral | Yes | Full streaming with tool support |
| LiteLLM | Yes | Full streaming; tool support depends on underlying model |
| Ollama | Yes | Full streaming; tool support depends on model |
| Hugging Face | Yes | Streaming support; tool support varies by model |
| Amazon SageMaker | Limited | Falls back to fake streaming (generate then emit as chunks) |
| OpenAI-Compatible | Yes | Depends on the endpoint's streaming support |

When real streaming is not available for a provider or model, NeuroLink transparently falls back to "fake streaming" -- it generates the full response and then emits it as chunks. Your consuming code does not need to change.
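Conceptually, fake streaming is just generation followed by chunked emission. A simplified illustration of the idea (not NeuroLink's actual implementation):

```typescript
// Simplified illustration of "fake streaming": take a fully
// generated response and emit it as fixed-size text chunks.
async function* fakeStream(fullText: string, chunkSize = 8) {
  for (let i = 0; i < fullText.length; i += chunkSize) {
    yield { type: "text" as const, content: fullText.slice(i, i + chunkSize) };
  }
}
```

Because the consumer sees the same chunk interface either way, code written against real streaming works unchanged against the fallback.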

See Also