LiveKit Voice Agent — Real-Time Voice over WebRTC
A WebRTC voice agent that uses LiveKit for the real-time media loop and NeuroLink as the brain (LLM, tools, memory).
Table of Contents
- Problem Statement & Solution
- Architecture Overview
- Deployment Topologies (Cloud & Self-Hosted)
- Core Components
- How NeuroLink Owns the Brain
- Runtime Flow
- Usage Example
- Source Layout
- Configuration
- Tuning the Voice Loop (VAD, Turn Detection, Interruption, Language)
- Conversation Memory
- Implementation Plan
- Operational Behavior
- Error Handling & Troubleshooting
- Extensibility Roadmap
Problem Statement & Solution
The Challenge
The original NeuroLink voice agent (see voice-agent.md) runs a browser-to-server loop over a WebSocket. That design works, but a WebSocket transport carries structural limits for real-time audio:
- TCP head-of-line blocking and no jitter buffer cause choppy audio on lossy networks
- no built-in acoustic echo cancellation — the assistant can be transcribed by its own mic input
- raw PCM is ~8–10× the bandwidth of a compressed codec, and all of it flows through the application server
- voice-activity detection runs on the application server's event loop, capping per-process concurrency
- SvelteKit and similar frameworks cannot accept the WebSocket upgrade without a custom server entry
The Solution
The LiveKit voice agent moves the transport to WebRTC via LiveKit, while keeping NeuroLink as the brain. LiveKit (an open-source WebRTC platform with a managed cloud and a self-hostable server) provides the parts that are hard to build correctly:
- WebRTC transport with echo cancellation, jitter buffering, packet-loss concealment, and Opus compression
- voice-activity detection, turn detection, and interruption handling
- a worker/job model that runs each call in its own process for isolation and horizontal scaling
NeuroLink remains responsible for the conversation itself:
- the LLM (any NeuroLink provider — Bedrock/Claude, OpenAI, Gemini, etc.)
- tool calling (MCP and registered tools), decided and executed inside neurolink.stream()
- conversation memory, keyed by a stable conversationId
Key Benefits
- Production-grade real-time audio without building media plumbing
- NeuroLink stays the brain: generate()/stream(), tools, and memory are unchanged
- Worker-per-call scaling provided by the LiveKit Agents runtime
- Cloud or self-hosted with identical application code
- Provider-agnostic brain layer that can later back other transports
Architecture Overview
System Flow Diagram
```text
┌─────────────────────────────────────────────────────────────┐
│ Browser (livekit-client)                                    │
│ • Captures mic; WebRTC handles AEC, Opus, jitter            │
│ • Plays assistant audio                                     │
└────────────────────────┬────────────────────────────────────┘
                         │ WebRTC
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ LiveKit Server (Cloud OR self-hosted)                       │
│ • Auto-creates the room on first join                       │
│ • Routes media via its SFU                                  │
│ • Dispatches one Job per room to a registered worker        │
└────────────────────────┬────────────────────────────────────┘
                         │ one Job per call (own process)
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ Voice Agent Worker (@livekit/agents, Node)                  │
│ Silero VAD ─ turn detection / interruption                  │
│ STT plugin (Deepgram) ─ speech → text                       │
│ llmNode ──────────────► NeuroLink brain                     │
│ TTS plugin (ElevenLabs/Cartesia) ─ text → speech            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ NeuroLink (the brain, runs inside llmNode)                  │
│ • neurolink.stream({ conversationId })                      │
│ • history → NeuroLink memory (source of truth)              │
│ • tools → MCP / registered tools, executed by NeuroLink     │
│ • model → any NeuroLink provider                            │
└─────────────────────────────────────────────────────────────┘
```
Division of Responsibility
| Concern | Owner |
|---|---|
| WebRTC transport, AEC, jitter, Opus | LiveKit |
| VAD, turn detection, interruption | LiveKit Agents |
| Worker-per-call process isolation & scaling | LiveKit Agents |
| STT / TTS | LiveKit plugins (configurable) |
| LLM, tool-calling, memory | NeuroLink |
| Conversation history source of truth | NeuroLink memory (conversationId) |
Deployment Topologies (Cloud & Self-Hosted)
The application code is identical across topologies; only LIVEKIT_URL and credentials change.
Topology A — LiveKit Cloud (managed)
```text
Browser ──► LiveKit Cloud (rooms + SFU on LiveKit's servers)
Voice Agent Worker (your infra) ──outbound──► LiveKit Cloud (registers; Cloud dispatches Jobs)
Token endpoint (your app) mints join tokens with the Cloud API key/secret
```
- Rooms are created automatically on LiveKit's servers on first join.
- The worker connects outbound to Cloud and receives dispatched Jobs over that connection — no inbound exposure or tunneling is required, even in local development.
- Billing is per participant-minute (a free Build tier is suitable for development).
- Use when: fastest setup, minimal media ops, dev/staging, or production without running media infrastructure.
Topology B — Self-Hosted LiveKit (in-house)
```text
Browser ──► your LiveKit server (rooms + SFU on your infrastructure)
Voice Agent Worker (your infra) ──► your LiveKit server
Token endpoint (your app) mints join tokens with your server's API key/secret
```
- The livekit-server (open source) runs on your own infrastructure (for example, Kubernetes behind your ingress/service mesh).
- Media stays inside your network. There is no per-minute media fee, so you pay only for compute and bandwidth.
- Use when: cost control at scale, data-residency/compliance requirements, or full control over the media path.
Local Development
- Console mode: the worker runs standalone using the host machine's microphone and speakers — no LiveKit server and no browser required. Best for iterating on the brain loop.
- Local server: livekit-server --dev (placeholder credentials, no external dependencies) with the browser and worker on localhost.
- Cloud from local: point the local LIVEKIT_URL at a Cloud project. Because the worker connects outbound, Cloud can dispatch Jobs to a locally running worker without tunneling.
Core Components
1. LiveKit Agents Worker
A long-lived Node process built on @livekit/agents. It registers with the LiveKit server under an agentName (for example, neurolink-voice). For each room, LiveKit dispatches a Job, which the runtime runs in its own process — this is the worker-per-call isolation that bounds the blast radius of a crash and enables linear scaling by adding worker replicas.
2. Voice Activity Detection & Turn Detection
Provided by the LiveKit Agents AgentSession using the Silero VAD plugin plus the framework's turn-detection and interruption logic. This replaces the hand-built VAD/turn/barge-in logic of the WebSocket voice agent.
3. Speech-to-Text / Text-to-Speech
LiveKit handles the audio transport and turn-taking but does not perform STT
or TTS itself. Those are pluggable provider modules. Each is its own
@livekit/agents-plugin-<name> package, configured with that provider's API key
from the environment and selected through the stt and tts fields of the agent config.
Available providers (Node SDK, @livekit/agents-plugin-* @ 1.4.x):
| Capability | Providers |
|---|---|
| STT | Deepgram · OpenAI (Whisper) · Google · AssemblyAI · Cartesia · Sarvam · Baseten |
| TTS | ElevenLabs · Cartesia · OpenAI · Google · Rime · Neuphonic · Resemble · Inworld · Hume · Sarvam · Baseten |
| VAD | Silero |
The Google plugin provides both STT and TTS. A Google/Vertex deployment can therefore use it for
speech on both sides while NeuroLink (Vertex) serves as the brain, without
adding a separate STT/TTS vendor.
The integration wires provider plugins on demand in voiceAgent.ts (buildStt/buildTts). Adding a provider from the list above is a small, isolated change in those two functions.
4. NeuroLink Brain (llmNode)
The llmNode is the seam between LiveKit and NeuroLink. It extracts the latest user utterance, calls neurolink.stream() with a stable conversationId, and returns the token stream as ReadableStream<llm.ChatChunk>. Conversation history is not taken from LiveKit's ChatContext; NeuroLink's memory is the source of truth.
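The extraction step can be pictured with a small, self-contained sketch. The ChatItem type and the latestUserUtterance function are hypothetical stand-ins written for illustration. They are not the @livekit/agents ChatContext API, whose item types are richer. The sketch shows that only the newest user turn is taken from the transport, because history comes from NeuroLink memory.

```typescript
// Hypothetical, simplified stand-in for a chat context item. The real
// @livekit/agents ChatContext uses richer item types; this models only what
// the llmNode needs: who spoke and what they said.
type Role = "system" | "user" | "assistant";

interface ChatItem {
  role: Role;
  text: string;
}

// Return the most recent non-blank user utterance, or undefined if none.
// Earlier items are deliberately ignored: conversation history is loaded by
// NeuroLink memory under the conversationId, not taken from this context.
function latestUserUtterance(items: readonly ChatItem[]): string | undefined {
  const match = [...items]
    .reverse()
    .find((item) => item.role === "user" && item.text.trim().length > 0);
  return match?.text.trim();
}

const context: ChatItem[] = [
  { role: "system", text: "You are a concise voice assistant." },
  { role: "user", text: "What's the weather?" },
  { role: "assistant", text: "Sunny and mild." },
  { role: "user", text: "  And tomorrow?  " },
];

console.log(latestUserUtterance(context)); // prints: And tomorrow?
```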
5. Token Endpoint
A plain HTTP endpoint in the host application that mints a LiveKit join token (livekit-server-sdk) for an authenticated user. Because WebRTC needs only this single HTTP call, frameworks that cannot accept a WebSocket upgrade (such as SvelteKit) integrate without a custom server entry.
6. Browser Client
The host application's frontend uses livekit-client to join the room, publish the microphone, and play the agent's audio. The browser handles capture, AEC, and playback natively through WebRTC.
How NeuroLink Owns the Brain
This integration is deliberately structured so NeuroLink retains its generic control surface.
History
The llmNode ignores LiveKit's accumulated ChatContext for generation and instead passes a stable conversationId to neurolink.stream(). NeuroLink's memory layer loads and persists history under that id, making NeuroLink the single source of truth for conversation state. LiveKit still maintains its own context internally for turn detection; the two do not conflict because LiveKit's turn detection is audio/transcript-driven.
Tools
Tools (MCP and registered tools) live on the NeuroLink instance. With tools enabled, NeuroLink runs the entire tool-calling loop inside stream() — the model selects a tool, NeuroLink executes it, feeds the result back, and continues. LiveKit performs no tool-calling. To make a merchant/MCP toolset available, have the createNeuroLink factory return an instance with those tools registered — it is invoked inside each job process to build the brain for that call.
Model
The model and provider are NeuroLink configuration (provider, model). Any NeuroLink provider is supported, including Bedrock/Claude.
Interruption (barge-in)
When LiveKit detects barge-in it cancels the in-flight llmNode. That cancellation must be propagated into neurolink.stream() via an abort signal so the in-flight LLM call and any running tool call stop promptly.
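The propagation rule can be sketched with the standard AbortController. This is a minimal, synchronous model with stated assumptions: a plain array stands in for the token stream from neurolink.stream(), and the abortable helper is hypothetical. The sketch only stops consuming tokens. A real integration should also forward the same signal into the NeuroLink call (assuming its options accept one), so the provider request and any running tool are cancelled rather than just ignored.

```typescript
// Hypothetical helper: stop consuming a token source once the signal fires.
// AbortController/AbortSignal are standard globals in modern Node and browsers.
function* abortable<T>(source: Iterable<T>, signal: AbortSignal): Generator<T> {
  for (const item of source) {
    // Checked before each token, so nothing more reaches TTS after barge-in.
    if (signal.aborted) return;
    yield item;
  }
}

// Simulated turn: the user barges in after the assistant has produced two tokens.
const controller = new AbortController();
const spoken: string[] = [];
for (const token of abortable(["Sure,", " the", " order", " shipped", " today."], controller.signal)) {
  spoken.push(token);
  if (spoken.length === 2) controller.abort(); // stands in for LiveKit cancelling the llmNode
}

console.log(spoken.join("")); // prints: Sure, the
```

An async version puts the same check inside a for await loop over the stream.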
Tool latency
While a tool runs inside stream(), no audio is produced. To avoid dead air, instruct the model to speak a brief acknowledgment before tool use and/or emit a status event over a LiveKit data channel for the UI.
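A data-channel status event is a small serialized payload. The sketch below assumes a hypothetical ToolStatusEvent shape and a hypothetical topic name. It shows only the encode/decode round trip, using the standard TextEncoder and TextDecoder. The encoded bytes would be handed to the agent participant's data-publishing call and decoded in the browser on receipt. Check the LiveKit SDK documentation for the exact publish and receive signatures.

```typescript
// Hypothetical event shape for tool-progress updates sent to the UI.
interface ToolStatusEvent {
  type: "tool_status";
  tool: string;
  state: "started" | "finished" | "failed";
  at: number; // epoch milliseconds
}

// Hypothetical topic name, so the client can tell these apart from other data.
const TOOL_STATUS_TOPIC = "neurolink.tool-status";

function encodeToolStatus(event: ToolStatusEvent): Uint8Array {
  return new TextEncoder().encode(JSON.stringify(event));
}

function decodeToolStatus(payload: Uint8Array): ToolStatusEvent | undefined {
  try {
    const parsed = JSON.parse(new TextDecoder().decode(payload)) as Partial<ToolStatusEvent>;
    // Reject anything that is not a well-formed status event.
    if (parsed.type !== "tool_status" || typeof parsed.tool !== "string") return undefined;
    if (parsed.state !== "started" && parsed.state !== "finished" && parsed.state !== "failed") return undefined;
    if (typeof parsed.at !== "number") return undefined;
    return parsed as ToolStatusEvent;
  } catch {
    return undefined; // not JSON, or JSON that is not an object
  }
}

const bytes = encodeToolStatus({ type: "tool_status", tool: "lookupOrder", state: "started", at: 1700000000000 });
console.log(TOOL_STATUS_TOPIC, decodeToolStatus(bytes)?.state); // prints: neurolink.tool-status started
```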
Runtime Flow
Normal Turn
- Browser publishes microphone audio to the room (WebRTC).
- LiveKit Agents detects the end of the user's turn (VAD + turn detection).
- STT produces the transcript.
- llmNode calls neurolink.stream({ conversationId, input }).
- NeuroLink generates (running any tool calls internally) and streams tokens.
- TTS converts tokens to audio; LiveKit plays it back in the room.
- NeuroLink persists the turn to memory under conversationId.
Barge-In / Abort
- The assistant is speaking.
- LiveKit detects user speech and cancels the current llmNode.
- The abort signal cancels the in-flight neurolink.stream() (and any active tool).
- The session yields to the user.
Usage Example
The integration is exposed under @juspay/neurolink/livekit. LiveKit dependencies are optional/peer dependencies and are only required when the voice agent is used.
LiveKit runs each call as a Job in its own child process and re-imports the
agent entry file there. Because a live object cannot cross that process
boundary, the NeuroLink instance is built inside each job process via a
createNeuroLink factory — not passed in from a parent. This is split into two
files: the agent entry file (the default export) and a small launcher.
1. Define and launch the agent
1a. Agent entry file (default export)
```typescript
// voice-agent-entry.ts
import { defineVoiceAgent } from "@juspay/neurolink/livekit";
import { buildConfiguredNeuroLink } from "./neurolink-instance.js";

export default defineVoiceAgent({
  // Built once per call, inside the job process (registers its own tools).
  createNeuroLink: async () => buildConfiguredNeuroLink(),
  provider: process.env.VOICE_LLM_PROVIDER ?? "bedrock",
  model: process.env.VOICE_LLM_MODEL ?? "claude-sonnet-4-6",
  systemPrompt:
    "You are a concise, helpful voice assistant. Keep replies short and spoken.",
  stt: { provider: "deepgram" },
  tts: { provider: "elevenlabs" },
});
```
defineVoiceAgent overrides the agent's llmNode so every turn calls
neurolink.stream() with a per-room conversationId (NeuroLink owns history
and tools), and wires abort-on-interrupt: when LiveKit cancels a turn the
in-flight stream is aborted.
1b. Launcher
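```typescript
// voice-agent-worker.ts: run as its own Node process
import { fileURLToPath } from "node:url";
import { startVoiceAgentWorker } from "@juspay/neurolink/livekit";

await startVoiceAgentWorker({
  // fileURLToPath (rather than URL.pathname) yields a valid file path on Windows
  // and for paths containing spaces or other percent-encoded characters.
  agentFile: fileURLToPath(new URL("./voice-agent-entry.js", import.meta.url)),
  agentName: "neurolink-voice",
});
```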
startVoiceAgentWorker resolves LiveKit connection settings from the
environment (LIVEKIT_URL/LIVEKIT_API_KEY/LIVEKIT_API_SECRET) and registers
the worker; LiveKit dispatches one Job per room.
2. Mint a join token (host application, plain HTTP)
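```typescript
// e.g. src/routes/api/voice/token/+server.ts, assuming a SvelteKit host app
import { mintJoinToken } from "@juspay/neurolink/livekit";
import type { RequestHandler } from "./$types";

export const GET: RequestHandler = async ({ locals }) => {
  // locals.userId / locals.merchantId are assumed to be set by the host's auth hook.
  if (!locals.userId) return new Response("Unauthorized", { status: 401 });

  const room = `voice-${locals.merchantId}-${crypto.randomUUID()}`;
  const token = await mintJoinToken({
    identity: locals.userId,
    room,
    apiKey: process.env.LIVEKIT_API_KEY!,
    apiSecret: process.env.LIVEKIT_API_SECRET!,
  });
  return Response.json({ token, url: process.env.LIVEKIT_URL, room });
};
```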
3. Join from the browser
```typescript
import { Room, RoomEvent, Track } from "livekit-client";

const { token, url } = await (await fetch("/api/voice/token")).json();

// Enable the browser's built-in WebRTC audio cleanup as capture defaults. These
// are free, run client-side, and need no LiveKit Cloud: echo cancellation stops
// the agent's own voice being re-captured, noise suppression removes steady
// ambient noise, and auto gain normalizes mic level.
const room = new Room({
  audioCaptureDefaults: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
});

// Play the agent's voice: with plain livekit-client, a subscribed remote audio
// track must be attached to a media element before it is heard.
room.on(RoomEvent.TrackSubscribed, (track) => {
  if (track.kind === Track.Kind.Audio) {
    document.body.appendChild(track.attach());
  }
});

await room.connect(url, token); // room is auto-created on first join
await room.localParticipant.setMicrophoneEnabled(true);
```
Lower-level alternative
For full control, build the agent directly with @livekit/agents and supply a custom llmNode that calls neurolink.stream(). startVoiceAgentWorker is a convenience wrapper around that pattern.
Source Layout
```text
src/lib/voice/livekit/
├── brain.ts            # provider-agnostic: (transcript, conversationId, signal) → NeuroLink stream
├── voiceAgent.ts       # @livekit/agents Agent + llmNode adapter + abort-on-interrupt
├── voiceAgentWorker.ts # startVoiceAgentWorker(): WorkerOptions, agentName, plugin wiring
├── tokens.ts           # mintJoinToken() (livekit-server-sdk)
├── config.ts           # env resolution (LiveKit, STT/TTS, model/provider)
└── index.ts            # public exports → @juspay/neurolink/livekit
```
- brain.ts is transport-agnostic and reusable by future transports (for example, Daily.co).
- LiveKit packages are declared as optional/peer dependencies, mirroring how @picovoice/cobra-node is handled for the WebSocket voice agent.
Configuration
LiveKit (Cloud or self-hosted)
```bash
LIVEKIT_URL=wss://<project>.livekit.cloud   # or wss://livekit.internal (self-hosted) or ws://localhost:7880 (dev)
LIVEKIT_API_KEY=
LIVEKIT_API_SECRET=
```
STT / TTS plugins
```bash
DEEPGRAM_API_KEY=
ELEVENLABS_API_KEY=   # or CARTESIA_API_KEY
```
LLM (NeuroLink brain)
```bash
VOICE_LLM_PROVIDER=bedrock
VOICE_LLM_MODEL=claude-sonnet-4-6
# plus the provider's own credentials (e.g. AWS credentials for Bedrock)
```
Turn detection & lifecycle (optional)
```bash
LIVEKIT_EOU_TURN_DETECTION=true       # opt in to the semantic end-of-utterance model (English)
LIVEKIT_EOU_UNLIKELY_THRESHOLD=0.15   # optional; higher = more patient, lower = responds sooner
VOICE_INACTIVITY_TIMEOUT_MS=600000    # auto-shut down an idle call after this many ms (default 10 min; <=0 disables)
```
See Semantic turn detection and Inactivity shutdown for details.
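The resolution rules for the three optional variables can be sketched as a pure function over an environment map. The resolveVoiceLifecycleConfig function and its return shape are hypothetical. Some rules come from this document: the truthy spellings (true, 1, yes, on), the 10-minute default, and "non-positive disables". Others are assumptions: case-insensitive matching, and falling back to the default for malformed values.

```typescript
// Hypothetical resolver for the optional turn-detection and lifecycle variables.
interface VoiceLifecycleConfig {
  eouEnabled: boolean;
  eouUnlikelyThreshold: number | undefined; // undefined = keep the model's own default
  inactivityTimeoutMs: number | null; // null = watchdog disabled
}

const TRUTHY = new Set(["true", "1", "yes", "on"]); // matched case-insensitively (assumption)
const DEFAULT_INACTIVITY_MS = 10 * 60 * 1000; // 600000 ms = 10 minutes

function resolveVoiceLifecycleConfig(env: Record<string, string | undefined>): VoiceLifecycleConfig {
  const eouEnabled = TRUTHY.has((env["LIVEKIT_EOU_TURN_DETECTION"] ?? "").trim().toLowerCase());

  // Accept only a probability in [0, 1]; otherwise leave it unset (assumption).
  const rawThreshold = Number.parseFloat(env["LIVEKIT_EOU_UNLIKELY_THRESHOLD"] ?? "");
  const eouUnlikelyThreshold =
    Number.isFinite(rawThreshold) && rawThreshold >= 0 && rawThreshold <= 1 ? rawThreshold : undefined;

  // Unset or unparsable: default. Zero or negative: watchdog disabled.
  const rawTimeout = Number.parseInt(env["VOICE_INACTIVITY_TIMEOUT_MS"] ?? "", 10);
  const timeout = Number.isNaN(rawTimeout) ? DEFAULT_INACTIVITY_MS : rawTimeout;

  return {
    eouEnabled,
    eouUnlikelyThreshold,
    inactivityTimeoutMs: timeout > 0 ? timeout : null,
  };
}

console.log(resolveVoiceLifecycleConfig({ LIVEKIT_EOU_TURN_DETECTION: "yes", VOICE_INACTIVITY_TIMEOUT_MS: "0" }));
```

In a worker, this would be called as resolveVoiceLifecycleConfig(process.env).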
Tuning the Voice Loop (VAD, Turn Detection, Interruption, Language)
All tuning is passed to defineVoiceAgent. Every field is optional and falls back
to a noise-resistant default — you only set what you want to change.
Voice Activity Detection (VAD)
VAD decides when the user is speaking. Stricter values reject background noise so the agent does not treat ambient sound as a turn.
```typescript
import { defineVoiceAgent } from "@juspay/neurolink/livekit";
import { buildConfiguredNeuroLink } from "./neurolink-instance.js";

export default defineVoiceAgent({
  createNeuroLink: async () => buildConfiguredNeuroLink(),
  stt: { provider: "deepgram" },
  tts: { provider: "elevenlabs" },
  vad: {
    activationThreshold: 0.6, // 0–1; "is this speech" cutoff. Higher = stricter.
    minSpeechDuration: 0.2, // seconds of speech before a turn STARTS.
    minSilenceDuration: 0.6, // seconds of silence before a turn ENDS.
  },
});
```
| Field | Default | Raise it when… |
|---|---|---|
| activationThreshold | 0.6 | A noisy room triggers false turns (try 0.7–0.8). |
| minSpeechDuration | 0.2s | Short clicks/taps start spurious turns. |
| minSilenceDuration | 0.6s | The agent cuts users off during natural pauses. |
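The three knobs act together as a small state machine over per-frame speech probabilities. The sketch below is a simplified, hypothetical model of that interaction, not Silero's or LiveKit's implementation. The segmentSpeech function, the 100 ms frame size, and the probability values are assumptions chosen to keep the example readable.

```typescript
// Simplified, hypothetical model of the three VAD knobs. Per-frame speech
// probabilities would come from a VAD model such as Silero; here they are given.
interface VadOptions {
  activationThreshold: number; // 0–1 cutoff for "this frame is speech"
  minSpeechDuration: number; // seconds of speech before a turn starts
  minSilenceDuration: number; // seconds of silence before a turn ends
}

interface SpeechSegment {
  startFrame: number; // first speech frame (inclusive)
  endFrame: number; // first frame of the closing silence (exclusive)
}

function segmentSpeech(probs: readonly number[], frameSec: number, opts: VadOptions): SpeechSegment[] {
  const minSpeechFrames = Math.max(1, Math.round(opts.minSpeechDuration / frameSec));
  const minSilenceFrames = Math.max(1, Math.round(opts.minSilenceDuration / frameSec));
  const segments: SpeechSegment[] = [];
  let speaking = false;
  let segmentStart = 0;
  let speechRun = 0; // consecutive speech frames while idle
  let silenceRun = 0; // consecutive silent frames while in a turn
  let silenceStart = 0;

  for (const [i, p] of probs.entries()) {
    const isSpeech = p >= opts.activationThreshold;
    if (!speaking) {
      speechRun = isSpeech ? speechRun + 1 : 0; // short clicks reset here
      if (speechRun >= minSpeechFrames) {
        speaking = true;
        segmentStart = i - speechRun + 1; // back-date to the first speech frame
        silenceRun = 0;
      }
    } else if (isSpeech) {
      silenceRun = 0; // the pause was too short, so the same turn continues
    } else {
      if (silenceRun === 0) silenceStart = i;
      silenceRun += 1;
      if (silenceRun >= minSilenceFrames) {
        segments.push({ startFrame: segmentStart, endFrame: silenceStart });
        speaking = false;
        speechRun = 0;
      }
    }
  }

  // Input ended mid-turn: close the segment where speech last stopped.
  if (speaking) {
    segments.push({ startFrame: segmentStart, endFrame: silenceRun > 0 ? silenceStart : probs.length });
  }
  return segments;
}

// 100 ms frames: speech, a brief dip at frame 4, then a long silence.
const utterance = [0.1, 0.9, 0.9, 0.9, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1];
const vadOpts: VadOptions = { activationThreshold: 0.6, minSpeechDuration: 0.2, minSilenceDuration: 0.6 };
console.log(JSON.stringify(segmentSpeech(utterance, 0.1, vadOpts))); // prints: [{"startFrame":1,"endFrame":6}]
```

Under this model, lowering minSilenceDuration to 0.1 s ends the turn at the dip in frame 4 instead. That is the "cuts users off during natural pauses" symptom from the table.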
Semantic turn detection (end-of-utterance)
Why this exists. VAD only hears silence — it cannot tell the difference
between "I'm finished" and a mid-thought pause. With VAD alone, a user who says
"I'd like to book a flight to… London" gets cut off at the pause, the agent
answers half a sentence, and the rest arrives as a second fragmented turn. Raising
minSilenceDuration to compensate makes the agent feel sluggish on the turns that
are finished. Semantic turn detection breaks that trade-off.
What it does. A small ML model (@livekit/agents-plugin-livekit
turnDetector.EnglishModel) runs on top of VAD and scores how likely the user has
actually finished speaking, using the words transcribed so far. If the user paused
mid-thought, the agent keeps listening; if the utterance is grammatically and
semantically complete, it responds immediately. The result is one clean turn per
thought instead of one turn per pause.
How to enable it. It is opt-in via environment variable:
```bash
LIVEKIT_EOU_TURN_DETECTION=true       # enable the end-of-utterance model
LIVEKIT_EOU_UNLIKELY_THRESHOLD=0.15   # optional; override the "not done yet" cutoff
```
LIVEKIT_EOU_TURN_DETECTION accepts any truthy value (true, 1, yes, on).
LIVEKIT_EOU_UNLIKELY_THRESHOLD tunes sensitivity. A probability below the cutoff
means "the user is probably not done," so the agent waits longer. Raise the cutoff to
make the agent more patient: more pauses fall below it, so the agent waits through them.
Lower it to make the agent respond sooner.
Tuning the wait. The turn config bounds how endpointing behaves once the model
has an opinion:
```typescript
defineVoiceAgent({
  // …stt / tts / createNeuroLink…
  turn: {
    mode: "stt", // turn-detection mode
    minEndpointingDelay: 500, // ms grace period after the model thinks the turn is done
    maxEndpointingDelay: 6000, // hard ceiling: never wait longer than this
  },
});
```
- minEndpointingDelay is the grace period applied when the model decides the turn is complete. It is a small buffer so a quick continuation isn't clipped.
- maxEndpointingDelay is a safety ceiling. Even if the model keeps believing the user might continue, the agent never waits indefinitely; it responds once this ceiling is hit.
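The interaction between the cutoff and the two delays can be summarized in a few lines. This is a simplified, hypothetical model. It assumes a binary choice: below the cutoff, wait up to maxEndpointingDelay; otherwise wait minEndpointingDelay. The probabilities in the example are made up, not real model outputs, and the sketch ignores what happens if the user resumes speaking during the wait.

```typescript
// Hypothetical model of how an end-of-utterance probability picks a delay.
interface TurnTiming {
  unlikelyThreshold: number; // the LIVEKIT_EOU_UNLIKELY_THRESHOLD cutoff
  minEndpointingDelay: number; // ms
  maxEndpointingDelay: number; // ms
}

// How long (ms) to keep listening after the user goes quiet.
function endpointingDelay(eouProbability: number, t: TurnTiming): number {
  // "Probably not done": allow up to the ceiling for the user to continue.
  if (eouProbability < t.unlikelyThreshold) return t.maxEndpointingDelay;
  // "Probably done": only a short grace period before responding.
  return t.minEndpointingDelay;
}

const timing: TurnTiming = { unlikelyThreshold: 0.15, minEndpointingDelay: 500, maxEndpointingDelay: 6000 };
console.log(endpointingDelay(0.04, timing)); // prints: 6000 (e.g. "I'd like to book a flight to…")
console.log(endpointingDelay(0.92, timing)); // prints: 500 (e.g. "…to London, please.")
```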
Cost & limits. The English model adds negligible latency but a non-negligible amount of memory (roughly 200 MB per call process), so size your worker hosts accordingly. The model is English-only, and the multilingual runner is intentionally not registered. For non-English calls, leave EOU disabled and rely on VAD endpointing.
Interruption (barge-in)
Controls what counts as the user interrupting the agent while it is speaking. Requiring real words and a minimum duration stops background noise from cutting the agent off mid-sentence.
```typescript
defineVoiceAgent({
  // …stt / tts / createNeuroLink…
  interruption: {
    minWords: 2, // recognized words required to interrupt (default 2)
    minDuration: 600, // milliseconds of audio required to interrupt (default 600)
  },
});
```
Set minWords: 0 for instant barge-in on any sound — more responsive, but more
false interruptions in noisy environments.
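The two interruption settings can be read as a gate on the user's speech while the agent talks. This is a hypothetical model for intuition only. It assumes both conditions must hold and treats minWords: 0 as switching the word check off. The LiveKit Agents runtime makes the real decision from its own VAD and STT events, and its exact rule is not specified here.

```typescript
// Hypothetical model of the barge-in gate.
interface InterruptionOptions {
  minWords: number; // recognized words required (0 = no word requirement)
  minDuration: number; // ms of user speech required
}

interface UserSpeech {
  words: number; // words recognized so far while the agent is speaking
  durationMs: number; // ms of detected user speech so far
}

function shouldInterrupt(speech: UserSpeech, opts: InterruptionOptions): boolean {
  const longEnough = speech.durationMs >= opts.minDuration;
  const enoughWords = opts.minWords <= 0 || speech.words >= opts.minWords;
  return longEnough && enoughWords;
}

const defaults: InterruptionOptions = { minWords: 2, minDuration: 600 };
console.log(shouldInterrupt({ words: 0, durationMs: 900 }, defaults)); // prints: false (noise, no words)
console.log(shouldInterrupt({ words: 3, durationMs: 700 }, defaults)); // prints: true (e.g. "wait, stop that")
```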
Language & multilingual speech
The language field on stt is a soft hint: it biases recognition toward a
language without locking to it, so a user can switch languages mid-call and still
be transcribed correctly.
```typescript
defineVoiceAgent({
  // …
  stt: {
    provider: "soniox",
    language: "en", // soft hint only; multilingual auto-detect still applies
  },
});
```
- Omit language for full auto-detection.
- The hint only biases the first guess; it never forces the hinted language. (A strict lock causes the realtime stream to stall on other-language audio, so the integration intentionally keeps the hint soft.)
Speech provider selection
STT and TTS plugins are chosen per agent and configured by environment credentials.
```typescript
defineVoiceAgent({
  // …
  stt: { provider: "soniox", model: "stt-rt-preview", language: "en" },
  tts: { provider: "cartesia", voice: "<voice-id>", model: "sonic-2" },
});
```
- STT: soniox, deepgram. TTS: cartesia, elevenlabs.
- Only set voice/model if your account supports them; otherwise omit those fields to use the plugin's own defaults.
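A pre-flight check at worker start-up can catch missing speech credentials before the first call. The sketch is hypothetical, and the missingSpeechCredentials name is illustrative. DEEPGRAM_API_KEY, ELEVENLABS_API_KEY, and CARTESIA_API_KEY come from the Configuration section. SONIOX_API_KEY is an assumed name that should be checked against that plugin's documentation.

```typescript
// Hypothetical pre-flight check: map each speech provider to the env var that
// holds its API key, then report the ones that are missing or blank.
type SttProvider = "deepgram" | "soniox";
type TtsProvider = "elevenlabs" | "cartesia";

const CREDENTIAL_ENV: Record<SttProvider | TtsProvider, string> = {
  deepgram: "DEEPGRAM_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  cartesia: "CARTESIA_API_KEY",
  soniox: "SONIOX_API_KEY", // assumed name; not listed in the Configuration section
};

function missingSpeechCredentials(
  stt: SttProvider,
  tts: TtsProvider,
  env: Record<string, string | undefined>,
): string[] {
  const needed = new Set([CREDENTIAL_ENV[stt], CREDENTIAL_ENV[tts]]);
  return [...needed].filter((name) => !env[name]?.trim());
}

console.log(JSON.stringify(missingSpeechCredentials("deepgram", "cartesia", { DEEPGRAM_API_KEY: "dg-test" }))); // prints: ["CARTESIA_API_KEY"]
```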
Conversation Memory
The agent remembers earlier turns automatically when the NeuroLink instance you
build inside createNeuroLink has conversation memory enabled. History is the
agent's source of truth — LiveKit's own transcript context is not used for
generation.
```typescript
import { NeuroLink } from "@juspay/neurolink";
import { defineVoiceAgent } from "@juspay/neurolink/livekit";

export default defineVoiceAgent({
  createNeuroLink: async () =>
    new NeuroLink({ conversationMemory: { enabled: true } }),
  stt: { provider: "deepgram" },
  tts: { provider: "elevenlabs" },
});
```
How it behaves:
- Keyed per call. Each room/call is an isolated conversation; the id is derived from the room name. Override the prefix with conversationIdPrefix (default "voice").
- In-memory by default; Redis for persistence. Set REDIS_URL to use a shared store that survives worker restarts and is shared across worker replicas. This matters because each call runs in its own job process.
- Works across turns within the session. The user can say "my name is Alex" and later ask "what's my name?" and the agent recalls it.
Memory persists only when the instance is configured with conversationMemory.enabled. Without it, each turn is independent.
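The keying scheme and the limits of in-process memory can be illustrated with a minimal store. Everything here is hypothetical: conversationIdFor uses an assumed "prefix:room" format (the real derivation may differ), and InMemoryConversationStore is not NeuroLink's memory API. The sketch does show why in-process memory is per call. A Map lives in one job process and disappears with it, which is the problem a shared store such as Redis avoids.

```typescript
// Hypothetical turn record and id derivation (assumed "prefix:room" format).
interface Turn {
  role: "user" | "assistant";
  text: string;
}

function conversationIdFor(roomName: string, prefix = "voice"): string {
  return `${prefix}:${roomName}`;
}

// Minimal in-process store keyed by conversationId.
class InMemoryConversationStore {
  private readonly turns = new Map<string, Turn[]>();

  append(conversationId: string, turn: Turn): void {
    const history = this.turns.get(conversationId) ?? [];
    history.push(turn);
    this.turns.set(conversationId, history);
  }

  history(conversationId: string): readonly Turn[] {
    return this.turns.get(conversationId) ?? [];
  }
}

const store = new InMemoryConversationStore();
const callA = conversationIdFor("voice-m42-1111");
const callB = conversationIdFor("voice-m42-2222");
store.append(callA, { role: "user", text: "My name is Alex." });
store.append(callA, { role: "assistant", text: "Nice to meet you, Alex." });
console.log(store.history(callA).length, store.history(callB).length); // prints: 2 0
```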
Implementation Plan
The integration is built and validated in phases. Each phase is independently testable.
Phase 0 — Console-mode spike (no infrastructure)
Build a minimal agent (Silero VAD + Deepgram STT + ElevenLabs TTS + llmNode → neurolink.stream()) and run it in console mode using the host machine's mic/speakers. Validates the NeuroLink brain loop, conversationId history, and a tool call — with no LiveKit server and no browser. Requires only STT/TTS and LLM credentials.
Phase 1 — NeuroLink LiveKit module
Implement brain.ts, voiceAgent.ts, voiceAgentWorker.ts, tokens.ts, and config.ts; add the @juspay/neurolink/livekit export and optional/peer dependencies. The agent definition accepts a createNeuroLink factory, invoked inside each job process, so a host application's registered tools are available to every call. Wire abort-on-interrupt. Verify build, type-check, and lint.
Phase 2 — Host token endpoint + browser client
Add the HTTP token endpoint and a browser page using livekit-client. Verify token issuance and room connection.
Phase 3 — End-to-end (local or Cloud)
Run the worker against livekit-server --dev or a Cloud Build-tier project; complete a full loop in the browser including barge-in, a tool call, and multi-turn memory.
Phase 4 — Tool-call UX
Add abort-on-interrupt verification (barge-in cancels an in-flight tool), tool-latency feedback (acknowledgment phrase and/or data-channel status event), and turn-detection tuning.
Phase 5 — Production
Deploy the worker as its own scalable Node deployment (separate from the web tier). Choose Cloud or self-hosted LiveKit. Validate concurrency and worker-restart isolation.
Operational Behavior
Scaling
LiveKit Agents uses a Worker→Job model: a worker registers with the LiveKit server and is dispatched one Job per room, each Job running in its own process. Scale by adding worker replicas; a worker failure restarts affected Jobs on another worker without impacting others.
Inactivity shutdown
Why this matters. Every call runs in its own process, which holds real resources for the whole call: the STT/TTS connections, conversation memory, and — when semantic turn detection is on — the ~200 MB end-of-utterance model. If a caller walks away without hanging up, that process would otherwise linger indefinitely, holding RAM and (on LiveKit Cloud) continuing to bill per participant-minute. An inactivity watchdog reclaims those resources automatically.
What it does. A timer tracks how long the call has been idle. Any real activity resets it — the user speaking, the agent speaking, or a new conversation item being added. If no activity occurs within the threshold, the watchdog calls the job's graceful shutdown, which tears down the process cleanly (the same path used when a call ends normally).
How to configure it.
```bash
VOICE_INACTIVITY_TIMEOUT_MS=600000   # default 10 minutes; set <=0 to disable entirely
```
- Default is 10 minutes. Lower it to reclaim resources faster on short-lived calls; raise it for workflows with long expected silences.
- Set to 0 (or any non-positive value) to disable the watchdog; calls then end only on explicit hang-up or transport disconnect.
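The watchdog logic can be sketched with an injected clock so it runs deterministically. The InactivityWatchdog class is hypothetical. A production version would more likely re-arm a setTimeout on each activity than poll tick(), and onIdle stands in for the job's graceful shutdown.

```typescript
// Hypothetical inactivity watchdog driven by explicit timestamps (ms).
class InactivityWatchdog {
  private lastActivity: number;
  private fired = false;
  private readonly timeoutMs: number;
  private readonly onIdle: () => void;

  constructor(timeoutMs: number, onIdle: () => void, now: number) {
    this.timeoutMs = timeoutMs;
    this.onIdle = onIdle;
    this.lastActivity = now;
  }

  // Call on user speech, agent speech, or a new conversation item.
  touch(now: number): void {
    this.lastActivity = now;
  }

  // Returns true exactly once, when the idle threshold is first crossed.
  tick(now: number): boolean {
    if (this.timeoutMs <= 0 || this.fired) return false; // disabled, or already shut down
    if (now - this.lastActivity < this.timeoutMs) return false;
    this.fired = true;
    this.onIdle(); // e.g. the job's graceful shutdown
    return true;
  }
}

let shutdowns = 0;
const dog = new InactivityWatchdog(600_000, () => shutdowns++, 0);
dog.touch(300_000); // the user spoke at t = 5 min
console.log(dog.tick(800_000), dog.tick(900_000), dog.tick(1_000_000), shutdowns); // prints: false true false 1
```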
Cloud vs Self-Hosted Cost
- Cloud: per participant-minute (a call has two participants — the user and the agent). A free Build tier covers development.
- Self-hosted: no per-minute media fee; cost is the compute and bandwidth of running livekit-server and workers on your infrastructure.
Why the brain layer is transport-agnostic
brain.ts exposes a small surface — given a transcript, a conversationId, and an abort signal, it returns a NeuroLink stream. This keeps the NeuroLink integration reusable if an alternative transport is added later.
Error Handling & Troubleshooting
Worker not receiving Jobs
- Confirm the worker registered with the correct LIVEKIT_URL and agentName.
- For Cloud, confirm the worker process is running and its outbound connection is established (no inbound exposure is required).
- In LiveKit Agents, registering with an agentName switches that agent to explicit dispatch. It is not sent to new rooms automatically, so each room must request it (for example, through the join token's room configuration or LiveKit's agent dispatch API). Check that your token endpoint or backend requests the named agent.
No assistant audio
- Verify STT/TTS plugin credentials.
- Check that the TTS plugin is producing frames for the room.
Assistant talks over the user / does not stop on interruption
- Verify abort-on-interrupt is wired: LiveKit's cancellation must abort the in-flight neurolink.stream() (and any active tool).
Long silence during tool calls
- Expected while a tool runs inside stream(). Add an acknowledgment phrase and/or a data-channel status event.
Tools not available in voice
- Ensure the createNeuroLink factory returns an instance with tools registered, and that tools are not disabled.
Extensibility Roadmap
- Additional transport providers: back the same brain.ts with another WebRTC provider (for example, Daily.co). Note that some providers' server-side agent paths are not Node-native.
- Human-in-the-loop (HITL): voice-native confirmation, or route NeuroLink HITL approvals over a LiveKit data channel with matching abort handling.
- Tool-call UI events — emit structured tool start/result events to the client for live status display.
- Voice personalization — selectable voices, language presets, speaking-style controls.
- Pluggable STT/TTS through NeuroLink — use NeuroLink's own STT/TTS providers via custom nodes instead of LiveKit plugins.