# Pipe Architecture

Every `generate()` and `stream()` call travels the same six-stage pipe. Understanding the pipe is understanding NeuroLink.
## The Six Stages
### 1. Context Building

Before any tokens are generated, the pipe assembles the full context:

- RAG retrieval — If `rag: { files: [...] }` is set, documents are chunked, embedded, and a `search_knowledge_base` tool is registered
- Memory lookup — Conversation history is fetched from Redis or an in-memory store
- File processing — Attached files (images, PDFs, code, CSV) are processed by the `ProcessorRegistry` into provider-appropriate formats
- System prompt injection — Custom system prompts are merged with NeuroLink defaults
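The context-building steps above can be sketched as a sequential assembly. All names and types here are illustrative, not the actual NeuroLink internals:

```typescript
// Hypothetical sketch of stage 1: each step appends to a message list.
// (RAG retrieval registers a search tool rather than appending text,
// so it is omitted here.)
type Message = { role: "system" | "user" | "tool"; content: string };

function buildContext(opts: {
  systemPrompt?: string;
  history: Message[];
  userPrompt: string;
  fileSummaries: string[]; // output of file processing
}): Message[] {
  const messages: Message[] = [];
  // System prompt injection: custom prompt merged with a default
  messages.push({
    role: "system",
    content: [opts.systemPrompt, "You are a helpful assistant."]
      .filter(Boolean)
      .join("\n"),
  });
  // Memory lookup: prior conversation turns come first
  messages.push(...opts.history);
  // File processing: processed attachments injected as context
  for (const summary of opts.fileSummaries) {
    messages.push({ role: "tool", content: summary });
  }
  messages.push({ role: "user", content: opts.userPrompt });
  return messages;
}
```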
### 2. Budget Check

`BudgetChecker` validates that the assembled context fits within the model's context window before every LLM call.

- Threshold: triggers at 80% of the context window
- If over budget: runs `ContextCompactor` (4-stage pipeline):
  - Tool output pruning — replaces old tool results with placeholders
  - File read deduplication — keeps only the latest read of each file
  - LLM summarization — structured 9-section summary of the oldest messages
  - Sliding window truncation — non-destructive tagging of the oldest messages

Context windows are tracked per-provider, per-model in `src/lib/constants/contextWindows.ts`.
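A minimal sketch of the 80% threshold check. The real `BudgetChecker` is more involved; the token estimator here is a crude stand-in:

```typescript
// Illustrative: trigger compaction when estimated tokens exceed
// 80% of the model's context window.
const COMPACTION_THRESHOLD = 0.8;

function estimateTokens(text: string): number {
  // Crude stand-in heuristic: roughly 4 characters per token
  return Math.ceil(text.length / 4);
}

function needsCompaction(messages: string[], contextWindow: number): boolean {
  const used = messages.reduce((sum, m) => sum + estimateTokens(m), 0);
  return used > contextWindow * COMPACTION_THRESHOLD;
}
```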
### 3. Provider Dispatch

`ProviderRegistry` resolves the provider name to a concrete implementation:

```typescript
// All providers loaded via dynamic import to prevent circular dependencies
ProviderFactory.registerProvider(
  AIProviderName.ANTHROPIC,
  async (modelName?) => {
    const { AnthropicProvider } = await import("../providers/anthropic.js");
    return new AnthropicProvider(modelName);
  },
  AnthropicModels.CLAUDE_SONNET_4_5,
  ["claude", "sonnet", "haiku"],
);
```
Switching providers requires one line — the rest of the pipe is unchanged.
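The alias list in the registration call above (`["claude", "sonnet", "haiku"]`) is what makes that one-line switch forgiving. A toy sketch of how such alias resolution could work (the real `ProviderRegistry` differs; this shows the lookup idea only):

```typescript
// Illustrative alias-aware registry: aliases map to a canonical
// provider name, which maps to an async factory.
type Factory = (modelName?: string) => Promise<unknown>;

class MiniRegistry {
  private factories = new Map<string, Factory>();
  private aliases = new Map<string, string>();

  register(name: string, factory: Factory, aliases: string[] = []): void {
    this.factories.set(name, factory);
    for (const a of aliases) this.aliases.set(a, name);
  }

  resolve(nameOrAlias: string): Factory {
    // Fall back to the name itself if it is not an alias
    const canonical = this.aliases.get(nameOrAlias) ?? nameOrAlias;
    const factory = this.factories.get(canonical);
    if (!factory) throw new Error(`Unknown provider: ${nameOrAlias}`);
    return factory;
  }
}
```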
### 4. Stream Emission

Tokens arrive as an async iterable. `generate()` is `stream()` collected — there is only `stream()`.

```typescript
// This is what generate() does internally
let text = "";
for await (const token of this.stream(options)) {
  text += token;
}
return { text };
```

The stream handles multiple event types: text deltas, tool calls, thinking blocks, and usage statistics.
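Consuming such a multi-event stream is typically done with a discriminated union. The event shape below is an assumption for illustration, not the actual NeuroLink types:

```typescript
// Assumed event shape for illustration only.
type StreamEvent =
  | { type: "text-delta"; text: string }
  | { type: "tool-call"; name: string; args: unknown }
  | { type: "usage"; inputTokens: number; outputTokens: number };

// Collect only text deltas, as generate() does; other event
// types would be handled by separate listeners.
async function collectText(events: AsyncIterable<StreamEvent>): Promise<string> {
  let text = "";
  for await (const event of events) {
    if (event.type === "text-delta") text += event.text;
  }
  return text;
}

// Tiny fake stream to exercise the collector
async function* demo(): AsyncGenerator<StreamEvent> {
  yield { type: "text-delta", text: "Hello, " };
  yield { type: "usage", inputTokens: 3, outputTokens: 2 };
  yield { type: "text-delta", text: "world" };
}
```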
### 5. Tool Interception

When the model emits a tool call, the stream pauses:

1. Tool call extracted from the stream
2. `MCPToolRegistry` dispatches to the correct tool (built-in or external MCP server)
3. Tool result injected back into the conversation
4. Model resumes generating from the tool result
5. Stream continues
MCP transports supported: stdio (local), http (remote), sse, websocket.
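The pause-dispatch-resume cycle is essentially a loop around the model. A sketch, with hypothetical model and dispatcher signatures:

```typescript
// Illustrative agent loop: generate, dispatch tool calls, resume.
type Turn =
  | { kind: "text"; content: string }
  | { kind: "tool-call"; tool: string; args: string };

async function runLoop(
  model: (history: string[]) => Promise<Turn>,
  dispatch: (tool: string, args: string) => Promise<string>,
): Promise<string> {
  const history: string[] = [];
  for (;;) {
    const turn = await model(history);
    // Final text turn: the stream finishes
    if (turn.kind === "text") return turn.content;
    // Stream pauses: run the tool, inject the result, resume generation
    const result = await dispatch(turn.tool, turn.args);
    history.push(`tool:${turn.tool} -> ${result}`);
  }
}
```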
### 6. Observability
Every stage emits OpenTelemetry spans. The full trace covers:
- Context build duration
- Token counts (input + output)
- Tool execution times
- Memory read/write latency
- Provider-specific attributes (model, temperature, finish reason)
Exporters: Langfuse, OTLP, Jaeger, Zipkin, Prometheus, Datadog, NewRelic, Honeycomb, Console.
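A dependency-free sketch of the per-stage timing these spans capture; the real instrumentation uses the OpenTelemetry SDK, and the helper below is purely illustrative:

```typescript
// Minimal span-like timing helper for illustration; not OpenTelemetry.
interface Span {
  name: string;
  durationMs: number;
  attributes: Record<string, string | number>;
}

// Run fn, recording a span even if it throws (try/finally).
async function withSpan<T>(
  name: string,
  attributes: Record<string, string | number>,
  spans: Span[],
  fn: () => Promise<T>,
): Promise<T> {
  const start = Date.now();
  try {
    return await fn();
  } finally {
    spans.push({ name, durationMs: Date.now() - start, attributes });
  }
}
```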
## Key Files

| File | Purpose |
| --- | --- |
| `src/lib/neurolink.ts` | Main SDK class — orchestrates all six stages |
| `src/lib/factories/providerRegistry.ts` | Provider registration with dynamic imports |
| `src/lib/context/budgetChecker.ts` | Pre-generation context budget validation |
| `src/lib/context/contextCompactor.ts` | Multi-stage compaction orchestrator |
| `src/lib/mcp/toolRegistry.ts` | Tool management and MCP integration |
| `src/lib/rag/ragIntegration.ts` | RAG auto-pipeline setup |
| `src/lib/telemetry/` | OpenTelemetry instrumentation |
## Design Invariants

These never change regardless of provider, model, or connector:

- Dynamic imports only — No static imports of providers in the registry (prevents circular deps)
- Stream is the primitive — `generate()` is always `stream()` collected; nothing bypasses `stream()`
- Budget checked before every call — No LLM call without a budget check
- Tools are always external — MCP protocol for all tool integrations, including built-ins
- Memory is scoped — Each conversation has isolated memory; no cross-contamination