RAG Document Processing Guide
Since: v8.44.0 | Status: Stable | Availability: SDK + CLI
Provider Defaults: When `--provider` (CLI) or `provider` (SDK) is not specified, NeuroLink defaults to Vertex AI with gemini-2.5-flash. Set the `NEUROLINK_PROVIDER` or `AI_PROVIDER` environment variable to change the default provider.
Overview
NeuroLink provides enterprise-grade RAG (Retrieval-Augmented Generation) capabilities for building production AI applications:
- 10 Chunking Strategies: Character, recursive, sentence, token, markdown, HTML, JSON, LaTeX, semantic, and semantic-markdown chunking for any content type
- Hybrid Search: Combine BM25 keyword search with vector embeddings using RRF or linear fusion
- Multi-Factor Reranking: LLM, cross-encoder, Cohere API, and simple position-based reranking options
- Factory + Registry Patterns: Extensible architecture with lazy loading, aliases, and full TypeScript support
- Resilience Built-In: Circuit breakers, retry handlers, and comprehensive error handling
Quick Start
Basic Document Processing
import { loadDocument, createChunker, createReranker } from "@juspay/neurolink";
// Load and chunk a document
const doc = await loadDocument("/path/to/document.md");
const chunker = await createChunker("markdown", {
maxSize: 1000,
overlap: 100,
});
const chunks = await chunker.chunk(doc.content);
// Each chunk includes metadata
console.log(chunks[0]);
// {
// id: "chunk-abc123",
// text: "## Introduction\n\nThis document covers...",
// metadata: {
// documentId: "doc-xyz",
// chunkIndex: 0,
// startOffset: 0,
// endOffset: 847
// }
// }
Full RAG Pipeline
import { RAGPipeline, createRAGPipeline } from "@juspay/neurolink";
const pipeline = new RAGPipeline({
embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
generationModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
});
// Ingest documents
await pipeline.ingest(["./docs/*.md", "./knowledge/**/*.txt"]);
// Query with automatic retrieval and generation
const response = await pipeline.query("What are the key features?");
console.log(response.answer);
console.log(response.sources); // Retrieved chunks with citations
Integration with generate() and stream()
The RAG system integrates with NeuroLink's generate() and stream() APIs through createVectorQueryTool, which lets AI models query your knowledge base automatically during generation.
Using RAG with generate()
import {
NeuroLink,
createVectorQueryTool,
InMemoryVectorStore,
} from "@juspay/neurolink";
// 1. Set up vector store with your data
const vectorStore = new InMemoryVectorStore();
await vectorStore.upsert("knowledge-base", [
{
id: "doc1",
vector: embedding1,
metadata: { text: "Your document content..." },
},
// ... more documents
]);
// 2. Create the RAG tool
const ragTool = createVectorQueryTool(
{
id: "knowledge-search",
description: "Search the knowledge base for relevant information",
indexName: "knowledge-base",
embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
topK: 5,
reranker: {
model: { provider: "vertex", modelName: "gemini-2.5-flash" },
topK: 3,
},
},
vectorStore,
);
// 3. Use with generate()
const neurolink = new NeuroLink();
const result = await neurolink.generate({
input: { text: "What are the key features of our product?" },
tools: { [ragTool.name]: ragTool },
provider: "vertex",
model: "gemini-2.5-flash",
});
console.log(result.content);
console.log(result.toolExecutions); // See RAG tool results
Using RAG with stream()
// Same setup as above, then:
const result = await neurolink.stream({
input: { text: "Explain our pricing model in detail" },
tools: { [ragTool.name]: ragTool },
provider: "vertex",
model: "gemini-2.5-flash",
});
for await (const chunk of result.stream) {
if ("content" in chunk) {
process.stdout.write(chunk.content);
}
}
Complete RAG Pipeline Example
This example demonstrates a full RAG pipeline from document loading to AI-powered retrieval:
import {
NeuroLink,
createVectorQueryTool,
InMemoryVectorStore,
} from "@juspay/neurolink";
import {
loadDocument,
createChunker,
createMetadataExtractor,
} from "@juspay/neurolink";
// Step 1: Load and chunk documents
const doc = await loadDocument("./docs/product-guide.md");
const chunker = await createChunker("markdown", {
maxSize: 1000,
overlap: 100,
preserveHeaders: true,
});
const chunks = await chunker.chunk(doc.content);
// Step 2: Extract metadata for better retrieval (optional)
const extractor = await createMetadataExtractor("llm", {
provider: "vertex",
modelName: "gemini-2.5-flash",
});
const enrichedChunks = await extractor.extract(chunks, {
summary: true,
keywords: true,
});
// Step 3: Generate embeddings using the NeuroLink provider
const neurolink = new NeuroLink();
// Helper function to generate embeddings
async function generateEmbeddings(texts: string[]): Promise<number[][]> {
const embeddings: number[][] = [];
for (const text of texts) {
const result = await neurolink.generate({
input: { text },
provider: "vertex",
model: "gemini-2.5-flash",
});
// Extract embedding from result (provider-specific); throw instead of
// silently storing an empty vector if the provider returns none
if (!result.embedding) {
  throw new Error("Provider did not return an embedding");
}
embeddings.push(result.embedding);
}
return embeddings;
}
const embeddings = await generateEmbeddings(enrichedChunks.map((c) => c.text));
// Step 4: Store in vector store
const vectorStore = new InMemoryVectorStore();
await vectorStore.upsert(
"product-docs",
enrichedChunks.map((chunk, i) => ({
id: chunk.id,
vector: embeddings[i],
metadata: {
text: chunk.text,
summary: chunk.metadata.summary,
keywords: chunk.metadata.keywords,
source: "product-guide.md",
},
})),
);
// Step 5: Create RAG tool
const ragTool = createVectorQueryTool(
{
id: "product-search",
description: "Search product documentation for answers to user questions",
indexName: "product-docs",
embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
topK: 5,
includeSources: true,
reranker: {
model: { provider: "vertex", modelName: "gemini-2.5-flash" },
topK: 3,
weights: { semantic: 0.6, vector: 0.3, position: 0.1 },
},
},
vectorStore,
);
// Step 6: Use with generate()
const response = await neurolink.generate({
input: { text: "How do I configure the billing settings?" },
tools: { [ragTool.name]: ragTool },
provider: "vertex",
model: "gemini-2.5-flash",
systemPrompt: `You are a helpful product assistant. Use the product-search tool
to find relevant information before answering questions. Always cite your sources.`,
});
console.log("Answer:", response.content);
console.log(
"Sources used:",
response.toolExecutions?.map((t) => t.result?.sources),
);
Configuration Options for createVectorQueryTool
| Option | Type | Default | Description |
|---|---|---|---|
id | string | vector-query-{uuid} | Unique identifier for the tool |
description | string | Default description | Description shown to AI for tool selection |
indexName | string | Required | Name of the index in the vector store |
embeddingModel | { provider: string, modelName: string } | Required | Embedding model configuration |
enableFilter | boolean | false | Enable metadata filtering in queries |
includeVectors | boolean | false | Include raw vectors in results |
includeSources | boolean | true | Include source documents in response |
topK | number | 10 | Number of results to retrieve |
reranker | RerankerConfig | undefined | Optional reranker configuration |
providerOptions | VectorProviderOptions | undefined | Provider-specific options (Pinecone, pgVector, Chroma) |
Reranker Configuration
| Option | Type | Default | Description |
|---|---|---|---|
model | { provider: string, modelName: string } | Required | Model for semantic reranking |
weights | { semantic?: number, vector?: number, position?: number } | { semantic: 0.5, vector: 0.3, position: 0.2 } | Score weights (must sum to 1.0) |
topK | number | Same as tool topK | Results to return after reranking |
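The weighted fusion these options describe can be pictured as a simple weighted sum. The sketch below is illustrative only, with a hypothetical `combineScores` helper, and assumes the component scores are already normalized to [0, 1]; it is not the library's internal implementation.

```typescript
// Weighted fusion of semantic, vector, and position scores.
// Weights must sum to 1.0, as required by the reranker configuration.
type ScoreWeights = { semantic: number; vector: number; position: number };

function combineScores(
  semantic: number, // LLM relevance score for the chunk
  vector: number,   // original vector-similarity score
  rank: number,     // 0-based position in the initial result list
  total: number,    // total number of results
  w: ScoreWeights = { semantic: 0.5, vector: 0.3, position: 0.2 },
): number {
  // Earlier positions get a higher position score, decaying linearly.
  const position = 1 - rank / Math.max(total - 1, 1);
  return w.semantic * semantic + w.vector * vector + w.position * position;
}

// A top-ranked result with perfect component scores combines to 1.
combineScores(1, 1, 0, 5);
```

The defaults match the documented `{ semantic: 0.5, vector: 0.3, position: 0.2 }` weights; the linear position decay is one plausible choice among several.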
Event Handling
Listen for tool events during RAG operations to monitor and debug:
const neurolink = new NeuroLink();
// Listen for tool execution events
neurolink.on("tool:start", (event) => {
console.log(`Tool started: ${event.toolName}`);
console.log(`Parameters:`, event.parameters);
});
neurolink.on("tool:end", (event) => {
console.log(`Tool completed: ${event.toolName}`);
console.log(`Success: ${event.success}`);
console.log(`Response time: ${event.responseTime}ms`);
if (event.result) {
console.log(`Results found: ${event.result.totalResults}`);
}
if (event.error) {
console.error(`Error:`, event.error.message);
}
});
// Listen for generation events
neurolink.on("generation:start", (event) => {
console.log(`Generation started with provider: ${event.provider}`);
});
neurolink.on("generation:end", (event) => {
console.log(`Generation completed in ${event.responseTime}ms`);
console.log(`Tools used: ${event.toolsUsed?.join(", ") || "none"}`);
});
// Execute RAG query with event monitoring
const result = await neurolink.generate({
input: { text: "What are the system requirements?" },
tools: { [ragTool.name]: ragTool },
provider: "vertex",
model: "gemini-2.5-flash",
});
Dynamic Vector Store Resolution
For multi-tenant applications, you can provide a resolver function instead of a static vector store:
const ragTool = createVectorQueryTool(
{
id: "tenant-search",
description: "Search tenant-specific knowledge base",
indexName: "documents",
embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
topK: 5,
},
(context) => {
// Return different vector stores based on request context
const tenantId = context.tenantId || "default";
return getVectorStoreForTenant(tenantId);
},
);
// The context is passed from generate options
const result = await neurolink.generate({
input: { text: "Search query" },
tools: { [ragTool.name]: ragTool },
context: { tenantId: "tenant-123", userId: "user-456" },
});
Metadata Filtering
Enable metadata filtering for more precise retrieval:
const ragTool = createVectorQueryTool(
{
id: "filtered-search",
description: "Search with metadata filters",
indexName: "knowledge-base",
embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
enableFilter: true, // Enable filter parameter
topK: 10,
},
vectorStore,
);
// The AI can now use filters in its queries
// Example filter syntax supported:
// { category: 'billing' } - Exact match
// { date: { $gte: '2024-01-01' } } - Comparison operators
// { tags: { $in: ['feature', 'guide'] } } - Array membership
// { $and: [{ type: 'doc' }, { status: 'published' }] } - Logical operators
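To make the filter semantics above concrete, here is a minimal evaluator for that Mongo-style syntax against chunk metadata. It is an illustrative sketch (the `matches` helper is hypothetical, and `$in` is shown for scalar metadata values); the library's actual matcher may differ.

```typescript
// Minimal evaluator for the filter syntax shown above: exact match,
// $gte/$lte/$gt/$lt comparisons, $in membership, and $and conjunction.
type Metadata = Record<string, unknown>;
type Filter = Record<string, unknown>;

function matches(meta: Metadata, filter: Filter): boolean {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$and") {
      // Logical conjunction over sub-filters
      return (cond as Filter[]).every((f) => matches(meta, f));
    }
    const value = meta[key];
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      // Operator object: { $gte: ... }, { $in: [...] }, etc.
      return Object.entries(cond as Record<string, unknown>).every(
        ([op, ref]) => {
          switch (op) {
            case "$gte": return (value as any) >= (ref as any);
            case "$lte": return (value as any) <= (ref as any);
            case "$gt":  return (value as any) > (ref as any);
            case "$lt":  return (value as any) < (ref as any);
            case "$in":  return (ref as unknown[]).includes(value);
            default:     return false; // unknown operator: no match
          }
        },
      );
    }
    return value === cond; // plain value: exact match
  });
}

matches(
  { category: "billing", date: "2024-03-01" },
  { category: "billing", date: { $gte: "2024-01-01" } },
); // matches both the exact field and the comparison
```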
Chunking Strategies
NeuroLink provides 10 chunking strategies optimized for different content types.
Available Strategies
| Strategy | Best For | Key Config |
|---|---|---|
character | Simple text, logs | maxSize, separator |
recursive | General documents (default) | maxSize, overlap, separators |
sentence | Natural language, Q&A | maxSize, minSentences |
token | LLM context optimization | maxSize (tokens), tokenizer |
markdown | Documentation, READMEs | preserveHeaders, codeBlockHandling |
html | Web content | preserveTags, removeTags |
json | API responses, config | preserveStructure, flattenDepth |
latex | Academic papers | sectionCommands, preserveMath |
semantic | Context-aware splitting | similarityThreshold, embedder |
semantic-markdown | Knowledge bases | semanticThreshold, embedder |
Strategy Configuration
import { createChunker, getAvailableStrategies } from "@juspay/neurolink";
// List all available strategies
const strategies = getAvailableStrategies();
// ['character', 'recursive', 'sentence', 'token', 'markdown', 'html', 'json', 'latex', 'semantic', 'semantic-markdown']
// Recursive chunker (recommended for general use)
const recursiveChunker = await createChunker("recursive", {
maxSize: 1000,
overlap: 200,
separators: ["\n\n", "\n", ". ", " ", ""],
keepSeparator: true,
});
// Markdown chunker (for documentation)
const markdownChunker = await createChunker("markdown", {
maxSize: 1000,
overlap: 100,
preserveHeaders: true,
codeBlockHandling: "preserve", // 'preserve' | 'split' | 'remove'
});
// Token chunker (for LLM optimization)
const tokenChunker = await createChunker("token", {
maxSize: 512, // Max tokens per chunk
overlap: 50, // Token overlap
tokenizer: "cl100k_base", // OpenAI tokenizer
});
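The maxSize/overlap semantics shared by these chunkers can be pictured as a sliding window. This is a simplified character-level illustration, not the library's recursive or token-aware algorithm:

```typescript
// Character-level sliding window: each chunk is at most maxSize characters,
// and consecutive chunks share `overlap` characters at the boundary.
function slidingWindowChunk(
  text: string,
  maxSize: number,
  overlap: number,
): string[] {
  if (overlap >= maxSize) {
    throw new Error("overlap must be smaller than maxSize");
  }
  const chunks: string[] = [];
  const step = maxSize - overlap; // window advances by maxSize - overlap
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxSize));
    if (start + maxSize >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

slidingWindowChunk("abcdefghij", 4, 2);
// -> ["abcd", "cdef", "efgh", "ghij"]
```

The same geometry explains the "use 10-20% overlap" guidance later in this guide: each boundary keeps `overlap` characters of shared context.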
Content-Type Recommendations
import { getRecommendedStrategy } from "@juspay/neurolink";
// Get strategy based on content type
getRecommendedStrategy("text/markdown"); // 'markdown'
getRecommendedStrategy("text/html"); // 'html'
getRecommendedStrategy("application/json"); // 'json'
getRecommendedStrategy("text/x-latex"); // 'latex'
getRecommendedStrategy("text/plain"); // 'recursive'
Hybrid Search
Hybrid search combines BM25 keyword matching with vector similarity for improved retrieval quality.
How It Works
- BM25 Search: Traditional keyword matching using term frequency and document length normalization
- Vector Search: Semantic similarity using embeddings
- Score Fusion: Combine rankings using RRF or linear combination
Fusion Methods
Reciprocal Rank Fusion (RRF)
RRF is robust to score scale differences and works well in most cases:
import { reciprocalRankFusion } from "@juspay/neurolink";
// Combine rankings from multiple sources
const fusedScores = reciprocalRankFusion(
[vectorRankings, bm25Rankings],
60, // k parameter (default: 60)
);
// RRF formula: score(d) = sum(1 / (k + rank(d)))
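The RRF formula can be verified in a few lines. This is an illustrative reimplementation, assuming each ranking is an ordered array of document ids (the exported `reciprocalRankFusion` may use a different input shape):

```typescript
// Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
// where rank is 1-based and documents absent from a ranking contribute 0.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, i) => {
      // i is 0-based, so the 1-based rank is i + 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + i + 1));
    });
  }
  return scores;
}

// "a" ranks 1st in both lists, so it scores 1/61 + 1/61 and wins the fusion.
rrf([["a", "b"], ["a", "c"]], 60);
```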
Linear Combination
Linear combination allows fine-tuning the balance between vector and keyword scores:
import { linearCombination } from "@juspay/neurolink";
const combinedScores = linearCombination(
vectorScores, // Map<string, number>
bm25Scores, // Map<string, number>
0.5, // alpha: weight for vector scores (0-1)
);
// Linear formula: score(d) = alpha * vectorScore(d) + (1 - alpha) * bm25Score(d)
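The linear formula, sketched the same way (illustrative; assumes both score maps are normalized to comparable ranges, with missing entries counting as 0):

```typescript
// Linear fusion: score(d) = alpha * vectorScore(d) + (1 - alpha) * bm25Score(d)
function linearFuse(
  vectorScores: Map<string, number>,
  bm25Scores: Map<string, number>,
  alpha = 0.5,
): Map<string, number> {
  const out = new Map<string, number>();
  // Union of ids so documents found by only one retriever still score
  const ids = new Set([...vectorScores.keys(), ...bm25Scores.keys()]);
  for (const id of ids) {
    out.set(
      id,
      alpha * (vectorScores.get(id) ?? 0) +
        (1 - alpha) * (bm25Scores.get(id) ?? 0),
    );
  }
  return out;
}

// With alpha = 0.5, a doc scoring 0.8 on vectors and 0.4 on BM25 fuses to ~0.6
linearFuse(new Map([["d1", 0.8]]), new Map([["d1", 0.4]]), 0.5);
```

Raising alpha toward 1 favors semantic similarity; lowering it favors exact keyword matches.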
Hybrid Search Pipeline
import {
createHybridSearch,
InMemoryBM25Index,
InMemoryVectorStore,
} from "@juspay/neurolink";
// Create indices
const bm25Index = new InMemoryBM25Index({ k1: 1.2, b: 0.75 });
const vectorStore = new InMemoryVectorStore();
// Add documents to both indices
const documents = [
{
id: "doc1",
text: "Machine learning fundamentals...",
metadata: { topic: "ml" },
},
{
id: "doc2",
text: "Deep learning architectures...",
metadata: { topic: "dl" },
},
];
await bm25Index.addDocuments(documents);
await vectorStore.addDocuments(documents);
// Create hybrid search
const hybridSearch = createHybridSearch({
bm25Index,
vectorStore,
fusionMethod: "rrf", // 'rrf' | 'linear'
alpha: 0.5, // Vector weight (for linear fusion)
k: 60, // RRF parameter
});
// Execute search
const results = await hybridSearch.search("neural network training", {
topK: 10,
filter: { topic: "ml" },
});
BM25 Configuration
type BM25Config = {
k1: number; // Term frequency saturation (default: 1.2)
b: number; // Document length normalization (default: 0.75)
lowercase: boolean; // Normalize to lowercase (default: true)
stemming: boolean; // Apply stemming (default: false)
stopwords: string[]; // Words to ignore (default: English stopwords)
};
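For intuition about k1 and b, the core BM25 per-term score looks like this. It is a textbook sketch of the Okapi BM25 formula, not the library's tokenizer-aware implementation:

```typescript
// BM25 score of one term in one document:
//   idf(t) * tf * (k1 + 1) / (tf + k1 * (1 - b + b * docLen / avgDocLen))
// k1 controls term-frequency saturation; b controls length normalization.
function bm25TermScore(
  tf: number,        // term frequency in the document
  docLen: number,    // document length in tokens
  avgDocLen: number, // average document length in the corpus
  df: number,        // number of documents containing the term
  numDocs: number,   // corpus size
  k1 = 1.2,
  b = 0.75,
): number {
  const idf = Math.log(1 + (numDocs - df + 0.5) / (df + 0.5));
  const norm = 1 - b + (b * docLen) / avgDocLen;
  return (idf * (tf * (k1 + 1))) / (tf + k1 * norm);
}

// With b = 0 the score ignores document length entirely; with larger k1,
// repeated occurrences of a term keep adding score for longer.
```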
Reranking
Reranking re-scores initial search results for improved relevance.
Available Reranker Types
| Type | Description | Requires Model | Best For |
|---|---|---|---|
simple | Position + vector score combination | No | Fast, cost-free baseline |
llm | LLM semantic relevance scoring | Yes | High-quality semantic |
cross-encoder | Cross-encoder model scoring | Yes | Accuracy-focused tasks |
cohere | Cohere Rerank API | API Key | Production-grade results |
batch | Batch LLM processing | Yes | Large result sets |
Reranker Configuration
import { createReranker, getAvailableRerankerTypes } from "@juspay/neurolink";
// List available types
const types = getAvailableRerankerTypes();
// ['simple', 'llm', 'cross-encoder', 'cohere', 'batch']
// Simple reranker (no model required)
const simpleReranker = await createReranker("simple", {
topK: 10,
positionWeight: 0.3,
scoreWeight: 0.7,
});
// LLM reranker (requires model)
const llmReranker = await createReranker("llm", {
topK: 5,
model: "gemini-2.5-flash",
temperature: 0.0,
batchSize: 5,
});
// Cohere reranker (requires API key)
const cohereReranker = await createReranker("cohere", {
topK: 10,
model: "rerank-v3.5",
maxChunksPerDoc: 10,
});
// Rerank results
const reranked = await simpleReranker.rerank(searchResults, query, { topK: 5 });
Batch Reranking for Large Sets
import { batchRerank } from "@juspay/neurolink";
// Process large result sets efficiently
const reranked = await batchRerank(searchResults, query, {
batchSize: 10,
parallelBatches: 3,
model: "gemini-2.5-flash",
topK: 20,
});
Metadata Extraction
Extract structured metadata from chunks using LLMs.
Extraction Types
| Type | Description | Output |
|---|---|---|
title | Document/section title | string |
summary | Brief content summary | string |
keywords | Relevant keywords | string[] |
questions | Q&A pairs for the content | {question, answer}[] |
custom | Custom schema extraction | Record<string, unknown> |
Usage
import {
createMetadataExtractor,
extractMetadata,
LLMMetadataExtractor,
} from "@juspay/neurolink";
// Using factory
const extractor = await createMetadataExtractor("llm", {
provider: "vertex",
modelName: "gemini-2.5-flash",
});
// Extract metadata from chunks
const results = await extractor.extract(chunks, {
title: true,
summary: true,
keywords: true,
questions: { maxQuestions: 3 },
});
// Results include extracted metadata per chunk
console.log(results[0]);
// {
// title: "Introduction to Machine Learning",
// summary: "This section covers the fundamentals...",
// keywords: ["machine learning", "supervised learning", "classification"],
// questions: [
// { question: "What is supervised learning?", answer: "..." }
// ]
// }
Configuration Reference
Chunker Configuration
| Option | Type | Default | Description |
|---|---|---|---|
maxSize | number | 1000 | Maximum chunk size (chars/tokens) |
overlap | number | 200 | Overlap between chunks |
minSize | number | 50 | Minimum chunk size |
documentId | string | auto-UUID | Document identifier for metadata |
metadata | Record<string, unknown> | {} | Additional metadata for all chunks |
Reranker Configuration
| Option | Type | Default | Description |
|---|---|---|---|
topK | number | 10 | Number of top results to return |
minScore | number | 0.0 | Minimum score threshold |
includeOriginalScores | boolean | false | Include original scores |
Hybrid Search Configuration
| Option | Type | Default | Description |
|---|---|---|---|
fusionMethod | 'rrf' \| 'linear' | 'rrf' | Score fusion method |
alpha | number | 0.5 | Vector weight (linear only) |
k | number | 60 | RRF k parameter |
topK | number | 10 | Results to return |
Environment Variables
| Variable | Description | Required |
|---|---|---|
GOOGLE_APPLICATION_CREDENTIALS | For Vertex AI (service account JSON path) | Yes |
OPENAI_API_KEY | For OpenAI provider | Optional |
COHERE_API_KEY | For Cohere reranker | Optional |
ANTHROPIC_API_KEY | For Claude-based reranking | Optional |
Advanced Usage
Integration with Observability
Track RAG operations with Langfuse for debugging and optimization:
import { setLangfuseContext } from "@juspay/neurolink";
import { RAGPipeline } from "@juspay/neurolink";
const pipeline = new RAGPipeline(config);
await setLangfuseContext(
{
userId: "user-123",
sessionId: "session-456",
operationName: "rag-query",
metadata: {
pipeline: "customer-support",
chunkingStrategy: "markdown",
},
},
async () => {
const response = await pipeline.query("How do I reset my password?");
return response;
},
);
Integration with Guardrails
Validate RAG inputs and outputs with guardrails:
import {
createGuardrail,
validateInput,
validateOutput,
} from "@juspay/neurolink";
import { RAGPipeline } from "@juspay/neurolink";
// Create guardrails for RAG
const inputGuardrail = createGuardrail({
type: "input",
rules: [
{ type: "maxLength", value: 1000 },
{ type: "noPersonalInfo", enabled: true },
],
});
const outputGuardrail = createGuardrail({
type: "output",
rules: [
{ type: "factualOnly", enabled: true },
{ type: "noPII", enabled: true },
],
});
// Apply guardrails to RAG pipeline
const validatedQuery = await validateInput(inputGuardrail, query);
const response = await pipeline.query(validatedQuery);
const validatedResponse = await validateOutput(
outputGuardrail,
response.answer,
);
Custom Chunker Registration
Extend the chunker registry with custom implementations:
import { ChunkerRegistry } from "@juspay/neurolink";
import type { Chunker, ChunkerConfig } from "@juspay/neurolink";
// Define custom chunker
class CustomChunker implements Chunker {
constructor(private config?: ChunkerConfig) {}
async chunk(text: string, options?: ChunkerConfig) {
// Custom chunking logic
const maxSize = options?.maxSize ?? this.config?.maxSize ?? 500;
// ... implementation
}
}
// Register with the registry
ChunkerRegistry.register("custom", CustomChunker, {
name: "Custom Chunker",
description: "My custom chunking strategy",
aliases: ["my-chunker"],
defaultConfig: { maxSize: 500 },
});
// Now use it
const chunker = await createChunker("custom", { maxSize: 800 });
Graph RAG
Use knowledge graphs for relationship-aware retrieval:
import { GraphRAG } from "@juspay/neurolink";
// Create graph with similarity threshold for edge creation
const graphRag = new GraphRAG({
dimension: 1536, // Embedding dimension
threshold: 0.7, // Similarity threshold for creating edges
});
// Build graph from chunks and their embeddings
const chunks = [
{ text: "Machine learning basics", metadata: { topic: "ml" } },
{ text: "Neural networks", metadata: { topic: "dl" } },
];
const embeddings = [
{ vector: [0.1, 0.2 /* ... */] },
{ vector: [0.15, 0.25 /* ... */] },
];
graphRag.createGraph(chunks, embeddings);
// Or add nodes incrementally
const nodeId = graphRag.addNode(
{ text: "Deep learning", metadata: { topic: "dl" } },
{ vector: [0.12, 0.22 /* ... */] },
);
// Query with embedding vector using random walk with restart
const results = graphRag.query({
query: queryEmbedding, // Query embedding vector
topK: 10,
randomWalkSteps: 100,
restartProb: 0.15,
});
// Get graph statistics
const stats = graphRag.getStats();
// { nodeCount: 3, edgeCount: 4, avgDegree: 1.33, threshold: 0.7 }
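The threshold-based edge creation can be pictured as follows. This is an illustrative sketch of how GraphRAG-style graphs connect similar chunks; the helper names are hypothetical and the library may build edges differently:

```typescript
// Cosine similarity between two equal-length vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Connect every pair of nodes whose similarity meets the threshold.
function buildEdges(
  vectors: number[][],
  threshold: number,
): Array<[number, number]> {
  const edges: Array<[number, number]> = [];
  for (let i = 0; i < vectors.length; i++) {
    for (let j = i + 1; j < vectors.length; j++) {
      if (cosine(vectors[i], vectors[j]) >= threshold) edges.push([i, j]);
    }
  }
  return edges;
}

// Parallel vectors get an edge; orthogonal ones do not.
buildEdges([[1, 0], [2, 0], [0, 1]], 0.7); // -> [[0, 1]]
```

The random-walk-with-restart query then spreads relevance along these edges, so chunks related to a good match can surface even when their own similarity to the query is modest.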
Resilience Patterns
Use circuit breakers and retry handlers for production reliability:
import { RAGCircuitBreaker, RAGRetryHandler } from "@juspay/neurolink";
// Circuit breaker for external API calls
const breaker = new RAGCircuitBreaker("reranker-api", {
failureThreshold: 5,
resetTimeout: 60000,
halfOpenMaxCalls: 3,
operationTimeout: 30000,
});
// Wrap reranker calls
const result = await breaker.execute(async () => {
return await cohereReranker.rerank(results, query);
}, "rerank");
// Listen to circuit breaker events
breaker.on("stateChange", ({ oldState, newState, reason }) => {
console.log(`Circuit breaker: ${oldState} -> ${newState} (${reason})`);
});
// Retry handler with exponential backoff
const retryHandler = new RAGRetryHandler({
maxRetries: 3,
initialDelay: 1000,
maxDelay: 30000,
backoffMultiplier: 2,
jitter: true,
});
const chunks = await retryHandler.executeWithRetry(async () => {
return await chunker.chunk(largeDocument);
});
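The delay schedule behind such a retry handler is typically computed like this. The sketch below is illustrative (the `backoffDelay` helper is hypothetical), matching the options shown above:

```typescript
// Exponential backoff: delay = min(maxDelay, initialDelay * multiplier^attempt),
// optionally scaled by a random jitter factor to avoid thundering herds.
type BackoffOptions = {
  initialDelay: number;
  maxDelay: number;
  backoffMultiplier: number;
  jitter: boolean;
};

function backoffDelay(
  attempt: number, // 0-based retry attempt
  opts: BackoffOptions = {
    initialDelay: 1000,
    maxDelay: 30000,
    backoffMultiplier: 2,
    jitter: true,
  },
): number {
  const base = Math.min(
    opts.maxDelay,
    opts.initialDelay * Math.pow(opts.backoffMultiplier, attempt),
  );
  // Jitter keeps concurrent clients from retrying in lockstep.
  return opts.jitter ? base * (0.5 + Math.random() * 0.5) : base;
}

// Without jitter the schedule is 1000, 2000, 4000, ... capped at 30000 ms.
```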
CLI Usage
NeuroLink CLI provides commands for RAG operations.
Document Processing
# Chunk a document
neurolink rag chunk ./document.md --strategy markdown --max-size 1000 --overlap 100
# Chunk with output to file
neurolink rag chunk ./document.md -s recursive --format json --output chunks.json
# Process multiple documents (use shell loop)
for file in ./docs/*.md; do neurolink rag chunk "$file" --strategy markdown --format json; done
Index Management
# Build an index from a document
neurolink rag index ./docs/guide.md --indexName my-docs --provider vertex --model gemini-2.5-flash
# Query an existing index
neurolink rag query "What are the main features?" --indexName my-docs --topK 5 --provider vertex --model gemini-2.5-flash
# Index with Graph RAG enabled
neurolink rag index ./docs/guide.md --indexName my-docs --graph --provider vertex --model gemini-2.5-flash
Simplified RAG API (rag: { files })
Since: v9.2.0 | Recommended for most use cases
Instead of manually creating chunkers, vector stores, and tools, pass rag: { files } directly to generate() or stream(). NeuroLink handles the entire pipeline automatically.
SDK Usage
import { NeuroLink } from "@juspay/neurolink";
const neurolink = new NeuroLink();
// Generate with RAG - just pass files
const result = await neurolink.generate({
input: { text: "What are the key features described in the docs?" },
rag: {
files: ["./docs/guide.md", "./docs/api.md"],
strategy: "markdown", // Optional: auto-detected from file extension
chunkSize: 512, // Optional: default 1000
chunkOverlap: 50, // Optional: default 200
topK: 5, // Optional: default 5
},
});
// Stream with RAG - identical API
const streamResult = await neurolink.stream({
input: { text: "Summarize the architecture" },
rag: { files: ["./docs/architecture.md"] },
});
for await (const chunk of streamResult.stream) {
if ("content" in chunk) {
process.stdout.write(chunk.content);
}
}
CLI Usage
# Basic RAG with generate
neurolink generate "What is this about?" --rag-files ./docs/guide.md
# RAG with custom chunking strategy
neurolink generate "Explain the API" --rag-files ./docs/guide.md --rag-strategy markdown --rag-chunk-size 512
# RAG with streaming and multiple files
neurolink stream "Summarize everything" --rag-files ./docs/a.md ./docs/b.md --rag-top-k 10
CLI Flags Reference
| Flag | Type | Default | Description |
|---|---|---|---|
--rag-files | string[] | - | File paths to load for RAG context |
--rag-strategy | string | auto-detected | Chunking strategy (character, recursive, sentence, token, markdown, html, json, latex, semantic, semantic-markdown) |
--rag-chunk-size | number | 1000 | Maximum chunk size in characters |
--rag-chunk-overlap | number | 200 | Overlap between adjacent chunks |
--rag-top-k | number | 5 | Number of top results to retrieve |
RAGConfig Type
type RAGConfig = {
files: string[]; // Required: file paths to load
strategy?: ChunkingStrategy; // Default: auto-detected from file extension
chunkSize?: number; // Default: 1000
chunkOverlap?: number; // Default: 200
topK?: number; // Default: 5
toolName?: string; // Default: "search_knowledge_base"
toolDescription?: string; // Custom tool description
embeddingProvider?: string; // Defaults to generation provider
embeddingModel?: string; // Defaults to provider's default
};
How It Works
- Files are loaded from disk and the chunking strategy is auto-detected from the file extension (`.md` -> markdown, `.html` -> html, `.json` -> json, etc.)
- Content is chunked using the selected strategy with configurable size and overlap
- Chunks are embedded using a simple character-frequency hash (128 dimensions) and stored in an in-memory vector store
- A `search_knowledge_base` tool is created and injected into the AI model's available tools
- A system prompt instructs the AI to use the search tool before answering
- The AI autonomously decides when to search the knowledge base during generation/streaming
Auto-Detected Strategies by Extension
| Extension | Strategy |
|---|---|
.md, .mdx | markdown |
.html, .htm | html |
.json | json |
.tex, .latex | latex |
.txt, .csv, .xml, .yaml, .yml | recursive |
.ts, .js, .py, .java, .go, .rs, .c, .cpp, .rb, .php, .swift, .kt | recursive |
Best Practices
Chunking
- Match chunk size to model context - Use token chunker when optimizing for specific LLM context windows
- Choose strategy by content type - Markdown for docs, HTML for web content, JSON for structured data
- Use 10-20% overlap - Prevents context loss at chunk boundaries
- Preserve structure when possible - Format-aware chunkers maintain semantic coherence
- Test with your data - Optimal settings vary by domain and use case
Reranking
- Start with simple reranker - Fast, free, and often sufficient for basic use cases
- Use LLM reranking for quality - When accuracy matters more than latency
- Batch large result sets - Use batch reranker for 50+ results
- Consider cost - API-based rerankers (Cohere) have per-call costs
- Cache reranking results - Results for the same query/docs can be reused
Hybrid Search
- Start with RRF - Robust to score scale differences, less tuning needed
- Tune alpha for linear fusion - Start at 0.5, adjust based on evaluation
- Keep indices in sync - Update both BM25 and vector indices together
- Filter early - Apply metadata filters before fusion when possible
- Monitor retrieval quality - Track precision/recall metrics in production
Troubleshooting
| Problem | Solution |
|---|---|
| Empty chunks returned | Check if maxSize is too small for your content; try increasing to 500+ |
| Duplicate content in chunks | Reduce overlap parameter or use a structure-aware chunker |
| Missing context at boundaries | Increase overlap to 15-20% of maxSize |
| Slow reranking performance | Switch to simple reranker or reduce topK before reranking |
| Poor search quality | Tune BM25 parameters (k1, b) or adjust fusion alpha weight |
| Out of memory with large docs | Process documents in batches; use streaming where available |
| Reranker API timeouts | Use CircuitBreaker wrapper; reduce batch size |
| Inconsistent chunk metadata | Ensure documentId is set consistently across processing runs |
Debug Logging
# Enable verbose logging for RAG operations
DEBUG=neurolink:rag:* npx tsx your-script.ts
# Log specific components
DEBUG=neurolink:rag:chunker npx tsx your-script.ts
DEBUG=neurolink:rag:reranker npx tsx your-script.ts
DEBUG=neurolink:rag:hybrid npx tsx your-script.ts
API Reference
Core Exports
Document Processing:
- `loadDocument(path)` - Load a single document
- `loadDocuments(paths)` - Load multiple documents
- `MDocument` - Fluent document processing class
- `processDocument(text, options)` - Process text through chunking and metadata extraction
Chunking:
- `createChunker(strategy, config)` - Create a chunker instance
- `ChunkerFactory` - Factory for chunker creation
- `ChunkerRegistry` - Registry with all chunker implementations
- `getAvailableStrategies()` - List available chunking strategies
- `getRecommendedStrategy(contentType)` - Get recommended strategy for content type
Reranking:
- `createReranker(type, config)` - Create a reranker instance
- `RerankerFactory` - Factory for reranker creation
- `RerankerRegistry` - Registry with all reranker implementations
- `getAvailableRerankerTypes()` - List available reranker types
- `rerank(results, query, model)` - Direct reranking function
- `batchRerank(results, query, options)` - Batch reranking
Retrieval:
- `createHybridSearch(config)` - Create hybrid search instance
- `InMemoryBM25Index` - In-memory BM25 index
- `InMemoryVectorStore` - In-memory vector store
- `reciprocalRankFusion(rankings, k)` - RRF score fusion
- `linearCombination(vectorScores, bm25Scores, alpha)` - Linear score fusion
- `createVectorQueryTool(config, vectorStore)` - Create vector query tool
Metadata:
- `createMetadataExtractor(type, config)` - Create metadata extractor
- `LLMMetadataExtractor` - LLM-powered extractor class
- `extractMetadata(chunks, params)` - Extract metadata from chunks
Pipeline:
- `RAGPipeline` - Full RAG pipeline class
- `createRAGPipeline(config)` - Create pipeline instance
- `assembleContext(chunks, options)` - Assemble context from chunks
- `formatContextWithCitations(chunks, format)` - Format with citations
Resilience:
- `RAGCircuitBreaker` - Circuit breaker pattern for RAG operations
- `RAGRetryHandler` - Retry with exponential backoff and jitter
Types:
- `Chunk`, `ChunkMetadata`, `ChunkerConfig`
- `Reranker`, `RerankerConfig`, `RerankerType`
- `HybridSearchOptions`, `BM25Config`
- `RAGPipelineConfig`, `RAGResponse`
- `MetadataExtractor`, `MetadataExtractorConfig`
See Also
- RAG Configuration Guide - Detailed configuration reference
- RAG Testing Guide - Testing RAG pipelines
- Observability Guide - Tracing and monitoring
- Guardrails Guide - Input/output validation
- Vector Store Integrations - Production vector stores