
RAG Document Processing Guide

Since: v8.44.0 | Status: Stable | Availability: SDK + CLI

Provider Defaults: When --provider (CLI) or provider (SDK) is not specified, NeuroLink defaults to Vertex AI with gemini-2.5-flash. Set the NEUROLINK_PROVIDER or AI_PROVIDER environment variable to change the default provider.
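For example, to route default traffic to a different provider for the current shell session (the provider name here is illustrative; any configured provider works):

# Make OpenAI the default provider for subsequent commands
export NEUROLINK_PROVIDER=openai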

Overview

NeuroLink provides enterprise-grade RAG (Retrieval-Augmented Generation) capabilities for building production AI applications:

  • 10 Chunking Strategies: Character, recursive, sentence, token, markdown, HTML, JSON, LaTeX, semantic, and semantic-markdown chunking for any content type
  • Hybrid Search: Combine BM25 keyword search with vector embeddings using RRF or linear fusion
  • Multi-Factor Reranking: LLM, cross-encoder, Cohere API, and simple position-based reranking options
  • Factory + Registry Patterns: Extensible architecture with lazy loading, aliases, and full TypeScript support
  • Resilience Built-In: Circuit breakers, retry handlers, and comprehensive error handling

Quick Start

Basic Document Processing

import { loadDocument, createChunker } from "@juspay/neurolink";

// Load and chunk a document
const doc = await loadDocument("/path/to/document.md");
const chunker = await createChunker("markdown", {
  maxSize: 1000,
  overlap: 100,
});
const chunks = await chunker.chunk(doc.content);

// Each chunk includes metadata
console.log(chunks[0]);
// {
//   id: "chunk-abc123",
//   text: "## Introduction\n\nThis document covers...",
//   metadata: {
//     documentId: "doc-xyz",
//     chunkIndex: 0,
//     startOffset: 0,
//     endOffset: 847
//   }
// }

Full RAG Pipeline

import { RAGPipeline } from "@juspay/neurolink";

const pipeline = new RAGPipeline({
  embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
  generationModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
});

// Ingest documents
await pipeline.ingest(["./docs/*.md", "./knowledge/**/*.txt"]);

// Query with automatic retrieval and generation
const response = await pipeline.query("What are the key features?");
console.log(response.answer);
console.log(response.sources); // Retrieved chunks with citations

Integration with generate() and stream()

The RAG system integrates seamlessly with NeuroLink's generate() and stream() APIs through the createVectorQueryTool. This allows AI models to automatically query your knowledge base during generation.

Using RAG with generate()

import {
  NeuroLink,
  createVectorQueryTool,
  InMemoryVectorStore,
} from "@juspay/neurolink";

// 1. Set up vector store with your data
const vectorStore = new InMemoryVectorStore();
await vectorStore.upsert("knowledge-base", [
  {
    id: "doc1",
    vector: embedding1,
    metadata: { text: "Your document content..." },
  },
  // ... more documents
]);

// 2. Create the RAG tool
const ragTool = createVectorQueryTool(
  {
    id: "knowledge-search",
    description: "Search the knowledge base for relevant information",
    indexName: "knowledge-base",
    embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
    topK: 5,
    reranker: {
      model: { provider: "vertex", modelName: "gemini-2.5-flash" },
      topK: 3,
    },
  },
  vectorStore,
);

// 3. Use with generate()
const neurolink = new NeuroLink();
const result = await neurolink.generate({
  input: { text: "What are the key features of our product?" },
  tools: { [ragTool.name]: ragTool },
  provider: "vertex",
  model: "gemini-2.5-flash",
});

console.log(result.content);
console.log(result.toolExecutions); // See RAG tool results

Using RAG with stream()

// Same setup as above, then:
const result = await neurolink.stream({
  input: { text: "Explain our pricing model in detail" },
  tools: { [ragTool.name]: ragTool },
  provider: "vertex",
  model: "gemini-2.5-flash",
});

for await (const chunk of result.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

Complete RAG Pipeline Example

This example demonstrates a full RAG pipeline from document loading to AI-powered retrieval:

import {
  NeuroLink,
  createVectorQueryTool,
  InMemoryVectorStore,
  loadDocument,
  createChunker,
  createMetadataExtractor,
} from "@juspay/neurolink";

// Step 1: Load and chunk documents
const doc = await loadDocument("./docs/product-guide.md");
const chunker = await createChunker("markdown", {
  maxSize: 1000,
  overlap: 100,
  preserveHeaders: true,
});
const chunks = await chunker.chunk(doc.content);

// Step 2: Extract metadata for better retrieval (optional)
const extractor = await createMetadataExtractor("llm", {
  provider: "vertex",
  modelName: "gemini-2.5-flash",
});
const enrichedChunks = await extractor.extract(chunks, {
  summary: true,
  keywords: true,
});

// Step 3: Generate embeddings using the NeuroLink provider
const neurolink = new NeuroLink();

// Helper function to generate embeddings
async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (const text of texts) {
    const result = await neurolink.generate({
      input: { text },
      provider: "vertex",
      model: "gemini-2.5-flash",
    });
    // Extract embedding from result (provider-specific)
    embeddings.push(result.embedding || []);
  }
  return embeddings;
}

const embeddings = await generateEmbeddings(enrichedChunks.map((c) => c.text));

// Step 4: Store in vector store
const vectorStore = new InMemoryVectorStore();
await vectorStore.upsert(
  "product-docs",
  enrichedChunks.map((chunk, i) => ({
    id: chunk.id,
    vector: embeddings[i],
    metadata: {
      text: chunk.text,
      summary: chunk.metadata.summary,
      keywords: chunk.metadata.keywords,
      source: "product-guide.md",
    },
  })),
);

// Step 5: Create RAG tool
const ragTool = createVectorQueryTool(
  {
    id: "product-search",
    description: "Search product documentation for answers to user questions",
    indexName: "product-docs",
    embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
    topK: 5,
    includeSources: true,
    reranker: {
      model: { provider: "vertex", modelName: "gemini-2.5-flash" },
      topK: 3,
      weights: { semantic: 0.6, vector: 0.3, position: 0.1 },
    },
  },
  vectorStore,
);

// Step 6: Use with generate()
const response = await neurolink.generate({
  input: { text: "How do I configure the billing settings?" },
  tools: { [ragTool.name]: ragTool },
  provider: "vertex",
  model: "gemini-2.5-flash",
  systemPrompt: `You are a helpful product assistant. Use the product-search tool
to find relevant information before answering questions. Always cite your sources.`,
});

console.log("Answer:", response.content);
console.log(
  "Sources used:",
  response.toolExecutions?.map((t) => t.result?.sources),
);

Configuration Options for createVectorQueryTool

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| id | string | vector-query-{uuid} | Unique identifier for the tool |
| description | string | Default description | Description shown to the AI for tool selection |
| indexName | string | Required | Name of the index in the vector store |
| embeddingModel | { provider: string, modelName: string } | Required | Embedding model configuration |
| enableFilter | boolean | false | Enable metadata filtering in queries |
| includeVectors | boolean | false | Include raw vectors in results |
| includeSources | boolean | true | Include source documents in response |
| topK | number | 10 | Number of results to retrieve |
| reranker | RerankerConfig | undefined | Optional reranker configuration |
| providerOptions | VectorProviderOptions | undefined | Provider-specific options (Pinecone, pgVector, Chroma) |

Reranker Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | { provider: string, modelName: string } | Required | Model for semantic reranking |
| weights | { semantic?: number, vector?: number, position?: number } | { semantic: 0.5, vector: 0.3, position: 0.2 } | Score weights (must sum to 1.0) |
| topK | number | Same as tool topK | Results to return after reranking |

Event Handling

Listen for tool events during RAG operations to monitor and debug:

const neurolink = new NeuroLink();

// Listen for tool execution events
neurolink.on("tool:start", (event) => {
console.log(`Tool started: ${event.toolName}`);
console.log(`Parameters:`, event.parameters);
});

neurolink.on("tool:end", (event) => {
console.log(`Tool completed: ${event.toolName}`);
console.log(`Success: ${event.success}`);
console.log(`Response time: ${event.responseTime}ms`);
if (event.result) {
console.log(`Results found: ${event.result.totalResults}`);
}
if (event.error) {
console.error(`Error:`, event.error.message);
}
});

// Listen for generation events
neurolink.on("generation:start", (event) => {
console.log(`Generation started with provider: ${event.provider}`);
});

neurolink.on("generation:end", (event) => {
console.log(`Generation completed in ${event.responseTime}ms`);
console.log(`Tools used: ${event.toolsUsed?.join(", ") || "none"}`);
});

// Execute RAG query with event monitoring
const result = await neurolink.generate({
input: { text: "What are the system requirements?" },
tools: { [ragTool.name]: ragTool },
provider: "vertex",
model: "gemini-2.5-flash",
});

Dynamic Vector Store Resolution

For multi-tenant applications, you can provide a resolver function instead of a static vector store:

const ragTool = createVectorQueryTool(
  {
    id: "tenant-search",
    description: "Search tenant-specific knowledge base",
    indexName: "documents",
    embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
    topK: 5,
  },
  (context) => {
    // Return different vector stores based on request context
    const tenantId = context.tenantId || "default";
    return getVectorStoreForTenant(tenantId);
  },
);

// The context is passed from generate options
const result = await neurolink.generate({
  input: { text: "Search query" },
  tools: { [ragTool.name]: ragTool },
  context: { tenantId: "tenant-123", userId: "user-456" },
});

Metadata Filtering

Enable metadata filtering for more precise retrieval:

const ragTool = createVectorQueryTool(
  {
    id: "filtered-search",
    description: "Search with metadata filters",
    indexName: "knowledge-base",
    embeddingModel: { provider: "vertex", modelName: "gemini-2.5-flash" },
    enableFilter: true, // Enable filter parameter
    topK: 10,
  },
  vectorStore,
);

// The AI can now use filters in its queries
// Example filter syntax supported:
// { category: 'billing' } - Exact match
// { date: { $gte: '2024-01-01' } } - Comparison operators
// { tags: { $in: ['feature', 'guide'] } } - Array membership
// { $and: [{ type: 'doc' }, { status: 'published' }] } - Logical operators

Chunking Strategies

NeuroLink provides 10 chunking strategies optimized for different content types.

Available Strategies

| Strategy | Best For | Key Config |
| --- | --- | --- |
| character | Simple text, logs | maxSize, separator |
| recursive | General documents (default) | maxSize, overlap, separators |
| sentence | Natural language, Q&A | maxSize, minSentences |
| token | LLM context optimization | maxSize (tokens), tokenizer |
| markdown | Documentation, READMEs | preserveHeaders, codeBlockHandling |
| html | Web content | preserveTags, removeTags |
| json | API responses, config | preserveStructure, flattenDepth |
| latex | Academic papers | sectionCommands, preserveMath |
| semantic | Context-aware splitting | similarityThreshold, embedder |
| semantic-markdown | Knowledge bases | semanticThreshold, embedder |

Strategy Configuration

import { createChunker, getAvailableStrategies } from "@juspay/neurolink";

// List all available strategies
const strategies = getAvailableStrategies();
// ['character', 'recursive', 'sentence', 'token', 'markdown', 'html', 'json', 'latex', 'semantic', 'semantic-markdown']

// Recursive chunker (recommended for general use)
const recursiveChunker = await createChunker("recursive", {
  maxSize: 1000,
  overlap: 200,
  separators: ["\n\n", "\n", ". ", " ", ""],
  keepSeparator: true,
});

// Markdown chunker (for documentation)
const markdownChunker = await createChunker("markdown", {
  maxSize: 1000,
  overlap: 100,
  preserveHeaders: true,
  codeBlockHandling: "preserve", // 'preserve' | 'split' | 'remove'
});

// Token chunker (for LLM optimization)
const tokenChunker = await createChunker("token", {
  maxSize: 512, // Max tokens per chunk
  overlap: 50, // Token overlap
  tokenizer: "cl100k_base", // OpenAI tokenizer
});

Content-Type Recommendations

import { getRecommendedStrategy } from "@juspay/neurolink";

// Get strategy based on content type
getRecommendedStrategy("text/markdown"); // 'markdown'
getRecommendedStrategy("text/html"); // 'html'
getRecommendedStrategy("application/json"); // 'json'
getRecommendedStrategy("text/x-latex"); // 'latex'
getRecommendedStrategy("text/plain"); // 'recursive'

Hybrid Search

Hybrid search combines BM25 keyword matching with vector similarity for improved retrieval quality.

How It Works

  1. BM25 Search: Traditional keyword matching using term frequency and document length normalization
  2. Vector Search: Semantic similarity using embeddings
  3. Score Fusion: Combine rankings using RRF or linear combination

Fusion Methods

Reciprocal Rank Fusion (RRF)

RRF is robust to score scale differences and works well in most cases:

import { reciprocalRankFusion } from "@juspay/neurolink";

// Combine rankings from multiple sources
const fusedScores = reciprocalRankFusion(
  [vectorRankings, bm25Rankings],
  60, // k parameter (default: 60)
);

// RRF formula: score(d) = sum(1 / (k + rank(d)))
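For intuition, here is a standalone sketch of what the fusion computes, independent of the library's implementation: each document's fused score is the sum over ranking lists of 1 / (k + rank).

// Minimal from-scratch sketch of Reciprocal Rank Fusion, for intuition only.
// Each input is an ordered array of document IDs (best first).
function rrfSketch(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}

// Example: a document ranked 1st by vectors and 3rd by BM25 scores
// 1/(60+1) + 1/(60+3) ≈ 0.0164 + 0.0159 ≈ 0.0323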

Linear Combination

Linear combination allows fine-tuning the balance between vector and keyword scores:

import { linearCombination } from "@juspay/neurolink";

const combinedScores = linearCombination(
  vectorScores, // Map<string, number>
  bm25Scores, // Map<string, number>
  0.5, // alpha: weight for vector scores (0-1)
);

// Linear formula: score(d) = alpha * vectorScore(d) + (1 - alpha) * bm25Score(d)
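For example, with alpha = 0.5 (and both score sets normalized to [0, 1]), a document with vector score 0.9 and BM25 score 0.2 receives 0.5 * 0.9 + 0.5 * 0.2 = 0.55, while a document scoring 0.4 and 0.8 receives 0.5 * 0.4 + 0.5 * 0.8 = 0.60; raising alpha above 0.5 flips that ordering in favor of the semantically closer document.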

Hybrid Search Pipeline

import {
  createHybridSearch,
  InMemoryBM25Index,
  InMemoryVectorStore,
} from "@juspay/neurolink";

// Create indices
const bm25Index = new InMemoryBM25Index({ k1: 1.2, b: 0.75 });
const vectorStore = new InMemoryVectorStore();

// Add documents to both indices
const documents = [
  {
    id: "doc1",
    text: "Machine learning fundamentals...",
    metadata: { topic: "ml" },
  },
  {
    id: "doc2",
    text: "Deep learning architectures...",
    metadata: { topic: "dl" },
  },
];

await bm25Index.addDocuments(documents);
await vectorStore.addDocuments(documents);

// Create hybrid search
const hybridSearch = createHybridSearch({
  bm25Index,
  vectorStore,
  fusionMethod: "rrf", // 'rrf' | 'linear'
  alpha: 0.5, // Vector weight (for linear fusion)
  k: 60, // RRF parameter
});

// Execute search
const results = await hybridSearch.search("neural network training", {
  topK: 10,
  filter: { topic: "ml" },
});

BM25 Configuration

type BM25Config = {
  k1: number; // Term frequency saturation (default: 1.2)
  b: number; // Document length normalization (default: 0.75)
  lowercase: boolean; // Normalize to lowercase (default: true)
  stemming: boolean; // Apply stemming (default: false)
  stopwords: string[]; // Words to ignore (default: English stopwords)
};
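As a rough guide, k1 controls how quickly repeated terms saturate and b controls how strongly long documents are penalized. A hedged example of overriding the defaults, using the field names from the BM25Config type above:

// Illustrative configuration: penalize long documents less (lower b)
// and let repeated terms contribute slightly more (higher k1)
const tunedIndex = new InMemoryBM25Index({
  k1: 1.5,
  b: 0.5,
  lowercase: true,
  stemming: false,
  stopwords: [], // keep all terms; pass a custom list to filter noise words
});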

Reranking

Reranking re-scores initial search results for improved relevance.

Available Reranker Types

| Type | Description | Requires Model | Best For |
| --- | --- | --- | --- |
| simple | Position + vector score combination | No | Fast, cost-free baseline |
| llm | LLM semantic relevance scoring | Yes | High-quality semantic relevance |
| cross-encoder | Cross-encoder model scoring | Yes | Accuracy-focused tasks |
| cohere | Cohere Rerank API | API key | Production-grade results |
| batch | Batch LLM processing | Yes | Large result sets |

Reranker Configuration

import { createReranker, getAvailableRerankerTypes } from "@juspay/neurolink";

// List available types
const types = getAvailableRerankerTypes();
// ['simple', 'llm', 'cross-encoder', 'cohere', 'batch']

// Simple reranker (no model required)
const simpleReranker = await createReranker("simple", {
  topK: 10,
  positionWeight: 0.3,
  scoreWeight: 0.7,
});

// LLM reranker (requires model)
const llmReranker = await createReranker("llm", {
  topK: 5,
  model: "gemini-2.5-flash",
  temperature: 0.0,
  batchSize: 5,
});

// Cohere reranker (requires API key)
const cohereReranker = await createReranker("cohere", {
  topK: 10,
  model: "rerank-v3.5",
  maxChunksPerDoc: 10,
});

// Rerank results
const reranked = await simpleReranker.rerank(searchResults, query, { topK: 5 });

Batch Reranking for Large Sets

import { batchRerank } from "@juspay/neurolink";

// Process large result sets efficiently
const reranked = await batchRerank(searchResults, query, {
  batchSize: 10,
  parallelBatches: 3,
  model: "gemini-2.5-flash",
  topK: 20,
});

Metadata Extraction

Extract structured metadata from chunks using LLMs.

Extraction Types

| Type | Description | Output |
| --- | --- | --- |
| title | Document/section title | string |
| summary | Brief content summary | string |
| keywords | Relevant keywords | string[] |
| questions | Q&A pairs for the content | {question, answer}[] |
| custom | Custom schema extraction | Record<string, unknown> |

Usage

import {
  createMetadataExtractor,
  extractMetadata,
  LLMMetadataExtractor,
} from "@juspay/neurolink";

// Using factory
const extractor = await createMetadataExtractor("llm", {
  provider: "vertex",
  modelName: "gemini-2.5-flash",
});

// Extract metadata from chunks
const results = await extractor.extract(chunks, {
  title: true,
  summary: true,
  keywords: true,
  questions: { maxQuestions: 3 },
});

// Results include extracted metadata per chunk
console.log(results[0]);
// {
//   title: "Introduction to Machine Learning",
//   summary: "This section covers the fundamentals...",
//   keywords: ["machine learning", "supervised learning", "classification"],
//   questions: [
//     { question: "What is supervised learning?", answer: "..." }
//   ]
// }

Configuration Reference

Chunker Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| maxSize | number | 1000 | Maximum chunk size (chars/tokens) |
| overlap | number | 200 | Overlap between chunks |
| minSize | number | 50 | Minimum chunk size |
| documentId | string | auto-UUID | Document identifier for metadata |
| metadata | Record<string, unknown> | {} | Additional metadata for all chunks |
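These base options apply to every strategy. For instance, setting a stable documentId plus shared metadata keeps chunk provenance consistent across runs; a sketch using only option names from the table above:

// Base options alongside strategy-specific ones (values are illustrative)
const chunker = await createChunker("recursive", {
  maxSize: 800,
  overlap: 120,
  minSize: 50,
  documentId: "product-guide-v2", // stable ID so re-runs produce matching metadata
  metadata: { source: "product-guide.md", version: "2.0" },
});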

Reranker Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| topK | number | 10 | Number of top results to return |
| minScore | number | 0.0 | Minimum score threshold |
| includeOriginalScores | boolean | false | Include original scores |

Hybrid Search Configuration

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| fusionMethod | 'rrf' \| 'linear' | 'rrf' | Score fusion method |
| alpha | number | 0.5 | Vector weight (linear only) |
| k | number | 60 | RRF k parameter |
| topK | number | 10 | Results to return |

Environment Variables

| Variable | Description | Required |
| --- | --- | --- |
| GOOGLE_APPLICATION_CREDENTIALS | For Vertex AI (service account JSON path) | Yes |
| OPENAI_API_KEY | For OpenAI provider | Optional |
| COHERE_API_KEY | For Cohere reranker | Optional |
| ANTHROPIC_API_KEY | For Claude-based reranking | Optional |
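For example, a typical Vertex AI setup exports the credentials path before running any RAG commands (the path shown is a placeholder):

# Point the Vertex AI provider at a service account key
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json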

Advanced Usage

Integration with Observability

Track RAG operations with Langfuse for debugging and optimization:

import { setLangfuseContext, RAGPipeline } from "@juspay/neurolink";

const pipeline = new RAGPipeline(config);

await setLangfuseContext(
  {
    userId: "user-123",
    sessionId: "session-456",
    operationName: "rag-query",
    metadata: {
      pipeline: "customer-support",
      chunkingStrategy: "markdown",
    },
  },
  async () => {
    const response = await pipeline.query("How do I reset my password?");
    return response;
  },
);

Integration with Guardrails

Validate RAG inputs and outputs with guardrails:

import {
  createGuardrail,
  validateInput,
  validateOutput,
  RAGPipeline,
} from "@juspay/neurolink";

// Create guardrails for RAG
const inputGuardrail = createGuardrail({
  type: "input",
  rules: [
    { type: "maxLength", value: 1000 },
    { type: "noPersonalInfo", enabled: true },
  ],
});

const outputGuardrail = createGuardrail({
  type: "output",
  rules: [
    { type: "factualOnly", enabled: true },
    { type: "noPII", enabled: true },
  ],
});

// Apply guardrails to RAG pipeline
const validatedQuery = await validateInput(inputGuardrail, query);
const response = await pipeline.query(validatedQuery);
const validatedResponse = await validateOutput(
  outputGuardrail,
  response.answer,
);

Custom Chunker Registration

Extend the chunker registry with custom implementations:

import { ChunkerRegistry, createChunker } from "@juspay/neurolink";
import type { Chunker, ChunkerConfig } from "@juspay/neurolink";

// Define custom chunker
class CustomChunker implements Chunker {
  constructor(private config?: ChunkerConfig) {}

  async chunk(text: string, options?: ChunkerConfig) {
    const maxSize = options?.maxSize ?? this.config?.maxSize ?? 500;
    // Illustrative only: a naive fixed-size split standing in for real logic
    const chunks = [];
    for (let start = 0; start < text.length; start += maxSize) {
      chunks.push({
        id: `custom-${start / maxSize}`,
        text: text.slice(start, start + maxSize),
        metadata: {
          chunkIndex: start / maxSize,
          startOffset: start,
          endOffset: Math.min(start + maxSize, text.length),
        },
      });
    }
    return chunks;
  }
}

// Register with the registry
ChunkerRegistry.register("custom", CustomChunker, {
  name: "Custom Chunker",
  description: "My custom chunking strategy",
  aliases: ["my-chunker"],
  defaultConfig: { maxSize: 500 },
});

// Now use it
const chunker = await createChunker("custom", { maxSize: 800 });

Graph RAG

Use knowledge graphs for relationship-aware retrieval:

import { GraphRAG } from "@juspay/neurolink";

// Create graph with similarity threshold for edge creation
const graphRag = new GraphRAG({
  dimension: 1536, // Embedding dimension
  threshold: 0.7, // Similarity threshold for creating edges
});

// Build graph from chunks and their embeddings
const chunks = [
  { text: "Machine learning basics", metadata: { topic: "ml" } },
  { text: "Neural networks", metadata: { topic: "dl" } },
];
const embeddings = [
  { vector: [0.1, 0.2 /* ... */] },
  { vector: [0.15, 0.25 /* ... */] },
];

graphRag.createGraph(chunks, embeddings);

// Or add nodes incrementally
const nodeId = graphRag.addNode(
  { text: "Deep learning", metadata: { topic: "dl" } },
  { vector: [0.12, 0.22 /* ... */] },
);

// Query with embedding vector using random walk with restart
const results = graphRag.query({
  query: queryEmbedding, // Query embedding vector
  topK: 10,
  randomWalkSteps: 100,
  restartProb: 0.15,
});

// Get graph statistics
const stats = graphRag.getStats();
// { nodeCount: 3, edgeCount: 4, avgDegree: 1.33, threshold: 0.7 }

Resilience Patterns

Use circuit breakers and retry handlers for production reliability:

import { RAGCircuitBreaker, RAGRetryHandler } from "@juspay/neurolink";

// Circuit breaker for external API calls
const breaker = new RAGCircuitBreaker("reranker-api", {
  failureThreshold: 5,
  resetTimeout: 60000,
  halfOpenMaxCalls: 3,
  operationTimeout: 30000,
});

// Wrap reranker calls
const result = await breaker.execute(async () => {
  return await cohereReranker.rerank(results, query);
}, "rerank");

// Listen to circuit breaker events
breaker.on("stateChange", ({ oldState, newState, reason }) => {
  console.log(`Circuit breaker: ${oldState} -> ${newState} (${reason})`);
});

// Retry handler with exponential backoff
const retryHandler = new RAGRetryHandler({
  maxRetries: 3,
  initialDelay: 1000,
  maxDelay: 30000,
  backoffMultiplier: 2,
  jitter: true,
});

const chunks = await retryHandler.executeWithRetry(async () => {
  return await chunker.chunk(largeDocument);
});

CLI Usage

NeuroLink CLI provides commands for RAG operations.

Document Processing

# Chunk a document
neurolink rag chunk ./document.md --strategy markdown --max-size 1000 --overlap 100

# Chunk with output to file
neurolink rag chunk ./document.md -s recursive --format json --output chunks.json

# Process multiple documents (use shell loop)
for file in ./docs/*.md; do neurolink rag chunk "$file" --strategy markdown --format json; done

Index Management

# Build an index from a document
neurolink rag index ./docs/guide.md --indexName my-docs --provider vertex --model gemini-2.5-flash

# Query an existing index
neurolink rag query "What are the main features?" --indexName my-docs --topK 5 --provider vertex --model gemini-2.5-flash

# Index with Graph RAG enabled
neurolink rag index ./docs/guide.md --indexName my-docs --graph --provider vertex --model gemini-2.5-flash

Simplified RAG API (rag: { files })

Since: v9.2.0 | Recommended for most use cases

Instead of manually creating chunkers, vector stores, and tools, pass rag: { files } directly to generate() or stream(). NeuroLink handles the entire pipeline automatically.

SDK Usage

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Generate with RAG - just pass files
const result = await neurolink.generate({
  input: { text: "What are the key features described in the docs?" },
  rag: {
    files: ["./docs/guide.md", "./docs/api.md"],
    strategy: "markdown", // Optional: auto-detected from file extension
    chunkSize: 512, // Optional: default 1000
    chunkOverlap: 50, // Optional: default 200
    topK: 5, // Optional: default 5
  },
});

// Stream with RAG - identical API
const streamResult = await neurolink.stream({
  input: { text: "Summarize the architecture" },
  rag: { files: ["./docs/architecture.md"] },
});

for await (const chunk of streamResult.stream) {
  if ("content" in chunk) {
    process.stdout.write(chunk.content);
  }
}

CLI Usage

# Basic RAG with generate
neurolink generate "What is this about?" --rag-files ./docs/guide.md

# RAG with custom chunking strategy
neurolink generate "Explain the API" --rag-files ./docs/guide.md --rag-strategy markdown --rag-chunk-size 512

# RAG with streaming and multiple files
neurolink stream "Summarize everything" --rag-files ./docs/a.md ./docs/b.md --rag-top-k 10

CLI Flags Reference

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --rag-files | string[] | - | File paths to load for RAG context |
| --rag-strategy | string | auto-detected | Chunking strategy (character, recursive, sentence, token, markdown, html, json, latex, semantic, semantic-markdown) |
| --rag-chunk-size | number | 1000 | Maximum chunk size in characters |
| --rag-chunk-overlap | number | 200 | Overlap between adjacent chunks |
| --rag-top-k | number | 5 | Number of top results to retrieve |

RAGConfig Type

type RAGConfig = {
  files: string[]; // Required: file paths to load
  strategy?: ChunkingStrategy; // Default: auto-detected from file extension
  chunkSize?: number; // Default: 1000
  chunkOverlap?: number; // Default: 200
  topK?: number; // Default: 5
  toolName?: string; // Default: "search_knowledge_base"
  toolDescription?: string; // Custom tool description
  embeddingProvider?: string; // Defaults to generation provider
  embeddingModel?: string; // Defaults to provider's default
};
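The optional fields let you rename the injected tool or tune retrieval; a sketch using fields from the type above (values are illustrative):

// Override the default tool name and description
const result = await neurolink.generate({
  input: { text: "Which endpoints support pagination?" },
  rag: {
    files: ["./docs/api.md"],
    toolName: "search_api_docs", // replaces the default "search_knowledge_base"
    toolDescription: "Search the API reference for endpoint details",
    topK: 8,
  },
});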

How It Works

  1. Files are loaded from disk and auto-detected for chunking strategy (.md -> markdown, .html -> html, .json -> json, etc.)
  2. Content is chunked using the selected strategy with configurable size and overlap
  3. Chunks are embedded using a simple character-frequency hash (128 dimensions) and stored in an in-memory vector store (see the sketch after this list)
  4. A search_knowledge_base tool is created and injected into the AI model's available tools
  5. A system prompt instructs the AI to use the search tool before answering
  6. The AI autonomously decides when to search the knowledge base during generation/streaming
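To make step 3 concrete, here is a minimal sketch of what a character-frequency hash embedding can look like; this illustrates the idea, not NeuroLink's exact implementation:

// Sketch: hash characters into 128 buckets, count, then L2-normalize.
// Cheap and deterministic, but far less precise than a learned embedding model.
function charFrequencyEmbedding(text: string, dimensions = 128): number[] {
  const vector = new Array<number>(dimensions).fill(0);
  for (const char of text.toLowerCase()) {
    vector[char.charCodeAt(0) % dimensions] += 1;
  }
  const norm = Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0)) || 1;
  return vector.map((v) => v / norm);
}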

Auto-Detected Strategies by Extension

| Extension | Strategy |
| --- | --- |
| .md, .mdx | markdown |
| .html, .htm | html |
| .json | json |
| .tex, .latex | latex |
| .txt, .csv, .xml, .yaml, .yml | recursive |
| .ts, .js, .py, .java, .go, .rs, .c, .cpp, .rb, .php, .swift, .kt | recursive |

Best Practices

Chunking

  1. Match chunk size to model context - Use token chunker when optimizing for specific LLM context windows
  2. Choose strategy by content type - Markdown for docs, HTML for web content, JSON for structured data
  3. Use 10-20% overlap - Prevents context loss at chunk boundaries (see the example after this list)
  4. Preserve structure when possible - Format-aware chunkers maintain semantic coherence
  5. Test with your data - Optimal settings vary by domain and use case
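For example, with maxSize: 1000 characters, a 10-20% overlap means overlap: 100-200:

// 15% overlap on 1000-character chunks
const chunker = await createChunker("recursive", { maxSize: 1000, overlap: 150 });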

Reranking

  1. Start with simple reranker - Fast, free, and often sufficient for basic use cases
  2. Use LLM reranking for quality - When accuracy matters more than latency
  3. Batch large result sets - Use batch reranker for 50+ results
  4. Consider cost - API-based rerankers (Cohere) have per-call costs
  5. Cache reranking results - Results for the same query/docs can be reused (a caching sketch follows this list)
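A minimal in-memory caching sketch, assuming rerank output is deterministic for a given query and candidate set (the helper name is illustrative; simpleReranker is from the example above):

// Key the cache on the query plus the ordered candidate IDs
const rerankCache = new Map<string, unknown>();

async function cachedRerank(results: { id: string }[], query: string) {
  const key = `${query}::${results.map((r) => r.id).join(",")}`;
  if (!rerankCache.has(key)) {
    rerankCache.set(key, await simpleReranker.rerank(results, query, { topK: 5 }));
  }
  return rerankCache.get(key);
}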

Hybrid Search

  1. Start with RRF - Robust to score scale differences, less tuning needed
  2. Tune alpha for linear fusion - Start at 0.5, adjust based on evaluation
  3. Keep indices in sync - Update both BM25 and vector indices together
  4. Filter early - Apply metadata filters before fusion when possible
  5. Monitor retrieval quality - Track precision/recall metrics in production

Troubleshooting

| Problem | Solution |
| --- | --- |
| Empty chunks returned | Check if maxSize is too small for your content; try increasing to 500+ |
| Duplicate content in chunks | Reduce overlap parameter or use a structure-aware chunker |
| Missing context at boundaries | Increase overlap to 15-20% of maxSize |
| Slow reranking performance | Switch to simple reranker or reduce topK before reranking |
| Poor search quality | Tune BM25 parameters (k1, b) or adjust fusion alpha weight |
| Out of memory with large docs | Process documents in batches; use streaming where available |
| Reranker API timeouts | Use CircuitBreaker wrapper; reduce batch size |
| Inconsistent chunk metadata | Ensure documentId is set consistently across processing runs |

Debug Logging

# Enable verbose logging for RAG operations
DEBUG=neurolink:rag:* npx tsx your-script.ts

# Log specific components
DEBUG=neurolink:rag:chunker npx tsx your-script.ts
DEBUG=neurolink:rag:reranker npx tsx your-script.ts
DEBUG=neurolink:rag:hybrid npx tsx your-script.ts

API Reference

Core Exports

Document Processing:

  • loadDocument(path) - Load a single document
  • loadDocuments(paths) - Load multiple documents
  • MDocument - Fluent document processing class
  • processDocument(text, options) - Process text through chunking and metadata extraction

Chunking:

  • createChunker(strategy, config) - Create a chunker instance
  • ChunkerFactory - Factory for chunker creation
  • ChunkerRegistry - Registry with all chunker implementations
  • getAvailableStrategies() - List available chunking strategies
  • getRecommendedStrategy(contentType) - Get recommended strategy for content type

Reranking:

  • createReranker(type, config) - Create a reranker instance
  • RerankerFactory - Factory for reranker creation
  • RerankerRegistry - Registry with all reranker implementations
  • getAvailableRerankerTypes() - List available reranker types
  • rerank(results, query, model) - Direct reranking function
  • batchRerank(results, query, options) - Batch reranking

Retrieval:

  • createHybridSearch(config) - Create hybrid search instance
  • InMemoryBM25Index - In-memory BM25 index
  • InMemoryVectorStore - In-memory vector store
  • reciprocalRankFusion(rankings, k) - RRF score fusion
  • linearCombination(vectorScores, bm25Scores, alpha) - Linear score fusion
  • createVectorQueryTool(config, vectorStore) - Create vector query tool (also accepts a resolver function in place of the store)

Metadata:

  • createMetadataExtractor(type, config) - Create metadata extractor
  • LLMMetadataExtractor - LLM-powered extractor class
  • extractMetadata(chunks, params) - Extract metadata from chunks

Pipeline:

  • RAGPipeline - Full RAG pipeline class
  • createRAGPipeline(config) - Create pipeline instance
  • assembleContext(chunks, options) - Assemble context from chunks
  • formatContextWithCitations(chunks, format) - Format with citations

Resilience:

  • RAGCircuitBreaker - Circuit breaker pattern for RAG operations
  • RAGRetryHandler - Retry with exponential backoff and jitter

Types:

  • Chunk, ChunkMetadata, ChunkerConfig
  • Reranker, RerankerConfig, RerankerType
  • HybridSearchOptions, BM25Config
  • RAGPipelineConfig, RAGResponse
  • MetadataExtractor, MetadataExtractorConfig

See Also