# RAG Processing - Configuration Guide

This document provides comprehensive configuration options for the RAG (Retrieval-Augmented Generation) processing system in NeuroLink.

## Overview

The RAG processing system consists of three main components:

- **Chunkers** - Split documents into smaller, processable segments
- **Rerankers** - Re-score and re-order search results for relevance
- **Hybrid Search** - Combines BM25 and vector search for improved retrieval
## Chunker Configuration

### Available Chunking Strategies

| Strategy | Description | Best For |
|---|---|---|
| `character` | Fixed-size character splits | Simple text, logs |
| `recursive` | Paragraph/sentence-aware splits | General documents |
| `sentence` | Sentence boundary splitting | Natural language text |
| `token` | Token-based (GPT tokenizer) | LLM context optimization |
| `markdown` | Header-aware markdown parsing | Documentation, README files |
| `html` | HTML tag-aware splitting | Web content |
| `json` | JSON structure-aware splitting | API responses, config files |
| `latex` | LaTeX section-aware splitting | Academic papers |
| `semantic-markdown` | Semantic markdown with embeddings | Technical documentation |
### Common Configuration Options

```typescript
type ChunkerConfig = {
  // Maximum chunk size (characters or tokens)
  maxSize: number; // Default: 1000
  // Overlap between chunks (characters or tokens)
  overlap: number; // Default: 100
  // Minimum chunk size (avoids tiny chunks)
  minSize?: number; // Default: 10
  // Document ID for metadata tracking
  documentId?: string; // Default: auto-generated UUID
  // Additional metadata to attach to chunks
  metadata?: Record<string, unknown>;
  // Whether to preserve metadata from the source document
  preserveMetadata?: boolean; // Default: true
};
```
### Strategy-Specific Configuration

#### Character Chunker

```typescript
const config = {
  maxSize: 1000, // Max characters per chunk
  overlap: 100, // Character overlap between chunks
  separator: "", // No separator (split purely by character count)
};
```
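To make `maxSize` and `overlap` concrete, here is a minimal sketch of character chunking as a sliding window that advances by `maxSize - overlap` characters. This helper is purely illustrative (use `createChunker("character", ...)` in practice):

```typescript
// Illustrative character chunker: a sliding window over the input text.
// Each chunk is up to maxSize characters; consecutive chunks share
// `overlap` characters, so the window advances by (maxSize - overlap).
function chunkByCharacters(
  text: string,
  maxSize = 1000,
  overlap = 100,
): string[] {
  const chunks: string[] = [];
  const step = maxSize - overlap; // window advance per chunk
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxSize));
    if (start + maxSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

For example, chunking a 10-character string with `maxSize: 4, overlap: 2` yields four chunks, each sharing two characters with its neighbor.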
#### Recursive Chunker

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  separators: ["\n\n", "\n", ". ", " ", ""], // Tried in priority order
  keepSeparators: true, // Keep separators in output chunks
};
```
#### Sentence Chunker

```typescript
const config = {
  maxSize: 1000, // Max characters per chunk
  overlap: 1, // Overlap in sentences (not characters)
  minSentences: 1, // Minimum sentences per chunk
  maxSentences: 10, // Maximum sentences per chunk
};
```
#### Token Chunker

```typescript
const config = {
  maxSize: 512, // Max tokens per chunk
  overlap: 50, // Token overlap
  tokenizer: "cl100k_base", // OpenAI tokenizer
};
```
#### Markdown Chunker

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  preserveHeaders: true, // Include parent headers in chunks
  codeBlockHandling: "preserve", // 'preserve' | 'split' | 'remove'
};
```
#### HTML Chunker

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  preserveTags: ["p", "div", "section", "article"],
  removeTags: ["script", "style", "nav", "footer"],
  extractText: true, // Strip HTML tags from output
};
```
#### JSON Chunker

```typescript
const config = {
  maxSize: 500,
  preserveStructure: true, // Keep valid JSON in chunks
  flattenDepth: 2, // Max nesting depth before flattening
  arrayHandling: "split", // 'split' | 'preserve'
};
```
#### LaTeX Chunker

```typescript
const config = {
  maxSize: 1000,
  overlap: 100,
  sectionCommands: ["\\section", "\\subsection", "\\chapter"],
  preserveMath: true, // Keep math environments intact
  includeComments: false, // Strip LaTeX comments
};
```
#### Semantic Markdown Chunker

```typescript
const config = {
  maxSize: 500,
  overlap: 100,
  semanticThreshold: 0.7, // Similarity threshold for merging
  embedder: "openai", // Embedding provider
};
```
### Usage Examples

```typescript
import { createChunker, getAvailableStrategies } from "@juspay/neurolink";

// List available strategies
const strategies = getAvailableStrategies();
console.log(strategies); // ['character', 'recursive', ...]

// Create a chunker with configuration
const chunker = await createChunker("recursive", {
  maxSize: 500,
  overlap: 50,
});

// Chunk a document
const chunks = await chunker.chunk(documentText, {
  maxSize: 500,
  overlap: 50,
});

// Each chunk has the structure:
// {
//   id: string,
//   text: string,
//   metadata: {
//     documentId: string,
//     chunkIndex: number,
//     startOffset: number,
//     endOffset: number,
//     ...customMetadata
//   }
// }
```
## Reranker Configuration

### Available Reranker Types

| Type | Description | Requires Model | Use Case |
|---|---|---|---|
| `simple` | Position + vector score combination | No | Fast, no-cost reranking |
| `llm` | LLM semantic scoring | Yes | High-quality semantic reranking |
| `cross-encoder` | Cross-encoder model | Yes | Accuracy-focused reranking |
| `cohere` | Cohere Rerank API | Yes (API key) | Production-grade reranking |
| `batch` | Batch LLM reranking | Yes | Large result sets |
### Common Configuration Options

```typescript
type RerankerConfig = {
  // Number of top results to return
  topK: number; // Default: 10
  // Minimum score threshold
  minScore?: number; // Default: 0.0
  // Include original scores in output
  includeOriginalScores?: boolean; // Default: false
};
```
### Type-Specific Configuration

#### Simple Reranker

```typescript
const config = {
  topK: 10,
  positionWeight: 0.3, // Weight for position in the result list
  scoreWeight: 0.7, // Weight for the original vector score
};
```
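Conceptually, the simple reranker blends a position-based score with the original vector score using these two weights. The following standalone sketch is an assumed illustration of that weighting scheme, not the library's actual implementation:

```typescript
// Illustrative simple reranking: blend each result's original vector score
// with a position score that decays linearly from 1 (first) toward 0 (last).
// The exact library scoring may differ; this only shows the weighting idea.
type Result = { id: string; score: number };

function simpleRerank(
  results: Result[],
  positionWeight = 0.3,
  scoreWeight = 0.7,
  topK = 10,
): Result[] {
  const rescored = results.map((r, i) => ({
    id: r.id,
    // Earlier positions contribute more via (1 - i / results.length)
    score: positionWeight * (1 - i / results.length) + scoreWeight * r.score,
  }));
  return rescored.sort((a, b) => b.score - a.score).slice(0, topK);
}
```

With `scoreWeight: 0.7`, a strong vector score can overcome a poor initial position, which is the point of the weighted blend.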
#### LLM Reranker

```typescript
const config = {
  topK: 5,
  model: "gpt-4",
  temperature: 0.0,
  prompt: "Rate relevance of this passage to the query (0-1):",
  batchSize: 5, // Process in batches
};
```
#### Cross-Encoder Reranker

```typescript
const config = {
  topK: 10,
  model: "cross-encoder/ms-marco-MiniLM-L-12-v2",
  normalize: true, // Normalize scores to 0-1
};
```
#### Cohere Reranker

```typescript
const config = {
  topK: 10,
  model: "rerank-english-v2.0",
  maxChunksPerDoc: 10,
  returnDocuments: false,
};
```
#### Batch Reranker

```typescript
const config = {
  topK: 20,
  batchSize: 10, // Documents per LLM call
  parallelBatches: 3, // Concurrent batches
  model: "gpt-3.5-turbo",
};
```
### Usage Examples

```typescript
import { createReranker, getAvailableRerankerTypes } from "@juspay/neurolink";

// List available types
const types = getAvailableRerankerTypes();
console.log(types); // ['simple', 'llm', 'cross-encoder', 'cohere', 'batch']

// Create a simple reranker (no model required)
const reranker = await createReranker("simple", { topK: 5 });

// Rerank search results
const reranked = await reranker.rerank(searchResults, query, { topK: 5 });

// Each result has the structure:
// {
//   id: string,
//   text: string,
//   score: number,
//   originalScore?: number,
//   metadata?: Record<string, unknown>
// }
```
## Hybrid Search Configuration

### BM25 Index Configuration

```typescript
type BM25Config = {
  // BM25 parameters
  k1: number; // Default: 1.2 (term frequency saturation)
  b: number; // Default: 0.75 (document length normalization)
  // Preprocessing
  lowercase: boolean; // Default: true
  stemming: boolean; // Default: false
  stopwords: string[]; // Default: English stopwords
};
```
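The `k1` and `b` parameters come from the standard BM25 ranking formula. The following self-contained sketch (an illustration only, not the library's `InMemoryBM25Index`) shows where they act: `k1` saturates term frequency, and `b` normalizes for document length relative to the corpus average:

```typescript
// Illustrative BM25 scorer over pre-tokenized documents.
type Doc = { id: string; terms: string[] };

function bm25Score(
  query: string[],
  doc: Doc,
  docs: Doc[],
  k1 = 1.2,
  b = 0.75,
): number {
  const avgdl = docs.reduce((s, d) => s + d.terms.length, 0) / docs.length;
  let score = 0;
  for (const term of query) {
    const tf = doc.terms.filter((t) => t === term).length;
    if (tf === 0) continue;
    const df = docs.filter((d) => d.terms.includes(term)).length;
    // Smoothed inverse document frequency
    const idf = Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
    // k1 controls how quickly repeated terms stop adding score;
    // b scales the penalty for longer-than-average documents.
    score +=
      (idf * tf * (k1 + 1)) /
      (tf + k1 * (1 - b + (b * doc.terms.length) / avgdl));
  }
  return score;
}
```

Raising `b` toward 1 penalizes long documents more aggressively; lowering `k1` makes repeated occurrences of a term count for less.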
### Fusion Methods

#### Reciprocal Rank Fusion (RRF)

```typescript
import { reciprocalRankFusion } from "@juspay/neurolink";

const fusedScores = reciprocalRankFusion(
  [vectorRankings, bm25Rankings],
  60, // k parameter (default: 60)
);
```
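RRF scores each document as the sum of `1 / (k + rank)` over the input rankings, so documents ranked highly by both BM25 and vector search rise to the top regardless of score scale. A minimal illustrative version (not the library export) might look like:

```typescript
// Illustrative reciprocal rank fusion: each ranking contributes
// 1 / (k + rank) per document, using 1-based ranks. Agreement across
// rankings compounds a document's fused score.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return scores;
}
```

The `k` parameter dampens the advantage of top ranks; larger `k` flattens the contribution curve so mid-ranked agreement matters relatively more.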
#### Linear Combination

```typescript
import { linearCombination } from "@juspay/neurolink";

const combinedScores = linearCombination(
  vectorScores, // Map<string, number>
  bm25Scores, // Map<string, number>
  0.5, // alpha: weight for vector scores (0-1)
);
```
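Linear combination computes `alpha * vectorScore + (1 - alpha) * bm25Score` per document, which is why both score sets should be normalized to comparable ranges first. An illustrative standalone version (the library's `linearCombination` may differ in details):

```typescript
// Illustrative linear fusion over the union of document ids.
// Missing scores are treated as 0.
function linearFuse(
  vectorScores: Map<string, number>,
  bm25Scores: Map<string, number>,
  alpha = 0.5, // weight for vector scores; (1 - alpha) weights BM25
): Map<string, number> {
  const fused = new Map<string, number>();
  const ids = new Set([...vectorScores.keys(), ...bm25Scores.keys()]);
  for (const id of ids) {
    fused.set(
      id,
      alpha * (vectorScores.get(id) ?? 0) +
        (1 - alpha) * (bm25Scores.get(id) ?? 0),
    );
  }
  return fused;
}
```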
### Hybrid Search Pipeline

```typescript
import { createHybridSearch, InMemoryBM25Index } from "@juspay/neurolink";

// Create a BM25 index
const bm25Index = new InMemoryBM25Index({ k1: 1.2, b: 0.75 });

// Add documents
await bm25Index.addDocuments([
  { id: "doc1", text: "Document content...", metadata: {} },
  // ...
]);

// Create the hybrid search
const hybridSearch = createHybridSearch({
  bm25Index,
  vectorStore, // Your vector store instance
  fusionMethod: "rrf", // 'rrf' | 'linear'
  alpha: 0.5, // Vector weight (for linear fusion)
  k: 60, // RRF parameter
});

// Execute a hybrid search
const results = await hybridSearch.search(query, {
  topK: 10,
  filter: { category: "technical" },
});
```
## Resilience Configuration

The RAG system includes resilience patterns to handle failures gracefully.

### Circuit Breaker Configuration

Circuit breakers prevent cascading failures by stopping operations when error rates are too high.

```typescript
type RAGCircuitBreakerConfig = {
  // Number of failures before opening the circuit
  failureThreshold: number; // Default: 5
  // Time in ms before attempting a reset
  resetTimeout: number; // Default: 60000 (1 minute)
  // Max calls allowed in the half-open state
  halfOpenMaxCalls: number; // Default: 3
  // Operation timeout in ms
  operationTimeout: number; // Default: 30000 (30 seconds)
  // Minimum calls before calculating the failure rate
  minimumCallsBeforeCalculation: number; // Default: 10
  // Time window for statistics in ms
  statisticsWindowSize: number; // Default: 300000 (5 minutes)
};
```
### Circuit Breaker Usage

```typescript
import {
  getCircuitBreaker,
  executeWithCircuitBreaker,
} from "@juspay/neurolink";

// Create a circuit breaker for vector queries
const breaker = getCircuitBreaker("vector-queries", {
  failureThreshold: 3,
  resetTimeout: 30000,
});

// Execute an operation with circuit breaker protection
const result = await breaker.execute(async () => {
  return await vectorStore.query(embedding, { topK: 10 });
}, "vector-query");

// Or use the convenience function
const embeddingResult = await executeWithCircuitBreaker(
  "embedding-service",
  () => embeddingProvider.embed(text),
  "embedding",
  { failureThreshold: 5 },
);

// Get circuit breaker statistics
const stats = breaker.getStats();
// {
//   state: 'closed' | 'open' | 'half-open',
//   totalCalls: number,
//   failureRate: number,
//   averageLatency: number,
//   p95Latency: number,
//   ...
// }
```
### Retry Handler Configuration

Retry handlers provide automatic retries with exponential backoff for transient failures.

```typescript
type RAGRetryConfig = {
  // Maximum number of retry attempts
  maxRetries: number; // Default: 3
  // Initial delay in ms
  initialDelay: number; // Default: 1000
  // Maximum delay in ms
  maxDelay: number; // Default: 30000
  // Backoff multiplier
  backoffMultiplier: number; // Default: 2
  // Whether to add jitter
  jitter: boolean; // Default: true
  // Retryable HTTP status codes
  retryableStatusCodes?: number[]; // Default: [408, 429, 500, 502, 503, 504]
};
```
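The delay schedule these options imply doubles on each attempt (the `backoffMultiplier`) and is capped at `maxDelay`; with `jitter` enabled, each delay is randomized to avoid synchronized retry storms. A hypothetical sketch of that schedule, not the library's internal code:

```typescript
// Illustrative exponential backoff schedule derived from RAGRetryConfig.
function backoffDelay(
  attempt: number, // 0-based retry attempt number
  initialDelay = 1000,
  maxDelay = 30000,
  backoffMultiplier = 2,
  jitter = false,
): number {
  // Exponential growth, capped at maxDelay
  const base = Math.min(initialDelay * backoffMultiplier ** attempt, maxDelay);
  // Full jitter: pick uniformly in [0, base] so concurrent clients spread out
  return jitter ? Math.random() * base : base;
}
```

With the defaults, retries wait roughly 1 s, 2 s, 4 s, ... until the 30 s cap is reached.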
### Retry Handler Usage

```typescript
import {
  withRAGRetry,
  RAGRetryHandler,
  embeddingRetryHandler,
  vectorStoreRetryHandler,
} from "@juspay/neurolink";

// Simple retry wrapper
const result = await withRAGRetry(() => embeddingProvider.embed(text), {
  maxRetries: 5,
  initialDelay: 2000,
});

// Use the specialized retry handlers
const embedding = await embeddingRetryHandler.executeWithRetry(() =>
  embeddingProvider.embed(text),
);

const queryResult = await vectorStoreRetryHandler.executeWithRetry(() =>
  vectorStore.query(embedding),
);

// Batch operations with retry
const handler = new RAGRetryHandler({ maxRetries: 3 });
const results = await handler.executeBatch(
  documents,
  async (doc, index) => await processDocument(doc),
  { concurrency: 5, continueOnError: true },
);
// Returns: { successful: [...], failed: [...], successRate: number }
```
### Specialized Retry Handlers

| Handler | maxRetries | initialDelay | Use Case |
|---|---|---|---|
| `embeddingRetryHandler` | 5 | 2000 ms | Embedding API rate limits |
| `vectorStoreRetryHandler` | 3 | 1000 ms | Vector store operations |
| `metadataExtractionRetryHandler` | 3 | 1500 ms | LLM-based metadata extraction |
## Metadata Extraction Configuration

The RAG system supports extracting metadata from document chunks using LLMs.

### Extractor Types

| Type | Description | Output |
|---|---|---|
| `title` | Extract document title | `string` |
| `summary` | Generate chunk summary | `string` |
| `keywords` | Extract relevant keywords | `string[]` |
| `questions` | Generate Q&A pairs for retrieval | `{question, answer}[]` |
| `custom` | Custom schema extraction with Zod | `Record<string, unknown>` |
### Base Extractor Configuration

```typescript
type BaseExtractorConfig = {
  // Language model to use
  modelName?: string; // e.g., "gpt-4", "claude-3-sonnet"
  // Provider for the model
  provider?: string; // e.g., "openai", "anthropic"
  // Custom prompt template
  promptTemplate?: string;
  // Maximum tokens for the LLM response
  maxTokens?: number;
  // Temperature for LLM generation
  temperature?: number;
};
```
#### Title Extractor

```typescript
const titleConfig = {
  modelName: "gpt-4",
  nodes: 5, // Number of nodes to analyze
  nodeTemplate: "Extract the main topic from: {text}",
  combineTemplate: "Combine these topics into a title: {topics}",
};
```
#### Summary Extractor

```typescript
const summaryConfig = {
  modelName: "gpt-3.5-turbo",
  summaryTypes: ["current", "previous", "next"], // Context-aware summaries
  maxWords: 100, // Maximum summary length
};
```
#### Keyword Extractor

```typescript
const keywordConfig = {
  modelName: "gpt-3.5-turbo",
  maxKeywords: 10, // Maximum keywords to extract
  minRelevance: 0.5, // Minimum relevance score (0-1)
};
```
#### Question-Answer Extractor

```typescript
const questionConfig = {
  modelName: "gpt-4",
  numQuestions: 5, // Number of Q&A pairs
  includeAnswers: true, // Include answers in output
  embeddingOnly: false, // Generate full questions vs embedding-optimized ones
};
```
### Usage Example

```typescript
import { MDocument } from "@juspay/neurolink";

const doc = new MDocument(content, { type: "markdown" });

// Chunk with metadata extraction
const chunks = await doc.chunk({
  strategy: "recursive",
  config: { maxSize: 1000, overlap: 100 },
  extract: {
    title: true,
    summary: { maxWords: 50 },
    keywords: { maxKeywords: 5 },
    questions: { numQuestions: 3 },
  },
});

// Each chunk now includes the extracted metadata:
// {
//   id: string,
//   text: string,
//   metadata: {
//     title: "Extracted Title",
//     summary: "Brief summary...",
//     keywords: ["keyword1", "keyword2"],
//     ...
//   }
// }
```
## Pipeline Configuration

### Full RAG Pipeline

```typescript
import {
  createChunker,
  createReranker,
  createHybridSearch,
} from "@juspay/neurolink";

// 1. Configure the chunker
const chunker = await createChunker("recursive", {
  maxSize: 500,
  overlap: 50,
});

// 2. Configure the reranker
const reranker = await createReranker("simple", {
  topK: 5,
});

// 3. Configure hybrid search
const hybridSearch = createHybridSearch({
  bm25Index,
  vectorStore,
  fusionMethod: "rrf",
});

// 4. Process documents
const chunks = await chunker.chunk(document);

// 5. Index chunks (implementation depends on your vector store)
await vectorStore.addDocuments(chunks);
await bm25Index.addDocuments(chunks);

// 6. Search and rerank
const searchResults = await hybridSearch.search(query, { topK: 20 });
const finalResults = await reranker.rerank(searchResults, query, { topK: 5 });
```
## Environment Variables

| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | For LLM/semantic reranking | Optional |
| `COHERE_API_KEY` | For the Cohere reranker | Optional |
| `ANTHROPIC_API_KEY` | For Claude-based reranking | Optional |
## Best Practices

### Chunking

- **Match chunk size to the context window** - Use the token chunker for LLM pipelines
- **Choose a strategy by content type** - Markdown for docs, HTML for web content
- **Use overlap for continuity** - 10-20% overlap prevents context loss at chunk boundaries
- **Preserve structure** - Use format-aware chunkers when possible

### Reranking

- **Start simple** - The simple reranker is fast and often sufficient
- **Use LLM reranking for quality** - When accuracy matters more than speed
- **Batch for efficiency** - Use the batch reranker for large result sets
- **Consider cost** - API-based rerankers incur per-call costs

### Hybrid Search

- **Balance weights** - Start with an alpha of 0.5 and tune based on results
- **RRF is robust** - It is less sensitive to differences in score scale
- **Index incrementally** - Update the BM25 and vector indices together
- **Filter early** - Apply metadata filters before fusion when possible
## Troubleshooting

### Common Issues

- **Empty chunks** - Check whether `maxSize` is too small for the content
- **Overlapping content** - Reduce the `overlap` parameter
- **Missing context** - Increase the chunk size or overlap
- **Slow reranking** - Use the simple reranker or reduce `topK`
- **Poor search quality** - Tune the BM25 parameters (`k1`, `b`)

### Debug Logging

```bash
# Enable verbose logging
DEBUG=neurolink:rag:* npx tsx your-script.ts
```
## API Reference

For complete API documentation, see the TypeScript definitions in:

- `src/lib/rag/types.ts` - Core type definitions
- `src/lib/rag/ChunkerFactory.ts` - Chunker factory API
- `src/lib/rag/reranker/RerankerFactory.ts` - Reranker factory API
- `src/lib/rag/retrieval/hybridSearch.ts` - Hybrid search API
## See Also

- RAG Feature Guide - Main RAG documentation with quick start and overview
- RAG Testing Guide - How to run RAG tests
- RAG API Reference - API documentation