Design Doc: Large Context Handling via Map-Reduce Summarization
Note: The map-reduce approach described in this design document is a proposed architecture that has not been implemented in the codebase. None of the artifacts it specifies (
_summarizeLargeText,largeTextHandlingoption,textUtils.ts) exist in production code. The production implementation uses a different approach — see the Context Compaction System andsrc/lib/context/contextCompactor.ts.
1. Overview
This document outlines the design and implementation plan for adding large context handling capabilities to the NeuroLink SDK. The core of this proposal is a map-reduce summarization strategy to process text inputs that exceed the context window limits of underlying Large Language Models (LLMs).
2. Problem Statement
The NeuroLink SDK's generate() method currently sends the entire input prompt directly to the AI provider. This design fails when the input text is very large (e.g., a 1MB file), as it surpasses the model's maximum token limit, resulting in an API error and a complete failure of the operation.
The existing conversation summarization feature is designed for managing the history of a dialogue and does not address the challenge of processing a single, oversized document.
Use Cases
This feature is critical for enabling new, high-value use cases, such as:
- Document Summarization: Summarizing large PDF, DOCX, or text files.
- Data Analysis: Analyzing long reports, transcripts, or logs to extract key insights.
- Question Answering over Documents: Allowing users to ask questions about a large document that is provided as context.
3. Challenges and Mitigations
3.1. Latency
- Challenge: Making multiple sequential calls to an LLM will significantly increase the total response time.
- Mitigation:
- Parallel Processing: The "Map" step, where individual chunks are summarized, will be executed in parallel using
Promise.all. This reduces the time for this step to the duration of the single longest-running chunk summarization, rather than the sum of all of them. - Model Flexibility: The system will be designed to allow for the use of faster, more cost-effective models (e.g.,
gemini-2.5-flash) for the intermediate chunk summarization, while a more powerful model can be used for the final, high-quality summary.
- Parallel Processing: The "Map" step, where individual chunks are summarized, will be executed in parallel using
3.2. Context Loss Between Chunks
- Challenge: Splitting the text into independent chunks can cause the loss of context that spans across chunk boundaries.
- Mitigation:
- Chunk Overlap: The chunking utility will support an
overlapparameter. A portion of text from the end of one chunk will be included at the beginning of the next, ensuring a smoother contextual transition. - Intelligent Splitting: The utility will prioritize splitting text at natural boundaries like sentences (
.,!,?) or paragraphs to keep related ideas together within a single chunk.
- Chunk Overlap: The chunking utility will support an
3.3. Cost
- Challenge: Multiple LLM calls will be more expensive than a single call.
- Mitigation: This is an inherent trade-off for gaining this new capability. The ability to use smaller, cheaper models for the initial chunking step will help manage costs effectively. The feature will be opt-in, so users only incur costs when they explicitly need to process large documents.
4. Proposed Solution & Architecture
We will implement a Map-Reduce Summarization workflow.
High-Level Flow Diagram
graph TD
A[Start: generate() called with large text] --> B{Text > Threshold?};
B -->|No| C[Normal Generation Flow];
B -->|Yes| D[Chunk Text into Pieces];
D --> E[Map: Summarize Each Chunk in Parallel];
E --> F[Reduce: Combine Chunk Summaries];
F --> G[Generate Final Summary from Combined Text];
G --> H[End: Return Final Summary];
C --> H;
5. Detailed Design and Implementation
5.1. Sequence Diagram
This diagram shows the interaction between the different components of the system.
sequenceDiagram
participant User
participant NeuroLink as NeuroLink.generate()
participant TextUtils as textUtils.chunkText()
participant Summarizer as _summarizeLargeText()
participant LLM
User->>NeuroLink: generate({ input: largeText, mode: 'summarize' })
NeuroLink->>Summarizer: _summarizeLargeText(options)
Summarizer->>TextUtils: chunkText(largeText)
TextUtils-->>Summarizer: returns [chunk1, chunk2, ...]
par Summarize Chunk 1
Summarizer->>LLM: Summarize chunk1
LLM-->>Summarizer: returns summary1
and Summarize Chunk 2
Summarizer->>LLM: Summarize chunk2
LLM-->>Summarizer: returns summary2
and Summarize ...
Summarizer->>LLM: Summarize chunkN
LLM-->>Summarizer: returns summaryN
end
Summarizer->>Summarizer: Combine summaries
Summarizer->>LLM: Generate final summary from combined text
LLM-->>Summarizer: returns finalSummary
Summarizer-->>NeuroLink: returns finalSummary
NeuroLink-->>User: returns finalSummary
5.2. New Utility: textUtils.ts
A new file will be created at src/lib/utils/textUtils.ts to contain the logic for splitting large texts into manageable pieces.
Detailed Explanation of chunkText
This function is the foundation of our solution. It intelligently divides a large string into an array of smaller strings (chunks) based on a target size, while trying to maintain the contextual integrity of the original text.
// src/lib/utils/textUtils.ts
// Defines the structure for a single piece of the divided text.
// `content` holds the text itself.
// `index` tracks the original position of the chunk.
export type TextChunk = {
content: string;
index: number;
};
// Defines the configuration for the chunking process.
// `chunkSize`: The target maximum size for each chunk in characters.
// `overlap`: How many characters from the end of one chunk to include at the start of the next. This is crucial for maintaining context across chunk boundaries.
export type ChunkingOptions = {
chunkSize: number;
overlap: number;
};
export function chunkText(text: string, options: ChunkingOptions): TextChunk[] {
const { chunkSize, overlap } = options;
// Early exit for empty or invalid input.
if (!text || text.length === 0) {
return [];
}
// If the text is already smaller than the desired chunk size, no chunking is needed.
// It's returned as a single chunk.
if (text.length <= chunkSize) {
return [{ content: text, index: 0 }];
}
const chunks: TextChunk[] = [];
let currentIndex = 0; // This pointer tracks our position in the original text.
// Loop through the text until the entire string has been chunked.
while (currentIndex < text.length) {
// Get the remaining part of the text to be processed.
const remainingText = text.substring(currentIndex);
// Determine the maximum possible end point for the current chunk.
let endIndex = Math.min(chunkSize, remainingText.length);
// If the remaining text is small enough to be the final chunk, add it and exit.
if (remainingText.length <= chunkSize) {
chunks.push({ content: remainingText, index: chunks.length });
break;
}
// --- Intelligent Splitting Logic ---
// We try to find the best possible place to split the text to avoid breaking sentences.
let splitPosition = -1;
const potentialSplitArea = remainingText.substring(0, endIndex);
// 1. Prioritize splitting at the end of a sentence.
for (const boundary of [".", "!", "?"]) {
const pos = potentialSplitArea.lastIndexOf(boundary);
if (pos > splitPosition) {
splitPosition = pos;
}
}
// 2. If no sentence ending is found, try to split at a space.
if (splitPosition === -1) {
splitPosition = potentialSplitArea.lastIndexOf(" ");
}
// 3. If no space is found (e.g., a very long word or URL), split at the character limit.
if (splitPosition === -1) {
splitPosition = endIndex - 1;
}
// The actual end of the chunk is one character after the split point.
endIndex = splitPosition + 1;
// Create the chunk from the start of the remaining text to the calculated end point.
const chunkContent = remainingText.substring(0, endIndex);
chunks.push({ content: chunkContent, index: chunks.length });
// Move the main pointer forward for the next iteration.
// We subtract the `overlap` to ensure context is carried over to the next chunk.
currentIndex += Math.max(1, endIndex - overlap);
}
return chunks;
}
5.3. New Workflow: _summarizeLargeText()
This new private method orchestrates the entire map-reduce workflow. It will be added to the NeuroLink class in src/lib/neurolink.ts.
Detailed Explanation of _summarizeLargeText
This function acts as the controller for the large context handling process. It chunks the text, manages the parallel summarization of each chunk, combines the results, and generates the final summary.
// Inside the NeuroLink class in src/lib/neurolink.ts
private async _summarizeLargeText(options: GenerateOptions): Promise<GenerateResult> {
// Destructure all necessary properties from the original options.
const { input, largeTextHandling, provider, model } = options;
const text = input.text;
// --- Step 1: Chunk the Text ---
// The large input text is passed to our utility function to be broken down.
// We use the configuration provided in `largeTextHandling` or fall back to sensible defaults.
const chunks = chunkText(text, {
chunkSize: largeTextHandling?.chunkSize || 4000, // Default to 4000 characters per chunk.
overlap: largeTextHandling?.overlap || 200, // Default to 200 characters of overlap.
});
// --- Step 2: The "Map" Step ---
// We process all chunks concurrently for maximum efficiency.
// `Promise.all` sends all summarization requests to the LLM at the same time.
const chunkSummaries = await Promise.all(
// `chunks.map` creates an array of promises, one for each chunk.
chunks.map(chunk => this.generate({
// Each chunk is wrapped in a new prompt asking for a concise summary.
input: { text: `Summarize the following text concisely: ${chunk.content}` },
// Use a specific, fast model for this intermediate step to reduce latency and cost.
// This can be configured by the user.
provider: largeTextHandling?.chunkingProvider || provider,
model: largeTextHandling?.chunkingModel || 'gemini-2.5-flash',
// CRITICAL: This recursive call to `this.generate` must have large text handling
// disabled to prevent an infinite loop.
largeTextHandling: { mode: 'none' }
}))
);
// --- Step 3: The "Reduce" Step ---
// All the individual chunk summaries are collected and joined together.
// A separator is used to clearly distinguish between the different summaries.
const combinedSummaries = chunkSummaries.map(result => result.content).join('\n\n---\n\n');
// This combined text of summaries is sent to the LLM for the final processing step.
const finalSummaryResult = await this.generate({
input: { text: `The following are summaries of sequential parts of a large document. Create a single, cohesive, and detailed final summary from them:\n\n${combinedSummaries}` },
// For this final step, we use the powerful provider and model the user originally requested
// to ensure the highest quality output.
provider: provider,
model: model,
// Again, disable large text handling to prevent loops.
largeTextHandling: { mode: 'none' }
});
// --- Step 4: Return the Final Result ---
// The result from the final summarization is returned.
// We enrich the metadata to indicate that large text processing was performed
// and include how many chunks were created.
return {
...finalSummaryResult,
metadata: {
...finalSummaryResult.metadata,
largeTextProcessed: true,
chunks: chunks.length,
}
};
}
5.4. Integration into generate()
The main generate() method will be modified to delegate to the new workflow when appropriate.
// Modified generate() method in src/lib/neurolink.ts
async generate(optionsOrPrompt: GenerateOptions | string): Promise<GenerateResult> {
const options: GenerateOptions =
typeof optionsOrPrompt === 'string'
? { input: { text: optionsOrPrompt } }
: optionsOrPrompt;
// New Logic: Check for large text handling
const largeTextConfig = options.largeTextHandling;
const textLength = options.input.text.length;
// Use a default threshold, but allow it to be overridden
const threshold = largeTextConfig?.chunkSize || 4000;
if (largeTextConfig?.mode === 'summarize' && textLength > threshold) {
return this._summarizeLargeText(options);
}
// ... existing generate() logic continues here for normal processing
}
6. Configuration and API Changes
The GenerateOptions interface in src/lib/types/generateTypes.ts will be updated.
// src/lib/types/generateTypes.ts
export type GenerateOptions = {
// ... existing options
largeTextHandling?: {
mode: "none" | "summarize";
chunkSize?: number;
overlap?: number;
chunkingProvider?: AIProviderName;
chunkingModel?: string;
};
};
mode:'none'(default) or'summarize'.chunkSize: Target size for each text chunk (in characters). Defaults to4000.overlap: Character overlap between chunks. Defaults to200.chunkingProvider/chunkingModel: Optional. Allows specifying a faster/cheaper model for the intermediate "Map" step, enhancing performance and cost-effectiveness.
7. Testing Strategy
- Unit Tests (
test/textUtils.test.ts):- Test
chunkTextwith empty, short, and long strings. - Verify that
overlapis handled correctly. - Ensure splitting prioritizes sentence boundaries.
- Test
- Integration Tests (
test/largeContext.test.ts):- Test the main
generate()method with a string larger than thechunkSizethreshold. - Mock the
_summarizeLargeTextmethod to confirm it's called whenmodeis'summarize'. - Mock the internal
generatecalls to verify the map-reduce logic is working as expected (i.e., multiple parallel calls followed by one final call). - Confirm that the normal workflow is used when
modeis'none'.
- Test the main
- End-to-End (E2E) Test (
examples/summarize-large-file.js):- Create a script that reads a large text file from the disk.
- Calls
neurolink.generate()with the file content andlargeTextHandling: { mode: 'summarize' }. - Prints the final summary to the console for manual validation of quality.
Section 8: Production Implementation
The map-reduce design described in this document has been complemented by a production context compaction system. See the Context Compaction Guide for the full specification.
The production implementation adds:
- ContextCompactor (
src/lib/context/contextCompactor.ts) -- a multi-stage compaction orchestrator with four sequential stages: tool-output pruning, file-read deduplication, LLM summarization (structured 9-section summaries with iterative merging), and sliding-window truncation. - BudgetChecker (
src/lib/context/budgetChecker.ts) -- pre-generation validation that checks token usage against per-model context windows (maintained insrc/lib/constants/contextWindows.ts) and triggers auto-compaction at 80 % usage. - Error Detection (
src/lib/context/errorDetection.ts) -- cross-provider detection of context-overflow errors so compaction can be retried transparently. getContextStats()API -- returns live token estimates, remaining capacity, and per-stage reduction metrics for runtime observability.
Distinguishing This Design Doc from the Context Compaction System
These two systems address fundamentally different problems:
| Aspect | This Design Doc (Map-Reduce) | Context Compaction System |
|---|---|---|
| Problem | A single input document exceeds the model's context window before generation even begins. | Conversation history grows beyond the context window over the course of a multi-turn session. |
| Trigger | User opts in via largeTextHandling: { mode: 'summarize' } on a generate() call. | Automatic — BudgetChecker fires before every LLM call when token usage exceeds 80% of the model's context window. |
| Technique | Map-reduce chunking: split the document into overlapping pieces, summarize each piece in parallel, then reduce the summaries into one final output. | A 4-stage pipeline applied to the message history: (1) tool-output pruning, (2) file-read deduplication, (3) LLM summarization with iterative merging, (4) sliding-window truncation. |
| Scope | One-shot — processes the large text and returns a result. | Ongoing — continuously manages history as the conversation evolves. |
| Implementation status | Proposed only — no code exists in the repository. | Fully implemented in src/lib/context/. |
For full details on the production context compaction system, see docs/features/context-compaction.md.