CSV File Support

NeuroLink provides seamless CSV file support as a multimodal input type - attach CSV files directly to your AI prompts for data analysis, insights, and processing.

Overview

CSV support in NeuroLink works just like image support - it's a multimodal input that gets automatically processed and injected into your prompts. The system:

  1. Auto-detects CSV files using FileDetector (magic bytes, MIME types, extensions, content heuristics)
  2. Parses CSV data using streaming parser for memory efficiency
  3. Formats CSV content into LLM-optimized text (markdown/json)
  4. Injects formatted CSV data into your prompt text
  5. Works with ALL AI providers (not limited to vision models)

Quick Start

SDK Usage

import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink();

// Basic CSV analysis
const result = await neurolink.generate({
  input: {
    text: "What are the key trends in this sales data?",
    csvFiles: ["sales-2024.csv"],
  },
});

// Multiple CSV files
const comparison = await neurolink.generate({
  input: {
    text: "Compare Q1 vs Q2 performance and identify growth areas",
    csvFiles: ["q1-sales.csv", "q2-sales.csv"],
  },
});

// Auto-detect file types (mix CSV and images)
const multimodal = await neurolink.generate({
  input: {
    text: "Analyze this data and compare with the chart",
    files: ["data.csv", "chart.png"], // Auto-detects which is CSV vs image
  },
});

// Customize CSV processing
const custom = await neurolink.generate({
  input: {
    text: "Summarize the top 100 customers by revenue",
    csvFiles: ["customers.csv"],
  },
  csvOptions: {
    maxRows: 100, // Limit to first 100 rows
    formatStyle: "markdown", // Use markdown table format
    includeHeaders: true, // Include CSV headers
  },
});

CLI Usage

# Attach CSV files to your prompt
neurolink generate "Analyze this sales data" --csv sales.csv

# Multiple CSV files
neurolink generate "Compare these datasets" --csv q1.csv --csv q2.csv

# Auto-detect file types
neurolink generate "Analyze data and image" --file data.csv --file chart.png

# Customize CSV processing
neurolink generate "Summarize trends" \
--csv large-dataset.csv \
--csv-max-rows 500 \
--csv-format json

# Stream mode also supports CSV
neurolink stream "Explain this data in detail" --csv data.csv

# Batch processing with CSV
echo "Summarize sales data" > prompts.txt
echo "Find top performers" >> prompts.txt
neurolink batch prompts.txt --csv sales.csv

API Reference

GenerateOptions

type GenerateOptions = {
  input: {
    text: string;
    images?: Array<Buffer | string>;
    csvFiles?: Array<Buffer | string>; // Explicit CSV files
    files?: Array<Buffer | string>; // Auto-detect file types
  };

  csvOptions?: {
    maxRows?: number; // Default: 1000
    formatStyle?: "raw" | "markdown" | "json"; // Default: "raw"
    includeHeaders?: boolean; // Default: true
  };

  // ... other options
};

CSV Input Types

CSV files can be provided as:

  • File paths: "./data.csv" or "/absolute/path/data.csv"
  • URLs: "https://example.com/data.csv"
  • Buffers: Buffer.from("name,age\nAlice,30")
  • Data URIs: "data:text/csv;base64,..."

// File path
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: ["./data.csv"],
  },
});

// URL
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: ["https://example.com/data.csv"],
  },
});

// Buffer
const csvBuffer = Buffer.from("name,age\nAlice,30\nBob,25");
await neurolink.generate({
  input: {
    text: "Analyze this",
    csvFiles: [csvBuffer],
  },
});
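
The data URI form can be built from in-memory CSV content at runtime. A minimal sketch (the `generate()` call is shown commented out, since running it requires a configured provider):

```typescript
// Encode raw CSV content as a data URI
const csv = "name,age\nAlice,30\nBob,25";
const dataUri = `data:text/csv;base64,${Buffer.from(csv).toString("base64")}`;

// Pass it like any other CSV source:
// await neurolink.generate({
//   input: { text: "Analyze this", csvFiles: [dataUri] },
// });
```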

CSV Processing Options

maxRows

Limit the number of rows processed (default: 1000). Useful for large datasets.

csvOptions: {
  maxRows: 100, // Only process first 100 rows
}

formatStyle

Control how CSV data is formatted for the LLM:

  • raw (default, RECOMMENDED): Original CSV format with proper escaping

    • Best for large files and minimal token usage
    • Preserves original structure
    • Handles commas, quotes, newlines correctly
    • File size stays minimal (63KB stays 63KB, not 199KB)
  • json: JSON array format

    • Best for structured data processing
    • Easy to parse programmatically
    • Higher token usage (can expand 3x for large files)
  • markdown: Markdown table format

    • Best for small datasets (<100 rows)
    • More readable for humans
    • Takes most tokens

// Raw CSV (recommended for large files)
csvOptions: {
  formatStyle: "raw",
}
// Output: name,age\nAlice,30\nBob,25

// JSON array
csvOptions: {
  formatStyle: "json",
}
// Output: [{"name":"Alice","age":30},{"name":"Bob","age":25}]

// Markdown table
csvOptions: {
  formatStyle: "markdown",
}
// Output: | name | age |
//         | ---- | --- |
//         | Alice | 30 |
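
The cost ordering above can be seen directly by comparing serialized sizes on a small example (illustrative only; actual token counts depend on the provider's tokenizer):

```typescript
const rows = [
  { name: "Alice", age: 30 },
  { name: "Bob", age: 25 },
];

const raw = "name,age\nAlice,30\nBob,25";
const json = JSON.stringify(rows);
const markdown = [
  "| name | age |",
  "| ---- | --- |",
  ...rows.map((r) => `| ${r.name} | ${r.age} |`),
].join("\n");

// raw is the most compact; json repeats keys on every row; markdown
// adds table scaffolding to every line.
console.log(raw.length, json.length, markdown.length);
```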

includeHeaders

Include CSV headers in output (default: true).

csvOptions: {
  includeHeaders: false, // Skip headers
}
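
What the flag toggles, illustrated on a tiny table (assumed behavior, inferred from the option's description):

```typescript
const csv = "name,age\nAlice,30\nBob,25";
const [header, ...body] = csv.split("\n");

const withHeaders = csv;                // headers kept (default)
const withoutHeaders = body.join("\n"); // data rows only
```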

File Detection System

NeuroLink uses a multi-strategy detection system with confidence scores:

Detection Strategies (in priority order)

  1. Magic Bytes (95% confidence)

    • Detects file type from binary headers
    • Works for images (PNG, JPEG, GIF, WebP)
    • PDFs and binary formats
  2. MIME Type (85% confidence)

    • Uses HTTP Content-Type headers for URLs
    • Detects text/csv, image/*, etc.
  3. Extension (70% confidence)

    • File extension-based detection
    • Supports: .csv, .tsv, .jpg, .png, etc.
  4. Content Heuristics (75% confidence)

    • Analyzes file content patterns
    • Detects CSV by checking consistent comma-separated columns

The system stops at the first strategy with 80%+ confidence.

// Example: FileDetector workflow
// 1. Check magic bytes -> Not binary (0% confidence)
// 2. Check MIME type (if URL) -> text/csv (85% confidence) ✓ STOP
// Result: Detected as CSV with 85% confidence
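
The stop-at-threshold behavior can be sketched as follows (strategy names and scores are taken from this page; the real FileDetector internals may differ):

```typescript
type Detection = { strategy: string; type: string; confidence: number };

// Illustrative results for a CSV fetched from a URL, in priority order
const results: Detection[] = [
  { strategy: "magic-bytes", type: "unknown", confidence: 0.0 },
  { strategy: "mime-type", type: "csv", confidence: 0.85 },
  { strategy: "extension", type: "csv", confidence: 0.7 },
  { strategy: "content-heuristics", type: "csv", confidence: 0.75 },
];

// Stop at the first strategy that reaches the 80% threshold
function pickDetection(results: Detection[]): Detection | undefined {
  return results.find((r) => r.confidence >= 0.8);
}
```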

How It Works

Internal Processing Flow

// When you call generate() with CSV files:
await neurolink.generate({
  input: {
    text: "Analyze this data",
    csvFiles: ["data.csv"],
  },
});

// Internal flow:
// 1. messageBuilder.ts detects csvFiles array
// 2. Calls FileDetector.detectAndProcess("data.csv")
// 3. FileDetector runs detection strategies
// 4. Loads file content (from path/URL/buffer)
// 5. Routes to CSVProcessor.process(buffer)
// 6. CSV parsed using streaming csv-parser library
// 7. Formatted to LLM-optimized text (raw/markdown/json)
// 8. Appends to prompt text:
// "Analyze this data
//
// ## CSV Data from "data.csv":
// ```csv
// name,age,city
// Alice,30,New York
// Bob,25,London
// ```"
// 9. Sends to AI provider

Memory Efficiency

CSV files are parsed using streaming for memory efficiency:

import { Readable } from "node:stream";
import csvParser from "csv-parser";

// CSVProcessor uses Readable streams; rows past maxRows are dropped
// (csvString and maxRows come from the processor's context)
const rows: Record<string, string>[] = [];
let count = 0;
Readable.from([csvString])
  .pipe(csvParser())
  .on("data", (row) => {
    if (count++ < maxRows) rows.push(row);
  });

Large CSV files are handled efficiently:

  • Streaming parser: Processes line-by-line
  • Row limit: Configurable maxRows (default: 1000)
  • Memory bounded: Only holds limited rows in memory
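
The bounded collection can be illustrated with a simplified, dependency-free version (no quoted-field handling; the real processor uses the csv-parser library):

```typescript
// Collect at most maxRows parsed rows; later lines are skipped,
// so memory use stays bounded regardless of input size.
function parseRowsBounded(csv: string, maxRows = 1000): string[][] {
  const rows: string[][] = [];
  for (const line of csv.split("\n")) {
    if (rows.length >= maxRows) break;
    if (line.trim().length > 0) rows.push(line.split(","));
  }
  return rows;
}
```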

Examples

Data Analysis

const result = await neurolink.generate({
  input: {
    text: `Analyze this customer data and provide:
1. Total customers
2. Average age
3. Top 5 cities by customer count
4. Any notable patterns or insights`,
    csvFiles: ["customers.csv"],
  },
});

Data Comparison

const result = await neurolink.generate({
  input: {
    text: "Compare Q1 vs Q2 sales data. What changed? Which products improved?",
    csvFiles: ["q1-sales.csv", "q2-sales.csv"],
  },
});

Data Cleaning

const result = await neurolink.generate({
  input: {
    text: `Review this data for:
- Missing values
- Duplicate entries
- Data quality issues
- Suggested corrections`,
    csvFiles: ["raw-data.csv"],
  },
  csvOptions: {
    maxRows: 100,
    formatStyle: "markdown",
  },
});

Schema Generation

const result = await neurolink.generate({
  input: {
    text: "Generate a JSON schema for this CSV data with appropriate types and constraints",
    csvFiles: ["sample-data.csv"],
  },
  csvOptions: {
    maxRows: 50,
    formatStyle: "json",
  },
});

Multimodal Analysis

const result = await neurolink.generate({
  input: {
    text: "Compare the sales chart with the actual CSV data. Do they match?",
    files: ["sales-chart.png", "sales-data.csv"],
  },
});

TypeScript Types

Only types are exposed from the package (not classes):

import type {
  FileType,
  FileInput,
  FileSource,
  FileDetectionResult,
  FileProcessingResult,
  CSVProcessorOptions,
  FileDetectorOptions,
  CSVContent,
} from "@juspay/neurolink";

// FileType union
type FileType = "csv" | "image" | "pdf" | "text" | "unknown";

// CSV processing options
type CSVProcessorOptions = {
  maxRows?: number;
  formatStyle?: "raw" | "markdown" | "json";
  includeHeaders?: boolean;
};

// File detector options
type FileDetectorOptions = {
  maxSize?: number;
  timeout?: number;
  allowedTypes?: FileType[];
};

Best Practices

1. Use Raw Format for Large Files

The raw format is recommended for large files and best token efficiency:

csvOptions: {
  formatStyle: "raw",
} // ✅ RECOMMENDED for large files

// Use json for smaller datasets or when you need structured parsing
csvOptions: {
  formatStyle: "json",
} // ✅ Good for small-medium files

2. Limit Rows for Large Files

For large datasets, limit rows to avoid token limits:

csvOptions: {
  maxRows: 500,
} // Process first 500 rows

3. Use Markdown for Small Datasets

For <100 rows, markdown tables are more readable:

csvOptions: {
  maxRows: 50,
  formatStyle: "markdown",
}

4. Provide Clear Instructions

Give the AI clear instructions about what to analyze:

input: {
  text: `Analyze this sales data and provide:
1. Total revenue
2. Top 5 products
3. Revenue trend
4. Recommendations`,
  csvFiles: ["sales.csv"],
}

5. Use Auto-Detection

Let FileDetector handle mixed file types:

files: ["data.csv", "chart.png", "report.pdf"]; // Auto-detects each type

Limitations

  • Max file size: 10MB by default (configurable)
  • Max rows: 1000 by default (configurable)
  • Encoding: UTF-8 recommended (auto-detected)
  • Token limits: Large CSV files may exceed provider token limits
  • Streaming: CSV content is parsed and formatted before sending (not streamed to LLM)

Error Handling

try {
  const result = await neurolink.generate({
    input: {
      text: "Analyze this",
      csvFiles: ["data.csv"],
    },
  });
} catch (error) {
  const message = error instanceof Error ? error.message : String(error);
  if (message.includes("File too large")) {
    // Handle file size error
  } else if (message.includes("not allowed")) {
    // Handle file type restriction
  } else if (message.includes("CSV")) {
    // Handle CSV parsing error
  }
}
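
The branching above can be factored into a small classifier (the matched message fragments are taken from this page and are an assumption, not a guaranteed API contract):

```typescript
type CsvErrorKind = "file-too-large" | "type-not-allowed" | "csv-parse" | "unknown";

// Map an unknown thrown value to one of the error cases discussed above
function classifyCsvError(err: unknown): CsvErrorKind {
  const message = err instanceof Error ? err.message : String(err);
  if (message.includes("File too large")) return "file-too-large";
  if (message.includes("not allowed")) return "type-not-allowed";
  if (message.includes("CSV")) return "csv-parse";
  return "unknown";
}
```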
Related Features

  • Office Documents: DOCX, PPTX, XLSX processing
  • PDF Support: PDF document processing
  • Image Support: Similar multimodal input for images
  • File Detection: Auto-detect file types with confidence scores
  • Memory Efficient: Streaming parser for large files
  • Provider Agnostic: Works with all AI providers
  • CLI Integration: Full CLI support with options

Summary

  • CSV support is multimodal input (like images)
  • Use csvFiles array or files array (auto-detect)
  • Customize with csvOptions (maxRows, formatStyle, includeHeaders)
  • Works with ALL providers (not just vision models)
  • Memory efficient streaming parser
  • CLI support with --csv, --file, --csv-max-rows, --csv-format
  • Only types exposed from package (not classes)