Multimodal Capabilities Guide

NeuroLink provides comprehensive multimodal support, allowing you to combine text with various media types in a single AI interaction. This guide covers all supported input types, provider capabilities, and best practices.

Overview

Supported Input Types:

  • Images - JPEG, PNG, GIF, WebP, HEIC (vision-capable models)
  • PDFs - Document analysis and content extraction
  • CSV/Spreadsheets - Data analysis and tabular content processing
  • Audio - Transcription, analysis, and real-time voice input (Audio Input Guide)
  • Documents - Excel, Word, RTF, OpenDocument formats (File Processors Guide)
  • Data Files - JSON, YAML, XML with validation and formatting
  • Markup - HTML, SVG, Markdown with security sanitization
  • Source Code - 50+ programming languages with syntax detection

All multimodal inputs work seamlessly across both the CLI and SDK, with automatic format detection and provider-specific optimization.
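As a rough sketch of how extension-based auto-detection can work (the names and mappings below are illustrative, not NeuroLink's internal API):

```typescript
// Hypothetical extension-based file-kind detection, mirroring the
// auto-detection behavior described above. Illustrative only.
type FileKind = "image" | "pdf" | "csv" | "audio" | "data" | "unknown";

const EXTENSION_MAP: Record<string, FileKind> = {
  jpg: "image", jpeg: "image", png: "image", gif: "image", webp: "image", heic: "image",
  pdf: "pdf",
  csv: "csv",
  mp3: "audio", wav: "audio",
  json: "data", yaml: "data", xml: "data",
};

function detectFileKind(path: string): FileKind {
  // Take the text after the last dot, lower-cased; no dot means no extension
  const ext = path.split(".").pop()?.toLowerCase() ?? "";
  return EXTENSION_MAP[ext] ?? "unknown";
}
```

A real implementation would typically also sniff magic bytes, since extensions can lie.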

New in 2026: NeuroLink now supports 17+ file types through the ProcessorRegistry system. See the File Processors Guide for comprehensive documentation.


Provider Support Matrix

Not all providers support all multimodal capabilities. Use this matrix to select the right provider for your use case.

Vision (Images)

| Provider | Supported | Recommended Models | Max Images | Max Size | Notes |
| --- | --- | --- | --- | --- | --- |
| OpenAI | ✅ | gpt-4o, gpt-4o-mini, gpt-5.2 | 10 | ~20 MB | Best for general vision tasks |
| Azure OpenAI | ✅ | gpt-4o, gpt-4o-mini | 10 | ~20 MB | Same as OpenAI |
| Google AI Studio | ✅ | gemini-2.5-pro, gemini-2.5-flash, gemini-3-flash | 16 | ~20 MB | Excellent for visual reasoning |
| Google Vertex AI | ✅ | gemini-2.5-pro, gemini-2.5-flash, Claude models | 16/20 | ~20 MB | Gemini: 16 images, Claude: 20 images |
| Anthropic | ✅ | claude-3.5-sonnet, claude-3.7-sonnet | 20 | ~20 MB | Strong visual understanding |
| AWS Bedrock | ✅ | Claude models | 20 | ~20 MB | Same as Anthropic |
| Ollama | ✅ | llava, bakllava, llava-phi3 | 10 | Varies | Local vision models |
| LiteLLM | ✅ | Depends on upstream | 10 | Varies | Proxy to vision-capable models |
| Mistral | ✅ | pixtral-12b-2409, pixtral-large-2411 | 10 | ~20 MB | Multimodal Mistral models |
| OpenRouter | ✅ | Depends on model | 10 | Varies | Routes to various vision models |
| Hugging Face | ⚠️ | Limited | Varies | Varies | Model-dependent |
| AWS SageMaker | ❌ | N/A | - | - | Not supported |
| OpenAI Compatible | ⚠️ | Depends on endpoint | Varies | Varies | Server-dependent |

Legend:

  • ✅ Full support with multiple models
  • ⚠️ Limited or server-dependent support
  • ❌ Not supported
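The matrix above can be encoded as a small lookup for programmatic provider selection. A minimal sketch, assuming the provider ID strings used elsewhere in this guide (`openai`, `google-ai`, `vertex`, etc.); this is not an official NeuroLink API:

```typescript
// Illustrative capability map derived from the vision matrix above.
type VisionSupport = "full" | "limited" | "none";

const VISION_SUPPORT: Record<string, VisionSupport> = {
  "openai": "full",
  "azure-openai": "full",
  "google-ai": "full",
  "vertex": "full",
  "anthropic": "full",
  "bedrock": "full",
  "ollama": "full",
  "litellm": "full",
  "mistral": "full",
  "openrouter": "full",
  "huggingface": "limited",
  "sagemaker": "none",
  "openai-compatible": "limited",
};

function supportsVision(provider: string): boolean {
  // Unknown providers are treated as unsupported
  return (VISION_SUPPORT[provider] ?? "none") !== "none";
}
```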

PDF Documents

| Provider | Supported | Max Size | Max Pages | Processing Mode | Notes |
| --- | --- | --- | --- | --- | --- |
| Google Vertex AI | ✅ | 5 MB | 100 | Native PDF | Best for document analysis |
| Anthropic | ✅ | 5 MB | 100 | Native PDF | Claude excels at document understanding |
| AWS Bedrock | ✅ | 5 MB | 100 | Native PDF | Via Claude models |
| Google AI Studio | ✅ | 2000 MB | 100 | Native PDF | Handles very large files |
| OpenAI | ✅ | 10 MB | 100 | Files API | gpt-4o, gpt-4o-mini, o1 |
| Azure OpenAI | ✅ | 10 MB | 100 | Files API | Uses OpenAI Files API |
| LiteLLM | ✅ | 10 MB | 100 | Proxy | Depends on upstream model |
| OpenAI Compatible | ✅ | 10 MB | 100 | Varies | Server-dependent |
| Mistral | ✅ | 10 MB | 100 | Native PDF | Native support |
| Hugging Face | ✅ | 10 MB | 100 | Model-dependent | Varies by model |
| Ollama | ❌ | - | - | - | Not supported |
| OpenRouter | ⚠️ | Varies | Varies | Depends on model | Route-dependent |
| AWS SageMaker | ❌ | - | - | - | Not supported |

CSV/Spreadsheet Data

| Provider | Supported | Max Rows | Format Options | Notes |
| --- | --- | --- | --- | --- |
| All Providers | ✅ | 10,000 | raw, json, markdown | Universal support - processed as text |

CSV support works with all providers because files are converted to text before sending to the AI model. The file is parsed and formatted (raw CSV, JSON, or Markdown table) before inclusion in the prompt.

Format Recommendations:

  • Raw format - Best for large files (minimal token usage)
  • JSON format - Best for structured data processing
  • Markdown format - Best for small datasets (<100 rows), readable tables
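The recommendations above can be expressed as a simple selection helper. A minimal sketch (thresholds follow this guide; the function name is illustrative, not part of the SDK):

```typescript
// Pick a CSV format style per the guidance above.
type CsvFormat = "raw" | "json" | "markdown";

function recommendCsvFormat(rowCount: number, needsStructuredProcessing = false): CsvFormat {
  if (needsStructuredProcessing) return "json"; // easiest for the model to manipulate
  if (rowCount < 100) return "markdown";        // small datasets: readable tables
  return "raw";                                 // large files: minimal token usage
}
```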

Audio Input

| Provider | Native Audio | Transcription | Real-time | Max Duration | Notes |
| --- | --- | --- | --- | --- | --- |
| Google AI Studio | ✅ | ✅ | ✅ | 1 hour | Best for real-time voice |
| Google Vertex AI | ✅ | ✅ | ✅ | 1 hour | Native Gemini audio support |
| OpenAI | ❌ | ✅ Whisper | ❌ | 25 MB | Excellent transcription accuracy |
| Azure OpenAI | ❌ | ✅ Whisper | ❌ | 25 MB | Via Whisper integration |
| Anthropic | ❌ | Via fallback | ❌ | - | Uses transcription approach |
| AWS Bedrock | ❌ | Via fallback | ❌ | - | Uses transcription approach |
| Others | ❌ | Via fallback | ❌ | - | Audio transcribed before processing |

For comprehensive audio documentation, see the Audio Input Guide.


Image Input

Quick Start

CLI:

# Single image
npx @juspay/neurolink generate "Describe this interface" \
--image ./designs/dashboard.png --provider google-ai

# Remote URL
npx @juspay/neurolink generate "Analyze this diagram" \
--image https://example.com/architecture.png --provider openai

# Multiple images
npx @juspay/neurolink generate "Compare these screenshots" \
--image ./before.png \
--image ./after.png \
--provider anthropic

SDK:

import { readFileSync } from "node:fs";
import { NeuroLink } from "@juspay/neurolink";

const neurolink = new NeuroLink({ enableOrchestration: true });

const result = await neurolink.generate({
  input: {
    text: "Analyze these product screenshots",
    images: [
      readFileSync("./homepage.png"), // Local file as Buffer
      "https://example.com/chart.png", // Remote URL
    ],
  },
  provider: "google-ai",
});

Image Formats Supported

Accepted formats:

  • JPEG (.jpg, .jpeg)
  • PNG (.png)
  • GIF (.gif)
  • WebP (.webp)
  • HEIC (.heic, .heif) - iOS photos

Input methods:

  • Buffer objects - readFileSync() from Node.js
  • Local file paths - Relative or absolute paths
  • HTTPS URLs - Remote images (auto-downloaded)
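The three input methods can be distinguished with a small classifier. A sketch of the idea (not NeuroLink's internal logic; the function name is illustrative):

```typescript
// Classify an image input as Buffer, remote HTTPS URL, or local path,
// matching the three accepted input methods listed above.
type ImageSource = "buffer" | "url" | "path";

function classifyImageInput(input: Buffer | string): ImageSource {
  if (Buffer.isBuffer(input)) return "buffer";
  if (/^https:\/\//i.test(input)) return "url"; // remote images are auto-downloaded
  return "path"; // relative or absolute local path
}
```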

Image Alt Text (Accessibility)

NeuroLink supports alt text for images, improving accessibility and providing additional context to AI models.

const result = await neurolink.generate({
  input: {
    text: "Compare these revenue charts",
    images: [
      {
        data: readFileSync("./q1-revenue.png"),
        altText: "Q1 2024 revenue chart showing 15% growth",
      },
      {
        data: "https://example.com/q2-revenue.png",
        altText: "Q2 2024 revenue chart showing 22% growth",
      },
    ],
  },
  provider: "openai",
});

Alt text best practices:

  • Keep concise (under 125 characters ideal)
  • Focus on key information the image conveys
  • Alt text is automatically included as context in prompts
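These guidelines can be enforced with a trivial validation helper, shown here as an illustrative sketch (not part of the SDK):

```typescript
// Check alt text against the guidance above: non-empty and concise.
function isGoodAltText(altText: string, maxLength = 125): boolean {
  const trimmed = altText.trim();
  return trimmed.length > 0 && trimmed.length <= maxLength;
}
```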

Image Size Limits

Provider-specific limits:

  • Most providers: ~20 MB per image
  • Recommended: Resize images to < 2 MP for faster processing
  • Token usage: ~7,000 tokens per image (varies by provider)

Optimization tips:

  • Compress images before sending for large batches
  • Use appropriate resolution (1920x1080 often sufficient)
  • Pre-process images to reduce unnecessary detail
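The resolution guidance above amounts to a simple megapixel check before sending. A sketch (the helper name and the 2 MP default are taken from this guide's recommendation, not from the SDK):

```typescript
// Flag images larger than ~2 megapixels for pre-processing,
// per the optimization tips above. Dimensions are in pixels.
function shouldResize(width: number, height: number, maxMegapixels = 2): boolean {
  const megapixels = (width * height) / 1_000_000;
  return megapixels > maxMegapixels;
}
```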

PDF Document Input

Quick Start

CLI:

# Auto-detect PDF
npx @juspay/neurolink generate "Summarize this report" \
--file ./financial-report.pdf --provider vertex

# Explicit PDF
npx @juspay/neurolink generate "Extract key terms from contract" \
--pdf ./contract.pdf --provider anthropic

# Multiple PDFs
npx @juspay/neurolink generate "Compare these documents" \
--pdf ./version1.pdf \
--pdf ./version2.pdf \
--provider vertex

SDK:

// Auto-detect (recommended)
await neurolink.generate({
  input: {
    text: "Analyze this document",
    files: ["./report.pdf", "./data.csv"], // Mixed file types
  },
  provider: "vertex",
});

// Explicit PDF
await neurolink.generate({
  input: {
    text: "Compare Q1 and Q2 reports",
    pdfFiles: ["./q1-report.pdf", "./q2-report.pdf"],
  },
  provider: "anthropic",
});

PDF Processing Modes

Provider-specific approaches:

ProviderModeToken UsageBest For
Vertex AI, Anthropic, BedrockNative PDF~1,000 tokens/3 pagesVisual + text extraction
Google AI StudioNative PDF~1,000 tokens/3 pagesLarge files (up to 2 GB)
OpenAI, AzureFiles API~1,000 tokens/3 pagesText-only mode optimal

Visual vs. Text-only mode:

  • Visual mode: Preserves layout, tables, charts (~7,000 tokens/3 pages)
  • Text-only mode: Extracts text content only (~1,000 tokens/3 pages)
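Using the per-mode figures above, a back-of-the-envelope token estimate looks like this (illustrative only; real usage varies by provider and document):

```typescript
// Estimate PDF token usage from page count and processing mode,
// using the rough figures above: ~7,000 tokens per 3 pages (visual),
// ~1,000 tokens per 3 pages (text-only).
function estimatePdfTokens(pages: number, mode: "visual" | "text"): number {
  const tokensPerThreePages = mode === "visual" ? 7000 : 1000;
  return Math.ceil((pages / 3) * tokensPerThreePages);
}
```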

PDF Best Practices

  • Choose the right provider: Vertex AI or Anthropic for best results
  • Check file size: Most providers limit to 5 MB (AI Studio supports 2 GB)
  • Use streaming: For large documents, streaming provides faster initial results
  • Combine with other files: Mix PDFs with CSV data and images
  • Be specific in prompts: "Extract all monetary values" vs. "Tell me about this PDF"
  • Set appropriate token limits: Recommended 2000-8000 tokens for PDF analysis
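The size-limit check can be done as a pre-flight step before uploading. A minimal sketch using the limits from the PDF matrix above (provider keys and function name are illustrative):

```typescript
// Pre-flight check of a PDF's size against per-provider limits (in MB),
// taken from the PDF support matrix in this guide.
const PDF_SIZE_LIMIT_MB: Record<string, number> = {
  "vertex": 5,
  "anthropic": 5,
  "bedrock": 5,
  "google-ai": 2000,
  "openai": 10,
  "azure": 10,
  "mistral": 10,
};

function pdfFitsProvider(provider: string, fileSizeBytes: number): boolean {
  const limitMb = PDF_SIZE_LIMIT_MB[provider];
  if (limitMb === undefined) return false; // unknown or unsupported provider
  return fileSizeBytes <= limitMb * 1024 * 1024;
}
```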

CSV/Spreadsheet Input

Quick Start

CLI:

# Auto-detect CSV
npx @juspay/neurolink generate "Analyze sales trends" \
--file ./sales_2024.csv

# Explicit CSV with options
npx @juspay/neurolink generate "Summarize data" \
--csv ./data.csv \
--csv-max-rows 500 \
--csv-format raw

SDK:

// Auto-detect (recommended)
await neurolink.generate({
  input: {
    text: "Analyze this sales data",
    files: ["./sales.csv"], // Auto-detected as CSV
  },
});

// Explicit CSV with options
await neurolink.generate({
  input: {
    text: "Compare quarterly data",
    csvFiles: ["./q1.csv", "./q2.csv"],
  },
  csvOptions: {
    maxRows: 1000,
    formatStyle: "json", // or "raw", "markdown"
  },
});

CSV Format Options

Three format styles:

  1. Raw format (default)

     • Best for large files
     • Minimal token usage
     • Preserves original CSV structure

     name,age,city
     Alice,30,NYC
     Bob,25,LA

  2. JSON format

     • Structured data processing
     • Easier for AI to parse
     • Higher token usage

     [
       { "name": "Alice", "age": 30, "city": "NYC" },
       { "name": "Bob", "age": 25, "city": "LA" }
     ]

  3. Markdown format

     • Readable tables
     • Good for small datasets (<100 rows)
     • Moderate token usage

     | name  | age | city |
     | ----- | --- | ---- |
     | Alice | 30  | NYC  |
     | Bob   | 25  | LA   |
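To make the three styles concrete, here is a minimal converter for a simple CSV string (no quoted fields). This is an illustrative sketch, not NeuroLink's actual parser:

```typescript
// Render a simple CSV string in one of the three format styles above.
// Assumes no quoted or escaped fields; a real parser must handle those.
function formatCsv(csv: string, style: "raw" | "json" | "markdown"): string {
  if (style === "raw") return csv.trim();
  const [headerLine, ...rows] = csv.trim().split("\n");
  const headers = headerLine.split(",");
  const records = rows.map((r) => r.split(","));
  if (style === "json") {
    // One object per row, keyed by header names
    return JSON.stringify(
      records.map((cells) => Object.fromEntries(headers.map((h, i) => [h, cells[i]]))),
    );
  }
  // Markdown table: header row, separator, then data rows
  const line = (cells: string[]) => `| ${cells.join(" | ")} |`;
  return [line(headers), line(headers.map(() => "---")), ...records.map(line)].join("\n");
}
```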

CSV Configuration

const result = await neurolink.generate({
  input: {
    text: "Analyze customer data",
    csvFiles: ["./customers.csv"],
  },
  csvOptions: {
    maxRows: 1000, // Limit rows (default: 1000, max: 10000)
    formatStyle: "json", // Format: "raw" | "json" | "markdown"
    includeHeaders: true, // Include header row (default: true)
  },
});

CSV Best Practices

  • Use raw format for large files to minimize token usage
  • Use JSON format for structured processing when AI needs to manipulate data
  • Limit to 1000 rows by default (configurable up to 10,000)
  • Combine CSV with visualization images for comprehensive analysis
  • Works with ALL providers (not just vision-capable models)

Combining Multiple Input Types

NeuroLink excels at combining different media types in a single request.

Mixed Media Example

const result = await neurolink.generate({
  input: {
    text: "Analyze this product launch: review the presentation, compare sales data, and assess the promotional materials",
    pdfFiles: ["./presentation.pdf"], // Slides
    csvFiles: ["./sales-data.csv"], // Numbers
    images: [
      readFileSync("./promo-banner.png"), // Marketing material
      "https://example.com/ad-campaign.jpg",
    ],
  },
  provider: "vertex", // Supports all input types
});

Streaming with Multimodal

const stream = await neurolink.stream({
  input: {
    text: "Analyze this floor plan and cost breakdown",
    images: ["./floor-plan.jpg"],
    csvFiles: ["./costs.csv"],
  },
  provider: "google-ai",
});

for await (const chunk of stream) {
  process.stdout.write(chunk.text ?? "");
}

Configuration & Fine-tuning

Image-Specific Options

const result = await neurolink.generate({
  input: {
    text: "Analyze these screenshots",
    images: [
      {
        data: readFileSync("./screenshot.png"),
        altText: "Product dashboard showing KPIs",
      },
    ],
  },
  provider: "openai",
  maxTokens: 2000, // Increase for detailed image analysis
});

PDF-Specific Options

const result = await neurolink.generate({
  input: {
    text: "Extract financial data from this report",
    pdfFiles: ["./annual-report.pdf"],
  },
  provider: "vertex",
  maxTokens: 8000, // Large token budget for comprehensive extraction
});

Regional Routing

Some providers require regional configuration for optimal performance:

const result = await neurolink.generate({
  input: {
    text: "Analyze this document",
    pdfFiles: ["./contract.pdf"],
  },
  provider: "vertex",
  region: "us-central1", // Vertex AI region
});

Best Practices

General Guidelines

  1. Provide descriptive prompts - Reference specific images/files by name
  2. Use alt text for accessibility - Helps both AI and screen readers
  3. Combine analytics + evaluation - Benchmark multimodal quality before production
  4. Cache remote assets locally - Avoid repeated downloads for frequently used files
  5. Stream for user-facing apps - Use stream() for responsive interfaces; use generate() when you need complete structured JSON output

Image Best Practices

  • Provide short captions describing each image in the prompt
  • Pre-compress large images to reduce processing time
  • Use appropriate image formats (JPEG for photos, PNG for diagrams)
  • Consider token limits when sending multiple images

PDF Best Practices

  • Choose providers with native PDF support (Vertex, Anthropic, Bedrock)
  • Be specific about what you need extracted
  • Use streaming for large documents
  • Set appropriate maxTokens (2000-8000 recommended)

CSV Best Practices

  • Use raw format for large datasets
  • Use JSON format when AI needs structured data manipulation
  • Limit rows to avoid token exhaustion
  • Combine with images for visual + numerical analysis

Troubleshooting

Common Issues

| Issue | Solution |
| --- | --- |
| "Image not found" | Check file paths are relative to CWD where CLI is invoked |
| "Provider does not support images" | Switch to vision-capable provider (see matrix above) |
| "Error downloading image" | Ensure URL returns HTTP 200 and doesn't require authentication |
| "Large response latency" | Pre-compress images and reduce resolution to < 2 MP |
| "Streaming ends early" | Disable tools (--disableTools) to avoid tool call interruptions |
| "PDF too large" | Use Google AI Studio (2 GB limit) or split into smaller chunks |
| "CSV token overflow" | Reduce maxRows or use raw format instead of JSON/markdown |

Provider-Specific Issues

OpenAI/Azure:

  • Images must be < 20 MB
  • PDFs processed via Files API (may take longer)

Google AI Studio/Vertex:

  • Best for large PDFs (AI Studio supports up to 2 GB)
  • Gemini models have excellent visual reasoning

Anthropic/Bedrock:

  • Claude excels at document understanding
  • Strong visual and text analysis capabilities

Ollama:

  • Use vision-capable models like llava, bakllava
  • Local processing - no cloud API required


Examples & Recipes

Example 1: Product Analysis

Analyze a product page with screenshot, description, and pricing data:

const analysis = await neurolink.generate({
  input: {
    text: "Analyze this product: review the screenshot, pricing data, and provide recommendations",
    images: [readFileSync("./product-screenshot.png")],
    csvFiles: ["./pricing-tiers.csv"],
  },
  provider: "google-ai",
  maxTokens: 3000,
});

Example 2: Document Comparison

Compare two versions of a contract:

const comparison = await neurolink.generate({
  input: {
    text: "Compare these two contract versions and highlight key differences",
    pdfFiles: ["./contract-v1.pdf", "./contract-v2.pdf"],
  },
  provider: "anthropic",
  maxTokens: 5000,
});

Example 3: Data Visualization Analysis

Analyze charts and underlying data together:

const dataAnalysis = await neurolink.generate({
  input: {
    text: "Analyze these sales charts and verify against the raw data",
    images: [
      "https://example.com/q1-chart.png",
      "https://example.com/q2-chart.png",
    ],
    csvFiles: ["./sales-data.csv"],
  },
  provider: "vertex",
  enableAnalytics: true,
  enableEvaluation: true,
});

Summary

NeuroLink's multimodal capabilities provide:

  ✅ Universal input support - Images, PDFs, CSV files
  ✅ Provider flexibility - Extensive provider compatibility matrix
  ✅ Automatic format detection - Smart file type recognition
  ✅ Accessibility features - Alt text support for images
  ✅ Production-ready - Battle-tested at enterprise scale
  ✅ Developer-friendly - Works seamlessly across CLI and SDK

Next Steps:

  1. Review the provider support matrix to select the right provider
  2. Try the quick start examples with your use case
  3. Explore advanced recipes for complex scenarios
  4. Check troubleshooting if you encounter issues