
Deepgram Provider Guide

Fast, accurate speech-to-text with streaming, speaker diarization, and smart formatting

STT-only in NeuroLink: Deepgram is registered under the STT provider ID deepgram. Deepgram's TTS product is not wired into NeuroLink today; for TTS, use openai-tts, elevenlabs, azure-tts, or google-ai.


Overview

Deepgram is a speech recognition provider optimised for speed and accuracy in production environments. NeuroLink wraps Deepgram's Listen API, giving you access to the Nova-2 and Nova-3 model families through the standard generate() call. Deepgram's strengths include real-time streaming transcription over WebSocket, speaker diarization for multi-speaker audio, and smart formatting that cleans up dates, currency, and numbers automatically.

Key Facts

| Property | Value |
| --- | --- |
| Provider ID | deepgram |
| API endpoint | https://api.deepgram.com/v1/listen |
| Streaming endpoint | wss://api.deepgram.com/v1/listen |
| Default model | nova-2 |
| Formats | mp3, wav, ogg, opus |
| Max audio | 2 hours (7,200 seconds) per request |
| Languages | 40+ languages and dialects |
| Streaming | Yes (WebSocket-based real-time transcription) |

Quick Start

1. Get an API Key

Sign up at https://console.deepgram.com and create an API key under Settings → API Keys.

2. Configure Environment

Add to your .env file:

# Required
DEEPGRAM_API_KEY=your-deepgram-api-key

# Optional: default model (default: nova-2)
DEEPGRAM_MODEL=nova-2

# Optional: default language (default: en-US)
DEEPGRAM_LANGUAGE=en-US
3. Install NeuroLink

npm install @juspay/neurolink
# or
pnpm add @juspay/neurolink

4. Transcribe Your First Audio File

import { NeuroLink } from "@juspay/neurolink";
import { readFileSync } from "fs";

const ai = new NeuroLink();
const audioBuffer = readFileSync("./recording.wav");

const result = await ai.generate({
  input: { text: "Transcribe the following audio." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio: audioBuffer,
    format: "wav",
  },
});

if (result.transcription) {
  console.log("Transcript:", result.transcription.text);
  console.log("Confidence:", result.transcription.confidence);
  console.log("Duration:", result.transcription.duration, "seconds");
}

Supported Models

| Model ID | Description | Best For |
| --- | --- | --- |
| nova-2 (default) | Fastest, lowest Word Error Rate in the Nova family | General transcription, production use |
| nova-2-general | General-purpose variant, same as nova-2 | Broad use cases |
| nova-2-meeting | Optimised for multi-speaker meeting audio | Video conferences, recordings |
| nova-2-phonecall | Tuned for telephone audio quality | Call centre, PSTN audio |
| nova-2-voicemail | Handles background noise and compressed audio | Voicemail transcription |
| nova-2-finance | Finance-domain vocabulary boost | Earnings calls, financial content |
| nova-2-medical | Medical terminology | Clinical notes, consultations |
| nova-3 | Next-generation model with improved accuracy | Demanding accuracy requirements |
| nova | Previous generation Nova | Legacy compatibility |
| enhanced | High accuracy, slower processing | Archival, quality-critical paths |
| base | Fastest, lower accuracy | Draft transcriptions, cost optimisation |

SDK Usage

Basic Transcription

import { NeuroLink } from "@juspay/neurolink";
import { readFileSync } from "fs";

const ai = new NeuroLink();
const audio = readFileSync("./meeting.wav");

const result = await ai.generate({
  input: { text: "Transcribe this audio." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    language: "en-US",
  },
});

if (result.transcription) {
  console.log(result.transcription.text);
}

Choosing a Model

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe this meeting recording." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    model: "nova-2-meeting",
  } as DeepgramSTTOptions,
});

Smart Formatting

Smart formatting cleans up numbers, currency, dates, and other structured data automatically:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe with formatting." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    smartFormat: true, // Formats "twenty five dollars" → "$25"
  } as DeepgramSTTOptions,
});

Speaker Diarization

Identify who spoke when in multi-speaker audio:

const result = await ai.generate({
  input: { text: "Transcribe and identify speakers." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    speakerDiarization: true,
  },
});

if (result.transcription) {
  console.log("Transcript:", result.transcription.text);
  console.log("Speakers found:", result.transcription.speakers);

  // Word-level speaker attribution
  for (const word of result.transcription.words ?? []) {
    console.log(
      `${word.speaker ?? "?"}: "${word.word}" [${word.startTime}s–${word.endTime}s]`,
    );
  }
}

Utterance Segmentation

Split audio into utterance-level segments with speaker and timing information:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Segment into utterances." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    utterances: true,
    speakerDiarization: true,
  } as DeepgramSTTOptions,
});

if (result.transcription?.segments) {
  for (const seg of result.transcription.segments) {
    console.log(`[${seg.startTime}s] ${seg.speaker ?? "Speaker"}: ${seg.text}`);
  }
}

Word-Level Timestamps

const result = await ai.generate({
  input: { text: "Transcribe with word timings." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    wordTimestamps: true,
  },
});

if (result.transcription?.words) {
  for (const word of result.transcription.words) {
    console.log(
      `"${word.word}" at ${word.startTime}s (confidence: ${word.confidence?.toFixed(2)})`,
    );
  }
}

Custom Vocabulary / Keyword Boosting

Improve recognition of domain-specific terms:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe technical content." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    keywords: ["NeuroLink", "EulerHS", "Juspay", "HyperSDK"],
    keywordBoost: "high",
  } as DeepgramSTTOptions,
});

Content Redaction

Automatically redact sensitive data from transcripts:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe and redact PII." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    redact: ["pci", "ssn"], // Redact credit card and SSN numbers
  } as DeepgramSTTOptions,
});

Real-Time Streaming Transcription

Use the DeepgramSTT handler directly for WebSocket-based streaming:

import { DeepgramSTT } from "@juspay/neurolink";
import { createReadStream } from "fs";

const handler = new DeepgramSTT(process.env.DEEPGRAM_API_KEY!);

async function* readAudioStream(filePath: string): AsyncIterable<Buffer> {
  const stream = createReadStream(filePath, { highWaterMark: 4096 });
  for await (const chunk of stream) {
    yield chunk as Buffer;
  }
}

const audioStream = readAudioStream("./live-audio.wav");

for await (const segment of handler.transcribeStream(audioStream, {
  language: "en-US",
  smartFormat: true,
  speakerDiarization: true,
})) {
  const status = segment.isFinal ? "[FINAL]" : "[partial]";
  console.log(`${status} ${segment.text}`);
}

Per-Call Credential Override

const result = await ai.generate({
  input: { text: "Transcribe with a per-request key." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
  },
  credentials: {
    deepgram: {
      apiKey: "user-specific-deepgram-key",
    },
  },
});

CLI Usage

Basic Transcription

# Transcribe an audio file
neurolink generate "Respond to audio" \
--stt --stt-provider deepgram \
--input-audio recording.wav

# Specify model
neurolink generate "Transcribe this meeting" \
--stt --stt-provider deepgram \
--stt-model nova-2-meeting \
--input-audio meeting.mp3

Language Selection

neurolink generate "Transcribe Spanish audio" \
  --stt --stt-provider deepgram \
  --stt-language es \
  --input-audio audio-es.wav

Smart Formatting

neurolink generate "Transcribe with smart formatting" \
  --stt --stt-provider deepgram \
  --stt-smart-format \
  --input-audio recording.wav

Speaker Diarization

neurolink generate "Identify speakers" \
  --stt --stt-provider deepgram \
  --stt-diarize \
  --input-audio meeting.wav

Supported Languages

Deepgram supports 40+ languages and regional dialects. Key languages available with diarization and punctuation:

| Code | Language |
| --- | --- |
| en | English |
| en-US | English (US) |
| en-GB | English (UK) |
| es | Spanish |
| fr | French |
| de | German |
| it | Italian |
| pt | Portuguese |
| nl | Dutch |
| ja | Japanese |
| ko | Korean |
| zh | Chinese |
| hi | Hindi |
| ru | Russian |

For the full language list, see the Deepgram language support docs.


Configuration Reference

| Environment Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DEEPGRAM_API_KEY | Yes | (none) | Deepgram API key |
| DEEPGRAM_MODEL | No | nova-2 | Default transcription model |
| DEEPGRAM_LANGUAGE | No | en-US | Default transcription language |

Feature Support Matrix

| Feature | Supported | Notes |
| --- | --- | --- |
| Batch transcription | Yes | Up to 2 hours per request |
| Real-time streaming | Yes | WebSocket via transcribeStream() |
| Speaker diarization | Yes | speakerDiarization: true |
| Word-level timestamps | Yes | Included by default when words are returned |
| Smart formatting | Yes | smartFormat: true (numbers, dates, currency) |
| Utterance segmentation | Yes | utterances: true |
| Keyword boosting | Yes | keywords + keywordBoost |
| Content redaction | Yes | PCI and SSN redaction |
| Profanity filter | Yes | profanityFilter: true |
| Custom vocabulary | Yes | keywords array |
| Multi-format input | Yes | mp3, wav, ogg, opus |
| Confidence scores | Yes | Per-transcript and per-word |
| 40+ languages | Yes | language option |

Troubleshooting

"deepgram provider not configured"

The DEEPGRAM_API_KEY environment variable is missing or not loaded.

echo $DEEPGRAM_API_KEY

export DEEPGRAM_API_KEY=your-key-here

Create or rotate keys at https://console.deepgram.com.

"HTTP 401" — Invalid API key

Your key is invalid or has been revoked. Generate a new one from the Deepgram console.

"HTTP 402" — Insufficient credits

Your account balance is exhausted. Top up at https://console.deepgram.com/billing.

"HTTP 429" — Rate limit exceeded

Too many concurrent requests. Implement exponential backoff or reduce concurrency. Rate limits are documented in the Deepgram API docs.
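As a sketch, a retry loop with capped exponential backoff can look like this; the withRetry and backoffDelayMs helpers and the err.status error shape are illustrative assumptions, not part of NeuroLink's API:

```typescript
// Illustrative helper: exponential backoff delay with a cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative retry wrapper: retries only on HTTP 429, up to maxAttempts.
// Reading `status` off the error is an assumption about the error shape.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if (status !== 429 || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) =>
        setTimeout(resolve, backoffDelayMs(attempt)),
      );
    }
  }
}
```

You would then wrap the ai.generate(...) call in withRetry(() => ai.generate({ ... })).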

Empty transcript returned

Audio may be silent, below detection threshold, or in the wrong language. Verify:

  1. The audio buffer is not empty (audioBuffer.length > 0).
  2. The format matches the actual audio encoding.
  3. The language matches the audio's spoken language.
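The first two checks can be done before a request is sent. This looksLikeWav helper is a hypothetical sketch (not a NeuroLink API) that verifies the RIFF/WAVE magic bytes at the start of a WAV file:

```typescript
// Hypothetical pre-flight check: a RIFF/WAVE file starts with "RIFF" at
// byte 0 and "WAVE" at byte 8; the canonical header is 44 bytes long.
function looksLikeWav(buf: Buffer): boolean {
  return (
    buf.length > 44 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE"
  );
}
```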

"Deepgram STT request timed out after 30 seconds"

The request took longer than 30 seconds — typically due to very long audio or network issues. For audio over 30 minutes, consider splitting into chunks.
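One way to split a long recording is by fixed duration. The chunkPcm helper below is a hypothetical sketch that assumes headerless 16-bit mono PCM at 16 kHz; real WAV files carry a header you would strip first and re-attach to each chunk:

```typescript
// Hypothetical sketch: split raw 16-bit mono PCM into fixed-duration chunks.
// At 16 kHz with 2 bytes per sample, one second of audio is 32,000 bytes.
function chunkPcm(
  pcm: Buffer,
  seconds: number,
  sampleRate = 16_000,
  bytesPerSample = 2,
): Buffer[] {
  const chunkBytes = seconds * sampleRate * bytesPerSample;
  const chunks: Buffer[] = [];
  for (let off = 0; off < pcm.length; off += chunkBytes) {
    chunks.push(pcm.subarray(off, off + chunkBytes));
  }
  return chunks;
}
```

Each chunk can then be sent as its own stt.audio buffer.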

Streaming WebSocket disconnects

Check that DEEPGRAM_API_KEY is valid and that your network allows outbound WebSocket connections to wss://api.deepgram.com. Firewall or proxy configurations may block WebSocket upgrades.

Diarization not appearing in results

Diarization requires multi-speaker audio with clearly separated voices. Single-speaker audio will return no speaker labels. Also confirm speakerDiarization: true is set, and that you are using a model that supports it (Nova-2 and above).
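When labels are present, the word-level attribution shown earlier can be collapsed into speaker turns. This groupBySpeaker helper is a hypothetical sketch assuming the word shape (word, speaker, startTime, endTime) used in the diarization example:

```typescript
// Word shape mirroring the fields shown in the diarization example.
interface Word {
  word: string;
  speaker?: number;
  startTime: number;
  endTime: number;
}

// Hypothetical sketch: merge consecutive words from the same speaker
// into a single turn, preserving word order.
function groupBySpeaker(words: Word[]): { speaker?: number; text: string }[] {
  const turns: { speaker?: number; text: string }[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += ` ${w.word}`;
    } else {
      turns.push({ speaker: w.speaker, text: w.word });
    }
  }
  return turns;
}
```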


See Also


Need Help? Join the GitHub Discussions or open an issue.