
Deepgram Provider Guide

Fast, accurate speech-to-text with streaming, speaker diarization, and smart formatting

STT-only in NeuroLink: Deepgram is registered under the STT provider ID deepgram. Deepgram's TTS product is not wired into NeuroLink today; for TTS, use openai-tts, elevenlabs, azure-tts, or google-ai.


Overview

Deepgram is a speech recognition provider optimised for speed and accuracy in production environments. NeuroLink wraps Deepgram's Listen API, giving you access to the Nova-2 and Nova-3 model families through the standard generate() call. Deepgram's strengths include real-time streaming transcription over WebSocket, speaker diarization for multi-speaker audio, and smart formatting that cleans up dates, currency, and numbers automatically.

Key Facts

| Property | Value |
| --- | --- |
| Provider ID | deepgram |
| API endpoint | https://api.deepgram.com/v1/listen |
| Streaming endpoint | wss://api.deepgram.com/v1/listen |
| Default model | nova-2 |
| Formats | mp3, wav, ogg, opus |
| Max audio | 2 hours (7,200 seconds) per request |
| Languages | 40+ languages and dialects |
| Streaming | Yes (WebSocket-based real-time transcription) |

Quick Start

1. Get an API Key

Sign up at https://console.deepgram.com and create an API key under Settings → API Keys.

2. Configure Environment

Add to your .env file:

# Required
DEEPGRAM_API_KEY=your-deepgram-api-key

# Optional: default model (default: nova-2)
DEEPGRAM_MODEL=nova-2

# Optional: default language (default: en-US)
DEEPGRAM_LANGUAGE=en-US
3. Install NeuroLink

npm install @juspay/neurolink
# or
pnpm add @juspay/neurolink

4. Transcribe Your First Audio File

import { NeuroLink } from "@juspay/neurolink";
import { readFileSync } from "fs";

const ai = new NeuroLink();
const audioBuffer = readFileSync("./recording.wav");

const result = await ai.generate({
  input: { text: "Transcribe the following audio." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio: audioBuffer,
    format: "wav",
  },
});

if (result.transcription) {
  console.log("Transcript:", result.transcription.text);
  console.log("Confidence:", result.transcription.confidence);
  console.log("Duration:", result.transcription.duration, "seconds");
}

Supported Models

| Model ID | Description | Best For |
| --- | --- | --- |
| nova-2 (default) | Fastest, lowest Word Error Rate in the Nova family | General transcription, production use |
| nova-2-general | General-purpose variant, same as nova-2 | Broad use cases |
| nova-2-meeting | Optimised for multi-speaker meeting audio | Video conferences, recordings |
| nova-2-phonecall | Tuned for telephone audio quality | Call centre, PSTN audio |
| nova-2-voicemail | Handles background noise and compressed audio | Voicemail transcription |
| nova-2-finance | Finance-domain vocabulary boost | Earnings calls, financial content |
| nova-2-medical | Medical terminology | Clinical notes, consultations |
| nova-3 | Next-generation model with improved accuracy | Demanding accuracy requirements |
| nova | Previous generation Nova | Legacy compatibility |
| enhanced | High accuracy, slower processing | Archival, quality-critical paths |
| base | Fastest, lower accuracy | Draft transcriptions, cost optimisation |

SDK Usage

Basic Transcription

import { NeuroLink } from "@juspay/neurolink";
import { readFileSync } from "fs";

const ai = new NeuroLink();
const audio = readFileSync("./meeting.wav");

const result = await ai.generate({
  input: { text: "Transcribe this audio." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    language: "en-US",
  },
});

if (result.transcription) {
  console.log(result.transcription.text);
}

Choosing a Model

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe this meeting recording." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    model: "nova-2-meeting",
  } as DeepgramSTTOptions,
});

Smart Formatting

Smart formatting cleans up numbers, currency, dates, and other structured data automatically:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe with formatting." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    smartFormat: true, // Formats "twenty five dollars" → "$25"
  } as DeepgramSTTOptions,
});

Speaker Diarization

Identify who spoke when in multi-speaker audio:

const result = await ai.generate({
  input: { text: "Transcribe and identify speakers." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    speakerDiarization: true,
  },
});

if (result.transcription) {
  console.log("Transcript:", result.transcription.text);
  console.log("Speakers found:", result.transcription.speakers);

  // Word-level speaker attribution
  for (const word of result.transcription.words ?? []) {
    console.log(
      `${word.speaker ?? "?"}: "${word.word}" [${word.startTime}s–${word.endTime}s]`,
    );
  }
}

Utterance Segmentation

Split audio into utterance-level segments with speaker and timing information:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Segment into utterances." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    utterances: true,
    speakerDiarization: true,
  } as DeepgramSTTOptions,
});

if (result.transcription?.segments) {
  for (const seg of result.transcription.segments) {
    console.log(`[${seg.startTime}s] ${seg.speaker ?? "Speaker"}: ${seg.text}`);
  }
}

Word-Level Timestamps

const result = await ai.generate({
  input: { text: "Transcribe with word timings." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    wordTimestamps: true,
  },
});

if (result.transcription?.words) {
  for (const word of result.transcription.words) {
    console.log(
      `"${word.word}" at ${word.startTime}s (confidence: ${word.confidence?.toFixed(2)})`,
    );
  }
}

Custom Vocabulary / Keyword Boosting

Improve recognition of domain-specific terms:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe technical content." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    keywords: ["NeuroLink", "EulerHS", "Juspay", "HyperSDK"],
    keywordBoost: "high",
  } as DeepgramSTTOptions,
});

Content Redaction

Automatically redact sensitive data from transcripts:

import type { DeepgramSTTOptions } from "@juspay/neurolink";

const result = await ai.generate({
  input: { text: "Transcribe and redact PII." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
    redact: ["pci", "ssn"], // Redact credit card and SSN numbers
  } as DeepgramSTTOptions,
});

Real-Time Streaming Transcription

Use the DeepgramSTT handler directly for WebSocket-based streaming:

import { DeepgramSTT } from "@juspay/neurolink";
import { createReadStream } from "fs";

const handler = new DeepgramSTT(process.env.DEEPGRAM_API_KEY!);

async function* readAudioStream(filePath: string): AsyncIterable<Buffer> {
  const stream = createReadStream(filePath, { highWaterMark: 4096 });
  for await (const chunk of stream) {
    yield chunk as Buffer;
  }
}

const audioStream = readAudioStream("./live-audio.wav");

for await (const segment of handler.transcribeStream(audioStream, {
  language: "en-US",
  smartFormat: true,
  speakerDiarization: true,
})) {
  const status = segment.isFinal ? "[FINAL]" : "[partial]";
  console.log(`${status} ${segment.text}`);
}

Per-Call Credential Override

const result = await ai.generate({
  input: { text: "Transcribe with a per-request key." },
  stt: {
    enabled: true,
    provider: "deepgram",
    audio,
    format: "wav",
  },
  credentials: {
    deepgram: {
      apiKey: "user-specific-deepgram-key",
    },
  },
});

CLI Usage

Basic Transcription

# Transcribe an audio file
neurolink generate "Respond to audio" \
--stt --stt-provider deepgram \
--input-audio recording.wav

# Specify model
neurolink generate "Transcribe this meeting" \
--stt --stt-provider deepgram \
--stt-model nova-2-meeting \
--input-audio meeting.mp3

Language Selection

neurolink generate "Transcribe Spanish audio" \
  --stt --stt-provider deepgram \
  --stt-language es \
  --input-audio audio-es.wav

Smart Formatting

neurolink generate "Transcribe with smart formatting" \
  --stt --stt-provider deepgram \
  --stt-smart-format \
  --input-audio recording.wav

Speaker Diarization

neurolink generate "Identify speakers" \
  --stt --stt-provider deepgram \
  --stt-diarize \
  --input-audio meeting.wav

Supported Languages

Deepgram supports 40+ languages and regional dialects. Key languages available with diarization and punctuation:

| Code | Language |
| --- | --- |
| en | English |
| en-US | English (US) |
| en-GB | English (UK) |
| es | Spanish |
| fr | French |
| de | German |
| it | Italian |
| pt | Portuguese |
| nl | Dutch |
| ja | Japanese |
| ko | Korean |
| zh | Chinese |
| hi | Hindi |
| ru | Russian |

For the full language list, see the Deepgram language support docs.


Configuration Reference

| Environment Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DEEPGRAM_API_KEY | Yes | (none) | Deepgram API key |
| DEEPGRAM_MODEL | No | nova-2 | Default transcription model |
| DEEPGRAM_LANGUAGE | No | en-US | Default transcription language |

Feature Support Matrix

| Feature | Supported | Notes |
| --- | --- | --- |
| Batch transcription | Yes | Up to 2 hours per request |
| Real-time streaming | Yes | WebSocket via transcribeStream() |
| Speaker diarization | Yes | speakerDiarization: true |
| Word-level timestamps | Yes | Included by default when words are returned |
| Smart formatting | Yes | smartFormat: true (numbers, dates, currency) |
| Utterance segmentation | Yes | utterances: true |
| Keyword boosting | Yes | keywords + keywordBoost |
| Content redaction | Yes | PCI and SSN redaction |
| Profanity filter | Yes | profanityFilter: true |
| Custom vocabulary | Yes | keywords array |
| Multi-format input | Yes | mp3, wav, ogg, opus |
| Confidence scores | Yes | Per-transcript and per-word |
| 40+ languages | Yes | language option |

Troubleshooting

"deepgram provider not configured"

The DEEPGRAM_API_KEY environment variable is missing or not loaded.

echo $DEEPGRAM_API_KEY

export DEEPGRAM_API_KEY=your-key-here

Create or rotate keys at https://console.deepgram.com.

"HTTP 401" — Invalid API key

Your key is invalid or has been revoked. Generate a new one from the Deepgram console.

"HTTP 402" — Insufficient credits

Your account balance is exhausted. Top up at https://console.deepgram.com/billing.

"HTTP 429" — Rate limit exceeded

Too many concurrent requests. Implement exponential backoff or reduce concurrency. Rate limits are documented in the Deepgram API docs.
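As a sketch, a retry loop with capped exponential backoff can look like this; the withRetry and backoffDelayMs helpers and the err.status error shape are illustrative assumptions, not part of NeuroLink's API:

```typescript
// Illustrative helper: exponential backoff delay with a cap.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative retry wrapper: retries only on HTTP 429, up to maxAttempts.
// Reading `status` off the error is an assumption about the error shape.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number }).status;
      if (status !== 429 || attempt + 1 >= maxAttempts) throw err;
      await new Promise((resolve) =>
        setTimeout(resolve, backoffDelayMs(attempt)),
      );
    }
  }
}
```

You would then wrap the ai.generate(...) call in withRetry(() => ai.generate({ ... })).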

Empty transcript returned

Audio may be silent, below detection threshold, or in the wrong language. Verify:

  1. The audio buffer is not empty (audioBuffer.length > 0).
  2. The format matches the actual audio encoding.
  3. The language matches the audio's spoken language.
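The first two checks can be done before a request is sent. This looksLikeWav helper is a hypothetical sketch (not a NeuroLink API) that verifies the RIFF/WAVE magic bytes at the start of a WAV file:

```typescript
// Hypothetical pre-flight check: a RIFF/WAVE file starts with "RIFF" at
// byte 0 and "WAVE" at byte 8; the canonical header is 44 bytes long.
function looksLikeWav(buf: Buffer): boolean {
  return (
    buf.length > 44 &&
    buf.toString("ascii", 0, 4) === "RIFF" &&
    buf.toString("ascii", 8, 12) === "WAVE"
  );
}
```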

"Deepgram STT request timed out after 30 seconds"

The request took longer than 30 seconds — typically due to very long audio or network issues. For audio over 30 minutes, consider splitting into chunks.
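One way to split a long recording is by fixed duration. The chunkPcm helper below is a hypothetical sketch that assumes headerless 16-bit mono PCM at 16 kHz; real WAV files carry a header you would strip first and re-attach to each chunk:

```typescript
// Hypothetical sketch: split raw 16-bit mono PCM into fixed-duration chunks.
// At 16 kHz with 2 bytes per sample, one second of audio is 32,000 bytes.
function chunkPcm(
  pcm: Buffer,
  seconds: number,
  sampleRate = 16_000,
  bytesPerSample = 2,
): Buffer[] {
  const chunkBytes = seconds * sampleRate * bytesPerSample;
  const chunks: Buffer[] = [];
  for (let off = 0; off < pcm.length; off += chunkBytes) {
    chunks.push(pcm.subarray(off, off + chunkBytes));
  }
  return chunks;
}
```

Each chunk can then be sent as its own stt.audio buffer.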

Streaming WebSocket disconnects

Check that DEEPGRAM_API_KEY is valid and that your network allows outbound WebSocket connections to wss://api.deepgram.com. Firewall or proxy configurations may block WebSocket upgrades.

Diarization not appearing in results

Diarization requires multi-speaker audio with clearly separated voices. Single-speaker audio will return no speaker labels. Also confirm speakerDiarization: true is set, and that you are using a model that supports it (Nova-2 and above).
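When labels are present, the word-level attribution shown earlier can be collapsed into speaker turns. This groupBySpeaker helper is a hypothetical sketch assuming the word shape (word, speaker, startTime, endTime) used in the diarization example:

```typescript
// Word shape mirroring the fields shown in the diarization example.
interface Word {
  word: string;
  speaker?: number;
  startTime: number;
  endTime: number;
}

// Hypothetical sketch: merge consecutive words from the same speaker
// into a single turn, preserving word order.
function groupBySpeaker(words: Word[]): { speaker?: number; text: string }[] {
  const turns: { speaker?: number; text: string }[] = [];
  for (const w of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === w.speaker) {
      last.text += ` ${w.word}`;
    } else {
      turns.push({ speaker: w.speaker, text: w.word });
    }
  }
  return turns;
}
```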


See Also


Need Help? Join the GitHub Discussions or open an issue.