Azure Speech Services

Azure Cognitive Services Speech provides both TTS and STT capabilities.

Setup

# Required environment variables
AZURE_SPEECH_KEY=your-speech-key
AZURE_SPEECH_REGION=eastus

Get credentials from: Azure Portal > Cognitive Services > Speech > Keys and Endpoint

Usage

Text-to-Speech

const result = await neurolink.generate({
  input: { text: "Summarize this document" },
  tts: {
    enabled: true,
    provider: "azure-tts",
    voice: "en-US-JennyNeural",
    format: "mp3",
  },
});
// result.audio contains the synthesized speech

Speech-to-Text

const result = await neurolink.generate({
  input: { text: "" },
  stt: {
    enabled: true,
    provider: "azure-stt",
    audio: audioBuffer,
    language: "en-US",
  },
});
// result.transcription.text contains the transcribed text

CLI

# TTS
neurolink generate "Hello world" --tts --tts-provider azure-tts

# STT
neurolink generate --stt --stt-provider azure-stt --input-audio ./recording.wav

Supported Voices

Azure Speech supports 400+ neural voices across 140+ languages. Common voices:

Voice	Language	Style
`en-US-JennyNeural`	English (US)	General
`en-US-GuyNeural`	English (US)	General
`en-GB-SoniaNeural`	English (UK)	General
`de-DE-KatjaNeural`	German	General
`fr-FR-DeniseNeural`	French	General
`ja-JP-NanamiNeural`	Japanese	General

Supported Audio Formats

TTS output: mp3, wav, ogg
STT input: wav (16kHz PCM mono recommended), ogg, opus
- Azure's short-audio REST endpoint does not decode MP3 — convert to WAV first or use a different STT provider for MP3 input.

Limits

TTS: 10,000 characters per request
STT: Batch mode (streaming not yet supported)

Setup​

Usage​

Text-to-Speech​

Speech-to-Text​

CLI​

Supported Voices​

Supported Audio Formats​

Limits​