Skip to main content

Fish Audio TTS Provider Guide

Low-cost text-to-speech — S2 Pro voice cloning, multilingual, ~80% cheaper than ElevenLabs


Overview

Fish Audio is a low-cost TTS provider focused on voice cloning. NeuroLink wraps it as a TTSHandler so it slots into the same generate({ tts: { provider: "fish-audio" } }) flow as OpenAI / ElevenLabs / Azure / Google AI TTS.

  • Latest model: s1 (default) — best quality
  • speech-1.6, speech-1.5 — older / cheaper models
  • Voice cloning: 15s of reference audio → custom voice id
  • Languages: 14 (English, Mandarin, Cantonese, Japanese, Korean, French, German, Spanish, Italian, Portuguese, Russian, Arabic, Hindi, Indonesian)

Key Facts

  • Protocol: Native REST API (POST /v1/tts)
  • Default base URL: https://api.fish.audio
  • Default model: s1
  • Default voice (reference_id): fb6c0e1ea91e427fb9a93b9bbf0a1e4d (Generic Female / English)
  • Max text length: 5000 characters
  • Output formats: mp3 (default), wav, pcm16 (raw 16-bit PCM @ 44.1 kHz)
  • Streaming: Not implemented in this handler (synchronous synthesis only)

Quick Start

1. Get an API Key

Sign up at https://fish.audio/ and create an API key from the dashboard.

2. Configure Environment

# Required
FISH_AUDIO_API_KEY=...

# Optional: override the default voice (any reference_id from the Fish library)
# FISH_AUDIO_VOICE_ID=...

# Optional: override the base URL
# FISH_AUDIO_BASE_URL=https://api.fish.audio

3. Synthesize Your First Audio

import { NeuroLink } from "@juspay/neurolink";
import { writeFileSync } from "node:fs";

const ai = new NeuroLink();

const result = await ai.generate({
// The LLM provider doesn't matter — TTS-only flows skip the LLM call
// when `tts.useAiResponse` is false (default)
provider: "openai",
input: { text: "Hello world from Fish Audio" },
tts: {
enabled: true,
provider: "fish-audio",
format: "mp3",
},
});

if (result.audio) {
writeFileSync("./hello.mp3", result.audio.buffer);
}

SDK Usage

Basic Synthesis (Default Voice)

const result = await ai.generate({
provider: "openai",
input: { text: "The quick brown fox jumps over the lazy dog." },
tts: { enabled: true, provider: "fish-audio" },
});

Custom Voice (Voice Cloning)

Get a reference_id from your Fish Audio dashboard after uploading 15s of reference audio:

const result = await ai.generate({
provider: "openai",
input: { text: "..." },
tts: {
enabled: true,
provider: "fish-audio",
voice: "your-custom-reference-id",
format: "mp3",
},
});

TTS-Augmented LLM Response

When useAiResponse: true, NeuroLink first calls the LLM, then synthesizes the LLM output through Fish Audio:

const result = await ai.generate({
provider: "openai", // generates the text
input: { text: "Tell me a 2-sentence joke" },
tts: {
enabled: true,
provider: "fish-audio",
useAiResponse: true,
format: "mp3",
},
});

console.log(result.content); // joke text
writeFileSync("joke.mp3", result.audio.buffer);

WAV / PCM16 Output

// WAV (44.1 kHz)
const wav = await ai.generate({
provider: "openai",
input: { text: "..." },
tts: { enabled: true, provider: "fish-audio", format: "wav" },
});

// PCM16 — raw 16-bit signed-LE PCM, NO RIFF header. Useful for
// realtime streaming pipelines that want raw samples.
const pcm = await ai.generate({
provider: "openai",
input: { text: "..." },
tts: { enabled: true, provider: "fish-audio", format: "pcm16" },
});

Per-Call Credentials

const result = await ai.generate({
provider: "openai",
input: { text: "..." },
tts: { enabled: true, provider: "fish-audio" },
credentials: {
/* ... no Fish Audio per-call slice yet — set FISH_AUDIO_API_KEY env */
},
});

CLI Usage

# Basic — default voice + mp3
pnpm run cli generate "Hello world" \
--tts --tts-provider fish-audio \
--output ./hello.mp3

# With a custom voice
pnpm run cli generate "Hello world" \
--tts --tts-provider fish-audio \
--tts-voice your-custom-reference-id \
--output ./hello.mp3

# WAV format
pnpm run cli generate "Hello world" \
--tts --tts-provider fish-audio \
--tts-format wav --output ./hello.wav

Configuration Reference

Environment VariableRequiredDefaultDescription
FISH_AUDIO_API_KEYYesFish Audio API key
FISH_AUDIO_VOICE_IDNofb6c0e1ea91e427fb9a93b9bbf0a1e4dDefault reference_id voice
FISH_AUDIO_BASE_URLNohttps://api.fish.audioBase URL

Voice Models

ModelNotes
s1Latest, best quality (default)
speech-1.6Previous flagship
speech-1.5Older / cheaper

Voices are identified by reference_id strings from the Fish library. Browse and clone voices in your Fish Audio dashboard.


Feature Support Matrix

FeatureFish Audio
Text-to-speechYes
Voice cloningYes (15s reference)
MultilingualYes (14 languages)
MP3 outputYes
WAV outputYes (44.1 kHz)
PCM16 outputYes (raw, no RIFF)
OPUS outputFalls back to MP3
StreamingNo (synchronous only)
getVoices()Not implemented

Troubleshooting

"Invalid Fish Audio API key"

echo $FISH_AUDIO_API_KEY
export FISH_AUDIO_API_KEY=...

Get / rotate at https://fish.audio/.

"Fish Audio rate limit exceeded"

Free tier has hourly limits. Upgrade your plan or implement exponential backoff. The handler maps 408 / 429 / 5xx to retriable errors.

"Fish Audio synthesis failed: 422"

Usually means the reference_id is invalid or the text is too long (>5000 chars). Truncate the text or check the voice id in your dashboard.

Audio sounds robotic / low quality

Try a higher-quality model (model: "s1" if you're on speech-1.5) or clone your own reference voice from a 15s clean audio sample. Default voices are generic — voice-cloned reference_ids almost always sound better.

"PCM16 output is unplayable in audio players"

pcm16 is RAW samples, not WAV — players need a header. To play, write a WAV header yourself (44 bytes) before the PCM data, or use the wav format which produces a complete RIFF/WAV file.


See Also


Need Help? Open a GitHub Discussion or issue.