API Documentation

Integrate speech-to-speech AI into your applications with our REST API. SDKs are available for Python, JavaScript, and more.

Quickstart

Get started with the Speech to Speech AI API in under 5 minutes.

Base URL
https://api.speechtospeechai.com/v1
1. Install the SDK
Python
pip install speechtospeechai
2. Generate Speech
Python
from speechtospeechai import Client

client = Client(api_key="your-api-key")

# Generate speech from text
audio = client.audio.speech.create(
    model="fish-speech-v1.5",
    voice="nova",
    input="Hello! Welcome to Speech to Speech AI."
)

# Save to file
audio.save("output.mp3")
3. Clone a Voice
Python
# Clone a voice from an audio sample (reuses the client created in step 2)
voice = client.voices.clone(
    name="My Custom Voice",
    files=["sample.mp3"]
)

# Use the cloned voice
audio = client.audio.speech.create(
    model="fish-speech-v1.5",
    voice=voice.id,
    input="This is my cloned voice speaking!"
)

audio.save("cloned_output.mp3")

Authentication

All API requests require authentication via an API key in the Authorization header.

Header
Authorization: Bearer YOUR_API_KEY
Get your API key from your account dashboard.
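
For callers not using the SDK, the header can be attached by hand. The following is a minimal sketch using only the Python standard library. It builds a request without sending it. The helper name build_json_request is hypothetical, and it assumes that documented paths such as /v1/audio/speech already include the version prefix found in the base URL, so only the remainder of the path is appended.

```python
import json
import urllib.request

BASE_URL = "https://api.speechtospeechai.com/v1"


def build_json_request(path: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated JSON POST request.

    Assumption: `path` is relative to BASE_URL, e.g. "/audio/speech".
    """
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Bearer scheme, as described in the Authentication section.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_json_request(
    "/audio/speech",
    "YOUR_API_KEY",
    {"model": "fish-speech-v1.5", "voice": "nova", "input": "Hello, world!"},
)
print(request.full_url)
print(request.get_header("Authorization"))
```

Passing the resulting object to urllib.request.urlopen would send it. Read the key from an environment variable or secret store rather than hard-coding it.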

Voice Cloning

POST /v1/voices/clone

Clone a voice from an audio sample.

Request Body (multipart/form-data)
Parameter   Type     Required   Description
name        string   Yes        Name for the cloned voice
file        file     Yes        Audio file (10-30 seconds recommended)
model       string   No         Model to use (default: fish-speech-v1.5)
language    string   No         Language code (auto-detected if not specified)
Response
{
  "id": "voice_abc123xyz",
  "name": "My Custom Voice",
  "model": "fish-speech-v1.5",
  "language": "en",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}
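
A client will usually want to turn this response into a typed object and check the status before using the voice. The sketch below parses the sample response shown above. The ClonedVoice class and parse_voice function are hypothetical names, and since "ready" is the only status value this page documents, any other value is simply treated as not ready.

```python
import json
from dataclasses import dataclass
from datetime import datetime

SAMPLE = """
{
  "id": "voice_abc123xyz",
  "name": "My Custom Voice",
  "model": "fish-speech-v1.5",
  "language": "en",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}
"""


@dataclass
class ClonedVoice:
    id: str
    name: str
    model: str
    language: str
    status: str
    created_at: datetime

    @property
    def is_ready(self) -> bool:
        # Assumption: only "ready" means the voice can be used.
        return self.status == "ready"


def parse_voice(body: str) -> ClonedVoice:
    """Parse a voice-cloning response body into a ClonedVoice."""
    data = json.loads(body)
    return ClonedVoice(
        id=data["id"],
        name=data["name"],
        model=data["model"],
        language=data["language"],
        status=data["status"],
        # Swap the trailing "Z" for an explicit offset so that
        # fromisoformat also accepts it on Python versions before 3.11.
        created_at=datetime.fromisoformat(data["created_at"].replace("Z", "+00:00")),
    )


voice = parse_voice(SAMPLE)
print(voice.id, voice.is_ready, voice.created_at.isoformat())
```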

Text to Speech

POST /v1/audio/speech

Convert text to speech audio.

Request Body (JSON)
{
  "model": "fish-speech-v1.5",
  "voice": "nova",
  "input": "Hello, world!",
  "speed": 1.0,
  "format": "mp3",
  "emotion": "neutral"
}
Parameter   Type     Required   Description
model       string   Yes        TTS model (fish-speech-v1.5, orpheus-3b, etc.)
voice       string   Yes        Voice ID (preset or cloned)
input       string   Yes        Text to convert (max 5000 chars)
speed       float    No         Speed multiplier (0.5-2.0, default 1.0)
format      string   No         Output format (mp3, wav, ogg, flac)
emotion     string   No         Emotion tag (neutral, happy, sad, angry)
Response

Returns the audio as binary data, with a Content-Type header that matches the requested output format.
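
Checking a payload against the documented limits before sending it avoids a round trip that would end in an invalid_request error. The following is a minimal sketch of such a check. The function name validate_speech_request is hypothetical, and the format and emotion sets are copied from the parameter table above, which may not be exhaustive.

```python
MAX_INPUT_CHARS = 5000
MIN_SPEED, MAX_SPEED = 0.5, 2.0
FORMATS = {"mp3", "wav", "ogg", "flac"}
EMOTIONS = {"neutral", "happy", "sad", "angry"}


def validate_speech_request(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []

    # model, voice, and input are the three required fields.
    for field in ("model", "voice", "input"):
        if not payload.get(field):
            problems.append(f"missing required field: {field}")

    if len(payload.get("input") or "") > MAX_INPUT_CHARS:
        problems.append(f"input exceeds {MAX_INPUT_CHARS} characters")

    # speed defaults to 1.0 when omitted.
    speed = payload.get("speed", 1.0)
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")

    if "format" in payload and payload["format"] not in FORMATS:
        problems.append(f"unsupported format: {payload['format']}")

    if "emotion" in payload and payload["emotion"] not in EMOTIONS:
        problems.append(f"unsupported emotion: {payload['emotion']}")

    return problems


ok = {
    "model": "fish-speech-v1.5",
    "voice": "nova",
    "input": "Hello, world!",
    "speed": 1.0,
    "format": "mp3",
    "emotion": "neutral",
}
bad = {
    "model": "fish-speech-v1.5",
    "voice": "nova",
    "input": "x" * 5001,
    "speed": 3.0,
    "format": "aac",
}

print(validate_speech_request(ok))
print(validate_speech_request(bad))
```

The server remains the source of truth. A local check like this only catches the constraints listed on this page.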

Speech to Text

POST /v1/audio/transcriptions

Transcribe audio or video to text, with optional speaker diarization and timestamps.

Request Body (multipart/form-data)
Parameter     Type      Required   Description
file          file      Yes        Audio/video file to transcribe
model         string    No         Model (whisper-large-v3, whisper-turbo)
language      string    No         Language code (auto-detected if not set)
diarization   boolean   No         Enable speaker diarization
timestamps    string    No         Timestamp granularity (word, segment)
Response
{
  "text": "Hello, how are you today?",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.2,
      "text": "Hello,",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.3,
      "end": 3.5,
      "text": "how are you today?",
      "speaker": "SPEAKER_00"
    }
  ]
}
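
The segments array is the part most clients work with. The sketch below turns the sample response above into a timestamped, speaker-labelled transcript. The helper names are hypothetical, and it assumes the speaker field may be absent when diarization is disabled, in which case a placeholder label is used.

```python
import json

SAMPLE = """
{
  "text": "Hello, how are you today?",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {"start": 0.0, "end": 1.2, "text": "Hello,", "speaker": "SPEAKER_00"},
    {"start": 1.3, "end": 3.5, "text": "how are you today?", "speaker": "SPEAKER_00"}
  ]
}
"""


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.s, rounding to tenths of a second."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def format_transcript(response: dict) -> list[str]:
    """Return one line per segment: [start-end] speaker: text."""
    lines = []
    for segment in response["segments"]:
        # Assumption: "speaker" is only present when diarization is enabled.
        speaker = segment.get("speaker", "UNKNOWN")
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start}-{end}] {speaker}: {segment['text']}")
    return lines


lines = format_transcript(json.loads(SAMPLE))
for line in lines:
    print(line)
```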

Available Models

Text-to-Speech Models
Model ID           Description                         Best For
fish-speech-v1.5   State-of-the-art multilingual TTS   General use, highest quality
orpheus-3b         Expressive Llama-based TTS          Emotional content, streaming
openvoice-v2       Fast voice cloning and conversion   Quick turnaround, voice conversion
xtts-v2            Versatile multilingual TTS          Long-form content, fine-tuning
Speech-to-Text Models
Model ID           Description                   Best For
whisper-large-v3   OpenAI Whisper Large v3       Highest accuracy, multilingual
whisper-turbo      Optimized Whisper for speed   Fast transcription
canary-qwen-2.5b   NVIDIA Canary Qwen            Lowest WER, English focus

Error Codes

Code                   Status   Description
invalid_api_key        401      Invalid or missing API key
insufficient_credits   402      Not enough credits for this operation
rate_limit_exceeded    429      Too many requests
invalid_request        400      Malformed request body
model_not_found        404      Requested model doesn't exist
voice_not_found        404      Requested voice doesn't exist
file_too_large         413      Uploaded file exceeds size limit
server_error           500      Internal server error
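
Client code typically needs to decide which of these errors are worth retrying. This page does not specify a retry policy or the shape of the error response body, so the sketch below is an assumption. It treats only rate_limit_exceeded and server_error as transient and computes exponential backoff delays for them. The names RETRYABLE and backoff_delays are hypothetical.

```python
# Error codes and HTTP statuses, copied from the table above.
ERROR_STATUS = {
    "invalid_api_key": 401,
    "insufficient_credits": 402,
    "rate_limit_exceeded": 429,
    "invalid_request": 400,
    "model_not_found": 404,
    "voice_not_found": 404,
    "file_too_large": 413,
    "server_error": 500,
}

# Assumption: only these two codes describe transient conditions.
RETRYABLE = {"rate_limit_exceeded", "server_error"}


def backoff_delays(code: str, max_attempts: int = 4, base: float = 1.0) -> list[float]:
    """Return the wait (in seconds) before each retry, or [] if not retryable.

    With max_attempts total attempts there are max_attempts - 1 retries,
    and each delay is double the previous one.
    """
    if code not in RETRYABLE:
        return []
    return [base * 2 ** attempt for attempt in range(max_attempts - 1)]


print(backoff_delays("rate_limit_exceeded"))
print(backoff_delays("invalid_api_key"))
```

Errors such as invalid_api_key or file_too_large will fail the same way on every attempt, so the sketch returns no delays for them and the caller should surface the error instead.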