API Documentation
Integrate speech-to-speech AI into your applications with our simple REST API. SDKs are available for Python, JavaScript, and more.
Quickstart
Get started with the Speech to Speech AI API in under 5 minutes.
Base URL
https://api.speechtospeechai.com/v1
1. Install the SDK
Python
pip install speechtospeechai
2. Generate Speech
Python
from speechtospeechai import Client

client = Client(api_key="your-api-key")

# Generate speech from text
audio = client.audio.speech.create(
    model="fish-speech-v1.5",
    voice="nova",
    input="Hello! Welcome to Speech to Speech AI."
)

# Save to file
audio.save("output.mp3")
3. Clone a Voice
Python
# Clone a voice from an audio sample
voice = client.voices.clone(
    name="My Custom Voice",
    files=["sample.mp3"]
)

# Use the cloned voice
audio = client.audio.speech.create(
    model="fish-speech-v1.5",
    voice=voice.id,
    input="This is my cloned voice speaking!"
)
audio.save("cloned_output.mp3")
Authentication
All API requests require authentication via an API key in the Authorization header.
Header
Authorization: Bearer YOUR_API_KEY
Get your API key from your account dashboard.
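For callers who are not using the SDK, the header can be attached with the Python standard library. The sketch below is illustrative only: `BASE_URL` is taken from the Base URL section, `authorized_request` is a hypothetical helper name, and it assumes endpoint paths are appended to the base URL without repeating `/v1`. It builds the request object but does not send it.

```python
import urllib.request

# Base URL as listed in the Quickstart section.
BASE_URL = "https://api.speechtospeechai.com/v1"


def authorized_request(path, api_key, data=None, method="POST"):
    """Build (but do not send) a request carrying the Bearer token.

    `path` is assumed to be relative to BASE_URL, e.g. "/audio/speech".
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    request = urllib.request.Request(BASE_URL + path, data=data, method=method)
    # Header format documented above: "Authorization: Bearer YOUR_API_KEY"
    request.add_header("Authorization", f"Bearer {api_key}")
    return request
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Reading the key from an environment variable rather than hard-coding it is usually the safer choice.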
Voice Cloning
POST /v1/voices/clone
Clone a voice from an audio sample.
Request Body (multipart/form-data)
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name for the cloned voice |
| file | file | Yes | Audio file (10-30 seconds recommended) |
| model | string | No | Model to use (default: fish-speech-v1.5) |
| language | string | No | Language code (auto-detected if not specified) |
Response
{
  "id": "voice_abc123xyz",
  "name": "My Custom Voice",
  "model": "fish-speech-v1.5",
  "language": "en",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}
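A client may want to turn this response into a typed object before using the voice ID. The following is a minimal sketch under stated assumptions: `ClonedVoice` and `parse_voice` are hypothetical names, the fields are exactly those in the sample above, and `"ready"` is the only status value this page documents, so any other value is simply treated as not ready.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ClonedVoice:
    id: str
    name: str
    model: str
    language: str
    status: str
    created_at: datetime

    @property
    def is_ready(self):
        # "ready" is the only status shown in the sample response.
        return self.status == "ready"


def parse_voice(payload):
    """Parse the JSON body returned by the voice cloning endpoint."""
    data = json.loads(payload)
    # Replace the trailing "Z" so fromisoformat also works before Python 3.11.
    created = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return ClonedVoice(
        id=data["id"],
        name=data["name"],
        model=data["model"],
        language=data["language"],
        status=data["status"],
        created_at=created,
    )
```

The resulting `voice.id` is the value to pass as `voice` in a text-to-speech request.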
Text to Speech
POST /v1/audio/speech
Convert text to speech audio.
Request Body (JSON)
{
  "model": "fish-speech-v1.5",
  "voice": "nova",
  "input": "Hello, world!",
  "speed": 1.0,
  "format": "mp3",
  "emotion": "neutral"
}
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | TTS model (fish-speech-v1.5, orpheus-3b, etc.) |
| voice | string | Yes | Voice ID (preset or cloned) |
| input | string | Yes | Text to convert (max 5000 chars) |
| speed | float | No | Speed multiplier (0.5-2.0, default 1.0) |
| format | string | No | Output format (mp3, wav, ogg, flac) |
| emotion | string | No | Emotion tag (neutral, happy, sad, angry) |
Response
Returns the audio file as binary data with the appropriate Content-Type header.
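Checking the documented limits on the client side avoids a round trip that would end in an invalid_request error. The sketch below builds the JSON body from the parameter table above. The helper name `build_speech_payload` is hypothetical, and it assumes the listed formats and emotion tags are the complete set. Optional fields are left out when not supplied so the server applies its own defaults.

```python
import json

# Limits and allowed values copied from the parameter table.
MAX_INPUT_CHARS = 5000
ALLOWED_FORMATS = {"mp3", "wav", "ogg", "flac"}
ALLOWED_EMOTIONS = {"neutral", "happy", "sad", "angry"}


def build_speech_payload(model, voice, text, speed=None,
                         audio_format=None, emotion=None):
    """Validate the arguments and return the request body as UTF-8 JSON bytes."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1-{MAX_INPUT_CHARS} characters")
    payload = {"model": model, "voice": voice, "input": text}
    if speed is not None:
        if not 0.5 <= speed <= 2.0:
            raise ValueError("speed must be between 0.5 and 2.0")
        payload["speed"] = speed
    if audio_format is not None:
        if audio_format not in ALLOWED_FORMATS:
            raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
        payload["format"] = audio_format
    if emotion is not None:
        if emotion not in ALLOWED_EMOTIONS:
            raise ValueError(f"emotion must be one of {sorted(ALLOWED_EMOTIONS)}")
        payload["emotion"] = emotion
    return json.dumps(payload).encode("utf-8")
```

The returned bytes can be sent as the POST body with a `Content-Type: application/json` header, and the binary response written straight to a file.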
Speech to Text
POST /v1/audio/transcriptions
Transcribe audio to text with speaker diarization and timestamps.
Request Body (multipart/form-data)
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | file | Yes | Audio/video file to transcribe |
| model | string | No | Model (whisper-large-v3, whisper-turbo) |
| language | string | No | Language code (auto-detected if not set) |
| diarization | boolean | No | Enable speaker diarization |
| timestamps | string | No | Timestamp granularity (word, segment) |
Response
{
  "text": "Hello, how are you today?",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.2,
      "text": "Hello,",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.3,
      "end": 3.5,
      "text": "how are you today?",
      "speaker": "SPEAKER_00"
    }
  ]
}
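With diarization enabled, consecutive segments from the same speaker are often easier to read as a single turn. The sketch below shows one way to post-process the response. The helper names `merge_speaker_turns` and `format_timestamp` are hypothetical, and it assumes segments arrive in chronological order with the fields shown in the sample.

```python
SAMPLE_RESPONSE = {
    "text": "Hello, how are you today?",
    "language": "en",
    "duration": 3.5,
    "segments": [
        {"start": 0.0, "end": 1.2, "text": "Hello,", "speaker": "SPEAKER_00"},
        {"start": 1.3, "end": 3.5, "text": "how are you today?", "speaker": "SPEAKER_00"},
    ],
}


def format_timestamp(seconds):
    """Render seconds as MM:SS.t, working in tenths to avoid rounding to 60.0."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths // 10:02d}.{tenths % 10}"


def merge_speaker_turns(segments):
    """Join adjacent segments that share a speaker into one turn."""
    turns = []
    for seg in segments:
        speaker = seg.get("speaker")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker continues: extend the current turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            turns.append({
                "speaker": speaker,
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
            })
    return turns


for turn in merge_speaker_turns(SAMPLE_RESPONSE["segments"]):
    span = f"{format_timestamp(turn['start'])}-{format_timestamp(turn['end'])}"
    print(f"[{span}] {turn['speaker']}: {turn['text']}")
```

For the sample response, the two segments collapse into one turn covering 0.0 to 3.5 seconds.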
Available Models
Text-to-Speech Models
| Model ID | Description | Best For |
|---|---|---|
| fish-speech-v1.5 | State-of-the-art multilingual TTS | General use, highest quality |
| orpheus-3b | Expressive Llama-based TTS | Emotional content, streaming |
| openvoice-v2 | Fast voice cloning and conversion | Quick turnaround, voice conversion |
| xtts-v2 | Versatile multilingual TTS | Long-form content, fine-tuning |
Speech-to-Text Models
| Model ID | Description | Best For |
|---|---|---|
| whisper-large-v3 | OpenAI Whisper Large v3 | Highest accuracy, multilingual |
| whisper-turbo | Optimized Whisper for speed | Fast transcription |
| canary-qwen-2.5b | NVIDIA Canary Qwen | Lowest WER, English focus |
Error Codes
| Code | HTTP Status | Description |
|---|---|---|
| invalid_api_key | 401 | Invalid or missing API key |
| insufficient_credits | 402 | Not enough credits for this operation |
| rate_limit_exceeded | 429 | Too many requests |
| invalid_request | 400 | Malformed request body |
| model_not_found | 404 | Requested model doesn't exist |
| voice_not_found | 404 | Requested voice doesn't exist |
| file_too_large | 413 | Uploaded file exceeds size limit |
| server_error | 500 | Internal server error |
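Of the errors above, only rate_limit_exceeded (429) and server_error (500) are likely to succeed on a later attempt; the others indicate a problem with the request itself. The sketch below retries those two statuses with exponential backoff. This retry policy is an assumption, not documented API behavior, and the names `call_with_retries` and `backoff_delay` are hypothetical. The `send` argument stands for any callable that performs one request and returns a `(status, body)` tuple.

```python
import time

# Assumption: only these statuses are worth retrying.
RETRYABLE_STATUSES = {429, 500}


def backoff_delay(attempt, base=1.0, cap=30.0):
    """Delay in seconds before the next try: base, 2*base, 4*base, ... up to cap."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(send, max_attempts=4, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        is_last = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last:
            return status, body
        sleep(backoff_delay(attempt))
```

Injecting `sleep` keeps the function easy to test. Adding random jitter to the delay is a common refinement when many clients may retry at the same moment.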