Quickstart

Get started with the Speech to Speech AI API in under 5 minutes.

Base URL

https://api.speechtospeechai.com/v1

1. Install the SDK

Python

pip install speechtospeechai

2. Generate Speech

Python

from speechtospeechai import Client

client = Client(api_key="your-api-key")

# Generate speech from text
audio = client.audio.speech.create(
    model="fish-speech-v1.5",
    voice="nova",
    input="Hello! Welcome to Speech to Speech AI."
)

# Save to file
audio.save("output.mp3")

3. Clone a Voice

Python

# Reuses the `client` created in step 2
# Clone a voice from an audio sample (10-30 seconds recommended)
voice = client.voices.clone(
    name="My Custom Voice",
    files=["sample.mp3"]
)

# Use the cloned voice
audio = client.audio.speech.create(
    model="fish-speech-v1.5",
    voice=voice.id,
    input="This is my cloned voice speaking!"
)

audio.save("cloned_output.mp3")

Authentication

Every API request must be authenticated by sending your API key as a bearer token in the Authorization header.

Header

Authorization: Bearer YOUR_API_KEY

Get your API key from your account dashboard.
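If you call the REST API directly instead of using the SDK, attach this header to every request. The sketch below uses only Python's standard library (`urllib.request`). The helper name `authorized_request` and the `SPEECHTOSPEECHAI_API_KEY` environment variable are illustrative assumptions, not part of the SDK.

```python
import os
import urllib.request
from typing import Optional

BASE_URL = "https://api.speechtospeechai.com/v1"


def authorized_request(path: str, api_key: str, data: Optional[bytes] = None,
                       content_type: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the bearer-token header."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type is not None:
        headers["Content-Type"] = content_type
    # A request with a body is a POST; without one, a GET.
    method = "POST" if data is not None else "GET"
    return urllib.request.Request(BASE_URL + path, data=data,
                                  headers=headers, method=method)


if __name__ == "__main__":
    # Hypothetical variable name: read the key from the environment, not source code.
    key = os.environ.get("SPEECHTOSPEECHAI_API_KEY", "YOUR_API_KEY")
    req = authorized_request("/audio/speech", key, data=b"{}",
                             content_type="application/json")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send the request.
```

Keeping the key in an environment variable keeps it out of source control.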

Voice Cloning

POST /v1/voices/clone

Clone a voice from an audio sample.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
`name`	string	Yes	Name for the cloned voice
`file`	file	Yes	Audio file (10-30 seconds recommended)
`model`	string	No	Model to use (default: fish-speech-v1.5)
`language`	string	No	Language code (auto-detected if not specified)
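Without the SDK, the clone request is an ordinary multipart upload. Here is a minimal standard-library sketch. `encode_multipart` and `clone_voice_form` are hypothetical helpers. Optional fields are left out entirely so the server applies its documented defaults. The field names follow this endpoint table, which documents a single `file` field.

```python
import mimetypes
import uuid
from typing import Dict, Optional, Tuple


def encode_multipart(fields: Dict[str, str],
                     files: Dict[str, Tuple[str, bytes]]) -> Tuple[bytes, str]:
    """Encode text fields and file parts as multipart/form-data.

    Returns (body, content_type); the content type carries the boundary.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(value.encode() + b"\r\n")
    for name, (filename, content) in files.items():
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(f'Content-Disposition: form-data; name="{name}"; '
                     f'filename="{filename}"\r\n'.encode())
        parts.append(f"Content-Type: {mime}\r\n\r\n".encode())
        parts.append(content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def clone_voice_form(name: str, audio_filename: str, audio: bytes,
                     model: Optional[str] = None,
                     language: Optional[str] = None) -> Tuple[bytes, str]:
    """Build the POST /v1/voices/clone body from the documented fields."""
    fields = {"name": name}
    if model is not None:       # omitted -> server default (fish-speech-v1.5)
        fields["model"] = model
    if language is not None:    # omitted -> language is auto-detected
        fields["language"] = language
    return encode_multipart(fields, {"file": (audio_filename, audio)})
```

Send the returned body with the returned string as the `Content-Type` header. If the third-party `requests` package is available, its `files=` argument produces the same encoding for you.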

Response

{
  "id": "voice_abc123xyz",
  "name": "My Custom Voice",
  "model": "fish-speech-v1.5",
  "language": "en",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}

Text to Speech

POST /v1/audio/speech

Convert text to speech audio.

Request Body (JSON)

{
  "model": "fish-speech-v1.5",
  "voice": "nova",
  "input": "Hello, world!",
  "speed": 1.0,
  "format": "mp3",
  "emotion": "neutral"
}

Parameter	Type	Required	Description
`model`	string	Yes	TTS model (fish-speech-v1.5, orpheus-3b, etc.)
`voice`	string	Yes	Voice ID (preset or cloned)
`input`	string	Yes	Text to convert (maximum 5,000 characters)
`speed`	float	No	Speed multiplier (0.5-2.0, default 1.0)
`format`	string	No	Output format (mp3, wav, ogg, flac)
`emotion`	string	No	Emotion tag (neutral, happy, sad, angry)
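Checking the documented limits on the client can save a round trip for requests the server would likely reject. This is a minimal sketch. `speech_payload` is a hypothetical helper, and it makes two assumptions: the listed `format` and `emotion` values are the complete sets, and the 5,000-character limit is counted the way Python's `len` counts (Unicode code points).

```python
import json
from typing import Any, Dict, Optional

FORMATS = {"mp3", "wav", "ogg", "flac"}
EMOTIONS = {"neutral", "happy", "sad", "angry"}  # assumed to be exhaustive
MAX_INPUT_CHARS = 5000


def speech_payload(model: str, voice: str, text: str, speed: float = 1.0,
                   fmt: Optional[str] = None,
                   emotion: Optional[str] = None) -> Dict[str, Any]:
    """Build a POST /v1/audio/speech body, checking documented limits first."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} chars; max is {MAX_INPUT_CHARS}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    payload: Dict[str, Any] = {"model": model, "voice": voice,
                               "input": text, "speed": speed}
    if fmt is not None:
        if fmt not in FORMATS:
            raise ValueError(f"format must be one of {sorted(FORMATS)}")
        payload["format"] = fmt
    if emotion is not None:
        if emotion not in EMOTIONS:
            raise ValueError(f"emotion must be one of {sorted(EMOTIONS)}")
        payload["emotion"] = emotion
    return payload


# Serialized body, ready to send with Content-Type: application/json
body = json.dumps(speech_payload("fish-speech-v1.5", "nova", "Hello, world!", fmt="mp3"))
```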

Response

Returns the raw audio bytes, with a Content-Type header matching the requested output format.

Speech to Text

POST /v1/audio/transcriptions

Transcribe an audio or video file to text, with optional speaker diarization and timestamps.

Request Body (multipart/form-data)

Parameter	Type	Required	Description
`file`	file	Yes	Audio/video file to transcribe
`model`	string	No	Model ID, e.g. whisper-large-v3 or whisper-turbo (see Speech-to-Text Models)
`language`	string	No	Language code (auto-detected if not set)
`diarization`	boolean	No	Enable speaker diarization
`timestamps`	string	No	Timestamp granularity (word, segment)
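The non-file fields above are sent as form values alongside `file`. This sketch builds them and sends only the fields you set, so the server's defaults (including auto-detected language) apply otherwise. `transcription_fields` is a hypothetical helper. Two further assumptions: booleans are sent as lowercase `"true"`/`"false"`, and the accepted model IDs are the ones listed under Speech-to-Text Models.

```python
from typing import Dict, Optional

STT_MODELS = {"whisper-large-v3", "whisper-turbo", "canary-qwen-2.5b"}
GRANULARITIES = {"word", "segment"}


def transcription_fields(model: Optional[str] = None,
                         language: Optional[str] = None,
                         diarization: Optional[bool] = None,
                         timestamps: Optional[str] = None) -> Dict[str, str]:
    """Build the form fields (other than `file`) for POST /v1/audio/transcriptions."""
    fields: Dict[str, str] = {}
    if model is not None:
        if model not in STT_MODELS:
            raise ValueError(f"unknown model {model!r}")
        fields["model"] = model
    if language is not None:
        fields["language"] = language
    if diarization is not None:
        # Assumption: form-data booleans are encoded as lowercase strings.
        fields["diarization"] = "true" if diarization else "false"
    if timestamps is not None:
        if timestamps not in GRANULARITIES:
            raise ValueError("timestamps must be 'word' or 'segment'")
        fields["timestamps"] = timestamps
    return fields
```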

Response

{
  "text": "Hello, how are you today?",
  "language": "en",
  "duration": 3.5,
  "segments": [
    {
      "start": 0.0,
      "end": 1.2,
      "text": "Hello,",
      "speaker": "SPEAKER_00"
    },
    {
      "start": 1.3,
      "end": 3.5,
      "text": "how are you today?",
      "speaker": "SPEAKER_00"
    }
  ]
}
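Diarized output often splits one speaker's turn across several segments. This sketch merges consecutive segments from the same speaker into readable turns. `speaker_turns` is a hypothetical helper. It assumes the `speaker` field may be absent when diarization is off and falls back to the label `UNKNOWN`.

```python
from typing import Any, Dict, List


def speaker_turns(response: Dict[str, Any]) -> List[str]:
    """Merge consecutive same-speaker segments into '[start-end s] speaker: text' lines."""
    turns: List[Dict[str, Any]] = []
    for seg in response.get("segments", []):
        speaker = seg.get("speaker", "UNKNOWN")
        if turns and turns[-1]["speaker"] == speaker:
            # Same speaker as the previous segment: extend that turn.
            turns[-1]["end"] = seg["end"]
            turns[-1]["text"] += " " + seg["text"]
        else:
            turns.append({"speaker": speaker, "start": seg["start"],
                          "end": seg["end"], "text": seg["text"]})
    return [f'[{t["start"]:.1f}-{t["end"]:.1f}s] {t["speaker"]}: {t["text"]}'
            for t in turns]
```

Applied to the response above, both segments belong to `SPEAKER_00`, so they collapse into a single turn: `[0.0-3.5s] SPEAKER_00: Hello, how are you today?`.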

Available Models

Text-to-Speech Models

Model ID	Description	Best For
`fish-speech-v1.5`	State-of-the-art multilingual TTS	General use, highest quality
`orpheus-3b`	Expressive Llama-based TTS	Emotional content, streaming
`openvoice-v2`	Fast voice cloning and conversion	Quick turnaround, voice conversion
`xtts-v2`	Versatile multilingual TTS	Long-form content, fine-tuning

Speech-to-Text Models

Model ID	Description	Best For
`whisper-large-v3`	OpenAI Whisper Large v3	Highest accuracy, multilingual
`whisper-turbo`	Optimized Whisper for speed	Fast transcription
`canary-qwen-2.5b`	NVIDIA Canary Qwen	Lowest WER, English focus

Error Codes

Code	HTTP Status	Description
`invalid_api_key`	401	Invalid or missing API key
`insufficient_credits`	402	Not enough credits for this operation
`rate_limit_exceeded`	429	Too many requests
`invalid_request`	400	Malformed request body
`model_not_found`	404	Requested model doesn't exist
`voice_not_found`	404	Requested voice doesn't exist
`file_too_large`	413	Uploaded file exceeds size limit
`server_error`	500	Internal server error
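Only `rate_limit_exceeded` (429) and `server_error` (500) look transient. The other codes need a change to the request, the key, or the account balance before a retry can succeed. This is a minimal retry sketch with exponential backoff and jitter. `ApiError` and `with_retries` are hypothetical names, and the delays (0.5 s, 1 s, 2 s, ...) are arbitrary choices, not documented limits.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses from the error table treated as transient.
RETRYABLE_STATUSES = {429, 500}


class ApiError(Exception):
    """Hypothetical exception carrying the HTTP status and the API error code."""

    def __init__(self, status: int, code: str):
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


def with_retries(call: Callable[[], T], max_attempts: int = 4,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying on 429/500 with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiError as err:
            # Non-retryable codes (400, 401, 402, 404, 413) fail immediately.
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")
```

Injecting `sleep` keeps the function easy to test without real waiting.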
