Voice Cloning
10-30 sec
29 languages
Real-time
Supported upload formats: MP3, WAV, FLAC, M4A, OGG (max 100 MB)
Record 10-30 seconds of clear speech
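Before uploading, a client can check a sample against the limits listed above. The following is a minimal sketch, not part of the service. `validate_sample` is a hypothetical helper, and the sketch assumes that "100 MB" means 100 * 1024 * 1024 bytes, which the page does not specify. It checks duration only for WAV files, because Python's standard-library `wave` module cannot decode MP3, FLAC, M4A, or OGG.

```python
import os
import wave

# Limits as listed on this page; the byte interpretation of "100 MB" is an assumption.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 100 * 1024 * 1024
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def validate_sample(path: str) -> list:
    """Return a list of problems with an audio sample (empty list if none found)."""
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds 100 MB")
    # Only WAV headers can be read with the standard library; other formats skip this check.
    if ext == ".wav":
        with wave.open(path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"duration {seconds:.1f} s is outside the recommended 10-30 s")
    return problems
```

Returning a list of messages rather than raising on the first failure lets a UI report every problem at once.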
Tips for Best Results
- Use 10-30 seconds of clear audio
- Minimal background noise
- Natural speaking pace
- Varied intonation improves cloning quality
- Single speaker only
How Voice Cloning Works
1. Upload Sample: Provide 10-30 seconds of clear speech from the voice you want to clone.
2. AI Analysis: Our AI extracts the speaker's voice characteristics, including tone, accent, and speaking patterns.
3. Voice Model: A personalized voice model is created and saved to your account.
4. Generate Speech: Use your cloned voice to generate speech from any text in any supported language.
Available Models
Fish Speech v1.5 (Recommended)
- Lowest WER (word error rate)
- 13 languages
- Zero-shot cloning
- Best for accuracy
OpenVoice v2 (Fast)
- Fastest processing
- Tone control
- Style transfer
- Lightweight
XTTS v2 (Versatile)
- 17 languages
- Emotion support
- Long-form audio
- Fine-tunable
Orpheus TTS 3B (Expressive)
- Most expressive
- Emotion tags
- ~100 ms streaming latency
- Llama backbone
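Word error rate, the metric behind the "Lowest WER" claim, is the number of word-level edits (substitutions S, deletions D, insertions I) needed to turn a hypothesis transcript into the reference text, divided by the number of reference words N: WER = (S + D + I) / N. For TTS models it is typically measured by transcribing the generated audio with a speech recognizer and comparing that transcript against the input text. This page does not say how its WER figures were obtained. The minimal sketch below computes WER with a word-level Levenshtein distance, assuming whitespace tokenization and no text normalization (no lowercasing or punctuation stripping).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # One row of the dynamic-programming table: prev[j] is the edit distance between
    # the reference prefix processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))  # empty reference: j insertions
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # empty hypothesis: i deletions
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (r != h)  # free if the words match
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with the transcript "the cat sit on mat" yields one substitution and one deletion, so WER = 2/6. Because insertions count too, WER can exceed 1.0.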
Clone Voices via API
Integrate voice cloning into your applications with our REST API: send a multipart/form-data POST request containing a voice name, an audio file, and a model identifier.
curl -X POST https://api.speechtospeechai.com/v1/voices/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "name=My Voice" \
  -F "file=@sample.mp3" \
  -F "model=fish-speech-v1.5"

# Response
{
  "id": "voice_abc123",
  "name": "My Voice",
  "model": "fish-speech-v1.5",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}
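The curl request above can also be sent from Python using only the standard library. This is a sketch. It assumes the endpoint accepts the same three form fields as the curl example and returns the JSON shown. `build_clone_request` and `clone_voice` are hypothetical helper names, not part of an official SDK. Building the request separately from sending it lets you inspect the multipart encoding without making a network call.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

# Endpoint and default model taken from the curl example on this page.
API_URL = "https://api.speechtospeechai.com/v1/voices/clone"


def build_clone_request(api_key: str, name: str, file_path: str,
                        model: str = "fish-speech-v1.5") -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to the curl -F example (hypothetical helper)."""
    boundary = uuid.uuid4().hex  # random boundary, very unlikely to occur in the payload
    parts = []
    # Plain text fields, like curl's -F "name=..." and -F "model=...".
    for field, value in (("name", name), ("model", model)):
        header = (f"--{boundary}\r\n"
                  f'Content-Disposition: form-data; name="{field}"\r\n\r\n')
        parts.append(header.encode() + value.encode() + b"\r\n")
    # File field, like curl's -F "file=@sample.mp3".
    filename = os.path.basename(file_path)
    content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    with open(file_path, "rb") as f:
        file_bytes = f.read()
    header = (f"--{boundary}\r\n"
              f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
              f"Content-Type: {content_type}\r\n\r\n")
    parts.append(header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary ends the body
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def clone_voice(api_key: str, name: str, file_path: str,
                model: str = "fish-speech-v1.5") -> dict:
    """Send the request and decode the JSON response body."""
    request = build_clone_request(api_key, name, file_path, model)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `clone_voice("YOUR_API_KEY", "My Voice", "sample.mp3")` should return a dictionary like the response above. Checking `voice["status"] == "ready"` before using the voice is a reasonable precaution; status values other than "ready" are not documented on this page.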