Voice Cloning

10-30 sec sample | 29 languages | Real-time
Supported upload formats: MP3, WAV, FLAC, M4A, OGG (max 100 MB)

Tips for Best Results
  • Use 10-30 seconds of clear audio
  • Minimal background noise
  • Natural speaking pace
  • Varied intonation helps quality
  • Single speaker only
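
The format, size, and duration limits above can be checked locally before uploading. The following is a minimal sketch, not the service's own validation: `check_sample` is a hypothetical helper, "100MB" is assumed to mean 100 MiB, and the duration check covers only uncompressed WAV files because it relies on Python's standard `wave` module.

```python
import wave
from pathlib import Path

# Limits copied from the page text; how the service enforces them is not documented here.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 100 * 1024 * 1024  # "max 100MB", read as 100 MiB (assumption)
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def check_sample(path):
    """Return a list of problems found; an empty list means the file passed every check."""
    p = Path(path)
    suffix = p.suffix.lower()
    problems = []
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if p.stat().st_size > MAX_BYTES:
        problems.append("file is larger than 100 MB")
    if suffix == ".wav":
        try:
            # wave reads the header of uncompressed PCM WAV files only.
            with wave.open(str(p), "rb") as w:
                seconds = w.getnframes() / w.getframerate()
        except (wave.Error, EOFError):
            problems.append("not a readable PCM WAV file")
        else:
            if not MIN_SECONDS <= seconds <= MAX_SECONDS:
                problems.append(f"duration {seconds:.1f}s is outside the 10-30 s range")
    return problems
```

Noise level, speaking pace, and speaker count are not checked here; those still need a listen.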

How Voice Cloning Works

1. Upload Sample
   Provide 10-30 seconds of clear speech from the voice you want to clone.

2. AI Analysis
   Our AI extracts voice characteristics such as tone, accent, and speaking patterns.

3. Voice Model
   A personalized voice model is created and saved to your account.

4. Generate Speech
   Use your cloned voice to generate speech from any text, in any language the selected model supports.

Available Models

Fish Speech v1.5 (Recommended)
  • Lowest word error rate (WER)
  • 13 languages
  • Zero-shot cloning
  • Best for accuracy

OpenVoice v2 (Fast)
  • Fastest processing
  • Tone control
  • Style transfer
  • Lightweight

XTTS v2 (Versatile)
  • 17 languages
  • Emotion support
  • Long-form audio
  • Fine-tunable

Orpheus TTS 3B (Expressive)
  • Most expressive
  • Emotion tags
  • 100 ms streaming
  • Llama backbone
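
For applications that choose a model programmatically, the cards above can be kept as a small lookup table. This is an illustrative sketch only: the keys are the display names from this page, not confirmed API identifiers, and `model_for_tag` and `models_with` are hypothetical helpers.

```python
# Labels and features copied from the model cards; nothing here is measured or verified.
MODELS = {
    "Fish Speech v1.5": {
        "tag": "Recommended",
        "features": {"lowest WER", "13 languages", "zero-shot cloning", "best for accuracy"},
    },
    "OpenVoice v2": {
        "tag": "Fast",
        "features": {"fastest processing", "tone control", "style transfer", "lightweight"},
    },
    "XTTS v2": {
        "tag": "Versatile",
        "features": {"17 languages", "emotion support", "long-form audio", "fine-tunable"},
    },
    "Orpheus TTS 3B": {
        "tag": "Expressive",
        "features": {"most expressive", "emotion tags", "100 ms streaming", "Llama backbone"},
    },
}


def model_for_tag(tag):
    """Return the display name of the model carrying the given tag, or None."""
    for name, card in MODELS.items():
        if card["tag"].lower() == tag.lower():
            return name
    return None


def models_with(feature):
    """Return display names of all models whose card lists the given feature."""
    return [name for name, card in MODELS.items() if feature in card["features"]]
```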

Clone Voices via API

Integrate voice cloning into your applications with our simple REST API.

API Documentation
curl -X POST https://api.speechtospeechai.com/v1/voices/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "name=My Voice" \
  -F "file=@sample.mp3" \
  -F "model=fish-speech-v1.5"

# Response
{
  "id": "voice_abc123",
  "name": "My Voice",
  "model": "fish-speech-v1.5",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}
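
The same request can be assembled from Python using only the standard library. This is a sketch under assumptions rather than an official client: the URL, form fields, and response keys are taken from the curl example above, while `build_clone_request` and `parse_clone_response` are hypothetical helper names, and the `application/octet-stream` part type is a generic choice that the page does not specify. The request is built but not sent.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.speechtospeechai.com/v1/voices/clone"  # from the curl example


def build_clone_request(api_key, name, filename, audio_bytes, model="fish-speech-v1.5"):
    """Build (without sending) a multipart/form-data POST mirroring the curl call."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text fields: name and model.
    for field, value in (("name", name), ("model", model)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{field}"\r\n\r\n{value}\r\n'.encode()
        )
    # File field carrying the raw audio bytes.
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    parts.append(file_header + audio_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return urllib.request.Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_clone_response(raw):
    """Decode a response body shaped like the sample above; return (voice_id, is_ready)."""
    voice = json.loads(raw)
    return voice["id"], voice["status"] == "ready"
```

Sending the request would look like `urllib.request.urlopen(build_clone_request(key, "My Voice", "sample.mp3", data))`. The sample response shows only the "ready" status, so any other status values would need to be confirmed against the API documentation.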