Voice Cloning

10-30 sec · 29 languages · Real-time

Drag & drop your audio file here, or click to browse

Supports MP3, WAV, FLAC, M4A, OGG (max 100MB)

Click to start recording

Record 10-30 seconds of clear speech

Audio URL

Voice Settings

Voice Name

Model

Language

Quality

Tips for Best Results

Use 10-30 seconds of clear audio
Minimal background noise
Natural speaking pace
Varied intonation helps quality
Single speaker only
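
The format, size, and length limits listed on this page can be checked locally before uploading. The sketch below is a minimal illustration, not the service's own validation: the function names are hypothetical, "100MB" is assumed to mean 100 * 1024 * 1024 bytes, and duration is only measured for uncompressed WAV files because the Python standard library cannot decode the other formats.

```python
import wave
from pathlib import Path

# Limits taken from this page: accepted formats, 100 MB cap, 10-30 s of speech.
ALLOWED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
MAX_BYTES = 100 * 1024 * 1024  # assumption: "100MB" means 100 MiB
MIN_SECONDS, MAX_SECONDS = 10.0, 30.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path):
    """Return a list of problems; an empty list means the sample looks usable."""
    path = Path(path)
    problems = []
    suffix = path.suffix.lower()
    if suffix not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if path.stat().st_size > MAX_BYTES:
        problems.append("file is larger than 100 MB")
    # Duration is only checked for WAV; other formats would need a decoder.
    if suffix == ".wav":
        seconds = wav_duration_seconds(path)
        if not MIN_SECONDS <= seconds <= MAX_SECONDS:
            problems.append(f"duration {seconds:.1f}s is outside 10-30s")
    return problems
```

Calling `check_sample("sample.wav")` returns an empty list when nothing is wrong. Noise level, speaking pace, and speaker count cannot be verified this way and still need a listen.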

How Voice Cloning Works

Upload Sample

Provide 10-30 seconds of clear speech from the voice you want to clone.

AI Analysis

Our AI extracts voice characteristics, tone, accent, and speaking patterns.

Voice Model

A personalized voice model is created and saved to your account.

Generate Speech

Use your cloned voice to generate speech from any text in any supported language.

Available Models

Fish Speech v1.5

Recommended

Lowest word error rate (WER)
13 languages
Zero-shot cloning
Best for accuracy

OpenVoice v2

Fast

Fastest processing
Tone control
Style transfer
Lightweight

XTTS v2

Versatile

17 languages
Emotion support
Long-form audio
Fine-tunable

Orpheus TTS 3B

Expressive

Most expressive
Emotion tags
100 ms streaming latency
Llama backbone
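
One way to choose between the four models in code is a small lookup table keyed by what matters most for a given job. This is only a sketch of the model cards above: `fish-speech-v1.5` is the only identifier that appears on this page, so the other three ids, the `strength` labels, and the `pick_model` helper are hypothetical placeholders.

```python
# Traits copied from the model cards on this page. Only "fish-speech-v1.5"
# is a confirmed API identifier; the other ids are hypothetical placeholders.
MODELS = {
    "fish-speech-v1.5": {"label": "Recommended", "languages": 13, "strength": "accuracy"},
    "openvoice-v2": {"label": "Fast", "languages": None, "strength": "speed"},
    "xtts-v2": {"label": "Versatile", "languages": 17, "strength": "long-form"},
    "orpheus-tts-3b": {"label": "Expressive", "languages": None, "strength": "expressiveness"},
}
DEFAULT_MODEL = "fish-speech-v1.5"  # the card marked "Recommended"


def pick_model(priority=None):
    """Return the first model whose strength matches, else the recommended default."""
    for model_id, traits in MODELS.items():
        if traits["strength"] == priority:
            return model_id
    return DEFAULT_MODEL
```

For example, `pick_model("speed")` selects the OpenVoice entry, while an unknown or missing priority falls back to the recommended model. A `languages` value of `None` means the page does not state a language count for that model.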

Clone Voices via API

Integrate voice cloning into your applications with our simple REST API.

API Documentation

curl -X POST https://api.speechtospeechai.com/v1/voices/clone \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "name=My Voice" \
  -F "file=@sample.mp3" \
  -F "model=fish-speech-v1.5"

# Response
{
  "id": "voice_abc123",
  "name": "My Voice",
  "model": "fish-speech-v1.5",
  "status": "ready",
  "created_at": "2026-02-03T12:00:00Z"
}
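
The same request can be assembled in Python with only the standard library. The sketch below mirrors the curl call and the sample response above. The helper names are hypothetical, the file part's `application/octet-stream` content type is an assumption, and treating any status other than `ready` as an error is a guess, since this page documents no other status values.

```python
import json
import uuid
import urllib.request

CLONE_URL = "https://api.speechtospeechai.com/v1/voices/clone"


def build_clone_request(api_key, name, model, filename, audio_bytes):
    """Build (but do not send) the multipart POST that mirrors the curl call."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain text form fields: name and model.
    for field, value in (("name", name), ("model", model)):
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{field}"\r\n\r\n{value}\r\n'.encode()
        )
    # File field carrying the raw audio bytes.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        CLONE_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_clone_response(body):
    """Parse the JSON response and return the voice id once the voice is ready."""
    voice = json.loads(body)
    # Assumption: only "ready" means the voice can be used.
    if voice.get("status") != "ready":
        raise RuntimeError(f"voice not ready: {voice.get('status')}")
    return voice["id"]
```

To send the request, pass the result of `build_clone_request(...)` to `urllib.request.urlopen` and hand the response body to `parse_clone_response`. Keeping construction and parsing separate from the network call makes both easy to test offline.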