AI Models
We integrate leading open-source speech AI models. Every model is available through both the API and the web interface.
Text-to-Speech Models
Fish Speech v1.5 (Recommended)
State-of-the-art multilingual TTS model with a low word error rate. Supports zero-shot voice cloning from a 10-30 second reference sample.
Orpheus TTS 3B
Llama-based expressive TTS model with emotion control. Produces human-like speech with natural intonation, and supports inline emotion tags (such as laughs or sighs) in the input text.
OpenVoice v2
Lightweight voice cloning with tone and style control. Great for voice conversion and quick cloning with minimal resources.
XTTS v2
Coqui's versatile multilingual TTS. Supports 17 languages with voice cloning and fine-tuning capabilities.
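Fish Speech v1.5 clones a voice from a 10-30 second reference sample, so a quick local duration check can catch clips that are too short or too long before you upload them. Below is a minimal sketch using only Python's standard `wave` module, assuming the reference is an uncompressed PCM WAV file. The function names and the strict 10-30 s bounds are hypothetical illustrations, not part of our API.

```python
import io
import wave

# Assumed bounds, taken from the 10-30 second guideline for Fish Speech v1.5.
# This check is a hypothetical client-side helper, not part of our API.
MIN_REFERENCE_SECONDS = 10.0
MAX_REFERENCE_SECONDS = 30.0


def reference_clip_seconds(wav_bytes: bytes) -> float:
    """Return the duration of an uncompressed PCM WAV clip in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_clip(wav_bytes: bytes) -> bool:
    """True if the clip length falls inside the assumed 10-30 s window."""
    seconds = reference_clip_seconds(wav_bytes)
    return MIN_REFERENCE_SECONDS <= seconds <= MAX_REFERENCE_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> bytes:
    """Hypothetical demo helper: build a mono 16-bit WAV of pure silence."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buffer.getvalue()


if __name__ == "__main__":
    print(is_valid_reference_clip(make_silent_wav(12.0)))  # True
    print(is_valid_reference_clip(make_silent_wav(5.0)))   # False
```

Compressed formats such as MP3 would need a decoding library first; the standard `wave` module only reads PCM WAV.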
Speech-to-Text Models
Whisper Large v3 (Most Used)
OpenAI's industry-standard speech recognition model. High accuracy across roughly 100 languages, with robust handling of background noise.
Canary Qwen 2.5B
NVIDIA's speech-augmented language model (SALM). At release it ranked first on the Hugging Face Open ASR Leaderboard, with the lowest average word error rate (WER).
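Word error rate (WER), the metric used to compare these models, counts the word substitutions (S), deletions (D), and insertions (I) needed to turn a transcript into the reference, divided by the number of reference words (N): WER = (S + D + I) / N. The sketch below computes it with a word-level Levenshtein distance. It is an illustration of the standard definition, not our scoring code; published leaderboards usually normalize text first (casing, punctuation), which this sketch does not do.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N via word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions are counted, WER can exceed 1.0 when a model hallucinates extra words; lower is better.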
Conversational Models
PersonaPlex 7B (Featured)
NVIDIA's full-duplex conversational AI model. It listens and speaks at the same time, handling natural interruptions and backchannels (short listener cues such as "uh-huh").
Moshi (Kyutai)
Kyutai's full-duplex speech model, on which PersonaPlex is built. Supports real-time voice conversations with natural turn-taking.