AI Models

We integrate the best open-source speech AI models. All models are available via API and web interface.

Text-to-Speech Models

Fish Speech v1.5 (Recommended)

State-of-the-art multilingual TTS model with the lowest word error rate. Supports zero-shot voice cloning from 10-30 second samples.

Languages: 13
Parameters: 4B
VRAM: ~6 GB
Latency: ~300 ms
Features: Zero-shot Cloning, Multilingual, Streaming
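The 10-30 second sample window for zero-shot cloning can be checked locally before a clip is submitted. The following is a minimal sketch using only Python's standard library. It assumes an uncompressed WAV file, and the helper names `wav_duration_seconds`, `is_valid_clone_sample`, and `make_silent_wav` are illustrative, not part of any model's or platform's API.

```python
import io
import wave

# Bounds taken from the Fish Speech v1.5 description (10-30 second samples).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_clone_sample(source) -> bool:
    """Hypothetical helper: True if the clip length falls in the 10-30 s window."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 16000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, used only for the demo."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(is_valid_clone_sample(make_silent_wav(15)))  # True
    print(is_valid_clone_sample(make_silent_wav(5)))   # False
```

Compressed formats such as MP3 or OGG would need a decoding library, since the `wave` module only reads uncompressed WAV files.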
Orpheus TTS 3B

Llama-based expressive TTS with emotion control. Produces human-like speech with natural intonation and lets you steer delivery with inline emotion tags.

Languages: English
Parameters: 3B
VRAM: ~8 GB
Latency: ~100 ms
Features: Emotion Control, Real-time Streaming, Voice Cloning

OpenVoice v2

Lightweight voice cloning with tone and style control. Great for voice conversion and quick cloning with minimal resources.

Languages: 4
Parameters: ~500M
VRAM: ~3 GB
Latency: ~150 ms
Features: Fast, Voice Conversion, Style Control

XTTS v2

Coqui's versatile multilingual TTS. Supports 17 languages with voice cloning and fine-tuning capabilities.

Languages: 17
Parameters: ~1.5B
VRAM: ~4 GB
Latency: ~400 ms
Features: Multilingual, Fine-tunable, Long-form
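The spec figures for the four TTS models lend themselves to a simple lookup when choosing a model for a given GPU. The sketch below copies the listed VRAM and latency values into a small table and picks the lowest-latency model that fits a memory budget. The `TTSModel` class and `pick_tts_model` function are hypothetical helpers for illustration, not part of the API, and the numbers are the approximate figures quoted on this page.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TTSModel:
    name: str
    languages: int   # number of supported languages as listed on this page
    vram_gb: float   # approximate VRAM requirement
    latency_ms: int  # approximate latency


# Values copied from the model cards on this page (all approximate).
TTS_MODELS = [
    TTSModel("Fish Speech v1.5", 13, 6, 300),
    TTSModel("Orpheus TTS 3B", 1, 8, 100),
    TTSModel("OpenVoice v2", 4, 3, 150),
    TTSModel("XTTS v2", 17, 4, 400),
]


def pick_tts_model(max_vram_gb: float, min_languages: int = 1) -> Optional[TTSModel]:
    """Return the lowest-latency model that fits the budget, or None if none fits."""
    candidates = [
        m for m in TTS_MODELS
        if m.vram_gb <= max_vram_gb and m.languages >= min_languages
    ]
    return min(candidates, key=lambda m: m.latency_ms, default=None)


if __name__ == "__main__":
    print(pick_tts_model(8).name)                    # Orpheus TTS 3B
    print(pick_tts_model(6, min_languages=10).name)  # Fish Speech v1.5
    print(pick_tts_model(2))                         # None
```

Latency is used as the tie-breaker here only as an example. Voice quality, cloning support, or language coverage may matter more for a given application.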

Speech-to-Text Models

Whisper Large v3 (Most Used)

OpenAI's industry-standard speech recognition model. Excellent accuracy across 100+ languages with robust noise handling.

Languages: 100+
Parameters: 1.5B
VRAM: ~10 GB
WER (English): ~4%
Features: High Accuracy, Multilingual, Timestamps

Canary Qwen 2.5B

NVIDIA's speech-augmented language model. Currently tops the Open ASR Leaderboard with the lowest average WER across its benchmark datasets.

Languages: English focus
Parameters: 2.5B
VRAM: ~8 GB
WER (English): ~5.6%
Features: Lowest WER, LLM-powered
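Word error rate (WER), the metric quoted for both speech-to-text models, is the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of reference words. Below is a minimal sketch based on word-level edit distance. Real benchmarks also normalize text (casing, punctuation, number formatting) before scoring, so figures measured on different test sets or with different normalizers are not directly comparable. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"{wer:.1%}")  # 33.3%
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.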

Conversational Models

PersonaPlex 7B (Featured)

NVIDIA's full-duplex conversational AI. Listens and speaks at the same time, handling natural interruptions and backchannels.

Parameters: 7B
Base Model: Moshi
Audio Rate: 24 kHz
Latency: <200 ms
Features: Full-Duplex, Custom Personas, Voice Conditioning

Moshi (Kyutai)

The original full-duplex speech model that PersonaPlex is built upon. Real-time voice conversations with natural turn-taking.

Parameters: 7B
LLM: Helium
Codec: Mimi
License: CC-BY-4.0
Features: Full-Duplex, Foundation Model
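To put the 24 kHz audio rate and sub-200 ms latency listed for PersonaPlex in context, the arithmetic below converts them to sample counts. It also assumes a frame rate of 12.5 Hz for the Mimi codec, taken from Mimi's published description rather than from this page, to show how many codec frames fit inside the latency budget. This is a back-of-the-envelope sketch, not a measurement of either model.

```python
SAMPLE_RATE_HZ = 24_000    # audio rate listed for PersonaPlex
LATENCY_BUDGET_MS = 200    # upper bound listed for PersonaPlex latency
MIMI_FRAME_RATE_HZ = 12.5  # assumption: not stated on this page


def samples_in(duration_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples covered by a duration at the given sample rate."""
    return round(sample_rate_hz * duration_ms / 1000)


frame_ms = 1000 / MIMI_FRAME_RATE_HZ             # duration of one codec frame
samples_per_frame = samples_in(frame_ms)         # samples encoded per frame
budget_samples = samples_in(LATENCY_BUDGET_MS)   # samples inside the budget
frames_in_budget = LATENCY_BUDGET_MS / frame_ms  # codec frames inside the budget

if __name__ == "__main__":
    print(frame_ms)           # 80.0
    print(samples_per_frame)  # 1920
    print(budget_samples)     # 4800
    print(frames_in_budget)   # 2.5
```

Under these assumptions, a 200 ms budget covers only two and a half codec frames, which shows how little audio a full-duplex model can buffer before it must respond.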