Miso TTS 8B
Text-to-speech with the MisoLabs/MisoTTS model, an 8B-parameter Sesame CSM-style model that generates Mimi audio codes from text, with optional voice continuation from a reference clip.
Provide a reference audio clip and its transcript to clone a voice, or leave both empty to use the default voice. Outputs carry an imperceptible watermark identifying the audio as AI-generated.
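
The two input modes can be sketched as a small validation step that runs before any request reaches the model. This is an illustrative sketch only: `TTSRequest` and `build_request` are hypothetical names, not part of MisoTTS, and rejecting a reference clip supplied without its transcript (or the reverse) is an assumption about sensible behavior rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TTSRequest:
    """Hypothetical request container; not part of the MisoTTS API."""
    text: str
    ref_audio_path: Optional[str] = None
    ref_transcript: Optional[str] = None

    @property
    def clones_voice(self) -> bool:
        # Voice cloning is requested only when a reference clip is present.
        return self.ref_audio_path is not None


def build_request(
    text: str,
    ref_audio_path: Optional[str] = None,
    ref_transcript: Optional[str] = None,
) -> TTSRequest:
    """Normalize inputs and enforce 'both reference fields or neither'."""
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")

    # Treat empty or whitespace-only fields as "not provided".
    audio = (ref_audio_path or "").strip() or None
    transcript = (ref_transcript or "").strip() or None

    # Assumption: a clip without its transcript (or vice versa) is an error.
    if (audio is None) != (transcript is None):
        raise ValueError(
            "provide both the reference audio and its transcript, or neither"
        )
    return TTSRequest(text=text, ref_audio_path=audio, ref_transcript=transcript)
```

For example, `build_request("Hello there")` yields a default-voice request, while `build_request("Hello there", "ref.wav", "transcript of the clip")` yields a voice-cloning request. The file name `ref.wav` is a placeholder.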