Hacker News | Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX)

Show HN: Real-time local TTS (31M params, 5.6x CPU, voice cloning, ONNX) (github.com/ZDisket)

4 points by ZDisket 3 days ago | 4 comments

chenglin97 3 days ago
Does it support languages other than English? I'm a Chinese speaker.
- ZDisket 2 days ago
  No multilingual capabilities yet, although that is planned for the next iteration.
popalchemist 3 days ago
Given the architecture, is there a way to force the use of specific phonemes for hard-to-pronounce words? If so, that's big.
- ZDisket 2 days ago
  Yes. Specifically, the pipeline is text -> phonemizer -> phonemized text -> TTS model -> audio. You just have to modify the phonemizer's dictionary.
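
A rough sketch of the dictionary override described above, assuming the phonemizer's dictionary behaves like a plain word-to-phonemes mapping. The names (`LEXICON`, `phonemize`), the ARPAbet-style symbols, and the `<word>` fallback are all hypothetical and not taken from the project; a real phonemizer would likely fall back to a grapheme-to-phoneme model for unknown words.

```python
import re

# Hypothetical lexicon: word -> phoneme string (ARPAbet-style, for illustration only).
LEXICON = {
    "hello": "HH AH0 L OW1",
    "world": "W ER1 L D",
}

def phonemize(text, lexicon):
    """Look up each word in the lexicon; unknown words are kept, wrapped in <...>."""
    words = re.findall(r"[a-z']+", text.lower())
    return [lexicon.get(w, f"<{w}>") for w in words]

# Force a specific pronunciation for a hard word by adding a dictionary entry.
custom = dict(LEXICON)
custom["nguyen"] = "W IH1 N"

# This phoneme sequence is what would be handed to the TTS model in the pipeline.
phonemes = phonemize("Hello Nguyen", custom)
```

Copying the lexicon before editing keeps the default pronunciations intact, so an override can be scoped to a single utterance.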
adriencr81 3 days ago
[dead]