โ† Back to docs

๐Ÿ”Š Voice / TTS Architecture

Four-tier text-to-speech fallback chain. Best-quality cloud voices when online; bundled neural models when not. Used by Prism AAC, in-call notifications, accessibility narration, and any app feature that speaks.

---

๐ŸŽš๏ธ Four-Tier Fallback Chain

| Tier | Engine | Quality | Offline | Cost | When |

|---|---|---|---|---|---|

| 1 | Inworld TTS-2 (cloud) | Best-in-class โ€” natural prosody, 60+ voices, voice cloning | โ€” | Per-character billing; free for ro/uk/ru/de/ko/ar (Synalux absorbs cost) | Default for paid tiers; default for free tier in subsidized languages |

| 1.5 | Kokoro-82M neural (WASM) | Very good โ€” locally-run neural voice | โœ… en/es/fr/pt/ja/zh | $0 | Free tier non-subsidized langs; offline; Tier 1 unavailable |

| 2 | OS Web Speech API premium voices | Good โ€” varies by OS (Apple voices best) | โœ… | $0 | Tier 1.5 unavailable in user's language; bandwidth-saver mode |

| 3 | WASM espeak-ng | Acceptable โ€” robotic but always works | โœ… | $0 | Last resort; covers 100+ languages |

!Voice Dictation Interface

The chain is automatic โ€” no user configuration required. The voice picker (paid) lets users override their default with any Tier 1 voice.

---

๐ŸŽค Voice Cloning (Paid)

"Speak in MY voice โ€” I trained it last week."

* 3-minute training sample uploaded once via the Voice Picker UI.

* Inworld voice-clone training takes ~10 minutes; the cloned voice is then available alongside the standard 60+ voices.

* Workspace-scoped โ€” clones are bound to the workspace; staff voices auto-shared, patient voices private.

* Use cases: AAC users speak in a parent's voice; clinicians who prefer to hear notes in their own voice; demo videos for parents/caregivers.

!Clinical Voice Dictation

---

๐ŸŒ Subsidized Languages (Free Tier)

Synalux absorbs Inworld TTS-2 cost for these languages on the free tier so users in regions where the platform serves underserved populations get premium voices at no cost:

ro (Romanian) ยท uk (Ukrainian) ยท ru (Russian) ยท de (German) ยท ko (Korean) ยท ar (Arabic)

---

๐Ÿฉบ Why This Matters Clinically

* Children with speech impairments depend on AAC's TTS for their actual voice โ€” robotic Tier-3 fallbacks aren't acceptable for daily use; Tier 1 / 1.5 must succeed.

* Bilingual households need accurate prosody in both languages โ€” Tier 1 is critical for non-English.

* Offline reliability โ€” a child at school without Wi-Fi must still have their device speak. Tier 1.5 + 2 + 3 are bundled; the device always talks.

---

๐Ÿ—๏ธ Architecture

``

POST /api/v1/tts Generate TTS audio (auto tier-route based on lang + tier + connectivity)

POST /api/v1/tts/public Anonymous TTS for AAC widgets (rate-limited per IP)

GET /api/v1/tts/voices List available voices for the user's tier + language

POST /api/v1/voices/clone Submit a voice-clone training sample (paid)

`

Routing logic:

`

synthesize(text, lang, voice?) โ†’

if voice && Tier1.available(voice): return Tier1.speak(text, voice)

if Tier1.available(lang) && (paid || Tier1.subsidizes(lang)): return Tier1.speak(text)

if Tier1_5.supports(lang): return Tier1_5.speak(text) // Kokoro

if Tier2.has(lang): return Tier2.speak(text) // Web Speech API

return Tier3.speak(text) // espeak-ng

``

---

๐Ÿ’ณ Plans

| | Free | Standard | Advanced | Enterprise |

|---|---|---|---|---|

| Default Inworld voice | โœ… | โœ… | โœ… | โœ… |

| All 60+ Inworld voices | โ€” | โœ… | โœ… | โœ… |

| Voice cloning | โ€” | โ€” | โœ… | โœ… |

| In-call clinical dictation TTS feedback | โ€” | โœ… | โœ… | โœ… |

| Custom voice library (workspace-curated) | โ€” | โ€” | โ€” | โœ… |

See full pricing โ†’

---

๐Ÿ”„ Where TTS Is Used

* Prism AAC โ€” picture tiles โ†’ speech, keyboard speak button, math panel AI tutor.

* In-call notifications โ€” meeting started, patient joined, recording started.

* Accessibility narration โ€” for users with low vision; reads notifications + form errors.

* AAC chat โ€” assistant responses can be auto-spoken with the user's selected voice.