
🎙️ Transcription

Audio → text, in-browser, HIPAA-grade. Whisper WASM for clinical sessions; cloud Whisper for high-fidelity batch jobs. Audio never leaves the device unless the user explicitly opts in to the cloud upgrade.

---

🎚️ Three-Path Transcription

| Path | Engine | When | Quality | Privacy |
|---|---|---|---|---|
| In-browser | Whisper WASM (whisper.cpp compiled to WASM) | Default for live dictation and AAC voice input | Excellent for clear English/Spanish/French; noise-sensitive | ✅ Audio never leaves device |
| Server cloud | OpenAI Whisper-large-v3 via /api/v1/transcribe | Long-form batch jobs (1h+ session recordings); user opts in | Best: handles noise, accents, and medical terminology | Audit-logged; HIPAA BAA in place |
| Live SOAP dictation | WhisperX (word-aligned timestamps) | Clinician dictating SOAP notes during session | Excellent, plus word timestamps for speaker attribution | In-browser by default |


---

🩺 Live SOAP Dictation

The headline transcription use case. A clinician opens the session note, presses dictate, speaks the session out loud, and Synalux:

  • Transcribes locally (WASM) with word timestamps.
  • Identifies sections (Subjective / Objective / Assessment / Plan) by pattern.
  • Extracts ABC data (Antecedent / Behavior / Consequence) from natural-language descriptions.
  • Drafts the structured note for one-click sign-off.

See Applied Behavior Analysis and Clinical Notes Documentation for the downstream flow.
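
The section-identification step in the dictation flow can be illustrated with a small sketch. The source only says sections are identified "by pattern" and does not document the patterns, so everything here is an assumption: the `Word` shape, the cue words, and the `splitIntoSoapSections` helper are hypothetical names, and the rule (a spoken section name starts a new section) is deliberately naive. A real implementation would need to avoid treating an ordinary use of the word "plan" as a heading.

```typescript
// Hypothetical types; the real whisperService.ts shapes are not shown in this doc.
type SoapSection = "subjective" | "objective" | "assessment" | "plan";

interface Word {
  text: string;
  startMs: number; // word-level timestamp from the transcriber
}

interface SoapSegment {
  section: SoapSection;
  startMs: number; // timestamp of the first word attributed to the section
  text: string;
}

// Assumed cue rule: a word that is exactly a section name opens that section.
function cueFor(rawWord: string): SoapSection | undefined {
  const token = rawWord.toLowerCase().replace(/[^a-z]/g, "");
  switch (token) {
    case "subjective":
    case "objective":
    case "assessment":
    case "plan":
      return token;
    default:
      return undefined;
  }
}

function splitIntoSoapSections(words: Word[]): SoapSegment[] {
  type Draft = { section: SoapSection; startMs: number; words: string[] };
  const drafts: Draft[] = [];
  let current: Draft | undefined;

  for (const word of words) {
    const cue = cueFor(word.text);
    if (cue !== undefined) {
      // The cue word itself is a heading, so it is not copied into the text.
      current = { section: cue, startMs: word.startMs, words: [] };
      drafts.push(current);
      continue;
    }
    if (current === undefined) {
      // Assumption: speech before any cue is filed under Subjective.
      current = { section: "subjective", startMs: word.startMs, words: [] };
      drafts.push(current);
    }
    current.words.push(word.text);
  }

  return drafts.map((d) => ({
    section: d.section,
    startMs: d.startMs,
    text: d.words.join(" "),
  }));
}
```

Keeping `startMs` on each segment lets the drafted note link back to the matching point in the recording, which is one plausible use of the word timestamps.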

---

🗣️ AAC Voice Input

For Prism AAC users who can speak some words but use AAC for harder utterances:

* Whisper WASM transcribes speech in-browser and populates the keyboard input.
* The transcript is combined with autocorrect (Gemini 2.5 Flash-Lite) for typo recovery.
* Locale-aware: matches the user's chosen language and supports code-switching (e.g. English words inside a Romanian sentence).

---

🏗️ Architecture

```
POST /api/v1/transcribe    Server-side cloud Whisper (long-form, audit-logged)
  body:    { audio_url | audio_b64, lang?, model?='whisper-large-v3' }
  returns: { text, segments[], language, duration_ms }
```
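
As an illustration of calling the endpoint summarized above, here is a minimal TypeScript sketch that builds the request. The field names come from the summary; the JSON encoding, the `Content-Type` header, the "exactly one of `audio_url` or `audio_b64`" rule, and the `buildTranscribeRequest` helper itself are assumptions. Authentication and the shape of each entry in `segments` are not documented here, so they are left out or typed as `unknown`.

```typescript
// Request fields as listed in the endpoint summary.
interface TranscribeRequest {
  audio_url?: string;
  audio_b64?: string;
  lang?: string;
  model?: string; // server-side default per the summary: 'whisper-large-v3'
}

// Response fields as listed in the endpoint summary.
interface TranscribeResponse {
  text: string;
  segments: unknown[]; // segment shape is not documented in this section
  language: string;
  duration_ms: number;
}

// Hypothetical helper: validates the input and returns fetch-compatible parts.
function buildTranscribeRequest(req: TranscribeRequest) {
  const hasUrl = req.audio_url !== undefined;
  const hasB64 = req.audio_b64 !== undefined;
  // Assumption: "audio_url | audio_b64" means exactly one must be supplied.
  if (hasUrl === hasB64) {
    throw new Error("Provide exactly one of audio_url or audio_b64");
  }
  return {
    path: "/api/v1/transcribe",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumed encoding
      body: JSON.stringify(req),
    },
  };
}

// Usage in the browser (auth omitted because it is not described here):
//   const { path, init } = buildTranscribeRequest({ audio_url: url, lang: "en" });
//   const result = (await (await fetch(path, init)).json()) as TranscribeResponse;
```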

In-browser path (services/whisperService.ts):

  • Whisper WASM model loaded lazily on first dictation use (~30MB cached in IndexedDB).
  • WhisperX add-on (~10MB) loaded for word timestamps when user enables "speaker tracking".
  • Audio captured via MediaRecorder API → fed in chunks to the WASM transcriber.
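
The lazy-loading behaviour in the list above (download once on first use, then serve from a local cache) can be sketched as follows. This is not the actual `whisperService.ts` code: `ModelCache`, `Downloader`, and `createLazyModelLoader` are hypothetical names, and the cache is an injected interface so the sketch stays runnable outside a browser. In the browser, `ModelCache` would presumably be backed by IndexedDB and `Downloader` by `fetch`.

```typescript
// Minimal cache abstraction; an IndexedDB-backed version is assumed in the browser.
interface ModelCache {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, bytes: Uint8Array): Promise<void>;
}

type Downloader = (url: string) => Promise<Uint8Array>;

// Returns a loader that does nothing until first called, then shares one
// in-flight promise so concurrent dictation starts trigger a single download.
function createLazyModelLoader(
  cache: ModelCache,
  download: Downloader,
  modelUrl: string,
): () => Promise<Uint8Array> {
  let pending: Promise<Uint8Array> | undefined;

  async function readOrDownload(): Promise<Uint8Array> {
    const cached = await cache.get(modelUrl);
    if (cached !== undefined) return cached; // cache hit: no network
    const bytes = await download(modelUrl);
    await cache.put(modelUrl, bytes);
    return bytes;
  }

  return function loadModel(): Promise<Uint8Array> {
    if (pending !== undefined) return pending;
    const attempt = readOrDownload();
    // On failure, forget the attempt so a later call can retry.
    attempt.catch(() => {
      pending = undefined;
    });
    pending = attempt;
    return attempt;
  };
}
```

Sharing the promise rather than the bytes is the design choice worth noting: two components that request the model at the same moment both await the same download instead of starting two.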

---

⚖️ HIPAA + Privacy

* In-browser default: audio bytes never traverse Synalux infrastructure.
* Cloud upgrade requires explicit consent: the UI shows a one-time consent gate per session before audio uploads.
* Audit logging: every cloud transcription writes a row to `transcription_audit` with user, session, audio duration, and model used.
* No retention: server-side audio bytes are deleted within 24h of transcription; only the text result and audit row persist.
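
To show how these rules could fit together, here is a hedged TypeScript sketch of a per-session consent gate, an audit row, and the 24-hour deletion deadline. Only the audited facts (user, session, audio duration, model) and the 24h window come from the list above. The column names, `CloudConsentGate`, `authorizeCloudUpload`, and `audioDeletionDeadline` are hypothetical, and this is not the actual Synalux implementation.

```typescript
// Hypothetical column names for a transcription_audit row.
interface AuditRow {
  user_id: string;
  session_id: string;
  audio_duration_ms: number;
  model: string;
  created_at: string; // ISO 8601 timestamp
}

interface CloudJob {
  userId: string;
  sessionId: string;
  audioDurationMs: number;
  model: string;
}

// Tracks which sessions have passed the one-time consent prompt.
class CloudConsentGate {
  private readonly consentedSessions = new Set<string>();

  grant(sessionId: string): void {
    this.consentedSessions.add(sessionId);
  }

  hasConsent(sessionId: string): boolean {
    return this.consentedSessions.has(sessionId);
  }
}

// Refuses the upload without consent; otherwise returns the row to persist.
function authorizeCloudUpload(gate: CloudConsentGate, job: CloudJob, now: Date): AuditRow {
  if (!gate.hasConsent(job.sessionId)) {
    throw new Error("Cloud upload blocked: no consent recorded for this session");
  }
  return {
    user_id: job.userId,
    session_id: job.sessionId,
    audio_duration_ms: job.audioDurationMs,
    model: job.model,
    created_at: now.toISOString(),
  };
}

// Latest moment at which server-side audio may still exist (24h after transcription).
function audioDeletionDeadline(transcribedAt: Date): Date {
  const DAY_MS = 24 * 60 * 60 * 1000;
  return new Date(transcribedAt.getTime() + DAY_MS);
}
```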

---

💳 Plans

| Feature | Free | Standard | Advanced | Enterprise |
|---|---|---|---|---|
| In-browser Whisper (live dictation) | ✅ | ✅ | ✅ | ✅ |
| AAC voice input (in-browser) | ✅ | ✅ | ✅ | ✅ |
| Cloud Whisper-large-v3 (long-form) | – | ✅ 30 min/mo | ✅ 5 hr/mo | ✅ unlimited |
| Speaker-attribution (WhisperX) | – | – | ✅ | ✅ |
| Custom medical vocabulary boost | – | – | – | ✅ |

See full pricing →
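
For readers wiring these limits into client code, the cloud allowance row can be expressed as a small lookup. The minute values are taken from the table; the `Plan` type, the `canTranscribeInCloud` helper, and the idea of checking a job against minutes already used this month are assumptions about how metering might work, not documented behaviour.

```typescript
type Plan = "free" | "standard" | "advanced" | "enterprise";

// Monthly cloud Whisper-large-v3 allowance in minutes, per the plans table.
const CLOUD_MINUTES_PER_MONTH: Record<Plan, number> = {
  free: 0,
  standard: 30,
  advanced: 5 * 60,
  enterprise: Infinity, // "unlimited"
};

// Hypothetical check: would this job fit in what is left of the monthly allowance?
function canTranscribeInCloud(plan: Plan, usedMinutes: number, jobMinutes: number): boolean {
  const allowance = CLOUD_MINUTES_PER_MONTH[plan];
  return allowance > 0 && usedMinutes + jobMinutes <= allowance;
}
```

For example, a Standard user who has used 20 minutes could submit a 10-minute recording but not a 15-minute one under this rule.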

---

🔄 Inter-Module Integration

* SOAP / Clinical Notes: primary consumer; live dictation flow.
* Prism AAC: voice input → keyboard pre-fill.
* Telehealth: in-call live captions + recording-time transcription.
* Mail: voice replies (record → transcribe → edit → send).
* Translation: transcribed text can pipe directly into the Translation module.