MediaSFU Translate
Live spoken translation for meetings and calls
Let speakers use their preferred language while listeners choose the voice language they hear. MediaSFU runs the real-time STT to LLM to TTS pipeline with curated language modes, optional AI notes, transcripts, and bring-your-own or system AI provider options.
Perfect For
Global Teams
Daily meetings where each participant speaks their native language and hears everyone else in their preferred language — automatically. Per-participant selection means no one-size-fits-all compromise.
EdTech
Lectures and workshops made accessible to international students in real time. Use allowlist mode to restrict rooms to class-relevant languages. Transcripts are included for study notes.
Healthcare
Telehealth consultations with patients who speak different languages. Allowlist mode ensures that only medically relevant languages are available. System pool credentials simplify setup.
What You Can Do
Feature-rich tools designed for real-world workflows
Complete STT → LLM → TTS Pipeline
Three-stage real-time pipeline: speech is transcribed, translated by an LLM, then spoken back as natural-sounding audio.
- STT (Speech-to-Text) — Deepgram for real-time transcription in the speaker's language
- LLM (Translation) — OpenAI GPT-4, Claude, or other supported models for accurate translation
- TTS (Text-to-Speech) — ElevenLabs for natural voice output in the listener's language
- Sub-1-second end-to-end latency for conversational flow
- Transcripts generated automatically as a side product of STT
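The three stages above can be pictured as a simple chain of functions. The following is a minimal sketch for illustration only: `run_pipeline`, `PipelineResult`, and the `fake_*` stubs are hypothetical names that stand in for the STT, LLM, and TTS providers, and none of them are MediaSFU or provider APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PipelineResult:
    transcript: str   # STT output, kept as the transcript side product
    translation: str  # LLM output in the listener's language
    audio: bytes      # TTS output for playback


def run_pipeline(
    audio_in: bytes,
    source_lang: str,
    target_lang: str,
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
    synthesize: Callable[[str, str], bytes],
) -> PipelineResult:
    """Run one utterance through STT, then LLM translation, then TTS."""
    transcript = transcribe(audio_in, source_lang)
    translation = translate(transcript, source_lang, target_lang)
    audio_out = synthesize(translation, target_lang)
    return PipelineResult(transcript, translation, audio_out)


# Hypothetical stand-ins so the sketch runs without any provider account.
def fake_stt(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


result = run_pipeline(b"hola", "es", "en", fake_stt, fake_llm, fake_tts)
```

Because the transcript is captured before translation, it is available even when a room later turns off audio output.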
Three Language Modes
Control exactly which languages are available in your rooms with flexible mode options.
- Any mode — participants can speak or listen in any ISO 639-1 language (AI handles it)
- Allowlist mode — restrict to specific languages (e.g., only en, es, fr, de)
- Blocklist mode — allow everything except specific languages
- Separate controls for spoken vs. listen languages
- Per-participant language selection — each user picks their own
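The difference between the three modes comes down to one membership check. The sketch below assumes the mode names `any`, `allowlist`, and `blocklist` as plain strings; the function name `is_language_allowed` is hypothetical and does not come from the MediaSFU API.

```python
from typing import Iterable


def is_language_allowed(code: str, mode: str, languages: Iterable[str] = ()) -> bool:
    """Return True if a language code is permitted under the given mode."""
    code = code.strip().lower()
    listed = {lang.strip().lower() for lang in languages}
    if mode == "any":
        return True                # every language is accepted
    if mode == "allowlist":
        return code in listed      # only the listed languages are accepted
    if mode == "blocklist":
        return code not in listed  # everything except the listed languages
    raise ValueError(f"unknown language mode: {mode!r}")


# Spoken and listen languages can carry separate rules.
spoken_ok = is_language_allowed("es", "allowlist", ["en", "es", "fr", "de"])
listen_ok = is_language_allowed("ja", "blocklist", ["ru"])
```

Note that only an allowlist guarantees a closed set of languages; a blocklist leaves every unlisted language available.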
Translation Config Management
Create named translation configurations with your preferred AI credentials and language settings.
- Named configs with nicknames — create, update, and delete via REST API
- Link your STT, LLM, and TTS credentials by nickname
- Set translationOutputMode for audio or text-only room output
- Add aiNoteTakerConfig defaults for reusable AI summaries and transcripts
- Per-language voice overrides — choose specific TTS voices for specific languages
- Extra config fields for provider-specific parameters
- Attach configs to rooms by nickname at room creation
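As a rough illustration, a named config might be assembled like this before being sent to the REST API. Only `translationOutputMode` and `aiNoteTakerConfig` are names taken from this page; every other key (`nickname`, `sttNickname`, `llmNickname`, `ttsNickname`, `voiceOverrides`, `extraConfig`), the `"audio"` and `"text"` values, and the helper itself are assumptions, so check the API reference for the actual schema.

```python
import json
from typing import Optional


def build_translation_config(
    nickname: str,
    stt: str,
    llm: str,
    tts: str,
    output_mode: str = "audio",
    voice_overrides: Optional[dict] = None,
    ai_notes: Optional[dict] = None,
    extra: Optional[dict] = None,
) -> dict:
    """Assemble a config body; key names other than the two noted are hypothetical."""
    if output_mode not in ("audio", "text"):  # assumed value names
        raise ValueError(f"unsupported output mode: {output_mode!r}")
    config = {
        "nickname": nickname,
        "sttNickname": stt,   # credentials are linked by nickname
        "llmNickname": llm,
        "ttsNickname": tts,
        "translationOutputMode": output_mode,  # field name from this page
    }
    if voice_overrides:
        config["voiceOverrides"] = dict(voice_overrides)  # language code -> voice ID
    if ai_notes:
        config["aiNoteTakerConfig"] = dict(ai_notes)      # field name from this page
    if extra:
        config["extraConfig"] = dict(extra)               # provider-specific parameters
    return config


config = build_translation_config(
    "team-standup", "my-deepgram", "my-openai", "my-elevenlabs",
    voice_overrides={"es": "voice-123"},
)
body = json.dumps(config)  # JSON body for a create or update request
```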
Optional AI Notes & Notes-Only Rooms
Use the same translation runtime to generate meeting summaries and transcripts, with or without translated audio playback.
- enableAiNotes adds summaries and transcript artifacts to translated rooms
- aiNotesOnly runs note capture without presenting translated audio playback to participants
- Notes-only rooms normalize to text-only output while keeping the translation runtime available
- Reusable note behavior belongs on the Translation Config via aiNoteTakerConfig
- Public room and event-setting POST routes accept the optional room flags
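The interaction between the two flags can be sketched as a small normalization step. This is an assumption-laden illustration, not the server's actual logic: it assumes that `aiNotesOnly` implies `enableAiNotes`, and that text-only output is written as `"text"`. The function name is hypothetical.

```python
def normalize_room_flags(
    enable_ai_notes: bool = False,
    ai_notes_only: bool = False,
    output_mode: str = "audio",
) -> dict:
    """Resolve the note flags into one consistent set of room settings."""
    if ai_notes_only:
        # Notes-only rooms normalize to text-only output (no translated playback).
        return {
            "enableAiNotes": True,  # assumed: notes-only implies notes are on
            "aiNotesOnly": True,
            "translationOutputMode": "text",  # assumed value name
        }
    return {
        "enableAiNotes": bool(enable_ai_notes),
        "aiNotesOnly": False,
        "translationOutputMode": output_mode,
    }


notes_only = normalize_room_flags(ai_notes_only=True)
translated_with_notes = normalize_room_flags(enable_ai_notes=True)
```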
Flexible Credential Options
Use your own AI provider accounts or the MediaSFU system pool — no setup friction for getting started.
- System pool — turn on "Use System Translation Configs" in Lite Dashboard Settings or Dashboard Settings.
- Own credentials — bring your own Deepgram, OpenAI, and ElevenLabs accounts
- System pool uses credits — STT, LLM, and TTS usage deducted from your balance
- Own credentials — provider charges apply directly to your accounts
- Top up system credits via the dashboard with credit packages
Room Integration & Constraints
Translation integrates directly into MediaSFU meeting and voice rooms.
- Enable via room creation — set supportTranslation: true and attach your config
- Add enableAiNotes for summaries, or aiNotesOnly for transcripts and notes without translated playback
- Room capacity constraints — supportMaxRoom and supportFlexRoom disabled when translation is active
- Works with both video meetings and voice-only SIP calls
- Translation settings persist for the room duration
- System translation config uses reserved nicknames (MediaSFUSystemTranslationConfig)
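The constraints above could be applied on the client side before a room is created. In this sketch, `supportTranslation`, `supportMaxRoom`, `supportFlexRoom`, and the reserved nickname come from this page; the `translationConfigNickname` key and the helper are hypothetical. The page does not say whether the server rejects conflicting options or silently disables them, so the sketch simply turns them off.

```python
from typing import Optional

# Reserved nickname for the system translation config, as named on this page.
SYSTEM_CONFIG_NICKNAME = "MediaSFUSystemTranslationConfig"


def build_room_request(
    room_options: dict,
    config_nickname: Optional[str] = None,
    use_system_pool: bool = False,
) -> dict:
    """Return a copy of the room options with translation enabled."""
    if use_system_pool:
        config_nickname = SYSTEM_CONFIG_NICKNAME
    if not config_nickname:
        raise ValueError("a translation config nickname is required")
    request = dict(room_options)  # copy so the caller's dict is not modified
    request["supportTranslation"] = True
    request["translationConfigNickname"] = config_nickname  # hypothetical key
    # Both capacity options are disabled while translation is active.
    request["supportMaxRoom"] = False
    request["supportFlexRoom"] = False
    return request


request = build_room_request({"supportMaxRoom": True}, use_system_pool=True)
```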
Voice Quality & Customization
Natural-sounding output with per-language voice selection.
- ElevenLabs voices — high-quality, natural-sounding speech synthesis
- Per-language overrides — assign specific voices to specific languages
- Voice stability and similarity settings for fine-tuning
- Multiple voice options per language for gender and tone variety
- 50+ curated languages tested for accuracy — more available in "any" mode
Usage Scenarios
Real-world workflows, step by step
Setting Up Translation for a Global Team Meeting
Create a translation config and host a meeting where each participant speaks their native language.
Using the System Pool (No AI Setup Required)
Start translating immediately using MediaSFU's system AI credentials — no provider accounts needed.
Restricting Languages for a Compliance Use Case
A healthcare provider needs to ensure only medically-validated languages are available in telehealth sessions.
Use your own STT, LLM, and TTS credentials where supported, or use the MediaSFU system pool with credits.
- 50+ curated languages
- Three language modes (any/allowlist/blocklist)
- Natural ElevenLabs voice output
- Per-participant language selection
- Optional AI Notes and notes-only room mode
- Transcripts included automatically
Ready to add live voice translation?
Try the meeting demo first, then configure translation or start free.