MediaSFU Translate

Live spoken translation for meetings and calls

Let speakers use their preferred language while listeners choose the voice language they hear. MediaSFU runs the real-time STT to LLM to TTS pipeline with curated language modes, optional AI notes, transcripts, and bring-your-own or system AI provider options.

50+ curated languages supported
Real-time voice translation (<1s latency)
Per-participant spoken & listen language selection
Three modes: any, allowlist, blocklist
Deepgram STT + OpenAI/Claude LLM + ElevenLabs TTS
System pool or bring-your-own AI credentials
Optional AI Notes and notes-only room mode
Voice overrides per language
Works with Meetings & Voice calls

Perfect For

Global Teams

Daily meetings where each participant speaks their native language and hears everyone else in their preferred language — automatically. Per-participant selection means no one-size-fits-all compromise.

EdTech

Lectures and workshops accessible to international students in real-time. Use allowlist mode to restrict to class-relevant languages. Transcripts included for study notes.

Healthcare

Telehealth consultations with patients who speak different languages. Allowlist mode ensures only medically relevant languages are available, while blocklist mode excludes specific ones. System pool credentials simplify setup.

What You Can Do

Feature-rich tools designed for real-world workflows

Complete STT → LLM → TTS Pipeline

Three-stage real-time pipeline: speech is transcribed, translated by an LLM, then spoken back as natural-sounding audio.

  • STT (Speech-to-Text) — Deepgram for real-time transcription in the speaker's language
  • LLM (Translation) — OpenAI GPT-4, Claude, or other supported models for accurate translation
  • TTS (Text-to-Speech) — ElevenLabs for natural voice output in the listener's language
  • Sub-1-second end-to-end latency for conversational flow
  • Transcripts generated automatically as a side product of STT
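
The three stages above can be sketched as a chain of callables. This is a minimal illustration under stated assumptions, not MediaSFU's implementation: `run_pipeline`, `TranslationResult`, and the stub stages are hypothetical stand-ins for the Deepgram, LLM, and ElevenLabs calls.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; real provider clients would sit behind these.
Transcribe = Callable[[bytes, str], str]      # (audio, spoken_lang) -> text
Translate = Callable[[str, str, str], str]    # (text, source_lang, target_lang) -> text
Synthesize = Callable[[str, str], bytes]      # (text, listen_lang) -> audio


@dataclass
class TranslationResult:
    transcript: str        # side product of the STT stage
    translated_text: str   # output of the LLM stage
    audio: bytes           # output of the TTS stage


def run_pipeline(audio: bytes, spoken_lang: str, listen_lang: str,
                 stt: Transcribe, llm: Translate, tts: Synthesize) -> TranslationResult:
    """Run one utterance through STT, then LLM translation, then TTS."""
    transcript = stt(audio, spoken_lang)
    translated = llm(transcript, spoken_lang, listen_lang)
    return TranslationResult(transcript, translated, tts(translated, listen_lang))


# Stub stages so the sketch runs without any provider account.
def fake_stt(audio: bytes, lang: str) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


def fake_tts(text: str, lang: str) -> bytes:
    return text.encode("utf-8")


result = run_pipeline(b"hola", "es", "en", fake_stt, fake_llm, fake_tts)
print(result.transcript, "|", result.translated_text)
```

Keeping each stage behind a plain callable is one way to let bring-your-own and system pool providers share the same flow.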

Three Language Modes

Control exactly which languages are available in your rooms with flexible mode options.

  • Any mode — participants can speak or listen in any ISO 639-1 language (AI handles it)
  • Allowlist mode — restrict to specific languages (e.g., only en, es, fr, de)
  • Blocklist mode — allow everything except specific languages
  • Separate controls for spoken vs. listen languages
  • Per-participant language selection — each user picks their own
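
The three modes reduce to a small membership check. The sketch below assumes the mode names "any", "allowlist", and "blocklist" are passed as strings; `is_language_allowed` is a hypothetical helper, not part of any MediaSFU SDK.

```python
from typing import Iterable


def is_language_allowed(lang: str, mode: str, listed: Iterable[str] = ()) -> bool:
    """Return True if `lang` may be selected under the given mode.

    `listed` is the allowlist or blocklist, depending on `mode`.
    """
    code = lang.strip().lower()
    codes = {c.strip().lower() for c in listed}
    if mode == "any":
        return True
    if mode == "allowlist":
        return code in codes
    if mode == "blocklist":
        return code not in codes
    raise ValueError(f"unknown language mode: {mode!r}")


# Spoken and listen languages have separate controls, so check each one.
can_speak = is_language_allowed("es", "allowlist", ["en", "es", "fr", "de"])
can_listen = is_language_allowed("ru", "blocklist", ["ru"])
print(can_speak, can_listen)
```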

Translation Config Management

Create named translation configurations with your preferred AI credentials and language settings.

  • Named configs with nicknames — create, update, and delete via REST API
  • Link your STT, LLM, and TTS credentials by nickname
  • Set translationOutputMode for audio or text-only room output
  • Add aiNoteTakerConfig defaults for reusable AI summaries and transcripts
  • Per-language voice overrides — choose specific TTS voices for specific languages
  • Extra config fields for provider-specific parameters
  • Attach configs to rooms by nickname at room creation
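
A config of this kind might be assembled as follows. Only `translationOutputMode` and `aiNoteTakerConfig` come from the feature list above; every other key (`nickName`, `sttNickName`, `llmNickName`, `ttsNickName`, `voiceOverrides`, `extraConfig`) and the "audio"/"text" values are assumptions made for illustration, so check the API reference for the real schema.

```python
import json
from typing import Optional

# Assumed values for translationOutputMode (audio or text-only output).
VALID_OUTPUT_MODES = ("audio", "text")


def build_translation_config(
    nickname: str,
    stt_nickname: str,
    llm_nickname: str,
    tts_nickname: str,
    output_mode: str = "audio",
    voice_overrides: Optional[dict] = None,
    note_taker: Optional[dict] = None,
    extra: Optional[dict] = None,
) -> dict:
    """Assemble a named translation config; key names are hypothetical."""
    if output_mode not in VALID_OUTPUT_MODES:
        raise ValueError(f"output_mode must be one of {VALID_OUTPUT_MODES}")
    config = {
        "nickName": nickname,            # hypothetical key
        "sttNickName": stt_nickname,     # hypothetical key
        "llmNickName": llm_nickname,     # hypothetical key
        "ttsNickName": tts_nickname,     # hypothetical key
        "translationOutputMode": output_mode,
    }
    # Optional sections are only included when provided.
    if voice_overrides:
        config["voiceOverrides"] = dict(voice_overrides)   # hypothetical key
    if note_taker:
        config["aiNoteTakerConfig"] = dict(note_taker)
    if extra:
        config["extraConfig"] = dict(extra)                # hypothetical key
    return config


config = build_translation_config(
    "global-team", "my-deepgram", "my-openai", "my-elevenlabs",
    voice_overrides={"es": "voice-es-1"},
)
print(json.dumps(config, indent=2))
```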

Optional AI Notes & Notes-Only Rooms

Use the same translation runtime to generate meeting summaries and transcripts, with or without translated audio playback.

  • enableAiNotes adds summaries and transcript artifacts to translated rooms
  • aiNotesOnly runs note capture without presenting translated audio playback to participants
  • Notes-only rooms normalize to text-only output while keeping the translation runtime available
  • Reusable note behavior belongs on the Translation Config via aiNoteTakerConfig
  • Public room and event-setting POST routes accept the optional room flags
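
How the two room flags interact can be sketched as a small normalization step. This is an interpretation of the bullets above, not documented server behavior: the function name and result keys are hypothetical, and treating `aiNotesOnly` as implying notes are on is an assumption.

```python
def effective_room_settings(enable_ai_notes: bool = False,
                            ai_notes_only: bool = False,
                            output_mode: str = "audio") -> dict:
    """Derive what a room would present to participants.

    Assumption: a notes-only room always captures notes and is
    normalized to text-only output, regardless of the requested mode.
    """
    if ai_notes_only:
        return {"notes": True, "translated_audio": False, "output_mode": "text"}
    return {
        "notes": enable_ai_notes,
        "translated_audio": output_mode == "audio",
        "output_mode": output_mode,
    }


print(effective_room_settings(enable_ai_notes=True))
print(effective_room_settings(ai_notes_only=True))
```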

Flexible Credential Options

Use your own AI provider accounts or the MediaSFU system pool — no setup friction for getting started.

  • System pool — turn on "Use System Translation Configs" in Lite Dashboard Settings or Dashboard Settings.
  • Own credentials — bring your own Deepgram, OpenAI, and ElevenLabs accounts
  • System pool uses credits — STT, LLM, and TTS usage deducted from your balance
  • Own credentials — provider charges apply directly to your accounts
  • Top up system credits via the dashboard with credit packages

Room Integration & Constraints

Translation integrates directly into MediaSFU meeting and voice rooms.

  • Enable via room creation — set supportTranslation: true and attach your config
  • Add enableAiNotes for summaries, or aiNotesOnly for transcripts and notes without translated playback
  • Room capacity constraints — supportMaxRoom and supportFlexRoom disabled when translation is active
  • Works with both video meetings and voice-only SIP calls
  • Translation settings persist for the room duration
  • System translation config uses a reserved nickname (MediaSFUSystemTranslationConfig)
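
A room creation body that respects these constraints could be built as below. The flag names `supportTranslation`, `translationConfigNickName`, `enableAiNotes`, `aiNotesOnly`, `supportMaxRoom`, and `supportFlexRoom` appear on this page; the helper itself is hypothetical, and rejecting the capacity flags on the client side is just one possible way to handle the constraint.

```python
CAPACITY_FLAGS = ("supportMaxRoom", "supportFlexRoom")


def build_translated_room(config_nickname: str,
                          enable_ai_notes: bool = False,
                          ai_notes_only: bool = False,
                          **room_fields) -> dict:
    """Return a room creation body with translation enabled.

    `room_fields` carries whatever other room options the caller needs.
    """
    conflicts = [flag for flag in CAPACITY_FLAGS if room_fields.get(flag)]
    if conflicts:
        raise ValueError(
            f"{', '.join(conflicts)} cannot be enabled when translation is active"
        )
    request = dict(room_fields)
    request.update({
        "supportTranslation": True,
        "translationConfigNickName": config_nickname,
        "supportMaxRoom": False,
        "supportFlexRoom": False,
    })
    if enable_ai_notes:
        request["enableAiNotes"] = True
    if ai_notes_only:
        request["aiNotesOnly"] = True
    return request


# Using the reserved system config nickname mentioned above.
room = build_translated_room("MediaSFUSystemTranslationConfig", enable_ai_notes=True)
print(room)
```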

Voice Quality & Customization

Natural-sounding output with per-language voice selection.

  • ElevenLabs voices — high-quality, natural-sounding speech synthesis
  • Per-language overrides — assign specific voices to specific languages
  • Voice stability and similarity settings for fine-tuning
  • Multiple voice options per language for gender and tone variety
  • 50+ curated languages tested for accuracy — more available in "any" mode
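
Per-language overrides amount to a lookup with a fallback. In this sketch, `VoiceChoice`, `resolve_voice`, the voice IDs, and the default numbers are all hypothetical; the 0 to 1 range for stability and similarity is an assumption about how such settings are usually expressed.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class VoiceChoice:
    voice_id: str
    stability: float = 0.5     # placeholder default, not a recommended value
    similarity: float = 0.75   # placeholder default, not a recommended value

    def __post_init__(self) -> None:
        # Assumed valid range for both tuning settings.
        for name in ("stability", "similarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def resolve_voice(listen_lang: str,
                  overrides: Mapping[str, VoiceChoice],
                  default: VoiceChoice) -> VoiceChoice:
    """Pick the override for a listener's language, else the default voice."""
    return overrides.get(listen_lang.strip().lower(), default)


default_voice = VoiceChoice("default-voice")
overrides = {"es": VoiceChoice("spanish-voice", stability=0.4, similarity=0.8)}
print(resolve_voice("ES", overrides, default_voice).voice_id)
print(resolve_voice("fr", overrides, default_voice).voice_id)
```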

Usage Scenarios

Real-world workflows, step by step

Scenario 1: Setting Up Translation for a Global Team Meeting

Create a translation config and host a meeting where each participant speaks their native language.

1. Create AI credentials: In the dashboard, add your Deepgram (STT), OpenAI (LLM), and ElevenLabs (TTS) credentials
2. Create a Translation Config: POST to /v1/translationconfigs with your credential nicknames, language mode ("any" for global teams), output mode, and optional AI note-taker defaults
3. Create a room with translation: Set supportTranslation: true and translationConfigNickName in the room creation request. Add enableAiNotes for summaries, or aiNotesOnly for notes without translated playback.
4. Participants join and select languages: Each user picks their spoken and listen language from the available options
5. Speak naturally: Speech is transcribed, translated, and spoken back in each listener's language in under 1 second
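
The config creation step can be sketched as an HTTP request that is built but not sent. Only the path /v1/translationconfigs and the names `spokenLanguageMode`, `listenLanguageMode`, and `translationOutputMode` come from this page. The base URL, the bearer-token header, and the `nickName` key are placeholders chosen for illustration; consult the API reference for the actual host, authentication scheme, and body schema.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid"   # placeholder host, not the real API


def build_post(base_url: str, path: str, body: dict, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",   # assumed auth scheme
        },
    )


config_request = build_post(
    BASE_URL,
    "/v1/translationconfigs",
    {
        "nickName": "global-team",          # hypothetical key
        "spokenLanguageMode": "any",
        "listenLanguageMode": "any",
        "translationOutputMode": "audio",   # assumed value
    },
    api_token="YOUR_TOKEN",
)
print(config_request.get_method(), config_request.full_url)
# Sending would be urllib.request.urlopen(config_request) against the real host.
```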

Scenario 2: Using the System Pool (No AI Setup Required)

Start translating immediately using MediaSFU's system AI credentials — no provider accounts needed.

1. Open Settings → System Credits: Turn on "Use System Translation Configs" in Lite Settings (or Dashboard Settings for developer accounts).
2. Top up credits: Add credits via the Top Up page; they cover STT, LLM, and TTS usage
3. Create a room: The system translation config (MediaSFUSystemTranslationConfig) is automatically available
4. Translation is active: Participants select languages and speak; credits are deducted per use
5. Monitor balance: Check your credit balance in Settings, where translation credits are tracked separately

Scenario 3: Restricting Languages for a Compliance Use Case

A healthcare provider needs to ensure only medically validated languages are available in telehealth sessions.

1. Create a config with allowlist mode: Set listenLanguageMode: "allowlist" with only validated languages (e.g., en, es, fr)
2. Or use blocklist mode: Set spokenLanguageMode: "blocklist" with blockedSpokenLanguages to exclude specific languages
3. Attach to room: Rooms created with this config offer only the permitted languages to participants
4. Patients select from a curated list: Only validated options appear, so no unsupported language can be selected
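
The language portion of such a config might look like the fragment built below. `listenLanguageMode`, `spokenLanguageMode`, and `blockedSpokenLanguages` are named in the steps above; `allowedListenLanguages` is a hypothetical key, and the two-letter check only validates the shape of an ISO 639-1 code, not whether the code is actually registered.

```python
import re
from typing import Iterable

# Shape check only: two lowercase letters, as ISO 639-1 codes are written.
ISO_639_1_SHAPE = re.compile(r"^[a-z]{2}$")


def compliance_language_settings(allowed_listen: Iterable[str],
                                 blocked_spoken: Iterable[str] = ()) -> dict:
    """Build the language-restriction part of a translation config."""
    allowed = sorted({c.strip().lower() for c in allowed_listen})
    blocked = sorted({c.strip().lower() for c in blocked_spoken})
    if not allowed:
        raise ValueError("an allowlist needs at least one language")
    for code in allowed + blocked:
        if not ISO_639_1_SHAPE.match(code):
            raise ValueError(f"not a two-letter language code: {code!r}")
    settings = {
        "listenLanguageMode": "allowlist",
        "allowedListenLanguages": allowed,   # hypothetical key
    }
    if blocked:
        settings["spokenLanguageMode"] = "blocklist"
        settings["blockedSpokenLanguages"] = blocked
    return settings


print(compliance_language_settings(["en", "es", "fr"]))
```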

Simple Pricing

BYO AI provider costs stay direct

Use your own STT, LLM, and TTS credentials where supported, or use the MediaSFU system pool with credits.

  • 50+ curated languages
  • Three language modes (any/allowlist/blocklist)
  • Natural ElevenLabs voice output
  • Per-participant language selection
  • Optional AI Notes and notes-only room mode
  • Transcripts included automatically
View Full Pricing

Works With

Deepgram
OpenAI
Anthropic
ElevenLabs
Google Cloud
Azure Cognitive
AWS Transcribe

Ready to add live voice translation?

Try the meeting demo first, then configure translation or start free.