
Introduction to SIP with MediaSFU

MediaSFU telephony is not just SIP registration plus a dial tone. It is the layer that lets one room coordinate callers, AI voice agents, human operators, IVR prompts, recordings, and backend actions in a single flow. If your product needs phone numbers, escalation logic, or AI-assisted voice experiences, this guide is where the real operating model starts.

One telephony control plane

Bring inbound PSTN, outbound dialing, browser rooms, and call recording into the same MediaSFU workflow instead of stitching together separate stacks.

AI first, human when needed

Answer with AI voice agents, play IVR prompts, then hand off to a human or operator queue without moving the caller to another system.

Programmable call behavior

Control prompts, callbacks, routing, webhooks, and provider-specific SIP behavior from the same room-aware configuration surface.

What people usually build here

Production teams use this surface for inbound support lines, AI receptionists, callback queues, operator assist, outbound campaigns, and hybrid voice workflows where PSTN callers and browser users need to share the same MediaSFU logic.

Inbound DIDs, Outbound campaigns, AI receptionist, Human takeover, Callback flows, Prompt playback

Read this guide as a call lifecycle

Provider credentials matter, but production telephony is the larger sequence: carrier entry, room attach, AI or IVR routing, human escalation, and audit capture. The best SIP setups are designed around that full path.

Carrier entry → Room attach → AI or IVR routing → Human escalation → Audit trail
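
The lifecycle stages named above can be modeled as a small state machine. The following is a minimal sketch, not MediaSFU's API: the `CallStage` enum mirrors the stage names used in this guide, while the transition table and the `advance` helper are hypothetical and assume that a call may finish after AI or IVR routing without ever reaching a human.

```python
from enum import Enum


class CallStage(Enum):
    """Stages named in this guide; the enum itself is a hypothetical illustration."""
    CARRIER_ENTRY = "carrier_entry"
    ROOM_ATTACH = "room_attach"
    AI_OR_IVR_ROUTING = "ai_or_ivr_routing"
    HUMAN_ESCALATION = "human_escalation"
    AUDIT_CAPTURE = "audit_capture"


# Assumed transitions: escalation is optional, audit capture is terminal.
ALLOWED_TRANSITIONS = {
    CallStage.CARRIER_ENTRY: {CallStage.ROOM_ATTACH},
    CallStage.ROOM_ATTACH: {CallStage.AI_OR_IVR_ROUTING},
    CallStage.AI_OR_IVR_ROUTING: {CallStage.HUMAN_ESCALATION, CallStage.AUDIT_CAPTURE},
    CallStage.HUMAN_ESCALATION: {CallStage.AUDIT_CAPTURE},
    CallStage.AUDIT_CAPTURE: set(),
}


def advance(current: CallStage, target: CallStage) -> CallStage:
    """Return the target stage if the transition is allowed, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

Designing around an explicit transition table like this makes skipped stages (for example, escalating a call that never attached to a room) fail loudly instead of silently.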

Carrier entry becomes runtime state

A SIP INVITE or DID is only the first mile. Once attached, the call participates in the same MediaSFU room and routing model as your AI and operator workflows.

AI and IVR stay inside one call path

Prompt playback, transcription, model responses, and backend actions should be designed as one path instead of separate disconnected tools.

Human takeover should be warm

If the caller needs a person, pass intent, transcript summary, and prior actions so the operator continues the job instead of restarting discovery.
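
A warm handoff is easier to reason about when the context travels as one structured payload. The sketch below is illustrative only: `HandoffContext`, its field names, and `to_operator_payload` are assumptions made for this example, not part of any MediaSFU SDK. It simply bundles the three items mentioned above (intent, transcript summary, prior actions) into JSON.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class HandoffContext:
    """Hypothetical payload handed to an operator at escalation time."""
    caller_number: str
    intent: str
    transcript_summary: str
    prior_actions: List[str] = field(default_factory=list)


def to_operator_payload(ctx: HandoffContext) -> str:
    """Serialize the context as JSON so an operator UI or queue can consume it."""
    return json.dumps(asdict(ctx), sort_keys=True)


if __name__ == "__main__":
    ctx = HandoffContext(
        caller_number="+15550100",  # example number, not a real caller
        intent="billing_dispute",
        transcript_summary="Caller disputes a duplicate charge.",
        prior_actions=["verified_identity", "looked_up_invoice"],
    )
    print(to_operator_payload(ctx))
```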

Observability should follow the call

Track routing decisions, AI behavior, callback state, and escalation outcomes so support and operations teams can explain every call path.
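
One way to make a call path explainable is an append-only event log keyed by call. This is a minimal sketch under stated assumptions: `CallAuditLog`, the event kinds, and the detail strings are hypothetical names chosen for illustration, and a production system would persist events rather than hold them in memory.

```python
import time
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CallAuditLog:
    """Hypothetical in-memory audit trail for a single call."""
    call_id: str
    # Each event is (unix timestamp, kind, detail).
    events: List[Tuple[float, str, str]] = field(default_factory=list)

    def record(self, kind: str, detail: str) -> None:
        """Append an event such as a routing decision or escalation outcome."""
        self.events.append((time.time(), kind, detail))

    def path(self) -> List[str]:
        """Return event kinds in the order they happened."""
        return [kind for _, kind, _ in self.events]
```

For example, recording `("routing", "ai_agent")` followed by `("escalation", "operator_queue")` lets a support team reconstruct why a caller ended up with a human.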

Watch MediaSFU keep PSTN callers, WebRTC participants, AI routing, and recording policy in the same operational room.

See IVR prompts, AI handling, and human escalation stay in one stack rather than bouncing the caller across disconnected systems.

Need deployable VOIP apps fast?

Start with the ready-made MediaSFU telephony apps if you want working mobile, desktop, or web call experiences before deep SIP customization.


Need the setup sequence?

Use Start Here for the overall flow across providers, credentials, and DIDs, and to decide whether a basic or detailed setup path fits your team.


Need the real SIP knobs?

Go straight to Critical Setup when you are managing trunks, extra fields, routing behavior, AI handoff rules, or provider-specific configuration details.

What MediaSFU telephony gives you:
  • PSTN Connectivity: Connect your application to the Public Switched Telephone Network, allowing users to dial in from and dial out to standard phone numbers worldwide.
  • Automated Voice Agents (AI & IVR): Leverage MediaSFU's AI pipeline (Speech-to-Text, Large Language Models, Text-to-Speech) to build intelligent voice bots. Implement Interactive Voice Response (IVR) systems for self-service or call routing.
  • Dynamic Audio Playback: Play pre-recorded audio files or dynamically generated Text-to-Speech messages to callers, using templates for personalized communication.
  • Advanced Call Control: Manage call recording, implement automated callback queues, and integrate with your backend systems via webhooks for real-time call events and control.
  • Unified Communications: Blend SIP calls with WebRTC sessions for comprehensive communication solutions, bridging traditional telephony with modern web-based interactions.
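
To make the templated Text-to-Speech idea from the Dynamic Audio Playback item concrete, here is a hedged sketch using Python's standard `string.Template`. The template text, the placeholder names, and the `render_prompt` helper are assumptions for illustration; MediaSFU's own template syntax may differ, and the rendered string would still need to be sent to a TTS engine.

```python
from string import Template

# Hypothetical prompt template; the placeholder names are assumptions.
CALLBACK_PROMPT = Template("Hello $name, your callback is scheduled for $time.")


def render_prompt(template: Template, **fields: str) -> str:
    """Fill in the template; raises KeyError if a placeholder has no value."""
    return template.substitute(**fields)


if __name__ == "__main__":
    print(render_prompt(CALLBACK_PROMPT, name="Ada", time="3 PM"))
```

Using `substitute` rather than `safe_substitute` is a deliberate choice here: a missing field fails before any audio is generated, so a caller never hears a raw placeholder.
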
Ready-to-use VOIP applications

If you need working telephony apps before you tune provider fields, MediaSFU already ships production-oriented VOIP surfaces for mobile, desktop, and web teams.

The rest of this guide walks from provider and trunk setup into DIDs, critical extra fields, custom audio assets, AI-driven call behavior, and the ready-made applications that let you move faster when you do not want to build the full telephony shell yourself.

Figure 1: High-level system architecture of the MediaSFU SIP Gateway.