Before You Continue

This guide is for SDK builders creating custom AI apps.

If you are not building with the SDK, you likely do not need this setup. Use the Widgets Guide for low-configuration embeds, or the Telephony Guide for phone and SIP workflows.

Overview

Welcome to the MediaSFU AI Pipeline Guide! This guide helps you build audio, vision, and multimodal pipelines for advanced AI-powered agents. In this guide, you'll learn how to:

  • Configure AI Credentials for Voice and Vision services.
  • Build pipelines with STT, TTS, LLM, and custom processing steps.
  • Manage data buffers for real-time audio and video frames.
  • Handle errors effectively and return results to the client.

By the end of this guide, you'll know how to integrate speech recognition, text generation, speech synthesis, and image analysis into your MediaSFU applications.

Note: Dashboard-configured AI credentials take precedence over ephemeral parameters for the same keys (unless the dashboard field is empty). Use ephemeral parameters for additional fields not already set on the dashboard.
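
The precedence rule in the note above can be sketched in a few lines of TypeScript. This is only an illustration of the rule, not MediaSFU's implementation: the `resolveCredentials` helper, the `CredentialFields` type, and the field names (`sttApiKey`, `ttsApiKey`, `visionModel`) are all hypothetical, and treating `undefined`, `null`, and blank strings as "empty" is an assumption.

```typescript
// Hypothetical shape: a flat map of credential field names to values.
type CredentialFields = Record<string, string | null | undefined>;

// Assumption: a field counts as "empty" when it is undefined, null, or blank.
function isEmpty(value: string | null | undefined): boolean {
  return value === undefined || value === null || value.trim() === "";
}

// Sketch of the precedence rule: a non-empty dashboard value wins for a key;
// ephemeral values fill in keys the dashboard leaves empty or does not set.
function resolveCredentials(
  dashboard: CredentialFields,
  ephemeral: CredentialFields,
): Record<string, string> {
  const resolved: Record<string, string> = {};

  // Start from the non-empty ephemeral values.
  for (const key of Object.keys(ephemeral)) {
    const value = ephemeral[key];
    if (!isEmpty(value)) {
      resolved[key] = value as string;
    }
  }

  // Overwrite with non-empty dashboard values, which take precedence.
  for (const key of Object.keys(dashboard)) {
    const value = dashboard[key];
    if (!isEmpty(value)) {
      resolved[key] = value as string;
    }
  }

  return resolved;
}

// Usage with made-up field names and values.
const example = resolveCredentials(
  { sttApiKey: "dash-stt", ttsApiKey: "" },
  { sttApiKey: "temp-stt", ttsApiKey: "temp-tts", visionModel: "temp-vision" },
);
console.log(example);
```

In this example the dashboard value is kept for `sttApiKey`, the ephemeral value is used for `ttsApiKey` because the dashboard field is empty, and `visionModel` comes from the ephemeral parameters because the dashboard does not set it.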

Building custom apps? Start from these GitHub repos: