Audio Routing for Live Broadcast: Managing Multi-Channel Audio Streams

Why Audio Routing Matters More Than You Think

In live broadcast engineering, audio is simultaneously the most important and most neglected element of the signal chain. Audiences will tolerate a slightly soft picture. They will not tolerate missing audio, wrong-language audio, phase-inverted stereo, or a live microphone on the wrong channel. Bad audio ends broadcasts. Good audio is invisible.

The challenge is that modern live production involves increasingly complex audio requirements: multiple languages, commentary plus ambient sound, hearing-impaired descriptive tracks, and different channel layouts for different destinations. A stream going to YouTube needs stereo. The same stream going to a broadcast partner might need 5.1 surround. The international feed needs the natural sound without commentary.

This guide covers how to handle audio routing in a streaming gateway context, with practical configurations for common broadcast scenarios.

Audio Fundamentals for Streaming Engineers

Before diving into routing, let’s establish the terminology and concepts that matter for live stream audio.

Channel Layouts

Audio streams carry one or more channels in a defined layout:

| Layout | Channels | Common Use |
| --- | --- | --- |
| Mono | 1 | Voice-only feeds, IFB |
| Stereo | 2 (L, R) | Web streaming, social platforms |
| 5.1 Surround | 6 (FL, FR, FC, LFE, SL, SR) | Broadcast TV, premium streams |
| 7.1 Surround | 8 (FL, FR, FC, LFE, SL, SR, BL, BR) | Cinema, immersive audio |

Most streaming platforms accept stereo only. Broadcast partners and OTT platforms may require multi-channel audio. Your gateway needs to handle both from the same source.

Embedded Audio vs. Separate Audio

In professional video transport (SDI, SRT, RTMP), audio travels with the video rather than as a separate signal: embedded in ancillary data for SDI, multiplexed into the container for SRT and RTMP. A single SRT stream typically carries the video plus multiple audio channels as PCM or compressed audio (AAC, Opus).

The number of audio channels in your transport stream depends on the encoder configuration. A common setup is 8 channels of embedded audio: channels 1-2 for program stereo, channels 3-4 for commentary, channels 5-6 for natural/ambient sound, and channels 7-8 for secondary language or talkback.

Sample Rate and Bit Depth

For broadcast streaming:

  • Sample rate: 48 kHz (broadcast standard). Never use 44.1 kHz (CD standard) in a broadcast chain.
  • Bit depth: 16-bit for compressed delivery (AAC), 24-bit for uncompressed transport (PCM in SRT/SDI).
  • Codec: AAC-LC for RTMP/HLS output, Opus for low-latency applications, PCM for transport between facilities.

Common Audio Routing Scenarios

Scenario 1: Multi-Language Commentary

You are distributing a live sports event with commentary in three languages. Your production creates one international sound feed (ISF) and three commentary pairs.

Source streams:

  • Input A (primary feed): 8 channels
    • Ch 1-2: English commentary mixed with ISF
    • Ch 3-4: French commentary mixed with ISF
    • Ch 5-6: Spanish commentary mixed with ISF
    • Ch 7-8: Clean ISF (natural sound only)

Required outputs:

| Destination | Audio Needed | Channel Mapping |
| --- | --- | --- |
| YouTube (English) | Stereo English | Input Ch 1-2 → Output Ch 1-2 |
| YouTube (French) | Stereo French | Input Ch 3-4 → Output Ch 1-2 |
| YouTube (Spanish) | Stereo Spanish | Input Ch 5-6 → Output Ch 1-2 |
| Broadcast partner | All 8 channels | Input Ch 1-8 → Output Ch 1-8 (passthrough) |
| Archive recording | Clean ISF | Input Ch 7-8 → Output Ch 1-2 |

This is a textbook audio matrix routing problem. Without a gateway that supports channel mapping, you would need five separate encoder instances or a dedicated audio router.
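
To make the matrix concrete, here is a minimal sketch of Scenario 1 as data, with a function that applies one mapping to an interleaved sample frame. The names (OUTPUT_MAPPINGS, apply_mapping) are illustrative, not a real gateway API:

```python
# Each output is a dict of {output_channel: input_channel}, 1-indexed.
OUTPUT_MAPPINGS = {
    "youtube_en": {1: 1, 2: 2},                    # English commentary -> stereo
    "youtube_fr": {1: 3, 2: 4},                    # French commentary -> stereo
    "youtube_es": {1: 5, 2: 6},                    # Spanish commentary -> stereo
    "partner":    {ch: ch for ch in range(1, 9)},  # 8-channel passthrough
    "archive":    {1: 7, 2: 8},                    # clean ISF -> stereo
}

def apply_mapping(frame: list[float], mapping: dict[int, int]) -> list[float]:
    """Pick input channels for one sample frame, ordered by output channel."""
    return [frame[src - 1] for _, src in sorted(mapping.items())]
```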

Scenario 2: Downmix Surround to Stereo

Your production delivers 5.1 surround sound, but your web streaming destinations only accept stereo. You need to downmix 5.1 to stereo while preserving the surround mix for the broadcast output.

5.1 to stereo downmix formula:

L_out = FL + 0.707 * FC + 0.707 * SL
R_out = FR + 0.707 * FC + 0.707 * SR

The LFE (subwoofer) channel is typically dropped in a stereo downmix, as most consumer playback devices cannot reproduce it at the intended level.
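
The formula translates directly to code. A minimal per-sample sketch (a production downmixer would also limit or normalize, since the sum can exceed full scale):

```python
import math

def downmix_51_to_stereo(fl, fr, fc, lfe, sl, sr):
    """ITU-style 5.1 -> stereo downmix for one sample frame; LFE is dropped."""
    k = math.sqrt(0.5)  # ~0.707, i.e. -3 dB
    left = fl + k * fc + k * sl
    right = fr + k * fc + k * sr
    return left, right
```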

Scenario 3: Audio-Follow-Video with Failover

You have a primary and backup video feed, each with its own embedded audio. When the gateway switches from primary to backup on video failover, the audio must switch simultaneously. Any audio/video desynchronization during the switch is immediately noticeable.

This is the audio-follow-video paradigm: audio routing decisions are tied to video routing decisions. When the failover engine switches video, it must also switch the corresponding audio channels.

Configuring the Audio Matrix in Vajra Cast

Vajra Cast includes a built-in audio matrix that handles up to 8 channels per stream. The matrix operates at the route level, sitting between your inputs and outputs.

Channel Mapping

To map specific input channels to specific output channels, configure the audio matrix for each output:

Example: Extract French commentary (channels 3-4) to a stereo output for a French YouTube stream:

  1. Create a route from your multi-channel input to the RTMP output for YouTube
  2. Open the audio matrix settings for that route
  3. Map input channel 3 to output channel 1 (left)
  4. Map input channel 4 to output channel 2 (right)
  5. Leave all other output channels unmapped

The result: the YouTube output receives a stereo stream with only the French commentary, while the original 8-channel input continues to feed other outputs with their own mappings.
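
Expressed as data, those five steps amount to two mapping entries. The structure below is a hypothetical illustration of the idea, not Vajra Cast's actual configuration format:

```python
# Hypothetical matrix settings for the French route (steps 1-5 above).
french_route_matrix = {
    "output": "youtube_fr_rtmp",
    "mappings": [
        {"input_channel": 3, "output_channel": 1, "gain_db": 0.0},  # left
        {"input_channel": 4, "output_channel": 2, "gain_db": 0.0},  # right
    ],
    # All other output channels stay unmapped.
}
```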

Per-Channel Gain Control

Each channel in the matrix has an independent gain control, measured in dB. Common adjustments:

  • +0 dB: Unity gain, no change (default)
  • -3 dB: Half power (a subtle reduction)
  • -6 dB: Half amplitude (a clearly audible step; halving perceived loudness takes roughly -10 dB)
  • -inf (mute): Silence the channel completely

Use per-channel gain to balance commentary against natural sound, attenuate a hot microphone, or mute talkback channels that should not reach the audience.
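
The dB figures convert to linear amplitude multipliers with the standard formula, 10^(dB/20); a quick sketch:

```python
def db_to_linear(gain_db: float) -> float:
    """Convert a dB gain to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)

db_to_linear(0)   # 1.0    (unity)
db_to_linear(-3)  # ~0.708 (half power)
db_to_linear(-6)  # ~0.501 (half amplitude)
```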

Downmixing

For surround-to-stereo downmix, Vajra Cast’s audio matrix lets you route multiple input channels to the same output channel with gain adjustments. To implement the standard 5.1 downmix:

| Input Channel | Maps To | Gain |
| --- | --- | --- |
| FL (Ch 1) | Output L (Ch 1) | 0 dB |
| FR (Ch 2) | Output R (Ch 2) | 0 dB |
| FC (Ch 3) | Output L (Ch 1) | -3 dB |
| FC (Ch 3) | Output R (Ch 2) | -3 dB |
| SL (Ch 5) | Output L (Ch 1) | -3 dB |
| SR (Ch 6) | Output R (Ch 2) | -3 dB |
| LFE (Ch 4) | (unmapped) | — |

When multiple inputs are routed to one output channel, the gain-scaled signals are summed. The -3 dB values above (approximately 0.707 linear) mix the center and surround channels at the correct level relative to the front left and right.
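
Summed routing like this is linear algebra: the whole downmix is one gain matrix applied to each frame of input samples. A sketch using NumPy (the (samples, channels) buffering is an assumption):

```python
import numpy as np

K = 0.707  # -3 dB as a linear gain
# Rows are outputs (L, R); columns are inputs (FL, FR, FC, LFE, SL, SR).
DOWNMIX = np.array([
    [1.0, 0.0, K, 0.0, K, 0.0],  # L = FL + 0.707*FC + 0.707*SL
    [0.0, 1.0, K, 0.0, 0.0, K],  # R = FR + 0.707*FC + 0.707*SR
])

def mix(frames: np.ndarray) -> np.ndarray:
    """frames: (n_samples, 6) float array -> (n_samples, 2) stereo."""
    return frames @ DOWNMIX.T
```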

Audio-Follow-Video

When using Vajra Cast’s failover system, audio follows video by default. The audio matrix configuration applies to whichever input is currently active. If your primary and backup feeds have the same channel layout (which they should), the audio routing remains consistent across failover events.

If your primary and backup feeds have different channel layouts (for example, the primary has 8 channels and the backup has 2), you can configure separate audio matrix profiles per input. The gateway applies the correct profile automatically when switching.
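
As a sketch, per-input profiles for a French stereo output might look like this when the primary carries 8 channels and the backup only program stereo (names are hypothetical):

```python
# Mapping profiles keyed by input; the gateway selects one on failover.
FRENCH_OUTPUT_PROFILES = {
    "primary_8ch": {1: 3, 2: 4},  # French commentary on ch 3-4
    "backup_2ch":  {1: 1, 2: 2},  # backup carries program stereo only
}

def matrix_for(active_input: str) -> dict[int, int]:
    """Return the channel mapping that matches the currently active input."""
    return FRENCH_OUTPUT_PROFILES[active_input]
```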

Monitoring Audio Levels

Monitoring is the difference between professional audio and guesswork. In live broadcast, you need to see audio levels in real-time and be alerted when they fall outside acceptable ranges.

Level Metering

Standard broadcast level targets:

| Standard | Peak Level | Average Level | Notes |
| --- | --- | --- | --- |
| EBU R128 | -1 dBTP | -23 LUFS | European broadcast standard |
| ATSC A/85 | -2 dBTP | -24 LKFS | US broadcast standard |
| YouTube/Web | -1 dBTP | -14 LUFS | Platform recommendation |

LUFS (Loudness Units relative to Full Scale) measures perceived loudness over time, not just peak amplitude. This is important because a signal that peaks at -1 dBFS can sound very different depending on its dynamic range.
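
If you need to verify loudness outside the gateway, the third-party pyloudnorm package implements the ITU-R BS.1770 measurement behind both LUFS and LKFS. A minimal sketch, assuming a captured WAV file:

```python
import soundfile as sf     # third-party: soundfile
import pyloudnorm as pyln  # third-party: pyloudnorm (ITU-R BS.1770)

data, rate = sf.read("program_capture.wav")  # hypothetical capture file
meter = pyln.Meter(rate)                     # K-weighted loudness meter
loudness = meter.integrated_loudness(data)   # integrated loudness in LUFS
print(f"{loudness:.1f} LUFS")                # compare against -23 / -24 / -14
```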

What to Monitor

At minimum, monitor these audio parameters for every active stream:

  1. Peak levels per channel: Ensure no channel is clipping (exceeding 0 dBFS) or dead (sustained silence)
  2. Loudness (LUFS): Integrated loudness should stay within your target range
  3. Phase correlation: A value near +1.0 means the stereo signal is healthy; a value near -1.0 means the channels are phase-inverted and will partially cancel on mono playback (a measurement sketch follows this list)
  4. Channel presence: Verify that all expected channels are active and carrying signal
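
The correlation in item 3 is a zero-lag normalized cross-correlation of the two channels. A minimal sketch over one analysis window, assuming NumPy arrays of samples:

```python
import numpy as np

def phase_correlation(left: np.ndarray, right: np.ndarray) -> float:
    """+1.0: channels in phase; -1.0: inverted; ~0: decorrelated."""
    denom = np.sqrt(np.sum(left**2) * np.sum(right**2))
    if denom == 0:
        return 0.0  # one or both channels silent
    return float(np.sum(left * right) / denom)
```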

Silent Channel Detection

One of the most common and embarrassing audio failures is a dead channel: audio that should be present but is silent. This can happen when:

  • An audio embed point upstream loses its feed
  • A microphone is muted at the source
  • A channel mapping error routes silence to the output
  • An encoder drops audio channels during a reconnection

Set up alerts for sustained silence (more than 5-10 seconds) on any output channel that should be carrying program audio. Vajra Cast’s monitoring exposes per-channel audio statistics through its Prometheus metrics endpoint, so you can integrate silence detection into your alerting pipeline with Grafana.
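
If you build the detection yourself rather than in Grafana, the core is windowed RMS with a duration counter. A minimal sketch; the threshold and window values are assumptions to tune per production:

```python
import numpy as np

SILENCE_THRESHOLD_DBFS = -60.0  # below this counts as silence (assumption)
ALERT_AFTER_SECONDS = 10.0      # sustained silence before alerting

def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level of one channel's window, in dB relative to full scale."""
    rms = np.sqrt(np.mean(samples ** 2))
    return 20 * np.log10(rms) if rms > 0 else float("-inf")

class SilenceDetector:
    """Accumulates per-channel silence duration across fixed-size windows."""
    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.silent_for = 0.0

    def update(self, samples: np.ndarray) -> bool:
        """Feed one window of samples; returns True when an alert is due."""
        if rms_dbfs(samples) < SILENCE_THRESHOLD_DBFS:
            self.silent_for += self.window
        else:
            self.silent_for = 0.0
        return self.silent_for >= ALERT_AFTER_SECONDS
```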

Advanced Audio Workflows

Multi-Language HLS with Audio Tracks

When distributing via HLS, you can include multiple audio tracks as separate renditions. The player presents a language selector to the viewer.

The workflow:

  1. Ingest the multi-channel source (8 channels with multiple languages)
  2. Create separate HLS audio renditions, each mapped from the appropriate source channels
  3. The HLS manifest references all audio renditions
  4. The video player (on web or app) lets the viewer choose their language

This is the standard approach for OTT platforms and premium live events. The video is encoded once; only the audio differs between renditions. Vajra Cast’s HLS output supports multiple audio renditions, configured through the audio matrix on each rendition’s route.
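
For reference, an HLS master playlist with three audio renditions looks roughly like this; the group name, URIs, and codec strings are placeholders:

```
#EXTM3U
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="English",LANGUAGE="en",DEFAULT=YES,AUTOSELECT=YES,URI="audio_en/index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Français",LANGUAGE="fr",DEFAULT=NO,AUTOSELECT=YES,URI="audio_fr/index.m3u8"
#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",NAME="Español",LANGUAGE="es",DEFAULT=NO,AUTOSELECT=YES,URI="audio_es/index.m3u8"
#EXT-X-STREAM-INF:BANDWIDTH=5000000,CODECS="avc1.640028,mp4a.40.2",AUDIO="aud"
video/index.m3u8
```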

Audio-Only Streams

Some workflows require audio-only outputs: radio simulcast, podcast feeds, or audio-only web streams for bandwidth-constrained viewers. Configure a route that discards the video track and outputs only the mapped audio channels, transcoded to the appropriate codec (AAC for HLS, Opus for WebRTC, MP3 for legacy radio systems).

Talkback and IFB

In remote production, talkback (communication from the studio to the field crew) and IFB (Interruptible Foldback, the mix of program audio plus director cues fed to on-air talent) are often carried as dedicated audio channels in the transport stream.

These channels must be:

  • Routed only to the field monitors/earpieces, never to the program output
  • Excluded from the audio matrix for all audience-facing outputs
  • Low-latency, so the director’s cues arrive in time to be useful

A common layout:

| Channel | Content | Routes To |
| --- | --- | --- |
| Ch 1-2 | Program stereo | All outputs |
| Ch 3-4 | Natural sound | Broadcast partner, archive |
| Ch 5-6 | Secondary language | Language-specific outputs |
| Ch 7 | Talkback (studio → field) | Return feed to field only |
| Ch 8 | IFB mix | Talent earpiece return only |

In Vajra Cast, you handle this by simply not mapping channels 7-8 to any audience-facing output. Those channels exist in the transport stream for the return path to field monitors.
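
A cheap safeguard is a pre-flight check that rejects any audience-facing matrix referencing the talkback channels; a sketch (the function name is illustrative):

```python
TALKBACK_CHANNELS = {7, 8}  # per the layout table above

def assert_no_talkback(output_name: str, mapping: dict[int, int]) -> None:
    """Fail loudly if an audience-facing mapping pulls from talkback/IFB."""
    leaked = TALKBACK_CHANNELS & set(mapping.values())
    if leaked:
        raise ValueError(f"{output_name} maps talkback channels {sorted(leaked)}")
```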

Troubleshooting Common Audio Issues

Audio Desynchronization (Lip Sync)

Audio arriving ahead of or behind the video is called a lip sync error. Causes include:

  • Encoder audio delay: Some encoders process audio faster than video, introducing an offset
  • Transcoding pipeline: Hardware video encoding can introduce variable delay relative to audio passthrough
  • Network jitter: Different packet arrival timing for audio and video data

Solution: Most gateways, including Vajra Cast, maintain audio/video synchronization through timestamp-based processing. If you observe lip sync errors, check the encoder first, as that is the most common source.

Stereo Phase Inversion

If your audio sounds hollow, thin, or disappears on mono playback, one channel is likely phase-inverted. This happens when a cable is wired with inverted polarity or a digital processing stage inverts one channel.

Check the phase correlation meter. A reading near -1.0 confirms phase inversion. Fix it at the source (swap pins 2 and 3 on the affected channel's XLR connector) or apply a phase-invert filter in the audio matrix.

Channel Swapping

Left and right channels are reversed. Less catastrophic than phase inversion but still wrong. Use the audio matrix to swap channels: map input channel 1 to output channel 2, and input channel 2 to output channel 1.

Missing Channels After Failover

If audio channels disappear after a failover event, the backup feed likely has a different channel layout than the primary. Standardize your channel layout across all sources, or configure per-input audio matrix profiles in your gateway.

Best Practices Summary

  1. Standardize channel layouts across all sources in your production. Document which channels carry which content.
  2. Monitor audio on every output, not just the input. A correct input can produce incorrect output if the matrix is misconfigured.
  3. Test failover audio before every event. Confirm that audio-follow-video works correctly and all channels are present after switching.
  4. Use per-channel gain to balance your mix at the gateway level, rather than asking upstream sources to adjust.
  5. Set up silence alerts for every audience-facing output channel.
  6. Keep talkback and IFB channels out of audience outputs by explicitly not mapping them.
  7. Document your audio matrix configuration and save it as a template for recurring productions.

Audio routing is one of those disciplines where getting it right means nobody notices, and getting it wrong means everybody notices. A well-configured audio matrix in your streaming gateway, combined with proper monitoring, is the foundation for reliable multi-channel audio in live broadcast.

For related configuration, see the SRT Streaming Gateway guide for the full ingest-to-distribution architecture, and video failover best practices for ensuring audio continuity during input switching.