Audio Routing for Live Broadcast: Managing Multi-Channel Audio Streams
Why Audio Routing Matters More Than You Think
In live broadcast engineering, audio is simultaneously the most important and most neglected element of the signal chain. Audiences will tolerate a slightly soft picture. They will not tolerate missing audio, wrong-language audio, phase-inverted stereo, or a live microphone on the wrong channel. Bad audio ends broadcasts. Good audio is invisible.
The challenge is that modern live production involves increasingly complex audio requirements: multiple languages, commentary plus ambient sound, hearing-impaired descriptive tracks, and different channel layouts for different destinations. A stream going to YouTube needs stereo. The same stream going to a broadcast partner might need 5.1 surround. The international feed needs the natural sound without commentary.
This guide covers how to handle audio routing in a streaming gateway context, with practical configurations for common broadcast scenarios.
Audio Fundamentals for Streaming Engineers
Before diving into routing, let’s establish the terminology and concepts that matter for live stream audio.
Channel Layouts
Audio streams carry one or more channels in a defined layout:
| Layout | Channels | Common Use |
|---|---|---|
| Mono | 1 | Voice-only feeds, IFB |
| Stereo | 2 (L, R) | Web streaming, social platforms |
| 5.1 Surround | 6 (FL, FR, FC, LFE, SL, SR) | Broadcast TV, premium streams |
| 7.1 Surround | 8 (FL, FR, FC, LFE, SL, SR, BL, BR) | Cinema, immersive audio |
Most streaming platforms accept stereo only. Broadcast partners and OTT platforms may require multi-channel audio. Your gateway needs to handle both from the same source.
Embedded Audio vs. Separate Audio
In professional video transport (SDI, SRT, RTMP), audio travels with the video in a single stream rather than as a separate signal: embedded in SDI ancillary data, multiplexed into the MPEG transport stream carried over SRT, and interleaved as audio messages in RTMP. A single SRT stream typically carries the video plus multiple audio channels as PCM or compressed audio (AAC, Opus).
The number of audio channels in your transport stream depends on the encoder configuration. A common setup is 8 channels of embedded audio: channels 1-2 for program stereo, channels 3-4 for commentary, channels 5-6 for natural/ambient sound, and channels 7-8 for secondary language or talkback.
Sample Rate and Bit Depth
For broadcast streaming:
- Sample rate: 48 kHz (broadcast standard). Never use 44.1 kHz (CD standard) in a broadcast chain.
- Bit depth: 16-bit for compressed delivery (AAC), 24-bit for uncompressed transport (PCM in SRT/SDI).
- Codec: AAC-LC for RTMP/HLS output, Opus for low-latency applications, PCM for transport between facilities.
Common Audio Routing Scenarios
Scenario 1: Multi-Language Commentary
You are distributing a live sports event with commentary in three languages. Your production creates one international sound feed (ISF) and three commentary pairs.
Source streams:
- Input A (primary feed): 8 channels
- Ch 1-2: English commentary mixed with ISF
- Ch 3-4: French commentary mixed with ISF
- Ch 5-6: Spanish commentary mixed with ISF
- Ch 7-8: Clean ISF (natural sound only)
Required outputs:
| Destination | Audio Needed | Channel Mapping |
|---|---|---|
| YouTube (English) | Stereo English | Input Ch 1-2 → Output Ch 1-2 |
| YouTube (French) | Stereo French | Input Ch 3-4 → Output Ch 1-2 |
| YouTube (Spanish) | Stereo Spanish | Input Ch 5-6 → Output Ch 1-2 |
| Broadcast partner | All 8 channels | Input Ch 1-8 → Output Ch 1-8 (passthrough) |
| Archive recording | Clean ISF | Input Ch 7-8 → Output Ch 1-2 |
This is a textbook audio matrix routing problem. Without a gateway that supports channel mapping, you would need five separate encoder instances or a dedicated audio router.
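To make the matrix concrete, here is the same routing table expressed as plain data. This is a sketch for illustration only, not Vajra Cast's actual configuration schema:

```python
# Scenario 1 routing as data: {output_channel: input_channel} per route.
ROUTES = {
    "youtube_en": {1: 1, 2: 2},                    # English commentary -> stereo
    "youtube_fr": {1: 3, 2: 4},                    # French commentary  -> stereo
    "youtube_es": {1: 5, 2: 6},                    # Spanish commentary -> stereo
    "broadcast":  {ch: ch for ch in range(1, 9)},  # 8-channel passthrough
    "archive":    {1: 7, 2: 8},                    # clean ISF -> stereo
}
```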
Scenario 2: Downmix Surround to Stereo
Your production delivers 5.1 surround sound, but your web streaming destinations only accept stereo. You need to downmix 5.1 to stereo while preserving the surround mix for the broadcast output.
5.1 to stereo downmix formula:
L_out = FL + 0.707 * FC + 0.707 * SL
R_out = FR + 0.707 * FC + 0.707 * SR
The LFE (subwoofer) channel is typically dropped in a stereo downmix, as most consumer playback devices cannot reproduce it at the intended level.
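In code, the downmix is a few lines per audio frame. A minimal sketch, assuming NumPy float samples in [-1, 1] and SMPTE channel order (FL, FR, FC, LFE, SL, SR):

```python
import numpy as np

def downmix_51_to_stereo(x: np.ndarray) -> np.ndarray:
    """Downmix a (num_samples, 6) 5.1 frame to (num_samples, 2) stereo."""
    fl, fr, fc, _lfe, sl, sr = (x[:, i] for i in range(6))
    left = fl + 0.707 * fc + 0.707 * sl
    right = fr + 0.707 * fc + 0.707 * sr
    stereo = np.stack([left, right], axis=1)
    # Summing three channels can exceed full scale; clip as a last resort
    # (a limiter or overall attenuation is the cleaner fix).
    return np.clip(stereo, -1.0, 1.0)
```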
Scenario 3: Audio-Follow-Video with Failover
You have a primary and backup video feed, each with its own embedded audio. When the gateway switches from primary to backup on video failover, the audio must switch simultaneously. Any audio/video desynchronization during the switch is immediately noticeable.
This is the audio-follow-video paradigm: audio routing decisions are tied to video routing decisions. When the failover engine switches video, it must also switch the corresponding audio channels.
Configuring the Audio Matrix in Vajra Cast
Vajra Cast includes a built-in audio matrix that handles up to 8 channels per stream. The matrix operates at the route level, sitting between your inputs and outputs.
Channel Mapping
To map specific input channels to specific output channels, configure the audio matrix for each output:
Example: Extract French commentary (channels 3-4) to a stereo output for a French YouTube stream:
- Create a route from your multi-channel input to the RTMP output for YouTube
- Open the audio matrix settings for that route
- Map input channel 3 to output channel 1 (left)
- Map input channel 4 to output channel 2 (right)
- Leave all other output channels unmapped
The result: the YouTube output receives a stereo stream with only the French commentary, while the original 8-channel input continues to feed other outputs with their own mappings.
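The same mapping, sketched as code for clarity. Channel numbers are 1-based as in the steps above; this illustrates the behavior, not the gateway's internals:

```python
import numpy as np

def apply_channel_map(frame: np.ndarray, mapping: dict[int, int],
                      num_outputs: int) -> np.ndarray:
    """Route input channels to output channels; unmapped outputs stay silent."""
    out = np.zeros((frame.shape[0], num_outputs), dtype=frame.dtype)
    for out_ch, in_ch in mapping.items():
        out[:, out_ch - 1] = frame[:, in_ch - 1]
    return out

frame = np.random.randn(48000, 8).astype(np.float32)  # 1 s of 8-ch audio, 48 kHz
french_stereo = apply_channel_map(frame, {1: 3, 2: 4}, num_outputs=2)
```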
Per-Channel Gain Control
Each channel in the matrix has an independent gain control, measured in dB. Common adjustments:
- +0 dB: Unity gain, no change (default)
- -3 dB: Reduce by half power (subtle reduction)
- -6 dB: Reduce amplitude by half (half perceived loudness is closer to -10 dB)
- -inf (mute): Silence the channel completely
Use per-channel gain to balance commentary against natural sound, attenuate a hot microphone, or mute talkback channels that should not reach the audience.
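For reference, the dB values above convert to the linear factor actually multiplied into the samples:

```python
def db_to_linear(db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

db_to_linear(0)    # 1.0    unity
db_to_linear(-3)   # ~0.708 half power
db_to_linear(-6)   # ~0.501 half amplitude
# -inf dB (mute) corresponds to a factor of 0.
```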
Downmixing
For surround-to-stereo downmix, Vajra Cast’s audio matrix lets you route multiple input channels to the same output channel with gain adjustments. To implement the standard 5.1 downmix:
| Input Channel | Maps To | Gain |
|---|---|---|
| FL (Ch 1) | Output L (Ch 1) | 0 dB |
| FR (Ch 2) | Output R (Ch 2) | 0 dB |
| FC (Ch 3) | Output L (Ch 1) | -3 dB |
| FC (Ch 3) | Output R (Ch 2) | -3 dB |
| SL (Ch 5) | Output L (Ch 1) | -3 dB |
| SR (Ch 6) | Output R (Ch 2) | -3 dB |
| LFE (Ch 4) | (unmapped) | — |
When multiple inputs are routed to the same output channel, each signal is scaled by its own gain and the results are summed. The -3 dB values above (approximately 0.707 linear) ensure the center and surround channels are mixed at the correct level relative to the front left and right.
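Equivalently, the whole table is a 6x2 gain matrix, and the downmix becomes one matrix multiplication per frame. A sketch in the same NumPy terms as the earlier example:

```python
import numpy as np

k = 10 ** (-3 / 20)   # -3 dB as a linear factor (~0.708)
# Rows: FL, FR, FC, LFE, SL, SR. Columns: output L, output R.
DOWNMIX = np.array([
    [1.0, 0.0],   # FL -> L at 0 dB
    [0.0, 1.0],   # FR -> R at 0 dB
    [k,   k  ],   # FC -> both at -3 dB
    [0.0, 0.0],   # LFE unmapped
    [k,   0.0],   # SL -> L at -3 dB
    [0.0, k  ],   # SR -> R at -3 dB
])

surround_frame = np.random.randn(48000, 6).astype(np.float32)
stereo = surround_frame @ DOWNMIX   # (num_samples, 6) -> (num_samples, 2)
```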
Audio-Follow-Video
When using Vajra Cast’s failover system, audio follows video by default. The audio matrix configuration applies to whichever input is currently active. If your primary and backup feeds have the same channel layout (which they should), the audio routing remains consistent across failover events.
If your primary and backup feeds have different channel layouts (for example, the primary has 8 channels and the backup has 2), you can configure separate audio matrix profiles per input. The gateway applies the correct profile automatically when switching.
Monitoring Audio Levels
Monitoring is the difference between professional audio and guesswork. In live broadcast, you need to see audio levels in real-time and be alerted when they fall outside acceptable ranges.
Level Metering
Standard broadcast level targets:
| Standard | Peak Level | Average Level | Notes |
|---|---|---|---|
| EBU R128 | -1 dBTP | -23 LUFS | European broadcast standard |
| ATSC A/85 | -2 dBTP | -24 LKFS | US broadcast standard |
| YouTube/Web | -1 dBTP | -14 LUFS | Platform recommendation |
LUFS (Loudness Units relative to Full Scale) measures perceived loudness over time, not just peak amplitude. This is important because a signal that peaks at -1 dBFS can sound very different depending on its dynamic range.
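A rough illustration: the two signals below peak at exactly -1 dBFS, yet their average levels differ by more than 30 dB. Plain RMS is shown for simplicity; real LUFS measurement applies K-weighting and gating per ITU-R BS.1770.

```python
import numpy as np

t = np.linspace(0, 1, 48000, endpoint=False)
amp = 10 ** (-1 / 20)                        # -1 dBFS peak amplitude
steady = amp * np.sin(2 * np.pi * 1000 * t)  # continuous tone: loud
sparse = np.zeros_like(t)
sparse[::4800] = amp                         # ten brief clicks: quiet

for name, x in [("steady", steady), ("sparse", sparse)]:
    rms_db = 20 * np.log10(np.sqrt(np.mean(x ** 2)))
    print(f"{name}: peak -1.0 dBFS, RMS {rms_db:.1f} dBFS")
# steady: RMS ~ -4 dBFS; sparse: RMS ~ -38 dBFS -- same peak, very
# different perceived loudness.
```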
What to Monitor
At minimum, monitor these audio parameters for every active stream:
- Peak levels per channel: Ensure no channel is clipping (exceeding 0 dBFS) or dead (sustained silence)
- Loudness (LUFS): Integrated loudness should stay within your target range
- Phase correlation: A value near +1.0 means the stereo signal is healthy. A value near -1.0 means the channels are phase-inverted and will partially cancel on mono playback (see the sketch after this list)
- Channel presence: Verify that all expected channels are active and carrying signal
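The phase correlation value above can be computed as a zero-lag normalized correlation between left and right, which is roughly what a correlation meter displays:

```python
import numpy as np

def phase_correlation(left: np.ndarray, right: np.ndarray) -> float:
    """+1: channels in phase; 0: uncorrelated; -1: one channel inverted."""
    denom = np.sqrt(np.sum(left ** 2) * np.sum(right ** 2))
    if denom == 0.0:
        return 0.0  # one or both channels silent
    return float(np.sum(left * right) / denom)

sig = np.random.randn(48000)
phase_correlation(sig, sig)    # -> 1.0  (healthy dual-mono)
phase_correlation(sig, -sig)   # -> -1.0 (inverted: cancels on mono)
```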
Silent Channel Detection
One of the most common and embarrassing audio failures is a dead channel: audio that should be present but is silent. This can happen when:
- An audio embed point upstream loses its feed
- A microphone is muted at the source
- A channel mapping error routes silence to the output
- An encoder drops audio channels during a reconnection
Set up alerts for sustained silence (more than 5-10 seconds) on any output channel that should be carrying program audio. Vajra Cast’s monitoring exposes per-channel audio statistics through its Prometheus metrics endpoint, so you can integrate silence detection into your alerting pipeline with Grafana.
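A minimal sketch of the detection logic itself, assuming you can observe one-second windows of samples per output channel. The thresholds are illustrative, and Vajra Cast's actual metric names are not shown here:

```python
import numpy as np

SILENCE_THRESHOLD_DBFS = -60.0   # below this peak level, treat as silent
SILENCE_WINDOW_SECONDS = 10      # sustained duration before alerting

def is_silent(window: np.ndarray) -> bool:
    peak = float(np.max(np.abs(window)))
    if peak == 0.0:
        return True
    return 20 * np.log10(peak) < SILENCE_THRESHOLD_DBFS

def sustained_silence(one_second_windows: list[np.ndarray]) -> bool:
    """True when the last N one-second windows were all silent."""
    recent = one_second_windows[-SILENCE_WINDOW_SECONDS:]
    return (len(recent) == SILENCE_WINDOW_SECONDS
            and all(is_silent(w) for w in recent))
```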
Advanced Audio Workflows
Multi-Language HLS with Audio Tracks
When distributing via HLS, you can include multiple audio tracks as separate renditions. The player presents a language selector to the viewer.
The workflow:
- Ingest the multi-channel source (8 channels with multiple languages)
- Create separate HLS audio renditions, each mapped from the appropriate source channels
- The HLS manifest references all audio renditions
- The video player (on web or app) lets the viewer choose their language
This is the standard approach for OTT platforms and premium live events. The video is encoded once; only the audio differs between renditions. Vajra Cast’s HLS output supports multiple audio renditions, configured through the audio matrix on each rendition’s route.
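The manifest mechanics are standard HLS (EXT-X-MEDIA audio groups, RFC 8216). A sketch that assembles a master playlist; the group name and URIs here are hypothetical, and in practice the gateway or packager writes this for you:

```python
# Build a master playlist with one video rendition and three audio tracks.
LANGS = [("en", "English", "YES"), ("fr", "French", "NO"), ("es", "Spanish", "NO")]

lines = ["#EXTM3U"]
for code, name, default in LANGS:
    lines.append(
        f'#EXT-X-MEDIA:TYPE=AUDIO,GROUP-ID="aud",LANGUAGE="{code}",'
        f'NAME="{name}",DEFAULT={default},URI="audio_{code}/playlist.m3u8"'
    )
lines.append('#EXT-X-STREAM-INF:BANDWIDTH=6000000,AUDIO="aud"')
lines.append("video_1080p/playlist.m3u8")
print("\n".join(lines))
```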
Audio-Only Streams
Some workflows require audio-only outputs: radio simulcast, podcast feeds, or audio-only web streams for bandwidth-constrained viewers. Configure a route that discards the video track and outputs only the mapped audio channels, transcoded to the appropriate codec (AAC for HLS, Opus for WebRTC, MP3 for legacy radio systems).
Talkback and IFB
In remote production, talkback (communication from the studio to the field crew) and IFB (Interruptible Foldback, the mix of program audio plus director cues fed to on-air talent) are often carried as dedicated audio channels in the transport stream.
These channels must be:
- Routed only to the field monitors/earpieces, never to the program output
- Excluded from the audio matrix for all audience-facing outputs
- Low-latency, so the director’s cues arrive in time to be useful
A common layout:
| Channel | Content | Routes To |
|---|---|---|
| Ch 1-2 | Program stereo | All outputs |
| Ch 3-4 | Natural sound | Broadcast partner, archive |
| Ch 5-6 | Secondary language | Language-specific outputs |
| Ch 7 | Talkback (studio → field) | Return feed to field only |
| Ch 8 | IFB mix | Talent earpiece return only |
In Vajra Cast, you handle this by simply not mapping channels 7-8 to any audience-facing output. Those channels exist in the transport stream for the return path to field monitors.
Troubleshooting Common Audio Issues
Audio Desynchronization (Lip Sync)
Audio arriving ahead of or behind the video is called a lip sync error. Causes include:
- Encoder audio delay: Some encoders process audio faster than video, introducing an offset
- Transcoding pipeline: Hardware video encoding can introduce variable delay relative to audio passthrough
- Network jitter: Different packet arrival timing for audio and video data
Solution: Most gateways, including Vajra Cast, maintain audio/video synchronization through timestamp-based processing. If you observe lip sync errors, check the encoder first, as that is the most common source.
Stereo Phase Inversion
If your audio sounds hollow, thin, or disappears on mono playback, one channel is likely phase-inverted. This happens when a cable is wired with inverted polarity or a digital processing stage inverts one channel.
Check the phase correlation meter. A reading near -1.0 confirms phase inversion. Fix it at the source (swap pins 2 and 3 on the affected channel's XLR connection) or apply a phase invert filter in the audio matrix.
Channel Swapping
Left and right channels are reversed. Less catastrophic than phase inversion but still wrong. Use the audio matrix to swap channels: map input channel 1 to output channel 2, and input channel 2 to output channel 1.
Missing Channels After Failover
If audio channels disappear after a failover event, the backup feed likely has a different channel layout than the primary. Standardize your channel layout across all sources, or configure per-input audio matrix profiles in your gateway.
Best Practices Summary
- Standardize channel layouts across all sources in your production. Document which channels carry which content.
- Monitor audio on every output, not just the input. A correct input can produce incorrect output if the matrix is misconfigured.
- Test failover audio before every event. Confirm that audio-follow-video works correctly and all channels are present after switching.
- Use per-channel gain to balance your mix at the gateway level, rather than asking upstream sources to adjust.
- Set up silence alerts for every audience-facing output channel.
- Keep talkback and IFB channels out of audience outputs by explicitly not mapping them.
- Document your audio matrix configuration and save it as a template for recurring productions.
Audio routing is one of those disciplines where getting it right means nobody notices, and getting it wrong means everybody notices. A well-configured audio matrix in your streaming gateway, combined with proper monitoring, is the foundation for reliable multi-channel audio in live broadcast.
For related configuration, see the SRT Streaming Gateway guide for the full ingest-to-distribution architecture, and video failover best practices for ensuring audio continuity during input switching.