Multi-Input Failover: N+1 Redundancy for Live Streams

What is Multi-Input Failover?

In broadcast engineering, N+1 redundancy means provisioning one more unit than the N needed to carry the load, so that a single failure can be absorbed without interruption. For live streaming, this translates to configuring multiple input sources for a single output route, so that if the primary feed fails, the system automatically switches to the next healthy input in the chain.

Vajra Cast implements multi-input failover as a core routing primitive. Every route can have an ordered list of inputs (a priority chain), and the engine continuously monitors all of them so that the route always carries the highest-priority healthy source.

This is not a bolt-on feature. It is built into the routing engine itself, which means failover decisions happen at the packet level, not at the application level. The result: switching times measured in milliseconds, not seconds.

Priority Chains

A priority chain is an ordered list of inputs assigned to a single route. Each input has a priority number, where lower numbers indicate higher priority:

Priority	Input	Protocol	Role
1	Main encoder	SRT	Primary
2	Backup encoder	SRT	Hot standby
3	Cloud encoder	RTMP	Warm standby
4	Slate/test pattern	File	Emergency fallback

Vajra Cast always selects the highest-priority (lowest number) input that is currently healthy. If priority 1 goes down, it switches to priority 2. If priority 1 recovers, it switches back, provided auto-recover is enabled (see Recovery Behavior).
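
The selection rule can be sketched in a few lines of Python. This is an illustration only, not Vajra Cast's internal code: the `Input` class and `select_active` function are hypothetical names, and each input's health is assumed to be already known.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Input:
    name: str
    priority: int   # lower number = higher priority
    healthy: bool   # assumed to be maintained by a separate health monitor


def select_active(chain: List[Input]) -> Optional[Input]:
    """Return the healthy input with the lowest priority number, or None."""
    healthy = [inp for inp in chain if inp.healthy]
    return min(healthy, key=lambda inp: inp.priority, default=None)


# The example chain from the table above, with the main encoder down.
chain = [
    Input("Main encoder", 1, healthy=False),
    Input("Backup encoder", 2, healthy=True),
    Input("Cloud encoder", 3, healthy=True),
    Input("Slate/test pattern", 4, healthy=True),
]

active = select_active(chain)
print(active.name if active else "no healthy input")
```

Because the rule is re-evaluated against the whole chain, marking the main encoder healthy again makes it the selected input without any special-case recovery code.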

You can build chains as deep as your infrastructure requires. There is no hard limit on the number of inputs in a priority chain.

Protocol Independence

Priority chains are protocol-agnostic. You can mix SRT, RTMP, HLS, UDP, and file-based inputs in the same chain. This is important for real-world deployments where your primary feed might be SRT over the internet, your backup is an RTMP encoder on the local network, and your emergency fallback is a pre-recorded slate file.

Health Monitoring

The failover engine needs to know when an input is healthy and when it is not. Vajra Cast monitors every input in the priority chain simultaneously, using multiple signals:

Connection State

The most basic check: is the input connected? For SRT, this means the handshake is complete and packets are flowing. For RTMP, the TCP connection is established and media data is being received.

SRT listener: healthy when a caller connects and media arrives
SRT caller: healthy when the connection to the remote listener succeeds
RTMP: healthy when the publisher connects and sends audio/video

Packet Flow

A connected input that stops sending packets is not healthy. Vajra Cast tracks the last-received timestamp for every input and marks it as unhealthy after a configurable timeout:

Health timeout: 500ms (default)

If no packets arrive for 500ms, the input is flagged as down, and the engine triggers a switch to the next priority.
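
A last-received timestamp check of this kind might look like the following sketch. The `PacketFlowMonitor` class is a hypothetical illustration, not a Vajra Cast API; the optional `now` argument exists only so the logic can be exercised without real waiting.

```python
import time
from typing import Optional

HEALTH_TIMEOUT_S = 0.5  # the 500ms default quoted above


class PacketFlowMonitor:
    """Tracks when the last packet arrived for one input (hypothetical helper)."""

    def __init__(self, timeout_s: float = HEALTH_TIMEOUT_S):
        self.timeout_s = timeout_s
        self.last_packet_at: Optional[float] = None

    def on_packet(self, now: Optional[float] = None) -> None:
        # A monotonic clock avoids false timeouts when the wall clock is adjusted.
        self.last_packet_at = time.monotonic() if now is None else now

    def is_healthy(self, now: Optional[float] = None) -> bool:
        if self.last_packet_at is None:
            return False  # nothing received yet
        now = time.monotonic() if now is None else now
        return (now - self.last_packet_at) <= self.timeout_s


monitor = PacketFlowMonitor()
monitor.on_packet(now=10.0)
print(monitor.is_healthy(now=10.4))  # 400ms of silence: still healthy
print(monitor.is_healthy(now=10.6))  # 600ms of silence: flagged as down
```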

Bitrate Threshold

Sometimes an input degrades without fully disconnecting. A camera encoder might drop to a fraction of its normal bitrate due to overheating or network congestion. You can set a minimum bitrate threshold:

Minimum bitrate: 500 kbps

If the measured bitrate drops below this floor, the input is considered unhealthy. This catches “zombie” streams that are technically alive but useless for broadcast.
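
One way to measure bitrate against such a floor is a sliding window over received packet sizes. The sketch below assumes a one-second window; the window length and the `BitrateMonitor` class are illustrative assumptions, since the text does not say how Vajra Cast measures bitrate.

```python
from collections import deque

MIN_BITRATE_BPS = 500_000  # the 500 kbps floor from the example above


class BitrateMonitor:
    """Sliding-window bitrate estimate for one input (hypothetical helper)."""

    def __init__(self, window_s: float = 1.0, min_bps: int = MIN_BITRATE_BPS):
        self.window_s = window_s
        self.min_bps = min_bps
        self.samples = deque()  # (timestamp in seconds, packet size in bytes)

    def on_packet(self, now: float, size_bytes: int) -> None:
        self.samples.append((now, size_bytes))

    def bitrate_bps(self, now: float) -> float:
        # Discard packets that have fallen out of the window.
        while self.samples and self.samples[0][0] <= now - self.window_s:
            self.samples.popleft()
        total_bits = 8 * sum(size for _, size in self.samples)
        return total_bits / self.window_s

    def is_healthy(self, now: float) -> bool:
        return self.bitrate_bps(now) >= self.min_bps


# 100 packets of 1316 bytes in one second is about 1.05 Mbps: healthy.
normal = BitrateMonitor()
for i in range(1, 101):
    normal.on_packet(i / 100, 1316)
print(normal.bitrate_bps(1.0), normal.is_healthy(1.0))

# 10 packets in one second is about 105 kbps: a "zombie" stream.
zombie = BitrateMonitor()
for i in range(1, 11):
    zombie.on_packet(2.0 + i / 10, 1316)
print(zombie.bitrate_bps(3.0), zombie.is_healthy(3.0))
```

A shorter window reacts faster but is more easily fooled by bursty encoders, so the window length trades detection speed against false positives.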

SRT-Specific Metrics

For SRT inputs, Vajra Cast also monitors protocol-level statistics:

Packet loss rate: sustained loss above a configurable threshold triggers failover
RTT (round-trip time): sudden RTT increases may indicate network path failure
Retransmission rate: high retransmission can precede a full outage

These metrics give you earlier warning than simple connection state monitoring. You can catch a degrading link and fail over before it affects your output.
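
The word "sustained" matters: a single lossy interval should not trigger a switch. A minimal sketch of that idea, assuming loss is sampled once per interval as a percentage and that three consecutive bad samples count as sustained (both values, and the `SustainedLossDetector` class, are assumptions for illustration):

```python
class SustainedLossDetector:
    """Signals failover only after several consecutive lossy samples."""

    def __init__(self, max_loss_pct: float = 5.0, required_samples: int = 3):
        self.max_loss_pct = max_loss_pct          # assumed threshold
        self.required_samples = required_samples  # assumed definition of "sustained"
        self.bad_streak = 0

    def update(self, loss_pct: float) -> bool:
        """Feed one loss sample; return True when failover should trigger."""
        if loss_pct > self.max_loss_pct:
            self.bad_streak += 1
        else:
            self.bad_streak = 0  # one clean sample resets the streak
        return self.bad_streak >= self.required_samples


detector = SustainedLossDetector()
samples = [1.0, 8.0, 9.0, 2.0, 7.0, 7.0, 7.0]
decisions = [detector.update(s) for s in samples]
print(decisions)  # only the final sample completes a streak of three
```

The same streak pattern could be applied to RTT or retransmission rate by swapping the sampled value and threshold.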

Switching Behavior

When the failover engine detects an unhealthy input and switches to the next priority, several things happen:

The output buffer absorbs the switch. Vajra Cast maintains a small output buffer (configurable, typically 50-200ms) that smooths the transition. For SRT outputs, the transition is absorbed within the SRT latency window.
Codec continuity is maintained. If both inputs use the same codec and resolution, the switch is invisible at the transport level. If the inputs differ, the transcoding engine handles the adaptation.
An event is logged. Every failover switch is recorded with a timestamp, the input that failed, the input that took over, and the reason for the switch (timeout, bitrate, packet loss, etc.).
Metrics are updated. The Prometheus metrics endpoint reports failover events, so your Grafana dashboards and alerting rules can react.
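
The fields listed for a logged failover event map naturally onto a small record. The sketch below is a hypothetical shape, not Vajra Cast's actual log format; the `FailoverEvent` class, its field names, and the JSON encoding are all assumptions.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FailoverEvent:
    timestamp: str      # ISO 8601, UTC
    failed_input: str   # the input that failed
    new_input: str      # the input that took over
    reason: str         # e.g. "timeout", "bitrate", "packet_loss"


def make_event(failed_input: str, new_input: str, reason: str,
               now: Optional[datetime] = None) -> FailoverEvent:
    now = now or datetime.now(timezone.utc)
    return FailoverEvent(now.isoformat(), failed_input, new_input, reason)


event = make_event("Main encoder", "Backup encoder", "timeout")
print(json.dumps(asdict(event)))
```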

Switch Timing

The speed of a failover switch depends on the detection method:

Detection Method	Typical Switch Time
SRT disconnect	<50ms
Packet timeout (500ms)	500-600ms
Bitrate threshold	200-500ms
Packet loss threshold	100-300ms

For the fastest possible failover, use SRT for both primary and backup inputs. An SRT disconnect is detected in under 50ms, which is short enough for the output buffer to hide the switch from viewers.

Recovery Behavior

When a higher-priority input recovers, Vajra Cast can automatically switch back to it. This is configurable:

Auto-recover: on. Switch back to the higher-priority input as soon as it is healthy. A configurable hold-off timer (default: 5 seconds) prevents flapping by requiring the recovered input to be stable before switching back.
Auto-recover: off. Stay on the current input until a manual switch or the current input also fails.

For most production deployments, auto-recover with a 5-10 second hold-off timer is the recommended setting. This ensures you return to your best source without risking rapid switching during an unstable recovery.
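
The hold-off timer amounts to requiring continuous health for a fixed period before switching back. A minimal sketch, with `HoldOffTimer` as a hypothetical name and timestamps passed in explicitly so the behavior is easy to follow:

```python
from typing import Optional


class HoldOffTimer:
    """Allows switch-back only after an input stays healthy for hold_off_s."""

    def __init__(self, hold_off_s: float = 5.0):
        self.hold_off_s = hold_off_s
        self.healthy_since: Optional[float] = None

    def update(self, healthy: bool, now: float) -> bool:
        """Feed the recovered input's current health; True means safe to switch back."""
        if not healthy:
            self.healthy_since = None  # any dropout restarts the timer
            return False
        if self.healthy_since is None:
            self.healthy_since = now
        return (now - self.healthy_since) >= self.hold_off_s


timer = HoldOffTimer(hold_off_s=5.0)
print(timer.update(True, now=0.0))   # just recovered: wait
print(timer.update(False, now=4.0))  # flapped: timer resets
print(timer.update(True, now=4.5))   # healthy again: wait
print(timer.update(True, now=9.5))   # stable for 5 seconds: switch back
```

With auto-recover off, the equivalent behavior is simply never consulting the timer and staying on the current input until it fails or an operator switches manually.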

Configuration Example

A typical multi-input failover route in Vajra Cast:

Create the route with your desired output (e.g., SRT push to CDN).
Add Input 1 (Priority 1): SRT Listener on port 9000, latency 200ms, passphrase set.
Add Input 2 (Priority 2): SRT Caller to backup encoder at srt://backup:9000, latency 500ms.
Add Input 3 (Priority 3): RTMP Listener on port 1935, stream key backup-rtmp.
Set health parameters: timeout 500ms, minimum bitrate 500 kbps.
Set recovery: auto-recover on, hold-off 5 seconds.

Once configured, the route is live. Connect your primary encoder to the SRT listener, and the engine handles monitoring, failover, and recovery automatically.
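
For planning or review purposes, the steps above can be written down as plain data. The structure below is a hypothetical representation in Python, not Vajra Cast's configuration schema: every key name is an assumption, and the output destination and passphrase value are left out because the example does not specify them.

```python
# Hypothetical description of the example route; key names are illustrative only.
route = {
    "output": {"protocol": "srt", "mode": "push"},
    "inputs": [
        {"priority": 1, "protocol": "srt", "mode": "listener",
         "port": 9000, "latency_ms": 200, "passphrase_set": True},
        {"priority": 2, "protocol": "srt", "mode": "caller",
         "url": "srt://backup:9000", "latency_ms": 500},
        {"priority": 3, "protocol": "rtmp", "mode": "listener",
         "port": 1935, "stream_key": "backup-rtmp"},
    ],
    "health": {"timeout_ms": 500, "min_bitrate_kbps": 500},
    "recovery": {"auto_recover": True, "hold_off_s": 5},
}


def validate(route: dict) -> list:
    """Return a list of problems; an empty list means the sketch is consistent."""
    problems = []
    priorities = [inp["priority"] for inp in route["inputs"]]
    if len(priorities) != len(set(priorities)):
        problems.append("priorities must be unique")
    if route["health"]["timeout_ms"] <= 0:
        problems.append("health timeout must be positive")
    if route["recovery"]["auto_recover"] and route["recovery"]["hold_off_s"] < 0:
        problems.append("hold-off must not be negative")
    return problems


print(validate(route))
```

Writing the route down this way makes it easy to check in review that each input has a distinct priority before anything is entered into the product itself.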

Real-World Deployment Patterns

Live Sports Remote Production

Primary: SRT from venue encoder over dedicated fiber. Backup: SRT from venue encoder over public internet (different path). Emergency: RTMP from a cloud-based encoder receiving the same camera feed via a separate SDI-to-IP converter.

Corporate Town Hall

Primary: SRT from the AV team’s encoder. Backup: RTMP from a laptop running OBS as a redundant encoder. Emergency: pre-recorded “technical difficulties” slate.

24/7 Worship Streaming

Primary: SRT from the main production switcher. Backup: SRT from a secondary camera with direct encoding. Emergency: looped recording of the church interior with background music.

Next Steps

Return to the Video Stream Failover Guide for the complete failover architecture
Learn about Crash Recovery for automatic stream restoration after software failures
Explore SRT Redundancy for SRT-specific failover patterns
