Multi-Input Failover: N+1 Redundancy for Live Streams
How multi-input failover with N+1 redundancy works in Vajra Cast. Configure priority chains, health monitoring, and automatic switching.
What is Multi-Input Failover?
In broadcast engineering, N+1 redundancy means provisioning one more unit than the minimum required: N units to carry the load, plus one spare. For live streaming, this translates to configuring multiple input sources for a single output route, so that if the primary feed fails, the system automatically switches to the next healthy input in the chain.
Vajra Cast implements multi-input failover as a core routing primitive. Every route can have an ordered list of inputs (a priority chain) and the engine continuously monitors all of them, switching to the highest-priority healthy source at all times.
This is not a bolt-on feature. It is built into the routing engine itself, which means failover decisions happen at the packet level, not at the application level. The result: switching times measured in milliseconds, not seconds.
Priority Chains
A priority chain is an ordered list of inputs assigned to a single route. Each input has a priority number, where lower numbers indicate higher priority:
| Priority | Input | Protocol | Role |
|---|---|---|---|
| 1 | Main encoder | SRT | Primary |
| 2 | Backup encoder | SRT | Hot standby |
| 3 | Cloud encoder | RTMP | Warm standby |
| 4 | Slate/test pattern | File | Emergency fallback |
Vajra Cast always selects the highest-priority (lowest number) input that is currently healthy. If priority 1 goes down, it switches to priority 2. If priority 1 recovers, it switches back, subject to the recovery settings described under Recovery Behavior.
You can build chains as deep as your infrastructure requires. There is no hard limit on the number of inputs in a priority chain.
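The selection rule above can be illustrated with a minimal Python sketch. It is not Vajra Cast's implementation; the `ChainInput` type and `select_active` function are hypothetical names used only to show "pick the healthy input with the lowest priority number", using the example chain from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChainInput:
    """Hypothetical model of one entry in a priority chain."""
    priority: int   # lower number = higher priority
    name: str
    protocol: str
    healthy: bool


def select_active(chain: list[ChainInput]) -> Optional[ChainInput]:
    """Return the healthy input with the lowest priority number, or None."""
    healthy = [entry for entry in chain if entry.healthy]
    return min(healthy, key=lambda entry: entry.priority, default=None)


# The example chain from the table, with the main encoder currently down.
chain = [
    ChainInput(1, "Main encoder", "SRT", healthy=False),
    ChainInput(2, "Backup encoder", "SRT", healthy=True),
    ChainInput(3, "Cloud encoder", "RTMP", healthy=True),
    ChainInput(4, "Slate/test pattern", "File", healthy=True),
]

active = select_active(chain)
print(active.name if active else "no healthy input")  # Backup encoder
```

Because the rule is re-evaluated over the whole chain, a recovered priority 1 input wins again as soon as it is marked healthy; any hold-off before switching back is a separate concern.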
Protocol Independence
Priority chains are protocol-agnostic. You can mix SRT, RTMP, HLS, UDP, and file-based inputs in the same chain. This is important for real-world deployments where your primary feed might be SRT over the internet, your backup is an RTMP encoder on the local network, and your emergency fallback is a pre-recorded slate file.
Health Monitoring
The failover engine needs to know when an input is healthy and when it is not. Vajra Cast monitors every input in the priority chain simultaneously, using multiple signals:
Connection State
The most basic check: is the input connected? For SRT, this means the handshake is complete and packets are flowing. For RTMP, the TCP connection is established and media data is being received.
- SRT listener: healthy when a caller connects and media arrives
- SRT caller: healthy when the connection to the remote listener succeeds
- RTMP: healthy when the publisher connects and sends audio/video
Packet Flow
A connected input that stops sending packets is not healthy. Vajra Cast tracks the last-received timestamp for every input and marks it as unhealthy after a configurable timeout:
Health timeout: 500ms (default)
If no packets arrive for 500ms, the input is flagged as down, and the engine triggers a switch to the next healthy input in the priority chain.
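A last-received-timestamp check of this kind can be sketched in a few lines of Python. This is an illustrative model only: the `PacketFlowMonitor` class and its method names are assumptions, and the timestamps are passed in explicitly so the behavior is easy to follow.

```python
import time
from typing import Optional


class PacketFlowMonitor:
    """Hypothetical watchdog: unhealthy if no packet arrived within timeout_s."""

    def __init__(self, timeout_s: float = 0.5):
        self.timeout_s = timeout_s
        self.last_packet_at: Optional[float] = None

    def on_packet(self, now: Optional[float] = None) -> None:
        # Record the arrival time of the most recent packet.
        self.last_packet_at = time.monotonic() if now is None else now

    def is_healthy(self, now: Optional[float] = None) -> bool:
        if self.last_packet_at is None:
            return False  # nothing received yet
        now = time.monotonic() if now is None else now
        return (now - self.last_packet_at) <= self.timeout_s


monitor = PacketFlowMonitor(timeout_s=0.5)
monitor.on_packet(now=10.0)
print(monitor.is_healthy(now=10.3))  # True: 300ms since the last packet
print(monitor.is_healthy(now=10.6))  # False: 600ms exceeds the 500ms timeout
```

A monotonic clock is used for the default timestamps so that wall-clock adjustments cannot cause a false timeout.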
Bitrate Threshold
Sometimes an input degrades without fully disconnecting. A camera encoder might drop to a fraction of its normal bitrate due to overheating or network congestion. You can set a minimum bitrate threshold:
Minimum bitrate: 500 kbps
If the measured bitrate drops below this floor, the input is considered unhealthy. This catches “zombie” streams that are technically alive but useless for broadcast.
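One way to measure bitrate against a floor is a sliding window over recent packets. The sketch below is an assumption about how such a check could work, not a description of Vajra Cast internals; the one-second window, the `BitrateMonitor` name, and the 1316-byte packet size (seven 188-byte MPEG-TS packets) are illustrative choices.

```python
from collections import deque


class BitrateMonitor:
    """Hypothetical sliding-window bitrate check against a minimum floor."""

    def __init__(self, min_kbps: float = 500.0, window_s: float = 1.0):
        self.min_kbps = min_kbps
        self.window_s = window_s
        self.samples = deque()  # (timestamp_s, payload_bytes)

    def on_packet(self, now: float, size_bytes: int) -> None:
        self.samples.append((now, size_bytes))

    def bitrate_kbps(self, now: float) -> float:
        # Discard samples that have fallen out of the measurement window.
        while self.samples and self.samples[0][0] < now - self.window_s:
            self.samples.popleft()
        total_bits = sum(size for _, size in self.samples) * 8
        return total_bits / self.window_s / 1000.0

    def is_healthy(self, now: float) -> bool:
        return self.bitrate_kbps(now) >= self.min_kbps


healthy_feed = BitrateMonitor(min_kbps=500.0)
for i in range(100):                      # 100 packets within one second
    healthy_feed.on_packet(now=i * 0.01, size_bytes=1316)

zombie_feed = BitrateMonitor(min_kbps=500.0)
for i in range(20):                       # only 20 packets within one second
    zombie_feed.on_packet(now=i * 0.05, size_bytes=1316)

print(healthy_feed.is_healthy(now=1.0))   # True  (about 1053 kbps)
print(zombie_feed.is_healthy(now=1.0))    # False (about 211 kbps)
```

The second feed is still connected and still delivering packets, so a connection or timeout check alone would not catch it.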
SRT-Specific Metrics
For SRT inputs, Vajra Cast also monitors protocol-level statistics:
- Packet loss rate: sustained loss above a configurable threshold triggers failover
- RTT (round-trip time): sudden RTT increases may indicate network path failure
- Retransmission rate: high retransmission can precede a full outage
These metrics give you earlier warning than simple connection state monitoring. You can catch a degrading link and fail over before it affects your output.
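The word "sustained" matters: a single lossy sample should not trigger a switch. As a rough sketch, assuming statistics are sampled periodically and that "sustained" means several consecutive samples over the threshold, the logic could look like this. The class name, the 2% threshold, and the three-sample requirement are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SrtSample:
    """Hypothetical snapshot of SRT link statistics for one sampling interval."""
    loss_pct: float
    rtt_ms: float
    retrans_pct: float


class SustainedLossDetector:
    """Trigger only after required_samples consecutive samples exceed the threshold."""

    def __init__(self, max_loss_pct: float = 2.0, required_samples: int = 3):
        self.max_loss_pct = max_loss_pct
        self.required_samples = required_samples
        self.bad_streak = 0

    def update(self, sample: SrtSample) -> bool:
        """Feed one sample; return True when failover should trigger."""
        if sample.loss_pct > self.max_loss_pct:
            self.bad_streak += 1
        else:
            self.bad_streak = 0  # a clean sample resets the streak
        return self.bad_streak >= self.required_samples


detector = SustainedLossDetector(max_loss_pct=2.0, required_samples=3)
losses = [0.1, 3.0, 0.2, 2.5, 4.0, 6.0]
decisions = [detector.update(SrtSample(loss, rtt_ms=40.0, retrans_pct=0.0))
             for loss in losses]
print(decisions)  # [False, False, False, False, False, True]
```

The isolated 3.0% spike is ignored, while the run of 2.5%, 4.0%, and 6.0% trips the detector on its third sample. The same streak pattern could be applied to RTT or retransmission rate.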
Switching Behavior
When the failover engine detects an unhealthy input and switches to the next priority, several things happen:
- The output buffer absorbs the switch. Vajra Cast maintains a small output buffer (configurable, typically 50-200ms) that smooths the transition. For SRT outputs, this is absorbed within the SRT latency window.
- Codec continuity is maintained. If both inputs use the same codec and resolution, the switch is invisible at the transport level. If the inputs differ, the transcoding engine handles the adaptation.
- An event is logged. Every failover switch is recorded with a timestamp, the input that failed, the input that took over, and the reason for the switch (timeout, bitrate, packet loss, etc.).
- Metrics are updated. The Prometheus metrics endpoint reports failover events, so your Grafana dashboards and alerting rules can react.
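To make the logged fields concrete, here is a small Python sketch of a failover event record carrying the four pieces of information listed above. The field names, the JSON shape, and the example date are assumptions for illustration; the actual Vajra Cast log format is not shown in this article.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class FailoverEvent:
    """Hypothetical record of one failover switch."""
    timestamp: str      # ISO 8601, UTC
    failed_input: str
    new_input: str
    reason: str         # e.g. "timeout", "bitrate", "packet_loss"


def make_event(failed_input: str, new_input: str, reason: str,
               when: Optional[datetime] = None) -> FailoverEvent:
    when = when or datetime.now(timezone.utc)
    return FailoverEvent(when.isoformat(), failed_input, new_input, reason)


# Arbitrary example timestamp so the output is reproducible.
event = make_event(
    "Main encoder", "Backup encoder", "timeout",
    when=datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(json.dumps(asdict(event)))
```

A structured record like this is easy to ship to a log aggregator and to correlate with the failover counters exposed for Prometheus.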
Switch Timing
The speed of a failover switch depends on the detection method:
| Detection Method | Typical Switch Time |
|---|---|
| SRT disconnect | <50ms |
| Packet timeout (500ms) | 500-600ms |
| Bitrate threshold | 200-500ms |
| Packet loss threshold | 100-300ms |
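For the fastest possible failover, use SRT for both primary and backup inputs. When an SRT connection drops outright, SRT's connection management detects it in under 50ms, fast enough that the switch is typically invisible to viewers. Degradations that do not close the connection are still caught by the timeout, bitrate, and packet loss checks, at the slower timings shown in the table.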
Recovery Behavior
When a higher-priority input recovers, Vajra Cast can automatically switch back to it. This is configurable:
- Auto-recover: on. Switch back to the higher-priority input as soon as it is healthy. A configurable hold-off timer (default: 5 seconds) prevents flapping by requiring the recovered input to be stable before switching back.
- Auto-recover: off. Stay on the current input until a manual switch or the current input also fails.
For most production deployments, auto-recover with a 5-10 second hold-off timer is the recommended setting. This ensures you return to your best source without risking rapid switching during an unstable recovery.
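The hold-off idea can be sketched as a timer that restarts whenever the recovering input turns unhealthy again. This is a minimal illustration under that assumption; `HoldOffTimer` is a hypothetical name, not a Vajra Cast API.

```python
from typing import Optional


class HoldOffTimer:
    """Allow switch-back only after the input stays healthy for hold_off_s."""

    def __init__(self, hold_off_s: float = 5.0):
        self.hold_off_s = hold_off_s
        self.healthy_since: Optional[float] = None

    def update(self, healthy: bool, now: float) -> bool:
        """Feed the current health state; return True when switch-back is allowed."""
        if not healthy:
            self.healthy_since = None  # any dropout restarts the hold-off
            return False
        if self.healthy_since is None:
            self.healthy_since = now
        return (now - self.healthy_since) >= self.hold_off_s


timer = HoldOffTimer(hold_off_s=5.0)
print(timer.update(True, now=0.0))    # False: just recovered
print(timer.update(True, now=3.0))    # False: stable for only 3s
print(timer.update(False, now=4.0))   # False: dropped again, timer resets
print(timer.update(True, now=5.0))    # False: hold-off restarts here
print(timer.update(True, now=10.0))   # True: stable for a full 5s
```

Without the reset on each dropout, an input that recovers and fails every few seconds would pull the route back and forth, which is the flapping the hold-off is meant to prevent.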
Configuration Example
A typical multi-input failover route in Vajra Cast:
- Create the route with your desired output (e.g., SRT push to CDN).
- Add Input 1 (Priority 1): SRT Listener on port 9000, latency 200ms, passphrase set.
- Add Input 2 (Priority 2): SRT Caller to backup encoder at srt://backup:9000, latency 500ms.
- Add Input 3 (Priority 3): RTMP Listener on port 1935, stream key backup-rtmp.
- Set health parameters: timeout 500ms, minimum bitrate 500 kbps.
- Set recovery: auto-recover on, hold-off 5 seconds.
Once configured, the route is live. Connect your primary encoder to the SRT listener, and the system handles everything else automatically.
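For readers who think in terms of data structures, the steps above can be written down as a plain Python dictionary. This is only a way to visualize the settings: the key names and the `validate` helper are hypothetical and do not represent Vajra Cast's actual configuration format or API.

```python
# Hypothetical representation of the route described in the steps above.
route = {
    "output": "SRT push to CDN",
    "inputs": [
        {"priority": 1, "type": "srt_listener", "port": 9000,
         "latency_ms": 200, "passphrase_set": True},
        {"priority": 2, "type": "srt_caller", "url": "srt://backup:9000",
         "latency_ms": 500},
        {"priority": 3, "type": "rtmp_listener", "port": 1935,
         "stream_key": "backup-rtmp"},
    ],
    "health": {"timeout_ms": 500, "min_bitrate_kbps": 500},
    "recovery": {"auto_recover": True, "hold_off_s": 5},
}


def validate(route: dict) -> list[str]:
    """Return a list of problems; an empty list means the sketch is consistent."""
    problems = []
    priorities = [entry["priority"] for entry in route["inputs"]]
    if len(set(priorities)) != len(priorities):
        problems.append("duplicate priority numbers")
    if route["health"]["timeout_ms"] <= 0:
        problems.append("health timeout must be positive")
    if route["recovery"]["auto_recover"] and route["recovery"]["hold_off_s"] < 0:
        problems.append("hold-off must not be negative")
    return problems


print(validate(route))  # []
```

Writing the route out this way makes it easy to spot mistakes such as two inputs sharing a priority number before anything goes on air.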
Real-World Deployment Patterns
Live Sports Remote Production
Primary: SRT from venue encoder over dedicated fiber. Backup: SRT from venue encoder over public internet (different path). Emergency: RTMP from a cloud-based encoder receiving the same camera feed via a separate SDI-to-IP converter.
Corporate Town Hall
Primary: SRT from the AV team’s encoder. Backup: RTMP from a laptop running OBS as a redundant encoder. Emergency: pre-recorded “technical difficulties” slate.
24/7 Worship Streaming
Primary: SRT from the main production switcher. Backup: SRT from a secondary camera with direct encoding. Emergency: looped recording of the church interior with background music.
Next Steps
- Return to the Video Stream Failover Guide for the complete failover architecture
- Learn about Crash Recovery for automatic stream restoration after software failures
- Explore SRT Redundancy for SRT-specific failover patterns