Zero-Copy Distribution: Efficient One-to-Many Streaming
How Vajra Cast's zero-copy distribution sends one input to unlimited outputs without duplicating data, saving CPU and memory.
The Problem with Naive Distribution
Suppose you have one incoming live stream and you need to send it to 10 destinations: three CDN ingest points, two recording servers, a monitoring feed, and four regional redistribution nodes.
The naive approach is to decode the input, make 10 copies of the frame data, and encode/packetize each copy independently. This works, but it scales terribly. Every additional output multiplies CPU usage, memory consumption, and latency. By the time you reach 20 or 30 outputs, you are buying hardware to copy bytes, not to process video.
Vajra Cast takes a different approach: zero-copy distribution. The input data is read once, stored in a shared buffer, and every output reads from that same buffer without duplicating it. Adding an output costs almost nothing.
What Zero-Copy Means
In systems programming, “zero-copy” refers to techniques that move data from source to destination without copying it through intermediate buffers. The classic example is sendfile() on Linux, which moves data from a file descriptor to a network socket without copying it into user space.
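The kernel-level primitive mentioned above can be demonstrated directly. A minimal sketch, assuming a Linux-like OS where `os.sendfile` accepts this file-descriptor combination; a connected socket pair stands in for a real network peer:

```python
# Classic zero-copy: os.sendfile() moves bytes from a file descriptor
# to a socket inside the kernel, never copying them through user space.
import os
import socket
import tempfile

payload = b"TS" * 94                      # 188 bytes, one MPEG-TS packet
with tempfile.TemporaryFile() as f:
    f.write(payload)
    f.flush()

    # A connected socket pair stands in for a real network connection.
    left, right = socket.socketpair()
    sent = os.sendfile(left.fileno(), f.fileno(), 0, len(payload))
    received = right.recv(4096)
    left.close()
    right.close()
```

After the transfer, `sent` equals the payload size and `received` matches the file contents, yet the payload bytes were never read into the Python process.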
Vajra Cast’s zero-copy distribution applies this principle to live stream routing:
- Packets arrive from the input (SRT, RTMP, or any supported protocol).
- Packets are stored in a single shared ring buffer in memory.
- Each output maintains its own read pointer into that ring buffer.
- Packets are sent to each output directly from the shared buffer. No per-output copies.
The result: memory usage is proportional to the buffer size, not to the number of outputs. Distribution CPU scales with the number of send system calls (one per output per packet), not with bytes copied. And since passthrough routes have no per-output encoding step, the processing cost per output is negligible.
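The one-buffer, many-readers idea can be sketched in a few lines. This is illustrative, not Vajra Cast's internal code; a `memoryview` lets every send reference the same underlying bytes, and socket pairs stand in for real outputs:

```python
# Minimal sketch: one packet buffer, many outputs, no per-output copies.
import socket

packet = bytearray(b"\x47" + bytes(187))   # one 188-byte MPEG-TS packet
view = memoryview(packet)                  # zero-copy reference

# Stand-ins for output sockets: three connected socket pairs.
outputs, sinks = [], []
for _ in range(3):
    a, b = socket.socketpair()
    outputs.append(a)
    sinks.append(b)

# Each output sends from the same buffer; `packet` is never duplicated.
for out in outputs:
    out.sendall(view)

received = [sink.recv(4096) for sink in sinks]
```

All three sinks receive identical data, but only one copy of the packet ever exists in memory.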
Performance Impact
The difference between copy-based and zero-copy distribution becomes dramatic as you scale:
| Outputs | Copy-Based CPU | Zero-Copy CPU | Memory (Copy) | Memory (Zero-Copy) |
|---|---|---|---|---|
| 1 | Baseline | Baseline | Baseline | Baseline |
| 5 | ~5x | ~1.02x | ~5x | ~1x |
| 10 | ~10x | ~1.05x | ~10x | ~1x |
| 50 | ~50x | ~1.2x | ~50x | ~1x |
These numbers are for passthrough (no transcoding). The CPU column represents the distribution overhead: the cost of getting packets from the input to the output sockets. With zero-copy, this overhead is almost entirely system call overhead (sending packets), not data copying.
In practice, this means a single modest server running Vajra Cast can distribute one HD stream to 50+ outputs without breaking a sweat. The bottleneck shifts from CPU to network bandwidth, which is where it should be.
Protocol Independence
Zero-copy distribution in Vajra Cast works across protocols. The shared buffer contains the raw transport stream (typically MPEG-TS), and each output packetizes it according to its protocol:
- SRT outputs wrap the TS packets in SRT frames with encryption and error recovery.
- RTMP outputs re-mux the TS into FLV containers for RTMP delivery.
- UDP outputs send raw TS packets (for legacy infrastructure).
- HLS outputs segment the TS into chunks for HTTP delivery.
- Recording outputs write the TS to disk.
The protocol-specific work happens at the edges, at ingest and at output. The distribution layer in the middle is protocol-agnostic and operates on raw packets.
This also means you can receive a stream via SRT and distribute it simultaneously to SRT, RTMP, HLS, and a recording, all from the same shared buffer, all zero-copy.
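The edge/middle split can be sketched as follows. The packetizers here are toy stand-ins, not real SRT or RTMP implementations; the point is that every protocol frames the same shared bytes:

```python
# Illustrative only: the shared buffer holds raw TS packets; each output
# applies its own protocol framing at the edge.
ts_packet = b"\x47" + bytes(187)          # raw MPEG-TS packet
shared = memoryview(ts_packet)            # protocol-agnostic middle

def udp_frame(view):                      # raw TS pass-through
    return bytes(view)

def srt_frame(view):                      # toy header + payload
    return b"SRT0" + bytes(view)

framed = {name: fn(shared) for name, fn in
          [("udp", udp_frame), ("srt", srt_frame)]}
```

Each entry in `framed` was built from the same buffer; only the protocol-specific wrapper differs.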
The Ring Buffer
At the heart of zero-copy distribution is the ring buffer (also called a circular buffer). This is a fixed-size memory region that wraps around to the beginning when it reaches the end.
How It Works
```
                 Write pointer (input)
                           |
                           v
[ P1 | P2 | P3 | P4 | P5 | P6 | ... | Pn ]
  ^              ^
  |              |
Output A      Output B
read pointer  read pointer
```
- The input writes new packets at the write pointer and advances it.
- Each output reads packets at its own read pointer and advances it.
- When a pointer reaches the end of the buffer, it wraps to the beginning.
- If an output falls too far behind the write pointer (the buffer wraps around and overwrites its data), that output is flagged as overflowed.
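The write/read mechanics above can be sketched with fixed-size slots and monotonically increasing sequence numbers. Names are illustrative, not Vajra Cast's internal API:

```python
# Sketch of a packet ring with independent per-output read positions.
class PacketRing:
    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = [None] * capacity
        self.write_seq = 0        # total packets written; slot = seq % capacity

    def push(self, packet):
        self.slots[self.write_seq % self.capacity] = packet
        self.write_seq += 1

    def read(self, read_seq):
        """Return the packet at an output's read position, if available."""
        if read_seq >= self.write_seq:
            return None           # this output is caught up
        return self.slots[read_seq % self.capacity]

ring = PacketRing(capacity=4)
for i in range(5):
    ring.push(f"P{i}".encode())   # seq 4 wraps into slot 0, over P0
```

Two outputs at different positions read the same slots independently: `ring.read(4)` yields the latest packet, while `ring.read(1)` still finds older data within the window.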
Buffer Sizing
The ring buffer size determines how far behind an output can fall before losing data. Vajra Cast sizes the buffer based on:
- Stream bitrate: higher bitrate streams need larger buffers for the same time window
- Output latency tolerance: outputs with higher latency (e.g., SRT with large latency settings) need more buffer space
A typical buffer holds 2-5 seconds of stream data. For a 10 Mbps stream, that is 2.5-6.25 MB, trivial on modern hardware.
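The sizing arithmetic is simple enough to write down directly; this just restates the bitrate-times-window calculation from the example above:

```python
# Buffer size for a given bitrate and time window.
def buffer_bytes(bitrate_bps, seconds):
    return bitrate_bps / 8 * seconds       # bits/s -> bytes/s -> bytes

mb = 1_000_000
two_sec = buffer_bytes(10 * mb, 2) / mb    # 10 Mbps, 2 s -> 2.5 MB
five_sec = buffer_bytes(10 * mb, 5) / mb   # 10 Mbps, 5 s -> 6.25 MB
```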
Overflow Handling
If an output cannot keep up (network congestion, slow destination), its read pointer falls behind the write pointer. When the write pointer laps the read pointer, the output has lost data.
Vajra Cast handles overflow gracefully:
- The output is flagged as overflowed in metrics.
- The read pointer is fast-forwarded to the current write position.
- The output resumes from the current data, skipping the gap.
- An event is logged for operational visibility.
This prevents a slow output from blocking the input or other outputs. Each output is independent. One slow destination does not affect the rest.
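The fast-forward policy can be expressed as a single decision on sequence numbers. A hedged sketch with illustrative names, assuming a ring of fixed capacity:

```python
# Overflow handling: if the writer lapped this output, skip the gap.
def advance(read_seq, write_seq, capacity, on_overflow=None):
    """Return the output's next safe read position."""
    if write_seq - read_seq > capacity:
        # Writer lapped this output: its pending data was overwritten.
        if on_overflow:
            on_overflow(lost=write_seq - capacity - read_seq)
        return write_seq           # fast-forward; resume from live data
    return read_seq                # still within the buffer window

events = []
pos = advance(read_seq=0, write_seq=100, capacity=32,
              on_overflow=lambda lost: events.append(lost))
```

Here the lapped output jumps straight to the write position, and the callback records how many packets were skipped, mirroring the metrics flag and logged event described above.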
Unlimited Outputs
Because zero-copy distribution has near-zero per-output cost, Vajra Cast does not impose an artificial limit on the number of outputs per route. You can add as many outputs as your network bandwidth supports.
This enables workflows that would be impractical with copy-based systems:
- Fan-out to regional CDNs: One ingest, 20+ regional CDN ingest points, each receiving the same stream.
- Parallel recording: Record the same stream in multiple formats or to multiple storage backends simultaneously.
- Monitoring taps: Add monitoring outputs without affecting production traffic.
- Redundant delivery: Send the same stream via multiple paths (primary and backup) to each destination.
Adding or removing outputs is a hot operation. It happens without interrupting the stream or affecting other outputs. Connect a new output via the REST API or the web UI, and it immediately starts receiving data from the shared buffer.
When Transcoding is Involved
Zero-copy distribution applies to passthrough routes where the input codec and output codec are the same. When transcoding is required (different resolution, bitrate, or codec), the pipeline changes:
```
Input -> [Decode] -> [Encode Profile A] -> Zero-copy distribution -> Outputs A1, A2, A3
                  -> [Encode Profile B] -> Zero-copy distribution -> Outputs B1, B2, B3
```
Each unique transcoding profile requires its own encode step (which is CPU-intensive). But once encoded, the output of each profile is distributed to all its outputs using the same zero-copy mechanism.
So if you need 1080p and 720p variants, you pay the transcoding cost twice (once per profile), but you still distribute each variant to as many outputs as you want at near-zero additional cost.
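The cost difference can be stated as a toy model (numbers and names are illustrative): with copy-based distribution the encode count tracks outputs, while with per-profile distribution it tracks unique profiles only:

```python
# Encoding cost: per-output (naive) vs per-profile (zero-copy fan-out).
def encodes_copy_based(outputs):
    return len(outputs)               # naive: one encode per output

def encodes_profile_based(profile_outputs):
    return len(profile_outputs)       # one encode per unique profile

outputs = ["A1", "A2", "A3", "B1", "B2", "B3"]
profiles = {"1080p": ["A1", "A2", "A3"], "720p": ["B1", "B2", "B3"]}
naive = encodes_copy_based(outputs)       # six encodes
shared = encodes_profile_based(profiles)  # two encodes
```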
Practical Considerations
Network Bandwidth
Zero-copy distribution removes CPU as the bottleneck, but network bandwidth remains a hard limit. If you are distributing a 10 Mbps stream to 50 outputs, you need 500 Mbps of outbound bandwidth. Plan your network accordingly.
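The bandwidth requirement is a straight multiplication, restating the example above:

```python
# Outbound bandwidth: per-output bitrate times output count.
def outbound_mbps(stream_mbps, outputs):
    return stream_mbps * outputs

required = outbound_mbps(10, 50)   # 10 Mbps stream to 50 outputs
```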
Output Independence
Each output in a zero-copy distribution is fully independent. One output going down (disconnecting, timing out, or overflowing) has no effect on the input or on any other output. This is a fundamental design principle. Isolation between outputs ensures that a problem in one destination never cascades.
Monitoring
Vajra Cast exposes per-output metrics for every route: packets sent, bytes sent, overflow events, current buffer lag. Use these to detect outputs that are falling behind before they overflow.
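One way to derive the buffer-lag metric named above from the pointers themselves; the function name and threshold are illustrative, not part of Vajra Cast's API:

```python
# Buffer lag: how far an output's read pointer trails the write pointer,
# as a packet count and as a fraction of the buffer's capacity.
def buffer_lag(read_seq, write_seq, capacity):
    lag = write_seq - read_seq
    return lag, lag / capacity          # packets behind, fill fraction

lag, fill = buffer_lag(read_seq=940, write_seq=1000, capacity=100)
```

An output at 60% fill is still healthy but trending toward overflow; alerting on a fraction like this catches slow destinations before data is lost.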
Next Steps
- Return to the Live Stream Routing Guide for the complete routing architecture
- Learn about the REST API for automating output management at scale
- Explore Audio Matrix Routing for channel-level audio control in distributed streams