OTT Streaming: Build Your Own Streaming Platform Infrastructure
Build OTT streaming infrastructure with Vajra Cast. HLS output, CDN integration, multi-tenant routing, and hardware transcoding.
What is OTT Streaming?
OTT (Over-The-Top) streaming delivers video content directly to viewers over the internet, bypassing traditional broadcast and cable distribution. Netflix, Disney+, and YouTube are OTT platforms. But OTT is not limited to the giants. Sports leagues, media companies, educational institutions, houses of worship, and event organizers increasingly build their own OTT platforms to reach audiences directly.
The infrastructure challenge for OTT differs from that of contribution and production. Contribution and production workflows move one stream between two points. OTT distribution moves one stream to thousands or millions of viewers simultaneously, each with different network conditions and device capabilities. This requires adaptive bitrate streaming, CDN integration, and scalable origin infrastructure.
OTT Architecture with Vajra Cast
Vajra Cast serves as the origin server in an OTT architecture: the system that receives raw streams, processes them, and generates the adaptive bitrate output that CDNs distribute to viewers.
Sources                Origin                      Delivery
--------               ------                      --------
Encoder A --> SRT  -|                           |-- CDN Edge (US)   --> Viewers
Encoder B --> SRT  -|--> Vajra Cast --> HLS  ---|-- CDN Edge (EU)   --> Viewers
Encoder C --> RTMP -|                           |-- CDN Edge (Asia) --> Viewers
The Three Stages
- Ingest: Receive streams via SRT (preferred for reliability) or RTMP (for legacy compatibility)
- Process: Hardware-accelerated transcoding to generate adaptive bitrate renditions
- Distribute: HLS output served to CDN edge servers for worldwide delivery
Adaptive Bitrate Transcoding
OTT viewers watch on everything from 65-inch TVs on gigabit fiber to phones on cellular connections in subway tunnels. Adaptive bitrate (ABR) streaming solves this by encoding multiple quality levels and letting the player choose:
Example ABR Ladder
Source: 1080p60 @ 12 Mbps (SRT ingest)
Vajra Cast Transcoding (Intel QSV):
  --> 1080p      @ 6,000 kbps (H.264 High)
  --> 720p       @ 3,000 kbps (H.264 Main)
  --> 480p       @ 1,500 kbps (H.264 Main)
  --> 360p       @   800 kbps (H.264 Baseline)
  --> Audio-only @   128 kbps (AAC)
Output: HLS master playlist referencing all renditions
Vajra Cast generates all renditions using Intel QSV or VAAPI hardware acceleration. This is critical for OTT. Encoding all five renditions (four video plus audio-only) in software would require significant CPU resources, while hardware acceleration handles the video encodes at a fraction of the power and cost.
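To make the "master playlist referencing all renditions" concrete, here is a minimal Python sketch that renders one from the example ladder. It is an illustration, not Vajra Cast's actual output: the per-rendition paths (`1080p/index.m3u8` and so on) and the 16:9 pixel dimensions are assumptions, and `BANDWIDTH` is simplified to the listed bitrate (a production playlist would normally add the audio bitrate to each video variant and include `CODECS`).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rendition:
    name: str                  # directory name for the rendition (hypothetical layout)
    bandwidth_bps: int         # bitrate advertised to the player, in bits per second
    resolution: Optional[str]  # "WIDTHxHEIGHT", or None for audio-only


# Bitrates come from the example ladder; pixel dimensions assume 16:9 video.
LADDER = [
    Rendition("1080p", 6_000_000, "1920x1080"),
    Rendition("720p", 3_000_000, "1280x720"),
    Rendition("480p", 1_500_000, "854x480"),
    Rendition("360p", 800_000, "640x360"),
    Rendition("audio", 128_000, None),
]


def master_playlist(ladder):
    """Render an HLS master playlist with one variant entry per rendition."""
    lines = ["#EXTM3U", "#EXT-X-VERSION:3"]
    for r in ladder:
        attrs = [f"BANDWIDTH={r.bandwidth_bps}"]
        if r.resolution:
            attrs.append(f"RESOLUTION={r.resolution}")
        lines.append("#EXT-X-STREAM-INF:" + ",".join(attrs))
        lines.append(f"{r.name}/index.m3u8")  # URI of the rendition's media playlist
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(LADDER))
```

The player reads this file once, then switches between the listed media playlists as its measured throughput changes.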
Codec Considerations
| Codec | Efficiency | Browser Support | Recommended For |
|---|---|---|---|
| H.264 | Reference point | Universal | Maximum compatibility |
| HEVC/H.265 | ~50% better than H.264 | Safari, some Android | Native apps, smart TVs |
| AV1 | ~30% better than HEVC | Chrome, Firefox, Edge | Future-facing, VOD |
For live OTT in 2026, H.264 remains the safe choice for maximum reach. Use HEVC for dedicated app environments where you control the player.
CDN Integration
The CDN is the critical piece that scales your stream from one origin to millions of viewers. Vajra Cast generates HLS at the origin; the CDN caches and distributes.
Origin Pull Configuration
Most CDNs operate in “origin pull” mode: edge servers fetch segments from your origin (Vajra Cast) on first request, then cache them for subsequent viewers.
Viewer requests: https://cdn.example.com/live/channel-1/index.m3u8
--> CDN edge checks cache
--> If miss: pulls from Vajra Cast origin
--> Caches segment for TTL duration
--> Serves to viewer (and all subsequent viewers at that edge)
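The origin-pull flow above can be modeled in a few lines of Python. This is a toy simulation for intuition only, not a real CDN API: `EdgeCache` and its `fetch_from_origin` callback are hypothetical names, and real edges add request coalescing, revalidation, and eviction on top of this.

```python
import time


class EdgeCache:
    """Toy model of a CDN edge in origin-pull mode (not a real CDN API)."""

    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin   # callable: path -> bytes
        self._ttl = ttl_seconds
        self._clock = clock               # injectable so expiry can be tested
        self._store = {}                  # path -> (expires_at, body)
        self.origin_pulls = 0

    def get(self, path):
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]               # cache hit: origin is not contacted
        body = self._fetch(path)          # miss or expired: pull from origin
        self.origin_pulls += 1
        self._store[path] = (now + self._ttl, body)
        return body


if __name__ == "__main__":
    edge = EdgeCache(lambda path: b"segment-bytes", ttl_seconds=4)
    for _ in range(1000):                 # many viewers hitting the same edge
        edge.get("/live/channel-1/seg_001.ts")
    print("origin pulls:", edge.origin_pulls)
```

The point of the model is the ratio: requests served by the edge grow with audience size, while pulls from the origin grow only with the number of edges and distinct segments.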
CDN Configuration Best Practices
- Segment TTL: match your HLS segment duration (e.g., 4s segments = 4s cache TTL)
- Manifest TTL: shorter than segment TTL (1-2 seconds) so viewers discover new segments promptly
- Origin shield: enable if available. Reduces origin load by adding a middle caching layer
- HTTP/2: enable for improved parallel segment downloads
- CORS headers: configure if serving HLS to web players on different domains
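As a rough sketch of the TTL guidance in this list, the Python helper below picks a `Cache-Control` value by file type. The function name, the suffix list, and the defaults are assumptions for illustration; how you actually attach the header depends on your CDN or origin configuration.

```python
from pathlib import PurePosixPath

# File extensions treated as media segments (an assumed, non-exhaustive list).
SEGMENT_SUFFIXES = {".ts", ".m4s", ".mp4", ".aac"}


def cache_control(path, segment_duration_s=4, manifest_ttl_s=1):
    """Return a Cache-Control header value for an HLS object path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix == ".m3u8":
        # Playlists change every segment, so keep them fresh.
        return f"public, max-age={manifest_ttl_s}"
    if suffix in SEGMENT_SUFFIXES:
        # Segment TTL matched to segment duration, as recommended above.
        return f"public, max-age={segment_duration_s}"
    return "no-store"                     # anything unexpected is not cached


if __name__ == "__main__":
    for p in ("/hls/channel-a/index.m3u8", "/hls/channel-a/seg_0001.ts"):
        print(p, "->", cache_control(p))
```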
Cost Optimization
CDN bandwidth is typically the largest cost in OTT streaming. Strategies to reduce it:
- Efficient ABR ladder: do not offer resolutions viewers never select
- HEVC where supported: comparable quality at roughly half the bitrate
- Appropriate segment duration: longer segments give slightly better compression efficiency, at the cost of added latency
- Geographic routing: serve from the closest edge to reduce transit costs
- Viewer analytics: understand what renditions are actually consumed and trim the ladder accordingly
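To see why trimming the ladder matters, it helps to estimate egress volume directly. The Python sketch below converts concurrent viewers, bitrate, and duration into delivered gigabytes; the audience split in the usage example is invented purely for illustration, and pricing is left out because it varies by CDN and contract.

```python
def egress_gb(viewers, bitrate_kbps, hours):
    """Data delivered to viewers, in decimal gigabytes (1 GB = 1e9 bytes)."""
    bits = viewers * bitrate_kbps * 1000 * hours * 3600
    return bits / 8 / 1e9


def ladder_egress_gb(audience, hours):
    """audience maps rendition bitrate (kbps) -> concurrent viewers on it."""
    return sum(egress_gb(v, kbps, hours) for kbps, v in audience.items())


if __name__ == "__main__":
    # Hypothetical split of 1,000 concurrent viewers across the example ladder.
    audience = {6000: 200, 3000: 500, 1500: 250, 800: 50}
    print(f"{ladder_egress_gb(audience, hours=2):.1f} GB over a 2-hour event")
```

Running the same calculation with real per-rendition viewer counts from your analytics shows which rungs carry the traffic and which can be dropped.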
Multi-Tenant Routing
If you operate an OTT platform with multiple channels or content providers, Vajra Cast can route multiple independent streams through a single instance:
Channel A: Encoder --> SRT (Port 9001) --> Route A --> HLS /hls/channel-a/
Channel B: Encoder --> SRT (Port 9002) --> Route B --> HLS /hls/channel-b/
Channel C: Encoder --> RTMP (key: ch-c) --> Route C --> HLS /hls/channel-c/
Each channel gets its own route with independent:
- Ingest settings (protocol, encryption, latency)
- Transcoding profile (resolution ladder, bitrate targets)
- Output configuration (HLS segment duration, playlist length)
- Failover chain (backup sources per channel)
- Monitoring metrics
Vajra Cast’s REST API enables automated channel provisioning. When a new content provider signs up, your platform’s backend calls the API to create the route, configure the ingest, and provision the HLS output, with no manual intervention required.
API-Driven Channel Management
POST /api/routes
{
  "name": "Channel D",
  "inputs": [{
    "protocol": "srt",
    "mode": "listener",
    "port": 9004,
    "encryption": "aes-256",
    "passphrase": "generated-passphrase"
  }],
  "outputs": [{
    "protocol": "hls",
    "segmentDuration": 4,
    "playlistLength": 10
  }]
}
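A provisioning backend might wrap that request as in the following Python sketch. The endpoint path and JSON fields are taken from the example above; everything else is an assumption, including the base URL, the helper names, and the absence of authentication, which a real deployment would almost certainly require. The request is only built here, not sent.

```python
import json
import secrets
import urllib.request


def route_payload(name, srt_port, passphrase, segment_duration=4, playlist_length=10):
    """Build the JSON body for a new SRT-in, HLS-out route."""
    return {
        "name": name,
        "inputs": [{
            "protocol": "srt",
            "mode": "listener",
            "port": srt_port,
            "encryption": "aes-256",
            "passphrase": passphrase,
        }],
        "outputs": [{
            "protocol": "hls",
            "segmentDuration": segment_duration,
            "playlistLength": playlist_length,
        }],
    }


def build_create_route_request(base_url, payload):
    """Return an unsent POST request for {base_url}/api/routes."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/routes",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical origin address; generate a fresh passphrase per channel.
    payload = route_payload("Channel D", 9004, secrets.token_urlsafe(24))
    req = build_create_route_request("http://vajracast.example.internal:8080", payload)
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req, timeout=10)
```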
Live Viewer Monitoring
Vajra Cast’s built-in HLS server tracks active viewer connections:
- Per-channel viewer count: see how many viewers are connected to each HLS stream
- Aggregate statistics: total viewers across all channels
- Prometheus export: vajracast_hls_viewers{channel="channel-a"} for Grafana dashboards
When using a CDN, viewer counts at the origin represent CDN edge pull requests rather than individual viewers. For accurate end-user counts, correlate Vajra Cast origin metrics with CDN analytics.
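If you want these numbers outside Grafana, the Prometheus text output can be parsed directly. The Python sketch below assumes each sample carries exactly one `channel` label, as in the metric shown above; the sample text in the usage example is made up, and the scrape URL is left to your deployment.

```python
import re

# Matches lines such as: vajracast_hls_viewers{channel="channel-a"} 12
_SAMPLE = re.compile(r'^vajracast_hls_viewers\{channel="([^"]+)"\}\s+(\S+)')


def viewers_by_channel(metrics_text):
    """Extract per-channel viewer counts from Prometheus exposition text."""
    counts = {}
    for line in metrics_text.splitlines():
        match = _SAMPLE.match(line.strip())
        if match:
            counts[match.group(1)] = float(match.group(2))
    return counts


if __name__ == "__main__":
    # Invented sample, standing in for a scrape of the metrics endpoint.
    sample = (
        'vajracast_hls_viewers{channel="channel-a"} 12\n'
        'vajracast_hls_viewers{channel="channel-b"} 30\n'
    )
    counts = viewers_by_channel(sample)
    print(counts, "total:", sum(counts.values()))
```

Behind a CDN, treat these values as origin-side connection counts and combine them with CDN analytics for audience figures.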
Reliability for 24/7 Operation
OTT platforms run continuously. Vajra Cast is designed for unattended operation:
- Crash recovery: automatic process adoption rebuilds all routes from running processes in under 5 seconds
- Failover: automatic input switching keeps channels on-air even when sources fail
- Hot management: add, remove, or modify channels without affecting other streams
- Monitoring: Prometheus metrics and Grafana alerts catch issues before they affect viewers
- Docker/Kubernetes: container orchestration with health checks and automatic restart policies
Scaling Beyond a Single Instance
For large OTT platforms, Vajra Cast supports multi-instance deployment:
Load Balancer
--> Vajra Cast Instance 1 (Channels A-D)
--> Vajra Cast Instance 2 (Channels E-H)
--> Vajra Cast Instance 3 (Channels I-L)
With PostgreSQL as a shared state backend, multiple Vajra Cast instances coordinate routing and configuration. Kubernetes orchestration handles scaling, failover, and resource management.
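The channel-to-instance layout in the diagram amounts to a static partition. The Python sketch below reproduces that grouping; it is only an illustration of the idea, with hypothetical instance names, and says nothing about how Vajra Cast or Kubernetes actually place channels.

```python
def assign_channels(channels, per_instance=4):
    """Split an ordered channel list into fixed-size groups, one per instance."""
    return {
        f"instance-{i // per_instance + 1}": channels[i:i + per_instance]
        for i in range(0, len(channels), per_instance)
    }


if __name__ == "__main__":
    layout = assign_channels(list("ABCDEFGHIJKL"))
    for instance, group in layout.items():
        print(instance, "->", ", ".join(group))
```

A fixed group size keeps capacity planning simple: each instance carries a known number of transcoding ladders, and adding channels means adding instances.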
Next Steps
- Explore the SRT Streaming Gateway Guide for the full architecture
- Learn about HLS output configuration for adaptive bitrate delivery
- See how hot management enables live channel modifications
- Read about live broadcast workflows for contribution-side architecture
- Compare Vajra Cast with Wowza or Nimble Streamer for platform alternatives