OTT Streaming: Build Your Own Streaming Platform Infrastructure

What is OTT Streaming?

OTT (Over-The-Top) streaming delivers video content directly to viewers over the internet, bypassing traditional broadcast and cable distribution. Netflix, Disney+, and YouTube are OTT platforms. But OTT is not limited to the giants. Sports leagues, media companies, educational institutions, houses of worship, and event organizers increasingly build their own OTT platforms to reach audiences directly.

The infrastructure challenge for OTT differs from that of contribution and production. A production workflow moves one stream between two points. OTT distribution moves one stream to thousands or millions of viewers simultaneously, each with different network conditions and device capabilities. This requires adaptive bitrate streaming, CDN integration, and scalable origin infrastructure.

OTT Architecture with Vajra Cast

Vajra Cast serves as the origin server in an OTT architecture: the system that receives raw streams, processes them, and generates the adaptive bitrate output that CDNs distribute to viewers.

Sources                  Origin                    Delivery
--------                 ------                    --------
Encoder A --> SRT  -|                          |-- CDN Edge (US) --> Viewers
Encoder B --> SRT  -|--> Vajra Cast --> HLS ---|-- CDN Edge (EU) --> Viewers
Encoder C --> RTMP -|                          |-- CDN Edge (Asia) --> Viewers

The Three Stages

Ingest: Receive streams via SRT (preferred for reliability) or RTMP (for legacy compatibility)
Process: Hardware-accelerated transcoding to generate adaptive bitrate renditions
Distribute: HLS output served to CDN edge servers for worldwide delivery

Adaptive Bitrate Transcoding

OTT viewers watch on everything from 65-inch TVs on gigabit fiber to phones on cellular connections in subway tunnels. Adaptive bitrate (ABR) streaming solves this by encoding multiple quality levels and letting the player choose:

Example ABR Ladder

Source: 1080p60 @ 12 Mbps (SRT ingest)

Vajra Cast Transcoding (Intel QSV):
  --> 1080p @ 6,000 kbps (H.264 High)
  --> 720p  @ 3,000 kbps (H.264 Main)
  --> 480p  @ 1,500 kbps (H.264 Main)
  --> 360p  @ 800 kbps   (H.264 Baseline)
  --> Audio-only @ 128 kbps (AAC)

Output: HLS master playlist referencing all renditions
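To make the last line of the ladder concrete, here is a minimal sketch of what a master playlist for these renditions could look like. It is an illustration, not Vajra Cast's actual output: the playlist paths (1080p/index.m3u8 and so on), the 16:9 frame widths, and the choice to approximate BANDWIDTH as target video bitrate plus 128 kbps audio are all assumptions. The HLS specification defines BANDWIDTH as a peak rate, and a production playlist would normally also carry a CODECS attribute, which is omitted here.

```python
# Hypothetical ladder mirroring the example above; widths assume 16:9 video.
AUDIO_KBPS = 128

LADDER = [
    # (name, width, height, video_kbps)
    ("1080p", 1920, 1080, 6000),
    ("720p", 1280, 720, 3000),
    ("480p", 854, 480, 1500),
    ("360p", 640, 360, 800),
]


def master_playlist(ladder, audio_kbps=AUDIO_KBPS):
    """Build an HLS master playlist string for the given ladder."""
    lines = ["#EXTM3U"]
    for name, width, height, video_kbps in ladder:
        # BANDWIDTH is in bits per second; approximated as video + audio target.
        bandwidth = (video_kbps + audio_kbps) * 1000
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={width}x{height}"
        )
        lines.append(f"{name}/index.m3u8")  # assumed path layout
    # Audio-only variant: no RESOLUTION attribute.
    lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={audio_kbps * 1000}")
    lines.append("audio/index.m3u8")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(master_playlist(LADDER))
```

The player reads this playlist once, then switches between the listed variant playlists as its measured throughput changes.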

Vajra Cast generates all renditions using Intel QSV or VAAPI hardware acceleration. This matters for OTT because the whole ladder must be encoded in real time: doing that in software would require significant CPU resources, while hardware acceleration handles it at a fraction of the power and cost.

Codec Considerations

Codec        Efficiency               Browser Support          Recommended For
-----        ----------               ---------------          ---------------
H.264        Reference point          Universal                Maximum compatibility
HEVC/H.265   ~50% better than H.264   Safari, some Android     Native apps, smart TVs
AV1          ~30% better than HEVC    Chrome, Firefox, Edge    Future-facing, VOD

For live OTT in 2026, H.264 remains the safe choice for maximum reach. Use HEVC for dedicated app environments where you control the player.

CDN Integration

The CDN is the critical piece that scales your stream from one origin to millions of viewers. Vajra Cast generates HLS at the origin; the CDN caches and distributes.

Origin Pull Configuration

Most CDNs operate in “origin pull” mode: edge servers fetch segments from your origin (Vajra Cast) on first request, then cache them for subsequent viewers.

Viewer requests: https://cdn.example.com/live/channel-1/index.m3u8
  --> CDN edge checks cache
  --> If miss: pulls from Vajra Cast origin
  --> Caches segment for TTL duration
  --> Serves to viewer (and all subsequent viewers at that edge)
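The origin-pull flow above can be illustrated with a toy cache. This is a sketch of the general technique, not of any particular CDN: the EdgeCache class, the fake_origin function, and the segment path are hypothetical names introduced for the example.

```python
class EdgeCache:
    """Toy origin-pull cache: one origin fetch per object per TTL window."""

    def __init__(self, fetch_from_origin, ttl_seconds):
        self.fetch_from_origin = fetch_from_origin
        self.ttl_seconds = ttl_seconds
        self.store = {}  # path -> (body, expires_at)
        self.origin_pulls = 0

    def get(self, path, now):
        entry = self.store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # cache hit: the origin is not contacted
        # Cache miss or expired entry: pull from the origin and cache it.
        body = self.fetch_from_origin(path)
        self.origin_pulls += 1
        self.store[path] = (body, now + self.ttl_seconds)
        return body


def fake_origin(path):
    """Stand-in for the origin server (hypothetical)."""
    return f"bytes of {path}".encode()


if __name__ == "__main__":
    edge = EdgeCache(fake_origin, ttl_seconds=4)
    for _viewer in range(1000):
        edge.get("/live/channel-1/segment_001.ts", now=0.0)
    print(edge.origin_pulls)  # 1000 viewers, a single origin pull
```

The point of the sketch is that origin load scales with the number of edges and objects, not with the number of viewers.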

CDN Configuration Best Practices

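Segment TTL: at least your HLS segment duration (e.g., 4s segments = 4s cache TTL or longer). Published segments do not change, so a longer TTL is safe as long as segment names are never reused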
Manifest TTL: shorter than segment TTL (1-2 seconds) so viewers discover new segments promptly
Origin shield: enable if available. Reduces origin load by adding a middle caching layer
HTTP/2: enable for improved parallel segment downloads
CORS headers: configure if serving HLS to web players on different domains
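One way to express the two TTL rules above is as Cache-Control headers chosen by file type. The sketch below assumes the CDN honors origin cache headers, which depends on the CDN and how it is configured. The function name, the default TTL values, and the list of segment extensions are illustrative assumptions.

```python
def cache_control(path, segment_ttl=4, manifest_ttl=1):
    """Pick a Cache-Control value for an HLS object based on its extension."""
    if path.endswith(".m3u8"):
        # Playlists change every segment, so they get the short TTL.
        return f"public, max-age={manifest_ttl}"
    if path.endswith((".ts", ".m4s", ".mp4")):
        # Media segments are cached for the segment TTL.
        return f"public, max-age={segment_ttl}"
    # Anything else is not cached in this sketch.
    return "no-store"


if __name__ == "__main__":
    for p in ("/hls/channel-a/index.m3u8", "/hls/channel-a/seg_42.ts"):
        print(p, "->", cache_control(p))
```

Many CDNs also allow the same TTLs to be set as edge rules, in which case the origin headers are not needed.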

Cost Optimization

CDN bandwidth is typically the largest cost in OTT streaming. Strategies to reduce it:

Efficient ABR ladder: do not offer resolutions viewers never select
HEVC where supported: comparable quality at roughly half the bitrate
Appropriate segment duration: longer segments = slightly better compression efficiency and fewer requests, at the cost of higher latency
Geographic routing: serve from the closest edge to reduce transit costs
Viewer analytics: understand what renditions are actually consumed and trim the ladder accordingly
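A quick calculation shows why the ladder and codec choices dominate the bill. The audience size, duration, and the simplification that every viewer stays on one rendition are hypothetical; no CDN price is assumed, so the result is volume only.

```python
def egress_gb(viewers, bitrate_kbps, hours):
    """Data delivered, in decimal gigabytes, for a steady audience."""
    bits = viewers * bitrate_kbps * 1000 * hours * 3600
    return bits / 8 / 1e9


if __name__ == "__main__":
    # Hypothetical event: 10,000 viewers for one hour on the 720p rendition.
    h264 = egress_gb(10_000, 3000, 1)
    # Same audience if HEVC delivered that rendition at half the bitrate.
    hevc = egress_gb(10_000, 1500, 1)
    print(f"H.264: {h264:,.0f} GB, HEVC: {hevc:,.0f} GB")
```

At 3,000 kbps, each viewer-hour is 1.35 GB, so the one-hour event delivers 13,500 GB. Halving the bitrate halves the volume to 6,750 GB.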

Multi-Tenant Routing

If you operate an OTT platform with multiple channels or content providers, Vajra Cast can route multiple independent streams through a single instance:

Channel A: Encoder --> SRT (Port 9001) --> Route A --> HLS /hls/channel-a/
Channel B: Encoder --> SRT (Port 9002) --> Route B --> HLS /hls/channel-b/
Channel C: Encoder --> RTMP (key: ch-c) --> Route C --> HLS /hls/channel-c/

Each channel gets its own route with independent:

Ingest settings (protocol, encryption, latency)
Transcoding profile (resolution ladder, bitrate targets)
Output configuration (HLS segment duration, playlist length)
Failover chain (backup sources per channel)
Monitoring metrics

Vajra Cast’s REST API enables automated channel provisioning. When a new content provider signs up, your platform’s backend calls the API to create the route, configure the ingest, and provision the HLS output, with no manual intervention required.

API-Driven Channel Management

POST /api/routes
{
  "name": "Channel D",
  "inputs": [{
    "protocol": "srt",
    "mode": "listener",
    "port": 9004,
    "encryption": "aes-256",
    "passphrase": "generated-passphrase"
  }],
  "outputs": [{
    "protocol": "hls",
    "segmentDuration": 4,
    "playlistLength": 10
  }]
}
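A provisioning backend could build and send that request along the following lines. This is a sketch under assumptions: only the POST /api/routes path and the JSON field names come from the example above. The base URL, the absence of authentication, and a JSON response body are all assumed and should be checked against the actual API documentation.

```python
import json
import secrets
import urllib.request


def build_route(name, srt_port, segment_duration=4, playlist_length=10):
    """Build a route payload shaped like the example request above."""
    # token_urlsafe(24) yields 32 characters, within SRT's 10-79
    # character passphrase limit.
    passphrase = secrets.token_urlsafe(24)
    return {
        "name": name,
        "inputs": [{
            "protocol": "srt",
            "mode": "listener",
            "port": srt_port,
            "encryption": "aes-256",
            "passphrase": passphrase,
        }],
        "outputs": [{
            "protocol": "hls",
            "segmentDuration": segment_duration,
            "playlistLength": playlist_length,
        }],
    }


def create_route(base_url, route):
    """POST the payload; auth and response format are assumptions."""
    req = urllib.request.Request(
        f"{base_url}/api/routes",
        data=json.dumps(route).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the payload only; no request is sent in this demo.
    print(json.dumps(build_route("Channel D", 9004), indent=2))
```

Generating the passphrase per channel keeps one provider's credentials from unlocking another provider's ingest.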

Live Viewer Monitoring

Vajra Cast’s built-in HLS server tracks active viewer connections:

Per-channel viewer count: see how many viewers are connected to each HLS stream
Aggregate statistics: total viewers across all channels
Prometheus export: vajracast_hls_viewers{channel="channel-a"} for Grafana dashboards

When using a CDN, viewer counts at the origin represent CDN edge pull requests rather than individual viewers. For accurate end-user counts, correlate Vajra Cast origin metrics with CDN analytics.
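If you want the origin-side numbers outside Grafana, the exported metric can be read from the Prometheus text format with a few lines of code. The sketch below assumes the metric carries only a channel label, as in the example above. The sample text is made up for illustration and is not real Vajra Cast output.

```python
import re

# Illustrative sample only; real scrape output may carry more labels.
SAMPLE = """\
# TYPE vajracast_hls_viewers gauge
vajracast_hls_viewers{channel="channel-a"} 12
vajracast_hls_viewers{channel="channel-b"} 3
"""

LINE = re.compile(r'^vajracast_hls_viewers\{channel="([^"]+)"\}\s+(\S+)')


def viewers_by_channel(text):
    """Return {channel: value} for every vajracast_hls_viewers sample line."""
    counts = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            counts[match.group(1)] = float(match.group(2))
    return counts


if __name__ == "__main__":
    counts = viewers_by_channel(SAMPLE)
    print(counts, "total:", sum(counts.values()))
```

Behind a CDN these values count edge connections, so treat them as an origin load indicator and take audience figures from the CDN's own analytics.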

Reliability for 24/7 Operation

OTT platforms run continuously. Vajra Cast is designed for unattended operation:

Crash recovery: automatic process adoption rebuilds all routes from running processes in under 5 seconds
Failover: automatic input switching keeps channels on-air even when sources fail
Hot management: add, remove, or modify channels without affecting other streams
Monitoring: Prometheus metrics and Grafana alerts catch issues before they affect viewers
Docker/Kubernetes: container orchestration with health checks and automatic restart policies

Scaling Beyond a Single Instance

For large OTT platforms, Vajra Cast supports multi-instance deployment:

Load Balancer
  --> Vajra Cast Instance 1 (Channels A-D)
  --> Vajra Cast Instance 2 (Channels E-H)
  --> Vajra Cast Instance 3 (Channels I-L)

With PostgreSQL as a shared state backend, multiple Vajra Cast instances coordinate routing and configuration. Kubernetes orchestration handles scaling, failover, and resource management.
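The channel split in the diagram amounts to a static mapping from channel to instance, which a load balancer or provisioning script could compute as below. This is only an illustration of that mapping with hypothetical instance names. It does not describe how Vajra Cast instances coordinate through PostgreSQL.

```python
import string


def assign_channels(channels, instances, per_instance):
    """Map each channel to an instance in contiguous blocks."""
    if len(channels) > len(instances) * per_instance:
        raise ValueError("not enough instance capacity for all channels")
    return {ch: instances[i // per_instance] for i, ch in enumerate(channels)}


if __name__ == "__main__":
    channels = list(string.ascii_uppercase[:12])  # channels A through L
    instances = ["instance-1", "instance-2", "instance-3"]  # hypothetical names
    for channel, instance in assign_channels(channels, instances, 4).items():
        print(channel, "->", instance)
```

A fixed block size keeps the mapping predictable; the right number of channels per instance depends on the transcoding load of each ladder.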

Next Steps

Explore the SRT Streaming Gateway Guide for the full architecture
Learn about HLS output configuration for adaptive bitrate delivery
See how hot management enables live channel modifications
Read about live broadcast workflows for contribution-side architecture
Compare Vajra Cast with Wowza or Nimble Streamer for platform alternatives
