Why Monitoring Matters for Live Streaming

A live stream either works or it does not. There is no “deploy and forget.” Network conditions change, encoders misbehave, CDN ingest points go down, and hardware degrades over time. Without real-time monitoring, you find out about problems when your viewers complain, or worse, when they leave.

Vajra Cast includes a built-in metrics system that exposes every meaningful measurement about your streaming infrastructure. These metrics integrate with the industry-standard Prometheus and Grafana stack, giving you dashboards, alerting, and historical analysis out of the box.

Built-In Metrics

Vajra Cast exposes metrics at the /metrics endpoint in Prometheus exposition format:

http://your-server:8080/metrics

By default, the metrics endpoint requires no authentication (this is configurable). Prometheus scrapes the endpoint at a fixed interval, commonly 15 seconds, and stores the resulting time-series data.

Stream Metrics

Per-route, per-input, and per-output metrics:

| Metric | Type | Description |
|---|---|---|
| vajracast_input_bitrate_bps | Gauge | Current input bitrate in bits per second |
| vajracast_output_bitrate_bps | Gauge | Current output bitrate in bits per second |
| vajracast_input_packets_total | Counter | Total packets received per input |
| vajracast_output_packets_total | Counter | Total packets sent per output |
| vajracast_input_connected | Gauge | 1 if input is connected, 0 if not |
| vajracast_output_connected | Gauge | 1 if output is connected, 0 if not |
| vajracast_failover_switches_total | Counter | Total failover switch events per route |
| vajracast_active_input_priority | Gauge | Priority number of the currently active input |

SRT-Specific Metrics

For SRT inputs and outputs, additional protocol-level metrics:

| Metric | Type | Description |
|---|---|---|
| vajracast_srt_rtt_ms | Gauge | Round-trip time in milliseconds |
| vajracast_srt_packet_loss_percent | Gauge | Current packet loss percentage |
| vajracast_srt_retransmit_rate | Gauge | Retransmission rate |
| vajracast_srt_recv_buffer_ms | Gauge | Receive buffer level in milliseconds |
| vajracast_srt_send_buffer_ms | Gauge | Send buffer level in milliseconds |
| vajracast_srt_bandwidth_mbps | Gauge | Estimated available bandwidth in Mbps |

System Metrics

Infrastructure-level measurements:

| Metric | Type | Description |
|---|---|---|
| vajracast_cpu_usage_percent | Gauge | Application CPU usage |
| vajracast_memory_usage_bytes | Gauge | Application memory usage |
| vajracast_gpu_encoder_usage_percent | Gauge | Hardware encoder utilization |
| vajracast_routes_active | Gauge | Number of active routes |
| vajracast_uptime_seconds | Gauge | Application uptime |

All metrics include labels for route name, input/output ID, and protocol, allowing you to filter and aggregate across your entire deployment.
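As a rough illustration of label-based aggregation, the sketch below precomputes two per-label totals as Prometheus recording rules. It assumes the labels are literally named route and protocol; route is the name used by the alert annotations later in this guide, while protocol is a guess at the label name and should be checked against your own /metrics output. The file name, group name, and rule names are hypothetical.

```yaml
# vajracast-aggregates.yml (hypothetical file name)
groups:
  - name: vajracast-aggregates
    rules:
      # Total output bitrate per route, converted from bits/s to Mbps.
      - record: route:vajracast_output_bitrate_mbps:sum
        expr: sum by (route) (vajracast_output_bitrate_bps) / 1e6
      # Number of connected inputs per protocol (label name assumed).
      - record: protocol:vajracast_input_connected:sum
        expr: sum by (protocol) (vajracast_input_connected)
```

Prometheus only evaluates a rules file that is listed under rule_files in prometheus.yml.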

Prometheus Configuration

Add Vajra Cast to your Prometheus scrape configuration:

# prometheus.yml
scrape_configs:
  - job_name: 'vajracast'
    scrape_interval: 10s
    static_configs:
      - targets: ['vajracast-host:8080']
        labels:
          environment: 'production'
          location: 'studio-a'

For multiple Vajra Cast instances, add each as a target:

    static_configs:
      - targets:
          - 'vajracast-studio-a:8080'
          - 'vajracast-studio-b:8080'
          - 'vajracast-remote:8080'

If you are running Vajra Cast in Docker or Kubernetes, use service discovery instead of static targets. Prometheus supports Docker and Kubernetes service discovery natively.
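For Kubernetes, a scrape job based on pod discovery might look like the following sketch. The kubernetes_sd_configs block and the __meta_kubernetes_* labels are standard Prometheus features; the app=vajracast pod label is an assumption about how your pods are labelled, and port 8080 is taken from the static examples above.

```yaml
# prometheus.yml (sketch; the pod label is an assumption)
scrape_configs:
  - job_name: 'vajracast'
    scrape_interval: 10s
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Keep only pods labelled app=vajracast (hypothetical label).
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: vajracast
        action: keep
      # Keep only the container port that serves /metrics.
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: '8080'
        action: keep
      # Copy namespace and pod name onto every scraped series.
      - source_labels: [__meta_kubernetes_namespace]
        target_label: namespace
      - source_labels: [__meta_kubernetes_pod_name]
        target_label: pod
```

With this approach, new Vajra Cast pods are picked up automatically and removed pods drop out of the target list without editing the configuration.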

Grafana Dashboards

Grafana transforms raw Prometheus metrics into visual dashboards. A well-designed streaming dashboard shows you the health of your entire infrastructure at a glance.

Overview row:

  • Total active routes (single stat)
  • Total connected inputs / total inputs (ratio)
  • Total connected outputs / total outputs (ratio)
  • System CPU and memory usage (gauges)

Per-route row (repeated for each route):

  • Input bitrate over time (graph)
  • Output bitrate over time (graph)
  • Active input indicator (showing which priority level is active)
  • Failover event markers (annotations on the graph)

SRT health row:

  • RTT over time per SRT connection (graph)
  • Packet loss percentage per SRT connection (graph)
  • Receive buffer level (graph, should stay below 80% of configured latency)
  • Retransmission rate (graph)

Transcoding row (if using hardware transcoding):

  • GPU encoder utilization (graph)
  • Encoding FPS vs. target FPS (should always be at or above target)
  • Output quality metrics (if VMAF is enabled)
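Building these rows requires Grafana to have Prometheus registered as a data source. One way to do that is Grafana's file-based provisioning, sketched below. The file path and the Prometheus URL are assumptions about your deployment; timeInterval is set to 10s on the assumption that it should match the scrape interval used in the Prometheus example above.

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (path may differ per install)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090   # assumed address of your Prometheus server
    isDefault: true
    jsonData:
      timeInterval: 10s           # assumed to match the scrape interval
```

Grafana reads provisioning files at startup, so a restart is typically needed after adding or changing this file.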

Example PromQL Queries

Average input bitrate across all routes, in Mbps:

avg(vajracast_input_bitrate_bps) / 1e6

Maximum packet loss across all SRT connections:

max(vajracast_srt_packet_loss_percent)

Failover events per route in the last hour:

increase(vajracast_failover_switches_total[1h])

Routes where the active input is not priority 1 (failover is active):

vajracast_active_input_priority > 1
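Queries that back frequently viewed dashboard panels can optionally be precomputed as recording rules. The sketch below covers the connected-inputs ratio from the overview row and the hourly failover count; the file name, group name, and rule names are hypothetical, and the route label is assumed to exist as in the alert annotations later in this guide.

```yaml
# vajracast-dashboard-rules.yml (hypothetical file name)
groups:
  - name: vajracast-dashboard
    rules:
      # Fraction of inputs currently connected (1.0 means all are connected).
      - record: vajracast:input_connected:ratio
        expr: sum(vajracast_input_connected) / count(vajracast_input_connected)
      # Failover switches per route over the trailing hour.
      - record: route:vajracast_failover_switches:increase1h
        expr: sum by (route) (increase(vajracast_failover_switches_total[1h]))
```

Dashboard panels can then query the recorded series by name instead of re-evaluating the full expression on every refresh.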

Alerting

Prometheus evaluates alerting rules and hands firing alerts to Alertmanager for routing and notification; Grafana provides its own alerting as well. For live streaming, set up alerts for conditions that require immediate attention.

Critical Alerts (Page Someone)

# prometheus-rules.yml
groups:
  - name: vajracast-critical
    rules:
      - alert: StreamDisconnected
        # Match on the route label only: input and output series carry different ID labels.
        expr: vajracast_input_connected == 0 and on (route) vajracast_output_connected == 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Input disconnected on route {{ $labels.route }}"

      - alert: AllInputsDown
        expr: sum by (route) (vajracast_input_connected) == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "All inputs down on route {{ $labels.route }}"

Warning Alerts (Notify the Team)

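  # Continues prometheus-rules.yml under groups: (the group name is illustrative)
  - name: vajracast-warning
    rules:
      - alert: FailoverActive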
        expr: vajracast_active_input_priority > 1
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Failover active on route {{ $labels.route }}, using priority {{ $value }}"

      - alert: HighPacketLoss
        expr: vajracast_srt_packet_loss_percent > 5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "SRT packet loss above 5% on {{ $labels.route }}"

      - alert: LowBitrate
        expr: vajracast_input_bitrate_bps < 500000
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Input bitrate below 500kbps on route {{ $labels.route }}"

Route these alerts to Slack, PagerDuty, email, or any other notification channel through Alertmanager.
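A minimal Alertmanager configuration that splits notifications by the severity label used in the rules above could look like this sketch. The receiver names, the Slack channel, the webhook URL, and the PagerDuty key are placeholders, and the grouping intervals are illustrative starting points rather than recommendations from Vajra Cast.

```yaml
# alertmanager.yml (sketch; all receiver details are placeholders)
route:
  receiver: team-slack              # default: warnings go to the team channel
  group_by: ['alertname', 'route']
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 1h
  routes:
    # Critical alerts page the on-call engineer instead.
    - matchers:
        - severity="critical"
      receiver: oncall-pagerduty
receivers:
  - name: team-slack
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/REPLACE/ME'
        channel: '#streaming-alerts'
        send_resolved: true
  - name: oncall-pagerduty
    pagerduty_configs:
      - routing_key: 'REPLACE_WITH_INTEGRATION_KEY'
```

Grouping by alertname and route keeps a single failing route from producing a separate notification for every affected input or output.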

VMAF Quality Analysis

For transcoded streams, bitrate alone does not tell you whether the output looks good. VMAF (Video Multimethod Assessment Fusion) is Netflix’s open-source video quality metric that correlates closely with human perception.

Vajra Cast can compute VMAF scores in real-time by comparing the transcoded output against the original input:

  • VMAF score range: 0-100 (higher is better)
  • 93+: Excellent. Visually indistinguishable from the source
  • 80-93: Good. Minor differences visible only in side-by-side comparison
  • 70-80: Fair. Noticeable artifacts under scrutiny
  • Below 70: Poor. Visible quality loss

VMAF scores are exposed as Prometheus metrics:

vajracast_vmaf_score{route="main-production", profile="1080p"} 94.2
vajracast_vmaf_score{route="main-production", profile="720p"} 89.7

Alert on VMAF dropping below your quality threshold to catch transcoding issues before they affect viewer experience.
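A quality alert along those lines might be written as in the sketch below. It uses the route and profile labels shown in the sample output above; the threshold of 80 (the lower edge of the "Good" band), the 5-minute averaging window, and the group and alert names are assumptions to tune for your content.

```yaml
# Sketch of a VMAF quality rule; threshold and window are assumptions.
groups:
  - name: vajracast-quality
    rules:
      - alert: LowVmafScore
        # Average over 5 minutes to smooth out brief dips on complex scenes.
        expr: avg_over_time(vajracast_vmaf_score[5m]) < 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "VMAF below 80 on route {{ $labels.route }} ({{ $labels.profile }})"
```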

Note: Real-time VMAF computation requires additional CPU resources. Enable it selectively on routes where quality verification is critical.

Next Steps