Real-Time Metrics: Prometheus and Grafana Monitoring
Monitor Vajra Cast with Prometheus and Grafana. Built-in metrics, dashboard setup, alerting rules, and VMAF quality analysis.
Why Monitoring Matters for Live Streaming
A live stream either works or it does not. There is no “deploy and forget.” Network conditions change, encoders misbehave, CDN ingest points go down, and hardware degrades over time. Without real-time monitoring, you find out about problems when your viewers complain, or worse, when they leave.
Vajra Cast includes a built-in metrics system that exposes stream, SRT protocol, and system measurements for your streaming infrastructure. These metrics integrate with the widely used Prometheus and Grafana stack, giving you dashboards, alerting, and historical analysis.
Built-In Metrics
Vajra Cast exposes metrics at the /metrics endpoint in Prometheus exposition format:
http://your-server:8080/metrics
By default, the metrics endpoint requires no authentication (this is configurable). Prometheus scrapes the endpoint at a regular interval (typically every 15 seconds) and stores the resulting time-series data.
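To make the exposition format concrete, here is a minimal Python sketch that parses a scrape into samples. The sample text is illustrative rather than real Vajra Cast output, and the label names `route` and `input` are assumptions. The parser is deliberately simplified: it reads plain `name{labels} value` lines, tolerates an optional trailing timestamp, and skips everything else.

```python
import re

# Illustrative sample only; the label names "route" and "input" are assumed.
SAMPLE = '''\
# HELP vajracast_input_bitrate_bps Current input bitrate in bits per second
# TYPE vajracast_input_bitrate_bps gauge
vajracast_input_bitrate_bps{route="main",input="1"} 6000000
vajracast_input_connected{route="main",input="1"} 1
vajracast_routes_active 3
'''

# metric name, optional {labels}, value, optional integer timestamp
LINE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples, skipping comment lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # blank line or HELP/TYPE comment
        match = LINE_RE.match(line)
        if match is None:
            continue  # not understood by this simplified parser
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ''))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

To run it against a live instance, the text could be fetched with `urllib.request.urlopen("http://your-server:8080/metrics").read().decode()` and passed to `parse_metrics`.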
Stream Metrics
Per-route, per-input, and per-output metrics:
| Metric | Type | Description |
|---|---|---|
| vajracast_input_bitrate_bps | Gauge | Current input bitrate in bits per second |
| vajracast_output_bitrate_bps | Gauge | Current output bitrate in bits per second |
| vajracast_input_packets_total | Counter | Total packets received per input |
| vajracast_output_packets_total | Counter | Total packets sent per output |
| vajracast_input_connected | Gauge | 1 if input is connected, 0 if not |
| vajracast_output_connected | Gauge | 1 if output is connected, 0 if not |
| vajracast_failover_switches_total | Counter | Total failover switch events per route |
| vajracast_active_input_priority | Gauge | Priority number of the currently active input |
SRT-Specific Metrics
For SRT inputs and outputs, additional protocol-level metrics:
| Metric | Type | Description |
|---|---|---|
| vajracast_srt_rtt_ms | Gauge | Round-trip time in milliseconds |
| vajracast_srt_packet_loss_percent | Gauge | Current packet loss percentage |
| vajracast_srt_retransmit_rate | Gauge | Retransmission rate |
| vajracast_srt_recv_buffer_ms | Gauge | Receive buffer level in milliseconds |
| vajracast_srt_send_buffer_ms | Gauge | Send buffer level in milliseconds |
| vajracast_srt_bandwidth_mbps | Gauge | Estimated available bandwidth in megabits per second |
System Metrics
Infrastructure-level measurements:
| Metric | Type | Description |
|---|---|---|
| vajracast_cpu_usage_percent | Gauge | Application CPU usage as a percentage |
| vajracast_memory_usage_bytes | Gauge | Application memory usage in bytes |
| vajracast_gpu_encoder_usage_percent | Gauge | Hardware encoder utilization as a percentage |
| vajracast_routes_active | Gauge | Number of active routes |
| vajracast_uptime_seconds | Gauge | Application uptime in seconds |
All metrics include labels for route name, input/output ID, and protocol, allowing you to filter and aggregate across your entire deployment.
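As a rough illustration of what filtering and aggregating by label means, the sketch below sums a metric per route, optionally restricted to one protocol. It mirrors what a PromQL expression such as `sum by (route) (vajracast_input_bitrate_bps{protocol="srt"})` would compute. The label names `route`, `input`, and `protocol` and all sample values are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical samples as (metric name, labels, value); label names are assumed.
samples = [
    ("vajracast_input_bitrate_bps",
     {"route": "main", "input": "1", "protocol": "srt"}, 6_000_000.0),
    ("vajracast_input_bitrate_bps",
     {"route": "main", "input": "2", "protocol": "rtmp"}, 5_500_000.0),
    ("vajracast_input_bitrate_bps",
     {"route": "backup", "input": "1", "protocol": "srt"}, 4_000_000.0),
]


def sum_by_label(samples, metric, label, **match):
    """Sum one metric grouped by `label`, keeping only series whose labels
    equal every key/value pair given in `match`."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name != metric:
            continue
        if any(labels.get(key) != wanted for key, wanted in match.items()):
            continue  # filtered out by a label matcher
        totals[labels.get(label, "")] += value
    return dict(totals)


print(sum_by_label(samples, "vajracast_input_bitrate_bps", "route"))
print(sum_by_label(samples, "vajracast_input_bitrate_bps", "route", protocol="srt"))
```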
Prometheus Configuration
Add Vajra Cast to your Prometheus scrape configuration:
# prometheus.yml
scrape_configs:
  - job_name: 'vajracast'
    scrape_interval: 10s
    static_configs:
      - targets: ['vajracast-host:8080']
        labels:
          environment: 'production'
          location: 'studio-a'
For multiple Vajra Cast instances, add each as a target:
static_configs:
  - targets:
      - 'vajracast-studio-a:8080'
      - 'vajracast-studio-b:8080'
      - 'vajracast-remote:8080'
If you are running Vajra Cast in Docker or Kubernetes, use service discovery instead of static targets. Prometheus supports Docker and Kubernetes service discovery natively.
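Between static targets and full Docker or Kubernetes discovery, Prometheus also offers file-based service discovery (`file_sd_configs`), which reads target groups from a JSON or YAML file and reloads them when the file changes. This is a different mechanism from the ones named above. The sketch below generates such a file from a host list, as one possible way to keep targets out of `prometheus.yml`; the host names, the file name, and the helper functions are illustrative assumptions.

```python
import json


def build_file_sd(hosts, port=8080, labels=None):
    """Build a Prometheus file_sd document: a list of target groups."""
    group = {"targets": [f"{host}:{port}" for host in hosts]}
    if labels:
        group["labels"] = dict(labels)  # label values must be strings
    return [group]


def write_file_sd(path, hosts, port=8080, labels=None):
    """Write the target groups as JSON for a file_sd_configs entry to read."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_file_sd(hosts, port, labels), handle, indent=2)


# Hypothetical hosts, reusing the instance names from the static example.
hosts = ["vajracast-studio-a", "vajracast-studio-b", "vajracast-remote"]
print(json.dumps(build_file_sd(hosts, labels={"environment": "production"}), indent=2))
```

The resulting file would be referenced from the scrape job with a `file_sd_configs` entry whose `files` list contains its path.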
Grafana Dashboards
Grafana transforms raw Prometheus metrics into visual dashboards. A well-designed streaming dashboard shows you the health of your entire infrastructure at a glance.
Recommended Dashboard Panels
Overview row:
- Total active routes (single stat)
- Total connected inputs / total inputs (ratio)
- Total connected outputs / total outputs (ratio)
- System CPU and memory usage (gauges)
Per-route row (repeated for each route):
- Input bitrate over time (graph)
- Output bitrate over time (graph)
- Active input indicator (showing which priority level is active)
- Failover event markers (annotations on the graph)
SRT health row:
- RTT over time per SRT connection (graph)
- Packet loss percentage per SRT connection (graph)
- Receive buffer level (graph, should stay below 80% of configured latency)
- Retransmission rate (graph)
Transcoding row (if using hardware transcoding):
- GPU encoder utilization (graph)
- Encoding FPS vs. target FPS (should always be at or above target)
- Output quality metrics (if VMAF is enabled)
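The receive-buffer guideline in the SRT health row (stay below 80% of the configured latency) can be written as a small check. This is only a sketch: the configured latency is passed in as a parameter because the metrics tables above do not list a metric for it, and the function name is hypothetical.

```python
def recv_buffer_ok(buffer_ms, configured_latency_ms, max_fraction=0.8):
    """Return True while the receive buffer level stays below the given
    fraction of the configured SRT latency."""
    if configured_latency_ms <= 0:
        raise ValueError("configured_latency_ms must be positive")
    return buffer_ms < max_fraction * configured_latency_ms


# With 120 ms of configured latency, the limit is 96 ms.
print(recv_buffer_ok(90, 120))
print(recv_buffer_ok(100, 120))
```

In Grafana, the same idea is usually expressed as a threshold line on the buffer graph rather than as code.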
Example PromQL Queries
Average input bitrate across all inputs on all routes, in Mbps:
avg(vajracast_input_bitrate_bps) / 1e6
Maximum packet loss across all SRT connections:
max(vajracast_srt_packet_loss_percent)
Failover events per route in the last hour:
increase(vajracast_failover_switches_total[1h])
Routes where the active input is not priority 1 (failover is active):
vajracast_active_input_priority > 1
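The queries above can also be run outside Grafana through the Prometheus HTTP API, whose instant-query endpoint is `/api/v1/query`. The following is a minimal sketch using only the standard library; the Prometheus address `http://prometheus:9090` is an assumption, and the response handling covers only instant-vector results.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PROMETHEUS_URL = "http://prometheus:9090"  # assumed address of your Prometheus


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


def parse_instant_vector(payload):
    """Turn an instant-vector API response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"unexpected result type: {data['resultType']}")
    # Each result carries "value": [unix_timestamp, "value as string"].
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]


def run_query(expr, base_url=PROMETHEUS_URL):
    """Execute a PromQL instant query and return parsed results."""
    with urlopen(build_query_url(expr, base_url), timeout=10) as response:
        return parse_instant_vector(json.load(response))


print(build_query_url("vajracast_active_input_priority > 1"))
```

Calling `run_query("vajracast_active_input_priority > 1")` against a reachable Prometheus would return one entry per route currently running on a backup input.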
Alerting
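Prometheus and Grafana both support alerting rules: Prometheus evaluates the rules and hands firing alerts to Alertmanager for routing, while Grafana has its own alerting engine. For live streaming, set up alerts for conditions that require immediate attention.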
Critical Alerts (Page Someone)
# prometheus-rules.yml
groups:
  - name: vajracast-critical
    rules:
      - alert: StreamDisconnected
        # Input and output series carry different ID labels, so match on route only.
        expr: vajracast_input_connected == 0 and on (route) vajracast_output_connected == 1
        for: 30s
        labels:
          severity: critical
        annotations:
          summary: "Input disconnected on route {{ $labels.route }}"
      - alert: AllInputsDown
        expr: sum by (route) (vajracast_input_connected) == 0
        for: 10s
        labels:
          severity: critical
        annotations:
          summary: "All inputs down on route {{ $labels.route }}"
Warning Alerts (Notify the Team)
- alert: FailoverActive
  expr: vajracast_active_input_priority > 1
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Failover active on route {{ $labels.route }}, using priority {{ $value }}"
- alert: HighPacketLoss
  expr: vajracast_srt_packet_loss_percent > 5
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "SRT packet loss above 5% on {{ $labels.route }}"
- alert: LowBitrate
  expr: vajracast_input_bitrate_bps < 500000
  for: 1m
  labels:
    severity: warning
  annotations:
    summary: "Input bitrate below 500kbps on route {{ $labels.route }}"
Route these alerts to Slack, PagerDuty, email, or any other notification channel through Alertmanager.
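The `for:` clause in these rules means an alert fires only after its condition has held continuously for that long; a single bad sample just puts it into a pending state. The sketch below simulates that behavior on a list of timestamped samples. It is a simplification for illustration, not how Prometheus is implemented (Prometheus evaluates rules at its own evaluation interval), and the sample values are invented.

```python
def firing_times(samples, condition, hold_seconds):
    """Return the timestamps at which an alert would be firing.

    samples: list of (timestamp_seconds, value) in ascending time order.
    condition: function returning True when the value breaches the rule.
    hold_seconds: how long the condition must hold, like the rule's `for:`.
    """
    firing = []
    pending_since = None
    for timestamp, value in samples:
        if condition(value):
            if pending_since is None:
                pending_since = timestamp  # condition just became true
            if timestamp - pending_since >= hold_seconds:
                firing.append(timestamp)
        else:
            pending_since = None  # any recovery resets the pending timer
    return firing


# Invented packet-loss samples every 30 s, checked like HighPacketLoss (> 5 for 2m).
loss = [(0, 1.0), (30, 6.0), (60, 7.0), (90, 8.0), (120, 9.0), (150, 9.0), (180, 2.0)]
print(firing_times(loss, lambda value: value > 5, hold_seconds=120))
```

In this run the condition first holds at 30 s, so the alert fires at 150 s and clears when loss drops at 180 s. Shorter `for:` values page faster but are more sensitive to brief glitches.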
VMAF Quality Analysis
For transcoded streams, bitrate alone does not tell you whether the output looks good. VMAF (Video Multimethod Assessment Fusion) is Netflix’s open-source video quality metric that correlates closely with human perception.
Vajra Cast can compute VMAF scores in real-time by comparing the transcoded output against the original input:
- VMAF score range: 0-100 (higher is better)
- 93+: Excellent. Visually indistinguishable from the source
- 80-93: Good. Minor differences visible only in side-by-side comparison
- 70-80: Fair. Noticeable artifacts under scrutiny
- Below 70: Poor. Visible quality loss
VMAF scores are exposed as Prometheus metrics:
vajracast_vmaf_score{route="main-production", profile="1080p"} 94.2
vajracast_vmaf_score{route="main-production", profile="720p"} 89.7
Alert on VMAF dropping below your quality threshold to catch transcoding issues before they affect viewer experience.
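As a sketch of how the score bands and a quality threshold might be applied, the snippet below classifies the two example scores and lists the profiles that fall under a threshold. The threshold of 90 and the function names are illustrative assumptions; at the shared band edges, a score of exactly 93, 80, or 70 is placed in the higher band.

```python
def classify_vmaf(score):
    """Map a VMAF score (0-100) to the quality bands listed above."""
    if not 0 <= score <= 100:
        raise ValueError("VMAF scores range from 0 to 100")
    if score >= 93:
        return "excellent"
    if score >= 80:
        return "good"
    if score >= 70:
        return "fair"
    return "poor"


def below_threshold(scores, threshold):
    """Return the (route, profile) keys whose score is under the threshold."""
    return sorted(key for key, score in scores.items() if score < threshold)


# The two example series shown above, keyed by (route, profile).
scores = {
    ("main-production", "1080p"): 94.2,
    ("main-production", "720p"): 89.7,
}

for key, score in scores.items():
    print(key, score, classify_vmaf(score))
print(below_threshold(scores, threshold=90))  # 90 is an illustrative threshold
```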
Note: Real-time VMAF computation requires additional CPU resources. Enable it selectively on routes where quality verification is critical.
Next Steps
- Return to the Broadcast Streaming Software Guide for the complete feature overview
- Learn about Hardware Transcoding to optimize the streams you are monitoring
- Explore the REST API for programmatic access to metrics and status