Hardware Transcoding: Intel QSV and VAAPI in Vajra Cast
Configure hardware-accelerated transcoding with Intel QSV and VAAPI in Vajra Cast. H.264, H.265/HEVC encoding with minimal CPU usage.
Why Hardware Transcoding?
Transcoding (converting video from one codec, resolution, or bitrate to another) is the most CPU-intensive operation in a streaming pipeline. A single 1080p60 software encode can saturate one or more cores of a modern CPU. If you need multiple output profiles (1080p + 720p + 480p), you are looking at several cores dedicated entirely to encoding.
Hardware transcoding offloads this work to dedicated silicon on the GPU or integrated graphics processor. The CPU is freed for other tasks (routing, audio processing, API requests), and you can transcode more streams per server at a fraction of the power consumption.
Vajra Cast supports hardware transcoding via Intel Quick Sync Video (QSV) and VAAPI (Video Acceleration API) on Linux.
Hardware vs. Software Transcoding
| Aspect | Software (x264/x265) | Hardware (QSV/VAAPI) |
|---|---|---|
| CPU usage | Very high (1+ cores per stream) | Minimal (<5% per stream) |
| Quality at same bitrate | Excellent | Very good |
| Encoding speed | Depends on preset | Fixed silicon, consistently fast |
| Latency | Higher with slow presets | Consistently low |
| Power consumption | High | Low |
| Parallel streams | Limited by CPU cores | Limited by GPU encoder sessions |
| Cost | Any CPU | Requires Intel GPU |
The quality difference has narrowed significantly. Intel’s recent Quick Sync implementations produce output that is hard to distinguish visually from software encoding at broadcast bitrates (5-15 Mbps for 1080p). For live streaming, where you need real-time encoding and cannot use slow multi-pass presets, hardware encoding is the practical choice.
Supported Hardware
Intel Quick Sync Video (QSV)
QSV is available on Intel CPUs with integrated graphics, starting from Sandy Bridge (2011) but practically useful from Skylake (2015) onward:
| Generation | Codecs | Max Streams (approx.) | Notes |
|---|---|---|---|
| Skylake (6th gen) | H.264 | 4-8 1080p | Solid baseline |
| Kaby Lake (7th gen) | H.264, HEVC 8-bit | 6-10 1080p | First HEVC support |
| Coffee Lake (8th gen+) | H.264, HEVC 8/10-bit | 8-15 1080p | Improved quality |
| Ice Lake / Tiger Lake (10th/11th gen) | H.264, HEVC, AV1 decode (Tiger Lake) | 10-20 1080p | AV1 is decode-only |
| Alder Lake+ (12th gen+) | H.264, HEVC, AV1 decode | 15-30+ 1080p | Best density; AV1 encode needs Arc or Core Ultra (Meteor Lake+) graphics |
The “max streams” column is approximate and depends on resolution, frame rate, and bitrate. These numbers assume 1080p30 at 5 Mbps.
Important: You need the integrated GPU, not just the CPU. Desktop CPUs with the “F” suffix (e.g., i9-13900F) lack integrated graphics and cannot use QSV. Xeon E-series processors with integrated graphics do support QSV.
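The stream estimates in the table assume 1080p30. As a rough planning aid, the sketch below rescales an estimate to another resolution or frame rate. It assumes encoder throughput scales linearly with pixels per second, which is a simplification and not a documented Vajra Cast or Intel figure; the function name is illustrative.

```python
def estimate_streams(baseline_streams: int, width: int, height: int, fps: float) -> int:
    """Rescale a 1080p30 stream estimate to another resolution and frame rate.

    Assumption (not from Vajra Cast): throughput scales linearly with
    pixels per second. Real hardware deviates from this, so treat the
    result as a starting point for load testing.
    """
    baseline_pixel_rate = 1920 * 1080 * 30
    target_pixel_rate = width * height * fps
    return int(baseline_streams * baseline_pixel_rate / target_pixel_rate)


# Lower bound for Coffee Lake from the table: 8 streams at 1080p30
print("1080p60:", estimate_streams(8, 1920, 1080, 60))
print("720p30:", estimate_streams(8, 1280, 720, 30))
```

Always confirm the result with a load test on the target hardware before committing to a stream count.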
VAAPI
VAAPI is the Linux standard video acceleration API. On Intel hardware, it provides an alternative interface to the same Quick Sync hardware. VAAPI also works with some AMD GPUs, though Intel is the primary target for Vajra Cast.
VAAPI and QSV access the same underlying hardware on Intel platforms. The choice between them is primarily about driver compatibility:
- QSV: Uses Intel’s Media SDK / oneVPL runtime. Better tuning options, more codec features.
- VAAPI: Uses the open-source intel-media-driver (iHD) or the legacy i965 driver. Simpler setup, fewer dependencies.
Vajra Cast supports both. For most deployments, QSV is recommended for its broader feature set.
Supported Codecs
H.264 (AVC)
The universal codec. Every player, every device, every platform supports H.264. Use it when maximum compatibility matters.
- Hardware encode: QSV and VAAPI
- Hardware decode: QSV and VAAPI
- Profiles: Baseline, Main, High
- Bit depth: 8-bit
H.265 (HEVC)
Roughly 50% better compression than H.264 at the same visual quality. Use it for bandwidth-constrained scenarios or higher-quality output at the same bitrate.
- Hardware encode: QSV (7th gen+) and VAAPI
- Hardware decode: QSV and VAAPI
- Profiles: Main, Main 10
- Bit depth: 8-bit, 10-bit (8th gen+)
AV1
The newest generation codec with excellent compression efficiency. Platform support is growing rapidly.
- Hardware encode: QSV on Intel Arc GPUs and Core Ultra (Meteor Lake or newer) integrated graphics
- Hardware decode: QSV (11th gen+)
- Bit depth: 8-bit, 10-bit
Configuration in Vajra Cast
Enabling Hardware Acceleration
In Docker deployments, you need to pass the GPU device to the container:
services:
  vajracast:
    image: vajracast/vajracast:latest
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - video
      - render
The /dev/dri device provides access to the GPU. The video and render groups grant the necessary permissions.
Verifying Hardware Access
Check that Vajra Cast detects your hardware:
# Inside the container or on the host
ls -la /dev/dri/
# Should show renderD128 (and possibly card0)
# Verify Intel GPU
vainfo
# Should list supported profiles (H264, HEVC, etc.)
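If you want to script the first check (for example as a container health check), a minimal sketch could look like the following. The function name is illustrative and it only confirms that a render node exists; it does not replace vainfo for verifying codec support.

```python
from pathlib import Path


def find_render_nodes(dri_dir: str = "/dev/dri") -> list[str]:
    """Return the sorted render node names (e.g. renderD128) under dri_dir."""
    root = Path(dri_dir)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.glob("renderD*"))


if __name__ == "__main__":
    nodes = find_render_nodes()
    if nodes:
        print("Render nodes:", ", ".join(nodes))
    else:
        print("No render nodes found; check the devices mapping and group membership")
```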
Creating a Transcoding Profile
In the Vajra Cast web interface:
- Navigate to your route.
- Open “Transcoding” settings.
- Select “Hardware (QSV)” or “Hardware (VAAPI)” as the encoder.
- Choose your output codec (H.264, H.265, or AV1).
- Set the target resolution and bitrate.
- Apply.
Via the REST API:
curl -X POST -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "encoder": "qsv",
    "codec": "h265",
    "width": 1920,
    "height": 1080,
    "bitrate": 8000,
    "fps": 30,
    "profile": "main",
    "preset": "medium"
  }' \
  http://localhost:8080/api/v1/routes/1/transcode
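The same request can be issued from a script. The sketch below uses only the Python standard library and mirrors the endpoint and fields from the curl example; the helper name and the placeholder API key are illustrative, and the response format is not covered here.

```python
import json
import urllib.request


def build_transcode_request(base_url: str, route_id: int, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    profile = {
        "encoder": "qsv",
        "codec": "h265",
        "width": 1920,
        "height": 1080,
        "bitrate": 8000,
        "fps": 30,
        "profile": "main",
        "preset": "medium",
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/routes/{route_id}/transcode",
        data=json.dumps(profile).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_transcode_request("http://localhost:8080", 1, "YOUR_API_KEY")
# To send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it makes the payload easy to inspect or log before it reaches the server.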
Presets
Hardware encoders support quality presets that trade encoding speed for quality:
| Preset | Speed | Quality | Use Case |
|---|---|---|---|
| veryfast | Fastest | Lower | Maximum density |
| fast | Fast | Good | Standard live |
| medium | Balanced | Very good | Recommended default |
| slow | Slower | Excellent | Quality-critical |
For live streaming, medium provides the best balance. The speed difference between presets is less dramatic than with software encoding because the hardware pipeline has a fixed architecture.
Performance Tuning
Multiple Streams
Each Intel GPU has a limited number of encoder sessions. When you exceed the limit, additional encodes fall back to software. Monitor your GPU utilization:
# Intel GPU Top (install intel-gpu-tools)
intel_gpu_top
Vajra Cast reports hardware encoder utilization in its Prometheus metrics, so you can track GPU usage in your Grafana dashboards alongside CPU, memory, and network metrics.
Bitrate Control
For live streaming, use CBR (Constant Bitrate) or VBR (Variable Bitrate) with a maximum constraint:
- CBR: Consistent bandwidth usage. Required by some CDNs. Less efficient for static scenes.
- VBR: Better quality-per-bit. Spikes during complex scenes. Set a max bitrate to prevent bandwidth overruns.
Hardware encoders also support CQP (Constant Quantization Parameter) for quality-fixed encoding, but this produces variable bitrates that may not be suitable for live delivery.
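To see why the VBR maximum matters for capacity planning, the sketch below sums the peak bitrate of each output profile and compares it with an uplink budget. The dictionary keys ("mode", "bitrate", "max_bitrate") and the kbps unit are assumptions for illustration, not Vajra Cast field names.

```python
def worst_case_bandwidth_kbps(profiles: list[dict]) -> int:
    """Sum the peak bitrate of each output profile.

    CBR profiles peak at their target bitrate; VBR profiles peak at
    their configured maximum. Keys and units are illustrative.
    """
    total = 0
    for p in profiles:
        if p["mode"] == "vbr":
            total += p["max_bitrate"]
        else:
            total += p["bitrate"]
    return total


ladder = [
    {"mode": "cbr", "bitrate": 8000},
    {"mode": "vbr", "bitrate": 4000, "max_bitrate": 6000},
    {"mode": "vbr", "bitrate": 1500, "max_bitrate": 2500},
]
uplink_kbps = 20000  # hypothetical uplink budget
peak = worst_case_bandwidth_kbps(ladder)
print(peak, "kbps peak;", "fits" if peak <= uplink_kbps else "exceeds", "the uplink")
```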
Lookahead
Enable lookahead for improved quality. The encoder buffers a few frames to make better rate-control decisions:
{
"lookahead": 10,
"lookaheadDepth": 10
}
Lookahead adds latency equal to the number of lookahead frames divided by the frame rate. For a 10-frame lookahead at 30 fps, that is about 333 ms of additional encoding latency.
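The latency rule is simple enough to tabulate for the frame rates you use. A minimal sketch (the function name is illustrative):

```python
def lookahead_latency_ms(lookahead_frames: int, fps: float) -> float:
    """Added encoding latency in milliseconds: frames divided by frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return lookahead_frames / fps * 1000.0


for fps in (25, 30, 50, 60):
    print(f"{fps} fps: {lookahead_latency_ms(10, fps):.0f} ms")
```

The same lookahead depth costs half as much latency at 60 fps as at 30 fps, so the depth is worth revisiting whenever the output frame rate changes.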
Decode + Encode Pipeline
When transcoding, Vajra Cast uses hardware for both decoding and encoding:
Input (SRT) -> HW Decode (QSV) -> Scale/Filter -> HW Encode (QSV) -> Output (SRT)
The entire pipeline stays on the GPU. Video frames are decoded in GPU memory, scaled or filtered in GPU memory, and encoded in GPU memory. The CPU touches only the control plane. No pixel data passes through the CPU.
This is why hardware transcoding has such low CPU overhead: the CPU manages the pipeline, but the GPU does all the heavy lifting.
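How Vajra Cast builds this pipeline internally is not described here. As an illustration of the same idea with a different tool, the sketch below assembles an FFmpeg command line that decodes, scales, and encodes with QSV while keeping frames in GPU memory. FFmpeg is an assumption chosen for illustration, not necessarily what Vajra Cast runs; the SRT addresses are placeholders, the input is assumed to be H.264, and the FFmpeg build must include QSV and SRT support.

```python
import shlex


def qsv_transcode_command(src: str, dst: str, width: int, height: int,
                          bitrate_kbps: int) -> list[str]:
    """Build an FFmpeg argv for an all-QSV decode -> scale -> encode pipeline."""
    return [
        "ffmpeg",
        "-hwaccel", "qsv",                # decode on the GPU
        "-hwaccel_output_format", "qsv",  # keep decoded frames in GPU memory
        "-c:v", "h264_qsv",               # QSV H.264 decoder (assumes H.264 input)
        "-i", src,
        "-vf", f"scale_qsv=w={width}:h={height}",  # scale on the GPU
        "-c:v", "hevc_qsv",               # QSV HEVC encoder
        "-b:v", f"{bitrate_kbps}k",
        "-c:a", "copy",                   # pass audio through untouched
        "-f", "mpegts",
        dst,
    ]


cmd = qsv_transcode_command(
    "srt://0.0.0.0:9000?mode=listener",   # placeholder input
    "srt://203.0.113.10:9001",            # placeholder output
    1280, 720, 4000,
)
print(shlex.join(cmd))
```

Because both the decoder output format and the scale filter are QSV-native, no frame has to be copied back to system memory between stages.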
Next Steps
- Return to the Broadcast Streaming Software Guide for the complete feature overview
- Learn about Real-Time Metrics for monitoring transcoding performance
- Explore Docker and Kubernetes Deployment for deploying with GPU access