Hardware Transcoding: Intel QSV and VAAPI in Vajra Cast

Why Hardware Transcoding?

Transcoding (converting video from one codec, resolution, or bitrate to another) is the most CPU-intensive operation in a streaming pipeline. A single 1080p60 software encode can saturate one or more modern CPU cores. If you need multiple output profiles (1080p + 720p + 480p), you are looking at several cores dedicated entirely to encoding.

Hardware transcoding offloads this work to dedicated silicon on the GPU or integrated graphics processor. The CPU is freed for other tasks (routing, audio processing, API requests), and you can transcode more streams per server at a fraction of the power consumption.

Vajra Cast supports hardware transcoding via Intel Quick Sync Video (QSV) and VAAPI (Video Acceleration API) on Linux.

Hardware vs. Software Transcoding

Aspect	Software (x264/x265)	Hardware (QSV/VAAPI)
CPU usage	Very high (1+ cores per stream)	Minimal (<5% per stream)
Quality at same bitrate	Excellent	Very good
Encoding speed	Depends on preset	Fixed silicon, consistently fast
Latency	Higher with slow presets	Consistently low
Power consumption	High	Low
Parallel streams	Limited by CPU cores	Limited by GPU encoder sessions
Cost	Any CPU	Requires Intel GPU

The quality difference has narrowed significantly. At broadcast bitrates (5-15 Mbps for 1080p), output from Intel’s latest Quick Sync implementations is hard to tell apart from software encoding for most content. For live streaming, where you need real-time encoding and cannot use slow multi-pass presets, hardware encoding is the practical choice.

Supported Hardware

Intel Quick Sync Video (QSV)

QSV is available on Intel CPUs with integrated graphics, starting from Sandy Bridge (2011) but practically useful from Skylake (2015) onward:

Generation	Codecs	Max Streams (approx.)	Notes
Skylake (6th gen)	H.264, HEVC 8-bit	4-8 1080p	Solid baseline; first HEVC encode
Kaby Lake (7th gen)	H.264, HEVC 8/10-bit	6-10 1080p	First 10-bit HEVC
Coffee Lake (8th gen+)	H.264, HEVC 8/10-bit	8-15 1080p	Improved quality
Ice Lake / Tiger Lake (10th/11th gen)	H.264, HEVC, AV1 decode (11th gen)	10-20 1080p	AV1 is decode-only
Alder Lake+ (12th gen+)	H.264, HEVC, AV1 decode	15-30+ 1080p	Best density; no AV1 encode on integrated graphics before Core Ultra

The “max streams” column is approximate and depends on resolution, frame rate, and bitrate. These numbers assume 1080p30 at 5 Mbps.
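As a rough planning aid, the table's figures can be rescaled to other formats. The sketch below assumes, as a first-order approximation that is neither a Vajra Cast nor an Intel specification, that encoder capacity scales with pixel rate (width × height × frame rate) relative to the 1080p30 baseline. The function name is illustrative, and real capacity should always be measured on the target hardware.

```python
import math

# Baseline used by the table above: 1080p at 30 fps.
BASELINE_PIXEL_RATE = 1920 * 1080 * 30


def estimate_streams(baseline_streams: int, width: int, height: int, fps: float) -> int:
    """Rescale a 1080p30 stream count to another format.

    Assumption: capacity is proportional to pixel rate. Bitrate, codec,
    and preset also matter, so treat the result as a starting point only.
    """
    pixel_rate = width * height * fps
    if pixel_rate <= 0:
        raise ValueError("width, height, and fps must be positive")
    return math.floor(baseline_streams * BASELINE_PIXEL_RATE / pixel_rate)


# A GPU rated for about 8 streams at 1080p30:
print(estimate_streams(8, 1920, 1080, 60))  # 1080p60 doubles the pixel rate
print(estimate_streams(8, 1280, 720, 30))   # 720p30 has 4/9 of the pixels
```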

Important: You need the integrated GPU, not just the CPU. Intel CPUs with the “F” suffix (e.g., the desktop i9-13900F) lack integrated graphics and cannot use QSV. Xeon E-series processors with integrated graphics do support QSV.

VAAPI

VAAPI is the Linux standard video acceleration API. On Intel hardware, it provides an alternative interface to the same Quick Sync hardware. VAAPI also works with some AMD GPUs, though Intel is the primary target for Vajra Cast.

VAAPI and QSV access the same underlying hardware on Intel platforms. The choice between them is primarily about driver compatibility:

QSV: Uses Intel’s Media SDK or its successor oneVPL (both open source), layered on top of the VAAPI driver on Linux. Better tuning options, more codec features.
VAAPI: Uses the open-source intel-media-driver (iHD) or legacy i965 driver. Simpler setup, fewer dependencies.

Vajra Cast supports both. For most deployments, QSV is recommended for its broader feature set.

Supported Codecs

H.264 (AVC)

The universal codec. Virtually every player, device, and platform supports H.264. Use it when maximum compatibility matters.

Hardware encode: QSV and VAAPI
Hardware decode: QSV and VAAPI
Profiles: Baseline, Main, High
Bit depth: 8-bit

H.265 (HEVC)

Can reach the same visual quality as H.264 at up to roughly half the bitrate, although real-time hardware encoders often save less. Use it for bandwidth-constrained scenarios or higher-quality output at the same bitrate.

Hardware encode: QSV (6th gen+) and VAAPI
Hardware decode: QSV and VAAPI
Profiles: Main, Main 10
Bit depth: 8-bit, 10-bit (7th gen+)

AV1

The newest generation codec with excellent compression efficiency. Platform support is growing rapidly.

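Hardware encode: QSV (Intel Arc GPUs and Core Ultra integrated graphics; 12th to 14th gen Core integrated graphics decode AV1 but do not encode it)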
Hardware decode: QSV (11th gen+)
Bit depth: 8-bit, 10-bit

Configuration in Vajra Cast

Enabling Hardware Acceleration

In Docker deployments, you need to pass the GPU device to the container:

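services:
  vajracast:
    image: vajracast/vajracast:latest
    devices:
      # Expose the host's DRM device nodes (card*, renderD*) to the container
      - /dev/dri:/dev/dri
    group_add:
      # Supplementary groups that own those nodes on typical hosts
      - video
      - render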

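The /dev/dri directory holds the GPU device nodes (cardN and renderDN). The video and render groups grant the necessary permissions. Docker resolves group names inside the container, so if the image does not define them, list the host's numeric GIDs instead.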
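To find the numeric GIDs to pass in group_add, you can inspect who owns the device nodes on the host. A minimal sketch, assuming a standard Linux host; the helper name device_group_ids is illustrative and not part of Vajra Cast.

```python
import os
from pathlib import Path


def device_group_ids(dri_dir: str = "/dev/dri") -> list[int]:
    """Return the sorted, de-duplicated GIDs that own the nodes in dri_dir."""
    root = Path(dri_dir)
    if not root.is_dir():
        return []  # no DRM devices exposed on this machine
    # Skip subdirectories such as by-path; only device nodes matter here.
    return sorted({os.stat(entry).st_gid for entry in root.iterdir() if not entry.is_dir()})


if __name__ == "__main__":
    for gid in device_group_ids():
        print(f'      - "{gid}"')  # paste these lines under group_add
```

Run it on the Docker host, not inside the container, because the GIDs must match the ownership of the host's device nodes.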

Verifying Hardware Access

Check that Vajra Cast detects your hardware:

# Inside the container or on the host
ls -la /dev/dri/
# Should list a render node such as renderD128 (and usually card0)

# Query the VA-API driver (vainfo ships in the libva-utils or vainfo package)
vainfo
# Should list supported profiles (H264, HEVC, etc.)
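
The vainfo check can also be scripted. The sketch below is a hypothetical helper, not part of Vajra Cast: it scans vainfo's text output for profile lines and keeps those with an encode entrypoint (VAEntrypointEncSlice or the low-power VAEntrypointEncSliceLP), which is how libva reports hardware encode support.

```python
import re
import subprocess

# vainfo prints one "VAProfileX : VAEntrypointY" pair per line.
_PAIR = re.compile(r"^\s*(VAProfile\w+)\s*:\s*(VAEntrypoint\w+)\s*$")
_ENCODE_ENTRYPOINTS = {"VAEntrypointEncSlice", "VAEntrypointEncSliceLP"}


def encode_profiles(vainfo_output: str) -> list[str]:
    """Return the sorted VA profiles that expose an encode entrypoint."""
    profiles = set()
    for line in vainfo_output.splitlines():
        match = _PAIR.match(line)
        if match and match.group(2) in _ENCODE_ENTRYPOINTS:
            profiles.add(match.group(1))
    return sorted(profiles)


def run_vainfo() -> str:
    """Run vainfo and return its standard output (raises if vainfo fails)."""
    result = subprocess.run(["vainfo"], capture_output=True, text=True, check=True)
    return result.stdout
```

Calling encode_profiles(run_vainfo()) on a working setup should return a non-empty list; an entry beginning with VAProfileHEVC indicates that HEVC encode is available through the driver.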

Creating a Transcoding Profile

In the Vajra Cast web interface:

Navigate to your route.
Open “Transcoding” settings.
Select “Hardware (QSV)” or “Hardware (VAAPI)” as the encoder.
Choose your output codec (H.264, H.265, or AV1).
Set the target resolution and bitrate.
Apply.

Via the REST API:

curl -X POST -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "encoder": "qsv",
    "codec": "h265",
    "width": 1920,
    "height": 1080,
    "bitrate": 8000,
    "fps": 30,
    "profile": "main",
    "preset": "medium"
  }' \
  http://localhost:8080/api/v1/routes/1/transcode
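
The same call can be issued from a script. A minimal sketch using only the Python standard library: the endpoint, field names, and values are copied from the curl example above, while the helper name build_transcode_request and the placeholder key are illustrative.

```python
import json
import urllib.request


def build_transcode_request(base_url: str, route_id: int, api_key: str,
                            profile: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/routes/{route_id}/transcode",
        data=json.dumps(profile).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


profile = {
    "encoder": "qsv",
    "codec": "h265",
    "width": 1920,
    "height": 1080,
    "bitrate": 8000,
    "fps": 30,
    "profile": "main",
    "preset": "medium",
}
request = build_transcode_request("http://localhost:8080", 1, "YOUR_API_KEY", profile)
# To send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to log or inspect the payload before it reaches the server.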

Presets

Hardware encoders support quality presets that trade encoding speed for quality:

Preset	Speed	Quality	Use Case
veryfast	Fastest	Lower	Maximum density
fast	Fast	Good	Standard live
medium	Balanced	Very good	Recommended default
slow	Slower	Excellent	Quality-critical

For live streaming, medium provides the best balance. The speed difference between presets is less dramatic than with software encoding because the hardware pipeline has a fixed architecture.

Performance Tuning

Multiple Streams

Each Intel GPU can sustain only a limited number of concurrent encodes, bounded by the throughput of its media engine. When you exceed the limit, additional encodes fall back to software. Monitor your GPU utilization:

# Intel GPU Top (from the intel-gpu-tools package; usually needs root)
intel_gpu_top

Vajra Cast reports hardware encoder utilization in its Prometheus metrics, so you can track GPU usage in your Grafana dashboards alongside CPU, memory, and network metrics.

Bitrate Control

For live streaming, use CBR (Constant Bitrate) or VBR (Variable Bitrate) with a maximum constraint:

CBR: Consistent bandwidth usage. Required by some CDNs. Less efficient for static scenes.
VBR: Better quality-per-bit. Spikes during complex scenes. Set a max bitrate to prevent bandwidth overruns.

Hardware encoders also support CQP (Constant Quantization Parameter) for quality-fixed encoding, but this produces variable bitrates that may not be suitable for live delivery.
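
The relationship between the two modes and the bandwidth budget can be sketched as a small validation helper. Everything here is hypothetical: the RateControl class, its field names, and the kbps units are illustrative assumptions, not Vajra Cast's configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateControl:
    """Hypothetical rate-control settings; not Vajra Cast's actual schema."""
    mode: str         # "cbr" or "vbr"
    target_kbps: int  # average bitrate the encoder aims for
    max_kbps: int     # hard ceiling; equals target_kbps for CBR

    def __post_init__(self) -> None:
        if self.mode not in ("cbr", "vbr"):
            raise ValueError(f"unsupported mode: {self.mode!r}")
        if self.target_kbps <= 0:
            raise ValueError("target_kbps must be positive")
        if self.mode == "cbr" and self.max_kbps != self.target_kbps:
            raise ValueError("CBR implies max_kbps == target_kbps")
        if self.mode == "vbr" and self.max_kbps < self.target_kbps:
            raise ValueError("VBR needs max_kbps >= target_kbps")


def peak_bandwidth_kbps(outputs: list[RateControl]) -> int:
    """Worst-case video bandwidth if every output peaks at the same time."""
    return sum(rc.max_kbps for rc in outputs)


ladder = [RateControl("vbr", 8000, 10000), RateControl("cbr", 3000, 3000)]
print(peak_bandwidth_kbps(ladder))
```

Sizing the uplink against the sum of the maximum bitrates, rather than the targets, is what keeps VBR spikes from causing overruns.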

Lookahead

Enable lookahead for improved quality. The encoder buffers a few frames to make better rate-control decisions:

{
  "lookahead": 10,
  "lookaheadDepth": 10
}

Lookahead adds latency equal to the number of lookahead frames divided by the frame rate. For a 10-frame lookahead at 30 fps, that is about 333 ms of additional encoding latency.
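
The latency arithmetic is simple enough to script when comparing frame rates. A minimal sketch; the function name is illustrative.

```python
def lookahead_latency_ms(lookahead_frames: int, fps: float) -> float:
    """Added encoder latency in milliseconds: frames / fps * 1000."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return lookahead_frames / fps * 1000.0


# The same 10-frame lookahead costs less time at higher frame rates.
for fps in (25, 30, 50, 60):
    print(f"10 frames at {fps} fps: {lookahead_latency_ms(10, fps):.0f} ms")
```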

Decode + Encode Pipeline

When transcoding, Vajra Cast uses hardware for both decoding and encoding:

Input (SRT) -> HW Decode (QSV) -> Scale/Filter -> HW Encode (QSV) -> Output (SRT)

The entire pipeline stays on the GPU. Video frames are decoded in GPU memory, scaled or filtered in GPU memory, and encoded in GPU memory. The CPU touches only the control plane. No pixel data passes through the CPU.

This is why hardware transcoding has such low CPU overhead: the CPU manages the pipeline, but the GPU does all the heavy lifting.
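
For readers who want to experiment with the same decode, scale, and encode flow outside Vajra Cast, the sketch below assembles an FFmpeg command line that keeps frames in GPU memory using FFmpeg's QSV decoder, scaler, and encoder. It illustrates the general technique and does not describe Vajra Cast's internals. The helper name is hypothetical, the input is assumed to be H.264, and SRT input and output require an FFmpeg build with libsrt.

```python
def qsv_transcode_command(src: str, dst: str, width: int, height: int,
                          bitrate_kbps: int) -> list[str]:
    """Return an ffmpeg argument list for an all-GPU H.264 to HEVC transcode."""
    return [
        "ffmpeg",
        "-hwaccel", "qsv",                # decode on the GPU
        "-hwaccel_output_format", "qsv",  # keep decoded frames in GPU memory
        "-c:v", "h264_qsv",               # QSV H.264 decoder (input assumed H.264)
        "-i", src,
        "-vf", f"scale_qsv=w={width}:h={height}",  # scale on the GPU
        "-c:v", "hevc_qsv",               # QSV HEVC encoder
        "-b:v", f"{bitrate_kbps}k",
        "-c:a", "copy",                   # pass audio through untouched
        "-f", "mpegts",
        dst,
    ]
```

The list can be passed straight to subprocess.run(..., check=True) with your own source and destination URLs. Because every video stage uses a QSV component, frames are never copied back to system memory between decode and encode.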

Next Steps

Return to the Broadcast Streaming Software Guide for the complete feature overview
Learn about Real-Time Metrics for monitoring transcoding performance
Explore Docker and Kubernetes Deployment for deploying with GPU access
