Hardware Transcoding with Intel QSV: GPU-Accelerated Video Processing

What is Hardware Transcoding?

Transcoding is the process of decoding a video stream from one format and re-encoding it into another. It is computationally expensive. A 1080p60 H.264 stream can consume an entire modern CPU core for encoding alone. When you need to transcode multiple streams simultaneously, or transcode at higher resolutions like 4K, software encoding hits a wall.

Hardware transcoding offloads this work to dedicated silicon on your GPU or CPU’s integrated graphics. Instead of using general-purpose CPU cores, the video encode/decode happens on fixed-function hardware blocks that are purpose-built for the task. The result: the same transcoding job runs 5-10x faster, uses a fraction of the power, and frees your CPU for other work.

When Do You Need Transcoding?

In a streaming gateway context, transcoding is necessary when:

  • Format conversion: your encoder sends HEVC (H.265) but your output destination only accepts H.264
  • Bitrate adaptation: you receive a 20 Mbps feed and need to output a 4 Mbps version for bandwidth-constrained viewers
  • Resolution scaling: converting 4K ingest to 1080p output
  • Multi-bitrate output: creating an ABR (Adaptive Bitrate) ladder from a single high-quality input
  • Codec upgrade: converting legacy H.264 feeds to HEVC for bandwidth savings

If your input and output share the same codec, resolution, and bitrate, you do not need transcoding. The gateway can pass through the stream untouched (zero-copy), which is the most efficient path. Vajra Cast automatically uses passthrough when no transformation is needed.
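The passthrough rule can be sketched in a few lines of Python. `StreamSpec` and `transcode_reasons` are hypothetical names used only for illustration, not part of the Vajra Cast API; the sketch simply reports which of the properties above differ between input and output.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamSpec:
    """Hypothetical description of one side of a route (not a Vajra Cast type)."""
    codec: str
    width: int
    height: int
    bitrate_kbps: int


def transcode_reasons(source: StreamSpec, target: StreamSpec) -> List[str]:
    """List every property that differs; an empty list means passthrough is possible."""
    reasons = []
    if source.codec != target.codec:
        reasons.append("format conversion")
    if (source.width, source.height) != (target.width, target.height):
        reasons.append("resolution scaling")
    if source.bitrate_kbps != target.bitrate_kbps:
        reasons.append("bitrate adaptation")
    return reasons


ingest = StreamSpec("hevc", 1920, 1080, 5000)
archive = StreamSpec("hevc", 1920, 1080, 5000)
youtube = StreamSpec("h264", 1920, 1080, 6000)

print(transcode_reasons(ingest, archive))  # [] -> passthrough
print(transcode_reasons(ingest, youtube))  # ['format conversion', 'bitrate adaptation']
```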

Hardware Transcoding Options

Three encoding options are compared here: two hardware platforms and a software baseline.

Intel Quick Sync Video (QSV)

Intel’s integrated GPU transcoding engine, available on most Intel CPUs with integrated graphics (i3, i5, i7, Xeon E with iGPU). It is accessed through Intel’s Media SDK or its successor, oneVPL.

Strengths:

  • Available on nearly every Intel system (no discrete GPU needed)
  • Excellent quality-per-watt ratio
  • Strong HEVC encoding and decoding support
  • Available in servers and NUCs (compact form factor)
  • Low cost: the hardware is already in your CPU

Limitations:

  • Throughput limited compared to discrete GPUs (typically 4-8 simultaneous 1080p encodes)
  • Not available on F-series Intel CPUs (no iGPU) or AMD processors
  • Quality slightly below high-preset software encoding (but improving with each generation)

NVIDIA NVENC

NVIDIA’s dedicated hardware encoder on GeForce, Quadro, and Tesla GPUs.

Strengths:

  • High throughput on high-end GPUs (eight or more simultaneous 1080p encodes on Quadro)
  • Excellent B-frame support
  • AV1 encoding on RTX 40-series and newer
  • Widely available in workstations and cloud instances

Limitations:

  • Requires a discrete NVIDIA GPU (additional cost and power)
  • GeForce cards cap the number of simultaneous encode sessions in the driver (historically 3, raised in more recent driver releases); Quadro/Tesla are unlimited
  • Driver-dependent: needs NVIDIA proprietary drivers
  • Not available in many compact or low-power server form factors

Software Encoding (libx264 / libx265)

CPU-based encoding using the x264 or x265 libraries.

Strengths:

  • Highest possible quality at any given bitrate
  • No special hardware required
  • Maximum configuration flexibility
  • Available everywhere

Limitations:

  • Extremely CPU-intensive: one 1080p stream can use 100% of a core at veryfast preset
  • Power-hungry
  • Scales poorly: each additional stream requires proportionally more CPU
  • Not viable for multi-stream workloads without massive server hardware

Comparison Table

Feature | Intel QSV | NVIDIA NVENC | Software (x264)
Hardware Required | Intel iGPU | NVIDIA GPU | Any CPU
1080p30 Encode Slots | 4-8 | 3-unlimited | 1 per core
Power (per encode) | ~5W | ~15W | ~65W
Quality (same bitrate) | Good | Good | Excellent
HEVC Support | Yes | Yes | Yes (slow)
AV1 Support | Decode from 11th Gen; encode on Core Ultra and Arc | RTX 40+ | Very slow
Latency | Very low | Very low | Preset-dependent
Cost | $0 (in CPU) | $200-$10,000 | $0
Linux Server Friendly | Excellent | Driver complexity | No issues

For streaming gateway deployments, Intel QSV is the sweet spot. It is available on common server hardware, requires no discrete GPU, and handles the typical gateway workload (2-8 simultaneous transcodes) with ease. Vajra Cast is optimized for Intel QSV and uses it as the primary hardware transcoding engine.

Intel QSV Setup

Supported Intel Hardware

QSV is available on Intel processors with integrated graphics:

Generation | Examples | Key Capabilities
6th Gen (Skylake) | i7-6700, Xeon E3-1200 v5 | H.264 encode/decode, HEVC decode
7th Gen (Kaby Lake) | i7-7700 | + HEVC 10-bit decode
8th-9th Gen (Coffee Lake) | i9-9900K, Xeon E-2100 | + HEVC encode
10th Gen (Ice Lake) | i7-1065G7 | Improved HEVC quality
11th Gen (Rocket Lake) | i9-11900 | + AV1 decode
12th Gen (Alder Lake) | i9-12900 | Improved quality
13th-14th Gen | i9-13900, i9-14900 | Enhanced multi-stream
Intel Core Ultra | Ultra 7, Ultra 9 | + AV1 encode, latest media engine

Important: Intel F-series CPUs (e.g., i9-12900F) have no integrated graphics and cannot use QSV. If you are purchasing hardware specifically for transcoding, avoid F-series models.

Linux Setup

Most Vajra Cast deployments run on Linux. Here is how to enable QSV:

1. Install Intel Media Drivers

On Ubuntu/Debian:

# Refresh package lists, then install the Intel media driver and VA-API tools
sudo apt update
sudo apt install -y intel-media-va-driver-non-free intel-gpu-tools vainfo

On RHEL/Rocky/AlmaLinux:

sudo dnf install -y intel-media-driver intel-gpu-tools libva-utils

2. Verify VAAPI Detection

vainfo

You should see output listing encode and decode capabilities:

libva info: VA-API version 1.20.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_20
vainfo: VA-API version: 1.20
vainfo: Driver version: Intel iHD driver for Intel Gen Graphics
vainfo: Supported profile and entrypoints
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice

Look for VAEntrypointEncSlice entries. These confirm encoding is available.
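As a rough sketch, the same check can be scripted. The `encode_profiles` helper below is a hypothetical name; it scans vainfo text for entrypoints whose names begin with VAEntrypointEnc (which covers VAEntrypointEncSlice as well as the low-power and picture-encode variants) and returns the matching profiles. The sample text is an excerpt of the output shown above.

```python
from typing import List


def encode_profiles(vainfo_output: str) -> List[str]:
    """Return the VA-API profiles that list at least one encode entrypoint."""
    profiles: List[str] = []
    for line in vainfo_output.splitlines():
        profile, sep, entrypoint = line.partition(":")
        profile, entrypoint = profile.strip(), entrypoint.strip()
        if not sep or not profile.startswith("VAProfile"):
            continue  # skip libva banner lines and anything that is not a profile row
        if entrypoint.startswith("VAEntrypointEnc") and profile not in profiles:
            profiles.append(profile)
    return profiles


sample = """\
vainfo: Supported profile and entrypoints
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
"""

print(encode_profiles(sample))  # ['VAProfileH264Main', 'VAProfileHEVCMain']
```

To run it against a live system, the text could come from `subprocess.run(["vainfo"], capture_output=True, text=True).stdout`. An empty result means decode-only support, or a driver that was not loaded.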

3. Check GPU Access

ls -la /dev/dri/

You should see renderD128 (and possibly renderD129). The user running Vajra Cast needs read/write access:

# Add user to the render and video groups (log out and back in for the change to take effect)
sudo usermod -aG render,video $USER
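A quick way to confirm that the permissions took effect is to test the render nodes from the account that will run the service. The sketch below is illustrative only; `accessible_render_nodes` is a hypothetical helper, and it assumes the nodes follow the usual /dev/dri/renderD* naming.

```python
import glob
import os
from typing import List


def accessible_render_nodes(pattern: str = "/dev/dri/renderD*") -> List[str]:
    """Return the render nodes the current process can open for reading and writing."""
    return [
        node
        for node in sorted(glob.glob(pattern))
        if os.access(node, os.R_OK | os.W_OK)
    ]


nodes = accessible_render_nodes()
print(nodes if nodes else "no accessible render node; check group membership")
```

Run it as the same user that runs Vajra Cast. An empty result usually points to missing group membership or a session that predates the usermod change.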

4. Verify with FFmpeg

ffmpeg -hwaccel qsv -c:v h264_qsv -i input.mp4 -c:v h264_qsv -b:v 4000k output.mp4

If this runs without errors, QSV is working.
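For automated provisioning, the same smoke test can be wrapped in a script. This is a minimal sketch: the function names are hypothetical, the flags are exactly those of the command above, and a zero exit status is taken to mean that QSV decode and encode both worked.

```python
import shlex
import subprocess
from typing import List


def qsv_smoke_test_cmd(src: str, dst: str, bitrate_kbps: int = 4000) -> List[str]:
    """Build the FFmpeg QSV decode + encode command shown above as an argument list."""
    return [
        "ffmpeg",
        "-hwaccel", "qsv", "-c:v", "h264_qsv", "-i", src,      # QSV decode of the H.264 input
        "-c:v", "h264_qsv", "-b:v", f"{bitrate_kbps}k", dst,   # QSV encode at the target bitrate
    ]


def qsv_works(src: str, dst: str) -> bool:
    """Run the smoke test; False if FFmpeg is missing or exits with an error."""
    try:
        result = subprocess.run(
            qsv_smoke_test_cmd(src, dst),
            stdin=subprocess.DEVNULL,  # never wait on an interactive overwrite prompt
            capture_output=True,
        )
    except FileNotFoundError:
        return False
    return result.returncode == 0


print(shlex.join(qsv_smoke_test_cmd("input.mp4", "output.mp4")))
```

Calling `qsv_works("input.mp4", "output.mp4")` on the target machine gives a simple pass/fail result that a deployment script can act on.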

Docker Setup

For containerized deployments, pass the GPU device into the container:

docker run -d \
  --name vajracast \
  --device /dev/dri:/dev/dri \
  -v /path/to/config:/config \
  -p 9000-9100:9000-9100/udp \
  -p 1935:1935/tcp \
  -p 8080:8080/tcp \
  vajracast/vajracast:latest

The --device /dev/dri:/dev/dri flag passes the Intel GPU device into the container, giving it access to QSV.

For Kubernetes:

apiVersion: v1
kind: Pod
metadata:
  name: vajracast
spec:
  containers:
  - name: vajracast
    image: vajracast/vajracast:latest
    resources:
      limits:
        gpu.intel.com/i915: 1  # resource exposed by the Intel GPU device plugin
    securityContext:
      runAsGroup: 44  # video group

macOS Setup

On Intel-based Macs (pre-Apple Silicon), hardware encoding is exposed through Apple’s VideoToolbox framework rather than through the QSV API. Vajra Cast on macOS uses VideoToolbox automatically; no additional configuration is needed.

On Apple Silicon Macs (M1-M4), VideoToolbox provides hardware encoding through Apple’s media engine, which offers similar benefits to QSV. Vajra Cast uses the appropriate hardware acceleration backend automatically based on the platform.

Configuring Transcoding in Vajra Cast

Basic Transcoding Route

To set up a transcoding route in Vajra Cast:

  1. Create an ingest (SRT or RTMP)
  2. Create an output
  3. In the output settings, enable Transcoding
  4. Select the target codec:
    • H.264: maximum compatibility
    • HEVC (H.265): 30-40% smaller at equivalent quality
  5. Set the target bitrate
  6. Set the target resolution (if scaling)
  7. Vajra Cast automatically selects QSV if available, falling back to VAAPI and then to software encoding

Transcoding Parameters

Parameter | Description | Recommended Value
Codec | Output video codec | H.264 or HEVC
Bitrate | Target encoding bitrate | See table below
Resolution | Output resolution | Match source or scale down
Preset | Encoding speed/quality trade-off | balanced (default)
Profile | H.264/HEVC profile | high
Keyframe Interval | Maximum seconds between keyframes | 2
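The keyframe interval is expressed in seconds, while many encoder interfaces (FFmpeg’s -g option, for example) expect a GOP size in frames. The conversion is a single multiplication. The helper name below is hypothetical, and rounding to the nearest frame is an assumption made so that fractional rates such as 29.97 fps work.

```python
def gop_size_frames(keyframe_interval_s: float, fps: float) -> int:
    """Convert a keyframe interval in seconds to a GOP length in frames."""
    return round(keyframe_interval_s * fps)


print(gop_size_frames(2, 30))     # 60
print(gop_size_frames(2, 60))     # 120
print(gop_size_frames(2, 29.97))  # 60
```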

Bitrate Recommendations for Transcoding

Source | Target | Recommended Bitrate
4K H.264 30 Mbps | 1080p H.264 | 6,000-8,000 Kbps
4K HEVC 15 Mbps | 1080p H.264 | 6,000-8,000 Kbps
1080p H.264 8 Mbps | 1080p HEVC | 4,000-5,000 Kbps
1080p H.264 8 Mbps | 720p H.264 | 3,000-4,000 Kbps
1080p HEVC 5 Mbps | 1080p H.264 | 6,000-8,000 Kbps

HEVC (H.265) Transcoding

HEVC is increasingly important for contribution feeds. It delivers equivalent visual quality at 30-40% lower bitrate compared to H.264, which means:

  • Lower bandwidth costs for long-distance SRT contribution
  • Better quality on bandwidth-constrained paths (cellular, satellite)
  • 4K viability on standard internet connections

With QSV, HEVC encoding is hardware-accelerated and runs at the same speed as H.264 encoding. The quality penalty versus software x265 is minimal on modern Intel hardware (11th gen and newer).
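To make the 30-40% figure concrete, the sketch below converts an H.264 bitrate into the HEVC range that should give similar quality. It is only arithmetic on the percentages quoted above; the function name is hypothetical, and real savings depend on content and encoder settings.

```python
from typing import Tuple


def hevc_equivalent_kbps(
    h264_kbps: int, min_saving_pct: int = 30, max_saving_pct: int = 40
) -> Tuple[int, int]:
    """Return the (lowest, highest) HEVC bitrate for roughly equivalent quality."""
    lowest = h264_kbps * (100 - max_saving_pct) // 100
    highest = h264_kbps * (100 - min_saving_pct) // 100
    return lowest, highest


print(hevc_equivalent_kbps(8000))   # (4800, 5600)
print(hevc_equivalent_kbps(20000))  # (12000, 14000)
```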

A common workflow: receive HEVC from a remote encoder (saving bandwidth on the contribution path), then transcode to H.264 for output to platforms that do not support HEVC ingest.

Remote Camera → HEVC (SRT, 5 Mbps) → Vajra Cast [QSV transcode] → H.264 (RTMP, 6 Mbps) → YouTube
                                                                  → HEVC (SRT, 5 Mbps) → Archive

Performance Benchmarks

Real-world benchmarks on common Intel hardware, measured with Vajra Cast:

Intel i7-12700 (12th Gen, Alder Lake)

Workload | CPU Usage | GPU Usage | Latency Added
1x 1080p30 H.264→H.264 | 3% | 15% | <5ms
1x 1080p30 H.264→HEVC | 3% | 20% | <5ms
4x 1080p30 H.264→H.264 | 8% | 55% | <5ms
1x 4K30 H.264→1080p H.264 | 5% | 35% | <8ms
1x 1080p60 H.264→H.264 | 4% | 25% | <5ms

Intel Xeon E-2388G (Server)

Workload | CPU Usage | GPU Usage | Latency Added
1x 1080p30 H.264→H.264 | 2% | 12% | <5ms
4x 1080p30 H.264→H.264 | 6% | 45% | <5ms
8x 1080p30 H.264→H.264 | 10% | 85% | <8ms
2x 4K30 H.264→1080p H.264 | 6% | 50% | <8ms

Intel N100 (Low-Power / Mini PC)

Workload | CPU Usage | GPU Usage | Latency Added
1x 1080p30 H.264→H.264 | 8% | 40% | <8ms
2x 1080p30 H.264→H.264 | 12% | 75% | <10ms
1x 1080p30 H.264→HEVC | 10% | 50% | <8ms

Key takeaways:

  • CPU usage is minimal: hardware transcoding barely touches the CPU, leaving it free for gateway routing, monitoring, and other tasks
  • Transcoding latency is negligible: under 10ms in all cases, invisible in a streaming context
  • Even low-power hardware handles the job: an Intel N100 mini PC can transcode 2 simultaneous 1080p streams
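These figures can be turned into a rough capacity estimate. The sketch below assumes GPU utilization scales linearly with the number of identical streams, which the tables suggest but do not guarantee. It also assumes an 80% utilization ceiling as a default; `max_streams` is a hypothetical helper name.

```python
import math


def max_streams(measured_streams: int, measured_gpu_pct: float, ceiling_pct: float = 80.0) -> int:
    """Estimate how many identical streams fit under the utilization ceiling."""
    per_stream_pct = measured_gpu_pct / measured_streams  # assumes linear scaling
    return math.floor(ceiling_pct / per_stream_pct)


print(max_streams(8, 85))  # Xeon E-2388G, from the 8x 1080p30 row
print(max_streams(4, 55))  # i7-12700, from the 4x 1080p30 row
print(max_streams(2, 75))  # N100, from the 2x 1080p30 row
```

With the 1080p30 H.264→H.264 rows as input, this gives 7, 5, and 2 streams respectively. Treat the result as a starting point and confirm it with a load test on the actual hardware.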

Comparison: QSV vs Software x264

Encoding 1x 1080p30 H.264 at 6000 Kbps:

Method | CPU Usage | Time per Frame | Power | VMAF Score
x264 ultrafast | 45% (1 core) | 8ms | ~65W | 89
x264 veryfast | 80% (1 core) | 15ms | ~85W | 92
x264 medium | 100% (2+ cores) | 33ms | ~120W | 95
QSV balanced | 3% | 2ms | ~5W | 91

QSV at the “balanced” preset achieves quality comparable to x264 veryfast (VMAF 91 versus 92) while using roughly 1/27th of the CPU (3% versus 80%) and 1/17th of the power (~5W versus ~85W). For a streaming gateway that transcodes continuously, 24/7, this difference is transformative.
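For a gateway that runs around the clock, the power column translates into annual energy as follows. This is a back-of-the-envelope sketch that takes the approximate wattages from the table at face value and assumes one continuously running 1080p30 encode.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy used in one year of continuous operation, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000


x264_veryfast = annual_kwh(85)  # ~85W from the table
qsv_balanced = annual_kwh(5)    # ~5W from the table

print(f"x264 veryfast: {x264_veryfast:.1f} kWh/year")
print(f"QSV balanced:  {qsv_balanced:.1f} kWh/year")
print(f"difference:    {x264_veryfast - qsv_balanced:.1f} kWh/year per stream")
```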

The Auto-Fallback Chain

Vajra Cast implements an automatic fallback chain for transcoding:

Intel QSV → VAAPI → Software (libx264/libx265)
  1. QSV preferred: if Intel QSV is detected and the codec is supported, it is used
  2. VAAPI fallback: if QSV is not available but VAAPI is (e.g., some AMD GPUs or older Intel drivers), VAAPI is used
  3. Software last resort: if no hardware acceleration is available, software encoding is used

This fallback is automatic. You do not need to configure it. Vajra Cast detects available hardware at startup and selects the best option. The active transcoding engine is visible in the dashboard for each route.
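The selection logic can be sketched as follows. This illustrates the chain described above and is not Vajra Cast’s actual implementation; the `select_engine` name and the shape of the `available` mapping (engine name to supported codecs) are assumptions.

```python
from typing import Dict, Set

HARDWARE_ORDER = ("qsv", "vaapi")  # preferred first; software is the last resort


def select_engine(available: Dict[str, Set[str]], codec: str) -> str:
    """Pick the first hardware engine that is present and supports the codec."""
    for engine in HARDWARE_ORDER:
        if codec in available.get(engine, set()):
            return engine
    return "software"


detected = {"qsv": {"h264"}, "vaapi": {"h264", "hevc"}}
print(select_engine(detected, "h264"))  # qsv
print(select_engine(detected, "hevc"))  # vaapi: QSV is present but lacks HEVC here
print(select_engine({}, "hevc"))        # software
```

An engine is skipped both when it is absent and when it is present but lacks the requested codec, which matches the "detected and the codec is supported" condition in step 1.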

Monitoring Transcoding Performance

Vajra Cast exposes per-route transcoding metrics:

  • GPU utilization: percentage of QSV media engine in use
  • Encode FPS: frames per second being encoded (should match source frame rate)
  • Encode latency: time per frame in milliseconds
  • Output bitrate: actual encoding bitrate (may differ slightly from target)
  • VMAF score: automated video quality assessment (0-100) comparing transcoded output to source

These metrics are available in the web dashboard and via the Prometheus /metrics endpoint. Use them to:

  • Detect GPU overload (utilization >90% sustained)
  • Verify output quality (VMAF >85 is generally good, >90 is excellent)
  • Plan capacity (how many more transcodes can this hardware handle?)
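The thresholds above can be applied to scraped metric values with a few lines of code. In this sketch the function names, the five-sample window used to approximate "sustained", and the "needs review" label are all assumptions; only the 90% utilization threshold and the VMAF thresholds of 85 and 90 come from the text.

```python
from typing import Sequence


def gpu_overloaded(samples: Sequence[float], threshold_pct: float = 90.0, window: int = 5) -> bool:
    """True if the most recent `window` utilization samples all exceed the threshold."""
    recent = samples[-window:]
    return len(recent) == window and all(s > threshold_pct for s in recent)


def vmaf_rating(score: float) -> str:
    """Map a VMAF score (0-100) to the quality bands described above."""
    if score > 90:
        return "excellent"
    if score > 85:
        return "good"
    return "needs review"


print(gpu_overloaded([70, 92, 93, 95, 94, 96]))  # True: last five samples are above 90%
print(gpu_overloaded([92, 93, 60, 94, 96]))      # False: one dip inside the window
print(vmaf_rating(91), vmaf_rating(88), vmaf_rating(80))
```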

Best Practices

  1. Use passthrough when possible. If input and output share the same codec and resolution, skip transcoding entirely. It is always faster and preserves original quality.
  2. Match keyframe intervals. Set the transcoder keyframe interval to match your output platform requirements (2 seconds is the universal safe choice).
  3. Monitor GPU utilization. Keep sustained GPU usage below 80% to leave headroom for load spikes.
  4. Test HEVC compatibility. Before switching outputs to HEVC, verify the downstream player or platform supports it. Not all do.
  5. Use CBR for transcoded outputs. Constant bitrate keeps output bandwidth predictable, which simplifies bandwidth planning.
  6. Keep firmware and drivers updated. Intel regularly improves QSV quality and performance through driver updates. On Linux, keep intel-media-driver current.

For the full Vajra Cast feature set including zero-copy distribution, failover, and monitoring, see our SRT Streaming Gateway guide.