Hardware Transcoding with Intel QSV: GPU-Accelerated Video Processing

January 20, 2026 · Stephane · tutorials

What is Hardware Transcoding?

Transcoding is the process of decoding a video stream from one format and re-encoding it into another. It is computationally expensive. A 1080p60 H.264 stream can consume an entire modern CPU core for encoding alone. When you need to transcode multiple streams simultaneously, or transcode at higher resolutions like 4K, software encoding hits a wall.

Hardware transcoding offloads this work to dedicated silicon on your GPU or CPU’s integrated graphics. Instead of using general-purpose CPU cores, the video encode/decode happens on fixed-function hardware blocks that are purpose-built for the task. The result: the same transcoding job often runs 5-10x faster, uses a fraction of the power, and frees your CPU for other work.

When Do You Need Transcoding?

In a streaming gateway context, transcoding is necessary when:

Format conversion: your encoder sends HEVC (H.265) but your output destination only accepts H.264
Bitrate adaptation: you receive a 20 Mbps feed and need to output a 4 Mbps version for bandwidth-constrained viewers
Resolution scaling: converting 4K ingest to 1080p output
Multi-bitrate output: creating an ABR (Adaptive Bitrate) ladder from a single high-quality input
Codec upgrade: converting legacy H.264 feeds to HEVC for bandwidth savings

If your input and output share the same codec, resolution, and bitrate, you do not need transcoding. The gateway can pass through the stream untouched (zero-copy), which is the most efficient path. Vajra Cast automatically uses passthrough when no transformation is needed.
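The passthrough rule above can be sketched as a small shell function. This is only an illustration of the decision, not Vajra Cast's actual logic; the function name and argument order are made up for the example.

```shell
#!/bin/sh
# Decide between passthrough and transcoding by comparing stream parameters.
# Arguments (hypothetical order, for illustration only):
#   in_codec in_resolution in_kbps out_codec out_resolution out_kbps
route_mode() {
    if [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" -eq "$6" ]; then
        echo passthrough
    else
        echo transcode
    fi
}

route_mode hevc 1920x1080 5000 hevc 1920x1080 5000   # prints: passthrough
route_mode hevc 1920x1080 5000 h264 1920x1080 6000   # prints: transcode
```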

Hardware Transcoding Options

Three major hardware transcoding platforms exist today:

Intel Quick Sync Video (QSV)

Intel’s integrated GPU transcoding engine, available on most Intel CPUs with integrated graphics (i3, i5, i7, Xeon E with iGPU). Uses Intel’s Media SDK / oneVPL.

Strengths:

Available on nearly every Intel system (no discrete GPU needed)
Excellent quality-per-watt ratio
Strong HEVC encoding and decoding support
Available in servers and NUCs (compact form factor)
Low cost: the hardware is already in your CPU

Limitations:

Throughput limited compared to discrete GPUs (typically 4-8 simultaneous 1080p encodes)
Not available on F-series Intel CPUs (no iGPU) or AMD processors
Quality slightly below high-preset software encoding (but improving with each generation)

NVIDIA NVENC

NVIDIA’s dedicated hardware encoder on GeForce, Quadro, and Tesla GPUs.

Strengths:

High throughput on high-end GPUs (8 or more simultaneous 1080p encodes on Quadro)
Excellent B-frame support
AV1 encoding on RTX 40-series and newer
Widely available in workstations and cloud instances

Limitations:

Requires a discrete NVIDIA GPU (additional cost and power)
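GeForce cards are limited by the driver to a fixed number of simultaneous encode sessions (3 on older drivers, raised to 8 in recent releases); Quadro/Tesla cards have no such session limit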
Driver-dependent: needs NVIDIA proprietary drivers
Not available in many compact or low-power server form factors

Software Encoding (libx264 / libx265)

CPU-based encoding using the x264 or x265 libraries.

Strengths:

Highest possible quality at any given bitrate
No special hardware required
Maximum configuration flexibility
Available everywhere

Limitations:

Extremely CPU-intensive: one 1080p stream can use 100% of a core at veryfast preset
Power-hungry
Scales poorly: each additional stream requires proportionally more CPU
Not viable for multi-stream workloads without massive server hardware

Comparison Table

Feature	Intel QSV	NVIDIA NVENC	Software (x264)
Hardware Required	Intel iGPU	NVIDIA GPU	Any CPU
1080p30 Encode Slots	4-8	3-unlimited	1 per core
Power (per encode)	~5W	~15W	~65W
Quality (same bitrate)	Good	Good	Excellent
HEVC Support	Yes	Yes	Yes (slow)
AV1 Support	Decode: 11th Gen+; encode: Core Ultra, Arc	RTX 40+	Very slow
Latency	Very low	Very low	Preset-dependent
Cost	$0 (in CPU)	$200-$10,000	$0
Linux Server Friendly	Excellent	Driver complexity	No issues

For streaming gateway deployments, Intel QSV is the sweet spot. It is available on common server hardware, requires no discrete GPU, and handles the typical gateway workload (2-8 simultaneous transcodes) with ease. Vajra Cast is optimized for Intel QSV and uses it as the primary hardware transcoding engine.

Intel QSV Setup

Supported Intel Hardware

QSV is available on Intel processors with integrated graphics:

Generation	Examples	Key Capabilities
6th Gen (Skylake)	i7-6700, Xeon E3-1200 v5	H.264 encode/decode, HEVC 8-bit encode/decode
7th Gen (Kaby Lake)	i7-7700	+ HEVC 10-bit encode/decode
8th-9th Gen (Coffee Lake)	i9-9900K, Xeon E-2100	Same codec support as 7th Gen
10th Gen (Ice Lake)	i7-1065G7	Improved HEVC quality
11th Gen (Rocket Lake)	i9-11900	+ AV1 decode
12th Gen (Alder Lake)	i9-12900	Improved quality (AV1 remains decode-only)
13th-14th Gen	i9-13900, i9-14900	Enhanced multi-stream
Intel Core Ultra	Ultra 7, Ultra 9	+ AV1 encode, latest media engine

Important: Intel F-series CPUs (e.g., i9-12900F) have no integrated graphics and cannot use QSV. If you are purchasing hardware specifically for transcoding, avoid F-series models.

Linux Setup

Most Vajra Cast deployments run on Linux. Here is how to enable QSV:

1. Install Intel Media Drivers

On Ubuntu/Debian:

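# Install the Intel media driver and VA-API diagnostic tools
# (the driver package lives in the non-free / multiverse component, which must be enabled)
sudo apt update
sudo apt install -y intel-media-va-driver-non-free intel-gpu-tools vainfo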

On RHEL/Rocky/AlmaLinux:

sudo dnf install -y intel-media-driver intel-gpu-tools libva-utils

2. Verify VAAPI Detection

vainfo

You should see output listing encode and decode capabilities:

libva info: VA-API version 1.20.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_20
vainfo: VA-API version: 1.20
vainfo: Driver version: Intel iHD driver for Intel Gen Graphics
vainfo: Supported profile and entrypoints
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointEncSlice
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointEncSlice
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointEncSlice
      VAProfileHEVCMain10             : VAEntrypointVLD
      VAProfileHEVCMain10             : VAEntrypointEncSlice

Look for VAEntrypointEncSlice entries. These confirm encoding is available.
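To avoid scanning the output by eye, the check can be scripted. The helper below is a minimal sketch (the function name is arbitrary): it reads vainfo-style text on standard input and prints each profile that exposes an encode entrypoint. It also accepts VAEntrypointEncSliceLP, the low-power encode entrypoint that some Intel GPUs report.

```shell
#!/bin/sh
# List the VA-API profiles that expose a hardware encode entrypoint.
# Reads vainfo-style text on stdin, so it can be tried without a GPU.
list_encode_profiles() {
    # EncSlice is the standard encoder; EncSliceLP is the low-power variant.
    grep -E 'VAEntrypointEncSlice(LP)?[[:space:]]*$' |
        awk '{ print $1 }' |
        LC_ALL=C sort -u
}

# Typical use on a real system:
#   vainfo 2>/dev/null | list_encode_profiles
```

An empty result means the driver loaded but no encode entrypoints were reported, in which case hardware encoding will not be available.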

3. Check GPU Access

ls -la /dev/dri/

You should see renderD128 (and possibly renderD129). The user running Vajra Cast needs read/write access:

# Add user to the render and video groups
# (log out and back in, or restart the service, for the new groups to take effect)
sudo usermod -aG render,video "$USER"
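A quick way to confirm the permissions took effect is to test the render node directly. The function below is a small sketch; the default path assumes the first render node is /dev/dri/renderD128, as in the listing above, and the function name is arbitrary.

```shell
#!/bin/sh
# Report whether the current user can open a DRM render node for read/write.
# Pass a different path as the first argument to check another node.
check_render_node() {
    node=${1:-/dev/dri/renderD128}
    if [ ! -e "$node" ]; then
        echo "missing: $node"
        return 1
    fi
    if [ -r "$node" ] && [ -w "$node" ]; then
        echo "ok: $node"
    else
        echo "denied: $node"
        return 1
    fi
}

# Typical use, run as the account that will run the gateway:
#   check_render_node
```

If the result is "denied", compare the group shown by ls -la /dev/dri/ with the output of id for that account.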

4. Verify with FFmpeg

ffmpeg -hwaccel qsv -c:v h264_qsv -i input.mp4 -c:v h264_qsv -b:v 4000k output.mp4

If this runs without errors, QSV is working.

Docker Setup

For containerized deployments, pass the GPU device into the container:

docker run -d \
  --name vajracast \
  --device /dev/dri:/dev/dri \
  -v /path/to/config:/config \
  -p 9000-9100:9000-9100/udp \
  -p 1935:1935/tcp \
  -p 8080:8080/tcp \
  vajracast/vajracast:latest

The --device /dev/dri:/dev/dri flag passes the Intel GPU device into the container, giving it access to QSV.

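For Kubernetes (this assumes the Intel GPU device plugin is installed on the cluster, since it is what exposes the gpu.intel.com/i915 resource):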

apiVersion: v1
kind: Pod
metadata:
  name: vajracast
spec:
  containers:
  - name: vajracast
    image: vajracast/vajracast:latest
    resources:
      limits:
        gpu.intel.com/i915: 1
    securityContext:
      runAsGroup: 44  # video group

macOS Setup

On macOS with Intel processors (pre-Apple Silicon Macs), the Quick Sync hardware is reached through Apple’s VideoToolbox framework rather than through the QSV APIs directly. Vajra Cast on macOS uses VideoToolbox automatically, so no additional configuration is needed.

On Apple Silicon Macs (M1-M4), VideoToolbox provides hardware encoding through Apple’s media engine, which offers similar benefits to QSV. Vajra Cast uses the appropriate hardware acceleration backend automatically based on the platform.

Configuring Transcoding in Vajra Cast

Basic Transcoding Route

To set up a transcoding route in Vajra Cast:

1. Create an ingest (SRT or RTMP)
2. Create an output
3. In the output settings, enable Transcoding
4. Select the target codec:
   - H.264: maximum compatibility
   - HEVC (H.265): 30-40% smaller at equivalent quality
5. Set the target bitrate
6. Set the target resolution (if scaling)

Vajra Cast automatically selects QSV if available, with VAAPI fallback.

Transcoding Parameters

Parameter	Description	Recommended Value
Codec	Output video codec	H.264 or HEVC
Bitrate	Target encoding bitrate	See table below
Resolution	Output resolution	Match source or scale down
Preset	Encoding speed/quality trade-off	`balanced` (default)
Profile	H.264/HEVC profile	`high`
Keyframe Interval	Maximum seconds between keyframes	2
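For readers who want to reproduce similar settings outside the dashboard, the sketch below prints a roughly equivalent FFmpeg QSV command. How Vajra Cast maps its parameters internally is not documented here, so treat the mapping as an assumption: `balanced` is approximated with FFmpeg's `medium` preset, the keyframe interval in seconds is converted to frames for `-g`, and setting `-maxrate` equal to `-b:v` asks the QSV encoder for constant-bitrate behavior. Resolution scaling and profile selection are left out to keep the example short.

```shell
#!/bin/sh
# Print an FFmpeg QSV command line for one transcoded output.
# Arguments: input, source codec, output, target codec, bitrate (kbps),
#            frame rate (fps), keyframe interval (seconds)
qsv_command() {
    in=$1; src=$2; out=$3; dst=$4; kbps=$5; fps=$6; key_s=$7
    gop=$((fps * key_s))   # keyframe interval in frames, e.g. 30 fps x 2 s = 60
    echo "ffmpeg -hwaccel qsv -c:v ${src}_qsv -i $in" \
         "-c:v ${dst}_qsv -preset medium -b:v ${kbps}k -maxrate ${kbps}k" \
         "-bufsize $((kbps * 2))k -g $gop -c:a copy $out"
}

# 1080p30 H.264 source re-encoded to HEVC at 4,500 kbps with 2-second keyframes
qsv_command input.mp4 h264 output.mp4 hevc 4500 30 2
```

The function only prints the command so it can be reviewed before running; paste the output into a shell on a machine with a QSV-enabled FFmpeg build to execute it.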

Bitrate Recommendations for Transcoding

Source	Target	Recommended Bitrate
4K H.264 30 Mbps	1080p H.264	6,000-8,000 Kbps
4K HEVC 15 Mbps	1080p H.264	6,000-8,000 Kbps
1080p H.264 8 Mbps	1080p HEVC	4,000-5,000 Kbps
1080p H.264 8 Mbps	720p H.264	3,000-4,000 Kbps
1080p HEVC 5 Mbps	1080p H.264	6,000-8,000 Kbps

HEVC (H.265) Transcoding

HEVC is increasingly important for contribution feeds. It delivers equivalent visual quality at 30-40% lower bitrate compared to H.264, which means:

Lower bandwidth costs for long-distance SRT contribution
Better quality on bandwidth-constrained paths (cellular, satellite)
4K viability on standard internet connections

With QSV, HEVC encoding is hardware-accelerated and runs comfortably in real time; in the benchmarks below it costs only slightly more GPU time than H.264 encoding. The quality penalty versus software x265 is minimal on modern Intel hardware (11th gen and newer).

A common workflow: receive HEVC from a remote encoder (saving bandwidth on the contribution path), then transcode to H.264 for output to platforms that do not support HEVC ingest.

Remote Camera → HEVC (SRT, 5 Mbps) → Vajra Cast [QSV transcode] → H.264 (RTMP, 6 Mbps) → YouTube
                                                                  → HEVC (SRT, 5 Mbps) → Archive
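The same fan-out can be approximated with plain FFmpeg, which is handy for testing the idea before building the route. This is a sketch, not what Vajra Cast runs internally: the three locations are placeholders supplied by the caller, `-g 60` assumes a 30 fps source, and the build is assumed to include SRT and QSV support. The first output is transcoded to H.264, the second is a stream copy of the original HEVC.

```shell
#!/bin/sh
# Print an FFmpeg command mirroring the diagram: one HEVC input,
# one transcoded H.264 output, and one untouched HEVC copy.
# Arguments: source URL, RTMP destination, archive destination
fanout_command() {
    src=$1; rtmp_out=$2; archive_out=$3
    echo "ffmpeg -hwaccel qsv -c:v hevc_qsv -i '$src'" \
         "-map 0:v -map 0:a? -c:v h264_qsv -b:v 6000k -g 60 -c:a aac -f flv '$rtmp_out'" \
         "-map 0:v -map 0:a? -c copy -f mpegts '$archive_out'"
}

# Placeholder endpoints; replace them with real ones before running the output.
fanout_command 'srt://0.0.0.0:9000?mode=listener' \
               'rtmp://example.invalid/live/STREAM_KEY' \
               'archive.ts'
```

Only the first output uses the GPU encoder. The archive copy costs almost nothing because no pixels are re-encoded.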

Performance Benchmarks

Real-world benchmarks on common Intel hardware, measured with Vajra Cast:

Intel i7-12700 (12th Gen, Alder Lake)

Workload	CPU Usage	GPU Usage	Latency Added
1x 1080p30 H.264→H.264	3%	15%	<5ms
1x 1080p30 H.264→HEVC	3%	20%	<5ms
4x 1080p30 H.264→H.264	8%	55%	<5ms
1x 4K30 H.264→1080p H.264	5%	35%	<8ms
1x 1080p60 H.264→H.264	4%	25%	<5ms

Intel Xeon E-2388G (Server)

Workload	CPU Usage	GPU Usage	Latency Added
1x 1080p30 H.264→H.264	2%	12%	<5ms
4x 1080p30 H.264→H.264	6%	45%	<5ms
8x 1080p30 H.264→H.264	10%	85%	<8ms
2x 4K30 H.264→1080p H.264	6%	50%	<8ms

Intel N100 (Low-Power / Mini PC)

Workload	CPU Usage	GPU Usage	Latency Added
1x 1080p30 H.264→H.264	8%	40%	<8ms
2x 1080p30 H.264→H.264	12%	75%	<10ms
1x 1080p30 H.264→HEVC	10%	50%	<8ms

Key takeaways:

CPU usage is minimal: hardware transcoding barely touches the CPU, leaving it free for gateway routing, monitoring, and other tasks
Transcoding latency is negligible: under 10ms in all cases, invisible in a streaming context
Even low-power hardware handles the job: an Intel N100 mini PC can transcode 2 simultaneous 1080p streams

Comparison: QSV vs Software x264

Encoding 1x 1080p30 H.264 at 6000 Kbps:

Method	CPU Usage	Time per Frame	Power	VMAF Score
x264 ultrafast	45% (1 core)	8ms	~65W	89
x264 veryfast	80% (1 core)	15ms	~85W	92
x264 medium	100% (2+ cores)	33ms	~120W	95
QSV balanced	3%	2ms	~5W	91

QSV at “balanced” preset achieves quality comparable to x264 veryfast (VMAF 91 versus 92) while using roughly 1/27th the CPU (3% versus 80%) and 1/17th the power (~5W versus ~85W). For a streaming gateway that needs to transcode continuously 24/7, this difference is transformative.
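The ratios can be checked directly from the x264 veryfast and QSV rows of the table. The last line extends the power figures to a year of continuous operation, which is simple arithmetic on the table values rather than a measured result.

```shell
#!/bin/sh
# Ratio between two table values, rounded to one decimal place.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

ratio 80 3    # CPU: 80% vs 3%     -> 26.7
ratio 85 5    # Power: 85 W vs 5 W -> 17.0

# Energy difference over one year of 24/7 operation, in kWh:
# (85 W - 5 W) * 24 h * 365 days / 1000
awk 'BEGIN { printf "%.1f\n", (85 - 5) * 24 * 365 / 1000 }'   # 700.8
```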

The Auto-Fallback Chain

Vajra Cast implements an automatic fallback chain for transcoding:

Intel QSV → VAAPI → Software (libx264/libx265)

QSV preferred: if Intel QSV is detected and the codec is supported, it is used
VAAPI fallback: if QSV is not available but VAAPI is (e.g., some AMD GPUs or older Intel drivers), VAAPI is used
Software last resort: if no hardware acceleration is available, software encoding is used

This fallback is automatic. You do not need to configure it. Vajra Cast detects available hardware at startup and selects the best option. The active transcoding engine is visible in the dashboard for each route.

Monitoring Transcoding Performance

Vajra Cast exposes per-route transcoding metrics:

GPU utilization: percentage of QSV media engine in use
Encode FPS: frames per second being encoded (should match source frame rate)
Encode latency: time per frame in milliseconds
Output bitrate: actual encoding bitrate (may differ slightly from target)
VMAF score: automated video quality assessment (0-100) comparing transcoded output to source

These metrics are available in the web dashboard and via the Prometheus /metrics endpoint. Use them to:

Detect GPU overload (utilization >90% sustained)
Verify output quality (VMAF >85 is generally good, >90 is excellent)
Plan capacity (how many more transcodes can this hardware handle?)
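For capacity planning, a rough estimate can be derived from the current utilization. The sketch below assumes GPU utilization grows linearly with the number of identical streams, which is a simplification, and takes the utilization ceiling as an argument. Real headroom should be confirmed by adding streams and watching the metrics.

```shell
#!/bin/sh
# Estimate how many more identical streams fit under a utilization ceiling,
# assuming utilization scales linearly with stream count.
# Arguments: current utilization (%), current stream count, ceiling (%)
extra_streams() {
    awk -v util="$1" -v n="$2" -v limit="$3" 'BEGIN {
        per_stream = util / n
        max_streams = int(limit / per_stream)
        extra = max_streams - n
        print (extra > 0 ? extra : 0)
    }'
}

# i7-12700 benchmark row: 4 streams at 55% GPU, 80% ceiling
extra_streams 55 4 80   # prints 1
```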

Best Practices

Use passthrough when possible. If input and output share the same codec and resolution, skip transcoding entirely. It is always faster and preserves original quality.
Match keyframe intervals. Set the transcoder keyframe interval to match your output platform requirements (2 seconds is the universal safe choice).
Monitor GPU utilization. Keep sustained GPU usage below 80% to leave headroom for complex scenes and for adding streams.
Test HEVC compatibility. Before switching outputs to HEVC, verify the downstream player or platform supports it. Not all do.
Use CBR for transcoded outputs. Constant bitrate produces a predictable output rate, which simplifies bandwidth planning and suits platforms that expect a steady ingest bitrate.
Keep firmware and drivers updated. Intel regularly improves QSV quality and performance through driver updates. On Linux, keep intel-media-driver current.

For the full Vajra Cast feature set including zero-copy distribution, failover, and monitoring, see our SRT Streaming Gateway guide.
