Hardware Transcoding with Intel QSV: GPU-Accelerated Video Processing
What is Hardware Transcoding?
Transcoding is the process of decoding a video stream from one format and re-encoding it into another. It is computationally expensive. A 1080p60 H.264 stream can consume an entire modern CPU core for encoding alone. When you need to transcode multiple streams simultaneously, or transcode at higher resolutions like 4K, software encoding hits a wall.
Hardware transcoding offloads this work to dedicated silicon on your GPU or CPU’s integrated graphics. Instead of using general-purpose CPU cores, the video encode/decode happens on fixed-function hardware blocks that are purpose-built for the task. The result: the same transcoding job runs 5-10x faster, uses a fraction of the power, and frees your CPU for other work.
When Do You Need Transcoding?
In a streaming gateway context, transcoding is necessary when:
- Format conversion: your encoder sends HEVC (H.265) but your output destination only accepts H.264
- Bitrate adaptation: you receive a 20 Mbps feed and need to output a 4 Mbps version for bandwidth-constrained viewers
- Resolution scaling: converting 4K ingest to 1080p output
- Multi-bitrate output: creating an ABR (Adaptive Bitrate) ladder from a single high-quality input
- Codec upgrade: converting legacy H.264 feeds to HEVC for bandwidth savings
If your input and output share the same codec, resolution, and bitrate, you do not need transcoding. The gateway can pass through the stream untouched (zero-copy), which is the most efficient path. Vajra Cast automatically uses passthrough when no transformation is needed.
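The passthrough rule above can be sketched as a small shell function. This only illustrates the comparison and is not Vajra Cast's actual logic; the name `route_mode` and its argument order are hypothetical.

```sh
# Hypothetical helper: decide between passthrough and transcoding.
# Usage: route_mode IN_CODEC IN_RES IN_KBPS OUT_CODEC OUT_RES OUT_KBPS
route_mode() {
  if [ "$1" = "$4" ] && [ "$2" = "$5" ] && [ "$3" -eq "$6" ]; then
    echo passthrough   # nothing changes, so the stream can be forwarded as-is
  else
    echo transcode     # codec, resolution, or bitrate differs
  fi
}

route_mode hevc 3840x2160 20000 h264 1920x1080 6000   # prints "transcode"
route_mode h264 1920x1080 6000 h264 1920x1080 6000    # prints "passthrough"
```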
Hardware Transcoding Options
Three major hardware transcoding platforms exist today:
Intel Quick Sync Video (QSV)
Intel’s integrated GPU transcoding engine, available on most Intel CPUs with integrated graphics (i3, i5, i7, Xeon E with iGPU). Uses Intel’s Media SDK / oneVPL.
Strengths:
- Available on nearly every Intel system (no discrete GPU needed)
- Excellent quality-per-watt ratio
- Strong HEVC encoding and decoding support
- Available in servers and NUCs (compact form factor)
- Low cost: the hardware is already in your CPU
Limitations:
- Throughput limited compared to discrete GPUs (typically 4-8 simultaneous 1080p encodes)
- Not available on F-series Intel CPUs (no iGPU) or AMD processors
- Quality slightly below high-preset software encoding (but improving with each generation)
NVIDIA NVENC
NVIDIA’s dedicated hardware encoder on GeForce, Quadro, and Tesla GPUs.
Strengths:
- High throughput on high-end GPUs (up to 8+ simultaneous 1080p encodes on Quadro)
- Excellent B-frame support
- AV1 encoding on RTX 40-series and newer
- Widely available in workstations and cloud instances
Limitations:
- Requires a discrete NVIDIA GPU (additional cost and power)
- GeForce cards have a driver-enforced cap on simultaneous encode sessions (3 on older drivers, higher on recent ones); Quadro/Tesla cards are uncapped
- Driver-dependent: needs NVIDIA proprietary drivers
- Not available in many compact or low-power server form factors
Software Encoding (libx264 / libx265)
CPU-based encoding using the x264 or x265 libraries.
Strengths:
- Highest possible quality at any given bitrate
- No special hardware required
- Maximum configuration flexibility
- Available everywhere
Limitations:
- Extremely CPU-intensive: one 1080p stream can use 100% of a core at veryfast preset
- Power-hungry
- Scales poorly: each additional stream requires proportionally more CPU
- Not viable for multi-stream workloads without massive server hardware
Comparison Table
| Feature | Intel QSV | NVIDIA NVENC | Software (x264) |
|---|---|---|---|
| Hardware Required | Intel iGPU | NVIDIA GPU | Any CPU |
| 1080p30 Encode Slots | 4-8 | 3-unlimited | 1 per core |
| Power (per encode) | ~5W | ~15W | ~65W |
| Quality (same bitrate) | Good | Good | Excellent |
| HEVC Support | Yes | Yes | Yes (slow) |
| AV1 Support | Decode: 11th Gen+; encode: Core Ultra / Arc | RTX 40+ (encode) | Very slow |
| Latency | Very low | Very low | Preset-dependent |
| Cost | $0 (in CPU) | $200-$10,000 | $0 |
| Linux Server Friendly | Excellent | Driver complexity | No issues |
For streaming gateway deployments, Intel QSV is the sweet spot. It is available on common server hardware, requires no discrete GPU, and handles the typical gateway workload (2-8 simultaneous transcodes) with ease. Vajra Cast is optimized for Intel QSV and uses it as the primary hardware transcoding engine.
Intel QSV Setup
Supported Intel Hardware
QSV is available on Intel processors with integrated graphics:
| Generation | Examples | Key Capabilities |
|---|---|---|
| 6th Gen (Skylake) | i7-6700, Xeon E3-1200 v5 | H.264 encode/decode, HEVC 8-bit encode/decode |
| 7th Gen (Kaby Lake) | i7-7700 | + HEVC 10-bit encode/decode |
| 8th-9th Gen (Coffee Lake) | i7-8700, i9-9900K, Xeon E-2100 | Same codec support as 7th Gen |
| 10th Gen (Ice Lake) | i7-1065G7 | Improved HEVC quality |
| 11th Gen (Rocket Lake) | i9-11900 | + AV1 decode |
| 12th Gen (Alder Lake) | i9-12900 | Improved quality (AV1 decode only) |
| 13th-14th Gen | i9-13900, i9-14900 | Enhanced multi-stream |
| Intel Core Ultra | Ultra 7, Ultra 9 | Latest media engine, + AV1 encode |
Important: Intel F-series CPUs (e.g., i9-12900F) have no integrated graphics and cannot use QSV. If you are purchasing hardware specifically for transcoding, avoid F-series models.
Linux Setup
Most Vajra Cast deployments run on Linux. Here is how to enable QSV:
1. Install Intel Media Drivers
On Ubuntu/Debian:
# Install the Intel media driver and VA-API tools (needs the non-free/multiverse component enabled)
sudo apt update
sudo apt install -y intel-media-va-driver-non-free intel-gpu-tools vainfo
On RHEL/Rocky/AlmaLinux:
sudo dnf install -y intel-media-driver intel-gpu-tools libva-utils
2. Verify VAAPI Detection
vainfo
You should see output listing encode and decode capabilities:
libva info: VA-API version 1.20.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_20
vainfo: VA-API version: 1.20
vainfo: Driver version: Intel iHD driver for Intel Gen Graphics
vainfo: Supported profile and entrypoints
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSlice
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSlice
VAProfileHEVCMain : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointEncSlice
VAProfileHEVCMain10 : VAEntrypointVLD
VAProfileHEVCMain10 : VAEntrypointEncSlice
Look for VAEntrypointEncSlice entries. These confirm encoding is available.
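If you would rather script this check than read the listing by eye, a minimal sketch could filter the `vainfo` output for encode entrypoints. The function name `list_encode_profiles` is hypothetical; it assumes the output format shown above, with the profile name in the first column.

```sh
# Reads `vainfo` output on stdin and prints each profile that exposes an
# encode entrypoint (VAEntrypointEncSlice or its low-power variant).
list_encode_profiles() {
  awk '/VAEntrypointEncSlice/ { print $1 }' | LC_ALL=C sort -u
}

# Example (requires vainfo and a working driver):
#   vainfo 2>/dev/null | list_encode_profiles
```

An empty result suggests the driver loaded without encode support, or that `vainfo` picked a different driver than expected.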
3. Check GPU Access
ls -la /dev/dri/
You should see renderD128 (and possibly renderD129). The user running Vajra Cast needs read/write access:
# Add user to the render and video groups (log out and back in for the change to take effect)
sudo usermod -aG render,video $USER
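The access check can also be scripted. This is a minimal sketch, assuming the render node is /dev/dri/renderD128 as in the listing above; the function name `can_use_render_node` is hypothetical.

```sh
# Succeeds only if the current user can both read and write the render node.
# The default path is an assumption; pass another node (e.g. renderD129) if needed.
can_use_render_node() {
  node=${1:-/dev/dri/renderD128}
  [ -r "$node" ] && [ -w "$node" ]
}

# Example:
#   can_use_render_node || echo "no read/write access to the render node" >&2
```

Run it as the same user that runs Vajra Cast, since group membership is evaluated per user.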
4. Verify with FFmpeg
ffmpeg -hwaccel qsv -c:v h264_qsv -i input.mp4 -c:v h264_qsv -b:v 4000k output.mp4
If this runs without errors, QSV is working.
Docker Setup
For containerized deployments, pass the GPU device into the container:
docker run -d \
  --name vajracast \
  --device /dev/dri:/dev/dri \
  -v /path/to/config:/config \
  -p 9000-9100:9000-9100/udp \
  -p 1935:1935/tcp \
  -p 8080:8080/tcp \
  vajracast/vajracast:latest
The --device /dev/dri:/dev/dri flag passes the Intel GPU device into the container, giving it access to QSV.
For Kubernetes:
apiVersion: v1
kind: Pod
metadata:
  name: vajracast
spec:
  containers:
    - name: vajracast
      image: vajracast/vajracast:latest
      resources:
        limits:
          gpu.intel.com/i915: 1
      securityContext:
        runAsGroup: 44 # video group
macOS Setup
On macOS with Intel processors (pre-Apple Silicon Macs), the Intel media engine is reached through Apple’s VideoToolbox framework rather than the QSV API. Vajra Cast on macOS uses VideoToolbox automatically, so no additional configuration is needed.
On Apple Silicon Macs (M1-M4), VideoToolbox provides hardware encoding through Apple’s media engine, which offers similar benefits to QSV. Vajra Cast uses the appropriate hardware acceleration backend automatically based on the platform.
Configuring Transcoding in Vajra Cast
Basic Transcoding Route
To set up a transcoding route in Vajra Cast:
- Create an ingest (SRT or RTMP)
- Create an output
- In the output settings, enable Transcoding
- Select the target codec:
  - H.264: maximum compatibility
  - HEVC (H.265): 30-40% smaller at equivalent quality
- Set the target bitrate
- Set the target resolution (if scaling)
- Vajra Cast automatically selects QSV if available, with VAAPI fallback
Transcoding Parameters
| Parameter | Description | Recommended Value |
|---|---|---|
| Codec | Output video codec | H.264 or HEVC |
| Bitrate | Target encoding bitrate | See table below |
| Resolution | Output resolution | Match source or scale down |
| Preset | Encoding speed/quality trade-off | balanced (default) |
| Profile | H.264/HEVC profile | high |
| Keyframe Interval | Maximum seconds between keyframes | 2 |
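For readers who want to reproduce a comparable transcode outside the gateway, the parameters above map roughly onto FFmpeg's QSV options. The sketch below is an approximation, not Vajra Cast's internal pipeline: the function name `qsv_transcode`, the `DRY_RUN` switch, and the mapping of "balanced" to FFmpeg's `medium` preset are all assumptions, and the `scale_qsv` filter depends on how your FFmpeg was built.

```sh
# Hypothetical wrapper around an FFmpeg QSV transcode to H.264.
# Usage: qsv_transcode DECODER INPUT OUTPUT KBPS FPS WIDTH HEIGHT
# Set DRY_RUN=1 to print the command instead of running it.
qsv_transcode() {
  decoder=$1 input=$2 output=$3 kbps=$4 fps=$5 width=$6 height=$7
  gop=$(( fps * 2 ))          # 2-second keyframe interval, in frames
  set -- ffmpeg -hwaccel qsv -hwaccel_output_format qsv \
    -c:v "$decoder" -i "$input" \
    -vf "scale_qsv=w=${width}:h=${height}" \
    -c:v h264_qsv -profile:v high -preset medium \
    -b:v "${kbps}k" -g "$gop" -c:a copy "$output"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"                 # show what would be executed
  else
    "$@"
  fi
}

# Example: 4K HEVC source scaled to 1080p H.264 at 6,000 Kbps, 30 fps
#   DRY_RUN=1 qsv_transcode hevc_qsv input.mp4 output.mp4 6000 30 1920 1080
```

Deriving the `-g` value from the frame rate keeps the keyframe interval at two seconds whether the source is 30 or 60 fps.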
Bitrate Recommendations for Transcoding
| Source | Target | Recommended Bitrate |
|---|---|---|
| 4K H.264 30 Mbps | 1080p H.264 | 6,000-8,000 Kbps |
| 4K HEVC 15 Mbps | 1080p H.264 | 6,000-8,000 Kbps |
| 1080p H.264 8 Mbps | 1080p HEVC | 4,000-5,000 Kbps |
| 1080p H.264 8 Mbps | 720p H.264 | 3,000-4,000 Kbps |
| 1080p HEVC 5 Mbps | 1080p H.264 | 6,000-8,000 Kbps |
HEVC (H.265) Transcoding
HEVC is increasingly important for contribution feeds. It delivers equivalent visual quality at 30-40% lower bitrate compared to H.264, which means:
- Lower bandwidth costs for long-distance SRT contribution
- Better quality on bandwidth-constrained paths (cellular, satellite)
- 4K viability on standard internet connections
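With QSV, HEVC encoding is hardware-accelerated and keeps up with real time just as H.264 encoding does, at a slightly higher GPU load (20% versus 15% for a single 1080p30 stream on the i7-12700 benchmarked below). The quality penalty versus software x265 is minimal on modern Intel hardware (11th gen and newer).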
A common workflow: receive HEVC from a remote encoder (saving bandwidth on the contribution path), then transcode to H.264 for output to platforms that do not support HEVC ingest.
Remote Camera → HEVC (SRT, 5 Mbps) → Vajra Cast [QSV transcode] → H.264 (RTMP, 6 Mbps) → YouTube
                                                                → HEVC (SRT, 5 Mbps) → Archive
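The same fan-out can be approximated with plain FFmpeg: one HEVC input, one H.264 output encoded on QSV, and one untouched HEVC copy. This is a sketch under assumptions, not how Vajra Cast implements the route: the function name `fanout`, the `DRY_RUN` switch, and all three URLs or paths are placeholders.

```sh
# Hypothetical wrapper: transcode one HEVC input to H.264 for a live target
# and copy the original HEVC stream to an archive without re-encoding.
# Usage: fanout SOURCE_URL LIVE_URL ARCHIVE_PATH
fanout() {
  src=$1 live=$2 archive=$3
  set -- ffmpeg -hwaccel qsv -c:v hevc_qsv -i "$src" \
    -map 0:v:0 -map '0:a?' -c:v h264_qsv -b:v 6000k -c:a copy -f flv "$live" \
    -map 0:v:0 -map '0:a?' -c copy -f mpegts "$archive"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"                 # show what would be executed
  else
    "$@"
  fi
}

# Example (placeholders only):
#   DRY_RUN=1 fanout 'srt://0.0.0.0:9000?mode=listener' rtmp://example.invalid/live/key archive.ts
```

Only the first output touches the encoder. The archive branch uses `-c copy`, so it costs no GPU time and keeps the contribution quality intact.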
Performance Benchmarks
Real-world benchmarks on common Intel hardware, measured with Vajra Cast:
Intel i7-12700 (12th Gen, Alder Lake)
| Workload | CPU Usage | GPU Usage | Latency Added |
|---|---|---|---|
| 1x 1080p30 H.264→H.264 | 3% | 15% | <5ms |
| 1x 1080p30 H.264→HEVC | 3% | 20% | <5ms |
| 4x 1080p30 H.264→H.264 | 8% | 55% | <5ms |
| 1x 4K30 H.264→1080p H.264 | 5% | 35% | <8ms |
| 1x 1080p60 H.264→H.264 | 4% | 25% | <5ms |
Intel Xeon E-2388G (Server)
| Workload | CPU Usage | GPU Usage | Latency Added |
|---|---|---|---|
| 1x 1080p30 H.264→H.264 | 2% | 12% | <5ms |
| 4x 1080p30 H.264→H.264 | 6% | 45% | <5ms |
| 8x 1080p30 H.264→H.264 | 10% | 85% | <8ms |
| 2x 4K30 H.264→1080p H.264 | 6% | 50% | <8ms |
Intel N100 (Low-Power / Mini PC)
| Workload | CPU Usage | GPU Usage | Latency Added |
|---|---|---|---|
| 1x 1080p30 H.264→H.264 | 8% | 40% | <8ms |
| 2x 1080p30 H.264→H.264 | 12% | 75% | <10ms |
| 1x 1080p30 H.264→HEVC | 10% | 50% | <8ms |
Key takeaways:
- CPU usage is minimal: hardware transcoding barely touches the CPU, leaving it free for gateway routing, monitoring, and other tasks
- Transcoding latency is negligible: under 10ms in all cases, invisible in a streaming context
- Even low-power hardware handles the job: an Intel N100 mini PC can transcode 2 simultaneous 1080p streams
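These figures also allow a rough capacity estimate. The sketch below assumes GPU load grows linearly with the number of identical streams, which the tables only approximately support, and uses an 80% ceiling for headroom; the function name `max_streams` is hypothetical.

```sh
# Estimate how many identical streams fit under a GPU utilization ceiling,
# extrapolating linearly from one measured data point.
# Usage: max_streams MEASURED_STREAMS MEASURED_GPU_PCT [CEILING_PCT]
max_streams() {
  measured_streams=$1 measured_gpu_pct=$2 ceiling_pct=${3:-80}
  echo $(( ceiling_pct * measured_streams / measured_gpu_pct ))
}

max_streams 4 45   # Xeon E-2388G, 4x 1080p30 at 45% GPU -> prints 7
max_streams 4 55   # i7-12700, 4x 1080p30 at 55% GPU     -> prints 5
max_streams 2 75   # N100, 2x 1080p30 at 75% GPU         -> prints 2
```

Treat the result as a starting point for load testing rather than a guarantee, since mixed resolutions and codecs will not scale this evenly.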
Comparison: QSV vs Software x264
Encoding 1x 1080p30 H.264 at 6000 Kbps:
| Method | CPU Usage | Time per Frame | Power | VMAF Score |
|---|---|---|---|---|
| x264 ultrafast | 45% (1 core) | 8ms | ~65W | 89 |
| x264 veryfast | 80% (1 core) | 15ms | ~85W | 92 |
| x264 medium | 100% (2+ cores) | 33ms | ~120W | 95 |
| QSV balanced | 3% | 2ms | ~5W | 91 |
QSV at the “balanced” preset achieves quality comparable to x264 veryfast (VMAF 91 versus 92) while using roughly 1/27th the CPU (3% versus 80%) and 1/17th the power (~5W versus ~85W). For a streaming gateway that needs to transcode continuously 24/7, this difference is transformative.
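The ratios between x264 veryfast and QSV can be recomputed directly from the table, and the power gap can be turned into a yearly figure for a single encode running around the clock. This is a small arithmetic sketch using only the table's approximate values; the helper name `ratio` is hypothetical.

```sh
# Print a / b rounded to one decimal place.
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }

ratio 80 3    # CPU: x264 veryfast (80%) vs QSV (3%)  -> 26.7
ratio 85 5    # Power: ~85 W vs ~5 W                  -> 17.0

# Energy saved per year by one continuous 24/7 encode, in kWh
awk 'BEGIN { printf "%.1f\n", (85 - 5) * 24 * 365 / 1000 }'   # -> 700.8
```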
The Auto-Fallback Chain
Vajra Cast implements an automatic fallback chain for transcoding:
Intel QSV → VAAPI → Software (libx264/libx265)
- QSV preferred: if Intel QSV is detected and the codec is supported, it is used
- VAAPI fallback: if QSV is not available but VAAPI is (e.g., some AMD GPUs or older Intel drivers), VAAPI is used
- Software last resort: if no hardware acceleration is available, software encoding is used
This fallback is automatic. You do not need to configure it. Vajra Cast detects available hardware at startup and selects the best option. The active transcoding engine is visible in the dashboard for each route.
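To see what a similar preference order looks like with stock FFmpeg, the sketch below picks the first available encoder from the list that `ffmpeg -encoders` prints. It illustrates the ordering only and is not Vajra Cast's detection code; the function name `pick_encoder` is hypothetical. An encoder being listed means it was compiled in, not that the hardware behind it works.

```sh
# Reads `ffmpeg -hide_banner -encoders` output on stdin and prints the
# preferred encoder for the codec: QSV first, then VAAPI, then software.
# Usage: ffmpeg -hide_banner -encoders | pick_encoder h264
pick_encoder() {
  codec=$1
  case "$codec" in
    h264) software=libx264 ;;
    hevc) software=libx265 ;;
    *) echo "unsupported codec: $codec" >&2; return 1 ;;
  esac
  encoders=$(cat)
  for candidate in "${codec}_qsv" "${codec}_vaapi" "$software"; do
    # -w matches whole words, so "libx264" does not match "libx264rgb"
    if printf '%s\n' "$encoders" | grep -qw -- "$candidate"; then
      echo "$candidate"
      return 0
    fi
  done
  return 1
}
```

A real check would follow this with a short test encode, because a missing driver or an inaccessible render node only shows up when the encoder is opened.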
Monitoring Transcoding Performance
Vajra Cast exposes per-route transcoding metrics:
- GPU utilization: percentage of QSV media engine in use
- Encode FPS: frames per second being encoded (should match source frame rate)
- Encode latency: time per frame in milliseconds
- Output bitrate: actual encoding bitrate (may differ slightly from target)
- VMAF score: automated video quality assessment (0-100) comparing transcoded output to source
These metrics are available in the web dashboard and via the Prometheus /metrics endpoint. Use them to:
- Detect GPU overload (utilization >90% sustained)
- Verify output quality (VMAF >85 is generally good, >90 is excellent)
- Plan capacity (how many more transcodes can this hardware handle?)
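An overload check against the Prometheus endpoint could be scripted along these lines. The metric name used in the example, `vajracast_gpu_utilization_percent`, is invented for illustration, as is the function name `routes_over_limit`; check the real /metrics output for the actual names. The sketch assumes samples carry no trailing timestamp.

```sh
# Reads Prometheus text-format metrics on stdin and prints every sample of
# the given metric whose value exceeds the limit.
# Usage: routes_over_limit METRIC_NAME LIMIT
routes_over_limit() {
  metric=$1 limit=$2
  awk -v m="$metric" -v limit="$limit" '
    index($0, m) == 1 {
      # Require an exact metric name: the next character starts labels or the value
      next_char = substr($0, length(m) + 1, 1)
      if ((next_char == "{" || next_char == " ") && $NF + 0 > limit + 0) print
    }'
}

# Example (host, port, and metric name are placeholders):
#   curl -s http://HOST:PORT/metrics | routes_over_limit vajracast_gpu_utilization_percent 90
```

A single sample above the limit is not an overload by itself. For the sustained condition described above, evaluate the metric over a time window in Prometheus rather than in a one-off script.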
Best Practices
- Use passthrough when possible. If input and output share the same codec, resolution, and bitrate, skip transcoding entirely. It is always faster and preserves original quality.
- Match keyframe intervals. Set the transcoder keyframe interval to match your output platform requirements (2 seconds is the universal safe choice).
- Monitor GPU utilization. Keep sustained GPU usage below 80% to leave headroom for bitrate spikes and retransmission overhead.
- Test HEVC compatibility. Before switching outputs to HEVC, verify the downstream player or platform supports it. Not all do.
- Use CBR for transcoded outputs. Constant bitrate keeps the output rate predictable, which simplifies bandwidth planning.
- Keep firmware and drivers updated. Intel regularly improves QSV quality and performance through driver updates. On Linux, keep intel-media-driver current.
For the full Vajra Cast feature set including zero-copy distribution, failover, and monitoring, see our SRT Streaming Gateway guide.