AMD RDNA 4 Architecture: 64-CU Design Boosts Compute & Ray Tracing

Key Takeaways

1. AMD officially launched RDNA 4 and the Radeon RX 9070 series GPUs, with a retail release on March 6, 2025.
2. RDNA 4 features architectural improvements for better raster performance, efficiency, and enhanced ray tracing capabilities, along with upgraded AI and media encoding/decoding.
3. The design has shifted to a traditional monolithic structure, directly connecting memory and compute through Infinity Cache, with significant upgrades in compute units and memory cache.
4. Video encoding and decoding capabilities have been improved, addressing past drawbacks, and new features support lower power consumption and better performance for various formats.
5. RDNA 4 includes advanced AI capabilities with dedicated math pipelines, improved ray tracing performance, and support for innovative rendering techniques, promising substantial gains in gaming experiences.


AMD gave a glimpse of RDNA 4 at CES 2025, announcing the Radeon RX 9070 XT and RX 9070 models. However, they did not mention the new architecture during the keynote.

AMD promised that further details about RDNA 4 and the new Radeon GPUs would follow shortly, and that information has now arrived.

Launch Details

Today, AMD officially unveils RDNA 4 along with the new Radeon RX 9070 series GPUs. The RX 9070 series is set to hit retail stores on March 6, with performance reviews expected a day prior.

Architectural Improvements

RDNA 4 builds on the goals AMD set with RDNA 3. According to AMD, RDNA 4 is meant to handle more demanding gaming workloads, with a focus on better raster performance and efficiency.

There are also the usual enhancements to ray tracing pipelines, a renewed emphasis on AI capabilities, and media encoding/decoding improvements.

Design Changes

While RDNA 3 introduced a chiplet design for GPUs inspired by Ryzen processors, with memory cache dies (MCDs) kept separate from the graphics compute die (GCD), RDNA 4 reverts to a traditional monolithic design. Although the components remain similar, the memory and compute are now directly connected through the Infinity Cache, eliminating MCD-GCD interconnects.

The RDNA 4 GPU, specifically the Radeon RX 9070 XT, has four shader engines, each with eight workgroup processors (WGPs). Each WGP consists of two compute units (CUs), totaling 64 CUs.
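As a quick sanity check on that total, the arithmetic can be sketched in a few lines of Python. The constant and function names are made up for illustration, and the two-CUs-per-WGP figure reflects the "dual CU" grouping mentioned in the ray tracing section.

```python
# Illustrative names only; the counts are the ones quoted for the RX 9070 XT.
SHADER_ENGINES = 4
WGPS_PER_ENGINE = 8
CUS_PER_WGP = 2  # assumption: a WGP is a "dual CU", i.e. two compute units


def total_compute_units(engines=SHADER_ENGINES,
                        wgps_per_engine=WGPS_PER_ENGINE,
                        cus_per_wgp=CUS_PER_WGP):
    """Multiply out the hierarchy: engines -> WGPs -> CUs."""
    return engines * wgps_per_engine * cus_per_wgp


print(total_compute_units())  # 64
```

With one ray accelerator per CU, the same product also gives the 64 ray accelerators quoted for the RX 9070 XT.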

AMD claims that the new compute units are now more powerful than before, providing enhanced ray tracing, doubled peak throughput, and support for the latest matrix acceleration features with wider numeric format compatibility.

Enhanced Features

A new feature in the RDNA 4 CU, akin to the Tensor cores in Nvidia’s Ampere architecture, is support for structured sparsity. This facilitates quicker matrix operations, especially when many weights are zero.
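To make the idea concrete, here is a minimal sketch of structured sparsity in plain Python. It assumes the 2:4 pattern used by Nvidia's Ampere (at most two non-zero weights in every group of four). AMD's announcement as summarized here does not spell out the exact pattern or storage format RDNA 4 uses, so the function names and layout below are purely illustrative.

```python
def compress_2_of_4(weights):
    """Keep the two largest-magnitude weights in each group of four.

    Returns (values, indices): the kept weights and their positions (0..3)
    inside their group. This is an illustrative layout, not a hardware format.
    """
    if len(weights) % 4 != 0:
        raise ValueError("weight count must be a multiple of 4")
    values, indices = [], []
    for base in range(0, len(weights), 4):
        group = weights[base:base + 4]
        # Positions of the two largest-magnitude entries in this group.
        top2 = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        for j in sorted(top2):
            values.append(group[j])
            indices.append(j)
    return values, indices


def sparse_dot(values, indices, activations):
    """Dot product that touches only the kept weights (half the multiplies)."""
    total = 0.0
    for k, (value, j) in enumerate(zip(values, indices)):
        base = (k // 2) * 4  # two kept values per group of four
        total += value * activations[base + j]
    return total


weights = [4.0, 0.0, 0.0, 2.0, 0.0, 2.0, 0.0, -1.0]  # already 2:4 sparse
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
values, indices = compress_2_of_4(weights)
print(sparse_dot(values, indices, activations))  # 16.0
```

Because the zeros sit in a predictable pattern, the multiply-accumulate loop runs over half as many elements as a dense dot product, which is where the speedup comes from.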

The memory subsystem has also seen upgrades. The L2 cache has increased from 6 MB in RDNA 3 to 8 MB in RDNA 4, while the Infinity Cache has been updated to 3rd gen but reduced from 96 MB in RDNA 3 to 64 MB.

AMD continues to use GDDR6 memory in this new generation. Both the RX 9070 XT and RX 9070 pair 16 GB of GDDR6 with a 256-bit memory interface operating at 20 Gbps, resulting in an effective bandwidth of 640 GB/s. This is lower than the 960 GB/s from RDNA 3, but AMD states that the RDNA 4 memory specifications were deliberately chosen to support both current and future game titles.
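The bandwidth figures follow directly from bus width and per-pin data rate. A minimal sketch (the function name is hypothetical) shows which bus width corresponds to each number:

```python
def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak bandwidth in GB/s: pins * Gbit/s per pin / 8 bits per byte."""
    return bus_width_bits * data_rate_gbps / 8


print(memory_bandwidth_gbs(256, 20))  # 640.0
print(memory_bandwidth_gbs(384, 20))  # 960.0
```

At the same 20 Gbps data rate, 640 GB/s corresponds to a 256-bit bus, while the 960 GB/s quoted for RDNA 3 corresponds to a 384-bit bus.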

Video Encoding Improvements

Video encoding was a significant drawback with RDNA 3, and AMD promises considerable enhancements in this area. The company says H.264 and AV1 encoding are substantially improved, with fewer blocking artifacts at the same bitrate.

The upgrades also apply to video decoding, featuring lower power consumption and better performance for formats like AV1 and VP9.

The Radiance Display Engine now operates with much less power in dual-monitor FreeSync setups. Additionally, it supports hardware flip queue in the Windows Display Driver Model (WDDM) 3.0 for video playback.

Performance Enhancements

The RDNA 4 CU structure is similar to that of RDNA 3, but improvements in performance and efficiency are seen in each component.

WMMA (Wave Matrix Multiply Accumulate) operations have been optimized for the new hardware’s demands. Scalar units have been upgraded to handle Float32 operations, and the scheduler can break a large compute task into smaller, more manageable pieces using split barriers.

AMD states that RDNA 4 is designed to accommodate new rendering techniques that developers utilize in modern games. While upscaling has become popular, effective path tracing requires ML acceleration as an intrinsic part of the rendering process, not merely an afterthought.

Ray Tracing Features

RDNA 4 includes 64 3rd gen ray accelerators in the RX 9070 XT. The ray accelerator’s structure resembles that of RDNA 3 but features an extra intersection engine, doubling the number of ray box and ray triangle units.

There’s also a dedicated hardware ray transform that reduces the need for shader instructions, thereby lowering ray traversal overhead. Each dual CU has a 128 KB memory to store the ray stack for efficient push and sort operations.

The new RDNA 4 introduces oriented bounding boxes (OBBs) that align BVH bounding boxes with the geometry, which reduces false-positive ray interactions in empty spaces. AMD claims this method can enhance ray traversal performance by up to 10%.
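The benefit of an oriented box is easiest to see with thin, diagonal geometry. The sketch below is a simplified 2D illustration (not AMD's hardware algorithm, and all names are hypothetical): it compares the area of an axis-aligned box with that of a box aligned to a long, thin rectangle rotated by 45 degrees.

```python
import math


def rotated_rect_corners(length, thickness, angle_deg):
    """Corners of a rectangle centered at the origin, rotated by angle_deg."""
    angle = math.radians(angle_deg)
    c, s = math.cos(angle), math.sin(angle)
    hl, ht = length / 2, thickness / 2
    local = [(-hl, -ht), (hl, -ht), (hl, ht), (-hl, ht)]
    return [(x * c - y * s, x * s + y * c) for x, y in local]


def aabb_area(points):
    """Area of the axis-aligned bounding box around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def obb_area(length, thickness):
    """An oriented box aligned with the rectangle fits it exactly."""
    return length * thickness


corners = rotated_rect_corners(length=10.0, thickness=1.0, angle_deg=45.0)
tight = obb_area(10.0, 1.0)
loose = aabb_area(corners)
print(f"OBB area: {tight:.1f}, AABB area: {loose:.1f}")
print(f"Empty fraction of AABB: {1 - tight / loose:.2f}")
```

In this toy case the axis-aligned box is roughly six times larger than the geometry it encloses, so most rays that hit the box would miss the geometry. Aligning the box with the geometry removes that empty space, which is the kind of false-positive intersection OBBs are meant to reduce.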

Memory Request Support

Additionally, RDNA 4 supports relaxed out-of-order memory requests, effectively reducing wait times for waves that miss the high-level cache. This enhancement benefits not only ray tracing but other tasks as well.

In RDNA 4, shaders can dynamically allocate registers, allowing more waves to be in flight at once and helping to hide memory latency.

AMD graphics cards have faced challenges with ray tracing, making path tracing seem unattainable, even for the higher-end RDNA 3 models. RDNA 4 strives to change this with support for neural radiance caching along with a new neural supersampling and denoising model.

While AMD hasn’t provided specific performance metrics for path tracing titles, we can expect to gather insights during card reviews.

AI Capabilities

AMD highlights that RDNA 4 features dedicated math pipelines for ML acceleration, emphasizing high performance with narrower data types. New support for FP8 and BF8 is introduced for high-performance, high-precision inference.

During a demonstration of SDXL 1.5 image generation, AMD revealed that the RDNA 4-based Radeon RX 9070 XT delivers double the FP16 performance per CU compared to the RDNA 3-based RX 7900 XT.

Capitalizing on RDNA 4’s AI advancements is FSR 4, an end-to-end pipeline developed on AMD GPUs. FSR 4 utilizes FP8 for optimal bandwidth usage, performance, and power.

AMD showcased up to a 3.7x increase in fps with FSR 4 when combined with frame interpolation and Radeon Anti-Lag while preserving high image quality.
