We can now reveal the fourth, and arguably the biggest, piece of news out of the AMD Tech Summit held in Sonoma, California from December 7-9, 2016 – details of the new AMD Vega GPU architecture!
In this article, we will reveal to you the details of not just the Vega NCU (Next-Gen Compute Unit) and the HBM2 memory it uses, but also its spanking new High-Bandwidth Cache Controller. On top of that, we will delve into the new geometry pipeline and pixel engine!
As usual, we will offer you a summary of the key points, with greater detail in the subsequent pages. Finally, we will give you the presentation slides and, when we get it, the presentation video from the AMD Tech Summit in Sonoma.
The 4 Major Features In AMD Vega
As AMD’s next-generation GPU architecture, Vega will come with these 4 major features that will help it to leapfrog ahead of competing graphics architectures.
- High-Bandwidth Cache (HBM2 memory + High-Bandwidth Cache Controller)
- New Programmable Geometry Pipeline, with improved load balancing and new primitive shader
- Next-Generation Compute Unit (NCU), with configurable precision
- Next-Generation Pixel Engine, with a new rasteriser and L2 cache design
We will summarise the key points below. But for more details, click on the links above.
High-Bandwidth Cache
- High-Bandwidth Cache = HBM2 memory + High-Bandwidth Cache Controller (HBCC)
- HBM2 memory technology offers :
- 2X bandwidth per pin over HBM
- 5X power efficiency over GDDR5
- 8X capacity / stack
- Over 50% smaller footprint compared to GDDR5
- The High-Bandwidth Cache Controller offers :
- Access to 512 TB virtual address space
- Adaptive, fine-grained data movement
- AMD showcased the performance of the High-Bandwidth Cache using a real-time render of Joe Macri’s living room on Radeon ProRender with 700+ GB of data
New Programmable Geometry Pipeline
- The new AMD Vega geometry pipeline has over 2X peak throughput per clock
- There is a new primitive shader that allows primitives to be discarded at a high rate
- A new Intelligent Workgroup Distributor allows for improved load balancing
Next-Generation Compute Unit (NCU)
- The AMD Vega NCU has configurable precision, allowing it to process :
- 512 8-bit ops per clock, or
- 256 16-bit ops per clock, or
- 128 32-bit ops per clock
- It is also optimised for higher clock speeds and higher IPC
- It boasts a larger instruction buffer
Next-Generation Pixel Engine
- The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser
- The new rasteriser is designed to improve performance while saving power
- The on-chip bin cache allows the rasteriser to “fetch once”
- The rasteriser also “shades once”, by culling pixels that are invisible in the final scene
- The render back-ends are now clients of the L2 cache, which improves deferred shading performance
I Want To Know More!
If you would like to know more about the four main improvements in the AMD Vega GPU architecture, please click on the following links, or just go on to the next page.
- High-Bandwidth Cache
- New Programmable Geometry Pipeline
- Next-Generation Compute Unit (NCU)
- Next-Generation Pixel Engine
- The Complete Set Of AMD Vega Architecture Slides
High-Bandwidth Cache
AMD Vega will use HBM2 memory, as well as a new High-Bandwidth Cache Controller (HBCC). Together, they are known as the High-Bandwidth Cache.
AMD showcased the performance of the High-Bandwidth Cache using a real-time render of Joe Macri’s living room on Radeon ProRender with 700+ GB of data. Although no frame rate was visible, the real-time render appeared to be very smooth.
The HBM2 Memory
HBM2 offers twice the transfer rate per pin (up to 2 GT/s) of the first-generation HBM memory. This allows it to achieve up to 256 GB/s of memory bandwidth per package.
In both HBM and HBM2, up to 8 memory dies can be stacked in a package. But moving to HBM2 allows for twice the memory density – up to 8 GB per package is now possible.
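As a quick sanity check, those headline figures follow directly from the numbers above, assuming the 1024-bit interface that each HBM/HBM2 stack uses and 8 Gbit (1 GB) DRAM dies. Here is a minimal sketch of the arithmetic :

```python
# A quick sanity check on the HBM2 figures quoted above, assuming the
# 1024-bit interface per HBM/HBM2 stack and 8 Gbit (1 GB) DRAM dies.

interface_width_bits = 1024      # pins per HBM2 stack
transfer_rate_gtps = 2.0         # up to 2 giga-transfers per second, per pin

bandwidth_gb_s = interface_width_bits * transfer_rate_gtps / 8    # bits -> bytes
print(f"Bandwidth per package : {bandwidth_gb_s:.0f} GB/s")       # 256 GB/s

dies_per_stack = 8               # up to 8 stacked DRAM dies
gb_per_die = 1                   # 8 Gbit = 1 GB per die
print(f"Capacity per package : {dies_per_stack * gb_per_die} GB") # 8 GB
```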
The High-Bandwidth Cache Controller
The second component of the High-Bandwidth Cache is the new High-Bandwidth Cache Controller (HBCC). It gives the AMD Vega GPU a single, unified memory system with up to 512 TB of addressable virtual memory.
It also allows for the adaptive, fine-grained movement of data between the AMD Vega GPU and the system memory, the NVRAM and the network storage (as part of Infinity Fabric).
Why Is It Called A Cache?
AMD calls the combination of the HBM2 memory and the High-Bandwidth Cache Controller the “High-Bandwidth Cache”. Techies may wonder why AMD chose to call it a “cache”, instead of “memory”. After all, HBM2 memory is a type of fast graphics memory.
The answer lies in AMD Vega’s heterogeneous memory system. All memory in the system, whether it is the HBM2 memory or shared memory from the computer’s SDRAM, is seen as one contiguous memory space: a single big block of memory, irrespective of how fast each part is.
That may be great for memory addressing, but may cause frequently-used data to be placed in slower memory. To avoid such an occurrence, the High-Bandwidth Cache Controller uses the HBM2 memory like a fast cache. This allows it to keep the most frequently-used data in the fastest memory available – the HBM2 memory.
Hence, the HBM2 memory functions like a cache in AMD Vega, and that is why AMD called the combination the High-Bandwidth Cache.
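To make the “cache” analogy concrete, here is a minimal, purely illustrative sketch of local HBM2 acting as a page cache in front of a much larger virtual address space. The page size, the LRU eviction policy and the class name are our own assumptions for illustration; AMD has not disclosed how the HBCC actually manages data residency :

```python
# Illustrative only: a toy model of HBM2 acting as a cache over a large
# virtual address space. The page size, capacity and LRU policy are our
# own assumptions, not details AMD has disclosed about the HBCC.
from collections import OrderedDict

PAGE_SIZE = 64 * 1024                  # assumed page granularity
VIRTUAL_SPACE = 512 * 1024 ** 4        # 512 TB, i.e. a 49-bit address space
HBM2_CAPACITY = 8 * 1024 ** 3          # one 8 GB HBM2 package

class ToyHighBandwidthCache:
    def __init__(self):
        self.resident = OrderedDict()  # virtual page number -> data held in HBM2

    def access(self, virtual_address, backing_store):
        assert 0 <= virtual_address < VIRTUAL_SPACE
        page = virtual_address // PAGE_SIZE
        if page in self.resident:              # hit: the data is already in fast HBM2
            self.resident.move_to_end(page)
            return self.resident[page]
        data = backing_store(page)             # miss: fetch from DRAM / NVRAM / network
        if len(self.resident) * PAGE_SIZE >= HBM2_CAPACITY:
            self.resident.popitem(last=False)  # evict the least recently used page
        self.resident[page] = data
        return data
```

The real controller works in hardware and at a much finer granularity, but the principle is the same: hot data lives in the HBM2 memory, while cold data stays in system memory, NVRAM or network storage until it is needed.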
The New Programmable Geometry Pipeline
The AMD Vega GPU features a new programmable geometry pipeline. It boasts over twice the peak throughput per clock of the previous-generation geometry pipeline. This is achieved through two new improvements – primitive shaders and the Intelligent Workgroup Distributor.
The primitive shader stage is a completely new addition, and it allows primitives to be discarded at a much higher rate. The Intelligent Workgroup Distributor, on the other hand, improves the load balancing of work going to the GPU’s large number of pipelines.
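AMD has not yet detailed how primitive shaders will be programmed, but the kind of work they are meant to do, throwing away triangles that can never contribute to the final image before they reach the rasteriser, can be sketched in a few lines. The culling tests below, for zero-area and back-facing triangles, are generic examples of early primitive discards and not a description of AMD’s actual implementation :

```python
# Illustrative only: the sort of early primitive culling a primitive shader
# could perform, long before rasterisation. These tests are generic examples,
# not AMD's actual primitive shader logic.

def signed_area_2d(v0, v1, v2):
    """Twice the signed area of a screen-space triangle given (x, y) vertices."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])

def should_discard(v0, v1, v2):
    area = signed_area_2d(v0, v1, v2)
    if area == 0.0:    # degenerate: a zero-area triangle cannot cover any pixel
        return True
    if area < 0.0:     # back-facing, assuming counter-clockwise front faces
        return True
    return False

# A degenerate triangle is discarded before it costs the rasteriser any work.
print(should_discard((0.0, 0.0), (1.0, 1.0), (2.0, 2.0)))   # True
```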
The Next-Generation Compute Unit (NCU)
The AMD Vega NCU is optimised for higher clock speeds and higher IPC, and it boasts a larger instruction buffer. But what is really unique is its flexible, configurable precision. This allows it to process not just 64-bit and 32-bit operations, but also 16-bit and 8-bit operations. For example, the AMD Vega NCU can process :
- 512 8-bit ops per clock, or
- 256 16-bit ops per clock, or
- 128 32-bit ops per clock
The Vega NCU does not “waste” computing power by restricting every operation to a full-width slot. Smaller operations can be packed together and executed side by side, which maximises throughput and boosts the performance of “packed math” applications.
This is how the new Radeon Instinct MI25 Vega accelerator delivers 25 teraflops of FP16 compute performance.
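Here is a rough back-of-the-envelope check of how those ops-per-clock figures and the MI25’s headline number line up. It assumes the NCU keeps the 64 ALUs of previous GCN compute units and counts a fused multiply-add as two operations; the NCU count and clock speed used for the MI25 estimate are likewise our own assumptions, chosen because they reproduce the 25-teraflop figure, and were not stated in the AMD presentation :

```python
# Back-of-the-envelope check of the NCU throughput figures and the Radeon
# Instinct MI25's 25 TFLOPS FP16 number. The ALU count, NCU count and clock
# speed are our own assumptions, not figures from the AMD presentation.

alus_per_ncu = 64                  # assumed, as in previous GCN compute units
ops_per_fma = 2                    # a fused multiply-add counts as two operations

fp32_ops_per_clock = alus_per_ncu * ops_per_fma       # 128 32-bit ops
fp16_ops_per_clock = fp32_ops_per_clock * 2           # 256 16-bit ops (packed math)
int8_ops_per_clock = fp32_ops_per_clock * 4           # 512 8-bit ops

ncu_count = 64                     # assumed NCU count for the MI25's Vega GPU
clock_ghz = 1.5                    # assumed boost clock

fp16_tflops = ncu_count * fp16_ops_per_clock * clock_ghz / 1000
print(f"Peak FP16 : {fp16_tflops:.1f} TFLOPS")        # ~24.6 TFLOPS, marketed as 25
```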
The Next-Generation Pixel Engine
The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser, which is designed to improve performance while saving power. Its on-chip bin cache allows it to “fetch once”, and it culls pixels that are invisible in the final scene so it can also “shade once”.
In previous GPU architectures, pixel and texture memory accesses are non-coherent. That means data required by both the pixel and texture pipelines is not mutually visible, and has to be fetched and flushed independently. This reduces efficiency and wastes cache bandwidth.
In the AMD Vega GPU, the unified memory system allows for coherent memory accesses. In addition, the render back-ends are now clients of the L2 cache. This allows data to remain in the L1 and L2 caches, instead of being flushed and refetched over and over again.
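The “fetch once, shade once” idea can be sketched conceptually as well: triangles are first sorted into screen-space bins, and shading within a bin is deferred until the bin is complete, so each tile’s data is fetched once and only the frontmost fragment at each pixel gets shaded. The tile size and data structures below are our own illustrative choices, not a description of AMD’s Draw Stream Binning Rasteriser hardware :

```python
# Illustrative only: a conceptual model of binning rasterisation.
# The tile size, data layout and shading callback are our own choices,
# not a description of AMD's Draw Stream Binning Rasteriser.

TILE = 32   # assumed bin size in pixels

def bin_fragments(fragments):
    """Group rasterised fragments (x, y, depth, shade_fn) into screen-space bins."""
    bins = {}
    for x, y, depth, shade_fn in fragments:
        bins.setdefault((x // TILE, y // TILE), []).append((x, y, depth, shade_fn))
    return bins

def shade_bin(fragments_in_bin):
    """Keep only the frontmost fragment per pixel, then shade each pixel once."""
    frontmost = {}
    for x, y, depth, shade_fn in fragments_in_bin:
        if (x, y) not in frontmost or depth < frontmost[(x, y)][0]:
            frontmost[(x, y)] = (depth, shade_fn)
    # Hidden fragments were culled above, so each surviving pixel is shaded once.
    return {pixel: shade_fn() for pixel, (depth, shade_fn) in frontmost.items()}
```

Processing the frame one bin at a time is what lets the working set stay in the on-chip bin cache, so off-chip memory only needs to be touched once per tile.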
The Complete AMD Vega GPU Architecture Slides
Here is the full set of official AMD slides on the AMD Vega GPU architecture for your perusal :
We hope to get our hands on the two AMD video presentations on the Vega GPU at the AMD Tech Summit. If we do get them, we will post them here. So check back later! 🙂
Support Tech ARP!
If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!