Page 3 : The New AMD Vega Geometry Pipeline, Compute Unit & Pixel Engine
The New Programmable Geometry Pipeline
AMD Vega features a new programmable geometry pipeline that offers more than twice the peak throughput per clock of the previous-generation geometry pipeline. This is achieved through two improvements: primitive shaders and the Intelligent Workgroup Distributor.
The primitive shader stage is a completely new addition. It allows primitives to be discarded at a much higher rate. The Intelligent Workgroup Distributor, on the other hand, improves the load balancing of work across the large number of pipelines.
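To make the idea of discarding primitives early more concrete, here is a minimal sketch of one common culling test in Python. It is not AMD's implementation. The function names and the counter-clockwise front-face convention are assumptions made purely for illustration.

```python
def signed_area(tri):
    """Twice the signed area of a 2D triangle; positive means counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)


def cull_primitives(triangles):
    """Keep only front-facing (counter-clockwise), non-degenerate triangles.

    Assumption: counter-clockwise winding marks a front face.
    """
    return [tri for tri in triangles if signed_area(tri) > 0]


triangles = [
    ((0, 0), (4, 0), (0, 4)),  # counter-clockwise: kept
    ((0, 0), (0, 4), (4, 0)),  # clockwise (back-facing): discarded
    ((0, 0), (2, 2), (4, 4)),  # zero area (degenerate): discarded
]
visible = cull_primitives(triangles)
print(len(visible), "of", len(triangles), "primitives survive")
```

Every primitive rejected at this point is one that the later stages never have to rasterise or shade, which is why a higher discard rate translates into higher geometry throughput.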
The Next-Generation Compute Unit (NCU)
The AMD Vega NCU is optimised for higher clock speeds and higher IPC, and it boasts a larger instruction buffer. But what’s unique is its flexible, configurable precision. This allows it to process not just 64-bit and 32-bit operations, but also 16-bit and 8-bit operations. For example, the AMD Vega NCU can process:
- 512 8-bit ops per clock, or
- 256 16-bit ops per clock, or
- 128 32-bit ops per clock
The Vega NCU does not “waste” computing power by allowing only one operation per clock, whatever its size. Smaller operations can be combined so that more of them execute per clock, which boosts the performance of “packed math” applications.
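The packing idea can be illustrated in software. The sketch below is not how the Vega hardware is built. It only shows, with hypothetical helper names, how two 16-bit values can share one 32-bit word and be added in a single step without one lane spilling into the other.

```python
MASK16 = 0xFFFF  # selects one 16-bit lane


def pack2x16(lo, hi):
    """Pack two unsigned 16-bit values into one 32-bit word."""
    return ((hi & MASK16) << 16) | (lo & MASK16)


def unpack2x16(word):
    """Split a 32-bit word back into its (low, high) 16-bit lanes."""
    return word & MASK16, (word >> 16) & MASK16


def packed_add(a, b):
    """Add both 16-bit lanes; each lane wraps around independently."""
    lo = (a + b) & MASK16                  # carry out of the low lane is dropped
    hi = ((a >> 16) + (b >> 16)) & MASK16  # high lanes are added separately
    return (hi << 16) | lo


a = pack2x16(1000, 2000)
b = pack2x16(24, 48)
print(unpack2x16(packed_add(a, b)))  # (1024, 2048)
```

One 32-bit slot carries two results here, which is the same trade the article describes: halve the operand size and twice as many operations fit into each clock.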
This is how the new Radeon Instinct MI25 Vega accelerators deliver 25 teraflops of FP16 compute performance.
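As a rough sanity check, the per-clock figures quoted for the NCU can be scaled up to a whole GPU. The sketch below assumes 64 NCUs and a peak clock of about 1.5 GHz. Neither number is stated in this article, so treat both as assumptions.

```python
OPS_PER_CLOCK_32BIT = 128  # per NCU, as quoted in the article


def ops_per_clock(bits):
    """Ops per clock per NCU, scaling inversely with operand width."""
    return OPS_PER_CLOCK_32BIT * 32 // bits


NCU_COUNT = 64    # assumption: not stated in this article
CLOCK_HZ = 1.5e9  # assumption: approximate peak clock, not stated in this article


def peak_tflops(bits):
    """Theoretical peak throughput in tera-operations per second."""
    return NCU_COUNT * ops_per_clock(bits) * CLOCK_HZ / 1e12


for bits in (32, 16, 8):
    print(f"{bits}-bit: {ops_per_clock(bits)} ops/clock/NCU, "
          f"{peak_tflops(bits):.1f} tera-ops/s peak")
```

Under those assumptions, the 16-bit case works out to roughly 24.6 teraflops, which rounds to the 25 teraflops quoted for the MI25.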
The Next-Generation Pixel Engine
The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser. It is designed to improve performance while saving power. The on-chip bin cache allows it to “fetch once”, and it culls pixels that are invisible in the final scene so it can also “shade once”.
In previous GPU architectures, pixel and texture memory accesses were non-coherent. That means the same data required by both pixel and texture shaders was not “visible” to both, and had to be fetched and flushed independently. This reduced efficiency and wasted cache bandwidth.
In the AMD Vega GPU, the homogeneous memory system allows for coherent memory accesses. In addition, the render back-ends are now clients of the L2 cache. This allows data to remain in the L1 and L2 caches, instead of being flushed and refetched over and over again.
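The “shade once” idea can be sketched with a toy binning renderer. This is a simplified illustration, not the actual Draw Stream Binning Rasteriser. The 4-pixel tile size, the axis-aligned rectangles standing in for primitives, and all function names are assumptions.

```python
from collections import defaultdict

TILE = 4  # tile (bin) edge length in pixels; hypothetical value


def bin_rects(rects):
    """Map each tile coordinate to the rectangles that overlap it."""
    bins = defaultdict(list)
    for rect in rects:
        x0, y0, x1, y1, depth, colour = rect  # x1 and y1 are exclusive
        for ty in range(y0 // TILE, (y1 - 1) // TILE + 1):
            for tx in range(x0 // TILE, (x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(rect)
    return bins


def render_binned(rects):
    """Resolve visibility per tile first, then shade each visible pixel once."""
    framebuffer = {}
    shaded = 0
    for (tx, ty), tile_rects in bin_rects(rects).items():
        for y in range(ty * TILE, (ty + 1) * TILE):
            for x in range(tx * TILE, (tx + 1) * TILE):
                covering = [r for r in tile_rects
                            if r[0] <= x < r[2] and r[1] <= y < r[3]]
                if covering:
                    nearest = min(covering, key=lambda r: r[4])
                    framebuffer[(x, y)] = nearest[5]  # the single "shade"
                    shaded += 1
    return framebuffer, shaded


def naive_shade_count(rects):
    """Shades needed if every covered pixel of every rectangle is shaded."""
    return sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1, _, _ in rects)


# (x0, y0, x1, y1, depth, colour); a smaller depth is closer to the viewer
rects = [
    (0, 0, 8, 8, 0.9, "background"),
    (2, 2, 6, 6, 0.5, "middle"),
    (3, 3, 5, 5, 0.1, "front"),
]
framebuffer, shaded = render_binned(rects)
print(shaded, "shades with binning versus", naive_shade_count(rects), "without")
```

In this toy scene the binned path shades 64 pixels, while shading every covered pixel of every rectangle would take 84. The gap grows with the amount of overdraw in the scene.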
I was wondering if anyone had a recording for the event where Raja introduced the vega GPU in his hand. I can’t find that event anywhere except in that picture and a very short video snippet. http://images.anandtech.com/doci/11002/Raja_575px.jpg
No, we were actually NOT allowed to record a video of that, even though he held it up for just a few seconds.
I too have a picture of Raja showing off the Vega GPU but frankly, it’s just a picture of Raja showing off a chip.
It’s far more interesting to see the AMD Vega running DOOM at 4K -> http://www.techarp.com/articles/watch-amd-vega-run-doom-vulkan/