Page 3 : The New AMD Vega Geometry Pipeline, Compute Unit & Pixel Engine
The New Programmable Geometry Pipeline
The AMD Vega features a new programmable geometry pipeline. It boasts over twice the peak throughput per clock, compared to the previous-generation geometry pipeline. This is achieved through two new improvements – primitive shaders and the Intelligent Workgroup Distributor.
The primitive shader stage is a completely new addition. It allows for primitives to be discarded at a much higher rate. The Intelligent Workgroup Distributor, on the other hand, improves the load balancing of work going to the large number of pipelines.
The Next-Generation Compute Unit (NCU)
The AMD Vega NCU is optimised for higher clock speeds, and higher IPC. It boasts a larger instruction buffer, and naturally – high clock speeds. But what’s unique is its flexible, configurable precision. This allows it to process, not just 64-bit and 32-bit operations, but also 16-bit and 8-bit operations. For example, the AMD Vega NCU can process :
- 512 8-bits ops per clock, or
- 256 16-bit ops per clock, or
- 128 32-bit ops per clock
The Vega NCU does not “waste” computing power by only allowing one operation per clock. If it is a smaller-sized operation, it can be combined to maximise performance. This allows it to boost the performance of “packed math” applications.
This is how the new Radeon Instinct MI25 Vega accelerators deliver 25 teraflops of FP16 compute performance. You can read more about Radeon Instinct in the following articles :
The Next-Generation Pixel Engine
The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser. It is designed to improve performance while saving power. The on-chip bin cache allows it to only “fetch once“, and it culls pixels invisible to the final scene so it can also “shade once“.
In previous GPU architectures, the pixel and texture memory accesses are non-coherent. That means the same data required by both pixel and texture shaders are not “visible”, and have to be fetched and flushed independently. This reduces efficiency and wastes cache bandwidth.
In the AMD Vega GPU, the homogenous memory system allows for coherent memory accesses. In addition, the render back-ends are now clients of the L2 cache. This allows data to remain in the L1 and L2 caches, and not get flushed and refetched over and over again.
Next Page > The Complete AMD Vega GPU Architecture Slides