The AMD Vega GPU Architecture Tech Report

Page 3 : The New AMD Vega Geometry Pipeline, Compute Unit & Pixel Engine

The New Programmable Geometry Pipeline

AMD Vega features a new programmable geometry pipeline that delivers more than twice the peak throughput per clock of the previous-generation geometry pipeline. Two improvements make this possible: primitive shaders and the Intelligent Workgroup Distributor.

The primitive shader stage is a completely new addition. It allows primitives to be discarded at a much higher rate. The Intelligent Workgroup Distributor, on the other hand, improves the load balancing of the work sent to the GPU's many pipelines.
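
To make "discarding primitives" concrete, here is a minimal sketch of the kind of test that lets a triangle be thrown away before any pixel work is done. It is not AMD's primitive shader. The function names, the counter-clockwise front-face convention and the screen-bounds check are all assumptions made for illustration.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]


def signed_area(tri: Triangle) -> float:
    """Twice the signed area of a 2D triangle (positive = counter-clockwise)."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)


def is_offscreen(tri: Triangle, width: int, height: int) -> bool:
    """True when the triangle's bounding box lies entirely outside the screen."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    return max(xs) < 0 or min(xs) >= width or max(ys) < 0 or min(ys) >= height


def cull(triangles: List[Triangle], width: int, height: int) -> List[Triangle]:
    """Keep only triangles that could contribute visible pixels."""
    kept = []
    for tri in triangles:
        # Assumption: counter-clockwise triangles face the viewer.
        # Zero area means degenerate, negative area means back-facing.
        if signed_area(tri) <= 0:
            continue
        if is_offscreen(tri, width, height):
            continue
        kept.append(tri)
    return kept
```

Every triangle rejected here never reaches the rasteriser, which is why a higher discard rate translates into higher geometry throughput.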

 

The Next-Generation Compute Unit (NCU)

The AMD Vega NCU is optimised for higher clock speeds and higher IPC (instructions per clock), and it has a larger instruction buffer. What is unique, though, is its flexible, configurable precision. It can process not just 64-bit and 32-bit operations, but also 16-bit and 8-bit operations. For example, the AMD Vega NCU can process:

  • 512 8-bit ops per clock, or
  • 256 16-bit ops per clock, or
  • 128 32-bit ops per clock
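
The three figures above follow a simple pattern: each time the operand width halves, the number of operations per clock doubles. A small sketch of that relationship, using only the numbers listed above (the function name is made up for illustration):

```python
BASE_OPS_PER_CLOCK = 128  # 32-bit ops per clock, from the list above
BASE_WIDTH_BITS = 32


def ops_per_clock(width_bits: int) -> int:
    """Ops per clock for one NCU at the given operand width."""
    # Only the widths quoted in the list above are covered; no rate is
    # given for 64-bit operations, so none is assumed here.
    if width_bits not in (8, 16, 32):
        raise ValueError("no figure quoted for this operand width")
    return BASE_OPS_PER_CLOCK * BASE_WIDTH_BITS // width_bits


for width in (8, 16, 32):
    print(f"{width}-bit: {ops_per_clock(width)} ops per clock")
```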

The Vega NCU does not “waste” computing power by executing only one operation per clock regardless of its size. Smaller operations can be packed together and executed at once, which boosts the performance of “packed math” applications.
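
As a rough illustration of packed math, the sketch below stores two FP16 values in a single 32-bit word and adds both halves together. It only models the data layout; Python handles the two halves one after the other, whereas packed-math hardware would process them as a single operation. The function names are hypothetical and are not AMD instructions.

```python
import struct


def pack_half2(lo: float, hi: float) -> int:
    """Pack two FP16 values into one 32-bit word (lo in the low 16 bits)."""
    return struct.unpack("<I", struct.pack("<2e", lo, hi))[0]


def unpack_half2(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 values."""
    return struct.unpack("<2e", struct.pack("<I", word))


def packed_add(a: int, b: int) -> int:
    """Add the low halves and the high halves of two packed words."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, 4.0)
print(unpack_half2(packed_add(x, y)))  # one 32-bit word carries both results
```

The point of the layout is that a register and datapath sized for one 32-bit value can carry two 16-bit values, so the same hardware width does twice the work.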

This is how the new Radeon Instinct MI25 Vega accelerators deliver 25 teraflops of FP16 compute performance.
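
The arithmetic behind a figure of that size can be sketched as follows. Only the 256 16-bit ops per clock comes from this article. The NCU count and the clock speed are assumptions chosen for illustration, and each op is assumed to count as one floating-point operation.

```python
FP16_OPS_PER_CLOCK_PER_NCU = 256  # from the per-clock figures in this article
FP32_OPS_PER_CLOCK_PER_NCU = 128  # from the per-clock figures in this article
ASSUMED_NCU_COUNT = 64            # assumption, not stated in this article
ASSUMED_CLOCK_GHZ = 1.5           # assumption, not stated in this article


def peak_teraflops(ops_per_clock_per_ncu: int, ncu_count: int, clock_ghz: float) -> float:
    """Peak throughput in teraflops: ops/clock/NCU x NCUs x GHz / 1000."""
    return ops_per_clock_per_ncu * ncu_count * clock_ghz / 1000.0


fp16 = peak_teraflops(FP16_OPS_PER_CLOCK_PER_NCU, ASSUMED_NCU_COUNT, ASSUMED_CLOCK_GHZ)
fp32 = peak_teraflops(FP32_OPS_PER_CLOCK_PER_NCU, ASSUMED_NCU_COUNT, ASSUMED_CLOCK_GHZ)
print(f"FP16: {fp16:.1f} TFLOPS, FP32: {fp32:.1f} TFLOPS")
```

Under those assumptions the FP16 result is about 24.6 teraflops, which rounds to the quoted 25, and the FP32 result is exactly half of that.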

 

The Next-Generation Pixel Engine

The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser, designed to improve performance while saving power. Its on-chip bin cache allows it to “fetch once”, and it culls pixels that are invisible in the final scene so it can also “shade once”.
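
The general idea of binning and "shade once" can be sketched with a toy rasteriser. This is not how Vega's hardware is implemented. The 4-pixel tile size, the use of axis-aligned rectangles as primitives and all names are assumptions for illustration, and the comparison is against a worst case where nothing is rejected by depth.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Rect(NamedTuple):
    name: str
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    depth: float  # smaller = closer to the viewer


TILE = 4  # assumed tile size in pixels


def bin_rects(rects: List[Rect]) -> Dict[Tuple[int, int], List[Rect]]:
    """Assign each non-empty rect to every screen tile it overlaps."""
    bins = defaultdict(list)
    for r in rects:
        for ty in range(r.y0 // TILE, (r.y1 - 1) // TILE + 1):
            for tx in range(r.x0 // TILE, (r.x1 - 1) // TILE + 1):
                bins[(tx, ty)].append(r)
    return bins


def shade_binned(rects: List[Rect]):
    """Resolve visibility per tile, then shade each visible pixel once."""
    frame = {}
    for (tx, ty), batch in bin_rects(rects).items():
        for y in range(ty * TILE, (ty + 1) * TILE):
            for x in range(tx * TILE, (tx + 1) * TILE):
                covering = [r for r in batch
                            if r.x0 <= x < r.x1 and r.y0 <= y < r.y1]
                if covering:
                    # Only the closest primitive is shaded for this pixel.
                    frame[(x, y)] = min(covering, key=lambda r: r.depth).name
    return frame, len(frame)


def shade_unbinned_worst_case(rects: List[Rect]) -> int:
    """Shader invocations if every covered pixel of every rect is shaded."""
    return sum((r.x1 - r.x0) * (r.y1 - r.y0) for r in rects)
```

With a far 8×8 rectangle and a near 4×4 rectangle on top of it, the binned path shades 64 pixels while the worst-case path shades 80, because the hidden part of the far rectangle is never shaded.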

In previous GPU architectures, pixel and texture memory accesses were non-coherent. That means data required by both the pixel and texture paths was not “visible” to both, and had to be flushed and fetched independently. This reduced efficiency and wasted cache bandwidth.

In the AMD Vega GPU, the homogeneous memory system allows for coherent memory accesses. In addition, the render back-ends are now clients of the L2 cache. This allows data to remain in the L1 and L2 caches instead of being flushed and refetched over and over again.
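
A toy model shows why this matters when a render target is written and then read back as a texture. It is a deliberate simplification and not a description of Vega's cache hardware: the function name is hypothetical, and the L2 cache is assumed to be large enough to hold every line written.

```python
from typing import Iterable


def render_then_sample(lines: Iterable[int], shared_l2: bool) -> int:
    """Count off-chip transfers when cache lines are rendered, then sampled."""
    lines = list(lines)
    l2 = set()
    transfers = 0

    # Step 1: the render back-end writes each cache line.
    for line in lines:
        if shared_l2:
            l2.add(line)    # the write lands in the shared L2 and stays on chip
        else:
            transfers += 1  # flushed to memory so the texture path can see it

    # Step 2: the texture path reads the same lines.
    for line in lines:
        if line in l2:
            continue        # L2 hit, no off-chip access
        transfers += 1      # fetched again from memory

    return transfers


print(render_then_sample(range(16), shared_l2=False))
print(render_then_sample(range(16), shared_l2=True))
```

In this model the non-coherent path moves every line off chip twice, once for the flush and once for the refetch, while the shared L2 path moves nothing off chip at all.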
