
AMD Vega Memory Architecture Q&A With Jeffrey Cheng

At the AMD Computex 2017 Press Conference, AMD President & CEO Dr. Lisa Su announced that AMD will launch the Radeon Vega Frontier Edition on 27 June 2017, and the Radeon RX Vega graphics cards at the end of July 2017. We figured this is a great time to revisit the new AMD Vega memory architecture.

Now, who better to tell us all about it than AMD Senior Fellow Jeffrey Cheng, who built the AMD Vega memory architecture? Check out this exclusive Q&A session from the AMD Tech Summit in Sonoma!

Updated @ 2017-06-11 : We clarified the difference between the AMD Vega’s 64-bit flat address space, and the 512 TB addressable memory. We also added new key points, and time stamps for the key points.

Originally posted @ 2017-02-04


 

The AMD Vega Memory Architecture

Jeffrey Cheng is an AMD Senior Fellow in the area of memory architecture. The AMD Vega memory architecture refers to how the AMD Vega GPU manages memory utilisation and handles large datasets. It does not deal with the AMD Vega memory hardware design, which includes the High Bandwidth Cache and HBM2 technology.

 

AMD Vega Memory Architecture Q&A Summary

Here are the key takeaway points from the Q&A session with Jeffrey Cheng :

  • Large amounts of DRAM can be used to handle big datasets, but this is not the best solution because DRAM is costly and consumes lots of power (see 2:54).
  • AMD chose to design a heterogenous memory architecture to support various memory technologies like HBM2 and even non-volatile memory (e.g. Radeon Solid State Graphics) (see 4:40 and 8:13).
  • At any given moment, the amount of data processed by the GPU is limited, so it doesn’t make sense to store a large dataset in DRAM. It would be better to cache the data required by the GPU on very fast memory (e.g. HBM2), and intelligently move it according to the GPU’s requirements (see 5:40).
  • The AMD Vega’s heterogenous memory architecture allows for easy integration of future memory technologies like storage-class memory (flash memory that can be accessed in bytes, instead of blocks) (see 8:13).
  • The AMD Vega has a 64-bit flat address space for its shaders (see 12:08, 12:36 and 18:21), but like NVIDIA, AMD is (very likely) limiting the addressable memory to 49 bits, giving it 512 TB of addressable memory.
  • AMD Vega has full access to the CPU’s 48-bit address space, with additional bits beyond that used to handle its own internal memory, storage and registers (see 12:16). This ties back to the High Bandwidth Cache Controller and heterogenous memory architecture, which allows the use of different memory and storage types.

  • Game developers currently try to manage data and memory usage, often extremely conservatively to support graphics cards with limited amounts of graphics memory (see 16:29).
  • With the introduction of AMD Vega, AMD wants game developers to leave data and memory management to the GPU. Its High Bandwidth Cache Controller and heterogenous memory system will automatically handle it for them (see 17:19).
  • The memory architectural advantages of AMD Vega will initially have little impact on gaming performance (due to the current conservative approach of game developers). This will change when developers hand over data and memory management to the GPU (see 24:42).
  • The improved memory architecture in AMD Vega will mainly benefit AI applications (e.g. deep machine learning) with their large datasets (see 24:52).
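The address-space figures in the key points above can be checked with a little arithmetic. The short Python sketch below converts a number of address bits into addressable capacity. It assumes byte-addressable memory and binary units (1 TB taken as 2^40 bytes), and the function name is our own, not part of any AMD API.

```python
def addressable_bytes(address_bits: int) -> int:
    """Return how many bytes can be addressed with the given number of address bits."""
    return 2 ** address_bits


TB = 2 ** 40  # assumption: "TB" in the article means 2^40 bytes

vega_tb = addressable_bytes(49) // TB  # 49-bit limit quoted for AMD Vega
cpu_tb = addressable_bytes(48) // TB   # 48-bit CPU address space
flat_tb = addressable_bytes(64) // TB  # the full 64-bit flat address space

print(f"49-bit: {vega_tb} TB")  # 512 TB
print(f"48-bit: {cpu_tb} TB")   # 256 TB
print(f"64-bit: {flat_tb} TB")  # 16777216 TB
```

The extra bit beyond the CPU’s 48 bits doubles the addressable range from 256 TB to 512 TB, which leaves room for the GPU’s own memory and storage on top of the full CPU address space.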


 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

The AMD Vega GPU Architecture Tech Report

We can reveal the fourth and, arguably, the biggest news out of the AMD Tech Summit that was held in Sonoma, California from December 7-9, 2016 – details of the new AMD Vega GPU architecture!

In this article, we will reveal the details of not just the Vega NCU (Next-Gen Compute Unit) and the HBM2 memory it uses, but also its spanking new High-Bandwidth Cache Controller. On top of that, we will delve into the new geometry pipeline and pixel engine!

As usual, we will offer you a summary of the key points, and greater details in the subsequent pages. Finally, we will give you the presentation slides and when we get it, the presentation video from the AMD Tech Summit in Sonoma.

 

The 4 Major Features In AMD Vega

As AMD’s next-generation GPU architecture, Vega will come with 4 major features that will help it to leapfrog ahead of competing graphics architectures : the High-Bandwidth Cache, a new programmable geometry pipeline, the Next-Generation Compute Unit (NCU), and the next-generation pixel engine.

We will summarise the key points below.

 

High-Bandwidth Cache

  • High-Bandwidth Cache = HBM2 memory + High-Bandwidth Cache Controller (HBCC)
  • HBM2 memory technology offers :
    • 2X bandwidth per pin over HBM
    • 5X power efficiency over GDDR5
    • 8X capacity / stack
    • Over 50% smaller footprint compared to GDDR5
  • The High-Bandwidth Cache Controller offers :
    • Access to 512 TB virtual address space
    • Adaptive, fine-grained data movement
  • AMD showcased the performance of the High-Bandwidth Cache using a real-time render of Joe Macri’s living room on Radeon ProRender with 700+ GB of data

 

New Programmable Geometry Pipeline

  • The new AMD Vega geometry pipeline has over 2X peak throughput per clock
  • There is a new primitive shader that allows primitives to be discarded at a high rate
  • A new Intelligent Workgroup Distributor allows for improved load balancing

 

Next-Generation Compute Unit (NCU)

  • The AMD Vega NCU has configurable precision, allowing it to process :
    • 512 8-bit ops per clock, or
    • 256 16-bit ops per clock, or
    • 128 32-bit ops per clock
  • It is also optimised for higher clock speeds and higher IPC
  • It boasts a larger instruction buffer

 

Next-Generation Pixel Engine

  • The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser
  • The new rasteriser is designed to improve performance while saving power
  • The on-chip bin cache allows the rasteriser to only “fetch once”
  • The rasteriser also “shades once” by culling pixels that are invisible in the final scene
  • The render back-ends are now clients of the L2 cache, which improves deferred shading performance

 


 


High-Bandwidth Cache

AMD Vega will use HBM2 memory, as well as a new High-Bandwidth Cache Controller (HBCC). Together, they are known as the High-Bandwidth Cache.

AMD showcased the performance of the High-Bandwidth Cache using a real-time render of Joe Macri’s living room on Radeon ProRender with 700+ GB of data. Although no frame rate was visible, the real-time render appeared to be very smooth.

 

The HBM2 Memory

HBM2 offers twice the transfer rate per pin (up to 2 GT/s) of first-generation HBM memory. This allows it to achieve up to 256 GB/s of memory bandwidth per package.

In both HBM and HBM2, up to 8 memory dies can be stacked in a package. But moving to HBM2 allows for twice the memory density – up to 8 GB per package is now possible.
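The bandwidth and capacity figures above follow from simple arithmetic, sketched below in Python. Two inputs are our assumptions rather than figures from this article: a 1024-bit interface per HBM package, and 8 Gb memory dies (which is what 8 GB across 8 dies implies). The function names are hypothetical.

```python
def package_bandwidth_gb_s(transfer_rate_gt_s: float, bus_width_bits: int = 1024) -> float:
    """Peak bandwidth of one HBM package in GB/s.

    Assumption: a 1024-bit interface per package (not stated in the article).
    """
    return transfer_rate_gt_s * bus_width_bits / 8  # 8 bits per byte


def package_capacity_gb(dies: int, gbit_per_die: int) -> float:
    """Capacity of one package in GB, given the number of stacked dies."""
    return dies * gbit_per_die / 8


hbm1_bw = package_bandwidth_gb_s(1.0)  # first-generation HBM at 1 GT/s per pin
hbm2_bw = package_bandwidth_gb_s(2.0)  # HBM2 at up to 2 GT/s per pin
hbm2_cap = package_capacity_gb(dies=8, gbit_per_die=8)  # assumed 8 Gb dies

print(hbm1_bw, hbm2_bw, hbm2_cap)  # 128.0 256.0 8.0
```

Doubling the per-pin transfer rate on the same interface width is what doubles the bandwidth per package.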

 

The High-Bandwidth Cache Controller

The second component of the High-Bandwidth Cache is the new High-Bandwidth Cache Controller (HBCC). It creates a homogenous memory system for the AMD Vega GPU, with up to 512 TB of addressable memory.

It also allows for the adaptive, fine-grained movement of data between the AMD Vega GPU and the system memory, the NVRAM and the network storage (as part of Infinity Fabric).

 

Why Is It Called A Cache?

AMD calls the combination of the HBM2 memory and the High-Bandwidth Cache Controller the “High-Bandwidth Cache”. Techies may wonder why AMD chose to call it “cache”, instead of “memory”. After all, HBM2 memory is a type of fast graphics memory.

The answer lies in the AMD Vega’s heterogenous memory system. All memory in the system, whether it’s the HBM2 memory or shared memory from the computer’s SDRAM, is seen as a contiguous memory space. It is treated as one big block of memory, irrespective of how fast each part of it is.

That may be great for memory addressing, but may cause frequently-used data to be placed in slower memory. To avoid such an occurrence, the High-Bandwidth Cache Controller uses the HBM2 memory like a fast cache. This allows it to keep the most frequently-used data in the fastest memory available – the HBM2 memory.

Hence, the HBM2 memory functions like a cache in AMD Vega, and that is why AMD called the combination the High-Bandwidth Cache.
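To illustrate the idea of using a small, fast memory as a cache in front of a much larger, slower one, here is a toy Python model. It is only a sketch: the least-recently-used (LRU) eviction policy, the page granularity and all the names are our assumptions, since AMD has not described here how the High-Bandwidth Cache Controller actually decides what to keep in HBM2.

```python
from collections import OrderedDict


class ToyHighBandwidthCache:
    """Toy model of a small fast memory caching pages from a larger slow memory.

    Assumption: LRU eviction. The real HBCC policy is not described in the article.
    """

    def __init__(self, fast_capacity_pages: int):
        self.capacity = fast_capacity_pages
        self.fast = OrderedDict()  # page id -> True, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> None:
        if page in self.fast:
            self.hits += 1
            self.fast.move_to_end(page)  # mark as most recently used
            return
        self.misses += 1  # the page has to come from slower memory
        if len(self.fast) >= self.capacity:
            self.fast.popitem(last=False)  # evict the least recently used page
        self.fast[page] = True


cache = ToyHighBandwidthCache(fast_capacity_pages=4)
# Three "hot" pages touched repeatedly, plus the occasional "cold" page.
trace = [0, 1, 2, 0, 1, 2, 100, 0, 1, 2, 101, 0, 1, 2]
for page in trace:
    cache.access(page)

print(cache.hits, cache.misses)  # 9 5
```

Even though the trace touches five different pages and the fast memory only holds four, the frequently-used pages stay resident, so most accesses are served from the fast memory.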


 


The New Programmable Geometry Pipeline

The AMD Vega features a new programmable geometry pipeline. It boasts over twice the peak throughput per clock, compared to the previous-generation geometry pipeline. This is achieved through two new improvements – primitive shaders and the Intelligent Workgroup Distributor.

The primitive shader stage is a completely new addition. It allows for primitives to be discarded at a much higher rate. The Intelligent Workgroup Distributor, on the other hand, improves the load balancing of work going to the large number of pipelines.

 

The Next-Generation Compute Unit (NCU)

The AMD Vega NCU is optimised for higher clock speeds and higher IPC, and it boasts a larger instruction buffer. But what’s unique is its flexible, configurable precision. This allows it to process not just 64-bit and 32-bit operations, but also 16-bit and 8-bit operations. For example, the AMD Vega NCU can process :

  • 512 8-bit ops per clock, or
  • 256 16-bit ops per clock, or
  • 128 32-bit ops per clock

The Vega NCU does not “waste” computing power by only allowing one operation per clock. Smaller operations can be packed together and processed in the same clock, which boosts the performance of “packed math” applications.

This is how the new Radeon Instinct MI25 Vega accelerators deliver 25 teraflops of FP16 compute performance.
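The per-clock figures above scale inversely with the operand width, and a rough peak-throughput estimate follows from them. The Python sketch below shows the arithmetic. The 128 ops per clock baseline comes from the list above, but the 64 compute units and 1.5 GHz clock used for the Radeon Instinct MI25 are our assumptions, not figures given in this article, and the function names are hypothetical.

```python
BASE_OPS_PER_CLOCK = 128  # 32-bit ops per clock per NCU (from the article)
BASE_WIDTH_BITS = 32


def ops_per_clock(width_bits: int) -> int:
    """Ops per clock per NCU when narrower operands are packed together."""
    if width_bits not in (8, 16, 32):
        raise ValueError("this sketch only covers 8-, 16- and 32-bit operations")
    return BASE_OPS_PER_CLOCK * BASE_WIDTH_BITS // width_bits


def peak_tera_ops(width_bits: int, compute_units: int, clock_ghz: float) -> float:
    """Rough peak throughput in tera-ops per second for the whole GPU."""
    return ops_per_clock(width_bits) * compute_units * clock_ghz / 1000


for width in (32, 16, 8):
    print(f"{width}-bit: {ops_per_clock(width)} ops per clock")

# Assumed figures (not from the article): 64 compute units at 1.5 GHz.
mi25_fp16 = peak_tera_ops(16, compute_units=64, clock_ghz=1.5)
print(f"FP16 estimate: {mi25_fp16:.1f} teraflops")
```

Under those assumptions, the FP16 estimate works out to roughly 24.6 teraflops, which is in line with the 25 teraflops quoted for the MI25.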


 

The Next-Generation Pixel Engine

The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser. It is designed to improve performance while saving power. The on-chip bin cache allows it to only “fetch once”, and it culls pixels that are invisible in the final scene so it can also “shade once”.
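The “shade once” idea can be illustrated with a toy Python comparison. This is a heavily simplified sketch of the general principle, not a description of how AMD’s hardware works: the fragment list, the per-pixel bins and both function names are made up for illustration.

```python
from collections import defaultdict

# Each fragment is (pixel, depth, surface). A smaller depth is closer to the camera.
fragments = [
    ((0, 0), 0.9, "wall"),
    ((0, 0), 0.5, "table"),
    ((0, 0), 0.2, "vase"),
    ((1, 0), 0.9, "wall"),
    ((1, 0), 0.4, "chair"),
]


def shade_immediately(frags):
    """Shade every fragment that passes the depth test when it arrives."""
    nearest = {}
    shaded = 0
    for pixel, depth, _surface in frags:
        if depth < nearest.get(pixel, float("inf")):
            nearest[pixel] = depth
            shaded += 1  # work that may be overwritten by a closer fragment later
    return shaded


def shade_after_binning(frags):
    """Collect fragments per pixel first, then shade only the visible one."""
    bins = defaultdict(list)
    for pixel, depth, surface in frags:
        bins[pixel].append((depth, surface))
    visible = {pixel: min(items)[1] for pixel, items in bins.items()}
    return len(visible), visible


immediate_count = shade_immediately(fragments)
binned_count, visible = shade_after_binning(fragments)
print(immediate_count, binned_count)  # 5 2
print(visible)
```

In this toy scene, shading fragments as they arrive does the work 5 times for 2 pixels, while sorting them into bins first means each pixel is shaded only once, for the surface that is actually visible.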

In previous GPU architectures, the pixel and texture memory accesses are non-coherent. That means the same data required by both pixel and texture shaders is not “visible” to both, and has to be fetched and flushed independently. This reduces efficiency and wastes cache bandwidth.

In the AMD Vega GPU, the homogenous memory system allows for coherent memory accesses. In addition, the render back-ends are now clients of the L2 cache. This allows data to remain in the L1 and L2 caches, and not get flushed and refetched over and over again.


 


The Complete AMD Vega GPU Architecture Slides

Here is the full set of official AMD slides on the AMD Vega GPU architecture for your perusal :

We hope to get our hands on the two AMD video presentations on the Vega GPU at the AMD Tech Summit. If we do get them, we will post them here. So check back later! 🙂
