Tag Archives: AMD CDNA

AMD CDNA Architecture : Tech Highlights!

In addition to the gaming-centric RDNA architecture, AMD just introduced a new CDNA architecture that is optimised for compute workloads.

Here are some key tech highlights of the new AMD CDNA architecture!

 

AMD CDNA Architecture : What Is It?

Unlike the fixed-function graphics accelerators of the past, GPUs are now fully-programmable accelerators using what’s called the GPGPU (General Purpose GPU) Architecture.

GPGPU allowed the industry to leverage the tremendous processing power of GPUs for machine learning and scientific computing purposes.

Instead of continuing down the GPGPU path, AMD has decided to introduce two architectures :

  • AMD RDNA : optimised for gaming to maximise frames per second
  • AMD CDNA : optimised for compute workloads to maximise floating-point operations per second (FLOPS)

Designed to accelerate compute workloads, AMD CDNA augments scalar and vector processing with new Matrix Core Engines, and adds Infinity Fabric technology for scale-up capability.

This allows the first CDNA-based accelerator – AMD Instinct MI100 – to break the 10 TFLOPS (FP64) barrier.

The GPU is connected to its host processor using a PCI Express 4.0 interface that delivers up to 32 GB/s of bandwidth in each direction.
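As a rough check on the 32 GB/s figure, here is a minimal sketch of the arithmetic. The article only names PCI Express 4.0, so the per-lane rate (16 GT/s), the 128b/130b encoding and the x16 link width used below are assumptions taken from the general PCIe 4.0 specification, not from AMD.

```python
# Assumed PCIe 4.0 parameters (not stated in the article):
# 16 GT/s per lane, 128b/130b line encoding, and an x16 link.
GT_PER_S_PER_LANE = 16e9
ENCODING_EFFICIENCY = 128 / 130
LANES = 16


def pcie_bandwidth_gb_s(lanes: int = LANES) -> float:
    """Usable bandwidth in GB/s for ONE direction of the link."""
    bits_per_second = GT_PER_S_PER_LANE * ENCODING_EFFICIENCY * lanes
    return bits_per_second / 8 / 1e9


per_direction = pcie_bandwidth_gb_s()
print(f"per direction: {per_direction:.1f} GB/s")
print(f"both directions combined: {2 * per_direction:.1f} GB/s")
```

Under these assumptions, each direction comes to roughly 31.5 GB/s, which is usually rounded up to 32 GB/s.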

 

AMD CDNA Architecture : Compute Units

The command processor and scheduling logic receive API-level commands and translate them into compute tasks.

These compute tasks are implemented as compute arrays and managed by the four Asynchronous Compute Engines (ACE), each of which maintains its own independent stream of commands to the compute units.

The MI100's 120 compute units (CUs) are derived from the earlier GCN architecture and organised into four compute engines, which execute wavefronts of 64 work-items each.
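The figures above can be tied together with some simple arithmetic. The sketch below is illustrative only: the CU count, engine count and wavefront size come from the article, while the 64 FP32 lanes per CU, the 1.502 GHz peak clock and the half-rate FP64 ratio are assumptions about the MI100 that the article does not state.

```python
import math

COMPUTE_UNITS = 120      # from the article
COMPUTE_ENGINES = 4      # from the article
WAVEFRONT_SIZE = 64      # work-items per wavefront, from the article

# Assumptions (not stated in the article):
FP32_LANES_PER_CU = 64   # assumed vector lanes per CU
PEAK_CLOCK_HZ = 1.502e9  # assumed MI100 peak engine clock
FP64_RATE = 0.5          # assumed FP64 throughput relative to FP32


def wavefronts_needed(work_items: int) -> int:
    """Wavefronts required to cover a given number of work-items."""
    return math.ceil(work_items / WAVEFRONT_SIZE)


def peak_fp64_tflops() -> float:
    """Peak vector FP64 rate; a fused multiply-add counts as 2 operations."""
    fp32_flops = COMPUTE_UNITS * FP32_LANES_PER_CU * 2 * PEAK_CLOCK_HZ
    return fp32_flops * FP64_RATE / 1e12


cus_per_engine = COMPUTE_UNITS // COMPUTE_ENGINES
print(f"CUs per compute engine: {cus_per_engine}")
print(f"wavefronts for 1000 work-items: {wavefronts_needed(1000)}")
print(f"peak FP64: {peak_fp64_tflops():.1f} TFLOPS")
```

With these assumed parameters, the peak FP64 rate lands at about 11.5 TFLOPS, consistent with the claim that the MI100 exceeds 10 TFLOPS.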

The CUs are, however, enhanced with new Matrix Core Engines, which are optimised for matrix data processing.

Here is the block diagram of the AMD Instinct MI100 accelerator, showing how its main blocks are all tied together with the on-die Infinity Fabric.

Unlike the RDNA architecture, CDNA removes all of the fixed-function graphics hardware, including the units for rasterisation, tessellation and blending, the graphics caches, and even the display engine.

CDNA retains the dedicated logic for HEVC, H.264 and VP9 decoding that is sometimes used for compute workloads that operate on multimedia data.

The new Matrix Core Engines add a new family of wavefront-level instructions – the Matrix Fused Multiply-Add (MFMA). The MFMA instructions perform mixed-precision arithmetic and operate on KxN matrices using four different types of input data :

  • INT8 – 8-bit integers
  • FP16 – 16-bit half-precision
  • bf16 – 16-bit brain floating point (bfloat16)
  • FP32 – 32-bit single-precision
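To make the idea of a matrix fused multiply-add concrete, here is a minimal pure-Python sketch of the operation D = A x B + C using INT8 inputs. It is not AMD's instruction semantics: the function name `mfma`, the range check and the use of an unbounded Python integer as the accumulator are all assumptions made for illustration, standing in for "narrow inputs, wider accumulator".

```python
def mfma(a, b, c):
    """Return D = A x B + C for row-major lists of lists.

    Inputs A and B are checked to be in the INT8 range; the accumulator C
    and the result D are plain Python ints, i.e. wider than the inputs
    (an assumption standing in for mixed-precision accumulation).
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must have the shape of A x B")
    for matrix in (a, b):
        for row in matrix:
            if any(not -128 <= value <= 127 for value in row):
                raise ValueError("input value outside the INT8 range")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


a = [[127, 127], [-128, 1]]
b = [[127, 2], [127, 3]]
c = [[1, 0], [0, 0]]
d = mfma(a, b, c)
print(d)
```

Note that the first result element (32259) is far outside the INT8 range, which is why the accumulator has to be wider than the inputs.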

The new Matrix Core Engines have several advantages over the traditional vector pipelines in GCN :

  • the execution unit reduces the number of register file reads, since many input values are reused in a matrix multiplication
  • narrower datatypes create opportunity for workloads that do not require full FP32 precision, e.g. machine learning – saving energy.
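The register-reuse argument can be illustrated with a deliberately idealised counting model. The two functions below are hypothetical and ignore accumulator traffic and real hardware details: one assumes a vector pipeline that reads both operands from the register file for every multiply-add, the other assumes a matrix unit that reads each input element only once and reuses it internally.

```python
def vector_operand_reads(m: int, n: int, k: int) -> int:
    """Operand reads if every multiply-add fetches one A and one B value."""
    return 2 * m * n * k


def matrix_operand_reads(m: int, n: int, k: int) -> int:
    """Operand reads if each element of A (m x k) and B (k x n) is read once."""
    return m * k + k * n


M = N = K = 16  # example tile size, chosen only for illustration
vector_reads = vector_operand_reads(M, N, K)
matrix_reads = matrix_operand_reads(M, N, K)
print(f"vector pipeline reads: {vector_reads}")
print(f"matrix unit reads:     {matrix_reads}")
print(f"reduction factor:      {vector_reads // matrix_reads}x")
```

In this simplified model, a 16x16x16 tile needs 16 times fewer operand reads. The real saving depends on the hardware and on the instruction shape.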

 

AMD CDNA Architecture : L2 Cache + Memory

Most scientific and machine learning data sets are gigabytes or even terabytes in size. Therefore, L2 cache and memory performance are critical.

In CDNA, the L2 cache is shared across the entire chip, and physically partitioned into multiple slices.

The MI100, specifically, has an 8 MB L2 cache that is 16-way set-associative and made up of 32 slices. Each slice can sustain 128 bytes per clock, for an aggregate bandwidth of over 6 TB/s across the GPU.
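The aggregate L2 bandwidth follows from the slice count and the per-slice width. The sketch below assumes that the 128 bytes are delivered per clock cycle and that the cache runs at an assumed peak clock of 1.502 GHz. Neither the clock nor the per-slice capacity is stated in the article.

```python
L2_BYTES = 8 * 1024 * 1024          # 8 MB, from the article
SLICES = 32                         # from the article
BYTES_PER_CLOCK_PER_SLICE = 128     # from the article, assumed to be per clock
CLOCK_HZ = 1.502e9                  # assumption: MI100 peak engine clock

slice_kib = L2_BYTES // SLICES // 1024
l2_bandwidth_tb_s = SLICES * BYTES_PER_CLOCK_PER_SLICE * CLOCK_HZ / 1e12

print(f"capacity per slice: {slice_kib} KiB")
print(f"aggregate L2 bandwidth: {l2_bandwidth_tb_s:.2f} TB/s")
```

Under these assumptions, the result is a little over 6 TB/s, in line with the figure quoted above.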

The CDNA memory controller can drive 4- or 8-high stacks of HBM2 memory at 2.4 GT/s for a maximum throughput of 1.23 TB/s.
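The 1.23 TB/s figure can be reproduced from the transfer rate once a bus width is assumed. The article does not give the width, so the sketch below assumes four HBM2 stacks with a 1024-bit interface each (4096 bits in total).

```python
TRANSFER_RATE = 2.4e9     # transfers per second per pin, from the article
HBM2_STACKS = 4           # assumption: number of stacks on the package
BITS_PER_STACK = 1024     # assumption: HBM2 interface width per stack

bus_width_bits = HBM2_STACKS * BITS_PER_STACK
memory_bandwidth_tb_s = bus_width_bits * TRANSFER_RATE / 8 / 1e12

print(f"bus width: {bus_width_bits} bits")
print(f"peak memory bandwidth: {memory_bandwidth_tb_s:.2f} TB/s")
```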

The memory contents are also protected by hardware ECC.

 

AMD CDNA Architecture : Communication + Scaling

CDNA is also designed for scaling up, using the high-speed Infinity Fabric technology to connect multiple GPUs.

AMD Infinity Fabric links are 16 bits wide and operate at 23 GT/s, with three links in CDNA to allow for full connectivity in a quad-GPU configuration.
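A quick calculation shows what these link parameters imply. The sketch below uses only the width, transfer rate and link count given above. It treats the quoted rate as the raw per-direction rate and ignores any protocol overhead, which is an assumption.

```python
LINK_WIDTH_BITS = 16      # from the article
TRANSFER_RATE = 23e9      # transfers per second, from the article
LINKS_PER_GPU = 3         # from the article


def links_for_full_connectivity(gpus: int) -> int:
    """Each GPU needs one link to every other GPU."""
    return gpus - 1


per_link_gb_s = LINK_WIDTH_BITS * TRANSFER_RATE / 8 / 1e9
per_gpu_gb_s = per_link_gb_s * LINKS_PER_GPU

print(f"per link, per direction: {per_link_gb_s:.0f} GB/s")
print(f"all three links, per direction: {per_gpu_gb_s:.0f} GB/s")
print(f"links needed per GPU in a quad-GPU setup: {links_for_full_connectivity(4)}")
```

This works out to 46 GB/s per link in each direction, and it shows why three links are exactly enough to fully connect four GPUs.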

While the last-generation Radeon Instinct MI50 GPU only uses a ring topology, the new fully-connected Infinity Fabric topology boosts performance for common communication patterns like all-reduce and scatter / gather.
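One way to see the difference between the two topologies is to count hops between GPUs. The sketch below models four GPUs as a simple graph and measures the worst-case hop count with a breadth-first search. The helper names are hypothetical, and the model says nothing about actual link bandwidth or routing.

```python
from collections import deque


def ring(n):
    """Each GPU links only to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fully_connected(n):
    """Each GPU links directly to every other GPU."""
    return {i: set(range(n)) - {i} for i in range(n)}


def max_hops(graph):
    """Worst-case number of hops between any two GPUs (breadth-first search)."""
    worst = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for peer in graph[node]:
                if peer not in dist:
                    dist[peer] = dist[node] + 1
                    queue.append(peer)
        worst = max(worst, max(dist.values()))
    return worst


for name, graph in (("ring", ring(4)), ("fully connected", fully_connected(4))):
    links = len(graph[0])
    print(f"{name}: {links} links per GPU, worst case {max_hops(graph)} hop(s)")
```

In the four-GPU ring, traffic between opposite GPUs has to pass through an intermediate GPU, whereas the fully connected layout reaches every peer in a single hop.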

Unlike PCI Express, Infinity Fabric links support coherent GPU memory, which lets multiple GPUs share an address space and work closely together on a single task.

 

Support Tech ARP!

If you like our work, you can help support us by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


AMD Graphics Roadmap 2020 by David Wang

At AMD Financial Analyst Day 2020, David Wang unveiled the AMD graphics roadmap for 2020 and beyond. Check it out!

 

David Wang : AMD Senior VP of Engineering, Radeon Technologies Group

David Wang is senior vice president of engineering for the Radeon Technologies Group (RTG) at AMD.

In this role, Wang is responsible for all aspects of graphics engineering, including the technical strategy, architecture, hardware and software for AMD’s graphics products and technologies.

With more than 25 years of graphics and silicon engineering experience, Wang brings deep technical expertise and an excellent track record in managing complex silicon development to AMD.

 

AMD Graphics Roadmap 2020 by David Wang

During AMD Financial Analyst Day 2020, David Wang unveiled the AMD graphics roadmap for 2020 and beyond in his presentation – Driving GPU Leadership.

Here are the key points from David Wang’s presentation :

  • The AMD Radeon DNA (AMD RDNA) architecture was designed for gaming and is currently powering the award-winning AMD Radeon RX 5000 series GPUs.

  • The next-generation AMD RDNA 2 architecture is planned to deliver a 50% performance-per-watt improvement over the first-generation AMD RDNA architecture.
  • The AMD RDNA 2 architecture will support hardware-accelerated ray tracing, variable rate shading (VRS) and other advanced features.
  • The first AMD RDNA 2-based products are expected to launch in late 2020.
  • AMD unveiled its new AMD Compute DNA (AMD CDNA) architecture, designed to accelerate data center compute workloads.
  • The first-generation AMD CDNA architecture, planned to launch later in 2020, includes 2nd Generation AMD Infinity Architecture to enhance GPU to GPU connectivity and is optimized for machine learning and high-performance computing applications.
  • The follow-up AMD CDNA 2 architecture will support 3rd Generation AMD Infinity Architecture to enable next generation exascale-class supercomputers.
  • Expanding on previous generations of the ROCm open source software platform for the data center, AMD plans to introduce ROCm 4.0 later this year as a complete software solution for high-performance computing exascale systems and machine learning workloads.

 
