Tag Archives: AMD Infinity Fabric

AMD Instinct MI100 : 11.5 TFLOPS In A Single Card!

AMD just announced the Instinct MI100 – the world’s fastest HPC GPU accelerator, delivering 11.5 TFLOPS in a single card!

 

AMD Instinct MI100 : 11.5 TFLOPS In A Single Card!

Powered by the new CDNA architecture, the AMD Instinct MI100 is the world’s fastest HPC GPU, and the first to break the 10 TFLOPS FP64 barrier!

Compared to the last-generation AMD accelerators, the AMD Instinct MI100 offers HPC applications almost 3.5X faster performance (FP32 matrix), and AI applications nearly 7X boost in throughput (FP16).

  • up to 11.5 TFLOPS of FP64 performance for HPC
  • up to 46.1 TFLOPS of FP32 Matrix performance for AI and machine learning
  • up to 184.6 TFLOPS of FP16 performance for AI training
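As a rough sanity check, the quoted speed-ups can be turned around to see what last-generation throughput they imply. This is only arithmetic on the figures in this article; the function and variable names are illustrative, and because "almost 3.5X" and "nearly 7X" are rounded claims, the results are approximate.

```python
# Back out the last-generation baseline implied by the quoted speed-ups.
# Inputs come from the article; the speed-up factors are rounded claims,
# so the true baselines would be slightly higher than these values.
MI100_FP32_MATRIX_TFLOPS = 46.1
MI100_FP16_TFLOPS = 184.6


def implied_baseline(new_tflops: float, speedup: float) -> float:
    """Return the older card's throughput implied by a speed-up factor."""
    return new_tflops / speedup


fp32_baseline = implied_baseline(MI100_FP32_MATRIX_TFLOPS, 3.5)
fp16_baseline = implied_baseline(MI100_FP16_TFLOPS, 7.0)

print(f"Implied last-gen FP32: ~{fp32_baseline:.1f} TFLOPS")
print(f"Implied last-gen FP16: ~{fp16_baseline:.1f} TFLOPS")
```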

2nd Gen AMD Infinity Fabric

It also leverages 2nd Gen AMD Infinity Fabric technology to deliver twice the peer-to-peer IO bandwidth of PCI Express 4.0. Thanks to its three Infinity Fabric Links, it offers up to 340 GB/s of aggregate bandwidth per card.

In a server, MI100 GPUs can be configured as two fully-connected quad GPU hives, each providing up to 552 GB/s of P2P IO bandwidth.

Ultra-Fast HBM2 Memory

The AMD Instinct MI100 comes with 32 GB of HBM2 memory that delivers up to 1.23 TB/s of memory bandwidth to support large datasets.

PCI Express 4.0 Interface

The AMD Instinct MI100 supports PCI Express 4.0, allowing for up to 64 GB/s of peak bandwidth from CPU to GPU when paired with 2nd Gen AMD EPYC processors.

AMD Instinct MI100 : Specifications

Specifications : AMD Instinct MI100
Fab Process : 7 nm
Compute Units : 120
Stream Processors : 7,680
Peak BFLOAT16 : 92.3 TFLOPS
Peak INT4 | INT8 : 184.6 TOPS
Peak FP16 : 184.6 TFLOPS
Peak FP32 : 46.1 TFLOPS
Peak FMA32 : 23.1 TFLOPS
Peak FP64 | FMA64 : 11.5 TFLOPS
Memory : 32 GB HBM2
Memory Interface : 4,096 bits
Memory Clock : 1.2 GHz
Memory Bandwidth : 1.2 TB/s
Reliability : Full Chip ECC, RAS Support
Scalability : 3 x Infinity Fabric Links
OS Support : Linux 64-bit
Bus Interface : PCIe Gen 3 / Gen 4
Board Form Factor : Full Height, Dual Slot
Board Length : 10.5 inches
Cooling : Passively Cooled
Max Board Power : 300 W TDP
Warranty : 3-Year Limited
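The specifications above hang together arithmetically. The sketch below derives the engine clock they imply, assuming each stream processor retires one fused multiply-add (two floating-point operations) per clock. That assumption, and the variable names, are ours and not stated in the table.

```python
# Derive the engine clock implied by the specification table.
# Assumption: each stream processor retires one FMA (2 FLOPs) per clock.
STREAM_PROCESSORS = 7_680
COMPUTE_UNITS = 120
PEAK_FMA32_TFLOPS = 23.1
PEAK_FP64_TFLOPS = 11.5

FLOPS_PER_FMA = 2  # one multiply plus one add

# Peak FLOPS = stream processors x FLOPs per clock x clock rate.
implied_clock_ghz = (
    PEAK_FMA32_TFLOPS * 1e12 / (STREAM_PROCESSORS * FLOPS_PER_FMA) / 1e9
)

# Stream processors per compute unit.
sp_per_cu = STREAM_PROCESSORS // COMPUTE_UNITS

# FP64 throughput relative to FP32 FMA throughput.
fp64_to_fp32_ratio = PEAK_FP64_TFLOPS / PEAK_FMA32_TFLOPS

print(f"Implied engine clock: ~{implied_clock_ghz:.3f} GHz")
print(f"Stream processors per CU: {sp_per_cu}")
print(f"FP64 : FP32 ratio: ~{fp64_to_fp32_ratio:.2f}")
```

Under that assumption, the table implies a clock of roughly 1.5 GHz, 64 stream processors per compute unit, and FP64 running at about half the FP32 FMA rate.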

 

AMD Instinct MI100 : Availability

The AMD Instinct MI100 will be available in systems by the end of 2020 from OEM/ODM partners like Dell, Gigabyte, Hewlett Packard Enterprise (HPE), and Supermicro.

 

 

Support Tech ARP!

If you like our work, you can help support us by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


AMD CDNA Architecture : Tech Highlights!

In addition to the gaming-centric RDNA architecture, AMD just introduced a new CDNA architecture that is optimised for compute workloads.

Here are some key tech highlights of the new AMD CDNA architecture!

 

AMD CDNA Architecture : What Is It?

Unlike the fixed-function graphics accelerators of the past, GPUs are now fully-programmable accelerators using what’s called the GPGPU (General Purpose GPU) Architecture.

GPGPU allowed the industry to leverage their tremendous processing power for machine learning and scientific computing purposes.

Instead of continuing down the GPGPU path, AMD has decided to introduce two architectures :

  • AMD RDNA : optimised for gaming to maximise frames per second
  • AMD CDNA : optimised for compute workloads to maximise FLOPS.

Designed to accelerate compute workloads, AMD CDNA augments scalar and vector processing with new Matrix Core Engines, and adds Infinity Fabric technology for scale-up capability.

This allows the first CDNA-based accelerator – AMD Instinct MI100 – to break the 10 TFLOPS (FP64) barrier.

The GPU is connected to its host processor through a PCI Express 4.0 interface, which delivers up to 32 GB/s of bandwidth in each direction.
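The 32 GB/s figure matches the rounded theoretical rate of a x16 PCI Express 4.0 link in one direction. Here is a minimal sketch of that calculation, assuming a x16 link, 16 GT/s per lane and 128b/130b line encoding. These are standard PCIe 4.0 parameters rather than figures from the article, and the function name is illustrative.

```python
# Theoretical PCI Express bandwidth per direction.
# Assumptions: a x16 link, 16 GT/s per lane (PCIe 4.0), 128b/130b encoding.
def pcie_gb_per_s(gt_per_s: float, lanes: int, encoding: float = 128 / 130) -> float:
    """Return usable bandwidth in GB/s for one direction of the link."""
    bits_per_s = gt_per_s * 1e9 * lanes * encoding
    return bits_per_s / 8 / 1e9


per_direction = pcie_gb_per_s(16.0, 16)
both_directions = 2 * per_direction

print(f"Per direction: ~{per_direction:.1f} GB/s")
print(f"Both directions combined: ~{both_directions:.1f} GB/s")
```

Counting both directions together gives roughly 63 GB/s, which is in line with the 64 GB/s peak figure quoted for the MI100.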

 

AMD CDNA Architecture : Compute Units

The command processor and scheduling logic receive API-level commands and translate them into compute tasks.

These compute tasks are implemented as compute arrays and managed by the four Asynchronous Compute Engines (ACE), each of which maintains an independent stream of commands to the compute units.

The MI100’s 120 compute units (CUs) are derived from the earlier GCN architecture, and are organised into four compute engines that execute wavefronts of 64 work-items each.
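To make the wavefront size concrete, here is a small sketch of how a workload maps onto 64-wide wavefronts. It assumes the compute units are split evenly across the four compute engines, which the article does not state, and the function name is illustrative.

```python
import math

WAVEFRONT_SIZE = 64    # work-items per wavefront (from the article)
COMPUTE_UNITS = 120    # from the article
COMPUTE_ENGINES = 4    # from the article


def wavefronts_needed(work_items: int) -> int:
    """Return how many wavefronts are needed to cover the work-items.

    A partially filled wavefront still occupies a full wavefront slot.
    """
    return math.ceil(work_items / WAVEFRONT_SIZE)


# Assumption: compute units are divided evenly between the engines.
cus_per_engine = COMPUTE_UNITS // COMPUTE_ENGINES

print(wavefronts_needed(1000))  # 1000 work-items need 16 wavefronts
print(cus_per_engine)           # 30 CUs per engine under an even split
```

Note that 1,000 work-items fill 15 wavefronts completely and leave the 16th mostly idle, which is why workloads sized in multiples of 64 tend to use the hardware more fully.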

The CUs are, however, enhanced with new Matrix Core Engines, that are optimised for matrix data processing.

Here is the block diagram of the AMD Instinct MI100 accelerator, showing how its main blocks are all tied together with the on-die Infinity Fabric.

Unlike the RDNA architecture, CDNA removes all of the fixed-function graphics hardware for tasks like rasterisation, tessellation, graphics caches, blending and even the display engine.

CDNA retains the dedicated logic for HEVC, H.264 and VP9 decoding that is sometimes used for compute workloads that operate on multimedia data.

The new Matrix Core Engines add a new family of wavefront-level instructions – the Matrix Fused Multiply-Add (MFMA). The MFMA instructions perform mixed-precision arithmetic and operate on KxN matrices using four different types of input data :

  • INT8 – 8-bit integers
  • FP16 – 16-bit half-precision
  • BF16 – 16-bit brain floating point (bfloat16)
  • FP32 – 32-bit single-precision
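The idea behind a mixed-precision matrix fused multiply-add can be sketched in plain Python as D = A x B + C, with narrow inputs and a wider accumulator. This is an illustration of the concept only, not AMD's instruction semantics: the function names are hypothetical, the matrix shapes are arbitrary, and Python's 64-bit floats stand in for the wider accumulator.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mfma(a, b, c):
    """Return D = A x B + C for row-major lists of lists.

    A (M x K) and B (K x N) are rounded to FP16 on input. The products are
    accumulated at higher precision, starting from C (M x N).
    """
    m, k, n = len(a), len(b), len(b[0])
    d = [row[:] for row in c]
    for i in range(m):
        for j in range(n):
            acc = d[i][j]
            for p in range(k):
                acc += to_fp16(a[i][p]) * to_fp16(b[p][j])
            d[i][j] = acc
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, 0.0], [0.0, 0.5]]
c = [[1.0, 1.0], [1.0, 1.0]]
d = mfma(a, b, c)
print(d)
```

The inputs lose precision when they are narrowed (0.1, for example, is not exactly representable in FP16), but the running sum is kept in a wider format so rounding errors do not pile up across the K products.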

The new Matrix Core Engines have several advantages over the traditional vector pipelines in GCN :

  • the execution unit reduces the number of register file reads, since many input values are reused in a matrix multiplication
  • narrower datatypes create opportunity for workloads that do not require full FP32 precision, e.g. machine learning – saving energy.
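The first point, input reuse, can be quantified with a simple count. The sketch below compares the operand reads made by one scalar FMA per product against reading every input element exactly once. The 32 x 32 x 8 tile size is an arbitrary example, accumulator traffic is ignored, and the function name is illustrative.

```python
def operand_reads(m: int, n: int, k: int) -> dict:
    """Count input operand reads for an (m x k) by (k x n) matrix multiply."""
    # One scalar FMA per (i, j, p) triple, each reading one A and one B value.
    naive_reads = 2 * m * n * k
    # Reading each element of A (m x k) and B (k x n) exactly once.
    unique_inputs = m * k + k * n
    return {
        "naive": naive_reads,
        "unique": unique_inputs,
        "reuse": naive_reads / unique_inputs,
    }


stats = operand_reads(32, 32, 8)
print(stats)
```

For this example tile, every input value is used 32 times on average, so a unit that holds inputs locally needs far fewer register file reads than one that fetches two operands for every FMA.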

 

AMD CDNA Architecture : L2 Cache + Memory

Most scientific and machine learning data sets are gigabytes or even terabytes in size, so L2 cache and memory performance are critical.

In CDNA, the L2 cache is shared across the entire chip, and physically partitioned into multiple slices.

The MI100, specifically, has an 8 MB L2 cache that is 16-way set-associative and made up of 32 slices. Each slice can sustain 128 bytes per clock, for an aggregate bandwidth of over 6 TB/s across the GPU.
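The aggregate L2 figure can be reproduced if the 128 bytes per slice is read as a per-clock rate. The sketch below assumes a clock of about 1.5 GHz, which is our assumption rather than a number from this section, and an even split of capacity across slices.

```python
L2_SLICES = 32                   # from the article
BYTES_PER_SLICE_PER_CLOCK = 128  # from the article, assumed to be per clock
L2_CAPACITY_MB = 8               # from the article
ASSUMED_CLOCK_GHZ = 1.5          # assumption, not stated in this section

# Bytes the whole L2 can deliver in one clock cycle.
bytes_per_clock = L2_SLICES * BYTES_PER_SLICE_PER_CLOCK

# Aggregate bandwidth in TB/s at the assumed clock.
aggregate_tb_per_s = bytes_per_clock * ASSUMED_CLOCK_GHZ * 1e9 / 1e12

# Capacity of each slice, assuming an even split.
slice_capacity_kb = L2_CAPACITY_MB * 1024 // L2_SLICES

print(f"{bytes_per_clock} bytes per clock")
print(f"~{aggregate_tb_per_s:.2f} TB/s aggregate")
print(f"{slice_capacity_kb} KB per slice")
```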

The CDNA memory controller can drive 4-high or 8-high stacks of HBM2 memory at 2.4 GT/s, for a maximum throughput of 1.23 TB/s.
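The 1.23 TB/s figure follows directly from the transfer rate and the MI100's 4,096-bit memory interface. Here is the arithmetic as a short sketch; the variable names are illustrative, and the double-data-rate relationship between the 1.2 GHz memory clock and 2.4 GT/s is our reading of the published figures.

```python
MEMORY_INTERFACE_BITS = 4_096  # MI100 memory interface width
MEMORY_CLOCK_GHZ = 1.2         # MI100 memory clock
TRANSFERS_PER_CLOCK = 2        # assumption: double data rate

# Transfer rate in GT/s.
transfer_rate_gt_s = MEMORY_CLOCK_GHZ * TRANSFERS_PER_CLOCK

# Bandwidth = bus width in bytes x transfers per second.
bandwidth_gb_s = MEMORY_INTERFACE_BITS / 8 * transfer_rate_gt_s
bandwidth_tb_s = bandwidth_gb_s / 1000

print(f"{transfer_rate_gt_s:.1f} GT/s")
print(f"{bandwidth_gb_s:.1f} GB/s = ~{bandwidth_tb_s:.2f} TB/s")
```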

The memory contents are also protected by hardware ECC.

 

AMD CDNA Architecture : Communication + Scaling

CDNA is also designed for scaling up, using the high-speed Infinity Fabric technology to connect multiple GPUs.

AMD Infinity Fabric links are 16 bits wide and operate at 23 GT/s, with three links in CDNA to allow for full connectivity in a quad-GPU configuration.

While the last generation Radeon Instinct MI50 GPU only uses a ring topology, the new fully-connected Infinity Fabric topology boosts performance for common communication patterns like all-reduce and scatter / gather.
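One way to see the difference between the two topologies is to count hops. The sketch below builds a four-GPU ring and a four-GPU fully-connected mesh and measures the worst-case hop count between any two GPUs. It models hop count only, not bandwidth or contention, and all names are illustrative.

```python
from collections import deque


def ring(n: int) -> dict:
    """Each GPU links to its two neighbours."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fully_connected(n: int) -> dict:
    """Each GPU links directly to every other GPU."""
    return {i: set(range(n)) - {i} for i in range(n)}


def diameter(graph: dict) -> int:
    """Return the worst-case hop count between any two nodes (via BFS)."""
    worst = 0
    for start in graph:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


ring_hops = diameter(ring(4))
mesh_hops = diameter(fully_connected(4))
links_per_gpu_mesh = len(fully_connected(4)[0])

print(ring_hops, mesh_hops, links_per_gpu_mesh)
```

In the four-GPU ring, the farthest GPU is two hops away, while the fully-connected mesh reaches every peer in one hop, at the cost of three links per GPU.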

Unlike PCI Express, Infinity Fabric links support coherent GPU memory, which lets multiple GPUs share an address space and tightly work on a single task.
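The link width and speed above can be combined to reproduce the headline bandwidth figures quoted for the MI100. The sketch below is one reading that matches those numbers, assuming each link is counted in both directions and that the 340 GB/s per-card aggregate includes the 64 GB/s PCI Express 4.0 peak. Those two assumptions, and the variable names, are ours.

```python
from math import comb

LINK_WIDTH_BITS = 16  # from the article
LINK_RATE_GT_S = 23   # from the article
LINKS_PER_GPU = 3     # from the article
PCIE_GB_S = 64        # peak PCIe 4.0 figure quoted for the MI100
GPUS_PER_HIVE = 4     # quad-GPU hive

# One link, one direction: width in bytes x transfer rate.
per_direction_gb_s = LINK_WIDTH_BITS * LINK_RATE_GT_S / 8

# Assumption: headline figures count both directions of each link.
per_link_gb_s = 2 * per_direction_gb_s

# Peer-to-peer bandwidth of one card across its three links.
p2p_per_card_gb_s = LINKS_PER_GPU * per_link_gb_s

# Assumption: the per-card aggregate adds the PCIe bandwidth on top.
aggregate_per_card_gb_s = p2p_per_card_gb_s + PCIE_GB_S

# A fully-connected hive has one link per pair of GPUs.
hive_links = comb(GPUS_PER_HIVE, 2)
hive_p2p_gb_s = hive_links * per_link_gb_s

print(per_direction_gb_s, per_link_gb_s)
print(p2p_per_card_gb_s, aggregate_per_card_gb_s)
print(hive_links, hive_p2p_gb_s)
```

Under these assumptions, each link carries 92 GB/s, three links give 276 GB/s of peer-to-peer bandwidth per card (340 GB/s with PCIe), and the six links of a quad-GPU hive add up to 552 GB/s.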

 


AMD Graphics Roadmap 2020 by David Wang

At AMD Financial Analyst Day 2020, David Wang unveiled the AMD graphics roadmap for 2020 and beyond. Check it out!

 

David Wang : AMD Senior VP of Engineering, Radeon Technologies Group

David Wang is senior vice president of engineering for the Radeon Technologies Group (RTG) at AMD.

In this role, Wang is responsible for all aspects of graphics engineering, including the technical strategy, architecture, hardware and software for AMD’s graphics products and technologies.

With more than 25 years of graphics and silicon engineering experience, Wang brings deep technical expertise and an excellent track record in managing complex silicon development to AMD.

 

AMD Graphics Roadmap 2020 by David Wang

During AMD Financial Analyst Day 2020, David Wang unveiled the AMD graphics roadmap for 2020 and beyond in his presentation – Driving GPU Leadership.

Here are the key points from David Wang’s presentation :

  • The AMD Radeon DNA (AMD RDNA) architecture was designed for gaming and is currently powering the award-winning AMD Radeon RX 5000 series GPUs.
  • The next-generation AMD RDNA 2 architecture is planned to deliver a 50% performance-per-watt improvement over the first-generation AMD RDNA architecture.
  • The AMD RDNA 2 architecture will support hardware-accelerated ray tracing, variable rate shading (VRS) and other advanced features.
  • The first AMD RDNA 2-based products are expected to launch in late 2020.
  • AMD unveiled its new AMD Compute DNA (AMD CDNA) architecture, designed to accelerate data center compute workloads.
  • The first-generation AMD CDNA architecture, planned to launch later in 2020, includes 2nd Generation AMD Infinity Architecture to enhance GPU to GPU connectivity and is optimized for machine learning and high-performance computing applications.
  • The follow-up AMD CDNA 2 architecture will support 3rd Generation AMD Infinity Architecture to enable next generation exascale-class supercomputers.
  • Expanding on previous generations of the ROCm open source software platform for the data center, AMD plans to introduce ROCm 4.0 later this year as a complete software solution for high-performance computing exascale systems and machine learning workloads.

 


AMD Computing Roadmap 2020 by Mark Papermaster

At AMD Financial Analyst Day 2020, Mark Papermaster unveiled the AMD computing roadmap for 2020 and beyond. Check it out!

 

Mark Papermaster : AMD CTO & EVP (Technology & Engineering)

Mark Papermaster is chief technology officer and executive vice president of Technology and Engineering at AMD and is responsible for corporate technical direction, product development including system-on-chip (SOC) methodology, microprocessor design, I/O and memory and advanced research.

He led the re-design of engineering processes at AMD and the development of the award-winning “Zen” high-performance x86 CPU family, high-performance GPUs and the company’s modular design approach, Infinity Fabric. He also oversees Information Technology that delivers AMD’s compute infrastructure and services.

 

AMD Computing Roadmap 2020 by Mark Papermaster

During AMD Financial Analyst Day 2020, Mark Papermaster unveiled the AMD computing roadmap for 2020 and beyond in his presentation – Future of High Performance.

Here are the key points from Mark Papermaster’s presentation :

  • AMD plans to introduce the first processors based on its next-generation 7nm Zen 3 core in late 2020.
  • The Zen 4 core is currently in design and is targeted to use advanced 5nm process technology.
  • AMD unveiled plans to expand its chiplet and die stacking leadership, including new X3D packaging that combines chiplets and hybrid 2.5D and 3D die stacking to deliver more than a 10x increase in bandwidth density.
  • AMD announced its upcoming 3rd Generation AMD Infinity Architecture with optimized CPU and GPU memory coherency that can enable significant performance improvements and simplify the software programming required for accelerated computing solutions by allowing the CPU and GPU to seamlessly and coherently share the same memory.
  • AMD is building on its strong product security portfolio with expanded features. AMD announced it joined the Confidential Computing Consortium, a group of leading hardware and software companies working to close gaps to protect data through its entire lifecycle.

 


AMD 7nm Vega Presentation + Demo + First Look!

One of the biggest revelations at the AMD Computex 2018 press conference was how far along AMD is with its 7nm efforts. Everything appears to be chugging along as planned. AMD not only shared new details about the 7nm Vega GPU, they also showed off an actual sample!

 

The 7nm Vega Revealed!

Let’s start with this presentation on the 7nm Vega by David Wang, Senior Vice-President of Engineering at the Radeon Technologies Group. Gilbert Leung then demonstrated the performance of the 7nm Vega GPU, which has 32 GB of HBM2 memory, running Cinema4D R19 with Radeon ProRender.

Here are the key points from his presentation :

  • The AMD graphics roadmap from 2017 has not changed. The AMD Vega architecture will get a 7nm die shrink this year, before an architectural change with AMD Navi in 2019.
  • The 7nm die shrink will double power efficiency, and increase performance by 1.35X.
  • The first 7nm Vega GPU will be used in their Radeon Instinct Vega 7nm accelerator, just like how the first Vega GPUs were used in their first generation Radeon Instinct accelerators.

  • In addition to the 7nm die shrink, the Radeon Instinct Vega 7nm accelerator will feature the AMD Infinity Fabric interconnect for better multi GPU performance.
  • The Radeon Instinct Vega 7nm accelerator will also support hardware virtualisation for better security and performance in virtualised environments.

  • The Radeon Instinct Vega 7nm accelerator will come with new deep learning operations, that will not only help accelerate training and inference, but also blockchain applications.
  • The 7nm Vega GPU is sampling right now, and will launch in the second half of 2018 as the Radeon Instinct Vega 7nm accelerator.

 

First Look At 7nm Vega + 7nm EPYC!

In this video, Dr. Lisa Su shows off engineering samples of the 7nm EPYC processor (on the left), and the 7nm Vega GPU (on the right).


The Complete AMD Ryzen “Summit Ridge” Tech Briefing

The AMD Tech Summit held in Sonoma, California from Dec 7-9, 2016 was not only very exclusive, it was highly secretive. The second major announcement we have been allowed to reveal is the new AMD Ryzen desktop CPU, formerly known as Summit Ridge.

Like our Radeon Instinct article, we will not only share what AMD revealed during the AMD Tech Summit at Sonoma, we will bring it to you in our videos. It will be as if you were there with us! Enjoy! 🙂

 

The AMD Ryzen Tech Briefing Summarised

For those who just want the quick low-down on the AMD Ryzen desktop processor, here are the key takeaway points :

  • The AMD Zen “Summit Ridge” processor is officially branded as the AMD Ryzen processor.
  • The first AMD Ryzen “Summit Ridge” processors will officially launch in Q1, 2017.
  • The AMD Zen-based server processor, codenamed Naples, will launch in Q2, 2017.
  • The AMD Zen-based notebook APU, codenamed Raven Ridge, will launch in H2, 2017.
  • The top Ryzen “Summit Ridge” processor SKU will have a 3.4 GHz base clock, or better. Its boost clock was not revealed.
  • The Ryzen “Summit Ridge” processors will have 8 cores and process 16 threads simultaneously.
  • The Ryzen “Summit Ridge” processors will have a 4 MB L2 cache and a 16 MB L3 cache.
  • The Ryzen “Summit Ridge” processors will feature the AMD Infinity Fabric network-centric interconnect technology.
  • The AMD Ryzen “Summit Ridge” processors will feature the AMD SenseMI sensing and adaptive technologies like :
    • Pure Power, which uses real-time sensors to support a closed-loop control through Infinity Fabric.
    • Precision Boost, which allows for fine-grained frequency control in 25 MHz increments.
    • Extended Frequency Range (XFR), that is fully automated and permits frequencies above the Precision Boost limits.
    • Neural Net Prediction, which anticipates future decisions, preloads instructions and chooses the best processing path.
    • Smart Prefetch algorithms have been greatly improved.
  • Preliminary benchmarks of the AMD Ryzen (3.4 GHz, no boost) showed that it was slightly faster than the Intel Core i7-6900K (3.2 GHz base, 3.7 GHz boost).
  • At idle, the total power consumption of the 3.4 GHz Ryzen system was about 13.5 W (12.67%) less than a Core i7-6900K system.
  • At full load, the total power consumption of the 3.4 GHz Ryzen system was about 3.8 W (1.98%) less than a Core i7-6900K system.

In the subsequent pages, we will give you the full low-down on the AMD Ryzen desktop processor, with the following presentations by AMD :

We also prepared the complete video and slides of the AMD Ryzen tech briefing for your perusal :

The AMD Summit Ridge Technology Update

Jim Anderson, SVP and GM of the AMD Computing and Graphics division, kicked off the session with an update on the AMD Summit Ridge desktop processor.

The key takeaway points :
  • The first AMD Summit Ridge processors will officially launch in Q1, 2017.
  • The AMD Zen-based server processor, codenamed Naples, will launch in Q2, 2017.
  • The AMD Zen-based notebook APU, codenamed Raven Ridge, will launch in H2, 2017.
  • The top AMD Summit Ridge processor SKU will have a 3.4 GHz base clock, or better. Its boost clock was not revealed.
  • The AMD Summit Ridge processors will have a 4 MB L2 cache and a 16 MB L3 cache.

 

The AMD Summit Ridge CPU’s Performance

The AMD Ryzen “Summit Ridge” processor may not be ready for primetime but how does it currently perform? John Taylor, Corporate Vice President, Worldwide Marketing at AMD, compared the performance of an AMD Ryzen processor running at 3.4 GHz (without boost) against the Intel Core i7-6900K running at the stock speed of 3.2 GHz (3.7 GHz boost).

The key takeaway point :
  • The AMD Ryzen running at 3.4 GHz is slightly faster than the Intel Core i7-6900K in this Handbrake benchmark.

 

The Infinity Fabric & SenseMI Technologies

These technologies built into the AMD Ryzen processors are new revelations by AMD. Mark Papermaster, Senior Vice President & CTO of AMD, explains what they are.

The key takeaway points about the AMD SenseMI sensing and adaptive technologies :
  • Pure Power uses real-time sensors to support a closed-loop control through Infinity Fabric.
  • Precision Boost allows for fine-grained frequency control in 25 MHz increments.
  • Extended Frequency Range (XFR) is fully automated and permits frequencies above the Precision Boost limits.
  • Neural Net Prediction anticipates future decisions, preloads instructions and chooses the best processing path.
  • Smart Prefetch algorithms have been greatly improved.
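To give a feel for what 25 MHz granularity means, here is a small hypothetical sketch that snaps a requested frequency onto a 25 MHz grid. It is not AMD's algorithm: the helper names, the round-down behaviour and the example frequencies are all assumptions made for illustration.

```python
STEP_MHZ = 25  # Precision Boost increment quoted by AMD


def snap_to_step(target_mhz: float, step_mhz: int = STEP_MHZ) -> int:
    """Round a requested frequency down to the nearest step (assumed)."""
    return int(target_mhz // step_mhz) * step_mhz


def steps_between(low_mhz: int, high_mhz: int, step_mhz: int = STEP_MHZ) -> int:
    """Count how many whole steps separate two frequencies."""
    return (high_mhz - low_mhz) // step_mhz


# Hypothetical example values, not AMD specifications.
snapped = snap_to_step(3467)
steps = steps_between(3400, 3600)

print(snapped)  # 3450
print(steps)    # 8
```

With 25 MHz increments, a hypothetical 200 MHz range above the 3.4 GHz base clock would offer eight intermediate operating points, compared with just two if the clock moved in 100 MHz steps.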

Introducing AMD Ryzen

AMD President & CEO Dr. Lisa Su finally revealed that the AMD Zen processor will henceforth be known as the AMD Ryzen processor.

The key takeaway point :
  • The AMD Zen “Summit Ridge” processor is officially branded as the AMD Ryzen processor.

 

AMD Ryzen Performance & Power Efficiency Demo

Dr. Lisa Su then showed off a Blender 3D demonstration of the 3.4 GHz AMD Ryzen’s performance and power efficiency, compared to the Intel Core i7-6900K running at the stock speed of 3.2 GHz (3.7 GHz boost).

The key takeaway points :
  • The AMD Ryzen (3.4 GHz, no boost) was a fraction faster than the Intel Core i7-6900K (3.2 GHz base, 3.7 GHz boost).
  • At idle, the total power consumption of the 3.4 GHz Ryzen system was about 13.5 W (12.67%) less than a Core i7-6900K system.
  • At full load, the total power consumption of the 3.4 GHz Ryzen system was about 3.8 W (1.98%) less than a Core i7-6900K system.
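Since both the wattage gap and the percentage are given, the approximate total system power for each machine can be worked backwards. The sketch below assumes the percentages are relative to the Core i7-6900K system; the function name is illustrative, and the results are only as precise as the rounded figures above.

```python
def implied_totals(delta_w: float, percent_less: float) -> tuple:
    """Return (reference system watts, lower-power system watts).

    Assumes percent_less is expressed relative to the reference system.
    """
    reference = delta_w / (percent_less / 100)
    return reference, reference - delta_w


idle_i7, idle_ryzen = implied_totals(13.5, 12.67)
load_i7, load_ryzen = implied_totals(3.8, 1.98)

print(f"Idle: Core i7 ~{idle_i7:.1f} W, Ryzen ~{idle_ryzen:.1f} W")
print(f"Load: Core i7 ~{load_i7:.1f} W, Ryzen ~{load_ryzen:.1f} W")
```

That works out to roughly 107 W versus 93 W at idle, and roughly 192 W versus 188 W at full load.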

 

Closing Remarks On The AMD Ryzen Desktop CPU

In her closing remarks on the AMD Ryzen processor, Dr. Lisa Su also touched on the newly-launched Radeon Instinct, the upcoming AMD Vega GPU and Naples server processor.

The key takeaway points :
  • The AMD Vega GPU will launch in H1 2017 in both Gaming and Compute applications.
  • The first AMD Ryzen “Summit Ridge” processors will officially launch in Q1, 2017.
  • The AMD Zen-based server processor, codenamed Naples, will launch in Q2, 2017.
  • The AMD Zen-based notebook APU, codenamed Raven Ridge, will launch in H2, 2017.

The Complete AMD Ryzen “Summit Ridge” Tech Briefing

This is the complete AMD Ryzen “Summit Ridge” tech briefing.

 

The Complete AMD Ryzen “Summit Ridge” Tech Briefing Slides

Here are the presentation slides from the AMD Ryzen “Summit Ridge” tech briefing for your perusal.

 
