Tag Archives: Heterogeneous computing

Intel oneAPI Unified Programming Model Overview!

At Supercomputing 2019, Intel unveiled their oneAPI initiative for heterogeneous computing, promising to deliver a unified programming experience for developers.

Here is an overview of the Intel oneAPI unified programming model, and what it means for programmers!


The Need For Intel oneAPI

The modern computing environment is now a lot less CPU-centric, with the greater adoption of GPUs, FPGAs and custom-built accelerators (like the Alibaba Hanguang 800).

Their different scalar, vector, matrix and spatial architectures require different APIs and code bases, which complicates attempts to utilise a mix of those capabilities.


Intel oneAPI For Heterogeneous Computing

Intel oneAPI promises to change all that, offering a unified programming model for those different architectures.

It allows developers to create workloads and applications for multiple architectures on their platform of choice, without the need to develop and maintain separate code bases, tools and workflow.

Intel oneAPI comprises two components – the open industry initiative, and the Intel oneAPI beta toolkit :

oneAPI Initiative

This is a cross-architecture development model based on industry standards, and an open specification, to encourage broader adoption.

Intel oneAPI Beta Toolkit

This beta toolkit offers the Intel oneAPI specification components with direct programming (Data Parallel C++), API-based programming with performance libraries, and advanced analysis and debug tools.

Developers can test code and workloads in the Intel DevCloud for oneAPI on multiple Intel architectures.


What Processors + Accelerators Are Supported By Intel oneAPI?

The beta Intel oneAPI reference implementation currently supports these Intel platforms :

  • Intel Xeon Scalable processors
  • Intel Core and Atom processors
  • Intel processor graphics (as a proxy for future Intel discrete data centre GPUs)
  • Intel FPGAs (Intel Arria, Stratix)

The oneAPI specification is designed to support a broad range of CPUs and accelerators from multiple vendors. However, it is up to those vendors to create their own oneAPI implementations and optimise them for their own hardware.


Are oneAPI Elements Open-Sourced?

Many oneAPI libraries and components are already open source, or soon will be.


What Companies Are Participating In The oneAPI Initiative?

According to Intel, more than 30 vendors and research organisations support the oneAPI initiative, including CERN openlab, SAP and the University of Cambridge.

Companies that create their own implementation of oneAPI and complete a self-certification process will be allowed to use the oneAPI initiative brand and logo.


Available Intel oneAPI Toolkits

At the time of its launch (17 November 2019), here are the toolkits that Intel has made available for developers to download and use :

Intel oneAPI Base Toolkit (Beta)

This foundational kit enables developers of all types to build, test, and deploy performance-driven, data-centric applications across CPUs, GPUs, and FPGAs. Comes with :

  • Intel oneAPI Data Parallel C++ Compiler
  • Intel Distribution for Python
  • Multiple optimized libraries
  • Advanced analysis and debugging tools

Domain Specific oneAPI Toolkits for Specialised Workloads :

  • oneAPI HPC Toolkit (beta) : Deliver fast C++, Fortran, OpenMP, and MPI applications that scale.
  • oneAPI DL Framework Developer Toolkit (beta) : Build deep learning frameworks or customize existing ones.
  • oneAPI IoT Toolkit (beta) : Build high-performing, efficient, reliable solutions that run at the network’s edge.
  • oneAPI Rendering Toolkit (beta) : Create high-performance, high-fidelity visualization applications.

Additional Toolkits, Powered by oneAPI

  • Intel AI Analytics Toolkit (beta) : Speed AI development with tools for DL training, inference, and data analytics.
  • Intel Distribution of OpenVINO Toolkit : Deploy high-performance inference applications from device to cloud.
  • Intel System Bring-Up Toolkit (beta) : Debug and tune systems for power and performance.

You can download all of those toolkits here.




Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

Joe Macri : The Disruptive Nature of AMD Ryzen

Last week, AMD Corporate Vice President, Product Chief Technology Officer and Corporate Fellow, Joe Macri, flew in to brief us on the disruptive nature of the new AMD Ryzen processors. Join us for his full tech briefing!

If you are wondering why Adam Kozak, Radeon Product Marketing Manager (far left), is here as well, check out his presentation on the AMD Radeon RX 500 Series graphics cards! 😀


The Disruptive Nature of Ryzen

Joe Macri’s presentation is actually titled “The New Era“, but we think it more accurately describes the disruptive nature of the new Ryzen processors. Not only has the Ryzen proven to be a real winner, it has also fired up interest in desktop computing on a scale not seen in many years.

Here are the key takeaway points :

  • Moore’s law has slowed down, so the industry is evolving to “Moore’s Law+” with new process technology, microarchitecture, integration technology and software.
  • Despite the increase in computing power, we are still far from the 1000 TFLOPS required to achieve full presence capability.
  • The AMD Zen core delivers an increase in IPC (instructions per clock) of more than 52% over the previous microarchitecture.
  • The AMD Zen core also delivers 10% better area efficiency than the 7th Generation Intel Core processor (codenamed Kaby Lake).
  • The AMD Zen core delivers 270% better Cinebench multi-core performance / watt over its predecessor.
  • AMD has design teams working on the Zen 2 and Zen 3 cores, just as they have teams working on the Navi GPU that will come after this year’s Vega GPU.
  • AMD is focused on open solutions for heterogeneous computing, like the HSA Foundation, Radeon Open Compute (ROCm) and open interconnect standards.
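
The IPC claim above can be made concrete with a little arithmetic. The sketch below is illustrative only: the 52% IPC uplift is taken from the briefing, but the clock speed and core count are hypothetical placeholders, not AMD figures.

```python
# Illustrative: how a 52% IPC uplift translates into raw throughput,
# holding clock speed and core count constant. All numbers other than
# the IPC ratio are hypothetical.

def throughput(ipc, clock_ghz, cores):
    """Instructions retired per second = IPC x clock x core count."""
    return ipc * clock_ghz * 1e9 * cores

baseline = throughput(ipc=1.00, clock_ghz=3.5, cores=4)  # previous core, IPC normalised to 1
zen      = throughput(ipc=1.52, clock_ghz=3.5, cores=4)  # Zen : +52% IPC, same clock and cores

uplift = zen / baseline - 1
print(f"Throughput uplift at equal clock: {uplift:.0%}")  # prints 52%
```

At equal clock and core count, throughput scales linearly with IPC, which is why AMD leads with the IPC figure rather than clock speed.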


The Presentation Slides


Other Ryzen-Related Articles

Don’t forget to also read our other Ryzen-related articles :




The Complete AMD Radeon Instinct Tech Briefing Rev. 3.0

The AMD Tech Summit held in Sonoma, California from December 7-9, 2016 was not only very exclusive, it was highly secretive. The first major announcement we have been allowed to reveal is the new AMD Radeon Instinct heterogeneous computing platform.

In this article, you will hear from AMD what the Radeon Instinct platform is all about. As usual, we have a ton of videos from the event, so it will be as if you were there with us. Enjoy! 🙂

Originally published @ 2016-12-12

Updated @ 2017-01-11 : Two of the videos were edited to comply with the NDA. Now that the NDA on AMD Vega has been lifted, we replaced the two videos with their full, unedited versions. We also made other changes, including adding links to the other AMD Tech Summit articles.

Updated @ 2017-01-20 : Replaced an incorrect slide, and a video featuring that slide. Made other small updates to the article.


The AMD Radeon Instinct Platform Summarised

For those who want the quick low-down on AMD Radeon Instinct, here are the key takeaway points :

  • The AMD Radeon Instinct platform is made up of two components – hardware and software.
  • The hardware components are the AMD Radeon Instinct accelerators built around the current Polaris and the upcoming Vega GPUs.
  • The software component is the AMD Radeon Open Compute (ROCm) platform, which includes the new MIOpen open-source deep learning library.
  • The first three Radeon Instinct accelerator cards are the MI6, MI8 and MI25 Vega with NCU.
  • The AMD Radeon Instinct MI6 is a passively-cooled inference accelerator with 5.7 TFLOPS of FP16 processing power, 224 GB/s of memory bandwidth, and a TDP of <150 W. It will come with 16 GB of GDDR5 memory.
  • The AMD Radeon Instinct MI8 is a small form-factor (SFF) accelerator with 8.2 TFLOPS of processing power, 512 GB/s of memory bandwidth, and a TDP of <175 W. It will come with 4 GB of HBM memory.
  • The AMD Radeon Instinct MI25 Vega with NCU is a passively-cooled training accelerator with 25 TFLOPS of processing power, support for 2X packed math, a High Bandwidth Cache and Controller, and a TDP of <300 W.
  • The Radeon Instinct accelerators will all be built exclusively by AMD.
  • The Radeon Instinct accelerators will all support MxGPU SRIOV hardware virtualisation.
  • The Radeon Instinct accelerators are all passively cooled.
  • The Radeon Instinct accelerators will all have large BAR (Base Address Register) support for multiple GPUs.
  • The upcoming AMD Zen “Naples” server platform is designed to support multiple Radeon Instinct accelerators through a high-speed network fabric.
  • The ROCm platform is not only open source, it will support a multitude of standards in addition to MIOpen.
  • The MIOpen deep learning library is open source, and will be available in Q1 2017.
  • The MIOpen deep learning library is optimised for Radeon Instinct, allowing for 3X better performance in machine learning.
  • AMD Radeon Instinct accelerators will be significantly faster than NVIDIA Titan X GPUs based on the Maxwell and Pascal architectures.
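
As a rough sanity check on the quoted throughput figures, peak FLOPS can be estimated from shader count and clock speed. The sketch below assumes the MI25 uses a 4096-stream-processor Vega GPU running at roughly 1.5 GHz, with each stream processor issuing one FMA (2 ops) per clock, doubled again by the 2X packed FP16 math; these are our assumptions, not figures from AMD's briefing.

```python
# Estimate peak TFLOPS = stream processors x 2 ops/clock (FMA) x clock,
# doubled again for 2X packed FP16 math. The shader count (4096) and
# clock (1.5 GHz) are our assumptions, not AMD-confirmed specifications.

def peak_tflops(stream_processors, clock_ghz, packed_fp16=False):
    flops = stream_processors * 2 * clock_ghz * 1e9  # an FMA counts as 2 ops
    if packed_fp16:
        flops *= 2  # two FP16 operations per lane per clock
    return flops / 1e12

mi25_fp16 = peak_tflops(4096, clock_ghz=1.5, packed_fp16=True)
print(f"Estimated MI25 FP16 throughput: {mi25_fp16:.1f} TFLOPS")  # ~24.6, close to the quoted 25
```

The same formula without the packed-FP16 doubling gives the FP32 rate, which is why the 2X packed math support matters so much for deep learning training workloads.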

In the subsequent pages, we will give you the full low-down on the Radeon Instinct platform, with the following presentations by AMD :


We also prepared the complete video and slides of the Radeon Instinct tech briefing for your perusal :

Next Page > Heterogeneous Computing, The Radeon Instinct Accelerators, MIOpen, Performance



Why Is Heterogeneous Computing Important?

Dr. Lisa Su kicked things off with an inside look at her two-year journey as AMD President and CEO. Then she revealed why heterogeneous computing is an important part of AMD’s future going forward. She also mentioned the success of the recently-released Radeon Software Crimson ReLive Edition.


Here Are The New AMD Radeon Instinct Accelerators!

Next, Raja Koduri, Senior Vice President and Chief Architect of the Radeon Technologies Group, officially revealed the new AMD Radeon Instinct accelerators.


The MIOpen Deep Learning Library For Radeon Instinct

MIOpen is a new deep learning library optimised for Radeon Instinct. It is open source and will become part of the Radeon Open Compute (ROCm) platform. It will be available in Q1 2017.



The Performance Advantage Of Radeon Instinct & MIOpen

MIOpen is optimised for Radeon Instinct, offering 3X better performance in machine learning. It allows the Radeon Instinct accelerators to be significantly faster than NVIDIA Titan X GPUs based on the Maxwell and Pascal architectures.

Next Page > Radeon Instinct MI25 & MI8 Demos, Zen “Naples” Platform, The First Servers, ROCm Discussion



The Radeon Instinct MI25 Training Demonstration

Raja Koduri roped in Ben Sander, Senior Fellow at AMD, to show off the Radeon Instinct MI25 running a training demo.


The Radeon Instinct MI8 Visual Inference Demonstration

The visual inference demo is probably much easier to grasp, as it is visual in nature. AMD used the Radeon Instinct MI8 in this example.


The Radeon Instinct On The Zen “Naples” Platform

The upcoming AMD Zen “Naples” server platform is designed to support multiple AMD Radeon Instinct accelerators through a high-speed network fabric.



The First Radeon Instinct Servers

This is not a vapourware launch. Raja Koduri revealed the first slew of Radeon Instinct servers that will hit the market in H1 2017.


The Radeon Open Compute (ROCm) Platform Discussion

To illustrate the importance of heterogeneous computing on Radeon Instinct, Greg Stoner (ROCm Senior Director at AMD) hosted a panel of AMD partners and early adopters of the Radeon Open Compute (ROCm) platform.

Next Page > Closing Remarks On Radeon Instinct, The Complete Radeon Instinct Tech Briefing Video & Slides



Closing Remarks On Radeon Instinct

Finally, Raja Koduri concluded the launch of the Radeon Instinct Initiative with some closing remarks on the recent Radeon Software Crimson ReLive Edition.


The Complete AMD Radeon Instinct Tech Briefing

This is the complete AMD Radeon Instinct tech briefing. Our earlier video was edited to comply with the AMD Vega NDA (which has now expired).



The Complete AMD Radeon Instinct Tech Briefing Slides

Here are the Radeon Instinct presentation slides for your perusal.



AMD GPUOpen Initiative – 3 New Developments

The AMD Radeon Technologies Group has just announced the AMD GPUOpen initiative at their Technology Summit today. This is part of their effort to improve performance from the software side, as well as access to open source drivers and tools. Let’s take a look at the Radeon Technologies Group’s presentation below.


AMD GPUOpen For Gaming

As a continuation of the strategy that started with Mantle, AMD is giving even more control of the GPU to developers. Just as console developers have benefited from low-level access to the GPU, AMD wants to bring this level of access to the PC space.


AMD GPUOpen for gaming gives developers the ability to harness the investments they’ve made in console development, including feature-rich, close-to-the-metal programming, and bring that to PC game development. Game developers will now have direct access to GPU hardware, as well as a large collection of open source effects, tools, libraries and SDKs.

As such, AMD will make libraries and samples available in early 2016, with direct access to their source code. GPUOpen is the primary vehicle to allow low-level access to the GPU.


New Compiler For Heterogeneous Computing

One of the primary goals of Heterogeneous Systems Architecture (HSA) is easing the development of parallel applications through the use of higher level languages. The new AMD “Boltzmann Initiative” suite includes an HCC compiler for C++ development, greatly expanding the field of programmers who can leverage HSA.


The new HCC C++ compiler is a key tool in enabling developers to easily and efficiently apply discrete GPU hardware resources in heterogeneous systems. HCC is an open source C++ compiler for heterogeneous compute, while HIP allows developers to convert CUDA code to portable C++. AMD testing shows that in many cases, 90 percent or more of CUDA code can be automatically converted into C++ by HIP, with the final 10 percent converted manually.
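
The automatic part of that CUDA-to-C++ conversion is largely mechanical renaming of CUDA runtime calls to their HIP equivalents. The sketch below is a toy illustration of that idea, not AMD's actual hipify tool: the mapping table is a tiny sample, and the real tool handles a far larger API surface plus kernel launch syntax.

```python
import re

# Tiny sample of CUDA -> HIP renames; the real hipify tool maps far more.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
}

def toy_hipify(source: str) -> str:
    # Match longest identifiers first, so cudaMemcpyHostToDevice is not
    # partially rewritten by the shorter cudaMemcpy rule.
    pattern = re.compile(
        "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    )
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)

cuda_code = "cudaMalloc(&buf, n); cudaMemcpy(buf, host, n, cudaMemcpyHostToDevice);"
print(toy_hipify(cuda_code))
# prints: hipMalloc(&buf, n); hipMemcpy(buf, host, n, hipMemcpyHostToDevice);
```

The last few percent that cannot be renamed this way (such as inline PTX or CUDA-specific libraries) is what AMD says must be ported by hand.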


Linux Driver & Runtime For HPC Cluster Computing

Demonstrating its commitment to Linux, AMD developed a new HPC-focused open source driver and system runtime.

This new headless Linux driver brings key capabilities to address core high-performance computing needs, including low-latency compute dispatch and PCIe® data transfers; peer-to-peer GPU support; Remote Direct Memory Access (RDMA) from InfiniBand™ directly into GPU memory; and large single memory allocation support.



An early access program for the “Boltzmann Initiative” tools is planned for Q1 2016.