Tag Archives: HBM2

AMD Radeon Pro VII : What You Need To Know!

AMD just launched Radeon Pro VII – their latest workstation graphics card for broadcast and engineering professionals!

Here is a quick primer on what you need to know about the AMD Radeon Pro VII, including the official tech briefing!


AMD Radeon Pro VII : What You Need To Know!

The new AMD Radeon Pro VII graphics card is designed for post-production teams to create 8K content, as well as engineers and data scientists working on complex models with large datasets.

Here are the Radeon Pro VII’s key features :

Leading Double Precision Performance

With up to 6.5 TFLOPS (FP64) of double precision performance for demanding engineering and scientific workloads, the Radeon Pro VII graphics card provides 5.6X the performance-per-dollar versus the NVIDIA Quadro RTX on the AMD Internal Benchmark for Altair EDEM “Screw Auger” viewset. 

High-Speed Memory

The AMD Radeon Pro VII has 16 GB of HBM2 with 1 TB/s memory bandwidth and full ECC capability, allowing it to handle large and complex models and datasets smoothly with low latency.

AMD Infinity Fabric Link

This high-bandwidth, low-latency connection allows memory sharing between two AMD Radeon Pro VII GPUs, delivering up to 5.25X PCIe 3.0 x16 bandwidth with a communication speed of up to 168 GB/s peer-to-peer between GPUs.

This allows you to increase project workload size and scale, develop more complex designs and run larger simulations to drive scientific discovery.

Remote Working

Users can access their physical workstation from virtually anywhere for unhindered productivity, using the remote workstation IP built into the AMD Radeon Pro Software for Enterprise driver.

PCIe Support

PCIe 4.0 delivers double the bandwidth of PCIe 3.0 to enable smooth performance for 8K, multichannel image interaction.
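
The bandwidth claims here are easy to check with a little arithmetic. The sketch below is only a back-of-the-envelope calculation: the per-lane signalling rates (8 GT/s and 16 GT/s) and the 128b/130b encoding come from the public PCIe specifications rather than from AMD's briefing, and the 32 GB/s reference point for PCIe 3.0 x16 (both directions, rounded) is our assumption about how the 5.25X Infinity Fabric Link figure was derived.

```python
# Per-lane signalling rates and line encoding as published in the PCIe specs
# (assumptions here, since the article itself does not list them).
PCIE_GENERATIONS = {
    "3.0": {"gt_per_s": 8.0, "encoding": 128 / 130},
    "4.0": {"gt_per_s": 16.0, "encoding": 128 / 130},
}


def pcie_bandwidth_gb_s(generation: str, lanes: int = 16) -> float:
    """Usable bandwidth in GB/s, in one direction, for a PCIe link."""
    gen = PCIE_GENERATIONS[generation]
    gigabits_per_second = gen["gt_per_s"] * gen["encoding"] * lanes
    return gigabits_per_second / 8  # bits to bytes


gen3 = pcie_bandwidth_gb_s("3.0")
gen4 = pcie_bandwidth_gb_s("4.0")
print(f"PCIe 3.0 x16: {gen3:.2f} GB/s per direction")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s per direction ({gen4 / gen3:.1f}X)")

# The quoted 168 GB/s Infinity Fabric Link figure, compared against
# PCIe 3.0 x16 counted in both directions (about 32 GB/s, rounded).
INFINITY_FABRIC_GB_S = 168
PCIE3_X16_BIDIRECTIONAL_GB_S = 32  # assumption: rounded reference figure
print(f"Infinity Fabric Link: {INFINITY_FABRIC_GB_S / PCIE3_X16_BIDIRECTIONAL_GB_S:.2f}X")
```

Because both generations use the same encoding, doubling the signalling rate doubles the usable bandwidth, which is where the "double the bandwidth" claim comes from.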

Frame Lock / Genlock

When paired with the AMD FirePro S400 synchronisation module, this allows for precise synchronised output for display walls, digital signage and other visual displays.

Multi-Display Support

The AMD Radeon Pro VII supports up to 6 synchronised display panels, full HDR and 8K screen resolution (single display) combined with ultra-fast encode and decode support for enhanced multi-stream workflows.


AMD Radeon Pro VII : Specifications

Specifications AMD Radeon Pro VII
GPU Vega 20
Transistor Count 13.2 Billion
Fabrication Process 7 nm
Die Size 331 mm²
Stream Processors 3840
Single Precision 13.1 TFLOPS
Double Precision 6.5 TFLOPS
Graphics Memory 16 GB ECC HBM2
Memory Bus Width 4096-bit
Memory Speed 1000 MHz
Memory Bandwidth 1024 GB/s
Display Ports 6 x Mini-DP 1.4
TDP 250 watts
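
The memory and compute figures in this table hang together, as a quick calculation shows. This is only a sketch: it assumes HBM2 transfers data twice per memory clock, and that each stream processor retires one fused multiply-add (two floating-point operations) per clock. Neither assumption is stated in AMD's specifications above, and the function names are ours.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float,
                          transfers_per_clock: int = 2) -> float:
    """Peak memory bandwidth in GB/s (assumes double data rate by default)."""
    bytes_per_transfer = bus_width_bits / 8
    transfers_per_second = memory_clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_transfer * transfers_per_second / 1e9


def implied_clock_mhz(tflops_fp32: float, stream_processors: int) -> float:
    """GPU clock implied by a peak FP32 figure, assuming 2 FLOPs per SP per clock."""
    return tflops_fp32 * 1e12 / (stream_processors * 2) / 1e6


bandwidth = memory_bandwidth_gb_s(bus_width_bits=4096, memory_clock_mhz=1000)
clock = implied_clock_mhz(tflops_fp32=13.1, stream_processors=3840)
fp64_to_fp32 = 6.5 / 13.1

print(f"Memory bandwidth : {bandwidth:.0f} GB/s")
print(f"Implied GPU clock: {clock:.0f} MHz")
print(f"FP64 : FP32 ratio: {fp64_to_fp32:.2f}")
```

Under those assumptions, the 4096-bit bus at 1000 MHz gives exactly the 1024 GB/s listed, the single-precision figure implies a peak clock of roughly 1.7 GHz, and double precision runs at about half the single-precision rate.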


AMD Radeon Pro VII : Price + Availability

The AMD Radeon Pro VII has a launch price of US$1,899, and will be available starting mid-June 2020.


Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

AMD 7nm Vega Presentation + Demo + First Look!

One of the biggest revelations at the AMD Computex 2018 press conference is how well along AMD is with their 7nm efforts. Everything appears to be chugging along as planned. AMD not only shared new details about the 7nm Vega GPU, they also showed off an actual sample!


The 7nm Vega Revealed!

Let’s start with this presentation on the 7nm Vega by David Wang, Senior Vice-President of Engineering at the Radeon Technologies Group. Gilbert Leung then demonstrated the performance of the 7nm Vega GPU, which has 32 GB of HBM2 memory, running Cinema4D R19 with Radeon ProRender.

Here are the key points from his presentation :

  • The AMD graphics roadmap from 2017 has not changed. The AMD Vega architecture will get a 7nm die shrink this year, before an architectural change with AMD Navi in 2019.
  • The 7nm die shrink will double power efficiency, and increase performance by 1.35X.
  • The first 7nm Vega GPU will be used in their Radeon Instinct Vega 7nm accelerator, just like how the first Vega GPUs were used in their first generation Radeon Instinct accelerators.

  • In addition to the 7nm die shrink, the Radeon Instinct Vega 7nm accelerator will feature the AMD Infinity Fabric interconnect for better multi GPU performance.
  • The Radeon Instinct Vega 7nm accelerator will also support hardware virtualisation for better security and performance in virtualised environments.

  • The Radeon Instinct Vega 7nm accelerator will come with new deep learning operations, that will not only help accelerate training and inference, but also blockchain applications.
  • The 7nm Vega GPU is sampling right now, and will launch in the second half of 2018 as the Radeon Instinct Vega 7nm accelerator.


First Look At 7nm Vega + 7nm EPYC!

In this video, Dr. Lisa Su shows off engineering samples of the 7nm EPYC processor (on the left), and the 7nm Vega GPU (on the right).


Samsung Aquabolt – World’s Fastest HBM2 Memory Revealed!

2018-01-11 – Samsung Electronics today announced that it has started mass production of the Samsung Aquabolt – its 2nd-generation 8 GB High Bandwidth Memory-2 (HBM2) with the fastest data transmission speed on the market today. This is the industry’s first HBM2 to deliver a 2.4 Gbps data transfer speed per pin.


Samsung Aquabolt – World’s Fastest HBM2 Memory

Samsung’s new 8GB HBM2 delivers the highest level of DRAM performance, featuring a 2.4Gbps pin speed at 1.2V, which translates into a performance upgrade of nearly 50% per package, compared to the 1st-generation 8GB HBM2 package with its 1.6Gbps pin speed at 1.2V and 2.0Gbps at 1.35V.

With these improvements, a single Samsung 8GB HBM2 package will offer a 307 GB/s data bandwidth – 9.6X faster than an 8 Gb GDDR5 chip, which provides a 32 GB/s data bandwidth. Using four of the new HBM2 packages in a system will enable a 1.2 TB/s bandwidth. This improves overall system performance by as much as 50%, compared to a system that uses the first-generation 1.6 Gbps HBM2 memory.
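
Samsung's numbers can be reproduced from the pin speeds alone. The sketch below assumes a 1024-bit data interface per HBM2 package, and a 32-bit GDDR5 chip running at 8 Gbps per pin. Neither figure appears in the announcement; they are simply the values that make the quoted 307 GB/s and 32 GB/s come out.

```python
HBM2_PINS_PER_PACKAGE = 1024  # assumption: data interface width of one HBM2 package
GDDR5_PINS_PER_CHIP = 32      # assumption: a x32 GDDR5 chip
GDDR5_GBPS_PER_PIN = 8.0      # assumption: implied by the quoted 32 GB/s


def bandwidth_gb_s(gbps_per_pin: float, pins: int) -> float:
    """Peak bandwidth in GB/s for a device with the given pin speed and pin count."""
    return gbps_per_pin * pins / 8


aquabolt = bandwidth_gb_s(2.4, HBM2_PINS_PER_PACKAGE)
first_gen = bandwidth_gb_s(1.6, HBM2_PINS_PER_PACKAGE)
gddr5 = bandwidth_gb_s(GDDR5_GBPS_PER_PIN, GDDR5_PINS_PER_CHIP)

print(f"Aquabolt package   : {aquabolt:.1f} GB/s")
print(f"Versus GDDR5 chip  : {aquabolt / gddr5:.1f}X")
print(f"Four packages      : {4 * aquabolt / 1000:.2f} TB/s")
print(f"Gain over 1.6 Gbps : {(aquabolt / first_gen - 1) * 100:.0f}%")
```

The 50% gain is simply the ratio of the two pin speeds (2.4 / 1.6), since the interface width does not change between generations.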



How Samsung Created Aquabolt

To achieve Aquabolt’s unprecedented performance, Samsung has applied new technologies related to TSV design and thermal control.

A single 8GB HBM2 package consists of eight 8Gb HBM2 dies, which are vertically interconnected using over 5,000 TSVs (Through-Silicon Vias) per die. While using so many TSVs can cause collateral clock skew, Samsung succeeded in minimizing the skew to a very modest level and significantly enhancing chip performance in the process.

In addition, Samsung increased the number of thermal bumps between the HBM2 dies, which enables stronger thermal control in each package. Also, the new HBM2 includes an additional protective layer at the bottom, which increases the package’s overall physical strength.


NVIDIA TITAN V – The First Desktop Volta Graphics Card!

NVIDIA CEO Jensen Huang (recently anointed as Fortune 2017 Businessperson of the Year) made a surprise reveal at the NIPS conference – the NVIDIA TITAN V. This is the first desktop graphics card to be built on the latest NVIDIA Volta microarchitecture, and the first to use HBM2 memory.

In this article, we will share with you everything we know about the NVIDIA TITAN V, and how it compares against its TITANic predecessors. We will also share with you what we think could be a future NVIDIA TITAN Vp graphics card!

Updated @ 2017-12-10 : Added a section on gaming with the NVIDIA TITAN V [1].

Originally posted @ 2017-12-09



NVIDIA Volta isn’t exactly new. Back in GTC 2017, NVIDIA revealed NVIDIA Volta, the NVIDIA GV100 GPU and the first NVIDIA Volta-powered product – the NVIDIA Tesla V100. Jensen even highlighted the Tesla V100 in his Computex 2017 keynote, more than 6 months ago!

Yet there has been no desktop GPU built around NVIDIA Volta. NVIDIA continued to churn out new graphics cards built around the Pascal architecture – GeForce GTX 1080 Ti and GeForce GTX 1070 Ti. That changed with the NVIDIA TITAN V.



The NVIDIA GV100 is the first NVIDIA Volta-based GPU, and the largest they have ever built. Even using the latest 12 nm FFN (FinFET NVIDIA) process, it is still a massive chip at 815 mm²! Compare that to the GP100 (610 mm² @ 16 nm FinFET) and GK110 (552 mm² @ 28 nm).

That’s because the GV100 is built using a whopping 21.1 billion transistors. In addition to 5376 CUDA cores and 336 Texture Units, it boasts 672 Tensor cores and 6 MB of L2 cache. All those transistors require a whole lot more power – to the tune of 300 W.




That’s V for Volta… not the Roman numeral V or V for Vendetta. Powered by the NVIDIA GV100 GPU, the TITAN V has 5120 CUDA cores, 320 Texture Units, 640 Tensor cores, and a 4.5 MB L2 cache. It is paired with 12 GB of HBM2 memory (3 x 4GB stacks) running at 850 MHz.

The blowout picture of the NVIDIA TITAN V reveals even more details :

  • It has 3 DisplayPorts and one HDMI port.
  • It has 6-pin + 8-pin PCIe power inputs.
  • It has 16 power phases, and what appears to be the Founders Edition copper heatsink and vapour chamber cooler, with a gold-coloured shroud.
  • There is no SLI connector, only what appears to be an NVLink connector.

Here are more pictures of the NVIDIA TITAN V, courtesy of NVIDIA.


Can You Game On The NVIDIA TITAN V?

Right after Jensen announced the TITAN V, the inevitable question was raised on the Internet – can it run Crysis / PUBG?

The NVIDIA TITAN V is the most powerful GPU for the desktop PC, but that does not mean you can actually use it to play games. NVIDIA notably did not mention anything about gaming, only that the TITAN V is “ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing”.


In fact, the TITAN V is not listed in their GeForce Gaming section. The most powerful graphics card in the GeForce Gaming section remains the TITAN Xp.

Then again, the TITAN V uses the same NVIDIA Game Ready Driver as GeForce gaming cards, starting with version 388.59. Even so, it is possible that some or many games may not run well or properly on the TITAN V.

Of course, all this is speculative in nature. All that remains to crack this mystery is for someone to buy the TITAN V and use it to play some games!


The NVIDIA TITAN V Specification Comparison

Let’s take a look at the known specifications of the NVIDIA TITAN V, compared to the TITAN Xp (launched earlier this year), and the TITAN X (launched late last year). We also inserted the specifications of a hypothetical NVIDIA TITAN Vp, based on a full GV100.

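| Specifications | NVIDIA TITAN Vp (hypothetical) | NVIDIA TITAN V | NVIDIA TITAN Xp | NVIDIA TITAN X |
| --- | --- | --- | --- | --- |
| Microarchitecture | NVIDIA Volta | NVIDIA Volta | NVIDIA Pascal | NVIDIA Pascal |
| Process Technology | 12 nm FinFET+ | 12 nm FinFET+ | 16 nm FinFET | 16 nm FinFET |
| Die Size | 815 mm² | 815 mm² | 471 mm² | 471 mm² |
| Tensor Cores | 672 | 640 | None | None |
| CUDA Cores | 5376 | 5120 | 3840 | 3584 |
| Texture Units | 336 | 320 | 240 | 224 |
| L2 Cache Size | 6 MB | 4.5 MB | 3 MB | 4 MB |
| GPU Core Clock | NA | 1200 MHz | 1405 MHz | 1417 MHz |
| GPU Boost Clock | NA | 1455 MHz | 1582 MHz | 1531 MHz |
| Texture Fillrate | NA | 384.0 GT/s / 465.6 GT/s | 355.2 GT/s / 379.7 GT/s | 317.4 GT/s / 342.9 GT/s |
| Pixel Fillrate | NA | NA | 142.1 GP/s / 151.9 GP/s | 136.0 GP/s / 147.0 GP/s |
| Memory Size | NA | 12 GB | 12 GB | 12 GB |
| Memory Bus | 3072-bit | 3072-bit | 384-bit | 384-bit |
| Memory Clock | NA | 850 MHz | 1426 MHz | 1250 MHz |
| Memory Bandwidth | NA | 652.8 GB/s | 547.7 GB/s | 480.0 GB/s |
| TDP | 300 watts | 250 watts | 250 watts | 250 watts |
| Multi GPU Capability | NVLink | NVLink | SLI | SLI |
| Launch Price | NA | US$ 2999 | US$ 1200 | US$ 1200 |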
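
For readers who want to see where the fillrate and bandwidth figures come from, here is a minimal sketch. It assumes the texture fillrate is simply texture units multiplied by clock speed, that HBM2 moves 2 transfers per memory clock, and that the TITAN X's GDDR5X memory moves 8 transfers per listed memory clock. Those multipliers are our assumptions, chosen because they reproduce the table's numbers.

```python
def texture_fillrate_gt_s(texture_units: int, clock_mhz: float) -> float:
    """Peak texture fillrate in gigatexels per second."""
    return texture_units * clock_mhz / 1000


def memory_bandwidth_gb_s(bus_width_bits: int, memory_clock_mhz: float,
                          transfers_per_clock: int) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_width_bits / 8 * memory_clock_mhz * transfers_per_clock / 1000


# TITAN V: 320 texture units, 1200 MHz base and 1455 MHz boost clocks
titan_v_base = texture_fillrate_gt_s(320, 1200)
titan_v_boost = texture_fillrate_gt_s(320, 1455)

# TITAN V: 3072-bit HBM2 at 850 MHz (assumed 2 transfers per clock)
titan_v_bandwidth = memory_bandwidth_gb_s(3072, 850, transfers_per_clock=2)

# TITAN X: 384-bit GDDR5X at 1250 MHz (assumed 8 transfers per clock)
titan_x_bandwidth = memory_bandwidth_gb_s(384, 1250, transfers_per_clock=8)

print(f"TITAN V texture fillrate: {titan_v_base:.1f} to {titan_v_boost:.1f} GT/s")
print(f"TITAN V memory bandwidth: {titan_v_bandwidth:.1f} GB/s")
print(f"TITAN X memory bandwidth: {titan_x_bandwidth:.1f} GB/s")
```

The two fillrate values listed for each card correspond to the base and boost clocks. Despite a much lower memory clock, the TITAN V's bus is eight times wider, which is how it ends up with more bandwidth than the GDDR5X cards.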



In case you are wondering, the TITAN Vp does not exist. It is merely a hypothetical future model that we think NVIDIA may introduce mid-cycle, like the NVIDIA TITAN Xp.

Our TITAN Vp is based on the full capabilities of the NVIDIA GV100 GPU. That means it will have 5376 CUDA cores with 336 Texture Units, 672 Tensor cores and 6 MB of L2 cache. It will also have a higher TDP of 300 watts.



The Official NVIDIA TITAN V Press Release

December 9, 2017 – NVIDIA today introduced TITAN V, the world’s most powerful GPU for the PC, driven by the world’s most advanced GPU architecture, NVIDIA Volta.

Announced by NVIDIA founder and CEO Jensen Huang at the annual NIPS conference, TITAN V excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, and extreme energy efficiency.

“Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” said Huang. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. I can’t wait to see their breakthrough discoveries.”

NVIDIA Supercomputing GPU Architecture, Now for the PC

TITAN V’s Volta architecture features a major redesign of the streaming multiprocessor that is at the center of the GPU. It doubles the energy efficiency of the previous generation Pascal design, enabling dramatic boosts in performance in the same power envelope.

New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads with a mix of computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improve performance while also simplifying programming.

Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customised for NVIDIA, TITAN V also incorporates Volta’s highly tuned 12GB HBM2 memory subsystem for advanced memory bandwidth utilisation.


Free AI Software on NVIDIA GPU Cloud


TITAN V’s incredible power is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.

Users of TITAN V can gain immediate access to the latest GPU-optimised AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. This container registry includes NVIDIA-optimised deep learning frameworks, third-party managed HPC applications, NVIDIA HPC visualisation tools and the NVIDIA TensorRT inferencing optimiser.

More Details : Now Everyone Can Use NVIDIA GPU Cloud!


Immediate Availability

TITAN V is available to purchase today for US$2,999 from the NVIDIA store in participating countries.


Everything On The Intel CPU With Radeon Graphics! Rev. 2.0

Intel just dropped a bombshell with the announcement that they will be introducing the 8th Gen Intel CPU with Radeon graphics! This article will cover everything we can find on the upcoming 8th Gen Intel CPU with Radeon graphics, and will be updated as and when we get new information.

Updated @ 2017-11-09 : Added new information on the Chip Design, Power Saving and Performance aspects of the 8th Gen Intel CPU with Radeon graphics.

Originally posted @ 2017-11-07


The New Intel CPU With Radeon Graphics

The combination of an Intel CPU with Radeon graphics has long been mooted as a great way to tackle NVIDIA’s dominance of the mobile PC gaming market. Now it has finally become a reality. Here is a summary of what we know so far:

Chip Design

  • This is a multi-chip module (MCM) that combines an 8th Generation Intel Core-H mobile processor, with a customised Radeon GPU and HBM2 memory.
  • It is a mobile solution designed to deliver better gaming performance in thin and light laptops, or even smaller mobile devices (tablets?).
  • This will be the first product in the world to feature both HBM2 memory, and the Embedded Multi-Die Interconnect Bridge (EMIB).
  • As the video below shows, Intel is using the EMIB interconnect for high-bandwidth data transfers between the Radeon GPU and the HBM2 memory.
  • The distance between the CPU and the GPU is necessary to improve thermal dissipation.
  • Due to the distance between the CPU and GPU, they cannot possibly use EMIB, which can only be used for chips in close proximity. They are most likely using a regular PCI Express interconnect.

Space Saving

  • The EMIB interconnect is not only fast, it is embedded within the substrate, helping to further reduce the thickness of the package.
  • The use of stacked HBM2 memory, instead of separate GDDR5 memory chips, saves a lot of space and greatly reduces power consumption.
  • By combining the CPU, GPU and HBM2 memory on a multi-chip module, Intel claims it will save 1,900 mm² of board space.

Power Saving

  • This multi-chip module has “a unique power sharing framework” between the Intel CPU and the AMD Radeon GPU.
  • The power sharing framework is a combination of the EMIB interconnect, as well as special drivers and interfaces to the GPU.
  • The ratio of power shared between the CPU and GPU can be dynamically adjusted according to workload and usage models.
  • The power sharing framework will also help to manage temperature, power delivery and performance states in real time.
  • Intel HD Graphics will be used for less strenuous graphics functions, including video acceleration. This allows the Radeon GPU and HBM2 memory to be powered down to save power.


Performance

  • The Intel Core-H CPU used will be the 35W or 45W Kaby Lake Refresh processor, with the HD Graphics core intact.
  • Neither AMD nor Intel mentioned what Radeon GPU will be used, but their press releases emphasised that this solution is targeted at “enthusiasts” who want to play “AAA titles”.
  • This means it is most likely an AMD Vega GPU with more than 10 Compute Units, delivering better performance than the AMD Vega core in the new AMD Ryzen Mobile APUs.


  • Intel will introduce the 8th Gen Intel CPU with Radeon graphics in Q1, 2018.


The NVIDIA Quadro Pascal GPUs Launched

6 February 2017 – NVIDIA today introduced a range of NVIDIA Quadro products, all based on its Pascal architecture, that transform desktop workstations into supercomputers with breakthrough capabilities for professional workflows across many industries.

Workflows in design, engineering and other areas are evolving rapidly to meet the exponential growth in data size and complexity that comes with photorealism, virtual reality and deep learning technologies. To tap into these opportunities, the new NVIDIA Quadro Pascal-based lineup provides an enterprise-grade visual computing platform that streamlines design and simulation workflows with up to twice the performance of the previous generation, and ultra-fast memory.

“Professional workflows are now infused with artificial intelligence, virtual reality and photorealism, creating new challenges for our most demanding users,” said Bob Pette, vice president of Professional Visualisation at NVIDIA. “Our new Quadro lineup provides the graphics and compute performance required to address these challenges. And, by unifying compute and design, the Quadro GP100 transforms the average desktop workstation with the power of a supercomputer.”

Benefits of Quadro Pascal Visual Computing Platform

The new generation of Quadro Pascal-based GPUs – the GP100, P4000, P2000, P1000, P600 and P400 – enables millions of engineers, designers, researchers and artists to:

  • Unify simulation, HPC, rendering and design – The GP100 combines unprecedented double precision performance with 16GB of high-bandwidth memory (HBM2) so users can conduct simulations during the design process and gather realistic multiphysics simulations faster than ever before. Customers can combine two GP100 GPUs with NVLink technology and scale to 32GB of HBM2 to create a massive visual computing solution on a single workstation.
  • Explore deep learning – The GP100 provides more than 20 TFLOPS of 16-bit floating point precision computing, making it an ideal development platform to enable deep learning in Windows and Linux environments.
  • Incorporate VR into design and simulation workflows – The “VR Ready” Quadro GP100 and P4000 have the power to create detailed, lifelike, immersive environments. Larger, more complex designs can be experienced at scale.
  • Reap the benefits of photorealistic design – Pascal-based Quadro GPUs can render photorealistic images more than 18 times faster than a CPU.
  • Create expansive visual workspaces – Visualise data in high resolution and HDR color on up to four 5K displays.
  • Build massive digital signage configurations cost effectively – Up to 32 4K displays can be configured through a single chassis by combining up to eight P4000 GPUs and two Quadro Sync II cards.

The new cards complete the entire NVIDIA Quadro Pascal lineup including the previously announced P6000, P5000 and mobile GPUs. The entire NVIDIA Quadro Pascal lineup supports the latest NVIDIA CUDA 8 compute platform, providing developers access to powerful new Pascal features in developer tools, performance enhancements and new libraries including nvGraph.



The entire NVIDIA Quadro family of desktop GPUs will be on display at SOLIDWORKS World, starting today at the Los Angeles Convention Center, NVIDIA booth 628. NVIDIA Quadro will be powering the most demanding CAD workflows from physically based rendering to virtual reality. Visit our partners’ booths to try the latest mobile workstations powered by our new Quadro mobile GPUs.

NVIDIA Quadro Pascal Supercomputers Availability

The new NVIDIA Quadro products will be available starting in March from leading workstation OEMs, including Dell, HP, Lenovo and Fujitsu, and authorized distribution partners, including PNY Technologies in North America and Europe, ELSA/Ryoyo in Japan and Leadtek in Asia Pacific.


The AMD Vega GPU Architecture Tech Report

We can reveal the fourth and, arguably, the biggest news out of the AMD Tech Summit that was held in Sonoma, California from December 7-9, 2016 – details of the new AMD Vega GPU architecture!

In this article, we will reveal to you, the details of not just the Vega NCU (Next-Gen Compute Unit) and the HBM2 memory it uses, but also its spanking new High-Bandwidth Cache Controller. On top of that, we will delve into the new geometry pipeline and pixel engine!

As usual, we will offer you a summary of the key points, and greater details in the subsequent pages. Finally, we will give you the presentation slides and when we get it, the presentation video from the AMD Tech Summit in Sonoma.


The 4 Major Features In AMD Vega

As AMD’s next-generation GPU architecture, Vega will come with these 4 major features that will help it to leapfrog ahead of competing graphics architectures.

We will summarise the key points below. But for more details, click on the links above.


High-Bandwidth Cache

  • High-Bandwidth Cache = HBM2 memory + High-Bandwidth Cache Controller (HBCC)
  • HBM2 memory technology offers :
    • 2X bandwidth per pin over HBM
    • 5X power efficiency over GDDR5
    • 8X capacity / stack
    • Over 50% smaller footprint compared to GDDR5
  • The High-Bandwidth Cache Controller offers :
    • Access to 512 TB virtual address space
    • Adaptive, fine-grained data movement
  • AMD showcased the performance of the High-Bandwidth Cache using a real-time render of Joe Macri’s living room on Radeon ProRender with 700+ GB of data
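
To put the 512 TB virtual address space into perspective, here is a small sketch. It assumes “TB” and “GB” are meant in the binary sense (powers of 1024), and it borrows the 8 GB per-package HBM2 capacity purely as a yardstick. Both are our assumptions for illustration.

```python
TB = 2 ** 40  # assumption: binary terabyte
GB = 2 ** 30  # assumption: binary gigabyte

virtual_space_bytes = 512 * TB

# Number of address bits needed to cover the whole space
address_bits = virtual_space_bytes.bit_length() - 1

# How the 700 GB demo scene and an 8 GB HBM2 package compare to that space
scene_fraction = 700 * GB / virtual_space_bytes
packages_worth = virtual_space_bytes // (8 * GB)

print(f"Address bits needed       : {address_bits}")
print(f"700 GB scene uses         : {scene_fraction:.2%} of the space")
print(f"Equivalent 8 GB packages  : {packages_worth}")
```

In other words, 512 TB corresponds to 49-bit addressing, and even the 700+ GB demo scene occupies only a tiny fraction of it.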


New Programmable Geometry Pipeline

  • The new AMD Vega geometry pipeline has over 2X peak throughput per clock
  • There is a new primitive shader that allows primitives to be discarded at a high rate
  • A new Intelligent Workgroup Distributor allows for improved load balancing


Next-Generation Compute Unit (NCU)

  • The AMD Vega NCU has configurable precision, allowing it to process :
    • 512 8-bit ops per clock, or
    • 256 16-bit ops per clock, or
    • 128 32-bit ops per clock
  • It is also optimised for higher clock speeds and higher IPC
  • It boasts a larger instruction buffer


Next-Generation Pixel Engine

  • The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser
  • The new rasteriser is designed to improve performance while saving power
  • The on-chip bin cache allows the rasteriser to only “fetch once”
  • The rasteriser also “shades once” by culling pixels invisible in the final scene
  • The render back-ends are now clients of the L2 cache, which improves deferred shading performance


I Want To Know More!

If you would like to know more about the four main improvements in the AMD Vega GPU architecture, please click on the following links, or just go on to the next page.


High-Bandwidth Cache

AMD Vega will use HBM2 memory, as well as a new High-Bandwidth Cache Controller (HBCC). Together, they are known as the High Bandwidth Cache.

AMD showcased the performance of the High-Bandwidth Cache using a real-time render of Joe Macri’s living room on Radeon ProRender with 700+ GB of data. Although no frame rate was visible, the real-time render appeared to be very smooth.


The HBM2 Memory

HBM2 offers twice the transfer rate per pin (up to 2 GT/s) of first-generation HBM memory. This allows it to achieve up to 256 GB/s memory bandwidth per package.

In both HBM and HBM2, up to 8 memory dies can be stacked in a package. But moving to HBM2 allows for twice the memory density – up to 8 GB per package is now possible.
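
The per-package figures can be checked with a couple of lines of arithmetic. This sketch assumes a 1024-bit data interface per package and 8 Gb memory dies. Neither value is given in AMD's presentation, so treat them as illustrative.

```python
HBM_INTERFACE_BITS = 1024  # assumption: data interface width of one package
DIE_DENSITY_GBIT = 8       # assumption: capacity of each stacked die, in gigabits


def package_bandwidth_gb_s(transfer_rate_gt_s: float) -> float:
    """Peak bandwidth of one package in GB/s at the given per-pin transfer rate."""
    return HBM_INTERFACE_BITS * transfer_rate_gt_s / 8


def package_capacity_gb(dies: int) -> float:
    """Capacity of one package in GB for the given number of stacked dies."""
    return dies * DIE_DENSITY_GBIT / 8


hbm2_bandwidth = package_bandwidth_gb_s(2.0)  # HBM2 at up to 2 GT/s per pin
hbm1_bandwidth = package_bandwidth_gb_s(1.0)  # half the per-pin rate
hbm2_capacity = package_capacity_gb(8)        # eight dies per stack

print(f"HBM2 package: {hbm2_bandwidth:.0f} GB/s, {hbm2_capacity:.0f} GB")
print(f"HBM package : {hbm1_bandwidth:.0f} GB/s")
```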


The High-Bandwidth Cache Controller

The second component of the High Bandwidth Cache is the new High-Bandwidth Cache Controller (HBCC). It creates a homogeneous memory system for the AMD Vega GPU, with up to 512 TB of addressable memory.

It also allows for the adaptive, fine-grained movement of data between the AMD Vega GPU and the system memory, the NVRAM and the network storage (as part of Infinity Fabric).


Why Is It Called A Cache?

AMD calls the combination of the HBM2 memory and the High-Bandwidth Cache Controller the “High-Bandwidth Cache”. Techies may wonder why AMD chose to call it “cache”, instead of “memory”. After all, HBM2 memory is a type of fast graphics memory.


The answer lies in the AMD Vega’s heterogeneous memory system. All memory in the system, whether it’s the HBM2 memory or shared memory from the computer’s SDRAM, is seen as a contiguous memory space: one big block of memory, irrespective of how fast each part is.

That may be great for memory addressing, but may cause frequently-used data to be placed in slower memory. To avoid such an occurrence, the High-Bandwidth Cache Controller uses the HBM2 memory like a fast cache. This allows it to keep the most frequently-used data in the fastest memory available – the HBM2 memory.

Hence, the HBM2 memory functions like a cache in AMD Vega, and that is why AMD called the combination the High-Bandwidth Cache.
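
To make the cache idea concrete, here is a toy sketch of a small, fast memory fronting a much larger address space. It is not AMD's algorithm. The least-recently-used eviction policy, the page granularity and the class name ToyHighBandwidthCache are all our own assumptions, since AMD has not described how the HBCC actually decides what to keep in HBM2.

```python
from collections import OrderedDict


class ToyHighBandwidthCache:
    """Toy model: a few fast pages caching a larger, slower address space."""

    def __init__(self, fast_pages: int):
        self.fast_pages = fast_pages
        self.resident = OrderedDict()  # page number -> True, oldest first
        self.hits = 0
        self.misses = 0

    def access(self, page: int) -> str:
        """Return 'fast' if the page was already in fast memory, else 'slow'."""
        if page in self.resident:
            self.resident.move_to_end(page)  # mark as most recently used
            self.hits += 1
            return "fast"
        self.misses += 1
        if len(self.resident) >= self.fast_pages:
            self.resident.popitem(last=False)  # evict least recently used page
        self.resident[page] = True             # bring the page into fast memory
        return "slow"


cache = ToyHighBandwidthCache(fast_pages=2)
trace = [1, 2, 1, 3, 1, 2]
results = [cache.access(page) for page in trace]
print(results)
print(f"hits={cache.hits}, misses={cache.misses}")
```

Page 1 is touched often, so it stays in the fast memory, while pages 2 and 3 take turns being evicted. That is the behaviour the "cache" name is meant to convey: frequently-used data stays in the fastest memory.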


The New Programmable Geometry Pipeline

The AMD Vega features a new programmable geometry pipeline. It boasts over twice the peak throughput per clock, compared to the previous-generation geometry pipeline. This is achieved through two new improvements – primitive shaders and the Intelligent Workgroup Distributor.

The primitive shader stage is a completely new addition. It allows for primitives to be discarded at a much higher rate. The Intelligent Workgroup Distributor, on the other hand, improves the load balancing of work going to the large number of pipelines.
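
AMD did not detail which tests the primitive shader uses to discard primitives, so the sketch below only illustrates the general idea with a classic technique: throwing away back-facing and zero-area triangles before they reach the rest of the pipeline. The counter-clockwise winding convention and the function names are our assumptions.

```python
def signed_area(triangle):
    """Signed area of a 2D triangle; positive when vertices run counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def discard_invisible(triangles):
    """Keep only front-facing triangles (assumed counter-clockwise) with non-zero area."""
    return [tri for tri in triangles if signed_area(tri) > 0]


front_facing = [(0, 0), (1, 0), (0, 1)]  # counter-clockwise
back_facing = [(0, 0), (0, 1), (1, 0)]   # clockwise
degenerate = [(0, 0), (1, 1), (2, 2)]    # all three points on one line

survivors = discard_invisible([front_facing, back_facing, degenerate])
print(f"{len(survivors)} of 3 triangles survive culling")
```

The earlier such primitives are dropped, the less work is wasted downstream, which is why a higher discard rate translates into higher geometry throughput.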


The Next-Generation Compute Unit (NCU)

The AMD Vega NCU is optimised for higher clock speeds and higher IPC, and boasts a larger instruction buffer. But what’s unique is its flexible, configurable precision. This allows it to process not just 64-bit and 32-bit operations, but also 16-bit and 8-bit operations. For example, the AMD Vega NCU can process :

  • 512 8-bit ops per clock, or
  • 256 16-bit ops per clock, or
  • 128 32-bit ops per clock

The Vega NCU does not “waste” computing power by only allowing one operation per clock. If it is a smaller-sized operation, it can be combined to maximise performance. This allows it to boost the performance of “packed math” applications.
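
The packed math scaling is easy to express in code. In this sketch, the only numbers taken from AMD are the ops-per-clock figures. The 64 compute units and 1.5 GHz clock used in the example are assumed values that do not appear in this article, picked to see whether a 25-teraflop class FP16 figure is plausible.

```python
OPS_PER_CLOCK_32BIT = 128  # per NCU, from AMD's presentation


def ops_per_clock(operand_bits: int) -> int:
    """Operations per clock per NCU when narrower operands are packed together."""
    if operand_bits not in (8, 16, 32):
        raise ValueError("only 8-bit, 16-bit and 32-bit operands are modelled here")
    return OPS_PER_CLOCK_32BIT * (32 // operand_bits)


def peak_tera_ops(operand_bits: int, compute_units: int, clock_ghz: float) -> float:
    """Peak throughput in tera-operations per second for the whole GPU."""
    return ops_per_clock(operand_bits) * compute_units * clock_ghz / 1000


for bits in (32, 16, 8):
    print(f"{bits:>2}-bit: {ops_per_clock(bits)} ops per clock")

# Assumed configuration for illustration only: 64 compute units at 1.5 GHz
fp16_teraflops = peak_tera_ops(16, compute_units=64, clock_ghz=1.5)
print(f"FP16 peak: {fp16_teraflops:.1f} teraflops")
```

With those assumed values, the FP16 rate works out to about 24.6 teraflops, which is in line with the 25-teraflop figure this article quotes for the Radeon Instinct MI25.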

This is how the new Radeon Instinct MI25 Vega accelerators deliver 25 teraflops of FP16 compute performance.



The Next-Generation Pixel Engine

The AMD Vega pixel engine has a new Draw Stream Binning Rasteriser. It is designed to improve performance while saving power. The on-chip bin cache allows it to only “fetch once”, and it culls pixels that are invisible in the final scene so it can also “shade once”.

In previous GPU architectures, the pixel and texture memory accesses are non-coherent. That means the same data required by both pixel and texture shaders are not “visible”, and have to be fetched and flushed independently. This reduces efficiency and wastes cache bandwidth.

In the AMD Vega GPU, the homogeneous memory system allows for coherent memory accesses. In addition, the render back-ends are now clients of the L2 cache. This allows data to remain in the L1 and L2 caches, and not get flushed and refetched over and over again.
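
The “shade once” idea can be illustrated with a toy software model. This is not how the hardware rasteriser is implemented. It is a sketch under our own assumptions (an 8-pixel tile size, fragments given as simple tuples, and a smaller depth meaning closer to the camera) that bins fragments into screen tiles, works out which fragment is visible at each pixel, and only then runs the shader.

```python
from collections import defaultdict

TILE_SIZE = 8  # assumption: tile width and height in pixels, for illustration


def bin_fragments(fragments):
    """Group (x, y, depth, material) fragments by the screen tile they fall into."""
    bins = defaultdict(list)
    for fragment in fragments:
        x, y, _, _ = fragment
        bins[(x // TILE_SIZE, y // TILE_SIZE)].append(fragment)
    return bins


def shade_once(fragments, shade):
    """Resolve visibility per tile first, then shade each visible pixel exactly once."""
    image = {}
    for tile_fragments in bin_fragments(fragments).values():
        nearest = {}  # pixel -> (depth, material) of the closest fragment so far
        for x, y, depth, material in tile_fragments:
            if (x, y) not in nearest or depth < nearest[(x, y)][0]:
                nearest[(x, y)] = (depth, material)
        for pixel, (_, material) in nearest.items():
            image[pixel] = shade(material)
    return image


shader_calls = []


def shade(material):
    """Stand-in for an expensive pixel shader; records every invocation."""
    shader_calls.append(material)
    return material.upper()


fragments = [
    (0, 0, 0.9, "wall"),
    (0, 0, 0.2, "chair"),  # closest fragment at pixel (0, 0)
    (9, 0, 0.5, "floor"),
    (0, 0, 0.6, "table"),
]
image = shade_once(fragments, shade)
print(image)
print(f"{len(shader_calls)} shader calls for {len(fragments)} fragments")
```

Four fragments arrive, but only the two that end up visible are shaded. The hidden wall and table fragments never reach the shader, which is where the performance and power savings come from.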


The Complete AMD Vega GPU Architecture Slides

Here is the full set of official AMD slides on the AMD Vega GPU architecture for your perusal :

We hope to get our hands on the two AMD video presentations on the Vega GPU at the AMD Tech Summit. If we do get them, we will post them here. So check back later! 🙂


NVIDIA Tesla P100 For PCIe-Based Servers Overview

On June 20, 2016, NVIDIA officially unveiled their Tesla P100 accelerator for PCIe-based servers. This is a long-expected PCI Express variant of the Tesla P100 accelerator that was launched in April using the NVIDIA NVLink interconnect. Let’s check out what’s new!


NVIDIA Tesla P100

The NVIDIA Tesla P100 was originally unveiled at the GPU Technology Conference on April 5, 2016. Touted as the world’s most advanced hyperscale data center accelerator, it was built around the new NVIDIA Pascal architecture and the proprietary NVIDIA NVLink high-speed GPU interconnect.

Like all other Pascal-based GPUs, the NVIDIA Tesla P100 is fabricated on the 16 nm FinFET process technology. Even with the much smaller process technology, the Tesla P100 is the largest FinFET chip ever built.

Unlike the Pascal-based GeForce GTX 1080 and GTX 1070 GPUs designed for desktop gaming though, the Tesla P100 uses HBM2 memory. In fact, the P100 is actually built on top of the HBM2 memory chips in a single package. This new package technology, Chip on Wafer on Substrate (CoWoS), allows for a 3X boost in memory bandwidth to 720 GB/s.

The NVIDIA NVLink interconnect allows up to eight Tesla P100 accelerators to be linked in a single node. This allows a single Tesla P100-based server node to outperform 48 dual-socket CPU server nodes.


Now Available With PCIe Interface

To make Tesla P100 available for HPC (High Performance Computing) applications, NVIDIA has just introduced the Tesla P100 with a PCI Express interface. This is basically the PCI Express version of the original Tesla P100.


Massive Leap In Performance

Such High Performance Computing servers can already make use of the NVIDIA Tesla K80 accelerators, which are based on the older NVIDIA Kepler architecture. The new NVIDIA Pascal architecture, coupled with much faster HBM2 memory, allows for a massive leap in performance. Check out these results that NVIDIA provided :

Ultimately, the NVIDIA Tesla P100 for PCIe-based servers promises to deliver “dramatically more” performance for your money. As a bonus, the energy cost of running Tesla P100-based servers is much lower than CPU-based servers, and those savings accrue over time.



Two Configurations

The NVIDIA Tesla P100 for PCIe-based servers will be slightly (~11-12%) slower than the NVLink version, turning out up to 4.7 teraflops of double-precision performance, 9.3 teraflops of single-precision performance, and 18.7 teraflops of half-precision performance.

The Tesla P100 will be offered in two configurations. The high-end configuration will have 16 GB of HBM2 memory with a maximum memory bandwidth of 720 GB/s. The lower-end configuration will have 12 GB of HBM2 memory with a maximum memory bandwidth of 540 GB/s.
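
A quick calculation shows how these numbers relate to each other. The sketch assumes that memory bandwidth scales in proportion to the amount of HBM2 fitted, which is our reading of the two configurations rather than something NVIDIA states in these slides.

```python
PEAK_TFLOPS = {"double": 4.7, "single": 9.3, "half": 18.7}
FULL_CAPACITY_GB = 16
FULL_BANDWIDTH_GB_S = 720


def precision_ratios(peak: dict) -> dict:
    """Throughput of each precision relative to double precision."""
    base = peak["double"]
    return {name: round(value / base, 2) for name, value in peak.items()}


def bandwidth_for_capacity(capacity_gb: float) -> float:
    """Bandwidth in GB/s, assuming it scales with the amount of HBM2 fitted."""
    return FULL_BANDWIDTH_GB_S * capacity_gb / FULL_CAPACITY_GB


print(precision_ratios(PEAK_TFLOPS))
print(f"12 GB configuration: {bandwidth_for_capacity(12):.0f} GB/s")
```

Each halving of precision roughly doubles the throughput, and dropping from 16 GB to 12 GB of HBM2 cuts bandwidth by the same one quarter, from 720 GB/s to 540 GB/s.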


Complete NVIDIA Slides

For those who are interested in more details, here are the NVIDIA Tesla P100 for PCIe-based Servers slides.
