Find out how NVIDIA created the new A800 GPU to bypass the US ban on the sale of advanced chips to China!
NVIDIA Offers A800 GPU To Bypass US Ban On China!
Two months after it was banned by the US government from selling high-performance AI chips to China, NVIDIA introduced a new A800 GPU designed to bypass those restrictions.
The new NVIDIA A800 is based on the same Ampere microarchitecture as the A100, which was used as the performance baseline by the US government.
Despite its numerically larger model number (the lucky number 8 was probably picked to appeal to Chinese customers), this is a detuned part, with slightly reduced performance to meet export control limitations.
In a statement, NVIDIA confirmed that the A800 is its alternative product for Chinese customers:

"The NVIDIA A800 GPU, which went into production in Q3, is another alternative product to the NVIDIA A100 GPU for customers in China. The A800 meets the U.S. government's clear test for reduced export control and cannot be programmed to exceed it."
NVIDIA is probably hoping that the slightly slower NVIDIA A800 GPU will allow it to continue supplying China with A100-level chips that are used to power supercomputers and high-performance datacenters for artificial intelligence applications.
As I will show you in the next section, there won't be a truly significant performance difference between the A800 and the A100, except in very high-end applications. So NVIDIA customers who want or need the A100 should have no issue opting for the A800 instead.
However, this can only be a stopgap, as it leaves NVIDIA stuck selling A100-level chips to China unless and until the US government changes its mind.
Read more: AMD, NVIDIA Banned From Selling AI Chips To China!
How Fast Is The NVIDIA A800 GPU?
The US government treats the NVIDIA A100 as the performance baseline for its export control restrictions on China.
Any chip equal to or faster than that Ampere-based chip, which launched on May 14, 2020, cannot be sold or exported to China. But as they say, the devil is in the details.
The US government didn't specify just how much slower chips must be to qualify for export to China. So NVIDIA could technically comply by slightly detuning the A100, while offering almost the same level of performance.
And that is exactly what NVIDIA did with the A800: it is basically the A100 with a 33% slower NVLink interconnect. NVIDIA also limited the maximum number of GPUs supported in a single server to eight.
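That 33% figure falls straight out of the NVLink bandwidth numbers you will see in the spec tables below:

$$\frac{600\ \text{GB/s} - 400\ \text{GB/s}}{600\ \text{GB/s}} \times 100\% \approx 33\%$$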
That only slightly reduces the performance of A800 servers compared to A100 servers, while offering the same amount of raw GPU compute. Most users will not notice the difference.
The only significant impediment is at the very high end: Chinese companies are now restricted to a maximum of eight GPUs per server, instead of up to sixteen.
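To get a feel for what the slower NVLink means in practice, here is a rough back-of-envelope sketch. The gradient size, GPU count, and ring all-reduce model are all my own illustrative assumptions, not measured figures:

```python
# Back-of-envelope estimate of how the A800's slower NVLink could affect
# multi-GPU gradient synchronization during AI training.
# This is a simplified model with assumed numbers, NOT a benchmark:
# real performance depends on topology, NCCL algorithms, and how much
# communication is overlapped with compute.

GRADIENT_GB = 3.0   # assumption: FP16 gradients of a ~1.5B-parameter model
NUM_GPUS = 8        # one full server of GPUs

def ring_allreduce_seconds(payload_gb: float, n_gpus: int, link_gb_s: float) -> float:
    """A classic ring all-reduce moves about 2*(n-1)/n of the payload per GPU."""
    traffic_gb = 2 * (n_gpus - 1) / n_gpus * payload_gb
    return traffic_gb / link_gb_s

for name, nvlink_gb_s in [("A100", 600), ("A800", 400)]:
    t = ring_allreduce_seconds(GRADIENT_GB, NUM_GPUS, nvlink_gb_s)
    print(f"{name}: ~{t * 1000:.1f} ms per gradient sync at {nvlink_gb_s} GB/s NVLink")
```

Under these assumptions, the gap works out to a few milliseconds per gradient sync (roughly 8.8 ms versus 13.1 ms), which is why most workloads are unlikely to notice it.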
To show you what I mean, I dug into the A800's specifications and compared them to the A100's below:
NVIDIA A100 vs A800: 80GB PCIe Version

| Specifications | A100 80GB PCIe | A800 80GB PCIe |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 | 156 TFLOPS | 156 TFLOPS |
| BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS |
| GPU Memory | 80 GB HBM2e | 80 GB HBM2e |
| GPU Memory Bandwidth | 1,935 GB/s | 1,935 GB/s |
| TDP | 300 W | 300 W |
| Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB |
| Interconnect | NVLink: 600 GB/s, PCIe Gen4: 64 GB/s | NVLink: 400 GB/s, PCIe Gen4: 64 GB/s |
| Server Options | 1-8 GPUs | 1-8 GPUs |
NVIDIA A100 vs A800: 80GB SXM Version

| Specifications | A100 80GB SXM | A800 80GB SXM |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 | 156 TFLOPS | 156 TFLOPS |
| BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS |
| GPU Memory | 80 GB HBM2e | 80 GB HBM2e |
| GPU Memory Bandwidth | 2,039 GB/s | 2,039 GB/s |
| TDP | 400 W | 400 W |
| Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB |
| Interconnect | NVLink: 600 GB/s, PCIe Gen4: 64 GB/s | NVLink: 400 GB/s, PCIe Gen4: 64 GB/s |
| Server Options | 4 / 8 / 16 GPUs | 4 / 8 GPUs |
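The 16-GPU versus 8-GPU server cap is where the gap really shows. Here is a quick illustration using the FP16 Tensor Core figure from the SXM table above (peak theoretical throughput, not delivered performance):

```python
# Peak FP16 Tensor Core compute of a maxed-out SXM server, using the
# 312 TFLOPS per-GPU figure from the table above.
# Peak TFLOPS is a theoretical ceiling, not real-world performance.

FP16_TFLOPS_PER_GPU = 312

for name, max_gpus in [("A100 SXM server", 16), ("A800 SXM server", 8)]:
    total = max_gpus * FP16_TFLOPS_PER_GPU
    print(f"{name} ({max_gpus} GPUs max): {total:,} TFLOPS peak FP16")
```

A maxed-out A800 server tops out at half the aggregate compute of a 16-GPU A100 server (2,496 versus 4,992 TFLOPS), so the largest Chinese training clusters would need twice as many servers, plus more inter-server networking, to match.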
NVIDIA A100 vs A800: 40GB PCIe Version

| Specifications | A100 40GB PCIe | A800 40GB PCIe |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 | 156 TFLOPS | 156 TFLOPS |
| BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS |
| GPU Memory | 40 GB HBM2 | 40 GB HBM2 |
| GPU Memory Bandwidth | 1,555 GB/s | 1,555 GB/s |
| TDP | 250 W | 250 W |
| Multi-Instance GPU | Up to 7 MIGs @ 5 GB | Up to 7 MIGs @ 5 GB |
| Interconnect | NVLink: 600 GB/s, PCIe Gen4: 64 GB/s | NVLink: 400 GB/s, PCIe Gen4: 64 GB/s |
| Server Options | 1-8 GPUs | 1-8 GPUs |
Dr. Adrian Wong has been writing about tech and science since 1997, even publishing a book with Prentice Hall called Breaking Through The BIOS Barrier (ISBN 978-0131455368) while in medical school.
He continues to devote countless hours every day writing about tech, medicine and science, in his pursuit of facts in a post-truth world.