How NVIDIA A800 Bypasses US Chip Ban On China!


Find out how NVIDIA created the new A800 GPU to bypass the US ban on the sale of advanced chips to China!

 

NVIDIA Offers A800 GPU To Bypass US Ban On China!

Two months after it was banned by the US government from selling high-performance AI chips to China, NVIDIA introduced a new A800 GPU designed to bypass those restrictions.

The new NVIDIA A800 is based on the same Ampere microarchitecture as the A100, which was used as the performance baseline by the US government.

Despite its numerically larger model number (the lucky number 8 was probably picked to appeal to the Chinese), this is a detuned part, with slightly reduced performance to meet export control limitations.

The NVIDIA A800 GPU, which went into production in Q3, is another alternative product to the NVIDIA A100 GPU for customers in China.

The A800 meets the U.S. government’s clear test for reduced export control and cannot be programmed to exceed it.

NVIDIA is probably hoping that the slightly slower NVIDIA A800 GPU will allow it to continue supplying China with A100-level chips that are used to power supercomputers and high-performance datacenters for artificial intelligence applications.

As I will show you in the next section, except in very high-end applications, there won’t be a truly significant performance difference between the A800 and the A100. So NVIDIA customers who want or need the A100 will have no issue opting for the A800 instead.

However, this can only be a stopgap fix, as NVIDIA is stuck selling A100-level chips to China until and unless the US government changes its mind.

Read more : AMD, NVIDIA Banned From Selling AI Chips To China!


 

How Fast Is The NVIDIA A800 GPU?

The US government considers the NVIDIA A100 as the performance baseline for its export control restrictions on China.

Any chip equal to or faster than that Ampere-based chip, which was launched on May 14, 2020, is forbidden to be sold or exported to China. But as they say, the devil is in the details.

The US government didn’t specify just how much slower chips must be to qualify for export to China. So NVIDIA could technically get away with slightly detuning the A100, while offering almost the same performance level.

And that was what NVIDIA did with the A800 – it is basically the A100 with a 33% slower NVLink interconnect speed. NVIDIA also limited the maximum number of GPUs supported in a single server to 8.
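The 33% figure follows directly from the two NVLink bandwidths listed in the comparison tables in this article. Here is a minimal sketch of that arithmetic; the constant and function names are my own, chosen only for illustration:

```python
# NVLink bandwidth figures taken from the A100 vs A800 tables in this article (GB/s).
A100_NVLINK_GBPS = 600
A800_NVLINK_GBPS = 400


def percent_reduction(baseline: float, reduced: float) -> float:
    """Return how much smaller `reduced` is than `baseline`, as a percentage."""
    return (baseline - reduced) / baseline * 100


nvlink_cut = percent_reduction(A100_NVLINK_GBPS, A800_NVLINK_GBPS)
print(f"A800 NVLink is {nvlink_cut:.1f}% slower than A100 NVLink")
```

Note that this only measures GPU-to-GPU link speed. The per-GPU compute figures are identical between the two parts.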

That only slightly reduces the performance of A800 servers, compared to A100 servers, while offering the same amount of GPU compute performance per chip. Most users will not notice the difference.

The only significant impediment is on the very high-end – Chinese companies are now restricted to a maximum of eight GPUs per server, instead of up to sixteen.
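To put a rough number on that ceiling, you can multiply the per-GPU peak throughput by the maximum GPU count per server. This is only a back-of-envelope sketch: it uses the FP16 Tensor Core figure from the tables in this article, assumes perfect scaling across GPUs, and ignores interconnect overhead, so real-world results would be lower. The function name is hypothetical.

```python
# Peak FP16 Tensor Core throughput per GPU (TFLOPS), identical for A100 and A800
# according to the tables in this article.
FP16_TENSOR_TFLOPS_PER_GPU = 312


def server_peak_tflops(gpu_count: int,
                       per_gpu_tflops: float = FP16_TENSOR_TFLOPS_PER_GPU) -> float:
    """Theoretical peak for one server, assuming perfect scaling (an idealisation)."""
    return gpu_count * per_gpu_tflops


a100_max = server_peak_tflops(16)  # largest A100 SXM server option
a800_max = server_peak_tflops(8)   # largest A800 SXM server option

print(f"A100 server ceiling: {a100_max} TFLOPS")
print(f"A800 server ceiling: {a800_max} TFLOPS")
```

Under those assumptions, the largest A800 server tops out at half the theoretical peak of the largest A100 server, even though each individual GPU is just as fast.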

To show you what I mean, I dug into the A800 specifications, and compared them to the A100 below:

NVIDIA A100 vs A800 : 80GB PCIe Version

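| Specifications | A100 80GB PCIe | A800 80GB PCIe |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 | 156 TFLOPS | 156 TFLOPS |
| BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS |
| GPU Memory | 80 GB HBM2 | 80 GB HBM2 |
| GPU Memory Bandwidth | 1,935 GB/s | 1,935 GB/s |
| TDP | 300 W | 300 W |
| Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB |
| Interconnect | NVLink : 600 GB/s, PCIe Gen4 : 64 GB/s | NVLink : 400 GB/s, PCIe Gen4 : 64 GB/s |
| Server Options | 1-8 GPUs | 1-8 GPUs |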

NVIDIA A100 vs A800 : 80GB SXM Version

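| Specifications | A100 80GB SXM | A800 80GB SXM |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 | 156 TFLOPS | 156 TFLOPS |
| BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS |
| GPU Memory | 80 GB HBM2 | 80 GB HBM2 |
| GPU Memory Bandwidth | 2,039 GB/s | 2,039 GB/s |
| TDP | 400 W | 400 W |
| Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB |
| Interconnect | NVLink : 600 GB/s, PCIe Gen4 : 64 GB/s | NVLink : 400 GB/s, PCIe Gen4 : 64 GB/s |
| Server Options | 4 / 8 / 16 GPUs | 4 / 8 GPUs |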

NVIDIA A100 vs A800 : 40GB PCIe Version

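| Specifications | A100 40GB PCIe | A800 40GB PCIe |
|---|---|---|
| FP64 | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS |
| FP32 | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 | 156 TFLOPS | 156 TFLOPS |
| BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS |
| INT8 Tensor Core | 624 TOPS | 624 TOPS |
| GPU Memory | 40 GB HBM2 | 40 GB HBM2 |
| GPU Memory Bandwidth | 1,555 GB/s | 1,555 GB/s |
| TDP | 250 W | 250 W |
| Multi-Instance GPU | Up to 7 MIGs @ 5 GB | Up to 7 MIGs @ 5 GB |
| Interconnect | NVLink : 600 GB/s, PCIe Gen4 : 64 GB/s | NVLink : 400 GB/s, PCIe Gen4 : 64 GB/s |
| Server Options | 1-8 GPUs | 1-8 GPUs |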
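If you would rather not scan the tables row by row, the comparison can be expressed as a simple dictionary diff. The sketch below encodes the 80GB SXM figures from this article (the key names are my own shorthand, not NVIDIA's) and prints only the rows where the two parts differ:

```python
# 80GB SXM specifications as listed in this article; key names are illustrative.
a100_sxm = {
    "FP64 (TFLOPS)": 9.7,
    "FP64 Tensor Core (TFLOPS)": 19.5,
    "FP32 (TFLOPS)": 19.5,
    "Tensor Float 32 (TFLOPS)": 156,
    "BFLOAT16 Tensor Core (TFLOPS)": 312,
    "FP16 Tensor Core (TFLOPS)": 312,
    "INT8 Tensor Core (TOPS)": 624,
    "GPU Memory (GB)": 80,
    "GPU Memory Bandwidth (GB/s)": 2039,
    "TDP (W)": 400,
    "NVLink (GB/s)": 600,
    "PCIe Gen4 (GB/s)": 64,
    "Server options": "4 / 8 / 16 GPUs",
}

a800_sxm = {
    "FP64 (TFLOPS)": 9.7,
    "FP64 Tensor Core (TFLOPS)": 19.5,
    "FP32 (TFLOPS)": 19.5,
    "Tensor Float 32 (TFLOPS)": 156,
    "BFLOAT16 Tensor Core (TFLOPS)": 312,
    "FP16 Tensor Core (TFLOPS)": 312,
    "INT8 Tensor Core (TOPS)": 624,
    "GPU Memory (GB)": 80,
    "GPU Memory Bandwidth (GB/s)": 2039,
    "TDP (W)": 400,
    "NVLink (GB/s)": 400,
    "PCIe Gen4 (GB/s)": 64,
    "Server options": "4 / 8 GPUs",
}

# Keep only the rows where the A100 and A800 values are not equal.
differences = {
    spec: (a100_sxm[spec], a800_sxm[spec])
    for spec in a100_sxm
    if a100_sxm[spec] != a800_sxm[spec]
}

for spec, (a100_value, a800_value) in differences.items():
    print(f"{spec}: A100 = {a100_value}, A800 = {a800_value}")
```

Only two rows come back: the NVLink bandwidth and the server options. Everything else in the SXM table is identical.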

 


Dr. Adrian Wong has been writing about tech and science since 1997, even publishing a book with Prentice Hall called Breaking Through The BIOS Barrier (ISBN 978-0131455368) while in medical school.

He continues to devote countless hours every day writing about tech, medicine and science, in his pursuit of facts in a post-truth world.

 
