On June 20, 2016, NVIDIA officially unveiled their Tesla P100 accelerator for PCIe-based servers. This is a long-expected PCI Express variant of the Tesla P100 accelerator that was launched in April using the NVIDIA NVLink interconnect. Let’s check out what’s new!
NVIDIA Tesla P100
The NVIDIA Tesla P100 was originally unveiled at the GPU Technology Conference on April 5, 2016. Touted as the world’s most advanced hyperscale data center accelerator, it was built around the new NVIDIA Pascal architecture and the proprietary NVIDIA NVLink high-speed GPU interconnect.
Like all other Pascal-based GPUs, the NVIDIA Tesla P100 is fabricated on the 16 nm FinFET process technology. Even with the much smaller process technology, the Tesla P100 is the largest FinFET chip ever built.
Unlike the Pascal-based GeForce GTX 1080 and GTX 1070 GPUs designed for desktop gaming though, the Tesla P100 uses HBM2 memory. In fact, the P100 is actually built on top of the HBM2 memory chips in a single package. This new package technology, Chip on Wafer on Substrate (CoWoS), allows for a 3X boost in memory bandwidth to 720 GB/s.
The NVIDIA NVLink interconnect allows up to eight Tesla P100 accelerators to be linked in a single node. This allows a single Tesla P100-based server node to outperform 48 dual-socket CPU server nodes.
Now Available With PCIe Interface
To make Tesla P100 available for HPC (High Performance Computing) applications, NVIDIA has just introduced the Tesla P100 with a PCI Express interface. This is basically the PCI Express version of the original Tesla P100.
Massive Leap In Performance
Such High Performance Computing servers can already make use of the NVIDIA Tesla K80 accelerators, that are based on the previous-generation NVIDIA Maxwell architecture. The new NVIDIA Pascal architecture, coupled with much faster HBM2 memory, allow for a massive leap in performance. Check out these results that NVIDIA provided :
Ultimately, the NVIDIA Tesla P100 for PCIe-based servers promises to deliver “dramatically more” performance for your money. As a bonus, the energy cost of running Tesla P100-based servers is much lower than CPU-based servers, and those savings accrue over time.
The NVIDIA Tesla P100 for PCIe-based servers will be slightly (~11-12%) slower than the NVLink version, turning out up to 4.7 teraflops of double-precision performance, 9.3 teraflops of single-precision performance, and 18.7 teraflops of half-precision performance.
The Tesla P100 will be offered in two configurations. The high-end configuration will have 16 GB of HBM2 memory with a maximum memory bandwidth of 720 GB/s. The lower-end configuration will have 12 GB of HBM2 memory with a maximum memory bandwidth of 540 GB/s.
Complete NVIDIA Slides
For those who are interested in more details, here are the NVIDIA Tesla P100 for PCIe-based Servers slides.