Tag Archives: NVIDIA Tesla P100

Fujitsu Supercomputer For RIKEN Uses 24 NVIDIA DGX-1s

SINGAPORE, 7 March 2017: Fujitsu announced today that it is using 24 NVIDIA DGX-1 AI systems to help build a Fujitsu supercomputer for RIKEN, Japan’s largest comprehensive research institution, for deep learning research.

The largest customer installation of DGX-1 systems to date, the Fujitsu supercomputer will accelerate the application of AI to solve complex challenges in healthcare, manufacturing and public safety.

“DGX-1 is like a time-machine for AI researchers,” said Jen-Hsun Huang, founder and CEO of NVIDIA. “Enterprises, research centres and universities worldwide are adopting DGX-1 to ride the wave of deep learning, the technology breakthrough at the centre of the AI revolution.”

The RIKEN Center for Advanced Intelligence Project will use the new Fujitsu supercomputer, scheduled to go online next month, to accelerate AI research in several areas, including medicine, manufacturing, healthcare and disaster preparedness.

“We believe that the NVIDIA DGX-1-based system will accelerate real-world implementation of the latest AI technologies as well as research into next-generation AI algorithms,” said Arimichi Kunisawa, head of the Technical Computing Solution Unit at Fujitsu Limited. “Fujitsu is leveraging its extensive experience in high-performance computing development and AI research to support R&D that utilises this system, contributing to the creation of a future in which AI is used to find solutions to a variety of social issues.”


The New Fujitsu Supercomputer Runs On 24 NVIDIA DGX-1s

Conventional HPC architectures are proving too costly and inefficient to meet the needs of AI researchers. That’s why companies like Fujitsu and customers such as RIKEN are looking for GPU-based solutions that reduce cost and power consumption while increasing performance.

Each DGX-1 combines the power of eight NVIDIA Tesla P100 GPUs with an integrated software stack optimised for deep learning frameworks, delivering the performance of 250 conventional x86 servers.


The system features a number of technological innovations unique to the DGX-1, including:

  • Containerised deep learning frameworks, optimised by NVIDIA for maximum GPU-accelerated deep learning training
  • Greater performance and multi-GPU scaling with NVIDIA NVLink, accelerating time to discovery
  • An integrated software and hardware architecture optimised for deep learning

The supercomputer will also use 32 Fujitsu PRIMERGY servers, which, combined with the DGX-1 systems, will boost its total theoretical processing performance to 4 petaflops when running half-precision floating-point calculations.
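As a rough check on the 4 petaflops figure, the DGX-1 numbers from the DGX-1 specifications (eight Tesla P100 GPUs and up to 170 teraflops of FP16 per system) can simply be multiplied out. This is a back-of-the-envelope sketch of theoretical peak only: it assumes every DGX-1 runs at its quoted maximum, and it does not try to assign any share to the PRIMERGY servers, whose contribution is not given.

```python
# Back-of-the-envelope FP16 peak for the DGX-1 portion of the RIKEN system.
# Assumption: every DGX-1 reaches NVIDIA's quoted "up to" FP16 peak.
DGX1_COUNT = 24
GPUS_PER_DGX1 = 8
DGX1_PEAK_FP16_TFLOPS = 170  # "up to" figure from the DGX-1 spec list

total_gpus = DGX1_COUNT * GPUS_PER_DGX1
dgx_fp16_pflops = DGX1_COUNT * DGX1_PEAK_FP16_TFLOPS / 1000  # TFLOPS -> PFLOPS

print(f"Tesla P100 GPUs: {total_gpus}")
print(f"DGX-1 peak FP16: {dgx_fp16_pflops:.2f} PFLOPS")
```

On paper, the 24 DGX-1 systems alone account for 192 GPUs and roughly 4.08 petaflops of FP16, which is in line with the quoted 4 petaflops total.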



Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

NVIDIA Tesla P100 For PCIe-Based Servers Overview

On June 20, 2016, NVIDIA officially unveiled their Tesla P100 accelerator for PCIe-based servers. This is a long-expected PCI Express variant of the Tesla P100 accelerator that was launched in April using the NVIDIA NVLink interconnect. Let’s check out what’s new!


NVIDIA Tesla P100

The NVIDIA Tesla P100 was originally unveiled at the GPU Technology Conference on April 5, 2016. Touted as the world’s most advanced hyperscale data center accelerator, it was built around the new NVIDIA Pascal architecture and the proprietary NVIDIA NVLink high-speed GPU interconnect.

Like the other Pascal-based GPUs announced so far, the NVIDIA Tesla P100 is fabricated on a 16 nm FinFET process. Despite the much smaller process, its 15.3 billion transistors make the Tesla P100 the largest FinFET chip ever built.

Unlike the Pascal-based GeForce GTX 1080 and GTX 1070 GPUs designed for desktop gaming, the Tesla P100 uses HBM2 memory. The GPU die and the HBM2 memory stacks sit side by side on a silicon interposer within a single package. This packaging technology, Chip on Wafer on Substrate (CoWoS), delivers a 3X boost in memory bandwidth over the Maxwell architecture, to 720 GB/s.

The NVIDIA NVLink interconnect allows up to eight Tesla P100 accelerators to be linked in a single node. According to NVIDIA, such a node runs the AMBER molecular dynamics code faster than 48 dual-socket CPU server nodes.
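In the DGX-1, the eight GPUs are wired in what NVIDIA's spec list calls an “NVLink Hybrid Mesh Cube”. The sketch below models one commonly described version of that layout, which is an assumption here: two groups of four GPUs, each group fully connected, plus one link from each GPU to its counterpart in the other group. Each Tesla P100 has four NVLink links, so every GPU should end up with exactly four neighbours. The GPU numbering is hypothetical.

```python
from collections import deque
from itertools import combinations

NUM_GPUS = 8

def hybrid_cube_mesh():
    """Return NVLink edges for an assumed DGX-1 (Tesla P100) layout."""
    edges = set()
    for group in (range(0, 4), range(4, 8)):
        # Each quad of GPUs is fully connected: 6 links per quad.
        edges.update(frozenset(pair) for pair in combinations(group, 2))
    for gpu in range(4):
        # Cross link from GPU i to GPU i + 4 in the other quad.
        edges.add(frozenset((gpu, gpu + 4)))
    return edges

def hops_from(src, adjacency):
    """Breadth-first search: minimum NVLink hops from src to every GPU."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in adjacency[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

edges = hybrid_cube_mesh()
adjacency = {g: set() for g in range(NUM_GPUS)}
for a, b in (tuple(e) for e in edges):
    adjacency[a].add(b)
    adjacency[b].add(a)

degrees = {g: len(nbrs) for g, nbrs in adjacency.items()}
diameter = max(max(hops_from(g, adjacency).values()) for g in range(NUM_GPUS))
print(f"links: {len(edges)}, links per GPU: {set(degrees.values())}, max hops: {diameter}")
```

Under this assumed layout, every GPU uses all four of its links, and any two GPUs are at most two hops apart; pairs without a direct link have to route through an intermediate GPU.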


Now Available With PCIe Interface

To make the Tesla P100 available to HPC (High Performance Computing) servers built around standard PCI Express slots, NVIDIA has just introduced the Tesla P100 with a PCI Express interface. This is essentially the PCI Express version of the original NVLink-based Tesla P100.


Massive Leap In Performance

Such High Performance Computing servers can already use the NVIDIA Tesla K80 accelerator, which is based on the older NVIDIA Kepler architecture. The new NVIDIA Pascal architecture, coupled with much faster HBM2 memory, allows for a massive leap in performance, and NVIDIA has provided benchmark results comparing the two.

Ultimately, NVIDIA promises that the Tesla P100 for PCIe-based servers will deliver “dramatically more” performance for your money. As a bonus, the energy cost of running Tesla P100-based servers is much lower than that of CPU-based servers, and those savings accrue over time.



Two Configurations

The NVIDIA Tesla P100 for PCIe-based servers will be slightly (~11-12%) slower than the NVLink version, turning out up to 4.7 teraflops of double-precision performance, 9.3 teraflops of single-precision performance, and 18.7 teraflops of half-precision performance.
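Peak GPU throughput is usually computed as cores × 2 operations per clock (one fused multiply-add) × boost clock. The sketch below assumes figures that are not stated in this article: 3,584 FP32 CUDA cores on the Tesla P100, a boost clock of 1,480 MHz for the NVLink version and 1,303 MHz for the PCIe version, FP64 at half the FP32 rate and FP16 at twice it. Under those assumptions it reproduces the quoted numbers and suggests the gap comes down to the lower clock.

```python
# Assumed Tesla P100 parameters (not from the article): FP32 core count and
# GPU Boost clocks for the NVLink and PCIe versions.
FP32_CORES = 3584
BOOST_CLOCK_GHZ = {"NVLink": 1.480, "PCIe": 1.303}

def peak_tflops(cores, clock_ghz):
    """Peak TFLOPS per precision, counting one FMA as 2 floating-point ops."""
    fp32 = cores * 2 * clock_ghz / 1000  # cores * 2 ops * GHz = GFLOPS; /1000 -> TFLOPS
    return {"FP64": fp32 / 2, "FP32": fp32, "FP16": fp32 * 2}

peaks = {name: peak_tflops(FP32_CORES, clk) for name, clk in BOOST_CLOCK_GHZ.items()}
for name, p in peaks.items():
    print(name, {k: round(v, 1) for k, v in p.items()})

slowdown = 1 - BOOST_CLOCK_GHZ["PCIe"] / BOOST_CLOCK_GHZ["NVLink"]
print(f"PCIe version is {slowdown:.1%} slower at peak")
```

With these assumed clocks the PCIe version comes out about 12% slower across all three precisions, consistent with the ~11-12% range above (the spread comes from rounding the published figures).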

The PCIe version of the Tesla P100 will be offered in two configurations. The high-end configuration will have 16 GB of HBM2 memory with a maximum memory bandwidth of 720 GB/s. The lower-end configuration will have 12 GB of HBM2 memory with a maximum memory bandwidth of 540 GB/s.
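The two memory configurations line up with the number of HBM2 stacks in the package. As a sketch, assume (these figures are not in the article) that each HBM2 stack has a 1,024-bit interface and holds 4 GB, that the 16 GB card uses four stacks and the 12 GB card three, and that each pin runs at about 1.4 Gb/s. Bandwidth is then bus width × per-pin rate ÷ 8.

```python
# Assumed HBM2 parameters for the Tesla P100 (illustrative, not from the article).
BITS_PER_STACK = 1024   # interface width of one HBM2 stack
GB_PER_STACK = 4        # capacity of one stack
PIN_RATE_GBPS = 1.4     # per-pin data rate, gigabits per second

def hbm2_config(stacks):
    """Return (capacity in GB, peak bandwidth in GB/s) for a given stack count."""
    bus_bits = stacks * BITS_PER_STACK
    bandwidth = bus_bits * PIN_RATE_GBPS / 8  # gigabits/s -> gigabytes/s
    return stacks * GB_PER_STACK, bandwidth

for stacks in (4, 3):
    capacity, bandwidth = hbm2_config(stacks)
    print(f"{stacks} stacks: {capacity} GB, ~{bandwidth:.0f} GB/s")
```

The results, roughly 717 GB/s and 538 GB/s, sit within rounding of NVIDIA's 720 GB/s and 540 GB/s; the exact values depend on the actual memory clock.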





4,500 NVIDIA Tesla GPU Upgrade For Piz Daint Supercomputer

Singapore, April 6, 2016—NVIDIA today announced that Pascal architecture-based NVIDIA Tesla GPU accelerators will power an upgraded version of Europe’s fastest supercomputer, the Piz Daint system at the Swiss National Supercomputing Center (CSCS) in Lugano, Switzerland. The upgrade is expected to more than double Piz Daint’s speed, with most of the system’s performance expected to come from its Tesla GPUs.

Piz Daint, named after a mountain in the Swiss Alps, currently delivers 7.8 petaflops of compute performance, or 7.8 quadrillion mathematical calculations per second. That puts it at No. 7 in the latest TOP500 list of the world’s fastest supercomputers. CSCS plans to upgrade the system later this year with 4,500 Pascal-based GPUs.


Piz Daint Supercomputer Upgrade

Pascal is the most advanced GPU architecture ever built, delivering unmatched performance and efficiency to power the most computationally demanding applications. Pascal-based Tesla GPUs will allow researchers to solve larger, more complex problems that are currently out of reach in cosmology, materials science, seismology, climatology and a host of other fields.

Pascal GPUs feature a number of breakthrough technologies, including second-generation High Bandwidth Memory (HBM2) that delivers three times higher bandwidth than the previous generation architecture, and 16nm FinFET technology for unprecedented energy efficiency. For scientists with near infinite computing needs, Pascal GPUs deliver a giant leap in application performance and time to discovery for their scientific research.

The upgrade will enable CSCS scientists to do simulations, data analysis and visualisations faster and more efficiently. Piz Daint will be used to analyse data from the Large Hadron Collider at CERN, the world’s largest particle accelerator. The upgrade will also accelerate research on the Human Brain Project’s High Performance Analytics and Computing Platform, which currently uses Piz Daint. The project’s goal is to build neuromorphic computing systems that use the same principles of computation and cognitive architectures as the brain. The upgrade will also facilitate CSCS research in geophysics, cosmology and materials science.


“We are taking advantage of NVIDIA GPUs to significantly accelerate simulations in such diverse areas as cosmology, materials science, seismology and climatology,” said Thomas Schulthess, professor of computational physics at ETH Zurich and director of the Swiss National Supercomputing Center. “Tesla accelerators represent a leap forward in computing, allowing our researchers to solve larger, more complex problems that are currently out of reach in a host of fields.”

“CSCS scientists are using Piz Daint to tackle some of the most important computational challenges of our day, like modeling the human brain and uncovering new insights into the origins of the universe,” said Ian Buck, vice president of Accelerated Computing at NVIDIA. “Tesla GPUs deliver a massive leap in application performance, allowing CSCS to push the limits of scientific discovery.”



NVIDIA Tesla P100 GPU Launched

SINGAPORE, April 6, 2016—NVIDIA today introduced the NVIDIA Tesla P100 GPU, the most advanced accelerator ever built. The latest addition to the NVIDIA Tesla Accelerated Computing Platform, the Tesla P100 enables a new class of servers that can deliver the performance of hundreds of CPU server nodes.

Today’s data centres — vast network infrastructures with numerous interconnected commodity CPU servers — process large numbers of transactional workloads, such as web services. But they are inefficient at next-generation artificial intelligence and scientific applications, which require ultra-efficient, lightning-fast server nodes.

Based on the new NVIDIA Pascal GPU architecture with five breakthrough technologies, the Tesla P100 delivers unmatched performance and efficiency to power the most computationally demanding applications.

“Our greatest scientific and technical challenges — finding cures for cancer, understanding climate change, building intelligent machines — require a near-infinite amount of computing performance,” said Jen-Hsun Huang, CEO and co-founder, NVIDIA. “We designed the Pascal GPU architecture from the ground up with innovation at every level. It represents a massive leap forward in computing performance and efficiency, and will help some of the smartest minds drive tomorrow’s advances.”

Dr. John Kelly III, senior vice president, Cognitive Solutions and IBM Research, said: “As we enter this new era of computing, entirely new approaches to the underlying technologies will be required to fully realise the benefits of AI and cognitive. The combination of NVIDIA GPUs and OpenPOWER technology is already accelerating Watson’s learning of new skills. Together, IBM’s Power architecture and NVIDIA’s Pascal architecture with NVLink will further accelerate cognitive workload performance and advance the artificial intelligence industry.”

Five Architectural Breakthroughs

The Tesla P100 delivers its unprecedented performance, scalability and programming efficiency based on five breakthroughs:

  • NVIDIA Pascal architecture for exponential performance leap – A Pascal-based Tesla P100 solution delivers over a 12x increase in neural network training performance compared with a previous-generation NVIDIA Maxwell-based solution.
  • NVIDIA NVLink for maximum application scalability – The NVIDIA NVLink high-speed GPU interconnect scales applications across multiple GPUs, delivering a 5x acceleration in bandwidth compared to today’s best-in-class solution. Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximise application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication.
  • 16nm FinFET for unprecedented energy efficiency – With 15.3 billion transistors built on 16 nanometer FinFET fabrication technology, the Pascal GPU is the world’s largest FinFET chip ever built. It is engineered to deliver the fastest performance and best energy efficiency for workloads with near-infinite computing needs.
  • CoWoS with HBM2 for big data workloads – The Pascal architecture unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design, Chip on Wafer on Substrate (CoWoS) with HBM2, provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the Maxwell architecture.
  • New AI algorithms for peak performance – New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning.
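Half precision trades range and accuracy for speed and memory: an FP16 value takes 2 bytes instead of the 4 bytes of FP32, and its significand carries only 11 bits of precision. The sketch below uses Python's standard `struct` module, whose `'e'` format code packs IEEE 754 binary16, to show what survives a round trip through FP16. It is purely illustrative of the number format and says nothing about how the Tesla P100 executes its half-precision instructions.

```python
import struct

def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(struct.calcsize("<e"), struct.calcsize("<f"))  # bytes per FP16 vs FP32 value
print(to_fp16(65504.0))       # largest finite FP16 value survives intact
print(to_fp16(1.0 + 2**-10))  # smallest step above 1.0 is still representable
print(to_fp16(1.0 + 2**-12))  # finer detail is rounded away
print(to_fp16(0.1))           # 0.1 is stored only approximately
```

Deep learning training can often tolerate this loss of precision, which is why halving the data size to gain throughput is attractive.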

The Tesla P100 GPU accelerator delivers a new level of performance for a range of HPC and deep learning applications, including the AMBER molecular dynamics code, which runs faster on a single server node with Tesla P100 GPUs than on 48 dual-socket CPU server nodes.

Training the popular AlexNet deep neural network would take 250 dual-socket CPU server nodes to match the performance of eight Tesla P100 GPUs. And the widely used weather forecasting application, COSMO, runs faster on eight Tesla P100 GPUs than on 27 dual-socket CPU servers.

The first accelerator to deliver more than 5 and 10 teraflops of double-precision and single-precision performance, respectively, the Tesla P100 provides a giant leap in processing capabilities and time-to-discovery for research across a broad spectrum of domains.

Tesla P100 Specifications

Specifications of the Tesla P100 GPU accelerator include:

  • 5.3 teraflops double-precision performance, 10.6 teraflops single-precision performance and 21.2 teraflops half-precision performance with NVIDIA GPU BOOST technology
  • 160GB/sec bi-directional interconnect bandwidth with NVIDIA NVLink
  • 16GB of CoWoS HBM2 stacked memory
  • 720GB/sec memory bandwidth with CoWoS HBM2 stacked memory
  • Enhanced programmability with page migration engine and unified memory
  • ECC protection for increased reliability
  • Server-optimised for highest data centre throughput and reliability
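The 160 GB/s NVLink figure and the “5x” bandwidth claim in the breakthroughs list can be reconciled with simple arithmetic. This sketch assumes, beyond what the article states, that the Tesla P100 has four NVLink links of 20 GB/s per direction each, and that the “best-in-class solution” being compared against is a PCI Express 3.0 x16 slot (8 GT/s per lane, 128b/130b encoding).

```python
# Assumed link parameters (not stated in the article).
NVLINK_LINKS = 4
NVLINK_GBPS_PER_DIRECTION = 20      # GB/s per link, in each direction

PCIE3_GT_PER_LANE = 8               # gigatransfers/s per lane
PCIE3_LANES = 16
PCIE3_ENCODING = 128 / 130          # 128b/130b line-code efficiency

nvlink_bidir = NVLINK_LINKS * NVLINK_GBPS_PER_DIRECTION * 2
pcie_per_direction = PCIE3_GT_PER_LANE * PCIE3_LANES * PCIE3_ENCODING / 8  # GB/s
pcie_bidir = pcie_per_direction * 2

print(f"NVLink: {nvlink_bidir} GB/s bidirectional")
print(f"PCIe 3.0 x16: {pcie_bidir:.1f} GB/s bidirectional")
print(f"ratio: {nvlink_bidir / pcie_bidir:.1f}x")
```

Under these assumptions NVLink offers about 5.1 times the raw bandwidth, matching the rounded 5x in NVIDIA's claim; achievable transfer rates on both interfaces are lower once protocol overhead is included.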


General availability for the Pascal-based NVIDIA Tesla P100 GPU accelerator in the new NVIDIA DGX-1 deep learning system is in June. It is also expected to be available beginning in early 2017 from leading server manufacturers.


NVIDIA DGX-1 Deep Learning Supercomputer Launched

April 6, 2016 — NVIDIA today unveiled the NVIDIA DGX-1, the world’s first deep learning supercomputer to meet the unlimited computing demands of artificial intelligence.

The NVIDIA DGX-1 is the first system designed specifically for deep learning — it comes fully integrated with hardware, deep learning software and development tools for quick, easy deployment. It is a turnkey system that contains a new generation of GPU accelerators, delivering the equivalent throughput of 250 x86 servers.

The NVIDIA DGX-1 deep learning system enables researchers and data scientists to easily harness the power of GPU-accelerated computing to create a new class of intelligent machines that learn, see and perceive the world as humans do. It delivers unprecedented levels of computing power to drive next-generation AI applications, allowing researchers to dramatically reduce the time to train larger, more sophisticated deep neural networks.

NVIDIA designed the DGX-1 for a new computing model to power the AI revolution that is sweeping across science, enterprises and increasingly all aspects of daily life. Powerful deep neural networks are driving a new kind of software created with massive amounts of data, which requires considerably higher levels of computational performance.

“Artificial intelligence is the most far-reaching technological advancement in our lifetime,” said Jen-Hsun Huang, CEO and co-founder of NVIDIA. “It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable.”


Powered by Five Breakthroughs

The NVIDIA DGX-1 deep learning system is built on NVIDIA Tesla P100 GPUs, based on the new NVIDIA Pascal GPU architecture. It provides the throughput of 250 CPU-based servers, networking, cables and racks — all in a single box.

The DGX-1 features four other breakthrough technologies that maximise performance and ease of use. These include the NVIDIA NVLink high-speed interconnect for maximum application scalability; 16nm FinFET fabrication technology for unprecedented energy efficiency; Chip on Wafer on Substrate with HBM2 for big data workloads; and new half-precision instructions to deliver more than 21 teraflops of peak performance for deep learning.

Together, these major technological advancements enable DGX-1 systems equipped with Tesla P100 GPUs to deliver over 12x faster training than four-way NVIDIA Maxwell architecture-based solutions from just one year ago.


The Pascal architecture has strong support from the artificial intelligence ecosystem.

“NVIDIA GPU is accelerating progress in AI. As neural nets become larger and larger, we not only need faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication, as well as hardware that can take advantage of reduced-precision arithmetic. This is precisely what Pascal delivers,” said Yann LeCun, director of AI Research at Facebook.

Andrew Ng, chief scientist at Baidu, said: “AI computers are like space rockets: The bigger the better. Pascal’s throughput and interconnect will make the biggest rocket we’ve seen yet.”

“Microsoft is developing super deep neural networks that are more than 1,000 layers,” said Xuedong Huang, chief speech scientist at Microsoft Research. “NVIDIA Tesla P100’s impressive horsepower will enable Microsoft’s CNTK to accelerate AI breakthroughs.”


Comprehensive Deep Learning Software Suite

The NVIDIA DGX-1 system includes a complete suite of optimised deep learning software that allows researchers and data scientists to quickly and easily train deep neural networks. The DGX-1 software includes the NVIDIA Deep Learning GPU Training System (DIGITS), a complete, interactive system for designing deep neural networks (DNNs).

It also includes the newly released NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPU-accelerated library of primitives for designing DNNs, as well as optimised versions of several widely used deep learning frameworks: Caffe, Theano and Torch. The DGX-1 additionally provides access to cloud management tools, software updates and a repository for containerised applications.


NVIDIA DGX-1 Specifications

  • Up to 170 teraflops of half-precision (FP16) peak performance
  • Eight Tesla P100 GPU accelerators, 16GB memory per GPU
  • NVLink Hybrid Mesh Cube
  • 7TB SSD DL Cache
  • Dual 10GbE, Quad InfiniBand 100Gb networking
  • 3U – 3200W
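The headline DGX-1 numbers follow directly from the per-GPU Tesla P100 specifications (21.2 teraflops FP16, 16GB of HBM2 and 720GB/sec of memory bandwidth per GPU). A minimal sketch that multiplies them out for eight GPUs:

```python
# Per-GPU figures from the Tesla P100 specification list.
P100_FP16_TFLOPS = 21.2
P100_HBM2_GB = 16
P100_MEM_BW_GBPS = 720
GPUS_PER_DGX1 = 8

dgx1 = {
    "FP16 peak (TFLOPS)": GPUS_PER_DGX1 * P100_FP16_TFLOPS,
    "HBM2 total (GB)": GPUS_PER_DGX1 * P100_HBM2_GB,
    "Aggregate memory bandwidth (GB/s)": GPUS_PER_DGX1 * P100_MEM_BW_GBPS,
}
for label, value in dgx1.items():
    print(f"{label}: {value:g}")
```

The 169.6 teraflops result rounds to the “up to 170 teraflops” in the spec list. The memory and bandwidth totals are sums across eight separate GPUs, not a single shared pool.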

Optional support services for the NVIDIA DGX-1 improve productivity and reduce downtime for production systems. Hardware and software support provides access to NVIDIA deep learning expertise, and includes cloud management services, software upgrades and updates, and priority resolution of critical issues.


NVIDIA DGX-1 Availability

General availability for the NVIDIA DGX-1 deep learning system in the United States is in June, and in other regions beginning in the third quarter direct from NVIDIA and select systems integrators.


