Tag Archives: NVIDIA SDK

NVIDIA Tesla P100 GPU Launched

SINGAPORE, April 6, 2016—NVIDIA today introduced the NVIDIA Tesla P100 GPU, the most advanced accelerator ever built. The latest addition to the NVIDIA Tesla Accelerated Computing Platform, the Tesla P100 enables a new class of servers that can deliver the performance of hundreds of CPU server nodes.

Today’s data centres — vast network infrastructures with numerous interconnected commodity CPU servers — process large numbers of transactional workloads, such as web services. But they are inefficient at next-generation artificial intelligence and scientific applications, which require ultra-efficient, lightning-fast server nodes.

Based on the new NVIDIA Pascal GPU architecture with five breakthrough technologies, the Tesla P100 delivers unmatched performance and efficiency to power the most computationally demanding applications.

“Our greatest scientific and technical challenges — finding cures for cancer, understanding climate change, building intelligent machines — require a near-infinite amount of computing performance,” said Jen-Hsun Huang, CEO and co-founder, NVIDIA. “We designed the Pascal GPU architecture from the ground up with innovation at every level. It represents a massive leap forward in computing performance and efficiency, and will help some of the smartest minds drive tomorrow’s advances.”

Dr. John Kelly III, senior vice president, Cognitive Solutions and IBM Research, said: “As we enter this new era of computing, entirely new approaches to the underlying technologies will be required to fully realise the benefits of AI and cognitive. The combination of NVIDIA GPUs and OpenPOWER technology is already accelerating Watson’s learning of new skills. Together, IBM’s Power architecture and NVIDIA’s Pascal architecture with NVLink will further accelerate cognitive workload performance and advance the artificial intelligence industry.”

Five Architectural Breakthroughs

The Tesla P100 delivers its unprecedented performance, scalability and programming efficiency based on five breakthroughs:

  • NVIDIA Pascal architecture for exponential performance leap – A Pascal-based Tesla P100 solution delivers over a 12x increase in neural network training performance compared with a previous-generation NVIDIA Maxwell-based solution.
  • NVIDIA NVLink for maximum application scalability – The NVIDIA NVLink high-speed GPU interconnect scales applications across multiple GPUs, delivering a 5x acceleration in bandwidth compared to today’s best-in-class solution. Up to eight Tesla P100 GPUs can be interconnected with NVLink to maximise application performance in a single node, and IBM has implemented NVLink on its POWER8 CPUs for fast CPU-to-GPU communication.
  • 16nm FinFET for unprecedented energy efficiency – With 15.3 billion transistors built on 16 nanometer FinFET fabrication technology, the Pascal GPU is the world’s largest FinFET chip ever built.2 It is engineered to deliver the fastest performance and best energy efficiency for workloads with near-infinite computing needs.
  • CoWoS with HBM2 for big data workloads – The Pascal architecture unifies processor and data into a single package to deliver unprecedented compute efficiency. An innovative approach to memory design, Chip on Wafer on Substrate (CoWoS) with HBM2, provides a 3x boost in memory bandwidth performance, or 720GB/sec, compared to the Maxwell architecture.
  • New AI algorithms for peak performance – New half-precision instructions deliver more than 21 teraflops of peak performance for deep learning.
[adrotate banner=”5″]

The Tesla P100 GPU accelerator delivers a new level of performance for a range of HPC and deep learning applications, including the AMBER molecular dynamics code, which runs faster on a single server node with Tesla P100 GPUs than on 48 dual-socket CPU server nodes.

Training the popular AlexNet deep neural network would take 250 dual-socket CPU server nodes to match the performance of eight Tesla P100 GPUs.4 And the widely used weather forecasting application, COSMO, runs faster on eight Tesla P100 GPUs than on 27 dual-socket CPU servers.

The first accelerator to deliver more than 5 and 10 teraflops of double-precision and singleprecision performance, respectively, the Tesla P100 provides a giant leap in processing capabilities and time-to-discovery for research across a broad spectrum of domains.

Tesla P100 Specifications

Specifications of the Tesla P100 GPU accelerator include:

  • 5.3 teraflops double-precision performance, 10.6 teraflops single-precision performance and 21.2 teraflops half-precision performance with NVIDIA GPU BOOST technology
  • 160GB/sec bi-directional interconnect bandwidth with NVIDIA NVLink
  • 16GB of CoWoS HBM2 stacked memory
  • 720GB/sec memory bandwidth with CoWoS HBM2 stacked memory
  • Enhanced programmability with page migration engine and unified memory
  • ECC protection for increased reliability
  • Server-optimised for highest data centre throughput and reliability


General availability for the Pascal-based NVIDIA Tesla P100 GPU accelerator in the new NVIDIA DGX-1 deep learning system is in June. It is also expected to be available beginning in early 2017 from leading server manufacturers.

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participate in the Tech ARP Forums, or even donate to our fund. Any help you can render is greatly appreciated!

NVIDIA SDK Receives Major Update

by Greg Estes, NVIDIA

While NVIDIA is best known for our hardware platforms, our software plays a key role advancing the state of the art of GPU-accelerated computing.

This body of work — the NVIDIA SDK — today got a significant update, announced at our annual GPU Technology Conference. It takes advantage of our new Pascal architecture and makes it easier than ever for developers to create great solutions on our platforms.

Our goal is to make more of our software capabilities available to even more developers. Over a million developers have already downloaded our CUDA toolkit, and there are more than 400 GPU-accelerated applications that benefit from our software libraries, in addition to hundreds more game titles.

Here’s a look at the software updates we’re introducing in seven key areas:


1) Deep Learning

What’s new — cuDNN 5, our GPU-accelerated library of primitives for deep neural networks, now includes Pascal GPU support; acceleration of recurrent neural networks, which are used for video and other sequential data; and additional enhancements used in medical, oil & gas and other industries.

Why it matters — Deep learning developers rely on cuDNN’s optimized routines so they can focus on designing and training neural network models, rather than low-level performance tuning. cuDNN accelerates leading deep learning frameworks like Google TensorFlow, UC Berkeley’s Caffe, University of Montreal’s Theano and NYU’s Torch. These, in turn, power deep learning solutions used by Amazon, Facebook, Google and others.


2) Accelerated Computing

What’s new — CUDA 8, the latest version of our parallel computing platform, gives developers direct access to powerful new Pascal features such as unified memory and NVLink. Also included in this release is a new graph analytics library — nvGRAPH — which can be used for robotic path planning, cyber security and logistics analysis, expanding the application of GPU acceleration in the realm of big data analytics.

One new feature developers will appreciate is critical path analysis, which automatically identifies latent bottlenecks in code for CPUs and GPUs. And for visualizing volume and surface datasets, NVIDIA IndeX 1.4 is now available as a plug-in for Kitware ParaView, bringing interactive visualization of large volumes with high-quality rendering to ParaView users.

Why it matters — CUDA has been called “the backbone of GPU computing.” We’ve sold millions of CUDA-enabled GPUs to date. As a result, many of the most important scientific applications are based on CUDA, and CUDA has played a role in major discoveries, such as understanding how HIV protects its genetic materials using a protein shell, and unraveling the mysteries of the human genome by discovering 3D loops and other genetic folding patterns.


3) Self-Driving Cars

What’s new — At GTC, we also announced our end-to-end HD mapping solution for self-driving cars (see “How HD Maps Will Show Self-Driving Cars the Way”). We built this state-of-the-art system on our DriveWorks software development kit, part of our deep learning platform for the automotive industry.

Why it matters — Incorporating perception, localization, planning and visualization algorithms, DriveWorks provides libraries, tools and reference applications for automakers, tier 1 suppliers and startups developing autonomous vehicle computing pipelines. DriveWorks now includes an end-to-end HD mapping solution, making it easier and faster to create and update highly detailed maps. Along with NVIDIA DIGITS and NVIDIA DRIVENET, these technologies will make driving safer, more efficient and more enjoyable.

[adrotate banner=”5″]


4) Design Visualization

What’s new — At GTC, we’ve brought NVIDIA Iray — our photorealistic rendering solution — to the world of VR with the introduction of new cameras within Iray that let users create VR panoramas and view their creations with unprecedented accuracy in virtual reality (see “NVIDIA Brings Interactive Photorealism to VR with Iray”). We also announced Adobe’s support of NVIDIA’s Materials Definition Language, bringing the possibility of physically based materials to a wide range of creative professionals.

Why it matters — NVIDIA Iray is used in a wide array of industries to give designers the ability to create photorealistic models of their work quickly and to speed their products to market. We’ve licensed it to leading software manufacturers such as Dassault Systèmes and Siemens PLM. Iray is also available from NVIDIA as a plug-in for popular software like Autodesk 3ds Max and Maya.


5) Autonomous Machines

What’s new — We’re bringing deep learning capabilities to devices that will interact with — and learn from — the environment around them. Our cuDNN version 5, noted above, improves deep learning inference performance for common deep neural networks, allowing embedded devices to make decisions faster and work with higher resolution sensors. NVIDIA GPU Inference Engine (GIE) is a high-performance neural network inference solution for application deployment. Developers can use GIE to generate optimized implementations of trained neural network models that deliver the fastest inference performance on NVIDIA GPUs.

Why it matters — Robots, drones, submersibles and other intelligent devices require autonomous capabilities. The Jetpack SDK — which powers the Jetson TX1 Developer Kit — includes libraries and APIs for advanced computer vision and deep learning, enabling developers to build extraordinarily capable autonomous machines that can see, understand and even interact with their environments.


6) Gaming

What’s new — We recently announced three new technologies for NVIDIA GameWorks, our combination of development tools, sample code and advanced libraries for real-time graphics and simulation for games. They include Volumetric Lighting, Voxel-based Ambient Occlusion and Hybrid Frustum Traced Shadows.

Why it matters — Developers are already using these new libraries for AAA game titles like Fallout 4. And GameWorks technology is in many of the major game engines, such as Unreal Engine, Unity and Stingray, which are also increasingly being used for non-gaming applications like architectural walk-throughs, training and even automotive design.


7) Virtual Reality

What’s new — We’re continuing to add features to VRWorks — our suite of APIs, sample code and libraries for VR developers. For example, Multi-Res Shading accelerates performance by up to 50 percent by rendering each part of an image at a resolution that better matches the pixel density of the warped VR image. VRWorks Direct Mode treats VR headsets as head-mounted displays accessible only to VR applications, rather than a normal Windows monitor in desktop mode.

Why it matters — VRWorks helps headset and application developers achieve the highest performance, lowest latency and plug-and-play compatibility. You can see how developers are using what VRWorks has to offer at GTC, where we’re demonstrating these new technologies with partners such as Sólfar Studios (Everest VR), Fusion Studios (Mars 2030), Oculus and HTC.


Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participate in the Tech ARP Forums, or even donate to our fund. Any help you can render is greatly appreciated!