Tag Archives: NVIDIA CUDA

NVIDIA DGX-1 Deep Learning Supercomputer Launched

April 6, 2016 — NVIDIA today unveiled the NVIDIA DGX-1, the world’s first deep learning supercomputer to meet the unlimited computing demands of artificial intelligence.

The NVIDIA DGX-1 is the first system designed specifically for deep learning — it comes fully integrated with hardware, deep learning software and development tools for quick, easy deployment. It is a turnkey system that contains a new generation of GPU accelerators, delivering the equivalent throughput of 250 x86 servers.

The NVIDIA DGX-1 deep learning system enables researchers and data scientists to easily harness the power of GPU-accelerated computing to create a new class of intelligent machines that learn, see and perceive the world as humans do. It delivers unprecedented levels of computing power to drive next-generation AI applications, allowing researchers to dramatically reduce the time to train larger, more sophisticated deep neural networks.

NVIDIA designed the DGX-1 for a new computing model to power the AI revolution that is sweeping across science, enterprises and increasingly all aspects of daily life. Powerful deep neural networks are driving a new kind of software created with massive amounts of data, which require considerably higher levels of computational performance.

“Artificial intelligence is the most far-reaching technological advancement in our lifetime,” said Jen-Hsun Huang, CEO and co-founder of NVIDIA. “It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable.”


Powered by Five Breakthroughs

The NVIDIA DGX-1 deep learning system is built on NVIDIA Tesla P100 GPUs, based on the new NVIDIA Pascal GPU architecture. It provides the throughput of 250 CPU-based servers, networking, cables and racks — all in a single box.

The DGX-1 features four other breakthrough technologies that maximise performance and ease of use. These include the NVIDIA NVLink high-speed interconnect for maximum application scalability; 16nm FinFET fabrication technology for unprecedented energy efficiency; Chip on Wafer on Substrate with HBM2 for big data workloads; and new half-precision instructions to deliver more than 21 teraflops of peak performance for deep learning.

Together, these major technological advancements enable DGX-1 systems equipped with Tesla P100 GPUs to deliver over 12x faster training than four-way NVIDIA Maxwell architecturebased solutions from just one year ago.

[adrotate group=”2″]

The Pascal architecture has strong support from the artificial intelligence ecosystem.

“NVIDIA GPU is accelerating progress in AI. As neural nets become larger and larger, we not only need faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication, as well as hardware that can take advantage of reduced-precision arithmetic. This is precisely what Pascal delivers,” said Yann LeCun, director of AI Research at Facebook.

Andrew Ng, chief scientist at Baidu, said: “AI computers are like space rockets: The bigger the better. Pascal’s throughput and interconnect will make the biggest rocket we’ve seen yet.” NVIDIA Launches World’s First Deep Learning Supercomputer

“Microsoft is developing super deep neural networks that are more than 1,000 layers,” said Xuedong Huang, chief speech scientist at Microsoft Research. “NVIDIA Tesla P100’s impressive horsepower will enable Microsoft’s CNTK to accelerate AI breakthroughs.”


Comprehensive Deep Learning Software Suite

The NVIDIA DGX-1 system includes a complete suite of optimised deep learning software that allows researchers and data scientists to quickly and easily train deep neural networks. The DGX-1 software includes the NVIDIA Deep Learning GPU Training System (DIGITS), a complete, interactive system for designing deep neural networks (DNNs).

It also includes the newly released NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPUaccelerated library of primitives for designing DNNs. It also includes optimised versions of several widely used deep learning frameworks — Caffe, Theano and Torch. The DGX-1 additionally provides access to cloud management tools, software updates and a repository for containerised applications.


NVIDIA DGX-1 Specifications

[adrotate group=”2″]
  • Up to 170 teraflops of half-precision (FP16) peak performance
  • Eight Tesla P100 GPU accelerators, 16GB memory per GPU
  • NVLink Hybrid Mesh Cube
  • 7TB SSD DL Cache
  • Dual 10GbE, Quad InfiniBand 100Gb networking
  • 3U – 3200W

Optional support services for the NVIDIA DGX-1 improve productivity and reduce downtime for production systems. Hardware and software support provides access to NVIDIA deep learning expertise, and includes cloud management services, software upgrades and updates, and priority resolution of critical issues.


NVIDIA DGX-1 Availability

General availability for the NVIDIA DGX-1 deep learning system in the United States is in June, and in other regions beginning in the third quarter direct from NVIDIA and select systems integrators.

Go Back To > Enterprise | Home


Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participate in the Tech ARP Forums, or even donate to our fund. Any help you can render is greatly appreciated!

NVIDIA SDK Receives Major Update

by Greg Estes, NVIDIA

While NVIDIA is best known for our hardware platforms, our software plays a key role advancing the state of the art of GPU-accelerated computing.

This body of work — the NVIDIA SDK — today got a significant update, announced at our annual GPU Technology Conference. It takes advantage of our new Pascal architecture and makes it easier than ever for developers to create great solutions on our platforms.

Our goal is to make more of our software capabilities available to even more developers. Over a million developers have already downloaded our CUDA toolkit, and there are more than 400 GPU-accelerated applications that benefit from our software libraries, in addition to hundreds more game titles.

Here’s a look at the software updates we’re introducing in seven key areas:


1) Deep Learning

What’s new — cuDNN 5, our GPU-accelerated library of primitives for deep neural networks, now includes Pascal GPU support; acceleration of recurrent neural networks, which are used for video and other sequential data; and additional enhancements used in medical, oil & gas and other industries.

Why it matters — Deep learning developers rely on cuDNN’s optimized routines so they can focus on designing and training neural network models, rather than low-level performance tuning. cuDNN accelerates leading deep learning frameworks like Google TensorFlow, UC Berkeley’s Caffe, University of Montreal’s Theano and NYU’s Torch. These, in turn, power deep learning solutions used by Amazon, Facebook, Google and others.


2) Accelerated Computing

What’s new — CUDA 8, the latest version of our parallel computing platform, gives developers direct access to powerful new Pascal features such as unified memory and NVLink. Also included in this release is a new graph analytics library — nvGRAPH — which can be used for robotic path planning, cyber security and logistics analysis, expanding the application of GPU acceleration in the realm of big data analytics.

One new feature developers will appreciate is critical path analysis, which automatically identifies latent bottlenecks in code for CPUs and GPUs. And for visualizing volume and surface datasets, NVIDIA IndeX 1.4 is now available as a plug-in for Kitware ParaView, bringing interactive visualization of large volumes with high-quality rendering to ParaView users.

Why it matters — CUDA has been called “the backbone of GPU computing.” We’ve sold millions of CUDA-enabled GPUs to date. As a result, many of the most important scientific applications are based on CUDA, and CUDA has played a role in major discoveries, such as understanding how HIV protects its genetic materials using a protein shell, and unraveling the mysteries of the human genome by discovering 3D loops and other genetic folding patterns.


3) Self-Driving Cars

What’s new — At GTC, we also announced our end-to-end HD mapping solution for self-driving cars (see “How HD Maps Will Show Self-Driving Cars the Way”). We built this state-of-the-art system on our DriveWorks software development kit, part of our deep learning platform for the automotive industry.

Why it matters — Incorporating perception, localization, planning and visualization algorithms, DriveWorks provides libraries, tools and reference applications for automakers, tier 1 suppliers and startups developing autonomous vehicle computing pipelines. DriveWorks now includes an end-to-end HD mapping solution, making it easier and faster to create and update highly detailed maps. Along with NVIDIA DIGITS and NVIDIA DRIVENET, these technologies will make driving safer, more efficient and more enjoyable.

[adrotate banner=”5″]


4) Design Visualization

What’s new — At GTC, we’ve brought NVIDIA Iray — our photorealistic rendering solution — to the world of VR with the introduction of new cameras within Iray that let users create VR panoramas and view their creations with unprecedented accuracy in virtual reality (see “NVIDIA Brings Interactive Photorealism to VR with Iray”). We also announced Adobe’s support of NVIDIA’s Materials Definition Language, bringing the possibility of physically based materials to a wide range of creative professionals.

Why it matters — NVIDIA Iray is used in a wide array of industries to give designers the ability to create photorealistic models of their work quickly and to speed their products to market. We’ve licensed it to leading software manufacturers such as Dassault Systèmes and Siemens PLM. Iray is also available from NVIDIA as a plug-in for popular software like Autodesk 3ds Max and Maya.


5) Autonomous Machines

What’s new — We’re bringing deep learning capabilities to devices that will interact with — and learn from — the environment around them. Our cuDNN version 5, noted above, improves deep learning inference performance for common deep neural networks, allowing embedded devices to make decisions faster and work with higher resolution sensors. NVIDIA GPU Inference Engine (GIE) is a high-performance neural network inference solution for application deployment. Developers can use GIE to generate optimized implementations of trained neural network models that deliver the fastest inference performance on NVIDIA GPUs.

Why it matters — Robots, drones, submersibles and other intelligent devices require autonomous capabilities. The Jetpack SDK — which powers the Jetson TX1 Developer Kit — includes libraries and APIs for advanced computer vision and deep learning, enabling developers to build extraordinarily capable autonomous machines that can see, understand and even interact with their environments.


6) Gaming

What’s new — We recently announced three new technologies for NVIDIA GameWorks, our combination of development tools, sample code and advanced libraries for real-time graphics and simulation for games. They include Volumetric Lighting, Voxel-based Ambient Occlusion and Hybrid Frustum Traced Shadows.

Why it matters — Developers are already using these new libraries for AAA game titles like Fallout 4. And GameWorks technology is in many of the major game engines, such as Unreal Engine, Unity and Stingray, which are also increasingly being used for non-gaming applications like architectural walk-throughs, training and even automotive design.


7) Virtual Reality

What’s new — We’re continuing to add features to VRWorks — our suite of APIs, sample code and libraries for VR developers. For example, Multi-Res Shading accelerates performance by up to 50 percent by rendering each part of an image at a resolution that better matches the pixel density of the warped VR image. VRWorks Direct Mode treats VR headsets as head-mounted displays accessible only to VR applications, rather than a normal Windows monitor in desktop mode.

Why it matters — VRWorks helps headset and application developers achieve the highest performance, lowest latency and plug-and-play compatibility. You can see how developers are using what VRWorks has to offer at GTC, where we’re demonstrating these new technologies with partners such as Sólfar Studios (Everest VR), Fusion Studios (Mars 2030), Oculus and HTC.


Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participate in the Tech ARP Forums, or even donate to our fund. Any help you can render is greatly appreciated!

AMD GPUOpen Initiative – 3 New Developments

The AMD Radeon Technologies Group has just announced the AMD GPUOpen initiative at their Technology Summit today. This is part of their effort to improve performance from the software side, as well as access to open source drivers and tools. Let’s take a look at the Radeon Technologies Group’s presentation below.


AMD GPUOpen For Gaming

As a continuation of the strategy we started with Mantle, we are giving even more control of the GPU to developers. As console developers have benefited from low-level access to the GPU, AMD wants to continue to bring this level of access to the PC space.

1 / 10

AMD GPUOpen for gaming is giving developers the ability to harness the investments they’ve made on console development, including feature-rich, close-to-the-metal programming, and bring that to life on PC game development. Game developers will now have direct access to GPU hardware, access to a large collection of open source effects, tools, libraries and SDKs.

As such, in early 2016, libraries and samples i.e. source access to the library directly will be made available from AMD. GPUOpen is the primary vehicle to allow low-level access to the GPU.


New Compiler For Heterogenous Computing

One of the primary goals of Heterogeneous Systems Architecture (HSA) is easing the development of parallel applications through the use of higher level languages. The new AMD “Boltzmann Initiative” suite includes an HCC compiler for C++ development, greatly expanding the field of programmers who can leverage HSA.

2 / 6

The new HCC C++ compiler is a key tool in enabling developers to easily and efficiently apply discrete GPU hardware resources in heterogeneous systems. A Heterogeneous Compute Compiler that compiles an Open Source C++ Compiler for GPUs, and HIP allows developers to convert CUDA code to portable C++. AMD testing shows that in many cases 90 percent or more of CUDA code can be automatically converted into C++ by HIP with the final 10 percent converted manually in the widely popular C++ language.


Linux Driver & Runtime For HPC Cluster Computing

Demonstrating its commitment to Linux, AMD developed a new HPC-focused open source driver and system runtime.

This new headless Linux driver brings key capabilities to address core high-performance computing needs, including low latency compute dispatch and PCIe® data transfers; peer-to-peer GPU support; Remote Direct Memory Access (RDMA) from InfiniBand™ that interconnects directly to GPU memory; and Large Single Memory Allocation support.



An early access program for the “Boltzmann Initiative” tools is planned for Q1 2016.