NVIDIA just launched TensorRT 7, introducing the capability for Real-Time Conversational AI!
Here is a primer on NVIDIA TensorRT 7 and its new real-time conversational AI capability!
NVIDIA TensorRT 7 with Real-Time Conversational AI
NVIDIA TensorRT 7 is their seventh-generation inference software development kit. It introduces the capability for real-time conversational AI, opening the door for human-to-AI interactions.
TensorRT 7 features a new deep learning compiler designed to automatically optimise and accelerate the increasingly complex recurrent and transformer-based neural networks needed for AI speech applications.
This boosts the performance of conversational AI components by more than 10X, compared to running them on CPUs. This drives down the latency below the 300 millisecond (0.3 second) threshold considered necessary for real-time interactions.
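To put those two numbers together, here is a minimal arithmetic sketch in Python. The 10X speedup and the 300 ms threshold come from the claims above, while the 2,500 ms CPU latency is a purely hypothetical input chosen for illustration, not a measured figure.

```python
REALTIME_THRESHOLD_MS = 300.0  # threshold quoted above for real-time interaction
CLAIMED_SPEEDUP = 10.0         # "more than 10X" compared to CPUs


def accelerated_latency_ms(cpu_latency_ms: float, speedup: float = CLAIMED_SPEEDUP) -> float:
    """Latency after applying a uniform speedup to a CPU-only latency."""
    return cpu_latency_ms / speedup


def is_real_time(latency_ms: float, threshold_ms: float = REALTIME_THRESHOLD_MS) -> bool:
    """True when the latency falls strictly below the real-time threshold."""
    return latency_ms < threshold_ms


# Hypothetical CPU-only latency for a full conversational AI pipeline.
cpu_latency_ms = 2500.0
gpu_latency_ms = accelerated_latency_ms(cpu_latency_ms)
print(gpu_latency_ms, is_real_time(gpu_latency_ms))
```

In other words, a 10X speedup only brings a pipeline under the threshold if it took less than 3 seconds on the CPU to begin with.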
TensorRT 7 Targets Recurrent Neural Networks
TensorRT 7 is designed to speed up AI models that make predictions on time-series and sequence data using recurrent loop structures, known as recurrent neural networks (RNNs).
RNNs are used not only for conversational AI speech networks, they also help with arrival time planning for cars and satellites, predictions of events in electronic medical records, financial asset forecasting and fraud detection.
Until now, the use of RNNs has been limited to a few companies with the talent and manpower to hand-optimise the code to meet real-time performance requirements.
With TensorRT 7’s new deep learning compiler, developers now have the ability to automatically optimise these neural networks to deliver the best possible performance and lowest latencies.
The new compiler also optimises transformer-based models like BERT for natural language processing.
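For readers who have not met a recurrent loop structure before, here is a minimal pure-Python sketch of a simple (Elman-style) RNN. It is not TensorRT code and does not reflect how NVIDIA implements or optimises these networks. The function names and the toy weights are our own, chosen only to show how each step feeds its hidden state into the next.

```python
import math


def rnn_step(x, h_prev, w_xh, w_hh, b):
    """One recurrent step: h_t = tanh(W_xh * x_t + W_hh * h_(t-1) + b)."""
    h_next = []
    for j in range(len(h_prev)):
        total = b[j]
        total += sum(w_xh[j][i] * x[i] for i in range(len(x)))
        total += sum(w_hh[j][k] * h_prev[k] for k in range(len(h_prev)))
        h_next.append(math.tanh(total))
    return h_next


def run_rnn(sequence, h0, w_xh, w_hh, b):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = h0
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, b)  # the output of step t is an input to step t+1
        states.append(h)
    return states


# Toy example: 1 input feature, 2 hidden units, hypothetical weights.
w_xh = [[0.5], [-0.3]]
w_hh = [[0.1, 0.2], [0.0, 0.4]]
b = [0.0, 0.1]
sequence = [[1.0], [0.5], [-1.0]]
for t, h in enumerate(run_rnn(sequence, [0.0, 0.0], w_xh, w_hh, b)):
    print(t, [round(v, 4) for v in h])
```

Because every step depends on the previous one, the loop cannot simply be run in parallel across time steps, which is part of why RNNs are harder to optimise than feed-forward networks.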
TensorRT 7 Availability
NVIDIA TensorRT 7 will be made available in the coming days for development and deployment for free to members of the NVIDIA Developer program.
The latest versions of plug-ins, parsers and samples are also available as open source from the TensorRT GitHub repository.
If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!
The new Intel Nervana NNP-I1000 neural network processor comes in PCIe and M.2 card options designed for AI inference acceleration.
Here is EVERYTHING you need to know about the Intel Nervana NNP-I1000 PCIe and M.2 card options!
Intel Nervana Neural Network Processors
Intel Nervana neural network processors, NNPs for short, are designed to accelerate two key deep learning technologies – training and inference.
To target these two different tasks, Intel created two AI accelerator families – Nervana NNP-T that’s optimised for training, and Nervana NNP-I that’s optimised for inference.
They are both paired with a full software stack, developed with open components and deep learning framework integration.
The Intel Nervana NNP-I1000 is optimised for multi-modal inferencing of near-real-time, high-volume compute.
Each Nervana NNP-I1000 features 12 Inference Compute Engines (ICE), which are paired with two Intel CPU cores, a large on-die 75 MB SRAM cache and an on-die Network-on-Chip (NoC).
It offers mixed-precision support, with a special focus on low-precision applications for near-real-time performance.
Like the NNP-T, the NNP-I comes with a full software stack that is built with open components, including direct integration with deep learning frameworks.
Intel Nervana NNP-I1000 Models
The Nervana NNP-I1000 comes in an M.2 form factor, or a PCI Express card, to accommodate exponentially larger and more complex models, or to run dozens of models and networks in parallel.
Specifications | Intel Nervana NNP-I1100 | Intel Nervana NNP-I1300
Form Factor | M.2 Card | PCI Express Card
Compute | 1 x Intel Nervana NNP-I1000 | 2 x Intel Nervana NNP-I1000
SRAM | 75 MB | 2 x 75 MB
Int8 Performance | Up to 50 TOPS | Up to 170 TOPS
TDP | 12 W | 75 W
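The Int8 performance and TDP figures in the specifications above can be turned into a rough efficiency comparison. This is a quick Python sketch that uses only the "up to" peak numbers from those specifications, so treat the results as a best-case ratio and not as a measured efficiency.

```python
# Peak Int8 throughput and TDP, taken from the specifications above.
cards = {
    "NNP-I1100 (M.2)": {"int8_tops": 50, "tdp_w": 12},
    "NNP-I1300 (PCIe)": {"int8_tops": 170, "tdp_w": 75},
}


def tops_per_watt(int8_tops: float, tdp_w: float) -> float:
    """Peak Int8 TOPS divided by TDP in watts."""
    return int8_tops / tdp_w


for name, spec in cards.items():
    print(f"{name}: {tops_per_watt(**spec):.2f} TOPS/W")
# NNP-I1100 (M.2): 4.17 TOPS/W
# NNP-I1300 (PCIe): 2.27 TOPS/W
```

On paper, the M.2 card delivers more peak throughput per watt, while the PCIe card delivers more total throughput.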
Intel Nervana NNP-I1000 PCIe Card
This is what the Intel Nervana NNP-I1000 (also known as the NNP-I1300) PCIe card looks like :
Intel Nervana NNP-I1000 M.2 Card
This is what the Intel Nervana NNP-I1000 (also known as the NNP-I1100) M.2 card looks like :
The new Intel Nervana NNP-T1000 neural network processor comes in PCIe and Mezzanine card options designed for AI training acceleration.
Here is EVERYTHING you need to know about the Intel Nervana NNP-T1000 PCIe and Mezzanine card options!
Intel Nervana Neural Network Processors
Intel Nervana neural network processors, NNPs for short, are designed to accelerate two key deep learning technologies – training and inference.
To target these two different tasks, Intel created two AI accelerator families – Nervana NNP-T that’s optimised for training, and Nervana NNP-I that’s optimised for inference.
They are both paired with a full software stack, developed with open components and deep learning framework integration.
The Intel Nervana NNP-T1000 is not only capable of training even the most complex deep learning models, it is highly scalable – offering near linear scaling and efficiency.
By combining compute, memory and networking capabilities in a single ASIC, it allows for maximum efficiency with flexible and simple scaling.
Each Nervana NNP-T1000 is powered by up to 24 Tensor Processing Clusters (TPCs), and comes with 16 bi-directional Inter-Chip Links (ICL).
Its TPC supports 32-bit floating point (FP32) and brain floating point (bfloat16) formats, allowing for multiple deep learning primitives with maximum processing efficiency.
Its high-speed ICL communication fabric allows for near-linear scaling, directly connecting multiple NNP-T cards within servers, between servers and even inside and across racks.
High compute utilisation using Tensor Processing Clusters (TPC) with bfloat16 numeric format
Both on-die SRAM and on-package High-Bandwidth Memory (HBM) keep data local, reducing movement
Its Inter-Chip Links (ICL) glueless fabric architecture and fully-programmable router achieve near-linear scaling across multiple cards, systems and PODs
Available in PCIe and OCP Open Accelerator Module (OAM) form factors
Offers a programmable Tensor-based instruction set architecture (ISA)
Supports common open-source deep learning frameworks like TensorFlow, PaddlePaddle and PyTorch
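To make the bfloat16 format mentioned above a little more concrete, here is a small Python sketch. bfloat16 keeps the sign bit and all 8 exponent bits of FP32, but only the top 7 mantissa bits, so it is effectively the upper 16 bits of an FP32 value. The sketch converts by simple truncation, which is our simplification. Real hardware typically rounds to nearest, and nothing here is specific to how the NNP-T does it.

```python
import struct


def float_to_bfloat16_bits(value: float) -> int:
    """Return the 16-bit bfloat16 pattern for value by truncating its FP32 encoding."""
    fp32_bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return fp32_bits >> 16  # keep sign, 8 exponent bits and the top 7 mantissa bits


def bfloat16_bits_to_float(bits: int) -> float:
    """Expand a 16-bit bfloat16 pattern back into a Python float."""
    return struct.unpack(">f", struct.pack(">I", bits << 16))[0]


for value in (1.0, 3.14159, -2.0):
    bits = float_to_bfloat16_bits(value)
    print(f"{value} -> 0x{bits:04X} -> {bfloat16_bits_to_float(bits)}")
# 1.0 -> 0x3F80 -> 1.0
# 3.14159 -> 0x4049 -> 3.140625
# -2.0 -> 0xC000 -> -2.0
```

The range matches FP32 because the exponent is untouched, but precision drops to roughly two or three decimal digits, which deep learning training usually tolerates.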
Intel Nervana NNP-T1000 Models
The Intel Nervana NNP-T1000 is currently available in two form factors – a dual-slot PCI Express card, and an OAM Mezzanine Card, with these specifications :
Specifications | Intel Nervana NNP-T1300 | Intel Nervana NNP-T1400
Form Factor | Dual-slot PCIe Card | OAM Mezzanine Card
Compliance | PCIe CEM | OAM 1.0
Compute Cores | 22 TPCs | 24 TPCs
Frequency | 950 MHz | 1100 MHz
SRAM | 55 MB on-chip, with ECC | 60 MB on-chip, with ECC
Memory | 32 GB HBM2, with ECC | 32 GB HBM2, with ECC
Memory Bandwidth | 2.4 Gbps (300 MB/s) | 2.4 Gbps (300 MB/s)
Inter-Chip Link (ICL) | 16 x 112 Gbps (448 GB/s) | 16 x 112 Gbps (448 GB/s)
ICL Topology | Ring | Ring, Hybrid Cube Mesh, Fully Connected
Multi-Chassis Scaling | Yes | Yes
Multi-Rack Scaling | Yes | Yes
I/O to Host CPU | PCIe Gen3 / Gen4 x16 | PCIe Gen3 / Gen4 x16
Thermal Solution | Passive, Integrated | Passive Cooling
TDP | 300 W | 375 W
Dimensions | 265.32 mm x 111.15 mm | 165 mm x 102 mm
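The Inter-Chip Link figure in the specifications above is easier to follow with the unit conversion spelled out. This Python sketch assumes that the 448 GB/s aggregate counts both directions of the 16 bi-directional links. That is our reading of the number, not something Intel states in these specifications.

```python
BITS_PER_BYTE = 8


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second."""
    return gbps / BITS_PER_BYTE


links = 16            # Inter-Chip Links per NNP-T1000
per_link_gbps = 112   # per link, per direction (assumption)

one_way_gb_s = gbps_to_gigabytes_per_s(links * per_link_gbps)
both_ways_gb_s = one_way_gb_s * 2  # assumes the 448 GB/s figure is bi-directional

print(one_way_gb_s)    # 224.0
print(both_ways_gb_s)  # 448.0

# The memory bandwidth row uses the same conversion: 2.4 Gbps = 2400 Mbps.
print(2400 / BITS_PER_BYTE)  # 300.0 (MB/s)
```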
Intel Nervana NNP-T1000 PCIe Card
This is what the Intel Nervana NNP-T1000 (also known as the NNP-T1300) PCIe card looks like :
Intel Nervana NNP-T1000 OAM Mezzanine Card
This is what the Intel Nervana NNP-T1000 (also known as NNP-T1400) Mezzanine card looks like :
Intel just introduced their Nervana AI accelerators – the Nervana NNP-T1000 for training, and Nervana NNP-I1000 for inference!
Here is EVERYTHING you need to know about these two new Intel Nervana AI accelerators!
Intel Nervana Neural Network Processors
Intel Nervana neural network processors, NNPs for short, are designed to accelerate two key deep learning technologies – training and inference.
To target these two different tasks, Intel created two AI accelerator families – Nervana NNP-T that’s optimised for training, and Nervana NNP-I that’s optimised for inference.
They are both paired with a full software stack, developed with open components and deep learning framework integration.
Nervana NNP-T For Training
The Intel Nervana NNP-T1000 is not only capable of training even the most complex deep learning models, it is highly scalable – offering near linear scaling and efficiency.
By combining compute, memory and networking capabilities in a single ASIC, it allows for maximum efficiency with flexible and simple scaling.
Each Nervana NNP-T1000 is powered by up to 24 Tensor Processing Clusters (TPCs), and comes with 16 bi-directional Inter-Chip Links (ICL).
Its TPC supports 32-bit floating point (FP32) and brain floating point (bfloat16) formats, allowing for multiple deep learning primitives with maximum processing efficiency.
Its high-speed ICL communication fabric allows for near-linear scaling, directly connecting multiple NNP-T cards within servers, between servers and even inside and across racks.
High compute utilisation using Tensor Processing Clusters (TPC) with bfloat16 numeric format
Both on-die SRAM and on-package High-Bandwidth Memory (HBM) keep data local, reducing movement
Its Inter-Chip Links (ICL) glueless fabric architecture and fully-programmable router achieve near-linear scaling across multiple cards, systems and PODs
Available in PCIe and OCP Open Accelerator Module (OAM) form factors
Offers a programmable Tensor-based instruction set architecture (ISA)
Supports common open-source deep learning frameworks like TensorFlow, PaddlePaddle and PyTorch
Intel Nervana NNP-T Accelerator Models
The Intel Nervana NNP-T is currently available in two form factors – a dual-slot PCI Express card, and an OAM Mezzanine Card, with these specifications :
Specifications | Intel Nervana NNP-T1300 | Intel Nervana NNP-T1400
Form Factor | Dual-slot PCIe Card | OAM Mezzanine Card
Compliance | PCIe CEM | OAM 1.0
Compute Cores | 22 TPCs | 24 TPCs
Frequency | 950 MHz | 1100 MHz
SRAM | 55 MB on-chip, with ECC | 60 MB on-chip, with ECC
Memory | 32 GB HBM2, with ECC | 32 GB HBM2, with ECC
Memory Bandwidth | 2.4 Gbps (300 MB/s) | 2.4 Gbps (300 MB/s)
Inter-Chip Link (ICL) | 16 x 112 Gbps (448 GB/s) | 16 x 112 Gbps (448 GB/s)
ICL Topology | Ring | Ring, Hybrid Cube Mesh, Fully Connected
Multi-Chassis Scaling | Yes | Yes
Multi-Rack Scaling | Yes | Yes
I/O to Host CPU | PCIe Gen3 / Gen4 x16 | PCIe Gen3 / Gen4 x16
Thermal Solution | Passive, Integrated | Passive Cooling
TDP | 300 W | 375 W
Dimensions | 265.32 mm x 111.15 mm | 165 mm x 102 mm
Nervana NNP-I For Inference
The Intel Nervana NNP-I1000, on the other hand, is optimised for multi-modal inferencing of near-real-time, high-volume compute.
Each Nervana NNP-I1000 features 12 Inference Compute Engines (ICE), which are paired with two Intel CPU cores, a large on-die 75 MB SRAM cache and an on-die Network-on-Chip (NoC).
It offers mixed-precision support, with a special focus on low-precision applications for near-real-time performance.
Like the NNP-T, the NNP-I comes with a full software stack that is built with open components, including direct integration with deep learning frameworks.
Intel Nervana NNP-I Accelerator Models
The NNP-I1000 comes in a 12 W M.2 form factor, or a 75 W PCI Express card, to accommodate exponentially larger and more complex models, or to run dozens of models and networks in parallel.
NVIDIA just announced a FREE course on getting started with AI on the Jetson Nano!
Here is everything you need to know about this new Jetson Nano AI course – the first to be offered for FREE by the Deep Learning Institute!
The FREE AI Course For NVIDIA Jetson Nano
Looking to get started with AI, but don’t know how? The NVIDIA Deep Learning Institute has just published a new self-paced course that uses the newly released Jetson Nano Developer Kit to get up and running fast.
Best of all – this AI course for the NVIDIA Jetson Nano is FREE. This is the first Deep Learning Institute course to be offered for free.
In the course, students will learn to collect image data and use it to train, optimize, and deploy AI models for custom tasks like recognizing hand gestures, and image regression for locating a key point in an image.
Set up your Jetson Nano and camera
Collect image data for classification models
Annotate image data for regression models
Train a neural network on your data to create your own models
Run inference on the Jetson Nano with the models you create
Upon completion, you’ll be able to create your own deep learning classification and regression models with the Jetson Nano.
Some experience with Python is helpful but not required. You will need the NVIDIA Jetson Nano Developer Kit, of course.
The FREE Jetson Nano AI Course Requirements
Duration : 8 hours
Prerequisites: Basic familiarity with Python (helpful, not required)
High-performance microSD card: 32GB minimum (NVIDIA has tested and recommends a specific card)
5V 4A power supply with 2.1mm DC barrel connector (NVIDIA has tested and recommends a specific model)
2-pin jumper: must be added to the Jetson Nano Developer Kit board to enable power from the barrel jack power supply
Logitech C270 USB Webcam (tested and recommended by NVIDIA). Alternate camera: Raspberry Pi Camera Module v2 (also tested and recommended by NVIDIA)
USB cable: Micro-B To Type-A with data enabled (NVIDIA has tested and recommends a specific cable)
A computer with an Internet connection and the ability to flash your microSD card
At Dell Technologies World 2019, we were lucky enough to snag a seat at the talk on human-machine partnership by MIT Professor Erik Brynjolfsson, and MIT alumna and Affectiva CEO, Rana el Kaliouby.
We managed to record the incredibly insightful session for everyone who could not make it for this exclusive guru session. This is a video you must not miss!
The DTW 2019 Guru Sessions
One of the best reasons to attend Dell Technologies World 2019 is the guru sessions. If you are lucky enough to reserve a seat, you will have the opportunity to listen to some of the world’s most brilliant thinkers and doers.
The Human-Machine Partnership
The talk on human-machine partnership by Professor Brynjolfsson and Ms. el Kaliouby was the first of several guru sessions at Dell Technologies World 2019.
Entitled “How Emerging Technologies & Human Machine Partnerships Will Transform the Economy“, it focused on how technology changed human society, and what the burgeoning efforts in artificial intelligence will mean for humanity.
Here are the key points from their guru session on the human-machine partnership :
Erik Brynjolfsson (00:05 to 22:05) on the Human-Machine Partnership
You cannot replace old technologies with new technologies, without rethinking the organisation or institution.
We are now undergoing a triple revolution
– a rebalancing of mind and machine through Big Data and Artificial Intelligence
– a shift from products to (digital) platforms
– a shift from the core to crowd-based decision making
Shifting to data-driven decision-making based on Big Data results in higher productivity and greater profitability.
Since 2015, computers have been able to recognise objects better than humans, thanks to rapid advances in machine learning.
Machine-based speech recognition has also been as accurate as humans since 2017.
While new AI capabilities are opening up new possibilities in many fields, they are also drastically reducing or eliminating the need for humans.
Unlike platforms of the past, the new digital networks leverage “two-sided networks“. In many cases, one network is used to subsidise the other network, or make it free-to-use.
Shifting to crowd-based decision-making introduces diversity in the ways of thinking, gaining new perspectives and breakthroughs in problem-solving.
Digital innovations have greatly expanded the economy, but it doesn’t mean that everyone will benefit. In fact, there has been a great decoupling between the productivity and median income of the American worker in the past few decades.
Rana el Kaliouby (22:08 to 45:05) on the Human-Machine Partnership
Human communication is mostly conveyed indirectly – 93% is non-verbal. Half of that is facial expressions and gestures, the other half is vocal intonation.
Affectiva has the world’s largest emotion repository, with 5 billion frames of 8 million faces from 87 countries.
Facial expressions are largely universal, but their data needs to be diverse to avoid bias in their models. For example, there are gender differences that vary by culture.
They use computer vision, machine learning and deep learning to create an Emotional AI model that learns from all those facial expressions to accurately determine a person’s emotions.
Emotional artificial intelligence has many real-world or potential uses
– detecting dangerous driving, allowing for proactive measures to be taken
– personalising the ride in a future robot-taxi or autonomous car
– the creation of more engaging and effective social robots in retail and hospitality industries
– helping autistic children understand how facial expressions correspond to emotions, and learn social cues.
Erik Brynjolfsson + Rana el Kaliouby
Professor Erik Brynjolfsson wears many hats. He is currently :
Professor at the MIT Sloan School of Management,
Director of the MIT Initiative on the Digital Economy,
Director of the MIT Center for Digital Business, and
Research Associate at the National Bureau of Economic Research
Rana el Kaliouby was formerly a computer scientist at MIT, helping to form their Autism & Communication Technology Initiative. She currently serves as CEO of Affectiva, a spin-off from MIT’s Media Lab that focuses on emotion recognition technology.
Sophos today announced the availability of Intercept X with malware detection powered by advanced deep learning neural networks. Join us for a briefing by Sumit Bansal, Sophos Managing Director for ASEAN and Korea!
Sophos Intercept X with Predictive Protection
Combined with new active-hacker mitigation, advanced application lockdown, and enhanced ransomware protection, this latest release of the Sophos Intercept X endpoint protection delivers previously unseen levels of detection and prevention.
Deep learning is the latest evolution of machine learning. It delivers a massively scalable detection model that is able to learn the entire observable threat landscape. With the ability to process hundreds of millions of samples, deep learning can make more accurate predictions at a faster rate with far fewer false-positives when compared to traditional machine learning.
This new version of Sophos Intercept X also includes innovations in anti-ransomware and exploit prevention, and active-hacker mitigations such as credential theft protection. As anti-malware has improved, attacks have increasingly focused on stealing credentials in order to move around systems and networks as a legitimate user, and Intercept X detects and prevents this behavior.
Deployed through the cloud-based management platform Sophos Central, Intercept X can be installed alongside existing endpoint security software from any vendor, immediately boosting endpoint protection. When used with the Sophos XG Firewall, Intercept X can introduce synchronized security capabilities to further enhance protection.
New Sophos Intercept X Features
Deep Learning Malware Detection
Deep learning model detects known and unknown malware and potentially unwanted applications (PUAs) before they execute, without relying on signatures
The model is less than 20 MB and requires infrequent updates
Active Adversary Mitigations
Credential theft protection – Preventing theft of authentication passwords and hash information from memory, registry, and persistent storage, as leveraged by such attacks as Mimikatz
Code cave utilization – Detects the presence of code deployed into another application, often used for persistence and antivirus avoidance
APC protection – Detects abuse of Asynchronous Procedure Calls (APC) often used as part of the AtomBombing code injection technique and more recently used as the method of spreading the WannaCry worm and NotPetya wiper via EternalBlue and DoublePulsar (adversaries abuse these calls to get another process to execute malicious code)
New and Enhanced Exploit Prevention Techniques
Malicious process migration – Detects remote reflective DLL injection used by adversaries to move between processes running on the system
Process privilege escalation – Prevents a low-privilege process from being escalated to a higher privilege, a tactic used to gain elevated system access
Enhanced Application Lockdown
Browser behavior lockdown – Intercept X prevents the malicious use of PowerShell from browsers as a basic behavior lockdown
HTA application lockdown – HTML applications loaded by the browser will have the lockdown mitigations applied as if they were a browser
NVIDIA CEO Jensen Huang (recently anointed as Fortune 2017 Businessperson of the Year) made a surprise reveal at the NIPS conference – the NVIDIA TITAN V. This is the first desktop graphics card to be built on the latest NVIDIA Volta microarchitecture, and the first to use HBM2 memory.
In this article, we will share with you everything we know about the NVIDIA TITAN V, and how it compares against its TITANic predecessors. We will also share with you what we think could be a future NVIDIA TITAN Vp graphics card!
Updated @ 2017-12-10 : Added a section on gaming with the NVIDIA TITAN V [1].
Originally posted @ 2017-12-09
NVIDIA Volta
NVIDIA Volta isn’t exactly new. Back at GTC 2017, NVIDIA revealed NVIDIA Volta, the NVIDIA GV100 GPU and the first NVIDIA Volta-powered product – the NVIDIA Tesla V100. Jensen even highlighted the Tesla V100 in his Computex 2017 keynote, more than 6 months ago!
Yet there has been no desktop GPU built around NVIDIA Volta. NVIDIA continued to churn out new graphics cards built around the Pascal architecture – GeForce GTX 1080 Ti and GeForce GTX 1070 Ti. That changed with the NVIDIA TITAN V.
NVIDIA GV100
The NVIDIA GV100 is the first NVIDIA Volta-based GPU, and the largest they have ever built. Even using the latest 12 nm FFN (FinFET NVIDIA) process, it is still a massive chip at 815 mm²! Compare that to the GP100 (610 mm² @ 16 nm FinFET) and GK110 (552 mm² @ 28 nm).
That’s because the GV100 is built using a whopping 21.1 billion transistors. In addition to 5376 CUDA cores and 336 Texture Units, it boasts 672 Tensor cores and 6 MB of L2 cache. All those transistors require a whole lot more power – to the tune of 300 W.
The NVIDIA TITAN V
That’s V for Volta… not the Roman numeral V or V for Vendetta. Powered by the NVIDIA GV100 GPU, the TITAN V has 5120 CUDA cores, 320 Texture Units, 640 Tensor cores, and a 4.5 MB L2 cache. It is paired with 12 GB of HBM2 memory (3 x 4GB stacks) running at 850 MHz.
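The TITAN V numbers are all slightly lower than those of the full GV100, and a quick Python sketch shows why. It assumes the per-SM resources NVIDIA has published for Volta (64 CUDA cores, 8 Tensor cores and 4 texture units per streaming multiprocessor). Those per-SM figures are not from this article, so treat them as an outside assumption.

```python
# Per-SM resources for Volta (assumption, from NVIDIA's published architecture details).
CUDA_CORES_PER_SM = 64
TENSOR_CORES_PER_SM = 8
TEXTURE_UNITS_PER_SM = 4


def sm_count(cuda_cores: int, tensor_cores: int, texture_units: int) -> int:
    """Derive the number of enabled SMs, checking that all three figures agree."""
    counts = {
        cuda_cores / CUDA_CORES_PER_SM,
        tensor_cores / TENSOR_CORES_PER_SM,
        texture_units / TEXTURE_UNITS_PER_SM,
    }
    if len(counts) != 1:
        raise ValueError("unit counts do not map to a single SM count")
    count = counts.pop()
    if count != int(count):
        raise ValueError("unit counts are not a whole number of SMs")
    return int(count)


full_gv100 = sm_count(5376, 672, 336)
titan_v = sm_count(5120, 640, 320)
print(full_gv100, titan_v)  # 84 80
```

Under that assumption, the TITAN V has 80 of the GV100's 84 SMs enabled, and its 4.5 MB L2 cache is three quarters of the full 6 MB.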
The blowout picture of the NVIDIA TITAN V reveals even more details :
It has 3 DisplayPorts and one HDMI port.
It has 6-pin + 8-pin PCIe power inputs.
It has 16 power phases, and what appears to be the Founders Edition copper heatsink and vapour chamber cooler, with a gold-coloured shroud.
There is no SLI connector, only what appears to be an NVLink connector.
Here are more pictures of the NVIDIA TITAN V, courtesy of NVIDIA.
Can You Game On The NVIDIA TITAN V?
Right after Jensen announced the TITAN V, the inevitable question was raised on the Internet – can it run Crysis / PUBG?
The NVIDIA TITAN V is the most powerful GPU for the desktop PC, but that does not mean you can actually use it to play games. NVIDIA notably did not mention anything about gaming, only that the TITAN V is “ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.”
In fact, the TITAN V is not listed in their GeForce Gaming section. The most powerful graphics card in the GeForce Gaming section remains the TITAN Xp.
Then again, the TITAN V uses the same NVIDIA Game Ready Driver as GeForce gaming cards, starting with version 388.59. Even so, it is possible that some or many games may not run well or properly on the TITAN V.
Of course, all this is speculative in nature. All that remains to crack this mystery is for someone to buy the TITAN V and use it to play some games!
The NVIDIA TITAN V Specification Comparison
Let’s take a look at the known specifications of the NVIDIA TITAN V, compared to the TITAN Xp (launched earlier this year), and the TITAN X (launched late last year). We also inserted the specifications of a hypothetical NVIDIA TITAN Vp, based on a full GV100.
Specifications | Future TITAN Vp? | NVIDIA TITAN V | NVIDIA TITAN Xp | NVIDIA TITAN X
Microarchitecture | NVIDIA Volta | NVIDIA Volta | NVIDIA Pascal | NVIDIA Pascal
GPU | GV100 | GV100 | GP102-400 | GP102-400
Process Technology | 12 nm FinFET+ | 12 nm FinFET+ | 16 nm FinFET | 16 nm FinFET
Die Size | 815 mm² | 815 mm² | 471 mm² | 471 mm²
Tensor Cores | 672 | 640 | None | None
CUDA Cores | 5376 | 5120 | 3840 | 3584
Texture Units | 336 | 320 | 240 | 224
ROPs | NA | NA | 96 | 96
L2 Cache Size | 6 MB | 4.5 MB | 3 MB | 4 MB
GPU Core Clock | NA | 1200 MHz | 1405 MHz | 1417 MHz
GPU Boost Clock | NA | 1455 MHz | 1582 MHz | 1531 MHz
Texture Fillrate | NA | 384.0 GT/s to 465.6 GT/s | 355.2 GT/s to 379.7 GT/s | 317.4 GT/s to 342.9 GT/s
Pixel Fillrate | NA | NA | 142.1 GP/s to 151.9 GP/s | 136.0 GP/s to 147.0 GP/s
Memory Type | HBM2 | HBM2 | GDDR5X | GDDR5X
Memory Size | NA | 12 GB | 12 GB | 12 GB
Memory Bus | 3072-bit | 3072-bit | 384-bit | 384-bit
Memory Clock | NA | 850 MHz | 1426 MHz | 1250 MHz
Memory Bandwidth | NA | 652.8 GB/s | 547.7 GB/s | 480.0 GB/s
TDP | 300 watts | 250 watts | 250 watts | 250 watts
Multi GPU Capability | NVLink | NVLink | SLI | SLI
Launch Price | NA | US$ 2999 | US$ 1200 | US$ 1200
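Several of the figures in the comparison above are derived from the others, and this Python sketch shows the arithmetic. The fillrate formula is simply units multiplied by clock speed. For memory bandwidth, the transfers-per-clock multipliers (2 for HBM2, 8 for GDDR5X at the listed clocks) are our assumptions, chosen because they reproduce the listed TITAN V and TITAN X figures.

```python
def fillrate_giga(units: int, clock_mhz: float) -> float:
    """Texture units (or ROPs) multiplied by the clock, in giga-operations per second."""
    return units * clock_mhz / 1000


def memory_bandwidth_gb_s(bus_bits: int, clock_mhz: float, transfers_per_clock: int) -> float:
    """Bus width in bytes x memory clock x transfers per clock, in GB/s."""
    return bus_bits * clock_mhz * transfers_per_clock / 8 / 1000


# NVIDIA TITAN V: 320 texture units, 1200 MHz core, 1455 MHz boost, 3072-bit HBM2 at 850 MHz
print(f"{fillrate_giga(320, 1200):.1f} GT/s")                # 384.0 GT/s
print(f"{fillrate_giga(320, 1455):.1f} GT/s")                # 465.6 GT/s
print(f"{memory_bandwidth_gb_s(3072, 850, 2):.1f} GB/s")     # 652.8 GB/s

# NVIDIA TITAN X: 96 ROPs, 1417 MHz core, 384-bit GDDR5X at 1250 MHz
print(f"{fillrate_giga(96, 1417):.1f} GP/s")                 # 136.0 GP/s
print(f"{memory_bandwidth_gb_s(384, 1250, 8):.1f} GB/s")     # 480.0 GB/s
```

The same formula gives about 547.6 GB/s for the TITAN Xp at 1426 MHz, so the listed 547.7 GB/s presumably reflects a slightly higher unrounded memory clock.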
The NVIDIA TITAN Vp?
In case you are wondering, the TITAN Vp does not exist. It is merely a hypothetical future model that we think NVIDIA may introduce mid-cycle, like the NVIDIA TITAN Xp.
Our TITAN Vp is based on the full capabilities of the NVIDIA GV100 GPU. That means it will have 5376 CUDA cores with 336 Texture Units, 672 Tensor cores and 6 MB of L2 cache. It will also have a higher TDP of 300 watts.
The Official NVIDIA TITAN V Press Release
December 9, 2017—NVIDIA today introduced TITAN V, the world’s most powerful GPU for the PC, driven by the world’s most advanced GPU architecture, NVIDIA Volta.
Announced by NVIDIA founder and CEO Jensen Huang at the annual NIPS conference, TITAN V excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, and extreme energy efficiency.
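As a sanity check on the "9x" claim, here is a short Python sketch. It assumes the predecessor in question is the TITAN Xp, and that its peak FP32 throughput is 2 floating-point operations (one fused multiply-add) per CUDA core per clock at its 1582 MHz boost clock. Both are our assumptions, as the press release does not spell out the comparison. Note also that the 110 teraflops figure refers to Tensor Core deep learning performance, so this is not a like-for-like FP32 comparison.

```python
def peak_fp32_tflops(cuda_cores: int, boost_clock_mhz: float) -> float:
    """Peak FP32 throughput, assuming one fused multiply-add (2 FLOPs) per core per clock."""
    return cuda_cores * 2 * boost_clock_mhz / 1_000_000


titan_v_deep_learning_tflops = 110.0             # figure quoted in the press release
titan_xp_tflops = peak_fp32_tflops(3840, 1582)   # TITAN Xp: 3840 cores at 1582 MHz boost

ratio = titan_v_deep_learning_tflops / titan_xp_tflops
print(f"{titan_xp_tflops:.2f} TFLOPS, ratio {ratio:.1f}x")
# 12.15 TFLOPS, ratio 9.1x
```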
“Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” said Huang. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. I can’t wait to see their breakthrough discoveries.”
NVIDIA Supercomputing GPU Architecture, Now for the PC
TITAN V’s Volta architecture features a major redesign of the streaming multiprocessor that is at the center of the GPU. It doubles the energy efficiency of the previous generation Pascal design, enabling dramatic boosts in performance in the same power envelope.
New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads with a mix of computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improve performance while also simplifying programming.
Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customised for NVIDIA, TITAN V also incorporates Volta’s highly tuned 12GB HBM2 memory subsystem for advanced memory bandwidth utilisation.
Free AI Software on NVIDIA GPU Cloud
TITAN V’s incredible power is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.
Users of TITAN V can gain immediate access to the latest GPU-optimised AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. This container registry includes NVIDIA-optimised deep learning frameworks, third-party managed HPC applications, NVIDIA HPC visualisation tools and the NVIDIA TensorRT inferencing optimiser.
For future-oriented digital topics, the Volkswagen Group remains committed to artificial intelligence (AI). This is why Volkswagen IT is cooperating with US technology company NVIDIA with a view to expanding its competence in the field of deep learning. At the Volkswagen Data Lab, IT experts are developing advanced AI systems with deep learning.
Volkswagen & NVIDIA In Deep Learning Partnership
At Volkswagen, the Data Lab has been named the Group’s center of excellence for AI and data analysis. Specialists are exploring possibilities to use deep learning in corporate processes and in the field of mobility services. For example, they are developing new procedures for optimizing traffic flow in cities. Advanced AI systems are also among the prerequisites for developments such as intelligent human-robot cooperation.
Dr. Martin Hofmann, CIO of the Volkswagen Group, says: “Artificial intelligence is the key to the digital future of the Volkswagen Group. We want to develop and deploy high-performance AI systems ourselves. This is why we are expanding the expert knowledge required. Cooperation with NVIDIA will be a major step in this direction.”
“AI is the most powerful technological force of our era,” says Jensen Huang, CEO of NVIDIA. “Thanks to AI, data centers are changing dramatically and enterprise computing is being reinvented. NVIDIA’s deep learning solutions will enable Volkswagen to turn the enormous amounts of information in its data centers into valuable insight, and transform its business.”
In addition, Volkswagen has established a startup support program at its Data Lab. The program will provide technical and financial support for international startups developing machine learning and deep learning applications for the automotive industry. Together with NVIDIA, Volkswagen will be admitting five startups to the support program from this fall.
Both partners will also be launching a “Summer of Code” camp where high-performing students with qualifications in IT, mathematics or physics will have an opportunity to develop deep learning methods in teams and to implement them in a robotics environment.
If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!
Just before we flew to Computex 2017, we attended the AWS Masterclass on Artificial Intelligence. It offered us an in-depth look at AI concepts like machine learning, deep learning and neural networks. We also saw how Amazon Web Services (AWS) uses all that to create easy-to-use tools for developers to create their own AI applications at low cost and virtually no capital outlay.
The AWS Masterclass on Artificial Intelligence
AWS Malaysia flew in Olivier Klein, the AWS Asia Pacific Solutions Architect, to conduct the AWS Masterclass. During the two-hour session, he conveyed the ease with which the various AWS services and tools allow virtually anyone to create their own AI applications at lower cost and virtually no capital outlay.
The topic of artificial intelligence is rather wide-ranging, covering everything from basic AI concepts to demonstrations of how to use AWS services like Amazon Polly and Amazon Rekognition to easily and quickly create AI applications. We present to you – the complete AWS Masterclass on Artificial Intelligence!
The AWS Masterclass on AI is actually made up of 5 main topics. Here is a summary of those topics :
| Topic | Duration | Remark |
|---|---|---|
| AWS Cloud and An Introduction to Artificial Intelligence, Machine Learning, Deep Learning | 15 minutes | An overview of Amazon Web Services and the latest innovations in the data analytics, machine learning, deep learning and AI space. |
| The Road to Artificial Intelligence | 20 minutes | Demystifying AI concepts and related terminologies, as well as the underlying technologies. Let’s dive deeper into the concepts of machine learning and deep learning models, such as neural networks, and how this leads to artificial intelligence. |
| Connecting Things and Sensing the Real World | 30 minutes | As part of an AI that aligns with our physical world, we need to understand how the Internet-of-Things (IoT) space helps to create natural interaction channels. We will walk through real-world examples and demonstrations that include voice interactions through Amazon Lex, Amazon Polly and the Alexa Voice Services, as well as visual recognition with services such as Amazon Rekognition. We will also bridge this with real-time data that is sensed from the physical world via AWS IoT. |
| Retrospective and Real-Time Data Analytics | 30 minutes | Every AI must continuously “learn” and be “trained” through past performance and feedback data. Retrospective and real-time data analytics are crucial to building an intelligence model. We will dive into some of the new trends and concepts, which our customers are using to perform fast and cost-effective analytics on AWS. |
In the next two pages, we will dissect the video and share with you the key points from each segment of this AWS Masterclass.
The AWS Masterclass on AI Key Points (Part 1)
Here is an exhaustive list of key takeaway points from the AWS Masterclass on Artificial Intelligence, with their individual timestamps in the video :
Introduction To AWS Cloud
AWS has 16 regions around the world (0:51), with two or more availability zones per region (1:37), and 76 edge locations (1:56) to accelerate end connectivity to AWS services.
AWS offers 90+ cloud services (3:45), all of which use the On-Demand Model (4:38) – you pay only for what you use, whether that’s a GB of storage or transfer, or execution time for a computational process.
You don’t even need to plan for your requirements or inform AWS how much capacity you need (5:05). Just use and pay what you need.
AWS has a practice of passing their cost savings to their customers (5:59), cutting prices 61 times since 2006.
AWS keeps adding new services over the years (6:19), with over a thousand new features and services introduced in 2016 (7:03).
Introduction to Artificial Intelligence, Machine Learning, Deep Learning
Artificial intelligence is based on unsupervised machine learning (7:45), specifically deep learning models.
Insurance companies like AON use it for actuarial calculations (7:59), and services like Netflix use it to generate recommendations (8:04).
A lot of AI models have been built specifically around natural language understanding, and using vision to interact with customers, as well as predicting and understanding customer behaviour (9:23).
Here is a quick look at what the AWS services management console looks like (9:58).
This is how you launch 10 compute instances (virtual servers) in AWS (11:40).
The ability to access multiple instances quickly is very useful for AI training (12:40), because it gives the user access to large amounts of computational power, which can be quickly terminated (13:10).
Machine learning, or specifically artificial intelligence, is not new to Amazon.com, the parent company of AWS (14:14).
Amazon.com uses a lot of AI models (14:34) for recommendations and demand forecasting.
The visual search feature in Amazon app uses visual recognition and AI models to identify a picture you take (15:33).
Olivier introduces Amazon Go (16:07), a prototype grocery store in Seattle.
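The point above about launching 10 compute instances and terminating them once the work is done can be sketched in code. This is a minimal sketch, assuming the boto3 SDK and configured AWS credentials; it is not what was shown in the masterclass, which used the management console. The helper names, the `t2.micro` instance type, the region and the AMI ID you pass in are all placeholders chosen for illustration.

```python
def build_launch_request(image_id, instance_type="t2.micro", count=10):
    """Build the keyword arguments for the EC2 RunInstances call."""
    return {
        "ImageId": image_id,            # placeholder AMI ID supplied by the caller
        "InstanceType": instance_type,  # assumed instance type for this sketch
        "MinCount": count,              # fail unless all `count` instances can start
        "MaxCount": count,
    }


def launch_instances(image_id, instance_type="t2.micro", count=10, region="us-east-1"):
    """Launch the instances and return their IDs (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**build_launch_request(image_id, instance_type, count))
    return [instance["InstanceId"] for instance in response["Instances"]]


def terminate_instances(instance_ids, region="us-east-1"):
    """Terminate the instances once the training run is finished."""
    import boto3

    ec2 = boto3.client("ec2", region_name=region)
    ec2.terminate_instances(InstanceIds=instance_ids)
```

Setting `MinCount` and `MaxCount` to the same value asks for exactly that many instances, which suits a training job that expects a fixed number of workers.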
The Road to Artificial Intelligence
The first component of any artificial intelligence is the “ability to sense the real world” (18:46), connecting everything together.
Cheaper bandwidth (19:26) now allows more devices to be connected to the cloud, allowing more data to be collected for the purpose of training AI models.
Cloud computing platforms like AWS allow the storage and processing of all that sensor data in real time (19:53).
All of that information can be used in deep learning models (20:14) to create an artificial intelligence that understands, in a natural way, what we are doing, and what we want or need.
Olivier shows how machine learning can quickly solve a Rubik’s cube (20:47), which has 43 quintillion unique combinations.
You can even build a Raspberry Pi-powered machine (24:33) that can solve a Rubik’s cube puzzle in 0.9 seconds.
Some of these deep learning models are available on Amazon AI (25:11), which is a combination of different services (25:44).
Olivier shows what it means to “train a deep learning model” (28:19) using a neural network (29:15).
Deep learning is computationally-intensive (30:39), but once it derives a model that works well, the predictive aspect is not computationally-intensive (30:52).
A pre-trained AI model can be loaded into a low-powered device (31:02), allowing it to perform AI functions without requiring large amounts of bandwidth or computational power.
Olivier demonstrates the YOLO (You Only Look Once) project, which pre-trained an AI model with pictures of objects (31:58), which allows it to detect objects in any video.
The identification of objects is the baseline for autonomous driving systems (34:19), as used by TuSimple.
TuSimple also used a similar model to train a drone to detect and follow a person (35:28).
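The contrast drawn above between compute-heavy training and cheap prediction can be illustrated with a toy example. This is a minimal sketch of a single sigmoid neuron learning the logical OR function, not the model from the masterclass; the function names, learning rate and epoch count are assumptions chosen for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(weights, bias, inputs):
    """Inference: one weighted sum and one activation."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train(samples, epochs=5000, learning_rate=0.5):
    """Training: repeat the forward pass and the weight update many times."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = predict(weights, bias, inputs) - target
            # Gradient step for a sigmoid neuron with cross-entropy loss.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Toy data set: the logical OR function.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train(OR_SAMPLES)  # 5000 epochs x 4 samples = 20,000 updates
predictions = [round(predict(weights, bias, inputs)) for inputs, _ in OR_SAMPLES]
```

Training here runs 20,000 forward passes and updates, while each later prediction needs only the two learned weights, the bias and a single call to `predict`. That asymmetry is why a pre-trained model can run on a low-powered device.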
The AWS Masterclass on AI Key Points (Part 2)
Connecting Things and Sensing the Real World
Cloud services like AWS IoT (37:35) allow you to securely connect billions of IoT (Internet of Things) devices.
Olivier prefers to think of IoT as Intelligent Orchestrated Technology (37:52).
Olivier demonstrates how the combination of multiple data sources (maps, vehicle GPS, real-time weather reports) in Bangkok can be used to predict traffic as well as road conditions to create optimal routes (39:07), reducing traffic congestion by 30%.
The PetaBencana service in Jakarta uses picture recognition and IoT sensors to identify flooded roads (42:21) for better emergency response and disaster management.
Olivier demonstrates how easy it is to connect IoT devices to the AWS IoT service (43:46), and use them to sense and interact with the environment.
Olivier shows how the capabilities of the Amazon Echo can be extended by creating an Alexa Skill using the AWS Lambda function (59:07).
Developers can create and publish Alexa Skills for sale in the Amazon marketplace (1:03:30).
Amazon Polly (1:04:10) renders life-like speech, while the Amazon Lex conversational engine (1:04:17) has natural language understanding and automatic speech recognition. Amazon Rekognition (1:04:29) performs image analysis.
Amazon Polly (1:04:50) turns text into life-like speech using deep learning to change the pitch and intonation according to the context. Olivier demonstrates Amazon Polly’s capabilities at 1:06:25.
Amazon Lex (1:11:06) is a web service that allows you to build conversational interfaces using natural language understanding (NLU) and automatic speech recognition (ASR) models like Alexa.
Amazon Lex does not just support spoken natural language understanding, it also recognises text (1:12:09), which makes it useful for chatbots.
Olivier demonstrates those text recognition capabilities in a chatbot demo (1:13:50) of a customer applying for a credit card through Facebook.
Amazon Rekognition (1:21:37) is an image recognition and analysis service, which uses deep learning to identify objects in pictures.
Amazon Rekognition can even detect facial landmarks and sentiments (1:22:41), as well as image quality and other attributes.
You can actually try Amazon Rekognition out (1:23:24) by uploading photos at CodeFor.Cloud/image.
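To give a feel for the Alexa Skill mentioned above, here is a minimal sketch of an AWS Lambda handler that answers a request from Alexa. It is not the code from the demonstration. The intent name `TechTipIntent` and the spoken phrases are hypothetical; the `lambda_handler(event, context)` entry point and the `version` / `response` / `outputSpeech` structure follow the standard Lambda and Alexa custom skill JSON formats.

```python
def build_speech_response(text, end_session=True):
    """Wrap plain text in the JSON structure an Alexa custom skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point that AWS Lambda calls for each request sent by Alexa."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_speech_response("Welcome. Ask me for a tech tip.", end_session=False)
    if request.get("type") == "IntentRequest":
        # "TechTipIntent" is a hypothetical intent name for this sketch.
        if request.get("intent", {}).get("name") == "TechTipIntent":
            return build_speech_response("A trained model can run on a low-powered device.")
    return build_speech_response("Sorry, I did not understand that.")
```

The handler can be exercised locally by passing it a dictionary shaped like an Alexa request, for example `lambda_handler({"request": {"type": "LaunchRequest"}}, None)`.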
Retrospective and Real-Time Data Analytics
AI is a combination of 3 types of data analytics (1:28:10) – retrospective analysis and reporting + real-time processing + predictions to enable smart apps.
Cloud computing is extremely useful for machine learning (1:29:57) because it allows you to decouple storage and compute requirements for much lower costs.
Amazon Athena (1:31:56) allows you to query data stored in Amazon S3, without creating a compute instance to do it. You only pay for the TB of data that is processed by that query.
Best of all, you will get the same fast results even if your data set grows (1:32:31), because Amazon Athena will automatically parallelise your queries across your data set internally.
Olivier demonstrates (1:33:14) how Amazon Athena can be used to run queries on data stored in Amazon S3, as well as generate reports using Amazon QuickSight.
When it comes to data analytics, cloud computing allows you to quickly bring massive computing power to bear, achieving much faster results without additional cost (1:41:40).
The insurance company AON used this ability (1:42:44) to reduce an actuarial simulation that would normally take 10 days, to just 10 minutes.
Amazon Kinesis and Amazon Kinesis Analytics (1:45:10) allow the processing of real-time data.
A company called Dash is using this capability to analyse OBD data in real-time (1:47:23) to help improve fuel efficiency and predict potential breakdowns. It also notifies emergency services in case of a crash.
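The pay-per-TB-processed model described above for Amazon Athena comes down to simple arithmetic. The sketch below is illustrative only: the price of 5.00 per TB and the data volumes are assumed figures, not numbers from the masterclass, and the function name is hypothetical.

```python
BYTES_PER_TB = 10 ** 12  # decimal terabyte, an assumption for this sketch


def query_cost(bytes_scanned, price_per_tb):
    """Cost of one query when billing is proportional to the data it scans."""
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# Hypothetical query that scans 250 GB at an assumed price of 5.00 per TB.
full_scan = query_cost(250 * 10 ** 9, price_per_tb=5.00)

# The same query if it only had to scan one-tenth of the data.
small_scan = query_cost(25 * 10 ** 9, price_per_tb=5.00)
```

Under these assumed figures the first query costs 1.25 and the second 0.125, so the bill tracks the amount of data a query touches rather than how long a server is kept running.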
At the AMD Computex 2017 Press Conference, AMD President & CEO Dr. Lisa Su announced that AMD will launch the Radeon Vega Frontier Edition on 27 June 2017, and the Radeon RX Vega graphics cards at the end of July 2017. We figured this is a great time to revisit the new AMD Vega memory architecture.
Now, who better to tell us all about it than AMD Senior Fellow Jeffrey Cheng, who built the AMD Vega memory architecture? Check out this exclusive Q&A session from the AMD Tech Summit in Sonoma!
Updated @ 2017-06-11 : We clarified the difference between the AMD Vega’s 64-bit flat address space, and the 512 TB addressable memory. We also added new key points, and time stamps for the key points.
Originally posted @ 2017-02-04
Jeffrey Cheng is an AMD Senior Fellow in the area of memory architecture. The AMD Vega memory architecture refers to how the AMD Vega GPU manages memory utilisation and handles large datasets. It does not deal with the AMD Vega memory hardware design, which includes the High Bandwidth Cache and HBM2 technology.
AMD Vega Memory Architecture Q&A Summary
Here are the key takeaway points from the Q&A session with Jeffrey Cheng :
Large amounts of DRAM can be used to handle big datasets, but this is not the best solution because DRAM is costly and consumes lots of power (see 2:54).
At any given moment, the amount of data processed by the GPU is limited, so it doesn’t make sense to store a large dataset in DRAM. It would be better to cache the data required by the GPU on very fast memory (e.g. HBM2), and intelligently move them according to the GPU’s requirements (see 5:40).
The AMD Vega’s heterogenous memory architecture allows for easy integration of future memory technologies like storage-class memory (flash memory that can be accessed in bytes, instead of blocks) (see 8:13).
The AMD Vega has a 64-bit flat address space for its shaders (see 12:08, 12:36 and 18:21), but like NVIDIA, AMD is (very likely) limiting the addressable memory to 49 bits, giving it 512 TB of addressable memory.
AMD Vega has full access to the CPU’s 48-bit address space, with additional bits beyond that used to handle its own internal memory, storage and registers (see 12:16). This ties back to the High Bandwidth Cache Controller and heterogenous memory architecture, which allows the use of different memory and storage types.
Game developers currently try to manage data and memory usage, often extremely conservatively to support graphics cards with limited amounts of graphics memory (see 16:29).
The memory architectural advantages of AMD Vega will initially have little impact on gaming performance (due to the current conservative approach of game developers). This will change when developers hand over data and memory management to the GPU (see 24:42).
The improved memory architecture in AMD Vega will mainly benefit AI applications (e.g. deep machine learning) with their large datasets (see 24:52).
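The address-space figures in the key points above follow directly from powers of two. The short sketch below works through that arithmetic, assuming byte addressing and reading "TB" as binary terabytes (2^40 bytes); the function name is hypothetical.

```python
TIB = 2 ** 40  # bytes in one binary terabyte (tebibyte)


def addressable_tib(address_bits):
    """Memory reachable with byte addressing, expressed in binary terabytes."""
    return 2 ** address_bits // TIB


cpu_space = addressable_tib(48)   # the CPU's 48-bit address space
vega_space = addressable_tib(49)  # the 49-bit limit mentioned for Vega
flat_space = addressable_tib(64)  # the full 64-bit flat address space
```

Each extra address bit doubles the reachable memory, so the 49-bit limit gives 512 TB, twice the 256 TB of a 48-bit CPU address space, while a full 64-bit space would span 16,777,216 TB.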
NVIDIA just announced the Jetson TX2 embedded AI supercomputer, based on the latest NVIDIA Pascal microarchitecture. It promises to offer twice the performance of the previous-generation Jetson TX1, in the same package. In this tech report, we will share with you the full details of the new Pascal-based NVIDIA Jetson TX2!
GPUs In Artificial Intelligence
Artificial intelligence is the new frontier in GPU compute technology. Whether they are used to power training or inference engines, AI research has benefited greatly from the massive amounts of compute power in modern GPUs.
The market is led by NVIDIA with their Tesla accelerators that run on their proprietary CUDA platform. AMD, on the other hand, is a relative newcomer with their Radeon Instinct accelerators designed to run on the open-source ROCm (Radeon Open Compute) platform.
The NVIDIA Jetson
GPUs today offer so much compute performance that NVIDIA has been able to create the NVIDIA Jetson family of embedded AI supercomputers. They differ from their Tesla big brother in their size, power efficiency and purpose. The NVIDIA Jetson modules are specifically built for “inference at the edge” or “AI at the edge“.
Unlike AI processing in the datacenters or in the cloud, AI at the edge refers to autonomous artificial intelligence processing where there is poor or no Internet access, or where access must be restricted for privacy or security reasons. Therefore, the processor must be powerful enough for the AI application to run autonomously.
Whether it’s to automate robots in a factory, or to tackle industrial accidents like the one at the Fukushima Daiichi nuclear plant, AI at the edge is meant to allow for at least some autonomous capability right in the field. Processors for AI at the edge must also be frugal in using power, as power or battery life is often limited.
Hence, processors designed for AI at the edge applications must be small, power-efficient and yet fast enough to run AI inference in real time. The NVIDIA Jetson family of embedded AI supercomputers promises to tick all of those boxes. Let’s take a look :
The NVIDIA Jetson TX2
The NVIDIA Jetson TX2 is the second-generation Jetson embedded AI module, based on the latest NVIDIA Pascal microarchitecture. It supersedes (but does not replace) the previous-generation Jetson TX1, which was built on the NVIDIA Maxwell microarchitecture and released in November 2015.
Thanks to the faster and more power-efficient Pascal microarchitecture, the NVIDIA Jetson TX2 promises to be twice as energy-efficient as the Jetson TX1.
This means developers switching to the Jetson TX2 can now opt to maximise power efficiency, or to maximise performance. In the Max-Q mode, the Jetson TX2 will use less than 7.5 W, and offer Jetson TX1-equivalent performance. In the Max-P mode, the Jetson TX2 will use less than 15 W, and offer up to twice the performance of the Jetson TX1.
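The trade-off between the two modes can be put into numbers. The sketch below is a rough illustration that treats the quoted power figures as ceilings and expresses performance relative to the Jetson TX1 (1.0 = TX1-equivalent); real workloads will draw less power and may not reach the full 2X, and the function name is hypothetical.

```python
def perf_per_watt(relative_performance, power_watts):
    """Performance relative to the Jetson TX1, per watt of power budget."""
    return relative_performance / power_watts


max_q = perf_per_watt(1.0, 7.5)   # TX1-equivalent performance at the 7.5 W ceiling
max_p = perf_per_watt(2.0, 15.0)  # up to 2x TX1 performance at the 15 W ceiling
```

At their respective ceilings, both modes come out at the same performance per watt. The choice is therefore between a smaller power budget and a higher absolute performance, not between a more and a less efficient mode.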
NVIDIA Jetson Specification Comparison
The NVIDIA Jetson modules are actually built around the NVIDIA Tegra SoCs, instead of their GeForce GPUs. The Tegra SoC is a System On A Chip, which integrates an ARM CPU, an NVIDIA GPU, a chipset and a memory controller on a single package.
The Tegra SoC and the other components on a 50 x 87 mm board are what constitute the NVIDIA Jetson module. The Jetson TX1 uses the Tegra X1 SoC, while the new Jetson TX2 uses the Tegra P1 SoC.
For those who have been following our coverage of the AMD Radeon Instinct, and its support for packed math, the NVIDIA Jetson TX2 and TX1 modules support FP16 operations too.
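For readers unfamiliar with FP16, the sketch below shows why half precision lends itself to packed math: two 16-bit values fit into a single 32-bit word, at the cost of reduced precision. It only illustrates the storage format using Python's standard `struct` module; it does not model how a GPU executes packed operations, and the function names are hypothetical.

```python
import struct


def pack_fp16_pair(a, b):
    """Store two half-precision values in 4 bytes (one 32-bit word)."""
    return struct.pack("<2e", a, b)


def unpack_fp16_pair(word):
    """Recover the two half-precision values from a 32-bit word."""
    return struct.unpack("<2e", word)


word = pack_fp16_pair(1.5, -0.25)
values = unpack_fp16_pair(word)  # both inputs are exactly representable in FP16

# 0.1 cannot be stored exactly in FP16, so it comes back slightly changed.
rounded = struct.unpack("<e", struct.pack("<e", 0.1))[0]
```

The pair occupies the same space as one FP32 value, which is what lets hardware with packed math support process two FP16 operands per 32-bit lane.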
NVIDIA Jetson TX2 Price & Availability
The NVIDIA Jetson TX2 Developer Kit is available for pre-order in the US and Europe right now, with a US$ 599 retail price and a US$ 299 education price. Shipping will start on March 14, 2017. This developer’s kit will be made available in APAC and other regions in April 2017.
The NVIDIA Jetson TX2 module itself will only be made available in the second quarter of 2017. It will be priced at US$ 399 per module, in quantities of 1,000 modules or more.
Note that the Jetson TX2 modules are exactly the same size as the Jetson TX1 modules, and use the same 400-pin connector. They are drop-in compatible replacements for the Jetson TX1 modules.
NVIDIA Jetson TX1 Price Adjustments
With the launch of the Jetson TX2, NVIDIA is adjusting the price of the Jetson TX1. The Jetson TX1 will continue to sell alongside the new Jetson TX2.
The price of the NVIDIA Jetson TX1 module has been reduced to US$ 299, down from US$ 399. Again, this is in quantities of 1,000 modules or more.
NVIDIA Jetpack 3.0
The NVIDIA Jetson is more than just a processor module. It is a platform that is made up of developer tools and code, as well as APIs. Just as AMD offers their MIOpen deep learning library, NVIDIA offers Jetpack.
In conjunction with the launch of the Jetson TX2, NVIDIA also announced the NVIDIA Jetpack 3.0. It promises to offer twice the system performance of Jetpack 2.3.
Jetpack 3.0 is not just for the new Jetson TX2. It will offer a nice boost in performance for existing Jetson TX1 users and applications.
The Presentation Slides
For those who want the full set of NVIDIA Jetson TX2 slides, here they are :
The AMD Tech Summit held in Sonoma, California from December 7-9, 2016 was not only very exclusive, it was highly secretive. The first major announcement we have been allowed to reveal is the new AMD Radeon Instinct heterogenous computing platform.
In this article, you will hear from AMD what the Radeon Instinct platform is all about. As usual, we have a ton of videos from the event, so it will be as if you were there with us. Enjoy! 🙂
Originally published @ 2016-12-12
Updated @ 2017-01-11 : Two of the videos were edited to comply with the NDA. Now that the NDA on AMD Vega has been lifted, we replaced the two videos with their full, unedited versions. We also made other changes, including adding links to the other AMD Tech Summit articles.
Updated @ 2017-01-20 : Replaced an incorrect slide, and a video featuring that slide. Made other small updates to the article.
The AMD Radeon Instinct Platform Summarised
For those who want the quick low-down on AMD Radeon Instinct, here are the key takeaway points :
The AMD Radeon Instinct platform is made up of two components – hardware and software.
The hardware components are the AMD Radeon Instinct accelerators built around the current Polaris and the upcoming Vega GPUs.
The software component is the AMD Radeon Open Compute (ROCm) platform, which includes the new MIOpen open-source deep learning library.
The first three Radeon Instinct accelerator cards are the MI6, MI8 and MI25 Vega with NCU.
The AMD Radeon Instinct MI6 is a passively-cooled inference accelerator with 5.7 TFLOPS of FP16 processing power, 224 GB/s of memory bandwidth, and a TDP of <150 W. It will come with 16 GB of GDDR5 memory.
The AMD Radeon Instinct MI8 is a small form-factor (SFF) accelerator with 8.2 TFLOPS of processing power, 512 GB/s of memory bandwidth, and a TDP of <175 W. It will come with 4 GB of HBM memory.
The AMD Radeon Instinct MI25 Vega with NCU is a passively-cooled training accelerator with 25 TFLOPS of processing power, support for 2X packed math, a High Bandwidth Cache and Controller, and a TDP of <300 W.
The Radeon Instinct accelerators will all be built exclusively by AMD.
The Radeon Instinct accelerators will all support MxGPU SRIOV hardware virtualisation.
The Radeon Instinct accelerators are all passively cooled.
The Radeon Instinct accelerators will all have large BAR (Base Address Register) support for multiple GPUs.
The upcoming AMD Zen “Naples” server platform is designed to support multiple Radeon Instinct accelerators through a high-speed network fabric.
The ROCm platform is not only open source, it will support a multitude of standards in addition to MIOpen.
The MIOpen deep learning library is open source, and will be available in Q1 2017.
The MIOpen deep learning library is optimised for Radeon Instinct, allowing for 3X better performance in machine learning.
AMD Radeon Instinct accelerators will be significantly faster than NVIDIA Titan X GPUs based on the Maxwell and Pascal architectures.
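The headline figures for the three accelerators above can be compared on a common footing. The sketch below divides the quoted peak throughput by the quoted TDP ceiling; since the TDPs are given as upper bounds, the result is a floor on efficiency at peak throughput. Note that the list states FP16 for the MI6 and does not state the precision for the other two cards, so this is a rough comparison only, and the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    name: str
    tflops: float      # peak throughput quoted in the list above
    tdp_watts: float   # quoted as an upper bound ("<150 W" and so on)

    def min_gflops_per_watt(self):
        """Peak GFLOPS divided by the TDP ceiling."""
        return self.tflops * 1000 / self.tdp_watts


CARDS = [
    Accelerator("MI6", 5.7, 150),
    Accelerator("MI8", 8.2, 175),
    Accelerator("MI25", 25.0, 300),
]

# Sort from most to least efficient on this measure.
ranking = sorted(CARDS, key=Accelerator.min_gflops_per_watt, reverse=True)
```

On these figures the MI25 leads with roughly 83 GFLOPS per watt, ahead of the MI8 at roughly 47 and the MI6 at 38.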
In the subsequent pages, we will give you the full low-down on the Radeon Instinct platform, with the following presentations by AMD :
Why Is Heterogenous Computing Important?
Dr. Lisa Su kicked things off with an inside look at her two-year journey as AMD President and CEO. Then she revealed why Heterogenous Computing is an important part of AMD’s future. She also mentioned the success of the recently-released Radeon Software Crimson ReLive Edition.
Here Are The New AMD Radeon Instinct Accelerators!
Next, Raja Koduri, Senior Vice President and Chief Architect of the Radeon Technologies Group, officially revealed the new AMD Radeon Instinct accelerators.
The MIOpen Deep Learning Library For Radeon Instinct
MIOpen is a new deep learning library optimised for Radeon Instinct. It is open source and will become part of the Radeon Open Compute (ROCm) platform. It will be available in Q1 2017.
The Performance Advantage Of Radeon Instinct & MIOpen
MIOpen is optimised for Radeon Instinct, offering 3X better performance in machine learning. It allows the Radeon Instinct accelerators to be significantly faster than NVIDIA Titan X GPUs based on the Maxwell and Pascal architectures.
The Radeon Instinct MI25 Training Demonstration
Raja Koduri roped in Ben Sander, Senior Fellow at AMD, to show off the Radeon Instinct MI25 running a training demo.
The Radeon Instinct MI8 Visual Inference Demonstration
The visual inference demo is probably much easier to grasp, as it is visual in nature. AMD used the Radeon Instinct MI8 in this example.
The Radeon Instinct On The Zen “Naples” Platform
The upcoming AMD Zen “Naples” server platform is designed to support multiple AMD Radeon Instinct accelerators through a high-speed network fabric.
The Radeon Open Compute (ROCm) Platform Discussion
To illustrate the importance of heterogenous computing on Radeon Instinct, Greg Stoner (ROCm Senior Director at AMD) hosted a panel of AMD partners and early adopters of the Radeon Open Compute (ROCm) platform.
Closing Remarks On Radeon Instinct
Finally, Raja Koduri concluded the launch of the Radeon Instinct Initiative with some closing remarks on the recent Radeon Software Crimson ReLive Edition.
The Complete AMD Radeon Instinct Tech Briefing
This is the complete AMD Radeon Instinct tech briefing. Our earlier video was edited to comply with the AMD Vega NDA (which has now expired).
The Complete AMD Radeon Instinct Tech Briefing Slides
Here are the Radeon Instinct presentation slides for your perusal.
SEOUL, 28 December 2016 — To advance the functionality of today’s home appliances to a whole new level, LG Electronics (LG) is set to deliver an unparalleled level of performance and convenience into the home with deep learning technology to be unveiled at CES 2017.
LG deep learning will allow smart home appliances to better understand their users by gathering and studying customers’ lifestyle patterns over time. This process never ends and improves over time to provide customers with new solutions to everyday problems.
Robot Cleaner: Improved Performance Through Memory
Using multiple sensors and LG deep learning technology, LG’s newest robot vacuum cleaner will recognize objects around the room and react accordingly. By capturing surface images of the room, the intelligent cleaner remembers obstacles and learns to avoid them over time. It even recognizes electrical wires and slippers so they don’t end up jamming the roller brush and requiring human extraction assistance. The LG robot vacuum can tell the difference between a human and a chair: it asks the obstructing person to kindly move out of the way, whereas it will simply maneuver around a chair.
Refrigerator: Smarter Convenience
LG deep learning is also enhancing the convenience LG’s smart refrigerator brings to consumers. By analyzing usage and eating patterns, LG’s deep learning refrigerator performs a variety of tasks by “predicting” the family’s activities based on their past behavior, such as automatically filling the ice tray at the time of the day when cold drinks are most in demand. In the summer, LG’s smart refrigerator can initiate the 4-stage sterilization system on its own to extend food life when it senses temperature and humidity conditions that may contribute to food spoilage.
Air Conditioner: Even Better Energy Saving And Performance
LG’s smart air conditioner equipped with LG deep learning technology analyzes the daily behavior patterns of its homeowners, including the parts of the home most occupied at certain times throughout the day. With this information, LG’s deep learning enabled air conditioner is able to assess how to provide the most comfortable temperatures quickly and efficiently, providing fast cooling to specific areas. During the weekend the living room may be the place to be, requiring the most cooling, but on weekdays the kitchen may be the center of activity.
Washing Machine: Optimal Performance In Any Situation
The new technology helps LG’s washing machine learn about the local environment and the user’s everyday activities in order to provide the optimal washing performance. For example, in areas where the water contains excessive calcium carbonate, LG’s smart washing machine adjusts the water temperature and the amount of water used to counter the effects of hard water on clothes. In areas where dust storms are common, the washing machine automatically adds another rinse cycle for even cleaner clothes.
“Deep learning technology is the next phase in the evolution of smart appliances, and as an industry leader, we have the responsibility of being an early mover,” said Song Dae-hyun, president of LG Electronics and Home Appliance & Air Solutions Company. “But even more important than the advanced capabilities of these appliances will be how companies behave when entrusted with data of this nature. At LG, we believe performance and convenience do not mean having to sacrifice security and privacy. They can and should exist simultaneously.”
LG’s advanced deep learning appliances will be on display at CES 2017 from Jan. 5-8 in Booth #11100 in Central Hall of the Las Vegas Convention Center.
Singapore, 30 November 2016 — NVIDIA today announced that Singapore Management University (SMU) is the first organisation in Singapore and Southeast Asia to deploy an NVIDIA DGX-1 deep learning supercomputer.
Deployed at the SMU Living Analytics Research Center (LARC), the supercomputer will further research on applying artificial intelligence (AI) for Singapore’s Smart Nation project. Established in 2011, LARC aims to innovate technologies and software platforms that are relevant to Singapore’s Smart Nation efforts. LARC is supported and funded by the National Research Foundation (NRF).
NVIDIA DGX-1
The NVIDIA DGX-1 is the world’s first deep learning supercomputer to meet the computing demands of AI. It enables researchers and data scientists to easily harness the power of GPU-accelerated computing to create a new class of computers that learn, see and perceive the world as humans do.
Providing throughput equivalent to 250 conventional servers in a single box, the supercomputer delivers the highest levels of computing power to drive next-generation AI applications, allowing researchers to dramatically reduce the time to train larger, more sophisticated deep neural networks.
Built on NVIDIA Tesla P100 GPUs that use the latest Pascal GPU architecture, the DGX-1 supercomputer will enable SMU to conduct a range of AI research projects for Smart Nation. One of the featured projects is a food AI application to achieve smart food consumption and healthy lifestyle, which requires the analysis of a large number of food photos.
“This project involves the processing of large amounts of unstructured and visual data. Food photo recognition is not possible without the DGX-1 solution, which applies cutting-edge deep learning technologies and yields excellent recognition accuracy,” said Professor Steven Hoi, School of Information Systems, SMU.
The first phase of the food AI project is able to recognise 100 of the most popular local dishes in Singapore. The next phase is to expand the current food database to about 1,000 popular food dishes in Singapore. In addition to the recognition of food photos, the team will also analyse food data in supermarkets to help with the recommendation of healthy food options. Once developed, the food AI solution will be made available to developers through an API for them to build smart food consumption solutions.
“SMU has been an NVIDIA GPU Research Center using Tesla GPUs for several years. The NVIDIA DGX-1 will give SMU researchers the performance and deep learning capabilities needed to work on their Smart Nation projects, which will further advance Singapore’s aspirations,” said Raymond Teh, vice president of sales and marketing for Asia Pacific, NVIDIA.
Melbourne Australia, 15 September 2016 — Step into a virtual world and experience how virtual reality (VR) is driving research, applications and games at GPU Technology Conference Extension (GTCx) Australia on October 4 and 5, 2016.
See And Experience Deep Learning And VR Technologies @ GTCx Australia
Set up in conjunction with the conference, the VR Village will feature VR rooms and kiosks where participants can try out the latest demos running on HTC Vive and Oculus. These demos include NVIDIA’s very own carnival game Funhouse, Everest, Iray Lightfield, Iray VR Panoramas, Endeavor Point Cloud, Bullet Train, and Earthlight VR, a first-person space explorer game developed by Melbourne-based Opaque Multimedia.
NVIDIA will also host a Deep Learning CDO (chief data officer) Roundtable with the theme “Next-Generation Analytics, Deep and Machine Learning with GPUs”. This session will discuss how CDOs of data intensive companies can leverage NVIDIA GPUs to undertake machine and deep learning.
Those keen to learn more about deep learning can take part in the NVIDIA Deep Learning Institute. They can discover how advanced deep learning techniques are being applied to rich data sets to help solve big problems. Upon completion of the NVIDIA Deep Learning Institute lab, participants will receive a certificate of attendance and free online training credits.
Twenty shortlisted submissions of research posters from both academia and industry, describing ongoing GPU-enabled research, exciting new research projects, and encouraging preliminary results, will be exhibited electronically. The Best Poster award winner will get a fully paid trip to GTC US next year and NVIDIA hardware.
GTCx Australia is an extension of GTC, the world’s largest and most important event for GPU developers held annually in San Jose, United States. To be held at Melbourne Convention and Exhibition Centre, the event is designed for C-level executives, professionals, developers, and researchers across many industries — from finance and big data analytics to academia and entertainment.
Themed “Deep Learning AI Revolution – The Next Computing Platform”, it will feature keynote addresses and three tracks covering accelerated computing, Pro VR/Pro Viz and deep learning.
Australia, 31 August 2016 — NVIDIA is bringing the Deep Learning AI (Artificial Intelligence) revolution to Melbourne in the form of GPU Technology Conference Extension (GTCx) Australia on October 4 and 5, 2016.
GTCx Australia is an extension of GTC, the world’s largest and most important event for GPU developers held annually in San Jose, United States. To be held at Melbourne Convention and Exhibition Centre, the event is designed for C-level executives, professionals, developers, and researchers across many industries — from finance and big data analytics to academia and entertainment.
Themed Deep Learning AI Revolution – The Next Computing Platform, it will feature keynote addresses, hands-on lab, posters, exhibition, a VR Village and three tracks covering accelerated computing, Pro VR/Pro Viz and deep learning.
Deep Learning AI Revolution – The Next Computing Platform
The opening keynote will be by NVIDIA Fellow Dr David B Kirk, a former NVIDIA Chief Scientist who led the development of graphics architecture and technology. Kirk was honoured by the California Institute of Technology (Caltech) in 2009 with a Distinguished Alumni Award, its highest honour, for his work in the graphics-technology industry. He is the inventor of more than 75 patents and patent applications relating to graphics design and has published many articles on graphics technology and parallel programming.
Another notable speaker is Associate Professor Mark Sagar, Director of the Laboratory for Animate Technologies based at the Auckland Bioengineering Institute. Sagar previously worked as the Special Projects Supervisor at Weta Digital where he was involved in the creation of technology for digital characters in blockbusters such as Avatar, King Kong and Spider-Man 2. His pioneering work in computer-generated faces was recognised with two consecutive Oscars at the 2010 and 2011 Sci-tech awards, a branch of the Academy Awards that recognises movie science and technological achievements.
As a testament to Australia’s leadership in adopting and applying GPU technologies in many areas, the line-up of speakers also includes Professor Anton Van Den Hengel (Director, Australian Centre for Visual Technologies), Professor Tom Drummond (Leader of Computer Vision Lab, Monash University), Dr. Jose Alvarez (Computer Vision Researcher Data61, CSIRO), Dr. Mark Suresh Joshi (Professor, University of Melbourne), Dr. Wojtek James Goscinski (Manager, High Performance Computing, Monash eResearch Center, Monash University), and Dr. Werner Scholz (Chief Technology Officer and Head of R&D, Xenon Technology Group).
In conjunction with GTCx Australia, NVIDIA is also organising the NVIDIA Deep Learning Institute on October 5, where participants can learn directly in labs led by trained instructors. They can discover how advanced deep learning techniques are being applied to rich data sets to help solve big problems. Upon completion of the NVIDIA Deep Learning Institute lab, participants will receive a certificate of attendance and free online training credits.
April 6, 2016 — NVIDIA today unveiled the NVIDIA DGX-1, the world’s first deep learning supercomputer to meet the unlimited computing demands of artificial intelligence.
The NVIDIA DGX-1 is the first system designed specifically for deep learning — it comes fully integrated with hardware, deep learning software and development tools for quick, easy deployment. It is a turnkey system that contains a new generation of GPU accelerators, delivering the equivalent throughput of 250 x86 servers.
The NVIDIA DGX-1 deep learning system enables researchers and data scientists to easily harness the power of GPU-accelerated computing to create a new class of intelligent machines that learn, see and perceive the world as humans do. It delivers unprecedented levels of computing power to drive next-generation AI applications, allowing researchers to dramatically reduce the time to train larger, more sophisticated deep neural networks.
NVIDIA designed the DGX-1 for a new computing model to power the AI revolution that is sweeping across science, enterprises and increasingly all aspects of daily life. Powerful deep neural networks are driving a new kind of software created with massive amounts of data, which require considerably higher levels of computational performance.
“Artificial intelligence is the most far-reaching technological advancement in our lifetime,” said Jen-Hsun Huang, CEO and co-founder of NVIDIA. “It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable.”
Powered by Five Breakthroughs
The NVIDIA DGX-1 deep learning system is built on NVIDIA Tesla P100 GPUs, based on the new NVIDIA Pascal GPU architecture. It provides the throughput of 250 CPU-based servers, networking, cables and racks — all in a single box.
The DGX-1 features four other breakthrough technologies that maximise performance and ease of use. These include the NVIDIA NVLink high-speed interconnect for maximum application scalability; 16nm FinFET fabrication technology for unprecedented energy efficiency; Chip on Wafer on Substrate with HBM2 for big data workloads; and new half-precision instructions to deliver more than 21 teraflops of peak performance for deep learning.
Together, these major technological advancements enable DGX-1 systems equipped with Tesla P100 GPUs to deliver over 12x faster training than four-way NVIDIA Maxwell architecture-based solutions from just one year ago.
The Pascal architecture has strong support from the artificial intelligence ecosystem.
“NVIDIA GPU is accelerating progress in AI. As neural nets become larger and larger, we not only need faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication, as well as hardware that can take advantage of reduced-precision arithmetic. This is precisely what Pascal delivers,” said Yann LeCun, director of AI Research at Facebook.
Andrew Ng, chief scientist at Baidu, said: “AI computers are like space rockets: The bigger the better. Pascal’s throughput and interconnect will make the biggest rocket we’ve seen yet.”
“Microsoft is developing super deep neural networks that are more than 1,000 layers,” said Xuedong Huang, chief speech scientist at Microsoft Research. “NVIDIA Tesla P100’s impressive horsepower will enable Microsoft’s CNTK to accelerate AI breakthroughs.”
Comprehensive Deep Learning Software Suite
The NVIDIA DGX-1 system includes a complete suite of optimised deep learning software that allows researchers and data scientists to quickly and easily train deep neural networks. The DGX-1 software includes the NVIDIA Deep Learning GPU Training System (DIGITS), a complete, interactive system for designing deep neural networks (DNNs).
It also includes the newly released NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPU-accelerated library of primitives for designing DNNs. It also includes optimised versions of several widely used deep learning frameworks — Caffe, Theano and Torch. The DGX-1 additionally provides access to cloud management tools, software updates and a repository for containerised applications.
NVIDIA DGX-1 Specifications
Up to 170 teraflops of half-precision (FP16) peak performance
Eight Tesla P100 GPU accelerators, 16GB memory per GPU
NVLink Hybrid Mesh Cube
7TB SSD DL Cache
Dual 10GbE, Quad InfiniBand 100Gb networking
3U – 3200W
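As a rough sanity check on the headline figure in the list above, the following Python sketch multiplies the GPU count by a per-GPU peak. The 21.2 teraflops per-GPU value is an assumption taken from NVIDIA's published FP16 peak for the NVLink version of the Tesla P100 (the announcement itself says only "more than 21 teraflops"), and perfect scaling across GPUs is assumed, so the result is a theoretical upper bound rather than a measured number.

```python
# Rough consistency check of the DGX-1 figures quoted in the spec list.
GPUS_PER_SYSTEM = 8          # "Eight Tesla P100 GPU accelerators" (from the spec list)
MEMORY_GB_PER_GPU = 16       # "16GB memory per GPU" (from the spec list)
FP16_TFLOPS_PER_GPU = 21.2   # ASSUMPTION: published FP16 peak of the NVLink Tesla P100


def system_peak_tflops(gpus: int, tflops_per_gpu: float) -> float:
    """Aggregate peak throughput, assuming perfect scaling across GPUs."""
    return gpus * tflops_per_gpu


total_tflops = system_peak_tflops(GPUS_PER_SYSTEM, FP16_TFLOPS_PER_GPU)
total_memory_gb = GPUS_PER_SYSTEM * MEMORY_GB_PER_GPU

print(f"Peak FP16: {total_tflops:.1f} TFLOPS")   # rounds to the quoted 170
print(f"GPU memory: {total_memory_gb} GB in total")
```

Under these assumptions the eight GPUs add up to just under 170 teraflops, which is consistent with the "up to 170 teraflops" wording.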
Optional support services for the NVIDIA DGX-1 improve productivity and reduce downtime for production systems. Hardware and software support provides access to NVIDIA deep learning expertise, and includes cloud management services, software upgrades and updates, and priority resolution of critical issues.
NVIDIA DGX-1 Availability
General availability for the NVIDIA DGX-1 deep learning system in the United States is in June, and in other regions beginning in the third quarter direct from NVIDIA and select systems integrators.
While NVIDIA is best known for our hardware platforms, our software plays a key role in advancing the state of the art of GPU-accelerated computing.
This body of work — the NVIDIA SDK — today got a significant update, announced at our annual GPU Technology Conference. It takes advantage of our new Pascal architecture and makes it easier than ever for developers to create great solutions on our platforms.
Our goal is to make more of our software capabilities available to even more developers. Over a million developers have already downloaded our CUDA toolkit, and there are more than 400 GPU-accelerated applications that benefit from our software libraries, in addition to hundreds more game titles.
Here’s a look at the software updates we’re introducing in seven key areas:
1) Deep Learning
What’s new — cuDNN 5, our GPU-accelerated library of primitives for deep neural networks, now includes Pascal GPU support; acceleration of recurrent neural networks, which are used for video and other sequential data; and additional enhancements used in medical, oil & gas and other industries.
Why it matters — Deep learning developers rely on cuDNN’s optimized routines so they can focus on designing and training neural network models, rather than low-level performance tuning. cuDNN accelerates leading deep learning frameworks like Google’s TensorFlow, UC Berkeley’s Caffe, University of Montreal’s Theano and NYU’s Torch. These, in turn, power deep learning solutions used by Amazon, Facebook, Google and others.
2) Accelerated Computing
What’s new — CUDA 8, the latest version of our parallel computing platform, gives developers direct access to powerful new Pascal features such as unified memory and NVLink. Also included in this release is a new graph analytics library — nvGRAPH — which can be used for robotic path planning, cyber security and logistics analysis, expanding the application of GPU acceleration in the realm of big data analytics.
One new feature developers will appreciate is critical path analysis, which automatically identifies latent bottlenecks in code for CPUs and GPUs. And for visualizing volume and surface datasets, NVIDIA IndeX 1.4 is now available as a plug-in for Kitware ParaView, bringing interactive visualization of large volumes with high-quality rendering to ParaView users.
Why it matters — CUDA has been called “the backbone of GPU computing.” We’ve sold millions of CUDA-enabled GPUs to date. As a result, many of the most important scientific applications are based on CUDA, and CUDA has played a role in major discoveries, such as understanding how HIV protects its genetic materials using a protein shell, and unraveling the mysteries of the human genome by discovering 3D loops and other genetic folding patterns.
3) Self-Driving Cars
What’s new — At GTC, we also announced our end-to-end HD mapping solution for self-driving cars (see “How HD Maps Will Show Self-Driving Cars the Way”). We built this state-of-the-art system on our DriveWorks software development kit, part of our deep learning platform for the automotive industry.
Why it matters — Incorporating perception, localization, planning and visualization algorithms, DriveWorks provides libraries, tools and reference applications for automakers, tier 1 suppliers and startups developing autonomous vehicle computing pipelines. DriveWorks now includes an end-to-end HD mapping solution, making it easier and faster to create and update highly detailed maps. Along with NVIDIA DIGITS and NVIDIA DRIVENET, these technologies will make driving safer, more efficient and more enjoyable.
4) Design Visualization
What’s new — At GTC, we’ve brought NVIDIA Iray — our photorealistic rendering solution — to the world of VR with the introduction of new cameras within Iray that let users create VR panoramas and view their creations with unprecedented accuracy in virtual reality (see “NVIDIA Brings Interactive Photorealism to VR with Iray”). We also announced Adobe’s support of NVIDIA’s Materials Definition Language, bringing the possibility of physically based materials to a wide range of creative professionals.
Why it matters — NVIDIA Iray is used in a wide array of industries to give designers the ability to create photorealistic models of their work quickly and to speed their products to market. We’ve licensed it to leading software manufacturers such as Dassault Systèmes and Siemens PLM. Iray is also available from NVIDIA as a plug-in for popular software like Autodesk 3ds Max and Maya.
5) Autonomous Machines
What’s new — We’re bringing deep learning capabilities to devices that will interact with — and learn from — the environment around them. Our cuDNN version 5, noted above, improves deep learning inference performance for common deep neural networks, allowing embedded devices to make decisions faster and work with higher resolution sensors. NVIDIA GPU Inference Engine (GIE) is a high-performance neural network inference solution for application deployment. Developers can use GIE to generate optimized implementations of trained neural network models that deliver the fastest inference performance on NVIDIA GPUs.
Why it matters — Robots, drones, submersibles and other intelligent devices require autonomous capabilities. The Jetpack SDK — which powers the Jetson TX1 Developer Kit — includes libraries and APIs for advanced computer vision and deep learning, enabling developers to build extraordinarily capable autonomous machines that can see, understand and even interact with their environments.
6) Gaming
What’s new — We recently announced three new technologies for NVIDIA GameWorks, our combination of development tools, sample code and advanced libraries for real-time graphics and simulation for games. They include Volumetric Lighting, Voxel-based Ambient Occlusion and Hybrid Frustum Traced Shadows.
Why it matters — Developers are already using these new libraries for AAA game titles like Fallout 4. And GameWorks technology is in many of the major game engines, such as Unreal Engine, Unity and Stingray, which are also increasingly being used for non-gaming applications like architectural walk-throughs, training and even automotive design.
7) Virtual Reality
What’s new — We’re continuing to add features to VRWorks — our suite of APIs, sample code and libraries for VR developers. For example, Multi-Res Shading accelerates performance by up to 50 percent by rendering each part of an image at a resolution that better matches the pixel density of the warped VR image. VRWorks Direct Mode treats VR headsets as head-mounted displays accessible only to VR applications, rather than a normal Windows monitor in desktop mode.
Why it matters — VRWorks helps headset and application developers achieve the highest performance, lowest latency and plug-and-play compatibility. You can see how developers are using what VRWorks has to offer at GTC, where we’re demonstrating these new technologies with partners such as Sólfar Studios (Everest VR), Fusion Studios (Mars 2030), Oculus and HTC.
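To give a feel for why rendering the periphery of a VR image at lower resolution saves work, here is a minimal Python sketch. It assumes a 3x3 layout in which the centre region keeps full resolution and the border regions are scaled down. The function name, the 60 percent centre size and the 0.5 scale are illustrative assumptions, not VRWorks API calls or NVIDIA's settings. Pixel count is only a proxy for rendering cost, so the saving it prints should not be read as a frame-rate gain.

```python
def shaded_pixel_fraction(center_fraction: float, outer_scale: float) -> float:
    """Fraction of the original pixels still shaded under a 3x3 multi-resolution layout.

    center_fraction: share of each axis covered by the full-resolution centre (0..1).
    outer_scale: per-axis resolution scale applied to the border regions (0..1].
    Both parameters are hypothetical knobs for illustration only.
    """
    if not 0.0 <= center_fraction <= 1.0:
        raise ValueError("center_fraction must be between 0 and 1")
    if not 0.0 < outer_scale <= 1.0:
        raise ValueError("outer_scale must be in (0, 1]")

    edge = 1.0 - center_fraction                       # share of each axis outside the centre
    center = center_fraction ** 2                      # centre: full resolution
    sides = 2 * center_fraction * edge * outer_scale   # four edge strips: scaled on one axis
    corners = edge ** 2 * outer_scale ** 2             # four corners: scaled on both axes
    return center + sides + corners


fraction = shaded_pixel_fraction(center_fraction=0.6, outer_scale=0.5)
print(f"Pixels shaded: {fraction:.0%} of the full-resolution image")
```

With these illustrative values about two thirds of the original pixels are shaded, and the centre of the image, where the headset lens keeps the most detail, is untouched.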
For as long as we have been designing computers, AI has been the final frontier. Building intelligent machines that can perceive the world as we do, understand our language, and learn from examples has been the life’s work of computer scientists for over five decades. Yet, it took the combination of Yann LeCun’s Convolutional Neural Net, Geoff Hinton’s back-propagation and Stochastic Gradient Descent approach to training, and Andrew Ng’s large-scale use of GPUs to accelerate Deep Neural Networks (DNNs) to ignite the big bang of modern AI — deep learning.
At the time, NVIDIA was busy advancing GPU-accelerated computing, a new computing model that uses massively parallel graphics processors to accelerate applications also parallel in nature. Scientists and researchers jumped on to GPUs to do molecular-scale simulations to determine the effectiveness of a life-saving drug, to visualize our organs in 3D (reconstructed from light doses of a CT scan), or to do galactic-scale simulations to discover the laws that govern our universe. One researcher, using our GPUs for quantum chromodynamics simulations, said to me: “Because of NVIDIA’s work, I can now do my life’s work, in my lifetime.” This is wonderfully rewarding. It has always been our mission to give people the power to make a better future. NVIDIA GPUs have democratized supercomputing and researchers have now discovered that power.
In 2011, AI researchers discovered NVIDIA GPUs. The Google Brain project had just achieved amazing results — it learned to recognize cats and people by watching movies on YouTube. But it required 2,000 CPUs in servers powered and cooled in one of Google’s giant data centers. Few have computers of this scale. Enter NVIDIA and the GPU. Bryan Catanzaro in NVIDIA Research teamed with Andrew Ng’s team at Stanford to use GPUs for deep learning. As it turned out, 12 NVIDIA GPUs could deliver the deep-learning performance of 2,000 CPUs. Researchers at NYU, the University of Toronto, and the Swiss AI Lab accelerated their DNNs on GPUs. Then, the fireworks started.
Deep Learning Performs Miracles
Alex Krizhevsky of the University of Toronto won the 2012 ImageNet computer image recognition competition. Krizhevsky beat — by a huge margin — handcrafted software written by computer vision experts. Krizhevsky and his team wrote no computer vision code. Rather, using deep learning, their computer learned to recognize images by itself. They designed a neural network called AlexNet and trained it with a million example images that required trillions of math operations on NVIDIA GPUs. Krizhevsky’s AlexNet had beaten the best human-coded software.
The AI race was on. By 2015, another major milestone was reached.
Using deep learning, Google and Microsoft both beat the best human score in the ImageNet challenge. Not a human-written program, but a human. Shortly thereafter, Microsoft and the University of Science and Technology of China announced a DNN that achieved IQ test scores at the college post-graduate level.
Then Baidu announced that a deep learning system called Deep Speech 2 had learned both English and Mandarin with a single algorithm. And all top results of the 2015 ImageNet competition were based on deep learning, running on GPU-accelerated deep neural networks, and many beating human-level accuracy.
In 2012, deep learning had beaten human-coded software. By 2015, deep learning had achieved “superhuman” levels of perception.
A New Computing Platform for a New Software Model
Computer programs contain commands that are largely executed sequentially. Deep learning is a fundamentally new software model where billions of software-neurons and trillions of connections are trained, in parallel.
Running DNN algorithms and learning from examples, the computer is essentially writing its own software. This radically different software model needs a new computer platform to run efficiently. Accelerated computing is an ideal approach and the GPU is the ideal processor.
As Nature recently noted, early progress in deep learning was “made possible by the advent of fast graphics processing units (GPUs) that were convenient to program and allowed researchers to train networks 10 or 20 times faster.”
A combination of factors is essential to create a new computing platform — performance, programming productivity, and open accessibility.
Performance. NVIDIA GPUs are naturally great at parallel workloads and speed up DNNs by 10-20x, reducing each of the many training iterations from weeks to days. We didn’t stop there. By collaborating with AI developers, we continued to improve our GPU designs, system architecture, compilers, and algorithms, and sped up training deep neural networks by 50x in just three years — a much faster pace than Moore’s Law. We expect another 10x boost in the next few years.
Programmability. AI innovation is moving at a breakneck pace. Ease of programming and developer productivity are paramount. The programmability and richness of NVIDIA’s CUDA platform allow researchers to innovate quickly — building new configurations of CNNs, DNNs, deep inception networks, RNNs, LSTMs, and reinforcement learning networks.
Accessibility. Developers want to create anywhere and deploy everywhere. NVIDIA GPUs are available all over the world, from every PC OEM; in desktops, notebooks, servers, or supercomputers; and in the cloud from Amazon, IBM, and Microsoft. All major AI development frameworks are NVIDIA GPU accelerated — from internet companies, to research, to startups. No matter the AI development system preferred, it will be faster with GPU acceleration.
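The comparison with Moore's Law in the Performance point above can be made concrete with a little arithmetic. The sketch below assumes the common reading of Moore's Law as a doubling every two years (an assumption, since the text does not state a period) and works out what a 50x gain in three years implies. The function names are illustrative.

```python
import math


def moores_law_factor(years: float, doubling_period_years: float = 2.0) -> float:
    """Speed-up expected from steady doubling every `doubling_period_years`.

    The two-year default is an ASSUMPTION; periods of 18 to 24 months are often quoted.
    """
    return 2.0 ** (years / doubling_period_years)


def equivalent_doubling_period(speedup: float, years: float) -> float:
    """Doubling period (in years) that would produce `speedup` over `years`."""
    return years / math.log2(speedup)


baseline = moores_law_factor(years=3)                                 # Moore's Law alone
period_months = equivalent_doubling_period(speedup=50, years=3) * 12  # for the quoted 50x

print(f"Moore's Law over 3 years: about {baseline:.1f}x")
print(f"50x in 3 years is a doubling about every {period_months:.1f} months")
```

Under that assumption, Moore's Law alone would give a bit under 3x over three years, while 50x corresponds to a doubling roughly every six months.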
We have also created GPUs for just about every computing form-factor so that DNNs can power intelligent machines of all kinds. GeForce is for PC. Tesla is for cloud and supercomputers. Jetson is for robots and drones. And DRIVE PX is for cars. All share the same architecture and accelerate deep learning.
Every Industry Wants Intelligence
Baidu, Google, Facebook and Microsoft were the first adopters of NVIDIA GPUs for deep learning. This AI technology is how they respond to your spoken word, translate speech or text to another language, recognize and automatically tag images, and recommend newsfeeds, entertainment, and products that are tailored to what each of us likes and cares about.
Startups and established companies are now racing to use AI to create new products and services, or improve their operations. In just two years, the number of companies NVIDIA collaborates with on deep learning has jumped nearly 35x to over 3,400 companies.
Industries such as healthcare, life sciences, energy, financial services, automotive, manufacturing, and entertainment will benefit by inferring insight from mountains of data. And, with Facebook, Google, and Microsoft opening their deep-learning platforms for all to use, AI-powered applications will spread fast. In light of this trend, Wired recently heralded the “rise of the GPU.”
Self-driving cars. Whether to augment humans with a superhuman co-pilot, or revolutionize personal mobility services, or reduce the need for sprawling parking lots within cities, self-driving cars have the potential to do amazing social good. Driving is complicated. Unexpected things happen. Freezing rain turns the road into a skating rink. The road to your destination is closed. A child runs out in front of the car.
You can’t write software that anticipates every possible scenario a self-driving car might encounter. That’s the value of deep learning; it can learn, adapt, and improve. We are building an end-to-end deep learning platform called NVIDIA DRIVE PX for self-driving cars — from the training system to the in-car AI computer. The results are very exciting. A future with superhuman computer co-pilots and driverless shuttles is no longer science fiction.
Robots. FANUC, a leading manufacturing robot maker, recently demonstrated an assembly-line robot that learned to “pick” randomly oriented objects out of a bin. The GPU-powered robot learned by trial and error. This deep-learning technology was developed by Preferred Networks, which was recently featured in a Wall Street Journal article headlined, “Japan Seeks Tech Revival with Artificial Intelligence.”
Healthcare and Life Sciences. Deep Genomics is applying GPU-based deep learning to understand how genetic variations can lead to disease. Arterys uses GPU-powered deep learning to speed analysis of medical images. Its technology will be deployed in GE Healthcare MRI machines to help diagnose heart disease. Enlitic is using deep learning to analyze medical images to identify tumors, nearly invisible fractures, and other medical conditions.
These are just a handful of examples. There are literally thousands.
Accelerating AI with GPUs: A New Computing Model
Deep-learning breakthroughs have sparked the AI revolution. Machines powered by AI deep neural networks solve problems too complex for human coders. They learn from data and improve with use. The same DNN can be trained by even non-programmers to solve new problems. Progress is exponential. Adoption is exponential.
And we believe the impact to society will also be exponential. A recent study by KPMG predicts that computerized driver assistance technologies will help reduce car accidents 80% in 20 years — that’s nearly 1 million lives a year saved. Deep-learning AI will be its cornerstone technology.
The impact to the computer industry will also be exponential. Deep learning is a fundamentally new software model. So we need a new computer platform to run it — an architecture that can efficiently execute programmer-coded commands as well as the massively parallel training of deep neural networks. We are betting that GPU-accelerated computing is the horse to ride. Popular Science recently called the GPU “the workhorse of modern A.I.” We agree.
Audi. BMW. Ford. Mercedes-Benz. Volvo. Some of the world’s biggest automotive names are flocking to DRIVE, our powerful engine for in-vehicle artificial intelligence.
So are a group of fast-moving, smaller innovators that are shaking up the auto industry. Companies such as ZMP, Preferred Networks and AdasWorks are using DRIVE PX to give automobiles astonishing new capabilities.
Unveiled Monday at CES 2016, in Las Vegas, DRIVE PX 2 provides supercomputer-class performance — up to 24 trillion operations per second for artificial intelligence applications — in a case the size of a shoebox.
Here’s a look at just three of the companies working with DRIVE PX:
Bringing Autonomous Driving to Taxis
Tokyo-based ZMP — which is working to help create autonomous taxis, among other projects — is using deep learning technology and NVIDIA DRIVE PX to dramatically improve accuracy of detection and decision-making algorithms for autonomous driving.
“ZMP is achieving remarkable results using deep neural networks on NVIDIA GPUs for pedestrian detection,” said Hisashi Taniguchi, CEO of ZMP. “We will expand our use of deep learning on NVIDIA GPUs to realize our driverless Robot Taxi service.”
In Gear with Toyota
Preferred Networks is one of the best-known machine learning startups in Japan. The Tokyo-based company is working closely with Toyota — which purchased a 3% stake in Preferred Networks just a few weeks ago — to give cars autonomous driving capabilities.
With the NVIDIA deep learning platform, Preferred Networks has greatly improved performance on a variety of applications, such as image recognition for automotive and surveillance cameras, automated control of robotics, and health diagnostics, according to Preferred Networks founder Daisuke Okanohara.
“The remarkable thing is that we did it all with a single NVIDIA GPU-powered deep neural network, in a very short time,” Okanohara said.
Eyes on the Road
We’re also working with AdasWorks, a Budapest-based developer of artificial intelligence-based software for automated driving, to bring the power of our GPUs to Volvo Cars.
Volvo will use the NVIDIA DRIVE PX 2 deep learning-based computing platform to power a fleet of 100 Volvo XC90 SUVs that will hit public roads next year, driven by actual customers as part of the Swedish carmaker’s Drive Me autonomous-car pilot program.
AdasWorks worked with Volvo to help create a system that processes data from multiple sensors in real time to provide 360-degree detection of lanes, vehicles, pedestrians, signs and more, enabling a variety of autopilot functions.
NVIDIA DRIVE is more than just a component automakers can bolt into their cars. It’s an end-to-end solution for deep learning that includes a wide variety of tools and technologies, such as our DIGITS software for neural network training.
To see how it all comes together, visit our booth at CES. We’re in the North Hall, right in the middle of this year’s automotive action.
Jan. 5, 2016 – Volvo Cars will use the NVIDIA DRIVE™ PX 2 deep learning-based computing engine to power a fleet of 100 Volvo XC90 SUVs starting to hit the road next year in the Swedish carmaker’s Drive Me autonomous car pilot programme, NVIDIA announced today.
Autonomous technology is an important contributor to Volvo’s Vision 2020 – its guiding principles for creating safer vehicles. This work has resulted in world-leading advancements in autonomous and semi-autonomous driving, and a new safety benchmark for the automotive industry.
“Our vision is that no one should be killed or seriously injured in a new Volvo by the year 2020,” said Marcus Rothoff, director of the Autonomous Driving Programme at Volvo Cars. “NVIDIA’s high-performance and responsive automotive platform is an important step towards our vision and perfect for our autonomous drive programme and the Drive Me project.”
The Volvo XC90 Drive Me Project
Volvo’s Drive Me autonomous pilot programme will equip the Volvo XC90 luxury cars with the NVIDIA DRIVE PX 2 engine, which uses deep learning to navigate the complexities of driving. The cars will operate autonomously on roads around Gothenburg, the carmaker’s hometown, and semi-autonomously elsewhere.
“Volvo’s Drive Me project is the ideal application of our DRIVE PX 2 engine and deep learning,” said Rob Csongor, vice president and general manager of Automotive at NVIDIA. “We are bringing years of work by thousands of NVIDIA engineers to help Volvo achieve its safety goals and move self-driving cars from Gothenburg to the rest of the globe.”
Recognising Objects Beyond Reach of Human Algorithms
The NVIDIA DRIVE PX 2 engine enables cars to utilise deep learning – a form of artificial intelligence – to recognise objects in their environment, anticipate potential threats and navigate safely. With 8 teraflops of processing power – equivalent to 150 MacBook Pros – it processes data from multiple sensors in real time, providing 360-degree detection of lanes, vehicles, pedestrians, signs and more, to enable a variety of autopilot functions.
Recent deep-learning breakthroughs have greatly enhanced computers’ ability to perceive the outside world. Using vast amounts of data and processing power, they can write software to recognise complex objects at a level beyond the reach of human-coded algorithms.
Much deep learning work is powered by NVIDIA’s supercomputing GPUs. For example, Microsoft and Google have used GPUs to create image-recognition systems that beat a well-trained human in the ImageNet Large Scale Visual Recognition Challenge. And Microsoft researchers recently trained a deep neural net that beat a human in IQ tests.
Map Localisation and Path Planning
For map localisation and path planning, the system can compare real-time situational awareness with a known high-definition map, enabling it to plan a safe route and drive precisely along it, adjusting to ever-changing circumstances.
DRIVE PX 2 will also perform other critical functions such as stitching camera inputs to create a complete surround-view of the car.
Because self-driving cars require massive computing resources to interpret the data from multiple sensors, most early prototypes have contained a trunk full of computers. In contrast, DRIVE PX 2, which carries out the same functions, is the size of a tablet.
Jan. 5, 2016—Accelerating the race to autonomous cars, NVIDIA today launched NVIDIA DRIVE PX 2 – the world’s most powerful engine for in-vehicle artificial intelligence.
NVIDIA DRIVE PX 2 allows the automotive industry to use artificial intelligence to tackle the complexities inherent in autonomous driving. It utilises deep learning on NVIDIA’s most advanced GPUs for 360-degree situational awareness around the car, to determine precisely where the car is and to compute a safe, comfortable trajectory.
“Drivers deal with an infinitely complex world,” said Jen-Hsun Huang, co-founder and CEO, NVIDIA. “Modern artificial intelligence and GPU breakthroughs enable us to finally tackle the daunting challenges of self-driving cars.
“NVIDIA’s GPU is central to advances in deep learning and supercomputing. We are leveraging these to create the brain of future autonomous vehicles that will be continuously alert, and eventually achieve superhuman levels of situational awareness. Autonomous cars will bring increased safety, new convenient mobility services and even beautiful urban designs – providing a powerful force for a better future.”
NVIDIA DRIVE PX 2 Deep Learning
Created to address the needs of NVIDIA’s automotive partners for an open development platform, DRIVE PX 2 provides unprecedented amounts of processing power for deep learning, equivalent to that of 150 MacBook Pros.
Its two next-generation Tegra® processors plus two next-generation discrete GPUs, based on the Pascal™ architecture, deliver up to 24 trillion deep learning operations per second, which are specialised instructions that accelerate the math used in deep learning network inference. That’s over 10 times more computational horsepower than the previous-generation product.
DRIVE PX 2’s deep learning capabilities enable it to quickly learn how to address the challenges of everyday driving, such as unexpected road debris, erratic drivers and construction zones. Deep learning also addresses numerous problem areas where traditional computer vision techniques are insufficient – such as poor weather conditions like rain, snow and fog, and difficult lighting conditions like sunrise, sunset and extreme darkness.
For general purpose floating point operations, DRIVE PX 2’s multi-precision GPU architecture is capable of up to eight trillion operations per second. That’s over four times more than the previous-generation product. This enables partners to address the full breadth of autonomous driving algorithms, including sensor fusion, localisation and path planning. It also provides high precision compute when needed for layers of deep learning networks.
Deep Learning in Self-Driving Cars
Self-driving cars use a broad spectrum of sensors to understand their surroundings. DRIVE PX 2 can process the inputs of 12 video cameras, plus lidar, radar and ultrasonic sensors. It fuses them to accurately detect objects, identify them, determine where the car is relative to the world around it, and then calculate its optimal path for safe travel.
This complex work is facilitated by NVIDIA DriveWorks™, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles. DriveWorks enables sensor calibration, acquisition of surround data, synchronisation, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2’s specialised and general-purpose processors.
Software modules are included for every aspect of the autonomous driving pipeline, from object detection, classification and segmentation to map localisation and path planning.
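To make the sensor fusion step described above more concrete, here is a minimal Python sketch of one common way to combine distance readings from several sensors: inverse-variance weighting, where less noisy sensors get more say in the result. This is only an illustration under stated assumptions. It is not the DriveWorks API, and the `RangeEstimate` type, the `fuse_ranges` function and all sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RangeEstimate:
    """One sensor's estimate of the distance to the same object (hypothetical type)."""
    sensor: str
    distance_m: float
    variance: float  # measurement noise in square metres; smaller means more trusted


def fuse_ranges(estimates):
    """Fuse independent estimates with inverse-variance weighting.

    Returns (fused_distance_m, fused_variance).
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(e.variance <= 0 for e in estimates):
        raise ValueError("variances must be positive")

    # Each sensor is weighted by the inverse of its noise variance.
    weights = [1.0 / e.variance for e in estimates]
    total_weight = sum(weights)
    fused_distance = sum(w * e.distance_m for w, e in zip(weights, estimates)) / total_weight
    fused_variance = 1.0 / total_weight
    return fused_distance, fused_variance


# Made-up readings for a single pedestrian seen by three sensors.
readings = [
    RangeEstimate("camera", 20.0, 4.0),
    RangeEstimate("radar", 21.0, 1.0),
    RangeEstimate("lidar", 20.5, 0.25),
]
distance, variance = fuse_ranges(readings)
print(f"fused distance: {distance:.2f} m, variance: {variance:.3f} m^2")
```

In this sketch the fused variance is always smaller than that of the best single sensor, which is the basic reason for combining cameras, lidar and radar rather than relying on any one of them.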
End-to-End Solution for Deep Learning
NVIDIA delivers an end-to-end solution – consisting of NVIDIA DIGITS™ and DRIVE PX 2 – for both training a deep neural network and deploying the output of that network in a car.
DIGITS is a tool for developing, training and visualising deep neural networks that can run on any NVIDIA GPU-based system – from PCs and supercomputers to Amazon Web Services and the recently announced Facebook Big Sur Open Rack-compatible hardware. The trained neural net model runs on NVIDIA DRIVE PX 2 within the car.
Strong Market Adoption
Since NVIDIA delivered the first-generation DRIVE PX last summer, more than 50 automakers, tier 1 suppliers, developers and research institutions have adopted NVIDIA’s AI platform for autonomous driving development. They are praising its performance, capabilities and ease of development.
“Using NVIDIA’s DIGITS deep learning platform, in less than four hours we achieved over 96 percent accuracy using Ruhr University Bochum’s traffic sign database. While others invested years of development to achieve similar levels of perception with classical computer vision algorithms, we have been able to do it at the speed of light.” — Matthias Rudolph, director of Architecture Driver Assistance Systems at Audi
“BMW is exploring the use of deep learning for a wide range of automotive use cases, from autonomous driving to quality inspection in manufacturing. The ability to rapidly train deep neural networks on vast amounts of data is critical. Using an NVIDIA GPU cluster equipped with NVIDIA DIGITS, we are achieving excellent results.” — Uwe Higgen, head of BMW Group Technology Office USA
“Due to deep learning, we brought the vehicle’s environment perception a significant step closer to human performance and exceed the performance of classic computer vision.” — Ralf G. Herrtwich, director of Vehicle Automation at Daimler
“Deep learning on NVIDIA DIGITS has allowed for a 30X enhancement in training pedestrian detection algorithms, which are being further tested and developed as we move them onto NVIDIA DRIVE PX.” — Dragos Maciuca, technical director of Ford Research and Innovation Center
NVIDIA DRIVE PX 2 Availability
The DRIVE PX 2 development engine will be generally available in the fourth quarter of 2016. Availability to early access development partners will be in the second quarter.
Dec. 11, 2015—NVIDIA today announced that Facebook will power its next-generation computing system with the NVIDIA® Tesla® Accelerated Computing Platform, enabling it to drive a broad range of machine learning applications.
While training complex deep neural networks to conduct machine learning can take days or weeks on even the fastest computers, the Tesla platform can slash this by 10-20x. As a result, developers can innovate more quickly and train networks that are more sophisticated, delivering improved capabilities to consumers.
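As a rough illustration of what a 10-20x reduction means in practice, the short Python sketch below divides a baseline training time by the quoted speed-up factors. The two-week baseline and the function name `accelerated_training_time` are assumptions made for the example, not figures from NVIDIA or Facebook.

```python
def accelerated_training_time(baseline_hours: float, speedup: float) -> float:
    """Return the training time after dividing the baseline by a speed-up factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return baseline_hours / speedup


# Hypothetical baseline: a training run that takes two weeks without acceleration.
baseline_hours = 14 * 24  # 336 hours

for factor in (10, 20):
    hours = accelerated_training_time(baseline_hours, factor)
    print(f"{factor}x speed-up: {hours:.1f} hours")
# prints:
# 10x speed-up: 33.6 hours
# 20x speed-up: 16.8 hours
```

Under that assumption, a run measured in weeks shrinks to roughly a day or less, which is what lets developers iterate on more network designs in the same amount of time.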
Facebook is the first company to adopt NVIDIA Tesla M40 GPU accelerators, introduced last month, to train deep neural networks. They will play a key role in the new “Big Sur” computing platform, Facebook AI Research’s (FAIR) purpose-built system designed specifically for neural network training.
“Deep learning has started a new era in computing,” said Ian Buck, vice president of accelerated computing at NVIDIA. “Enabled by big data and powerful GPUs, deep learning algorithms can solve problems never possible before. Huge industries from web services and retail to healthcare and cars will be revolutionised. We are thrilled that NVIDIA GPUs have been adopted as the engine of deep learning. Our goal is to provide researchers and companies with the most productive platform to advance this exciting work.”
In addition to reducing neural network training time, GPUs offer a number of other advantages. Their architectural compatibility from generation to generation provides seamless speed-ups for future GPU upgrades. And the Tesla platform’s growing global adoption facilitates open collaboration with researchers around the world, fueling new waves of discovery and innovation in the machine learning field.
Big Sur Optimised for Machine Learning
NVIDIA worked with Facebook engineers on the design of Big Sur, optimising it to deliver maximum performance for machine learning workloads, including the training of large neural networks across multiple Tesla GPUs.
Two times faster than Facebook’s existing system, Big Sur will enable the company to train twice as many neural networks – and to create neural networks that are twice as large – which will help develop more accurate models and new classes of advanced applications.
“The key to unlocking the knowledge necessary to develop more intelligent machines lies in the capability of our computing systems,” said Serkan Piantino, engineering director for FAIR. “Most of the major advances in machine learning and AI in the past few years have been contingent on tapping into powerful GPUs and huge data sets to build and train advanced models.”
The addition of Tesla M40 GPUs will help Facebook make new advancements in machine learning research and enable teams across its organisation to use deep neural networks in a variety of products and services.
First Open Sourced AI Computing Architecture
Big Sur represents the first time a computing system specifically designed for machine learning and artificial intelligence (AI) research will be released as an open source solution.
Committed to doing its AI work in the open and sharing its findings with the community, Facebook intends to work with its partners to open source Big Sur specifications via the Open Compute Project. This unique approach will make it easier for AI researchers worldwide to share and improve techniques, enabling future innovation in machine learning by harnessing the power of GPU accelerated computing.