Tag Archives: Deep machine learning

China Still Has Access To High-Speed NVIDIA AI Chips!

Military institutions, AI research institutes and universities in China are still able to source and buy NVIDIA AI chips, albeit in small quantities!

 

AMD + NVIDIA Banned From Selling AI Chips To China!

Both AMD and NVIDIA were ordered by the US government to stop selling high-performance AI chips to both China and Russia on 26 August 2022. This ban was introduced to prevent both countries from using those high-performance AI chips for military purposes.

With immediate effect, the US government banned the export of all AI chips that are equal to, or faster than, the NVIDIA A100 (and H100), or the AMD Instinct MI250 chips. NVIDIA then created slower A800 and H800 AI chips for the Chinese market, but even those were banned in October 2023.

Recommended : AMD, NVIDIA Banned From Selling AI Chips To China!

 

China Still Has Access To High-Speed NVIDIA AI Chips!

Despite the ongoing ban on the sale of high-performance AI chips to China and Russia, it appears that Chinese military-linked research institutes are still able to source and buy NVIDIA AI chips, albeit in small quantities!

According to a Reuters report on 14 January 2024, public tender documents show that dozens of military institutions, AI research institutes and universities in China with links to the military, have purchased and received high-performance NVIDIA AI chips like the A100 and the H100, as well as the slower A800 and H800 AI chips.

  • Harbin Institute of Technology purchased six NVIDIA A100 chips in May 2023, to train a deep-learning model
  • University of Electronic Science and Technology of China purchased one NVIDIA A100 in December 2022, for an unspecified purpose.

Both universities are subject to US export restrictions, although the sale of those AI chips is not illegal in China.

More than 100 tenders were identified, in which Chinese state entities successfully purchased NVIDIA A100 and H100 chips, and dozens of tenders show successful purchases of the slower A800 chips.

  • Tsinghua University purchased two H100 chips in December 2023, as well as about eighty A100 chips since September 2022.
  • A Ministry of Industry and Information Technology laboratory purchased an H100 chip in December 2023.
  • An unnamed People’s Liberation Army (PLA) entity based in Wuxi sought to purchase three A100 chips in October 2023, and one H100 chip in January 2024.
  • Shandong Artificial Intelligence Institute purchased five A100 chips from Shandong Chengxiang Electronic Technology in December 2023
  • Chongqing University purchased an NVIDIA A100 chip in January 2024.

Recommended : Can StopNCII Remove All Nude / Deep Fake Photos?!

To be clear – neither NVIDIA nor its approved retailers were found to have supplied those chips. NVIDIA said that it complies with all applicable export control laws, and requires its customers to do the same:

If we learn that a customer has made an unlawful resale to third parties, we’ll take immediate and appropriate action.

– NVIDIA spokesperson

Even though Chinese state entities appear to be able to purchase high-performance AI chips, the Reuters report also shows the effectiveness of the American AI chip ban.

The training of large artificial intelligence models requires thousands of high-performance AI chips, and China does not seem to be able to procure more than a handful of these critical chips.

That does not mean China is slowing down its AI initiatives. Instead of relying on “gray imports” of AMD or NVIDIA AI chips, Chinese entities are doing their best to switch to local alternatives. In 2023, HUAWEI received orders for some 5,000 of its Ascend 910B chips.

Chinese mega-companies like Baidu, Alibaba, and Tencent also have their own in-house AI chips like the Kunlunxin Gen 2, Hanguang 800, and Zixiao.

 

Please Support My Work!

Support my work through a bank transfer /  PayPal / credit card!

Name : Adrian Wong
Bank Transfer : CIMB 7064555917 (Swift Code : CIBBMYKL)
Credit Card / Paypal : https://paypal.me/techarp

Dr. Adrian Wong has been writing about tech and science since 1997, even publishing a book with Prentice Hall called Breaking Through The BIOS Barrier (ISBN 978-0131455368) while in medical school.

He continues to devote countless hours every day writing about tech, medicine and science, in his pursuit of facts in a post-truth world.


 

Recommended Reading

Go Back To > Business | Computer | Tech ARP

 

Support Tech ARP!

Please support us by visiting our sponsors, participating in the Tech ARP Forums, or donating to our fund. Thank you!

How NVIDIA A800 Bypasses US Chip Ban On China!

Find out how NVIDIA created the new A800 GPU to bypass the US ban on sale of advanced chips to China!

 

NVIDIA Offers A800 GPU To Bypass US Ban On China!

Two months after it was banned by the US government from selling high-performance AI chips to China, NVIDIA introduced a new A800 GPU designed to bypass those restrictions.

The new NVIDIA A800 is based on the same Ampere microarchitecture as the A100, which was used as the performance baseline by the US government.

Despite its numerically larger model number (the lucky number 8 was probably picked to appeal to the Chinese), this is a detuned part, with slightly reduced performance to meet export control limitations.

The NVIDIA A800 GPU, which went into production in Q3, is another alternative product to the NVIDIA A100 GPU for customers in China.

The A800 meets the U.S. government’s clear test for reduced export control and cannot be programmed to exceed it.

NVIDIA is probably hoping that the slightly slower NVIDIA A800 GPU will allow it to continue supplying China with A100-level chips that are used to power supercomputers and high-performance datacenters for artificial intelligence applications.

As I will show you in the next section, except in very high-end applications, there won’t be a truly significant performance difference between the A800 and the A100. So NVIDIA customers who want or need the A100 will have no issue opting for the A800 instead.

However, this can only be a stopgap fix, as NVIDIA is stuck selling A100-level chips to China until and unless the US government changes its mind.

Read more : AMD, NVIDIA Banned From Selling AI Chips To China!

 

How Fast Is The NVIDIA A800 GPU?

The US government considers the NVIDIA A100 as the performance baseline for its export control restrictions on China.

Any chip equal to, or faster than, that Ampere-based chip, which was launched on May 14, 2020, is forbidden from being sold or exported to China. But as they say, the devil is in the details.

The US government didn’t specify just how much slower chips must be to qualify for export to China. So NVIDIA could technically get away with slightly detuning the A100, while offering almost the same performance level.

And that was what NVIDIA did with the A800 – it is basically the A100 with a 33% slower NVLink interconnect speed. NVIDIA also limited the maximum number of GPUs supported in a single server to 8.

That only slightly reduces the performance of A800 servers, compared to A100 servers, while offering the same GPU compute performance. Most users will not notice the difference.

The only significant impediment is on the very high-end – Chinese companies are now restricted to a maximum of eight GPUs per server, instead of up to sixteen.

To show you what I mean, I dug into the A800 specifications, and compared them to the A100 below:

NVIDIA A100 vs A800 : 80GB PCIe Version

Specifications | A100 80GB PCIe | A800 80GB PCIe
FP64 | 9.7 TFLOPS | 9.7 TFLOPS
FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS
FP32 | 19.5 TFLOPS | 19.5 TFLOPS
Tensor Float 32 | 156 TFLOPS | 156 TFLOPS
BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS
FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS
INT8 Tensor Core | 624 TOPS | 624 TOPS
GPU Memory | 80 GB HBM2 | 80 GB HBM2
GPU Memory Bandwidth | 1,935 GB/s | 1,935 GB/s
TDP | 300 W | 300 W
Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB
Interconnect | NVLink : 600 GB/s + PCIe Gen4 : 64 GB/s | NVLink : 400 GB/s + PCIe Gen4 : 64 GB/s
Server Options | 1-8 GPUs | 1-8 GPUs

NVIDIA A100 vs A800 : 80GB SXM Version

Specifications | A100 80GB SXM | A800 80GB SXM
FP64 | 9.7 TFLOPS | 9.7 TFLOPS
FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS
FP32 | 19.5 TFLOPS | 19.5 TFLOPS
Tensor Float 32 | 156 TFLOPS | 156 TFLOPS
BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS
FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS
INT8 Tensor Core | 624 TOPS | 624 TOPS
GPU Memory | 80 GB HBM2 | 80 GB HBM2
GPU Memory Bandwidth | 2,039 GB/s | 2,039 GB/s
TDP | 400 W | 400 W
Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB
Interconnect | NVLink : 600 GB/s + PCIe Gen4 : 64 GB/s | NVLink : 400 GB/s + PCIe Gen4 : 64 GB/s
Server Options | 4 / 8 / 16 GPUs | 4 / 8 GPUs

NVIDIA A100 vs A800 : 40GB PCIe Version

Specifications | A100 40GB PCIe | A800 40GB PCIe
FP64 | 9.7 TFLOPS | 9.7 TFLOPS
FP64 Tensor Core | 19.5 TFLOPS | 19.5 TFLOPS
FP32 | 19.5 TFLOPS | 19.5 TFLOPS
Tensor Float 32 | 156 TFLOPS | 156 TFLOPS
BFLOAT16 Tensor Core | 312 TFLOPS | 312 TFLOPS
FP16 Tensor Core | 312 TFLOPS | 312 TFLOPS
INT8 Tensor Core | 624 TOPS | 624 TOPS
GPU Memory | 40 GB HBM2 | 40 GB HBM2
GPU Memory Bandwidth | 1,555 GB/s | 1,555 GB/s
TDP | 250 W | 250 W
Multi-Instance GPU | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB
Interconnect | NVLink : 600 GB/s + PCIe Gen4 : 64 GB/s | NVLink : 400 GB/s + PCIe Gen4 : 64 GB/s
Server Options | 1-8 GPUs | 1-8 GPUs
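
The scale of the detuning is easy to quantify. Here is a minimal Python sketch using the figures from the tables above (the 8-versus-16 GPU server limit comes from the earlier paragraphs, not the tables):

```python
# A quick check of the A100 vs A800 differences, using the figures quoted above.
a100 = {"nvlink_gb_s": 600, "fp16_tensor_tflops": 312, "max_gpus_per_server": 16}
a800 = {"nvlink_gb_s": 400, "fp16_tensor_tflops": 312, "max_gpus_per_server": 8}

nvlink_cut = 1 - a800["nvlink_gb_s"] / a100["nvlink_gb_s"]
print(f"NVLink interconnect reduction : {nvlink_cut:.0%}")                       # ~33%
print(f"Per-GPU compute (FP16 Tensor) : {a800['fp16_tensor_tflops']} TFLOPS (unchanged)")
print(f"Max GPUs per server           : {a100['max_gpus_per_server']} -> {a800['max_gpus_per_server']}")
```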

 


 

Recommended Reading

Go Back To > Business | Computer | Tech ARP

 


The Human-Machine Partnership by Erik Brynjolfsson + Rana el Kaliouby

At Dell Technologies World 2019, we were lucky enough to snag a seat at the talk on human-machine partnership by MIT Professor Erik Brynjolfsson, and MIT alumna and Affectiva CEO Rana el Kaliouby.

We managed to record the incredibly insightful session for everyone who could not make it for this exclusive guru session. This is a video you must not miss!

 

The DTW 2019 Guru Sessions

One of the best reasons to attend Dell Technologies World 2019 is the guru sessions. If you are lucky enough to reserve a seat, you will have the opportunity to listen to some of the world’s most brilliant thinkers and doers.

 

The Human-Machine Partnership

The talk on human-machine partnership by Professor Brynjolfsson and Ms. el Kaliouby was the first of several guru sessions at Dell Technologies World 2019.

Entitled “How Emerging Technologies & Human Machine Partnerships Will Transform the Economy“, it focused on how technology changed human society, and what the burgeoning efforts in artificial intelligence will mean for humanity.

Here are the key points from their guru session on the human-machine partnership :

Erik Brynjolfsson (00:05 to 22:05) on the Human-Machine Partnership

  • You cannot replace old technologies with new technologies, without rethinking the organisation or institution.
  • We are now undergoing a triple revolution
    – a rebalancing of mind and machine through Big Data and Artificial Intelligence
    – a shift from products to (digital) platforms
    – a shift from the core to crowd-based decision making
  • Shifting to data-driven decision-making based on Big Data results in higher productivity and greater profitability.
  • Since 2015, computers have been able to recognise objects better than humans, thanks to rapid advances in machine learning.
  • Even machine-based speech recognition has been as accurate as humans since 2017.
  • While new AI capabilities are opening up new possibilities in many fields, they are also drastically reducing or eliminating the need for humans.
  • Unlike platforms of the past, the new digital networks leverage “two-sided networks“. In many cases, one network is used to subsidise the other network, or make it free-to-use.
  • Shifting to crowd-based decision-making introduces diversity in the ways of thinking, gaining new perspectives and breakthroughs in problem-solving.
  • Digital innovations have greatly expanded the economy, but it doesn’t mean that everyone will benefit. In fact, there has been a great decoupling between the productivity and median income of the American worker in the past few decades.

Rana el Kaliouby (22:08 to 45:05) on the Human-Machine Partnership

  • Human communication is mostly conveyed indirectly – 93% is non-verbal. Half of that is facial expressions and gestures; the other half is vocal intonation.
  • Affectiva has the world’s largest emotion repository, with 5 billion frames of 8 million faces from 87 countries.
  • Facial expressions are largely universal, but there is a need for diversity in their data to avoid bias in their models. For example, there are gender differences that vary by culture.
  • They use computer vision, machine learning and deep learning to create an Emotional AI model that learns from all those facial expressions to accurately determine a person’s emotions. A minimal sketch of this kind of classifier follows this list.
  • Emotional artificial intelligence has many real-world or potential uses
    – detecting dangerous driving, allowing for proactive measures to be taken
    – personalising the ride in a future robot-taxi or autonomous car
    – the creation of more engaging and effective social robots in retail and hospitality industries
    – help autistic children understand how facial expressions correspond to emotions, and learn social cues.
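
Affectiva’s production models are proprietary, so the following is only a minimal sketch of the kind of deep-learning facial-expression classifier described above. The emotion labels and the 48 x 48 grayscale input size are assumptions made purely for illustration, not Affectiva’s.

```python
# A minimal facial-expression classifier sketch (NOT Affectiva's actual model).
import tensorflow as tf

EMOTIONS = ["anger", "disgust", "fear", "joy", "sadness", "surprise", "neutral"]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(48, 48, 1)),                            # cropped grayscale face image
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(len(EMOTIONS), activation="softmax"),   # one probability per emotion
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(face_images, emotion_labels, ...) would then be trained on a large, diverse
# dataset - which is exactly why Affectiva's 5-billion-frame repository matters.
```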

 

Erik Brynjolfsson + Rana el Kaliouby

Professor Erik Brynjolfsson holds many hats. He is currently :

  • Professor at the MIT Sloan School of Management,
  • Director of the MIT Initiative on the Digital Economy,
  • Director of the MIT Center for Digital Business, and
  • Research Associate at the National Bureau of Economic Research

Rana el Kaliouby was formerly a computer scientist at MIT, helping to form their Autism & Communication Technology Initiative. She currently serves as CEO of Affectiva, a spin-off from MIT’s Media Lab that focuses on emotion recognition technology.

 

Recommended Reading

Go Back To > Enterprise + Business | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


Microsoft Build 2019 : New Azure Technologies Unveiled!

A host of new Microsoft Azure technologies for developers have been announced at the Microsoft Build 2019 conference, which took place in Seattle. Here is a primer on what they announced!

 

Microsoft Build 2019 : New Azure Technologies Unveiled!

With nearly 6,000 developers and content creators attending Microsoft Build 2019 in Seattle, Microsoft announced a series of new Azure services, spanning hybrid cloud and edge computing, to support them. They include advanced technologies such as:

  • Artificial Intelligence (AI)
  • Mixed reality
  • IoT (Internet of Things)
  • Blockchain

 

Microsoft Build 2019 : New Azure AI Technologies

First of all, they unveiled a new set of Microsoft Azure AI technologies to help developers and data scientists utilize AI as a solution :

  • Azure Cognitive Services, which will enable applications to see, hear, respond, translate, reason and more.
  • Microsoft will add the “Decision” function to Cognitive Services to help users make decisions through highly specific and customized recommendations.
  • Azure Search will also be further enhanced with an AI feature.

 

Microsoft Build 2019 : New Microsoft Azure Machine Learning Innovations

Microsoft Azure Machine Learning has been enhanced with new machine learning innovations designed to simplify the building, training and deployment of machine learning models. They include :

  • MLOps capabilities with Azure DevOps
  • Automated ML advancements
  • Visual machine learning interface

Microsoft Build 2019 : New Edge Computing Solutions

Microsoft also aims to boost edge computing by introducing these new solutions:

  • Azure SQL Database Edge
  • IoT Plug and Play
  • HoloLens 2 Developer Bundle
  • Unreal Engine 4

Microsoft Build 2019 : Azure Blockchain Service

The Azure Blockchain Workbench, which Microsoft released last year to support development of blockchain applications, has been further enhanced this year with the Azure Blockchain Service.

Azure Blockchain Service is a tool that simplifies the formation and management of consortium blockchain networks so companies only need to focus on app development.

J.P. Morgan’s Ethereum platform was introduced by Microsoft as the first ledger available in the Azure Blockchain Service.

 

Recommended Reading

Go Back To > Business + Enterprise | Home



Samsung ConZNet Algorithm Tops Two AI Challenges!

Samsung Research, the advanced Research & Development (R&D) hub of Samsung Electronics, has dedicated substantial effort to creating ground-breaking AI technologies, and it has succeeded with the ConZNet algorithm. Here’s the low-down!

 

Samsung ConZNet Algorithm Tops Two AI Challenges

The Samsung Research R&D team used their ConZNet algorithm to rank first in Microsoft’s MAchine Reading COmprehension (MS MARCO) competition, and won “Best Performance” in TriviaQA, which is hosted by the University of Washington.

MS MARCO and TriviaQA are among the most actively researched and used machine reading comprehension competitions in the world. In these competitions, AI algorithms are tested on their ability to process natural-language questions and answers, drawing on written text from various types of documents, such as news articles and blog posts.

Competitions such as MS MARCO and TriviaQA allow contestants to participate at any time, and rankings are altered according to real-time test results.

What Is The ConZNet Algorithm?

Samsung Research’s ConZNet algorithm advances machine intelligence by giving feedback on outcomes during the learning process, similar to a carrot-and-stick (reinforcement) strategy. ConZNet also takes into account how people naturally phrase queries and answers online, which was the key factor in determining the winners of these competitions.
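
ConZNet’s internals have not been published, so the snippet below is only a generic toy illustration of that reinforcement-style feedback: a candidate answer that matches the reference earns a reward that nudges the scoring weights towards it, while a wrong answer earns a small penalty.

```python
# Toy "carrot-and-stick" feedback loop (illustrative only; not ConZNet's actual method).
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=4)                      # toy parameters of an answer scorer
candidates = {"Seattle": rng.normal(size=4),      # toy feature vectors for candidate answers
              "Tacoma": rng.normal(size=4)}
reference = "Seattle"

for step in range(100):
    picked = max(candidates, key=lambda a: candidates[a] @ weights)  # best-scoring answer
    reward = 1.0 if picked == reference else -0.1                    # carrot or stick
    weights += 0.1 * reward * candidates[picked]                     # reinforce or discourage it

print("final pick:", max(candidates, key=lambda a: candidates[a] @ weights))
```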

What Are The Potential Uses Of ConZNet?


With this success, there is high potential for introducing Samsung Research’s AI algorithm to other Samsung Electronics divisions, such as Home Appliances and Smartphones.

Apart from that, departments dealing with customer services are also showing high interest in the AI, especially since AI-based customer services like chatbots have emerged as hot topics in recent times.

Samsung AI Centers

Samsung also revealed that they have begun launching global AI Centers, to collaborate with leading AI experts. Eventually, they hope the AI technologies developed by Samsung Research will be adopted and integrated into Samsung Electronics products and services.

Go Back To > Enterprise + Business | Home

 


Sophos Intercept X with Predictive Protection Explained!

Sophos today announced the availability of Intercept X with malware detection powered by advanced deep learning neural networks. Join us for a briefing by Sumit Bansal, Sophos Managing Director for ASEAN and Korea!

 

Sophos Intercept X with Predictive Protection

Combined with new active-hacker mitigation, advanced application lockdown, and enhanced ransomware protection, this latest release of the Sophos Intercept X endpoint protection delivers previously unseen levels of detection and prevention.

Deep learning is the latest evolution of machine learning. It delivers a massively scalable detection model that is able to learn the entire observable threat landscape. With the ability to process hundreds of millions of samples, deep learning can make more accurate predictions at a faster rate with far fewer false-positives when compared to traditional machine learning.
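
Sophos has not published the model itself, so the following is only a minimal sketch of the general approach described above: a neural network trained on static features extracted from files, so that a file can be scored before it executes, without signatures. The feature set and labels below are random placeholders.

```python
# Minimal sketch of pre-execution malware scoring with a neural network (not Sophos' model).
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(42)
X = rng.random((1000, 16))                  # e.g. header fields, byte histograms, string stats
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)   # placeholder labels: 1 = malicious, 0 = benign

clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
clf.fit(X, y)

new_file_features = rng.random((1, 16))     # features extracted from a file before it runs
print("malicious probability:", clf.predict_proba(new_file_features)[0, 1])
```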

This new version of Sophos Intercept X also includes innovations in anti-ransomware and exploit prevention, and active-hacker mitigations such as credential theft protection. As anti-malware has improved, attacks have increasingly focused on stealing credentials in order to move around systems and networks as a legitimate user, and Intercept X detects and prevents this behavior.

Deployed through the cloud-based management platform Sophos Central, Intercept X can be installed alongside existing endpoint security software from any vendor, immediately boosting endpoint protection. When used with the Sophos XG Firewall, Intercept X can introduce synchronized security capabilities to further enhance protection.

 

New Sophos Intercept X Features

Deep Learning Malware Detection

  • Deep learning model detects known and unknown malware and potentially unwanted applications (PUAs) before they execute, without relying on signatures
  • The model is less than 20 MB and requires infrequent updates

Active Adversary Mitigations

  • Credential theft protection – Preventing theft of authentication passwords and hash information from memory, registry, and persistent storage, as leveraged by such attacks as Mimikatz
  • Code cave utilization – Detects the presence of code deployed into another application, often used for persistence and antivirus avoidance
  • APC protection – Detects abuse of Asynchronous Procedure Calls (APC), often used as part of the AtomBombing code injection technique and more recently used as the method of spreading the WannaCry worm and NotPetya wiper via EternalBlue and DoublePulsar (adversaries abuse these calls to get another process to execute malicious code)

New and Enhanced Exploit Prevention Techniques

  • Malicious process migration – Detects remote reflective DLL injection used by adversaries to move between processes running on the system
  • Process privilege escalation – Prevents a low-privilege process from being escalated to a higher privilege, a tactic used to gain elevated system access

Enhanced Application Lockdown

  • Browser behavior lockdown – Intercept X prevents the malicious use of PowerShell from browsers as a basic behavior lockdown
  • HTA application lockdown – HTML applications loaded by the browser will have the lockdown mitigations applied as if they were a browser

Go Back To > Events | Home

 


NVIDIA TITAN V – The First Desktop Volta Graphics Card!

NVIDIA CEO Jensen Huang (recently anointed Fortune’s 2017 Businessperson of the Year) made a surprise reveal at the NIPS conference – the NVIDIA TITAN V. This is the first desktop graphics card to be built on the latest NVIDIA Volta microarchitecture, and the first to use HBM2 memory.

In this article, we will share with you everything we know about the NVIDIA TITAN V, and how it compares against its TITANic predecessors. We will also share with you what we think could be a future NVIDIA TITAN Vp graphics card!

Updated @ 2017-12-10 : Added a section on gaming with the NVIDIA TITAN V [1].

Originally posted @ 2017-12-09

 

NVIDIA Volta

NVIDIA Volta isn’t exactly new. Back in GTC 2017, NVIDIA revealed NVIDIA Volta, the NVIDIA GV100 GPU and the first NVIDIA Volta-powered product – the NVIDIA Tesla V100. Jensen even highlighted the Tesla V100 in his Computex 2017 keynote, more than 6 months ago!

Yet there has been no desktop GPU built around NVIDIA Volta. NVIDIA continued to churn out new graphics cards built around the Pascal architecture – GeForce GTX 1080 Ti and GeForce GTX 1070 Ti. That changed with the NVIDIA TITAN V.

 

NVIDIA GV100

The NVIDIA GV100 is the first NVIDIA Volta-based GPU, and the largest they have ever built. Even using the latest 12 nm FFN (FinFET NVIDIA) process, it is still a massive chip at 815 mm²! Compare that to the GP100 (610 mm² @ 16 nm FinFET) and GK110 (552 mm² @ 28 nm).

That’s because the GV100 is built using a whopping 21.1 billion transistors. In addition to 5376 CUDA cores and 336 Texture Units, it boasts 672 Tensor cores and 6 MB of L2 cache. All those transistors require a whole lot more power – to the tune of 300 W.


 

The NVIDIA TITAN V

That’s V for Volta… not the Roman numeral V or V for Vendetta. Powered by the NVIDIA GV100 GPU, the TITAN V has 5120 CUDA cores, 320 Texture Units, 640 Tensor cores, and a 4.5 MB L2 cache. It is paired with 12 GB of HBM2 memory (3 x 4GB stacks) running at 850 MHz.
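
Those memory figures are internally consistent. Here is a quick arithmetic check (the double-data-rate assumption for HBM2 is ours, not something NVIDIA states in the announcement):

```python
# Quick check of the TITAN V memory bandwidth from the figures above:
# 12 GB of HBM2 on a 3072-bit bus, clocked at 850 MHz.
memory_clock_hz = 850e6
bus_width_bits = 3072
data_rate = 2                                   # assume HBM2 transfers on both clock edges
bandwidth_gb_s = memory_clock_hz * data_rate * bus_width_bits / 8 / 1e9
print(f"Memory bandwidth: {bandwidth_gb_s:.1f} GB/s")   # 652.8 GB/s, matching the spec table
```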

The blowout picture of the NVIDIA TITAN V reveals even more details :

  • It has 3 DisplayPorts and one HDMI port.
  • It has 6-pin + 8-pin PCIe power inputs.
  • It has 16 power phases, and what appears to be the Founders Edition copper heatsink and vapour chamber cooler, with a gold-coloured shroud.
  • There is no SLI connector, only what appears to be an NVLink connector.

Here are more pictures of the NVIDIA TITAN V, courtesy of NVIDIA.

 

Can You Game On The NVIDIA TITAN V? New!

Right after Jensen announced the TITAN V, the inevitable question was raised on the Internet – can it run Crysis / PUBG?

The NVIDIA TITAN V is the most powerful GPU for the desktop PC, but that does not mean you can actually use it to play games. NVIDIA notably did not mention anything about gaming, only that the TITAN V is “ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing”.


In fact, the TITAN V is not listed in their GeForce Gaming section. The most powerful graphics card in the GeForce Gaming section remains the TITAN Xp.

Then again, the TITAN V uses the same NVIDIA Game Ready Driver as GeForce gaming cards, starting with version 388.59. Even so, it is possible that some or many games may not run well or properly on the TITAN V.

Of course, all this is speculative in nature. All that remains to crack this mystery is for someone to buy the TITAN V and use it to play some games!

Next Page > Specification Comparison, NVIDIA TITAN Vp?, The Official Press Release

 


The NVIDIA TITAN V Specification Comparison

Let’s take a look at the known specifications of the NVIDIA TITAN V, compared to the TITAN Xp (launched earlier this year), and the TITAN X (launched late last year). We also inserted the specifications of a hypothetical NVIDIA TITAN Vp, based on a full GV100.

Specifications | Future TITAN Vp? | NVIDIA TITAN V | NVIDIA TITAN Xp | NVIDIA TITAN X
Microarchitecture | NVIDIA Volta | NVIDIA Volta | NVIDIA Pascal | NVIDIA Pascal
GPU | GV100 | GV100 | GP102-400 | GP102-400
Process Technology | 12 nm FinFET+ | 12 nm FinFET+ | 16 nm FinFET | 16 nm FinFET
Die Size | 815 mm² | 815 mm² | 471 mm² | 471 mm²
Tensor Cores | 672 | 640 | None | None
CUDA Cores | 5376 | 5120 | 3840 | 3584
Texture Units | 336 | 320 | 240 | 224
ROPs | NA | NA | 96 | 96
L2 Cache Size | 6 MB | 4.5 MB | 3 MB | 4 MB
GPU Core Clock | NA | 1200 MHz | 1405 MHz | 1417 MHz
GPU Boost Clock | NA | 1455 MHz | 1582 MHz | 1531 MHz
Texture Fillrate | NA | 384.0 to 465.6 GT/s | 355.2 to 379.7 GT/s | 317.4 to 342.9 GT/s
Pixel Fillrate | NA | NA | 142.1 to 151.9 GP/s | 136.0 to 147.0 GP/s
Memory Type | HBM2 | HBM2 | GDDR5X | GDDR5X
Memory Size | NA | 12 GB | 12 GB | 12 GB
Memory Bus | 3072-bit | 3072-bit | 384-bit | 384-bit
Memory Clock | NA | 850 MHz | 1426 MHz | 1250 MHz
Memory Bandwidth | NA | 652.8 GB/s | 547.7 GB/s | 480.0 GB/s
TDP | 300 watts | 250 watts | 250 watts | 250 watts
Multi GPU Capability | NVLink | NVLink | SLI | SLI
Launch Price | NA | US$ 2999 | US$ 1200 | US$ 1200

 

The NVIDIA TITAN Vp?

In case you are wondering, the TITAN Vp does not exist. It is merely a hypothetical future model that we think NVIDIA may introduce mid-cycle, like the NVIDIA TITAN Xp.

Our TITAN Vp is based on the full capabilities of the NVIDIA GV100 GPU. That means it will have 5376 CUDA cores with 336 Texture Units, 672 Tensor cores and 6 MB of L2 cache. It will also have a higher TDP of 300 watts.


 

The Official NVIDIA TITAN V Press Release

December 9, 2017—NVIDIA today introduced TITAN V, the world’s most powerful GPU for the PC, driven by the world’s most advanced GPU architecture, NVIDIA Volta.

Announced by NVIDIA founder and CEO Jensen Huang at the annual NIPS conference, TITAN V excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, and extreme energy efficiency.

“Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” said Huang. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. I can’t wait to see their breakthrough discoveries.”

NVIDIA Supercomputing GPU Architecture, Now for the PC

TITAN V’s Volta architecture features a major redesign of the streaming multiprocessor that is at the center of the GPU. It doubles the energy efficiency of the previous generation Pascal design, enabling dramatic boosts in performance in the same power envelope.

New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads with a mix of computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improve performance while also simplifying programming.
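
To see why FP16 multiplication with FP32 accumulation matters, here is a small numpy sketch; it only simulates the numerics on a CPU and does not use real Tensor Cores:

```python
# Simulating the mixed-precision idea behind Tensor Cores: FP16 inputs, FP32 accumulation.
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((256, 256)).astype(np.float16)
b = rng.random((256, 256)).astype(np.float16)

def matmul_fp16_accumulate(a, b):
    # deliberately accumulate in float16, one rank-1 update at a time
    acc = np.zeros((a.shape[0], b.shape[1]), dtype=np.float16)
    for k in range(a.shape[1]):
        acc += np.outer(a[:, k], b[k, :])
    return acc

exact = a.astype(np.float64) @ b.astype(np.float64)          # reference result
fp16_accum = matmul_fp16_accumulate(a, b)                    # FP16 inputs, FP16 accumulation
mixed = a.astype(np.float32) @ b.astype(np.float32)          # FP16 inputs, FP32 accumulation

print("max error, FP16 accumulation:", float(np.abs(fp16_accum - exact).max()))
print("max error, FP32 accumulation:", float(np.abs(mixed - exact).max()))
```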

Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customised for NVIDIA, TITAN V also incorporates Volta’s highly tuned 12GB HBM2 memory subsystem for advanced memory bandwidth utilisation.

 

Free AI Software on NVIDIA GPU Cloud


TITAN V’s incredible power is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.

Users of TITAN V can gain immediate access to the latest GPU-optimised AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. This container registry includes NVIDIA-optimised deep learning frameworks, third-party managed HPC applications, NVIDIA HPC visualisation tools and the NVIDIA TensorRT inferencing optimiser.

More Details : Now Everyone Can Use NVIDIA GPU Cloud!

 

Immediate Availability

TITAN V is available to purchase today for US$2,999 from the NVIDIA store in participating countries.

Go Back To > First Page | Articles | Home

 


Volkswagen & NVIDIA In Deep Learning Partnership

For future-oriented digital topics, the Volkswagen Group remains committed to artificial intelligence (AI). This is why Volkswagen IT is cooperating with US technology company NVIDIA with a view to expanding its competence in the field of deep learning. At the Volkswagen Data Lab, IT experts are developing advanced AI systems with deep learning.

 

Volkswagen & NVIDIA In Deep Learning Partnership

At Volkswagen, the Data Lab has been named the Group’s center of excellence for AI and data analysis. Specialists are exploring possibilities to use deep learning in corporate processes and in the field of mobility services. For example, they are developing new procedures for optimizing traffic flow in cities. Advanced AI systems are also among the prerequisites for developments such as intelligent human-robot cooperation.

Dr. Martin Hofmann, CIO of the Volkswagen Group, says: “Artificial intelligence is the key to the digital future of the Volkswagen Group. We want to develop and deploy high-performance AI systems ourselves. This is why we are expanding the expert knowledge required. Cooperation with NVIDIA will be a major step in this direction.”


“AI is the most powerful technological force of our era,” says Jensen Huang, CEO of NVIDIA. “Thanks to AI, data centers are changing dramatically and enterprise computing is being reinvented. NVIDIA’s deep learning solutions will enable Volkswagen to turn the enormous amounts of information in its data centers into valuable insight, and transform its business.”

In addition, Volkswagen has established a startup support program at its Data Lab. The program will provide technical and financial support for international startups developing machine learning and deep learning applications for the automotive industry. Together with NVIDIA, Volkswagen will be admitting five startups to the support program from this fall.

Both partners will also be launching a “Summer of Code” camp where high-performing students with qualifications in IT, mathematics or physics will have an opportunity to develop deep learning methods in teams and to implement them in a robotics environment.

Go Back To > Automotive | Home

 


The AWS Masterclass on Artificial Intelligence by Olivier Klein

Just before we flew to Computex 2017, we attended the AWS Masterclass on Artificial Intelligence. It offered us an in-depth look at AI concepts like machine learning, deep learning and neural networks. We also saw how Amazon Web Services (AWS) uses all that to create easy-to-use tools for developers to create their own AI applications at low cost and virtually no capital outlay.

 

The AWS Masterclass on Artificial Intelligence

AWS Malaysia flew in Olivier Klein, the AWS Asia Pacific Solutions Architect, to conduct the AWS Masterclass. During the two-hour session, he conveyed the ease by which the various AWS services and tools allow virtually anyone to create their own AI applications at lower cost and virtually no capital outlay.

The topic of artificial intelligence is rather wide-ranging, covering everything from basic AI concepts all the way to demonstrations of how to use AWS services like Amazon Polly and Amazon Rekognition to easily and quickly create AI applications. We present to you – the complete AWS Masterclass on Artificial Intelligence!

The AWS Masterclass on AI is actually made up of 5 main topics. Here is a summary of those topics :

  • AWS Cloud and An Introduction to Artificial Intelligence, Machine Learning, Deep Learning (15 minutes) – An overview of Amazon Web Services and the latest innovations in the data analytics, machine learning, deep learning and AI space.
  • The Road to Artificial Intelligence (20 minutes) – Demystifying AI concepts and related terminology, as well as the underlying technologies. We dive deeper into machine learning and deep learning models, such as neural networks, and how they lead to artificial intelligence.
  • Connecting Things and Sensing the Real World (30 minutes) – For an AI that aligns with our physical world, we need to understand how the Internet of Things (IoT) helps to create natural interaction channels. We walk through real-world examples and demonstrations that include voice interaction through Amazon Lex, Amazon Polly and the Alexa Voice Service, visual recognition with services such as Amazon Rekognition, and real-time data sensed from the physical world via AWS IoT.
  • Retrospective and Real-Time Data Analytics (30 minutes) – Every AI must continuously “learn” and be “trained” through past performance and feedback data, so retrospective and real-time data analytics are crucial to building intelligent models. We dive into some of the new trends and concepts that customers are using to perform fast and cost-effective analytics on AWS.

In the next two pages, we will dissect the video and share with you the key points from each segment of this AWS Masterclass.

Next Page > Introduction To AWS Cloud & Artificial Intelligence, The Road To AI


 


The AWS Masterclass on AI Key Points (Part 1)

Here is an exhaustive list of key takeaway points from the AWS Masterclass on Artificial Intelligence, with their individual timestamps in the video :

Introduction To AWS Cloud

  • AWS has 16 regions around the world (0:51), with two or more availability zones per region (1:37), and 76 edge locations (1:56) to accelerate end connectivity to AWS services.
  • AWS offers 90+ cloud services (3:45), all of which use the On-Demand Model (4:38) – you pay only for what you use, whether that’s a GB of storage or transfer, or execution time for a computational process.
  • You don’t even need to plan for your requirements or inform AWS how much capacity you need (5:05). Just use and pay what you need.
  • AWS has a practice of passing their cost savings to their customers (5:59), cutting prices 61 times since 2006.
  • AWS keeps adding new services over the years (6:19), with over a thousand new services introduced in 2016 (7:03).

Introduction to Artificial Intelligence, Machine Learning, Deep Learning

  • Artificial intelligence is based on unsupervised machine learning (7:45), specifically deep learning models.
  • Insurance companies like AON use it for actuarial calculations (7:59), and services like Netflix use it to generate recommendations (8:04).
  • A lot of AI models have been built specifically around natural language understanding, and using vision to interact with customers, as well as predicting and understanding customer behaviour (9:23).
  • Here is a quick look at what the AWS services management console looks like (9:58).
  • This is how you launch 10 compute instances (virtual servers) in AWS (11:40); a minimal boto3 sketch follows this list.
  • The ability to access multiple instances quickly is very useful for AI training (12:40), because it gives the user access to large amounts of computational power, which can be quickly terminated (13:10).
  • Machine learning, or specifically artificial intelligence, is not new to Amazon.com, the parent company of AWS (14:14).
  • Amazon.com uses a lot of AI models (14:34) for recommendations and demand forecasting.
  • The visual search feature in Amazon app uses visual recognition and AI models to identify a picture you take (15:33).
  • Olivier introduces Amazon Go (16:07), a prototype grocery store in Seattle.
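
For reference, here is roughly what that console demo looks like with the AWS SDK for Python (boto3); the AMI ID, region and instance type below are placeholders, not what Olivier used:

```python
# Launching 10 compute instances with boto3 (placeholder AMI and instance type).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
response = ec2.run_instances(
    ImageId="ami-xxxxxxxx",          # placeholder: e.g. a Deep Learning AMI in your region
    InstanceType="p2.xlarge",        # a GPU instance type suited to AI training workloads
    MinCount=10,
    MaxCount=10,
)
print([instance["InstanceId"] for instance in response["Instances"]])
```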

The Road to Artificial Intelligence

  • The first component of any artificial intelligence is the “ability to sense the real world” (18:46), connecting everything together.
  • Cheaper bandwidth (19:26) now allows more devices to be connected to the cloud, allowing more data to be collected for the purpose of training AI models.
  • Cloud computing platforms like AWS allow the storage and processing of all that sensor data in real time (19:53).
  • All of that information can be used in deep learning models (20:14) to create an artificial intelligence that understands, in a natural way, what we are doing, and what we want or need.
  • Olivier shows how machine learning can quickly solve a Rubik’s cube (20:47), which has 43 quintillion unique combinations.
  • You can even build a Raspberry Pi-powered machine (24:33) that can solve a Rubik’s cube puzzle in 0.9 seconds.
  • Some of these deep learning models are available on Amazon AI (25:11), which is a combination of different services (25:44).
  • Olivier shows what it means to “train a deep learning model” (28:19) using a neural network (29:15).
  • Deep learning is computationally-intensive (30:39), but once it derives a model that works well, the predictive aspect is not computationally-intensive (30:52).
  • A pre-trained AI model can be loaded into a low-powered device (31:02), allowing it to perform AI functions without requiring large amounts of bandwidth or computational power.
  • Olivier demonstrates the YOLO (You Only Look Once) project, which pre-trained an AI model with pictures of objects (31:58), which allows it to detect objects in any video.
  • The identification of objects is the baseline for autonomous driving systems (34:19), as used by Tu Simple.
  • Tu Simple also used a similar model to train a drone to detect and follow a person (35:28).

Next Page > Sensing The Real World, Retrospective & Real-Time Analysis


 


The AWS Masterclass on AI Key Points (Part 2)

Connecting Things and Sensing the Real World

  • Cloud services like AWS IoT (37:35) allow you to securely connect billions of IoT (Internet of Things) devices.
  • Olivier prefers to think of IoT as Intelligent Orchestrated Technology (37:52).
  • Olivier demonstrates how the combination of multiple data sources (maps, vehicle GPS, real-time weather reports) in Bangkok can be used to predict traffic as well as road conditions to create optimal routes (39:07), reducing traffic congestion by 30%.
  • The PetaBencana service in Jakarta uses picture recognition and IoT sensors to identify flooded roads (42:21) for better emergency response and disaster management.
  • Olivier demonstrates how easy it is to connect IoT devices to the AWS IoT service (43:46), and use them to sense and interact with the environment.
  • Olivier shows how the capabilities of the Amazon Echo can be extended by creating an Alexa Skill using the AWS Lambda function (59:07).
  • Developers can create and publish Alexa Skills for sale in the Amazon marketplace (1:03:30).
  • Amazon Polly (1:04:10) renders life-like speech, while the Amazon Lex conversational engine (1:04:17) has natural language understanding and automatic speech recognition. Amazon Rekognition (1:04:29) performs image analysis.
  • Amazon Polly (1:04:50) turns text into life-like speech using deep learning to change the pitch and intonation according to the context. Olivier demonstrates Amazon Polly’s capabilities at 1:06:25.
  • Amazon Lex (1:11:06) is a web service that allows you to build conversational interfaces using natural language understanding (NLU) and automatic speech recognition (ASR) models like Alexa.
  • Amazon Lex does not just support spoken natural language understanding, it also recognises text (1:12:09), which makes it useful for chatbots.
  • Olivier demonstrates those text recognition capabilities in a chatbot demo (1:13:50) of a customer applying for a credit card through Facebook.
  • Amazon Rekognition (1:21:37) is an image recognition and analysis service, which uses deep learning to identify objects in pictures.
  • Amazon Rekognition can even detect facial landmarks and sentiments (1:22:41), as well as image quality and other attributes.
  • You can actually try Amazon Rekognition out (1:23:24) by uploading photos at CodeFor.Cloud/image. A minimal boto3 sketch of calling Polly and Rekognition follows this list.
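
Here is a minimal boto3 sketch of the Amazon Polly and Amazon Rekognition calls described above; the voice, file names and region are illustrative placeholders:

```python
# Text-to-speech with Polly, and face/emotion analysis with Rekognition.
import boto3

polly = boto3.client("polly", region_name="us-east-1")
speech = polly.synthesize_speech(
    Text="Welcome to the AWS Masterclass on Artificial Intelligence.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)
with open("welcome.mp3", "wb") as f:
    f.write(speech["AudioStream"].read())        # life-like speech rendered by Polly

rekognition = boto3.client("rekognition", region_name="us-east-1")
with open("face.jpg", "rb") as f:
    result = rekognition.detect_faces(Image={"Bytes": f.read()}, Attributes=["ALL"])
for face in result["FaceDetails"]:
    top_emotion = max(face["Emotions"], key=lambda e: e["Confidence"])
    print(top_emotion["Type"], round(top_emotion["Confidence"], 1))
```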

Retrospective and Real-Time Data Analytics

  • AI is a combination of 3 types of data analytics (1:28:10) – retrospective analysis and reporting + real-time processing + predictions to enable smart apps.
  • Cloud computing is extremely useful for machine learning (1:29:57) because it allows you to decouple storage and compute requirements for much lower costs.
  • Amazon Athena (1:31:56) allows you to query data stored in Amazon S3, without creating a compute instance to do it. You only pay for the TB of data that is processed by that query (a minimal boto3 sketch follows this list).
  • Best of all, you will get the same fast results even if your data set grows (1:32:31), because Amazon Athena will automatically parallelise your queries across your data set internally.
  • Olivier demonstrates (1:33:14) how Amazon Athena can be used to run queries on data stored in Amazon S3, as well as generate reports using Amazon QuickSight.
  • When it comes to data analytics, cloud computing allows you to quickly bring massive computing power to bear, achieving much faster results without additional cost (1:41:40).
  • The insurance company AON used this ability (1:42:44) to reduce an actuarial simulation that would normally take 10 days, to just 10 minutes.
  • Amazon Kinesis and Amazon Kinesis Analytics (1:45:10) allows the processing of real-time data.
  • A company called Dash is using this capability to analyse OBD data in real-time (1:47:23) to help improve fuel efficiency and predict potential breakdowns. It also notifies emergency services in case of a crash.
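
Here is a minimal boto3 sketch of that serverless query pattern; the database, table and bucket names are placeholders:

```python
# Querying data in S3 with Amazon Athena - no cluster to provision.
import boto3

athena = boto3.client("athena", region_name="us-east-1")
query = athena.start_query_execution(
    QueryString="SELECT vehicle_id, AVG(speed) AS avg_speed FROM telemetry GROUP BY vehicle_id",
    QueryExecutionContext={"Database": "iot_demo"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
# You are billed per TB of data scanned by the query, not for any running servers.
status = athena.get_query_execution(QueryExecutionId=query["QueryExecutionId"])
print(status["QueryExecution"]["Status"]["State"])    # QUEUED / RUNNING / SUCCEEDED / FAILED
```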

Go Back To > First Page | Articles | Home


 


AMD Vega Memory Architecture Q&A With Jeffrey Cheng

At the AMD Computex 2017 Press Conference, AMD President & CEO Dr. Lisa Su announced that AMD will launch the Radeon Vega Frontier Edition on 27 June 2017, and the Radeon RX Vega graphics cards at the end of July 2017. We figured this is a great time to revisit the new AMD Vega memory architecture.

Now, who better to tell us all about it than AMD Senior Fellow Jeffrey Cheng, who built the AMD Vega memory architecture? Check out this exclusive Q&A session from the AMD Tech Summit in Sonoma!

Updated @ 2017-06-11 : We clarified the difference between the AMD Vega’s 64-bit flat address space, and the 512 TB addressable memory. We also added new key points, and time stamps for the key points.

Originally posted @ 2017-02-04

Don’t forget to also check out the following AMD Vega-related articles :

 

The AMD Vega Memory Architecture

Jeffrey Cheng is an AMD Senior Fellow in the area of memory architecture. The AMD Vega memory architecture refers to how the AMD Vega GPU manages memory utilisation and handles large datasets. It does not deal with the AMD Vega memory hardware design, which includes the High Bandwidth Cache and HBM2 technology.

 

AMD Vega Memory Architecture Q&A Summary

Here are the key takeaway points from the Q&A session with Jeffrey Cheng :

  • Large amounts of DRAM can be used to handle big datasets, but this is not the best solution because DRAM is costly and consumes lots of power (see 2:54).
  • AMD chose to design a heterogenous memory architecture to support various memory technologies like HBM2 and even non-volatile memory (e.g. Radeon Solid State Graphics) (see 4:40 and 8:13).
  • At any given moment, the amount of data processed by the GPU is limited, so it doesn’t make sense to store a large dataset in DRAM. It would be better to cache the data required by the GPU in very fast memory (e.g. HBM2), and intelligently move it according to the GPU’s requirements (see 5:40). A toy paging sketch follows this list.
  • The AMD Vega’s heterogenous memory architecture allows for easy integration of future memory technologies like storage-class memory (flash memory that can be accessed in bytes, instead of blocks) (see 8:13).
  • The AMD Vega has a 64-bit flat address space for its shaders (see 12:08, 12:36 and 18:21), but like NVIDIA, AMD is (very likely) limiting the addressable memory to 49 bits, giving it 512 TB of addressable memory.
  • AMD Vega has full access to the CPU’s 48-bit address space, with additional bits beyond that used to handle its own internal memory, storage and registers (see 12:16). This ties back to the High Bandwidth Cache Controller and heterogenous memory architecture, which allows the use of different memory and storage types.

  • Game developers currently try to manage data and memory usage, often extremely conservatively to support graphics cards with limited amounts of graphics memory (see 16:29).
  • With the introduction of AMD Vega, AMD wants game developers to leave data and memory management to the GPU. Its High Bandwidth Cache Controller and heterogenous memory system will automatically handle it for them (see 17:19).
  • The memory architectural advantages of AMD Vega will initially have little impact on gaming performance (due to the current conservative approach of game developers). This will change when developers hand over data and memory management to the GPU (see 24:42).
  • The improved memory architecture in AMD Vega will mainly benefit AI applications (e.g. deep machine learning) with their large datasets (see 24:52).
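
AMD has not published how the High Bandwidth Cache Controller actually decides what to move, so the toy sketch below only illustrates the general idea: keep the working set in a small, fast pool (HBM2) and page data in from larger, slower memory on demand, evicting the least-recently-used pages. It also checks the 49-bit address arithmetic mentioned above:

```python
# Toy LRU paging illustration of the High Bandwidth Cache idea (not AMD's actual policy).
from collections import OrderedDict

ADDRESS_BITS = 49
print(f"2^{ADDRESS_BITS} bytes = {2**ADDRESS_BITS // 2**40} TB of addressable memory")   # 512 TB

class HighBandwidthCache:
    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()                 # page_id -> data, ordered by recency of use

    def access(self, page_id):
        if page_id in self.pages:                  # hit: already resident in fast memory
            self.pages.move_to_end(page_id)
            return "hit"
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)         # evict the least-recently-used page
        self.pages[page_id] = f"page {page_id} paged in from slower memory"
        return "miss"

cache = HighBandwidthCache(capacity_pages=4)
for page in [0, 1, 2, 3, 0, 4, 1]:                 # a GPU touching a dataset larger than its cache
    print(f"access page {page}: {cache.access(page)}")
```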

Don’t forget to also check out the following AMD Vega-related articles :

Go Back To > Computer Hardware + Systems | Home

 


The Complete AMD Radeon Instinct Tech Briefing Rev. 3.0

The AMD Tech Summit held in Sonoma, California from December 7-9, 2016 was not only very exclusive, it was highly secretive. The first major announcement we have been allowed to reveal is the new AMD Radeon Instinct heterogenous computing platform.

In this article, you will hear from AMD what the Radeon Instinct platform is all about. As usual, we have a ton of videos from the event, so it will be as if you were there with us. Enjoy! 🙂

Originally published @ 2016-12-12

Updated @ 2017-01-11 : Two of the videos were edited to comply with the NDA. Now that the NDA on AMD Vega has been lifted, we replaced the two videos with their full, unedited versions. We also made other changes, including adding links to the other AMD Tech Summit articles.

Updated @ 2017-01-20 : Replaced an incorrect slide, and a video featuring that slide. Made other small updates to the article.

 

The AMD Radeon Instinct Platform Summarised

For those who want the quick low-down on AMD Radeon Instinct, here are the key takeaway points :

  • The AMD Radeon Instinct platform is made up of two components – hardware and software.
  • The hardware components are the AMD Radeon Instinct accelerators built around the current Polaris and the upcoming Vega GPUs.
  • The software component is the AMD Radeon Open Compute (ROCm) platform, which includes the new MIOpen open-source deep learning library.
  • The first three Radeon Instinct accelerator cards are the MI6, MI8 and MI25 Vega with NCU.
  • The AMD Radeon Instinct MI6 is a passively-cooled inference accelerator with 5.7 TFLOPS of FP16 processing power, 224 GB/s of memory bandwidth, and a TDP of <150 W. It will come with 16 GB of GDDR5 memory.
  • The AMD Radeon Instinct MI8 is a small form-factor (SFF) accelerator with 8.2 TFLOPS of processing power, 512 GB/s of memory bandwidth, and a TDP of <175 W. It will come with 4 GB of HBM memory.
  • The AMD Radeon Instinct MI25 Vega with NCU is a passively-cooled training accelerator with 25 TFLOPS of processing power, support for 2X packed math, a High Bandwidth Cache and Controller, and a TDP of <300 W. A quick performance-per-watt comparison of the three cards follows this list.
  • The Radeon Instinct accelerators will all be built exclusively by AMD.
  • The Radeon Instinct accelerators will all support MxGPU SRIOV hardware virtualisation.
  • The Radeon Instinct accelerators are all passively cooled.
  • The Radeon Instinct accelerators will all have large BAR (Base Address Register) support for multiple GPUs.
  • The upcoming AMD Zen “Naples” server platform is designed to support multiple Radeon Instinct accelerators through a high-speed network fabric.
  • The ROCm platform is not only open source, it will support a multitude of standards in addition to MIOpen.
  • The MIOpen deep learning library is open source, and will be available in Q1 2017.
  • The MIOpen deep learning library is optimised for Radeon Instinct, allowing for 3X better performance in machine learning.
  • AMD Radeon Instinct accelerators will be significantly faster than NVIDIA Titan X GPUs based on the Maxwell and Pascal architectures.
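
Using only the figures quoted above, here is a quick back-of-the-envelope comparison of peak throughput per watt (the TDPs are quoted as upper bounds, so these are rough numbers):

```python
# Peak TFLOPS per watt of board power, from the figures listed above.
accelerators = {
    "Radeon Instinct MI6":  {"peak_tflops": 5.7,  "tdp_w": 150},
    "Radeon Instinct MI8":  {"peak_tflops": 8.2,  "tdp_w": 175},
    "Radeon Instinct MI25": {"peak_tflops": 25.0, "tdp_w": 300},
}
for name, spec in accelerators.items():
    print(f"{name}: ~{spec['peak_tflops'] / spec['tdp_w'] * 1000:.0f} GFLOPS per watt")
```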

In the subsequent pages, we will give you the full low-down on the Radeon Instinct platform, with the following presentations by AMD :


We also prepared the complete video and slides of the Radeon Instinct tech briefing for your perusal :

Next Page > Heterogenous Computing, The Radeon Instinct Accelerators, MIOpen, Performance

 


Why Is Heterogenous Computing Important?

Dr. Lisa Su kicked things off with an inside look at her two-year journey as AMD President and CEO. Then she revealed why Heterogenous Computing is an important part of AMD’s future going forward. She also mentioned the success of the recently-released Radeon Software Crimson ReLive Edition.

 

Here Are The New AMD Radeon Instinct Accelerators!

Next, Raja Koduri, Senior Vice President and Chief Architect of the Radeon Technologies Group, officially revealed the new AMD Radeon Instinct accelerators.

 

The MIOpen Deep Learning Library For Radeon Instinct

MIOpen is a new deep learning library optimised for Radeon Instinct. It is open source and will become part of the Radeon Open Compute (ROCm) platform. It will be available in Q1 2017.


 

The Performance Advantage Of Radeon Instinct & MIOpen

MIOpen is optimised for Radeon Instinct, offering 3X better performance in machine learning. It allows the Radeon Instinct accelerators to be significantly faster than NVIDIA Titan X GPUs based on the Maxwell and Pascal architectures.

Next Page > Radeon Instinct MI25 & MI8 Demos, Zen “Naples” Platform, The First Servers, ROCm Discussion

 


The Radeon Instinct MI25 Training Demonstration

Raja Koduri roped in Ben Sander, Senior Fellow at AMD, to show off the Radeon Instinct MI25 running a training demo.

 

The Radeon Instinct MI8 Visual Inference Demonstration

The visual inference demo is probably much easier to grasp, as it is visual in nature. AMD used the Radeon Instinct MI8 in this example.

 

The Radeon Instinct On The Zen “Naples” Platform

The upcoming AMD Zen “Naples” server platform is designed to support multiple AMD Radeon Instinct accelerators through a high-speed network fabric.


 

The First Radeon Instinct Servers

This is not a vapourware launch. Raja Koduri revealed the first slew of Radeon Instinct servers that will hit the market in H1 2017.

 

The Radeon Open Compute (ROCm) Platform Discussion

To illustrate the importance of heterogenous computing on Radeon Instinct, Greg Stoner (ROCm Senior Director at AMD) hosted a panel of AMD partners and early adopters using the Radeon Open Compute (ROCm) platform.

Next Page > Closing Remarks On Radeon Instinct, The Complete Radeon Instinct Tech Briefing Video & Slides

 


Closing Remarks On Radeon Instinct

Finally, Raja Koduri concluded the launch of the Radeon Instinct Initiative with some closing remarks on the recent Radeon Software Crimson ReLive Edition.

 

The Complete AMD Radeon Instinct Tech Briefing

This is the complete AMD Radeon Instinct tech briefing. Our earlier video was edited to comply with the AMD Vega NDA (which has now expired).


 

The Complete AMD Radeon Instinct Tech Briefing Slides

Here are the Radeon Instinct presentation slides for your perusal.

 


NVIDIA DGX-1 Deep Learning Supercomputer Launched

April 6, 2016 — NVIDIA today unveiled the NVIDIA DGX-1, the world’s first deep learning supercomputer to meet the unlimited computing demands of artificial intelligence.

The NVIDIA DGX-1 is the first system designed specifically for deep learning — it comes fully integrated with hardware, deep learning software and development tools for quick, easy deployment. It is a turnkey system that contains a new generation of GPU accelerators, delivering the equivalent throughput of 250 x86 servers.

The NVIDIA DGX-1 deep learning system enables researchers and data scientists to easily harness the power of GPU-accelerated computing to create a new class of intelligent machines that learn, see and perceive the world as humans do. It delivers unprecedented levels of computing power to drive next-generation AI applications, allowing researchers to dramatically reduce the time to train larger, more sophisticated deep neural networks.

NVIDIA designed the DGX-1 for a new computing model to power the AI revolution that is sweeping across science, enterprises and increasingly all aspects of daily life. Powerful deep neural networks are driving a new kind of software created with massive amounts of data, which require considerably higher levels of computational performance.

“Artificial intelligence is the most far-reaching technological advancement in our lifetime,” said Jen-Hsun Huang, CEO and co-founder of NVIDIA. “It changes every industry, every company, everything. It will open up markets to benefit everyone. Data scientists and AI researchers today spend far too much time on home-brewed high performance computing solutions. The DGX-1 is easy to deploy and was created for one purpose: to unlock the powers of superhuman capabilities and apply them to problems that were once unsolvable.”

 

Powered by Five Breakthroughs

The NVIDIA DGX-1 deep learning system is built on NVIDIA Tesla P100 GPUs, based on the new NVIDIA Pascal GPU architecture. It provides the throughput of 250 CPU-based servers, networking, cables and racks — all in a single box.

The DGX-1 features four other breakthrough technologies that maximise performance and ease of use. These include the NVIDIA NVLink high-speed interconnect for maximum application scalability; 16nm FinFET fabrication technology for unprecedented energy efficiency; Chip on Wafer on Substrate with HBM2 for big data workloads; and new half-precision instructions to deliver more than 21 teraflops of peak performance for deep learning.

Together, these major technological advancements enable DGX-1 systems equipped with Tesla P100 GPUs to deliver over 12x faster training than four-way NVIDIA Maxwell architecture-based solutions from just one year ago.


The Pascal architecture has strong support from the artificial intelligence ecosystem.

“NVIDIA GPU is accelerating progress in AI. As neural nets become larger and larger, we not only need faster GPUs with larger and faster memory, but also much faster GPU-to-GPU communication, as well as hardware that can take advantage of reduced-precision arithmetic. This is precisely what Pascal delivers,” said Yann LeCun, director of AI Research at Facebook.

Andrew Ng, chief scientist at Baidu, said: “AI computers are like space rockets: The bigger the better. Pascal’s throughput and interconnect will make the biggest rocket we’ve seen yet.”

“Microsoft is developing super deep neural networks that are more than 1,000 layers,” said Xuedong Huang, chief speech scientist at Microsoft Research. “NVIDIA Tesla P100’s impressive horsepower will enable Microsoft’s CNTK to accelerate AI breakthroughs.”

 

Comprehensive Deep Learning Software Suite

The NVIDIA DGX-1 system includes a complete suite of optimised deep learning software that allows researchers and data scientists to quickly and easily train deep neural networks. The DGX-1 software includes the NVIDIA Deep Learning GPU Training System (DIGITS), a complete, interactive system for designing deep neural networks (DNNs).

It also includes the newly released NVIDIA CUDA Deep Neural Network library (cuDNN) version 5, a GPU-accelerated library of primitives for designing DNNs, as well as optimised versions of several widely used deep learning frameworks — Caffe, Theano and Torch. The DGX-1 additionally provides access to cloud management tools, software updates and a repository for containerised applications.

 

NVIDIA DGX-1 Specifications

  • Up to 170 teraflops of half-precision (FP16) peak performance
  • Eight Tesla P100 GPU accelerators, 16GB memory per GPU
  • NVLink Hybrid Mesh Cube
  • 7TB SSD DL Cache
  • Dual 10GbE, Quad InfiniBand 100Gb networking
  • 3U – 3200W
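If the “more than 21 teraflops” figure cited earlier is read as the FP16 peak of a single Tesla P100 (commonly quoted as about 21.2 teraflops), then the eight GPUs listed above multiply out to the 170-teraflop headline number. A quick back-of-the-envelope check, with the per-GPU figure as our assumption:

```cpp
#include <cstdio>

int main() {
    const double fp16_per_gpu_tflops = 21.2; // assumed FP16 peak per Tesla P100 ("more than 21 teraflops")
    const int gpus_per_dgx1 = 8;             // eight Tesla P100 accelerators per DGX-1

    // 8 x 21.2 = 169.6 TFLOPS, i.e. the "up to 170 teraflops" spec above.
    std::printf("Aggregate FP16 peak: %.1f TFLOPS\n", fp16_per_gpu_tflops * gpus_per_dgx1);
    return 0;
}
```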

Optional support services for the NVIDIA DGX-1 improve productivity and reduce downtime for production systems. Hardware and software support provides access to NVIDIA deep learning expertise, and includes cloud management services, software upgrades and updates, and priority resolution of critical issues.

 

NVIDIA DGX-1 Availability

The NVIDIA DGX-1 deep learning system will be generally available in the United States in June 2016, and in other regions beginning in the third quarter, direct from NVIDIA and select systems integrators.

NVIDIA SDK Receives Major Update

by Greg Estes, NVIDIA

While NVIDIA is best known for our hardware platforms, our software plays a key role in advancing the state of the art of GPU-accelerated computing.

This body of work — the NVIDIA SDK — today got a significant update, announced at our annual GPU Technology Conference. It takes advantage of our new Pascal architecture and makes it easier than ever for developers to create great solutions on our platforms.

Our goal is to make more of our software capabilities available to even more developers. Over a million developers have already downloaded our CUDA toolkit, and there are more than 400 GPU-accelerated applications that benefit from our software libraries, in addition to hundreds more game titles.

Here’s a look at the software updates we’re introducing in seven key areas:

 

1) Deep Learning

What’s new — cuDNN 5, our GPU-accelerated library of primitives for deep neural networks, now includes Pascal GPU support; acceleration of recurrent neural networks, which are used for video and other sequential data; and additional enhancements used in medical, oil & gas and other industries.

Why it matters — Deep learning developers rely on cuDNN’s optimized routines so they can focus on designing and training neural network models, rather than low-level performance tuning. cuDNN accelerates leading deep learning frameworks like Google TensorFlow, UC Berkeley’s Caffe, University of Montreal’s Theano and NYU’s Torch. These, in turn, power deep learning solutions used by Amazon, Facebook, Google and others.
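To give a feel for what “a library of primitives” means in practice, here is a deliberately minimal sketch against the public cuDNN C API: it creates a library handle and describes a batch of images in NCHW layout, which is the starting point before filter and convolution descriptors are configured and the convolution routines are called. Error checking is omitted, and the details should be verified against the cuDNN documentation for the version you use:

```cpp
#include <cudnn.h>
#include <cstdio>

int main() {
    cudnnHandle_t handle;
    cudnnCreate(&handle);                       // one handle per GPU context

    // Describe a batch of 32 RGB images, 224 x 224 pixels, in NCHW layout.
    cudnnTensorDescriptor_t input_desc;
    cudnnCreateTensorDescriptor(&input_desc);
    cudnnSetTensor4dDescriptor(input_desc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                               32, 3, 224, 224);

    // A real application would now create filter and convolution descriptors,
    // then call the cuDNN convolution, pooling and activation routines on GPU buffers.
    std::printf("cuDNN handle and input descriptor created.\n");

    cudnnDestroyTensorDescriptor(input_desc);
    cudnnDestroy(handle);
    return 0;
}
```

Frameworks such as TensorFlow, Caffe, Theano and Torch make these kinds of calls under the hood, which is why a cuDNN upgrade can speed them up without any change to user code.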

 

2) Accelerated Computing

What’s new — CUDA 8, the latest version of our parallel computing platform, gives developers direct access to powerful new Pascal features such as unified memory and NVLink. Also included in this release is a new graph analytics library — nvGRAPH — which can be used for robotic path planning, cyber security and logistics analysis, expanding the application of GPU acceleration in the realm of big data analytics.
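Unified memory is the headline developer feature here: a single allocation that both the CPU and GPU can access through the same pointer, with the CUDA runtime migrating data between them. A minimal sketch, with error handling omitted for brevity:

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    float* data = nullptr;

    // One allocation, visible to both host and device.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; ++i) data[i] = 1.0f;     // written on the CPU

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f); // read and written on the GPU
    cudaDeviceSynchronize();

    std::printf("data[0] = %.1f\n", data[0]);       // read back on the CPU: 2.0
    cudaFree(data);
    return 0;
}
```

On Pascal GPUs, CUDA 8 pairs this programming model with hardware page faulting, so data migrates on demand rather than having to be copied explicitly.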

One new feature developers will appreciate is critical path analysis, which automatically identifies latent bottlenecks in code for CPUs and GPUs. And for visualizing volume and surface datasets, NVIDIA IndeX 1.4 is now available as a plug-in for Kitware ParaView, bringing interactive visualization of large volumes with high-quality rendering to ParaView users.

Why it matters — CUDA has been called “the backbone of GPU computing.” We’ve sold millions of CUDA-enabled GPUs to date. As a result, many of the most important scientific applications are based on CUDA, and CUDA has played a role in major discoveries, such as understanding how HIV protects its genetic materials using a protein shell, and unraveling the mysteries of the human genome by discovering 3D loops and other genetic folding patterns.

 

3) Self-Driving Cars

What’s new — At GTC, we also announced our end-to-end HD mapping solution for self-driving cars (see “How HD Maps Will Show Self-Driving Cars the Way”). We built this state-of-the-art system on our DriveWorks software development kit, part of our deep learning platform for the automotive industry.

Why it matters — Incorporating perception, localization, planning and visualization algorithms, DriveWorks provides libraries, tools and reference applications for automakers, tier 1 suppliers and startups developing autonomous vehicle computing pipelines. DriveWorks now includes an end-to-end HD mapping solution, making it easier and faster to create and update highly detailed maps. Along with NVIDIA DIGITS and NVIDIA DRIVENET, these technologies will make driving safer, more efficient and more enjoyable.


 

4) Design Visualization

What’s new — At GTC, we’ve brought NVIDIA Iray — our photorealistic rendering solution — to the world of VR with the introduction of new cameras within Iray that let users create VR panoramas and view their creations with unprecedented accuracy in virtual reality (see “NVIDIA Brings Interactive Photorealism to VR with Iray”). We also announced Adobe’s support of NVIDIA’s Materials Definition Language, bringing the possibility of physically based materials to a wide range of creative professionals.

Why it matters — NVIDIA Iray is used in a wide array of industries to give designers the ability to create photorealistic models of their work quickly and to speed their products to market. We’ve licensed it to leading software manufacturers such as Dassault Systèmes and Siemens PLM. Iray is also available from NVIDIA as a plug-in for popular software like Autodesk 3ds Max and Maya.

 

5) Autonomous Machines

What’s new — We’re bringing deep learning capabilities to devices that will interact with — and learn from — the environment around them. Our cuDNN version 5, noted above, improves deep learning inference performance for common deep neural networks, allowing embedded devices to make decisions faster and work with higher resolution sensors. NVIDIA GPU Inference Engine (GIE) is a high-performance neural network inference solution for application deployment. Developers can use GIE to generate optimized implementations of trained neural network models that deliver the fastest inference performance on NVIDIA GPUs.

Why it matters — Robots, drones, submersibles and other intelligent devices require autonomous capabilities. The Jetpack SDK — which powers the Jetson TX1 Developer Kit — includes libraries and APIs for advanced computer vision and deep learning, enabling developers to build extraordinarily capable autonomous machines that can see, understand and even interact with their environments.

 

6) Gaming

What’s new — We recently announced three new technologies for NVIDIA GameWorks, our combination of development tools, sample code and advanced libraries for real-time graphics and simulation for games. They include Volumetric Lighting, Voxel-based Ambient Occlusion and Hybrid Frustum Traced Shadows.

Why it matters — Developers are already using these new libraries for AAA game titles like Fallout 4. And GameWorks technology is in many of the major game engines, such as Unreal Engine, Unity and Stingray, which are also increasingly being used for non-gaming applications like architectural walk-throughs, training and even automotive design.

 

7) Virtual Reality

What’s new — We’re continuing to add features to VRWorks — our suite of APIs, sample code and libraries for VR developers. For example, Multi-Res Shading accelerates performance by up to 50 percent by rendering each part of an image at a resolution that better matches the pixel density of the warped VR image. VRWorks Direct Mode treats VR headsets as head-mounted displays accessible only to VR applications, rather than a normal Windows monitor in desktop mode.
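Conceptually, Multi-Res Shading divides the eye buffer into a grid and shades the peripheral regions, which the lens warp compresses anyway, at a reduced resolution. The sketch below is purely illustrative: it is not the VRWorks API, and every name and number in it is invented, but it shows the kind of per-viewport bookkeeping involved:

```cpp
#include <cstdio>

// Hypothetical multi-resolution viewport layout (illustration only, not the VRWorks API).
struct Viewport {
    int x, y, width, height;
    float shading_scale;  // 1.0 = full resolution, <1.0 = reduced shading rate
};

int main() {
    const int frame_w = 1512, frame_h = 1680;   // example per-eye render target size
    const int cell_w = frame_w / 3, cell_h = frame_h / 3;
    Viewport grid[9];

    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            Viewport& v = grid[row * 3 + col];
            v.x = col * cell_w;
            v.y = row * cell_h;
            v.width = cell_w;
            v.height = cell_h;
            // Keep the centre cell at full resolution; shade the edge and corner
            // cells at a lower resolution, since the lens warp discards detail there.
            v.shading_scale = (row == 1 && col == 1) ? 1.0f : 0.6f;
        }
    }

    for (const Viewport& v : grid)
        std::printf("viewport %dx%d at (%d,%d), shading scale %.1f\n",
                    v.width, v.height, v.x, v.y, v.shading_scale);
    return 0;
}
```

Shading fewer pixels in the periphery is where the quoted performance gain of up to 50 percent comes from.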

Why it matters — VRWorks helps headset and application developers achieve the highest performance, lowest latency and plug-and-play compatibility. You can see how developers are using what VRWorks has to offer at GTC, where we’re demonstrating these new technologies with partners such as Sólfar Studios (Everest VR), Fusion Studios (Mars 2030), Oculus and HTC.

 

NVIDIA : Accelerating Artificial Intelligence With GPUs

by Jen-Hsun Huang

 

The Big Bang

For as long as we have been designing computers, AI has been the final frontier. Building intelligent machines that can perceive the world as we do, understand our language, and learn from examples has been the life’s work of computer scientists for over five decades. Yet, it took the combination of Yann LeCun’s Convolutional Neural Net, Geoff Hinton’s back-propagation and Stochastic Gradient Descent approach to training, and Andrew Ng’s large-scale use of GPUs to accelerate Deep Neural Networks (DNNs) to ignite the big bang of modern AI — deep learning.

At the time, NVIDIA was busy advancing GPU-accelerated computing, a new computing model that uses massively parallel graphics processors to accelerate applications also parallel in nature.  Scientists and researchers jumped on to GPUs to do molecular-scale simulations to determine the effectiveness of a life-saving drug, to visualize our organs in 3D (reconstructed from light doses of a CT scan), or to do galactic-scale simulations to discover the laws that govern our universe. One researcher, using our GPUs for quantum chromodynamics simulations, said to me: “Because of NVIDIA’s work, I can now do my life’s work, in my lifetime.” This is wonderfully rewarding. It has always been our mission to give people the power to make a better future. NVIDIA GPUs have democratized supercomputing and researchers have now discovered that power.


In 2011, AI researchers discovered NVIDIA GPUs. The Google Brain project had just achieved amazing results — it learned to recognize cats and people by watching movies on YouTube. But it required 2,000 CPUs in servers powered and cooled in one of Google’s giant data centers. Few have computers of this scale. Enter NVIDIA and the GPU. Bryan Catanzaro in NVIDIA Research teamed with Andrew Ng’s team at Stanford to use GPUs for deep learning. As it turned out, 12 NVIDIA GPUs could deliver the deep-learning performance of 2,000 CPUs. Researchers at NYU, the University of Toronto, and the Swiss AI Lab accelerated their DNNs on GPUs. Then, the fireworks started.

 

Deep Learning Performs Miracles

Alex Krizhevsky of the University of Toronto won the 2012 ImageNet computer image recognition competition. Krizhevsky beat — by a huge margin — handcrafted software written by computer vision experts. Krizhevsky and his team wrote no computer vision code. Rather, using deep learning, their computer learned to recognize images by itself. They designed a neural network called AlexNet and trained it with a million example images that required trillions of math operations on NVIDIA GPUs. Krizhevsky’s AlexNet had beaten the best human-coded software.

The AI race was on. By 2015, another major milestone was reached.

Using deep learning, Google and Microsoft both beat the best human score in the ImageNet challenge. This time they were not merely beating another human-written program, but a human. Shortly thereafter, Microsoft and the University of Science and Technology of China announced a DNN that achieved IQ test scores at the college post-graduate level.

Then Baidu announced that a deep learning system called Deep Speech 2 had learned both English and Mandarin with a single algorithm. And all top results of the 2015 ImageNet competition were based on deep learning, running on GPU-accelerated deep neural networks, with many beating human-level accuracy.

In 2012, deep learning had beaten human-coded software. By 2015, deep learning had achieved “superhuman” levels of perception.

 

A New Computing Platform for a New Software Model

Computer programs contain commands that are largely executed sequentially. Deep learning is a fundamentally new software model where billions of software-neurons and trillions of connections are trained, in parallel.

Running DNN algorithms and learning from examples, the computer is essentially writing its own software. This radically different software model needs a new computer platform to run efficiently. Accelerated computing is an ideal approach and the GPU is the ideal processor.

As Nature recently noted, early progress in deep learning was “made possible by the advent of fast graphics processing units (GPUs) that were convenient to program and allowed researchers to train networks 10 or 20 times faster.”

A combination of factors is essential to create a new computing platform — performance, programming productivity, and open accessibility.

Performance. NVIDIA GPUs are naturally great at parallel workloads and speed up DNNs by 10-20x, reducing each of the many training iterations from weeks to days. We didn’t stop there. By collaborating with AI developers, we continued to improve our GPU designs, system architecture, compilers, and algorithms, and sped up training deep neural networks by 50x in just three years — a much faster pace than Moore’s Law. We expect another 10x boost in the next few years.
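To put that pace in perspective: doubling every two years, roughly the cadence associated with Moore’s Law, compounds to about 2.8x over three years, whereas a 50x gain over the same period corresponds to doubling roughly every six to seven months. A quick check of that arithmetic (ours, not NVIDIA’s):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double years = 3.0;
    const double moores_law_gain = std::pow(2.0, years / 2.0);   // doubling every ~2 years -> ~2.8x
    const double claimed_gain = 50.0;                             // the stated 50x over three years
    const double months_per_doubling = 12.0 * years / std::log2(claimed_gain); // ~6.4 months

    std::printf("Moore's Law pace over %.0f years: ~%.1fx\n", years, moores_law_gain);
    std::printf("A %.0fx gain over %.0f years implies doubling every ~%.1f months\n",
                claimed_gain, years, months_per_doubling);
    return 0;
}
```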

Programmability. AI innovation is on a breakneck pace. Ease of programming and developer productivity are paramount. The programmability and richness of NVIDIA’s CUDA platform allow researchers to innovate quickly — building new configurations of CNNs, DNNs, deep inception networks, RNNs, LSTMs, and reinforcement learning networks.

Accessibility. Developers want to create anywhere and deploy everywhere. NVIDIA GPUs are available all over the world, from every PC OEM; in desktops, notebooks, servers, or supercomputers; and in the cloud from Amazon, IBM, and Microsoft. All major AI development frameworks are NVIDIA GPU accelerated — from internet companies, to research, to startups. No matter the AI development system preferred, it will be faster with GPU acceleration.

We have also created GPUs for just about every computing form-factor so that DNNs can power intelligent machines of all kinds. GeForce is for PC.  Tesla is for cloud and supercomputers. Jetson is for robots and drones. And DRIVE PX is for cars. All share the same architecture and accelerate deep learning.

 

Every Industry Wants Intelligence

Baidu, Google, Facebook, Microsoft were the first adopters of NVIDIA GPUs for deep learning. This AI technology is how they respond to your spoken word, translate speech or text to another language, recognize and automatically tag images, and recommend newsfeeds, entertainment, and products that are tailored to what each of us likes and cares about.

Startups and established companies are now racing to use AI to create new products and services, or improve their operations. In just two years, the number of companies NVIDIA collaborates with on deep learning has jumped nearly 35x to over 3,400 companies.

Industries such as healthcare, life sciences, energy, financial services, automotive, manufacturing, and entertainment will benefit by inferring insight from mountains of data. And, with Facebook, Google, and Microsoft opening their deep-learning platforms for all to use, AI-powered applications will spread fast. In light of this trend, Wired recently heralded the “rise of the GPU.”

Self-driving cars. Whether to augment humans with a superhuman co-pilot, or revolutionize personal mobility services, or reduce the need for sprawling parking lots within cities, self-driving cars have the potential to do amazing social good. Driving is complicated. Unexpected things happen. Freezing rain turns the road into a skating rink. The road to your destination is closed. A child runs out in front of the car.

You can’t write software that anticipates every possible scenario a self-driving car might encounter. That’s the value of deep learning; it can learn, adapt, and improve. We are building an end-to-end deep learning platform called NVIDIA DRIVE PX for self-driving cars — from the training system to the in-car AI computer. The results are very exciting.  A future with superhuman computer co-pilots and driverless shuttles is no longer science fiction.

Robots. FANUC, a leading manufacturing robot maker, recently demonstrated an assembly-line robot that learned to “pick” randomly oriented objects out of a bin. The GPU-powered robot learned by trial and error. This deep-learning technology was developed by Preferred Networks, which was recently featured in a Wall Street Journal article headlined, “Japan Seeks Tech Revival with Artificial Intelligence.”

Healthcare and Life Sciences. Deep Genomics is applying GPU-based deep learning to understand how genetic variations can lead to disease. Arterys uses GPU-powered deep learning to speed analysis of medical images. Its technology will be deployed in GE Healthcare MRI machines to help diagnose heart disease. Enlitic is using deep learning to analyze medical images to identify tumors, nearly invisible fractures, and other medical conditions.

These are just a handful of examples. There are literally thousands.

 

Accelerating AI with GPUs: A New Computing Model


Deep-learning breakthroughs have sparked the AI revolution. Machines powered by AI deep neural networks solve problems too complex for human coders. They learn from data and improve with use. The same DNN can be trained by even non-programmers to solve new problems. Progress is exponential. Adoption is exponential.

And we believe the impact to society will also be exponential. A recent study by KPMG predicts that computerized driver assistance technologies will help reduce car accidents 80% in 20 years — that’s nearly 1 million lives a year saved. Deep-learning AI will be its cornerstone technology.

The impact to the computer industry will also be exponential. Deep learning is a fundamentally new software model. So we need a new computer platform to run it — an architecture that can efficiently execute programmer-coded commands as well as the massively parallel training of deep neural networks. We are betting that GPU-accelerated computing is the horse to ride. Popular Science recently called the GPU “the workhorse of modern A.I.” We agree.

 

Automotive Innovators Motoring to NVIDIA DRIVE

Written by Danny Shapiro, NVIDIA


Audi. BMW. Ford. Mercedes-Benz. Volvo. Some of the world’s biggest automotive names are flocking to DRIVE, our powerful engine for in-vehicle artificial intelligence.

So are a group of fast-moving, smaller innovators that are shaking up the auto industry. Companies such as ZMP, Preferred Networks and AdasWorks are using DRIVE PX to give automobiles astonishing new capabilities.

Unveiled Monday at CES 2016, in Las Vegas, DRIVE PX 2 provides supercomputer-class performance — up to 24 trillion operations per second for artificial intelligence applications — in a case the size of a shoebox.

Here’s a look at just three of the companies working with DRIVE PX:

 

Bringing Autonomous Driving to Taxis

Tokyo-based ZMP — which is working to help create autonomous taxis, among other projects — is using deep learning technology and NVIDIA DRIVE PX to dramatically improve accuracy of detection and decision-making algorithms for autonomous driving.

“ZMP is achieving remarkable results using deep neural networks on NVIDIA GPUs for pedestrian detection,” said Hisashi Taniguchi, CEO of ZMP. “We will expand our use of deep learning on NVIDIA GPUs to realize our driverless Robot Taxi service.”

 

In Gear with Toyota


Preferred Networks is one of the best-known machine learning startups in Japan. The Tokyo-based company is working closely with Toyota — which purchased a 3% stake in Preferred Networks just a few weeks ago — to give cars autonomous driving capabilities.

With the NVIDIA deep learning platform, Preferred Networks has greatly improved performance on a variety of applications, such as image recognition for automotive and surveillance cameras, automated control of robotics, and health diagnostics, according to Preferred Networks founder Daisuke Okanohara.

“The remarkable thing is that we did it all with a single NVIDIA GPU-powered deep neural network, in a very short time,” Okanohara said.

 

Eyes on the Road

We’re also working with AdasWorks, a Budapest-based developer of artificial intelligence-based software for automated driving, to bring the power of our GPUs to Volvo Cars.

Volvo will use the NVIDIA DRIVE PX 2 deep learning-based computing platform to power a fleet of 100 Volvo XC90 SUVs that will hit public roads next year, driven by actual customers as part of the Swedish carmaker’s Drive Me autonomous-car pilot program.

AdasWorks worked with Volvo to help create a system that processes data from multiple sensors in real time to provide 360-degree detection of lanes, vehicles, pedestrians, signs and more, enabling a variety of autopilot functions.

NVIDIA DRIVE is more than just a component automakers can bolt into their cars. It’s an end-to-end solution for deep learning that includes a wide variety of tools and technologies, such as our DIGITS software for neural network training.

To see how it all comes together, visit our booth at CES. We’re in the North Hall, right in the middle of this year’s automotive action.

Volvo XC90 To Use NVIDIA DRIVE PX 2 Computer

Jan. 5, 2016—Volvo Cars will use the NVIDIA DRIVE™ PX 2 deep learning-based computing engine to power a fleet of 100 Volvo XC90 SUVs starting to hit the road next year in the Swedish carmaker’s Drive Me autonomous car pilot programme, NVIDIA announced today.

Autonomous technology is an important contributor to Volvo’s Vision 2020 – its guiding principles for creating safer vehicles. This work has resulted in world-leading advancements in autonomous and semi-autonomous driving, and a new safety benchmark for the automotive industry.

“Our vision is that no one should be killed or seriously injured in a new Volvo by the year 2020,” said Marcus Rothoff, director of the Autonomous Driving Programme at Volvo Cars. “NVIDIA’s high-performance and responsive automotive platform is an important step towards our vision and perfect for our autonomous drive programme and the Drive Me project.”

 

The Volvo XC90 Drive Me Project

Volvo’s Drive Me autonomous pilot programme will equip the Volvo XC90 luxury cars with the NVIDIA DRIVE PX 2 engine, which uses deep learning to navigate the complexities of driving. The cars will operate autonomously on roads around Gothenburg, the carmaker’s hometown, and semi-autonomously elsewhere.

“Volvo’s Drive Me project is the ideal application of our DRIVE PX 2 engine and deep learning,” said Rob Csongor, vice president and general manager of Automotive at NVIDIA. “We are bringing years of work by thousands of NVIDIA engineers to help Volvo achieve its safety goals and move self-driving cars from Gothenburg to the rest of the globe.”

 

Recognising Objects Beyond Reach of Human Algorithms

The NVIDIA DRIVE PX 2 engine enables cars to utilise deep learning – a form of artificial intelligence – to recognise objects in their environment, anticipate potential threats and navigate safely. With 8 teraflops of processing power – equivalent to 250 MacBook Pros – it processes data from multiple sensors in real time, providing 360-degree detection of lanes, vehicles, pedestrians, signs and more, to enable a variety of autopilot functions.

Recent deep-learning breakthroughs have greatly enhanced computers’ ability to perceive the outside world. Using vast amounts of data and processing power, they can write software to recognise complex objects at a level beyond the reach of human-coded algorithms.

Much deep learning work is powered by NVIDIA’s supercomputing GPUs. For example, Microsoft and Google have used GPUs to create image-recognition systems that beat a well-trained human in the ImageNet Large Scale Visual Recognition Challenge. And Microsoft researchers recently trained a deep neural net that beat a human in IQ tests.

 

Map Localisation and Path Planning


For map localisation and path planning, the system can compare real-time situational awareness with a known high-definition map, enabling it to plan a safe route and drive precisely along it, adjusting to ever-changing circumstances.

DRIVE PX 2 will also perform other critical functions such as stitching camera inputs to create a complete surround-view of the car.

Because self-driving cars require massive computing resources to interpret the data from multiple sensors, most early prototypes have contained a trunk full of computers. In contrast, DRIVE PX 2, which carries out the same functions, is the size of a tablet.

NVIDIA DRIVE PX 2 AI Computer For Cars Launched

Jan. 5, 2016—Accelerating the race to autonomous cars, NVIDIA today launched NVIDIA DRIVE PX 2 – the world’s most powerful engine for in-vehicle artificial intelligence.

NVIDIA DRIVE PX 2 allows the automotive industry to use artificial intelligence to tackle the complexities inherent in autonomous driving. It utilises deep learning on NVIDIA’s most advanced GPUs for 360-degree situational awareness around the car, to determine precisely where the car is and to compute a safe, comfortable trajectory.

“Drivers deal with an infinitely complex world,” said Jen-Hsun Huang, co-founder and CEO, NVIDIA. “Modern artificial intelligence and GPU breakthroughs enable us to finally tackle the daunting challenges of self-driving cars.

“NVIDIA’s GPU is central to advances in deep learning and supercomputing. We are leveraging these to create the brain of future autonomous vehicles that will be continuously alert, and eventually achieve superhuman levels of situational awareness. Autonomous cars will bring increased safety, new convenient mobility services and even beautiful urban designs – providing a powerful force for a better future.”

 

NVIDIA DRIVE PX 2 Deep Learning

Created to address the needs of NVIDIA’s automotive partners for an open development platform, DRIVE PX 2 provides unprecedented amounts of processing power for deep learning, equivalent to that of 100 MacBook Pros.

Its two next-generation Tegra® processors plus two next-generation discrete GPUs, based on the Pascal™ architecture, deliver up to 24 trillion deep learning operations per second, using specialised instructions that accelerate the math behind deep learning network inference. That’s over 10 times more computational horsepower than the previous-generation product.

DRIVE PX 2’s deep learning capabilities enable it to quickly learn how to address the challenges of everyday driving, such as unexpected road debris, erratic drivers and construction zones. Deep learning also addresses numerous problem areas where traditional computer vision techniques are insufficient – such as poor weather conditions like rain, snow and fog, and difficult lighting conditions like sunrise, sunset and extreme darkness.

For general purpose floating point operations, DRIVE PX 2’s multi-precision GPU architecture is capable of up to eight trillion operations per second. That’s over four times more than the previous-generation product. This enables partners to address the full breadth of autonomous driving algorithms, including sensor fusion, localisation and path planning. It also provides high precision compute when needed for layers of deep learning networks.

 

Deep Learning in Self-Driving Cars


Self-driving cars use a broad spectrum of sensors to understand their surroundings. DRIVE PX 2 can process the inputs of 12 video cameras, plus lidar, radar and ultrasonic sensors. It fuses them to accurately detect objects, identify them, determine where the car is relative to the world around it, and then calculate its optimal path for safe travel.

This complex work is facilitated by NVIDIA DriveWorks™, a suite of software tools, libraries and modules that accelerates development and testing of autonomous vehicles. DriveWorks enables sensor calibration, acquisition of surround data, synchronisation, recording and then processing streams of sensor data through a complex pipeline of algorithms running on all of the DRIVE PX 2’s specialised and general-purpose processors.

Software modules are included for every aspect of the autonomous driving pipeline, from object detection, classification and segmentation to map localisation and path planning.
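As a rough mental model of that pipeline, and emphatically not the DriveWorks API (every type and function below is a made-up stand-in), a per-frame update might be structured like this:

```cpp
#include <vector>

// Hypothetical stand-ins for the pipeline stages described in the text.
struct CameraFrame {}; struct LidarScan {}; struct RadarReturn {};
struct Detections {};  struct Pose {};      struct Trajectory {};

// Stub implementations so the sketch compiles; a real system would run DNN
// inference, sensor fusion, map localisation and path planning here.
Detections detect_and_classify(const std::vector<CameraFrame>&) { return {}; }
Detections fuse_sensors(const Detections&, const std::vector<LidarScan>&,
                        const std::vector<RadarReturn>&)        { return {}; }
Pose       localise(const Detections&)                          { return {}; }
Trajectory plan_path(const Pose&, const Detections&)            { return {}; }

void per_frame_update(const std::vector<CameraFrame>& cameras,
                      const std::vector<LidarScan>& lidar,
                      const std::vector<RadarReturn>& radar) {
    Detections objects = detect_and_classify(cameras);        // object detection, classification, segmentation
    Detections fused   = fuse_sensors(objects, lidar, radar); // fuse camera, lidar, radar (and ultrasonic) data
    Pose pose          = localise(fused);                     // map localisation: where is the car, precisely?
    Trajectory path    = plan_path(pose, fused);              // path planning for a safe, comfortable route
    (void)path;                                               // handed off to vehicle control
}

int main() {
    per_frame_update({}, {}, {});  // one synthetic frame, just to exercise the sketch
    return 0;
}
```

The value of DriveWorks, as described above, is that it supplies tested modules for each of these stages and schedules them across DRIVE PX 2’s specialised and general-purpose processors.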

 

End-to-End Solution for Deep Learning

NVIDIA delivers an end-to-end solution – consisting of NVIDIA DIGITS™ and DRIVE PX 2 – for both training a deep neural network, as well as deploying the output of that network in a car.

DIGITS is a tool for developing, training and visualising deep neural networks that can run on any NVIDIA GPU-based system – from PCs and supercomputers to Amazon Web Services and the recently announced Facebook Big Sur Open Rack-compatible hardware. The trained neural net model runs on NVIDIA DRIVE PX 2 within the car.

 

Strong Market Adoption

Since NVIDIA delivered the first-generation DRIVE PX last summer, more than 50 automakers, tier 1 suppliers, developers and research institutions have adopted NVIDIA’s AI platform for autonomous driving development. They are praising its performance, capabilities and ease of development.

“Using NVIDIA’s DIGITS deep learning platform, in less than four hours we achieved over 96 percent accuracy using Ruhr University Bochum’s traffic sign database. While others invested years of development to achieve similar levels of perception with classical computer vision algorithms, we have been able to do it at the speed of light.” — Matthias Rudolph, director of Architecture Driver Assistance Systems at Audi

“BMW is exploring the use of deep learning for a wide range of automotive use cases, from autonomous driving to quality inspection in manufacturing. The ability to rapidly train deep neural networks on vast amounts of data is critical. Using an NVIDIA GPU cluster equipped with NVIDIA DIGITS, we are achieving excellent results.” — Uwe Higgen, head of BMW Group Technology Office USA

“Due to deep learning, we brought the vehicle’s environment perception a significant step closer to human performance and exceed the performance of classic computer vision.” — Ralf G. Herrtwich, director of Vehicle Automation at Daimler

“Deep learning on NVIDIA DIGITS has allowed for a 30X enhancement in training pedestrian detection algorithms, which are being further tested and developed as we move them onto NVIDIA DRIVE PX.” — Dragos Maciuca, technical director of Ford Research and Innovation Center

 

NVIDIA DRIVE PX 2 Availability

The DRIVE PX 2 development engine will be generally available in the fourth quarter of 2016. Availability to early access development partners will be in the second quarter.

NVIDIA GPUs Power Facebook’s Deep Machine Learning

Dec. 11, 2015—NVIDIA today announced that Facebook will power its next-generation computing system with the NVIDIA® Tesla® Accelerated Computing Platform, enabling it to drive a broad range of machine learning applications.

While training complex deep neural networks to conduct machine learning can take days or weeks on even the fastest computers, the Tesla platform can slash this by 10-20x. As a result, developers can innovate more quickly and train networks that are more sophisticated, delivering improved capabilities to consumers.

Facebook is the first company to adopt NVIDIA Tesla M40 GPU accelerators, introduced last month, to train deep neural networks. They will play a key role in the new “Big Sur” computing platform, Facebook AI Research’s (FAIR) purpose-built system designed specifically for neural network training.

“Deep learning has started a new era in computing,” said Ian Buck, vice president of accelerated computing at NVIDIA. “Enabled by big data and powerful GPUs, deep learning algorithms can solve problems never possible before. Huge industries from web services and retail to healthcare and cars will be revolutionised. We are thrilled that NVIDIA GPUs have been adopted as the engine of deep learning. Our goal is to provide researchers and companies with the most productive platform to advance this exciting work.”

In addition to reducing neural network training time, GPUs offer a number of other advantages. Their architectural compatibility from generation to generation provides seamless speed-ups for future GPU upgrades. And the Tesla platform’s growing global adoption facilitates open collaboration with researchers around the world, fueling new waves of discovery and innovation in the machine learning field.

 

Big Sur Optimised for Machine Learning

NVIDIA worked with Facebook engineers on the design of Big Sur, optimising it to deliver maximum performance for machine learning workloads, including the training of large neural networks across multiple Tesla GPUs.

Two times faster than Facebook’s existing system, Big Sur will enable the company to train twice as many neural networks – and to create neural networks that are twice as large – which will help develop more accurate models and new classes of advanced applications.

“The key to unlocking the knowledge necessary to develop more intelligent machines lies in the capability of our computing systems,” said Serkan Piantino, engineering director for FAIR. “Most of the major advances in machine learning and AI in the past few years have been contingent on tapping into powerful GPUs and huge data sets to build and train advanced models.”

The addition of Tesla M40 GPUs will help Facebook make new advancements in machine learning research and enable teams across its organisation to use deep neural networks in a variety of products and services.

 

First Open Sourced AI Computing Architecture

Big Sur represents the first time a computing system specifically designed for machine learning and artificial intelligence (AI) research will be released as an open source solution.

Committed to doing its AI work in the open and sharing its findings with the community, Facebook intends to work with its partners to open source Big Sur specifications via the Open Compute Project. This unique approach will make it easier for AI researchers worldwide to share and improve techniques, enabling future innovation in machine learning by harnessing the power of GPU accelerated computing.