Tag Archives: Inference

IBM To Fuel AI Digital Transformation In Malaysia!

IBM is offering IT infrastructure with a new level of security and reliability to fuel Malaysia’s AI digital transformation!

 

AI Digital Transformation In Malaysia Needs Better IT Infra!

Malaysia is at a pivotal point in its digital journey, with businesses future-proofing their organisations by leveraging AI and automation.

The digital economy is now a cornerstone of Malaysia’s economy, contributing 23.2% of the nation’s gross domestic product in 2021. Malaysia’s Minister of Communications and Digital, Fahmi Fadzil, anticipates that this will increase to 25.5% by 2025, with a value of RM382 billion. The National Tech Association of Malaysia is even more optimistic, believing that the digital economy will hit that level of economic contribution much earlier.

The increased urgency in adopting cutting-edge technologies like AI is driving demand for better IT infrastructure. Better not just in terms of performance and availability, but also security and reliability.

An IDC report forecasted that AI spending in the Asia Pacific region alone will skyrocket to US$78 billion by 2027. And these AI investments are predominantly being funnelled into infrastructure provisioning. That’s because ultimately, digital transformation in any nation is reliant on its IT infrastructure.

Recommended : IBM Expands Power10 Server Line With New Models!

 

IBM To Fuel AI Digital Transformation In Malaysia!

There is no one-size-fits-all approach to AI infrastructure. Organisations must provision the right infrastructure for the AI task at hand.

They have to look not only at the size and scale of their AI models and tasks, but also at security and privacy issues, as well as regulatory compliance. Infrastructure that is resilient by design is also critical, as AI workloads become essential backbones of mission-critical applications.

To that end, IBM Power Systems offer a secure and reliable platform for enterprises to perform inference, and run AI algorithms on their most sensitive data and transactions.

IBM Power Systems are built on the Power10 processor, which is designed for AI acceleration. Each Power10 core on the IBM Power S1022 can process up to 42% more batch queries per second than a comparable x86 server at a peak load of 40 concurrent users, while running large language AI models.

IBM Power10 systems also offer enterprise-grade, out-of-the-box low-latency transactional capabilities and throughput, resiliency, continuous availability (99.999%), and concurrent repair and replace.

The IBM Power10 is also designed for greater efficiency. The Power E1080, for example, offers 3X the capacity with 52% lower power consumption for the same workload, compared to the Power E880C. It also offers 33% lower power consumption than the Power E980 for the same workload.

IBM claims that, in general, Power10 systems offer 6X more throughput per container cluster, and 40% to 50% lower cost than comparable x86 solutions.

On top of that, the IBM Power Systems are built to be secure by design, with a fully-integrated secured stack from processor chip to operating system, offering quantum-safe encryption and fully homomorphic encryption (FHE).

 

Please Support My Work!

Support my work through a bank transfer /  PayPal / credit card!

Name : Adrian Wong
Bank Transfer : CIMB 7064555917 (Swift Code : CIBBMYKL)
Credit Card / Paypal : https://paypal.me/techarp

Dr. Adrian Wong has been writing about tech and science since 1997, even publishing a book with Prentice Hall called Breaking Through The BIOS Barrier (ISBN 978-0131455368) while in medical school.

He continues to devote countless hours every day writing about tech, medicine and science, in his pursuit of facts in a post-truth world.


 

Recommended Reading

Go Back To > Business | Money | Tech ARP

 

Support Tech ARP!

Please support us by visiting our sponsors, participating in the Tech ARP Forums, or donating to our fund. Thank you!

IBM z16 : Industry’s First Quantum-Safe System Explained!

IBM just introduced the z16 system, powered by their new Telum processor with an integrated AI accelerator!

Take a look at the z16, and find out why it is the industry’s first quantum-safe system!

 

IBM z16 : Industry’s First Quantum-Safe System!

On 25 April 2022, IBM officially unveiled their new z16 system in Malaysia – the industry’s first quantum-safe system.

IBM Vice President for Worldwide Sales of IBM Z and LinuxONE, Jose Castano, flew to Kuala Lumpur to give us an exclusive briefing on the new z16 system, and to tell us why it is the industry’s first quantum-safe system.

IBM Z and LinuxONE Security CTO Michael Jordan also briefed us on why quantum-safe computing will be critical for enterprises, as quantum computing improves.

Thanks to its Telum processor, the IBM z16 system delivers low and consistent latency for embedding AI into response time-sensitive transactions. This can enable customers to leverage AI inference to better control the outcome of transactions before they complete.

For example, they can leverage AI inference to mitigate risk in Clearing & Settlement applications, to predict which transactions have high risk exposure, and highlight questionable transactions, to prevent costly consequences.

In one use-case example, an international bank uses AI on IBM Z as part of its credit card authorisation process, instead of using an off-platform inference solution. As a result, the bank can detect fraud during its credit card transaction authorisation processing.
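
To make that idea concrete, here is a minimal, platform-agnostic sketch of scoring a transaction with a pre-trained fraud model during authorisation, using ONNX Runtime in Python. This is purely illustrative — the model file, input name, feature vector and threshold are all hypothetical, and IBM’s actual z16 inference stack (with the on-chip Telum AI accelerator) uses its own integrated tooling.

```python
import numpy as np
import onnxruntime as ort

# Hypothetical pre-trained fraud-scoring model, exported to ONNX.
session = ort.InferenceSession("fraud_model.onnx")

def authorise(transaction_features, threshold=0.95):
    """Score a transaction in-line, before the authorisation completes."""
    x = np.asarray([transaction_features], dtype=np.float32)
    # Assumes the model exposes one input named "features" and returns a fraud probability.
    outputs = session.run(None, {"features": x})
    fraud_probability = float(np.asarray(outputs[0]).squeeze())
    return fraud_probability < threshold  # approve only if the fraud score is low enough

# Example feature vector : amount, merchant category, distance from home, hour of day
print(authorise([1250.00, 5999.0, 830.0, 3.0]))
```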

The IBM z16 will offer greater AI inference capacity, thanks to its integrated AI accelerator delivering latency as low as 1 ms, expanding use cases to include :

  • tax fraud and organised retail theft detection
  • real-time payments and alternative payment methods, including cryptocurrencies
  • faster business and consumer loan approvals

As the industry’s first quantum-safe system, the IBM z16 is protected by lattice-based cryptography – an approach to constructing security primitives that helps protect data and systems against current and future threats.

 

IBM z16 : Powered By The New Telum Processor!

The IBM z16 is built around the new IBM Telum processor, which is specifically designed for secure processing, and real-time AI inference.

Here are the key features of the IBM Telum processor that powers the new IBM z16 system :

  • Fabricated on the 7 nm process technology
  • Has 8 processor cores, clocked at over 5 GHz
  • Each processor core has a dedicated 32 MB private L2 cache
  • The eight 32 MB L2 caches can form a virtual 256 MB L3 cache, and a virtual 2 GB L4 cache.
  • Transparent encryption of main memory, with 8-channel fault tolerant memory interface
  • Integrated AI accelerator with 6 TFLOPS compute capacity
  • Centralised AI accelerator architecture, with direct connection to the cache infrastructure

The Telum processor is designed to enable extremely low latency inference for response-time sensitive workloads. With planned system support for up to 200 TFLOPs, the AI acceleration is also designed to scale up to the requirements of the most demanding workloads.

Thanks to the Telum processor, the IBM z16 can process 300 billion inference requests per day, with just one millisecond of latency.
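
As a rough back-of-envelope check on that claim (my own arithmetic, not IBM’s), 300 billion inference requests per day works out to roughly 3.5 million requests per second :

```python
requests_per_day = 300e9            # IBM's claimed daily inference volume
seconds_per_day = 24 * 60 * 60      # 86,400 seconds

requests_per_second = requests_per_day / seconds_per_day
print(f"{requests_per_second:,.0f} inference requests per second")
# -> roughly 3,472,222 inference requests per second, each served within ~1 ms
```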

 


Recommended Reading

Go Back To > Enterprise | Computer | Tech ARP

 

Support Tech ARP!

Please support us by visiting our sponsors, participating in the Tech ARP Forums, or donating to our fund. Thank you!

NVIDIA TensorRT 7 with Real-Time Conversational AI!

NVIDIA just launched TensorRT 7, introducing the capability for Real-Time Conversational AI!

Here is a primer on the NVIDIA TensorRT 7, and the new real-time conversational AI capability!

 

NVIDIA TensorRT 7 with Real-Time Conversational AI

NVIDIA TensorRT 7 is their seventh-generation inference software development kit. It introduces the capability for real-time conversational AI, opening the door for human-to-AI interactions.

TensorRT 7 features a new deep learning compiler designed to automatically optimise and accelerate the increasingly complex recurrent and transformer-based neural networks needed for AI speech applications.

TensorRT 7 boosts the performance of conversational AI components by more than 10X compared to running them on CPUs, driving latency below the 300-millisecond (0.3 second) threshold considered necessary for real-time interactions.

 

TensorRT 7 Targets Recurrent Neural Networks

TensorRT 7 is designed to speed up AI models used to make predictions on time-series and sequence data – the scenarios handled by recurrent neural networks (RNNs).

RNNs are used not only for conversational AI speech networks; they also help with arrival time planning for cars and satellites, prediction of events in electronic medical records, financial asset forecasting and fraud detection.

The use of RNNs has hitherto been limited to a few companies with the talent and manpower to hand-optimise the code to meet real-time performance requirements.

With TensorRT 7’s new deep learning compiler, developers now have the ability to automatically optimise these neural networks to deliver the best possible performance and lowest latencies.

The new compiler also optimises transformer-based models like BERT for natural language processing.
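
As a rough illustration of the workflow, here is a minimal sketch of building a TensorRT inference engine from an ONNX export of a BERT-style model using the Python API. The file names are hypothetical, and the exact API details vary between TensorRT versions, so treat this as an outline rather than a drop-in recipe.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

builder = trt.Builder(TRT_LOGGER)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# Parse an ONNX export of the model (file name is hypothetical).
parser = trt.OnnxParser(network, TRT_LOGGER)
with open("bert_base_qa.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(str(parser.get_error(0)))

# Let the builder optimise the graph, here with FP16 enabled for Tensor Cores.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
config.max_workspace_size = 1 << 30  # 1 GB of scratch space for tactic selection

engine = builder.build_engine(network, config)
with open("bert_base_qa.engine", "wb") as f:
    f.write(engine.serialize())
```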

 

TensorRT 7 Availability

NVIDIA TensorRT 7 will be made available in the coming days for development and deployment for free to members of the NVIDIA Developer program.

The latest versions of plug-ins, parsers and samples are also available as open source from the TensorRT GitHub repository.

 

Recommended Reading

Go Back To > Software | Business | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


NVIDIA DRIVE Deep Neural Networks : Access Granted!

NVIDIA just announced that they will be providing the transportation industry access to their NVIDIA DRIVE Deep Neural Networks (DNNs) for autonomous vehicle development! Here are the details!

 

NVIDIA DRIVE Deep Neural Networks : Access Granted!

To accelerate the adoption of NVIDIA DRIVE by the transportation industry for autonomous vehicle development, NVIDIA is providing access to the NVIDIA DRIVE Deep Neural Networks.

What this means is that autonomous vehicle developers will now be able to access all of NVIDIA’s pre-trained AI models and training code, and use them to improve their self-driving systems.

Using AI is central to the development of safe, self-driving cars. AI lets autonomous vehicles perceive and react to obstacles and potential dangers, or even changes in their surroundings.

Powering every self-driving car are dozens of Deep Neural Networks (DNNs) that tackle redundant and diverse tasks, to ensure accurate perception, localisation and path planning.

These DNNs cover tasks like traffic light and sign detection, object detection for vehicles, pedestrians and bicycles, and path perception, as well as gaze detection and gesture recognition within the vehicle.

 

Advanced NVIDIA DRIVE Tools

In addition to providing access to their DRIVE DNNs, NVIDIA also made available a suite of advanced NVIDIA DRIVE tools.

These NVIDIA DRIVE tools allow autonomous vehicle developers to customise and enhance the NVIDIA DRIVE DNNs using their own datasets and target feature set.

  • Active Learning improves model accuracy and reduces data collection costs by automating data selection using AI, rather than manual curation.
  • Federated Learning lets developers utilise datasets across countries, and collaborate with other developers, while maintaining data privacy and protecting their own intellectual property.
  • Transfer Learning gives NVIDIA DRIVE customers the ability to speed up development of their own perception software by leveraging NVIDIA’s own autonomous vehicle development work (see the sketch below).
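
To illustrate the transfer-learning idea in general terms, here is a generic PyTorch sketch – not NVIDIA’s DRIVE tooling – in which a developer loads a pre-trained backbone, freezes its weights, and retrains only a new task-specific head on their own dataset. The ResNet-18 backbone and the 4-class head are hypothetical stand-ins.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on a large dataset (a stand-in for a pre-trained DRIVE DNN).
model = models.resnet18(pretrained=True)

# Freeze the pre-trained feature extractor so its learned weights are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the classification head for the developer's own target feature set,
# e.g. a hypothetical 4-class perception task.
model.fc = nn.Linear(model.fc.in_features, 4)

# Only the new head's parameters are trained on the developer's own dataset.
optimiser = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```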

 

Recommended Reading

Go Back To > Automotive | Business | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


Intel Nervana AI Accelerators : Everything You Need To Know!

Intel just introduced their Nervana AI accelerators – the Nervana NNP-T1000 for training, and Nervana NNP-I1000 for inference!

Here is EVERYTHING you need to know about these two new Intel Nervana AI accelerators!

 

Intel Nervana Neural Network Processors

Intel Nervana neural network processors, NNPs for short, are designed to accelerate two key deep learning tasks – training and inference.

To target these two different tasks, Intel created two AI accelerator families – Nervana NNP-T that’s optimised for training, and Nervana NNP-I that’s optimised for inference.

They are both paired with a full software stack, developed with open components and deep learning framework integration.

 

Nervana NNP-T For Training

The Intel Nervana NNP-T1000 is not only capable of training even the most complex deep learning models, it is also highly scalable – offering near-linear scaling and efficiency.

By combining compute, memory and networking capabilities in a single ASIC, it allows for maximum efficiency with flexible and simple scaling.

Recommended : Intel NNP-T1000 PCIe + Mezzanine Cards Revealed!

Each Nervana NNP-T1000 is powered by up to 24 Tensor Processing Clusters (TPCs), and comes with 16 bi-directional Inter-Chip Links (ICL).

Its TPCs support the 32-bit floating point (FP32) and brain floating point (bfloat16) formats, allowing for multiple deep learning primitives with maximum processing efficiency.
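
For readers unfamiliar with bfloat16, it is essentially a float32 with the lower 16 mantissa bits dropped, keeping the full 8-bit exponent – so it preserves float32’s dynamic range at reduced precision, which is why training accelerators favour it. Here is a small Python sketch of that truncation; it illustrates the format itself, not Intel’s hardware, which would typically round rather than truncate.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate an IEEE-754 float32 to its top 16 bits (bfloat16)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16  # drop the low 16 mantissa bits

def bfloat16_bits_to_float32(b: int) -> float:
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x

value = 3.14159265
approx = bfloat16_bits_to_float32(float32_to_bfloat16_bits(value))
print(value, "->", approx)  # 3.14159265 -> 3.140625
```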

Its high-speed ICL communication fabric allows for near-linear scaling, directly connecting multiple NNP-T cards within servers, between servers and even inside and across racks.

  • High compute utilisation using Tensor Processing Clusters (TPC) with bfloat16 numeric format
  • Both on-die SRAM and on-package High-Bandwidth Memory (HBM) keep data local, reducing movement
  • Its Inter-Chip Link (ICL) glueless fabric architecture and fully-programmable router achieve near-linear scaling across multiple cards, systems and PODs
  • Available in PCIe and OCP Open Accelerator Module (OAM) form factors
  • Offers a programmable Tensor-based instruction set architecture (ISA)
  • Supports common open-source deep learning frameworks like TensorFlow, PaddlePaddle and PyTorch

 

Intel Nervana NNP-T Accelerator Models

The Intel Nervana NNP-T is currently available in two form factors – a dual-slot PCI Express card, and an OAM Mezzanine Card, with these specifications :

Specifications            Intel Nervana NNP-T1300       Intel Nervana NNP-T1400
Form Factor               Dual-slot PCIe Card           OAM Mezzanine Card
Compliance                PCIe CEM                      OAM 1.0
Compute Cores             22 TPCs                       24 TPCs
Frequency                 950 MHz                       1100 MHz
SRAM                      55 MB on-chip, with ECC       60 MB on-chip, with ECC
Memory                    32 GB HBM2, with ECC          32 GB HBM2, with ECC
Memory Bandwidth          2.4 Gbps (300 MB/s)           2.4 Gbps (300 MB/s)
Inter-Chip Link (ICL)     16 x 112 Gbps (448 GB/s)      16 x 112 Gbps (448 GB/s)
ICL Topology              Ring                          Ring, Hybrid Cube Mesh, Fully Connected
Multi-Chassis Scaling     Yes                           Yes
Multi-Rack Scaling        Yes                           Yes
I/O to Host CPU           PCIe Gen3 / Gen4 x16          PCIe Gen3 / Gen4 x16
Thermal Solution          Passive                       Integrated Passive Cooling
TDP                       300 W                         375 W
Dimensions                265.32 mm x 111.15 mm         165 mm x 102 mm

 

Nervana NNP-I For Inference

The Intel Nervana NNP-I1000, on the other hand, is optimised for near-real-time, high-volume, multi-modal inferencing.

Each Nervana NNP-I1000 features 12 Inference Compute Engines (ICE), which are paired with two Intel CPU cores, a large on-die 75 MB SRAM cache and an on-die Network-on-Chip (NoC).

Recommended : Intel NNP-I1000 PCIe + M.2 Cards Revealed!


It offers mixed-precision support, with a special focus on low-precision applications for near-real-time performance.
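
To illustrate what low-precision inference means in general – this is a generic int8 quantisation sketch, not Intel’s specific scheme – float32 activations are mapped to small integers with a scale factor, trading a little accuracy for much higher throughput and lower memory traffic :

```python
import numpy as np

def quantise_int8(x: np.ndarray):
    """Symmetric int8 quantisation : map float32 values into [-127, 127]."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantise(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4).astype(np.float32)
q, s = quantise_int8(x)
print(x, "->", dequantise(q, s))  # close, but not bit-exact
```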

Like the NNP-T, the NNP-I comes with a full software stack that is built with open components, including direct integration with deep learning frameworks.

Intel Nervana NNP-I Accelerator Models

The NNP-I1000 comes in a 12 W M.2 form factor, or a 75 W PCI Express card, to accommodate exponentially larger and more complex models, or to run dozens of models and networks in parallel.

Specifications            Intel Nervana NNP-I1100       Intel Nervana NNP-I1300
Form Factor               M.2 Card                      PCI Express Card
Compute                   1 x Intel Nervana NNP-I1000   2 x Intel Nervana NNP-I1000
SRAM                      75 MB                         2 x 75 MB
Int8 Performance          Up to 50 TOPS                 Up to 170 TOPS
TDP                       12 W                          75 W

 

Recommended Reading


Go Back To > Business + Enterprise | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


NVIDIA Jetson Xavier NX : World’s Smallest AI Supercomputer

On 7 November 2019, NVIDIA introduced the Jetson Xavier NX – the world’s smallest AI supercomputer designed for robotics and embedded computing applications at the edge!

Here is EVERYTHING you need to know about the new NVIDIA Jetson Xavier NX!

 

NVIDIA Jetson Xavier NX : World’s Smallest AI Supercomputer

At just 70 x 45 mm, the new NVIDIA Jetson Xavier NX is smaller than a credit card. Yet it delivers server-class AI performance at up to 21 TOPS, while consuming as little as 10 watts of power.

Short for Nano Xavier, the NX is a low-power version of the Xavier SoC that came up tops in the MLPerf Inference benchmarks.

Recommended : NVIDIA Wins MLPerf Inference Benchmarks For DC + Edge!

With its small size and low power consumption, it opens up the possibility of adding AI-at-the-edge computing capabilities to small commercial robots, drones, industrial IoT systems, network video recorders and portable medical devices.

The Jetson Xavier NX can be configured to deliver up to 14 TOPS at 10 W, or 21 TOPS at 15 W. It is powerful enough to run multiple neural networks in parallel, and process data from multiple high-resolution sensors simultaneously.

The NVIDIA Jetson Xavier NX runs on the same CUDA-X AI software architecture as all other Jetson processors, and is supported by the NVIDIA JetPack software development kit.

It is pin-compatible with the Jetson Nano, offering up to 15X higher performance than the Jetson TX2 in a smaller form factor.

It will not be available for a few more months, but developers can begin development today using the Jetson AGX Xavier Developer Kit, with a software patch to emulate the Jetson Xavier NX.

 

NVIDIA Jetson Xavier NX Specifications

Specifications      NVIDIA Jetson Xavier NX
CPU                 NVIDIA Carmel – 6 x Arm 64-bit cores, 6 MB L2 + 4 MB L3 caches
GPU                 NVIDIA Volta – 384 CUDA cores, 48 Tensor cores, 2 NVDLA cores
AI Performance      21 TOPS @ 15 watts / 14 TOPS @ 10 watts
Memory Support      128-bit LPDDR4x-3200 – up to 8 GB, 51.2 GB/s
Video Support       Encoding : up to 2 x 4K30 streams / Decoding : up to 2 x 4K60 streams
Camera Support      Up to six CSI cameras (32 via virtual channels), up to 12 lanes (3×4 or 6×2) MIPI CSI-2
Connectivity        Gigabit Ethernet
OS Support          Ubuntu-based Linux
Module Size         70 x 45 mm (Jetson Nano form factor)

 

NVIDIA Jetson Xavier NX Price + Availability

The NVIDIA Jetson Xavier NX will be available in March 2020 from NVIDIA’s distribution channels, priced at US$399.

 

Recommended Reading

Go Back To > Enterprise | Software | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


NVIDIA Wins MLPerf Inference Benchmarks For DC + Edge!

The MLPerf Inference 0.5 benchmarks are officially released today, with NVIDIA declaring that they aced them for both datacenter and edge computing workloads.

Find out how well NVIDIA did, and why it matters!

 

The MLPerf Inference Benchmarks

MLPerf Inference 0.5 is the industry’s first independent suite of five AI inference benchmarks.

Applied across a range of form factors and four inference scenarios, the new MLPerf Inference Benchmarks test the performance of established AI applications like image classification, object detection and translation.

 

NVIDIA Wins MLPerf Inference Benchmarks For Datacenter + Edge

Thanks to the programmability of its computing platforms to cater to diverse AI workloads, NVIDIA was the only company to submit results for all five MLPerf Inference Benchmarks.

According to NVIDIA, their Turing GPUs topped all five benchmarks for both datacenter scenarios (server and offline) among commercially-available processors.

Meanwhile, their Jetson Xavier scored highest among commercially-available edge and mobile SoCs under both edge-focused scenarios – single stream and multi-stream.

The new NVIDIA Jetson Xavier NX that was announced today is a low-power version of the Xavier SoC that won the MLPerf Inference 0.5 benchmarks.

All of NVIDIA’s MLPerf Inference Benchmark results were achieved using NVIDIA TensorRT 6 deep learning inference software.

 

Recommended Reading

Go Back To > Enterprise | Software | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


The Alibaba Hanguang 800 (含光 800) AI NPU Explained!

At the Apsara Computing Conference 2019, Alibaba Group unveiled details of their first AI inference NPU – the Hanguang 800 (含光 800).

Here is EVERYTHING you need to know about the Alibaba Hanguang 800 AI inference NPU!

Updated @ 2019-09-27 : Added more details, including a performance comparison against its main competitors.

Originally posted @ 2019-09-25

 

What Is The Alibaba Hanguang 800?

The Alibaba Hanguang 800 is a neural processing unit (NPU) for AI inference applications. It was specifically designed to accelerate machine learning and AI inference tasks.

 

What Does Hanguang Mean?

The name 含光 (Hanguang) literally means “contains light”.

While the name may suggest that it uses photonics, that light-based technology is still at least a decade from commercialisation.

 

What Are The Hanguang 800 Specifications?

Not much is known about the Hanguang 800, other than that it has 17 billion transistors, and is fabricated on the 12 nm process technology.

Also, it is designed for inferencing only, unlike the HUAWEI Ascend 910 AI chip which can handle both training and inference.

Recommended : 3rd Gen X-Dragon Architecture by Alibaba Cloud Explained!

 

Who Designed The Hanguang 800?

The Hanguang 800 was developed in just 7 months by Alibaba’s research unit, T-Head, followed by a 3-month tape-out.

T-Head, whose Chinese name is Pingtouge (honey badger in English), is responsible for designing chips for cloud and edge computing under Alibaba Cloud / Aliyun.

Earlier this year, T-Head revealed a high-performance IoT processor called XuanTie 910.

Based on the open-source RISC-V instruction set, the 16-core XuanTie 910 is targeted at heavy-duty IoT applications like edge servers, networking gateways, and self-driving automobiles.

 

How Fast Is Hanguang 800?

Alibaba claims that the Hanguang 800 “largely” outpaces the industry average performance, with image processing efficiency about 12X better than GPUs :

  • Single chip performance : 78,563 images per second (IPS)
  • Computational efficiency : 500 IPS per watt (ResNet-50 inference test)
                              Hanguang 800    Habana Goya    Cambricon MLU270    NVIDIA T4       NVIDIA P4
Fab Process                   12 nm           16 nm          16 nm               12 nm           16 nm
Transistors                   17 billion      N/A            N/A                 13.6 billion    7.2 billion
Performance (ResNet-50)       78,563 IPS      15,433 IPS     10,000 IPS          5,402 IPS       1,721 IPS
Peak Efficiency (ResNet-50)   500 IPS/W       150 IPS/W      143 IPS/W           78 IPS/W        52 IPS/W
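
As a quick sanity check of the table (my own arithmetic on the published figures, not vendor data), dividing throughput by efficiency gives each chip’s implied power draw at peak, and normalising throughput against the NVIDIA T4 shows the relative gap :

```python
# ResNet-50 figures from the table above.
throughput = {                       # images per second (IPS)
    "Hanguang 800": 78_563, "Habana Goya": 15_433,
    "Cambricon MLU270": 10_000, "NVIDIA T4": 5_402, "NVIDIA P4": 1_721,
}
efficiency = {                       # IPS per watt
    "Hanguang 800": 500, "Habana Goya": 150,
    "Cambricon MLU270": 143, "NVIDIA T4": 78, "NVIDIA P4": 52,
}

for name in throughput:
    speedup = throughput[name] / throughput["NVIDIA T4"]
    implied_power = throughput[name] / efficiency[name]
    print(f"{name:17s} {speedup:5.1f}x T4 throughput, ~{implied_power:.0f} W implied at peak")
# Hanguang 800 works out to ~14.5x the T4's throughput at roughly 157 W.
```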

Recommended : 2nd Gen EPYC – Everything You Need To Know Summarised!

 

Where Will Hanguang 800 Be Used?

The Hanguang 800 chip will be used exclusively by Alibaba to power their own business operations, especially in product search and automatic translation, personalised recommendations and advertising.

According to Alibaba, merchants upload a billion product images to Taobao every day. It used to take their previous platform an hour to categorise those pictures, and then tailor search and personalise recommendations for millions of Taobao customers.

With Hanguang 800, they claim that the Taobao platform now takes just 5 minutes to complete the task – a 12X reduction in time!

Alibaba Cloud will also be using it in their smart city projects. They are already using it in Hangzhou, where they previously used 40 GPUs to process video feeds with a latency of 300 ms.

After migrating to four Hanguang 800 NPUs, they were able to process the same video feeds with half the latency – just 150 ms.

 

Can We Buy Or Rent The Hanguang 800?

No, Alibaba will not be selling the Hanguang 800 NPU. Instead, they are offering it as a new AI cloud computing service.

Developers can now make a request for a Hanguang 800 cloud compute quota, which Alibaba Cloud claims is 100% more cost-effective than traditional GPUs.

 

Are There No Other Alternatives For Alibaba?

In our opinion, this is Alibaba’s way of preparing for an escalation of the US-Chinese trade war that has already savaged HUAWEI.

While Alibaba certainly have a few AI inference accelerator alternatives, from AMD and NVIDIA for example, it makes sense for them to spend money and time developing their own AI inference chip.

In the long term, the Chinese government wants to build a domestic capability to design and fabricate their own computer chips for national security reasons.

Recommended : The HUAWEI Trump Ban – Everything You Need To Know!

 

Recommended Reading

Go Back To > Business + Enterprise | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


AMD 7nm Vega Presentation + Demo + First Look!

One of the biggest revelations at the AMD Computex 2018 press conference was how far along AMD is with their 7nm efforts. Everything appears to be chugging along as planned. AMD not only shared new details about the 7nm Vega GPU, they also showed off an actual sample!

 

The 7nm Vega Revealed!

Let’s start with this presentation on the 7nm Vega by David Wang, Senior Vice-President of Engineering at the Radeon Technologies Group. Gilbert Leung then demonstrated the performance of the 7nm Vega GPU, which has 32 GB of HBM2 memory, running Cinema4D R19 with Radeon ProRender.

Here are the key points from his presentation :

  • The AMD graphics roadmap from 2017 has not changed. The AMD Vega architecture will get a 7nm die shrink this year, before an architectural change with AMD Navi in 2019.
  • The 7nm die shrink will double power efficiency, and increase performance by 1.35X.
  • The first 7nm Vega GPU will be used in their Radeon Instinct Vega 7nm accelerator, just like how the first Vega GPUs were used in their first generation Radeon Instinct accelerators.

  • In addition to the 7nm die shrink, the Radeon Instinct Vega 7nm accelerator will feature the AMD Infinity Fabric interconnect for better multi GPU performance.
  • The Radeon Instinct Vega 7nm accelerator will also support hardware virtualisation for better security and performance in virtualised environments.

  • The Radeon Instinct Vega 7nm accelerator will come with new deep learning operations, which will help accelerate not only training and inference, but also blockchain applications.
  • The 7nm Vega GPU is sampling right now, and will launch in the second half of 2018 as the Radeon Instinct Vega 7nm accelerator.

 

First Look At 7nm Vega + 7nm EPYC!

In this video, Dr. Lisa Su shows off engineering samples of the 7nm EPYC processor (on the left), and the 7nm Vega GPU (on the right).

Go Back To > Computer Hardware + Systems | Home

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

The NVIDIA Jetson TX2 (Pascal) Tech Report

NVIDIA just announced the Jetson TX2 embedded AI supercomputer, based on the latest NVIDIA Pascal microarchitecture. It promises to offer twice the performance of the previous-generation Jetson TX1, in the same package. In this tech report, we will share with you the full details of the new Pascal-based NVIDIA Jetson TX2!

 

GPUs In Artificial Intelligence

Artificial intelligence is the new frontier in GPU compute technology. Whether they are used to power training or inference engines, AI research has benefited greatly from the massive amounts of compute power in modern GPUs.

The market is led by NVIDIA with their Tesla accelerators that run on their proprietary CUDA platform. AMD, on the other hand, is a relative newcomer with their Radeon Instinct accelerators designed to run on the open-source ROCm (Radeon Open Compute) platform.

 

The NVIDIA Jetson

GPUs today offer so much compute performance that NVIDIA has been able to create the NVIDIA Jetson family of embedded AI supercomputers. They differ from their Tesla big brother in their size, power efficiency and purpose. The NVIDIA Jetson modules are specifically built for “inference at the edge” or “AI at the edge“.

 

Unlike AI processing in datacenters or in the cloud, AI at the edge refers to autonomous artificial intelligence processing, where there is poor or no Internet access, or where access must be restricted for privacy or security reasons. Therefore, the processor must be powerful enough for the AI application to run autonomously.

Whether it’s to automate robots in a factory, or to tackle industrial accidents like the one at the Fukushima Daiichi nuclear plant, AI at the edge is meant to allow for at least some autonomous capability right in the field. Processors for AI at the edge must also be frugal with power, as power or battery life is often limited.


Hence, processors designed for AI-at-the-edge applications must be small, power-efficient and yet fast enough to run AI inference in real time. The NVIDIA Jetson family of embedded AI supercomputers promises to tick all of those boxes. Let’s take a look :


 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

The NVIDIA Jetson TX2

The NVIDIA Jetson TX2 is the second-generation Jetson embedded AI module, based on the latest NVIDIA Pascal microarchitecture. It supersedes (but does not replace) the previous-generation Jetson TX1, which was built on the NVIDIA Maxwell microarchitecture and released in November 2015.

Thanks to the faster and more power-efficient Pascal microarchitecture, the NVIDIA Jetson TX2 promises to be twice as energy-efficient as the Jetson TX1.

This means that developers switching to the Jetson TX2 can now opt to maximise power efficiency, or to maximise performance. In Max-Q mode, the Jetson TX2 will use less than 7.5 W, and offer Jetson TX1-equivalent performance. In Max-P mode, the Jetson TX2 will use less than 15 W, and offer up to twice the performance of the Jetson TX1.

 

NVIDIA Jetson Specification Comparison

The NVIDIA Jetson modules are actually built around NVIDIA Tegra SoCs, instead of their GeForce GPUs. The Tegra is a System-on-a-Chip (SoC) that integrates an ARM CPU, an NVIDIA GPU, a chipset and a memory controller in a single package.

The Tegra SoC and the other components on a 50 x 87 mm board are what constitute the NVIDIA Jetson module. The Jetson TX1 uses the Tegra X1 SoC, while the new Jetson TX2 uses the Tegra P1 SoC.

For those who have been following our coverage of the AMD Radeon Instinct, and its support for packed math, the NVIDIA Jetson TX2 and TX1 modules support FP16 operations too.


 

NVIDIA Jetson TX2 Price & Availability

The NVIDIA Jetson TX2 Developer Kit is available for pre-order in the US and Europe right now, with a US$ 599 retail price and a US$ 299 education price. Shipping will start on March 14, 2017. The developer kit will be made available in APAC and other regions in April 2017.

The NVIDIA Jetson TX2 module itself will only be made available in the second quarter of 2017. It will be priced at US$ 399 per module, in quantities of 1,000 modules or more.

Note that the Jetson TX2 modules are exactly the same size as the Jetson TX1 modules, and use the same 400-pin connector, making them drop-in compatible replacements.


 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

NVIDIA Jetson TX1 Price Adjustments

With the launch of the Jetson TX2, NVIDIA is adjusting the price of the Jetson TX1. The Jetson TX1 will continue to sell alongside the new Jetson TX2.

The NVIDIA Jetson TX1 Developer Kit has been reduced to US$ 499, down from US$ 599.

The NVIDIA Jetson TX1 production module has been reduced to US$ 299, down from US$ 399. Again, this is in quantities of 1,000 modules or more.

 

NVIDIA Jetpack 3.0

The NVIDIA Jetson is more than just a processor module. It is a platform made up of developer tools and code, as well as APIs. Just as AMD offers their MIOpen deep learning library, NVIDIA offers JetPack.

In conjunction with the launch of the Jetson TX2, NVIDIA also announced JetPack 3.0. It promises to offer twice the system performance of JetPack 2.3.

JetPack 3.0 is not just for the new Jetson TX2. It will also offer a nice boost in performance for existing Jetson TX1 users and applications.


 

The Presentation Slides

For those who want the full set of NVIDIA Jetson TX2 slides, here they are :

 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!