Tag Archives: High Performance Computing

How NVIDIA A800 Bypasses US Chip Ban On China!

Find out how NVIDIA created the new A800 GPU to bypass the US ban on sale of advanced chips to China!

 

NVIDIA Offers A800 GPU To Bypass US Ban On China!

Two months after it was banned by the US government from selling high-performance AI chips to China, NVIDIA introduced a new A800 GPU designed to bypass those restrictions.

The new NVIDIA A800 is based on the same Ampere microarchitecture as the A100, which was used as the performance baseline by the US government.

Despite its numerically larger model number (the lucky number 8 was probably picked to appeal to the Chinese), this is a detuned part, with slightly reduced performance to meet export control limitations.

The NVIDIA A800 GPU, which went into production in Q3, is another alternative product to the NVIDIA A100 GPU for customers in China.

The A800 meets the U.S. government’s clear test for reduced export control and cannot be programmed to exceed it.

NVIDIA is probably hoping that the slightly slower NVIDIA A800 GPU will allow it to continue supplying China with A100-level chips that are used to power supercomputers and high-performance datacenters for artificial intelligence applications.

As I will show you in the next section, there won’t be a truly significant performance difference between the A800 and the A100, except in very high-end applications. So NVIDIA customers who want or need the A100 should have no issue opting for the A800 instead.

However, this can only be a stopgap fix, as NVIDIA cannot sell anything faster than A100-level chips to China, until and unless the US government changes its mind.

Read more : AMD, NVIDIA Banned From Selling AI Chips To China!

 

How Fast Is The NVIDIA A800 GPU?

The US government considers the NVIDIA A100 as the performance baseline for its export control restrictions on China.

Any chip equal to or faster than that Ampere-based chip, which was launched on May 14, 2020, is forbidden from being sold or exported to China. But as they say, the devil is in the details.

The US government didn’t specify just how much slower chips must be to qualify for export to China. So NVIDIA could technically comply by slightly detuning the A100, while offering almost the same level of performance.

And that was what NVIDIA did with the A800 – it is basically the A100 with a 33% slower NVLink interconnect speed. NVIDIA also limited the maximum number of GPUs supported in a single server to 8.

That only slightly reduces the performance of A800 servers compared to A100 servers, while offering the same amount of GPU compute performance. Most users will not notice the difference.

The only significant impediment is on the very high-end – Chinese companies are now restricted to a maximum of eight GPUs per server, instead of up to sixteen.
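
To put that interconnect cut in perspective, here is a rough, back-of-the-envelope Python sketch of how long a ring all-reduce – the collective operation commonly used to synchronise gradients across GPUs in AI training – would take at both NVLink speeds. The 10 GB payload and the simple cost model are my own illustrative assumptions, not NVIDIA figures :

```python
# Rough ring all-reduce cost model : each GPU moves
# 2 * (N - 1) / N * payload bytes over the interconnect.

def ring_allreduce_seconds(payload_gb: float, n_gpus: int, link_gb_s: float) -> float:
    bytes_on_wire = 2 * (n_gpus - 1) / n_gpus * payload_gb
    return bytes_on_wire / link_gb_s

payload_gb = 10.0  # assumed gradient payload per sync
for name, bw in [("A100 (NVLink 600 GB/s)", 600.0),
                 ("A800 (NVLink 400 GB/s)", 400.0)]:
    t = ring_allreduce_seconds(payload_gb, 8, bw)
    print(f"{name} : {t * 1000:.1f} ms per all-reduce")

# A100 : ~29.2 ms vs A800 : ~43.8 ms per synchronisation.
```

Each synchronisation takes 50% longer on paper, but since communication is only a slice of every training step – most of which is spent on compute – the end-to-end slowdown is far smaller. That is precisely why the A800 remains an acceptable substitute.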

To show you what I mean, I dug into the A800 specifications, and compared them to the A100 below:

NVIDIA A100 vs A800 : 80GB PCIe Version

Specifications          A100 80GB PCIe          A800 80GB PCIe
FP64                    9.7 TFLOPS              9.7 TFLOPS
FP64 Tensor Core        19.5 TFLOPS             19.5 TFLOPS
FP32                    19.5 TFLOPS             19.5 TFLOPS
Tensor Float 32         156 TFLOPS              156 TFLOPS
BFLOAT16 Tensor Core    312 TFLOPS              312 TFLOPS
FP16 Tensor Core        312 TFLOPS              312 TFLOPS
INT8 Tensor Core        624 TOPS                624 TOPS
GPU Memory              80 GB HBM2e             80 GB HBM2e
GPU Memory Bandwidth    1,935 GB/s              1,935 GB/s
TDP                     300 W                   300 W
Multi-Instance GPU      Up to 7 MIGs @ 10 GB    Up to 7 MIGs @ 10 GB
Interconnect            NVLink : 600 GB/s       NVLink : 400 GB/s
                        PCIe Gen4 : 64 GB/s     PCIe Gen4 : 64 GB/s
Server Options          1-8 GPUs                1-8 GPUs

NVIDIA A100 vs A800 : 80GB SXM Version

Specifications          A100 80GB SXM           A800 80GB SXM
FP64                    9.7 TFLOPS              9.7 TFLOPS
FP64 Tensor Core        19.5 TFLOPS             19.5 TFLOPS
FP32                    19.5 TFLOPS             19.5 TFLOPS
Tensor Float 32         156 TFLOPS              156 TFLOPS
BFLOAT16 Tensor Core    312 TFLOPS              312 TFLOPS
FP16 Tensor Core        312 TFLOPS              312 TFLOPS
INT8 Tensor Core        624 TOPS                624 TOPS
GPU Memory              80 GB HBM2e             80 GB HBM2e
GPU Memory Bandwidth    2,039 GB/s              2,039 GB/s
TDP                     400 W                   400 W
Multi-Instance GPU      Up to 7 MIGs @ 10 GB    Up to 7 MIGs @ 10 GB
Interconnect            NVLink : 600 GB/s       NVLink : 400 GB/s
                        PCIe Gen4 : 64 GB/s     PCIe Gen4 : 64 GB/s
Server Options          4 / 8 / 16 GPUs         4 / 8 GPUs

NVIDIA A100 vs A800 : 40GB PCIe Version

Specifications          A100 40GB PCIe          A800 40GB PCIe
FP64                    9.7 TFLOPS              9.7 TFLOPS
FP64 Tensor Core        19.5 TFLOPS             19.5 TFLOPS
FP32                    19.5 TFLOPS             19.5 TFLOPS
Tensor Float 32         156 TFLOPS              156 TFLOPS
BFLOAT16 Tensor Core    312 TFLOPS              312 TFLOPS
FP16 Tensor Core        312 TFLOPS              312 TFLOPS
INT8 Tensor Core        624 TOPS                624 TOPS
GPU Memory              40 GB HBM2              40 GB HBM2
GPU Memory Bandwidth    1,555 GB/s              1,555 GB/s
TDP                     250 W                   250 W
Multi-Instance GPU      Up to 7 MIGs @ 5 GB     Up to 7 MIGs @ 5 GB
Interconnect            NVLink : 600 GB/s       NVLink : 400 GB/s
                        PCIe Gen4 : 64 GB/s     PCIe Gen4 : 64 GB/s
Server Options          1-8 GPUs                1-8 GPUs

 

Please Support My Work!

Support my work through a bank transfer /  PayPal / credit card!

Name : Adrian Wong
Bank Transfer : CIMB 7064555917 (Swift Code : CIBBMYKL)
Credit Card / Paypal : https://paypal.me/techarp

Dr. Adrian Wong has been writing about tech and science since 1997, even publishing a book with Prentice Hall called Breaking Through The BIOS Barrier (ISBN 978-0131455368) while in medical school.

He continues to devote countless hours every day writing about tech, medicine and science, in his pursuit of facts in a post-truth world.

 


 

Support Tech ARP!

Please support us by visiting our sponsors, participating in the Tech ARP Forums, or donating to our fund. Thank you!

IBM z16 : Industry’s First Quantum-Safe System Explained!

IBM just introduced the z16 system, powered by their new Telum processor with an integrated AI accelerator!

Take a look at the z16, and find out why it is the industry’s first quantum-safe system!

 

IBM z16 : Industry’s First Quantum-Safe System!

On 25 April 2022, IBM officially unveiled their new z16 system in Malaysia – the industry’s first quantum-safe system.

IBM Vice President for Worldwide Sales of IBM Z and LinuxONE, Jose Castano, flew to Kuala Lumpur, to give us an exclusive briefing on the new z16 system, and tell us why it is the industry’s first quantum-safe system.

IBM Z and LinuxONE Security CTO Michael Jordan also briefed us on why quantum-safe computing will be critical for enterprises, as quantum computing improves.

Thanks to its Telum processor, the IBM z16 system delivers low and consistent latency for embedding AI into response time-sensitive transactions. This can enable customers to leverage AI inference to better control the outcome of transactions before they complete.

For example, they can leverage AI inference to mitigate risk in clearing and settlement applications – predicting which transactions carry high risk exposure, and flagging questionable transactions before they result in costly consequences.

In a use-case example, one international bank uses AI on IBM Z as part of its credit card authorisation process, instead of using an off-platform inference solution. As a result, the bank can detect fraud during the credit card transaction authorisation itself.

The IBM z16 will offer better AI inference capacity, thanks to its integrated AI accelerator with latency as low as 1 millisecond, expanding use cases that include :

  • tax fraud and organised retail theft detection
  • real-time payments and alternative payment methods, including cryptocurrencies
  • faster business and consumer loan approvals

As the industry’s first quantum-safe system, the IBM z16 is protected by lattice-based cryptography – an approach for constructing security primitives that helps protect data and systems against current and future threats.

 

IBM z16 : Powered By The New Telum Processor!

The IBM z16 is built around the new IBM Telum processor, which is specifically designed for secure processing, and real-time AI inference.

Here are the key features of the IBM Telum processor that powers the new IBM z16 system :

  • Fabricated on the 7 nm process technology
  • Has 8 processor cores, clocked at over 5 GHz
  • Each processor core has a dedicated 32 MB private L2 cache
  • The eight 32 MB L2 caches can combine to form a 256 MB virtual L3 cache per chip, and a 2 GB virtual L4 cache across a system.
  • Transparent encryption of main memory, with 8-channel fault tolerant memory interface
  • Integrated AI accelerator with 6 TFLOPS compute capacity
  • Centralised AI accelerator architecture, with direct connection to the cache infrastructure

The Telum processor is designed to enable extremely low-latency inference for response-time-sensitive workloads. With planned system support for up to 200 TFLOPS, its AI acceleration is also designed to scale up to the requirements of the most demanding workloads.

Thanks to the Telum processor, the IBM z16 can process 300 billion inference requests per day, with just one millisecond of latency.
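
That claim is worth a quick arithmetic sanity check – here is a trivial Python sketch using only the figures IBM provided :

```python
# IBM's claim : 300 billion inference requests per day, at 1 ms latency.
requests_per_day = 300e9
seconds_per_day = 24 * 60 * 60  # 86,400 seconds

throughput = requests_per_day / seconds_per_day
print(f"~{throughput / 1e6:.2f} million inferences per second")  # ~3.47 million/s

# At ~3.5 million inferences/s with ~1 ms latency each, thousands of
# requests must be in flight at once - which is why the AI accelerator
# hangs directly off the cache fabric instead of an I/O bus.
```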

 


 


AMD Instinct MI100 : 11.5 TFLOPS In A Single Card!

AMD just announced the Instinct MI100 – the world’s fastest HPC GPU accelerator, delivering 11.5 TFLOPS of FP64 performance in a single card!

 

AMD Instinct MI100 : 11.5 TFLOPS In A Single Card!

Powered by the new CDNA architecture, the AMD Instinct MI100 is the world’s fastest HPC GPU, and the first to break the 10 TFLOPS FP64 barrier!

Compared to the last-generation AMD accelerators, the AMD Instinct MI100 offers HPC applications almost 3.5X faster performance (FP32 matrix), and AI applications nearly a 7X boost in throughput (FP16). Specifically, it delivers :

  • up to 11.5 TFLOPS of FP64 performance for HPC
  • up to 46.1 TFLOPS of FP32 Matrix performance for AI and machine learning
  • up to 184.6 TFLOPS of FP16 performance for AI training

2nd Gen AMD Infinity Fabric

It also leverages 2nd Gen AMD Infinity Fabric technology to deliver twice the peer-to-peer I/O bandwidth of PCI Express 4.0. Thanks to its triple Infinity Fabric Links, it offers up to 340 GB/s of aggregate bandwidth per card.

In a server, MI100 GPUs can be configured as two fully-connected quad-GPU hives, each providing up to 552 GB/s of peer-to-peer I/O bandwidth.

Ultra-Fast HBM2 Memory

The AMD Instinct MI100 comes with 32 GB of HBM2 memory that delivers up to 1.23 TB/s of memory bandwidth, to support large datasets.

PCI Express 4.0 Interface

The AMD Instinct MI100 supports PCI Express 4.0, allowing for up to 64 GB/s of peak bandwidth between CPU and GPU, when paired with 2nd Gen AMD EPYC processors.

AMD Instinct MI100 : Specifications

Specifications       AMD Instinct MI100
Fab Process          7 nm
Compute Units        120
Stream Processors    7,680
Peak BFLOAT16        92.3 TFLOPS
Peak INT4 | INT8     184.6 TOPS
Peak FP16            184.6 TFLOPS
Peak FP32            46.1 TFLOPS
Peak FMA32           23.1 TFLOPS
Peak FP64 | FMA64    11.5 TFLOPS
Memory               32 GB HBM2
Memory Interface     4,096 bits
Memory Clock         1.2 GHz
Memory Bandwidth     1.2 TB/s
Reliability          Full Chip ECC, RAS Support
Scalability          3 x Infinity Fabric Links
OS Support           Linux 64-bit
Bus Interface        PCIe Gen 3 / Gen 4
Board Form Factor    Full Height, Dual Slot
Board Length         10.5 inches
Cooling              Passively Cooled
Max Board Power      300 W TDP
Warranty             3-Years Limited
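
Those peak numbers are internally consistent, and you can reproduce them from the stream processor count in the table above. Note that the ~1,502 MHz boost clock in this Python sketch is my own assumption – it is not listed by AMD, but it is the value that makes the published figures line up :

```python
# Deriving the MI100 peak rates from its configuration.
stream_processors = 7_680   # 120 CUs x 64 stream processors
clock_ghz = 1.502           # assumed boost clock (not in AMD's table)

fma32 = stream_processors * 2 * clock_ghz / 1000      # 1 FMA = 2 FLOPs
print(f"Peak FMA32       : {fma32:.1f} TFLOPS")       # ~23.1
print(f"Peak FP64        : {fma32 / 2:.1f} TFLOPS")   # ~11.5 (half rate)
print(f"Peak FP32 Matrix : {fma32 * 2:.1f} TFLOPS")   # ~46.1 (Matrix Cores)
print(f"Peak FP16 Matrix : {fma32 * 8:.1f} TFLOPS")   # ~184.6 (Matrix Cores)
```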

 

AMD Instinct MI100 : Availability

The AMD Instinct MI100 will be available in systems by the end of 2020 from OEM/ODM partners like Dell, Gigabyte, Hewlett Packard Enterprise (HPE), and Supermicro.

 



AMD CDNA Architecture : Tech Highlights!

In addition to the gaming-centric RDNA architecture, AMD just introduced a new CDNA architecture that is optimised for compute workloads.

Here are some key tech highlights of the new AMD CDNA architecture!

 

AMD CDNA Architecture : What Is It?

Unlike the fixed-function graphics accelerators of the past, modern GPUs are fully-programmable accelerators, using what’s called the GPGPU (General Purpose GPU) architecture.

GPGPU allowed the industry to leverage the GPU’s tremendous processing power for machine learning and scientific computing purposes.

Instead of continuing down the GPGPU path, AMD has decided to introduce two architectures :

  • AMD RDNA : optimised for gaming to maximise frames per second
  • AMD CDNA : optimised for compute workloads, to maximise FLOPS.

Designed to accelerate compute workloads, AMD CDNA augments scalar and vector processing with new Matrix Core Engines, and adds Infinity Fabric technology for scale-up capability.

This allows the first CDNA-based accelerator – the AMD Instinct MI100 – to break the 10 TFLOPS (FP64) barrier.

The GPU is connected to its host processor using a PCI Express 4.0 interface, which delivers up to 32 GB/s of bandwidth in each direction.

 

AMD CDNA Architecture : Compute Units

The command processor and scheduling logic receives API-level commands and translates them into compute tasks.

These compute tasks are implemented as compute arrays, and managed by four Asynchronous Compute Engines (ACEs), each of which maintains its own independent stream of commands to the compute units.

Its 120 compute units (CUs) are derived from the earlier GCN architecture, and organised into four compute engines; each CU executes wavefronts of 64 work-items.

The CUs are, however, enhanced with new Matrix Core Engines, that are optimised for matrix data processing.

Here is the block diagram of the AMD Instinct MI100 accelerator, showing how its main blocks are all tied together with the on-die Infinity Fabric.

Unlike the RDNA architecture, CDNA removes all of the fixed-function graphics hardware for tasks like rasterisation, tessellation, graphics caches, blending and even the display engine.

CDNA retains the dedicated logic for HEVC, H.264 and VP9 decoding that is sometimes used for compute workloads that operate on multimedia data.

The new Matrix Core Engines add a new family of wavefront-level instructions – Matrix Fused Multiply-Add (MFMA). MFMA instructions perform mixed-precision arithmetic on KxN matrices, using four different types of input data (see the sketch after this list) :

  • INT8 – 8-bit integers
  • FP16 – 16-bit half-precision
  • bf16 – 16-bit brain FP
  • FP32 – 32-bit single-precision
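
To see what mixed-precision arithmetic means in practice, here is a toy Python/NumPy sketch of the MFMA pattern – low-precision FP16 inputs, full FP32 accumulation. This only illustrates the arithmetic; it is not actual CDNA code :

```python
import numpy as np

# Mixed-precision matrix fused multiply-add, in the spirit of MFMA :
# D = A x B + C, with FP16 inputs and FP32 accumulation.

def mfma_like(a_fp16, b_fp16, c_fp32):
    # Inputs are stored in FP16, but every product is accumulated in
    # FP32, avoiding the error build-up of pure FP16 arithmetic.
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32) + c_fp32

a = np.random.rand(16, 16).astype(np.float16)
b = np.random.rand(16, 16).astype(np.float16)
c = np.zeros((16, 16), dtype=np.float32)

d = mfma_like(a, b, c)
print(d.dtype)  # float32 - the accumulator stays in single precision
```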

The new Matrix Core Engines have several advantages over the traditional vector pipelines in GCN :

  • the execution unit reduces the number of register file reads, since many input values are reused in a matrix multiplication
  • narrower datatypes create opportunities to save energy on workloads that do not require full FP32 precision, e.g. machine learning.

 

AMD CDNA Architecture : L2 Cache + Memory

Most scientific and machine learning data sets are gigabytes or even terabytes in size, so L2 cache and memory performance are critical.

In CDNA, the L2 cache is shared across the entire chip, and physically partitioned into multiple slices.

The MI100, specifically, has an 8 MB L2 cache that is 16-way set-associative and made up of 32 slices. Each slice can sustain 128 bytes per clock, for an aggregate bandwidth of over 6 TB/s across the GPU.

The CDNA memory controller can drive 4- or 8-high stacks of HBM2 memory at 2.4 GT/s, for a maximum throughput of 1.23 TB/s.
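
Both bandwidth figures fall out of simple multiplication. In this quick Python check, the ~1.5 GHz clock applied to the L2 slices is my assumption; the HBM2 numbers come straight from the paragraphs above :

```python
# L2 aggregate bandwidth : 32 slices x 128 bytes per clock.
slices, bytes_per_clock = 32, 128
clock_ghz = 1.5  # assumed GPU clock
l2_tb_s = slices * bytes_per_clock * clock_ghz / 1000
print(f"L2 aggregate : ~{l2_tb_s:.1f} TB/s")    # ~6.1 TB/s

# HBM2 bandwidth : 4,096-bit interface at 2.4 GT/s.
bus_bits, data_rate_gt_s = 4096, 2.4
hbm_gb_s = bus_bits / 8 * data_rate_gt_s
print(f"HBM2 peak    : {hbm_gb_s:.1f} GB/s")    # 1,228.8 GB/s = ~1.23 TB/s
```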

The memory contents are also protected by hardware ECC.

 

AMD CDNA Architecture : Communication + Scaling

CDNA is also designed for scaling up, using the high-speed Infinity Fabric technology to connect multiple GPUs.

AMD Infinity Fabric links are 16 bits wide and operate at 23 GT/s, with three links in CDNA to allow for full connectivity in a quad-GPU configuration.
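
Those link parameters line up with the 552 GB/s quad-GPU hive figure quoted for the AMD Instinct MI100 earlier. Here is the quick arithmetic, assuming – as AMD appears to – that aggregate bandwidth counts both directions of every link :

```python
# Per-link bandwidth : 16 bits wide at 23 GT/s.
link_bits, data_rate_gt_s = 16, 23
one_way_gb_s = link_bits / 8 * data_rate_gt_s   # 46 GB/s per direction
bidir_gb_s = one_way_gb_s * 2                   # 92 GB/s per link

# Fully connecting 4 GPUs requires C(4,2) = 6 links.
print(f"Quad-GPU hive aggregate : {bidir_gb_s * 6:.0f} GB/s")  # 552 GB/s
```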

While the last-generation Radeon Instinct MI50 GPU only used a ring topology, the new fully-connected Infinity Fabric topology boosts performance for common communication patterns like all-reduce and scatter / gather.

Unlike PCI Express, Infinity Fabric links support coherent GPU memory, which lets multiple GPUs share an address space and tightly work on a single task.

 



FIVE Dell AI Experience Zones Launched Across APJ!

In partnership with Intel, Dell Technologies announced the launch of five Dell AI Experience Zones across the APJ region!

Here is a quick primer on the new Dell AI Experience Zones, and what they mean for organisations in the APJ region!

 

The APJ Region – Ripe For Artificial Intelligence

According to the Dell Technologies Digital Transformation Index, Artificial Intelligence (AI) will be amongst the top spending priorities for business leaders in APJ.

Half of those surveyed plan to invest in AI in the next one to three years, as part of their digital transformation strategy. However, 95% of companies face a lack of in-house expertise in AI.

This is where the five new Dell AI Experience Zones come in…

 

The Dell AI Experience Zones

The new AI Experience Zones are designed to offer both customers and partners a comprehensive look at the latest AI technologies and solutions.

Built into the existing Dell Technologies Customer Solution Centres, they will showcase how the Dell EMC High-Performance Computing (HPC) and AI ecosystem can help them address business challenges and seize opportunities.

All five AI Experience Zones are equipped with technology demonstrations built around the latest Dell EMC PowerEdge servers. Powered by the latest Intel Xeon Scalable processors, they are paired with advanced, open-source AI software like the Intel OpenVINO toolkit, as well as Dell EMC networking and storage technologies.

Customers and partners who choose to leverage the new AI Experience Zones will receive help in kickstarting their AI initiatives, from design and AI expert engagements, to masterclass training, installation and maintenance.

“The timely adoption of AI will create new opportunities that will deliver concrete business advantages across all industries and business functions,” says Chris Kelly, vice president, Infrastructure Solutions Group, Dell Technologies, APJ.

“Companies looking to thrive in a data-driven era need to understand that investments in AI are no longer optional – they are business critical. Whilst complex in nature, it is imperative that companies quickly start moving from theoretical AI strategies to practical deployments to stay ahead of the curve.”

 

Dell AI Experience Zones In APJ

The five new AI Experience Zones that Dell Technologies and Intel announced are located within the Dell Technologies Customer Solution Centres in these cities :

  • Bangalore
  • Seoul
  • Singapore
  • Sydney
  • Tokyo

 



NVIDIA TITAN V – The First Desktop Volta Graphics Card!

NVIDIA CEO Jensen Huang (recently anointed Fortune’s 2017 Businessperson of the Year) made a surprise reveal at the NIPS conference – the NVIDIA TITAN V. This is the first desktop graphics card to be built on the latest NVIDIA Volta microarchitecture, and the first to use HBM2 memory.

In this article, we will share with you everything we know about the NVIDIA TITAN V, and how it compares against its TITANic predecessors. We will also share with you what we think could be a future NVIDIA TITAN Vp graphics card!

Updated @ 2017-12-10 : Added a section on gaming with the NVIDIA TITAN V.

Originally posted @ 2017-12-09

 

NVIDIA Volta

NVIDIA Volta isn’t exactly new. Back at GTC 2017, NVIDIA revealed NVIDIA Volta, the NVIDIA GV100 GPU, and the first NVIDIA Volta-powered product – the NVIDIA Tesla V100. Jensen even highlighted the Tesla V100 in his Computex 2017 keynote, more than 6 months ago!

Yet there had been no desktop GPU built around NVIDIA Volta. NVIDIA continued to churn out new graphics cards built around the Pascal architecture – the GeForce GTX 1080 Ti and GeForce GTX 1070 Ti. That changed with the NVIDIA TITAN V.

 

NVIDIA GV100

The NVIDIA GV100 is the first NVIDIA Volta-based GPU, and the largest they have ever built. Even using the latest 12 nm FFN (FinFET NVIDIA) process, it is still a massive chip at 815 mm²! Compare that to the GP100 (610 mm² @ 16 nm FinFET) and GK110 (552 mm² @ 28 nm).

That’s because the GV100 is built using a whopping 21.1 billion transistors. In addition to 5376 CUDA cores and 336 Texture Units, it boasts 672 Tensor cores and 6 MB of L2 cache. All those transistors require a whole lot more power – to the tune of 300 W.


 

The NVIDIA TITAN V

That’s V for Volta… not the Roman numeral V or V for Vendetta. Powered by the NVIDIA GV100 GPU, the TITAN V has 5120 CUDA cores, 320 Texture Units, 640 Tensor cores, and a 4.5 MB L2 cache. It is paired with 12 GB of HBM2 memory (3 x 4GB stacks) running at 850 MHz.
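
From those numbers alone, you can estimate the TITAN V’s single-precision throughput. This Python sketch uses the 1455 MHz boost clock from the comparison table further down, and the usual simplification that every CUDA core retires one FMA (two FLOPs) per clock :

```python
# Peak FP32 throughput = CUDA cores x 2 FLOPs (FMA) x boost clock.
cuda_cores = 5120
boost_ghz = 1.455  # from the specification table below

fp32_tflops = cuda_cores * 2 * boost_ghz / 1000
print(f"Peak FP32 : ~{fp32_tflops:.1f} TFLOPS")  # ~14.9 TFLOPS
```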

The blowout picture of the NVIDIA TITAN V reveals even more details :

  • It has 3 DisplayPorts and one HDMI port.
  • It has 6-pin + 8-pin PCIe power inputs.
  • It has 16 power phases, and what appears to be the Founders Edition copper heatsink and vapour chamber cooler, with a gold-coloured shroud.
  • There is no SLI connector, only what appears to be an NVLink connector.

Here are more pictures of the NVIDIA TITAN V, courtesy of NVIDIA.

 

Can You Game On The NVIDIA TITAN V? New!

Right after Jensen announced the TITAN V, the inevitable question was raised on the Internet – can it run Crysis / PUBG?

The NVIDIA TITAN V is the most powerful GPU for the desktop PC, but that does not mean you can actually use it to play games. NVIDIA notably did not mention anything about gaming, only that the TITAN V is “ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.”


In fact, the TITAN V is not listed in their GeForce Gaming section. The most powerful graphics card in the GeForce Gaming section remains the TITAN Xp.

Then again, the TITAN V uses the same NVIDIA Game Ready Driver as GeForce gaming cards, starting with version 388.59. Even so, it is possible that some or many games may not run well or properly on the TITAN V.

Of course, all this is speculative in nature. All that remains to crack this mystery is for someone to buy the TITAN V and use it to play some games!


The NVIDIA TITAN V Specification Comparison

Let’s take a look at the known specifications of the NVIDIA TITAN V, compared to the TITAN Xp (launched earlier this year), and the TITAN X (launched late last year). We also inserted the specifications of a hypothetical NVIDIA TITAN Vp, based on a full GV100.

Specifications       Future TITAN Vp?    NVIDIA TITAN V      NVIDIA TITAN Xp     NVIDIA TITAN X
Microarchitecture    NVIDIA Volta        NVIDIA Volta        NVIDIA Pascal       NVIDIA Pascal
GPU                  GV100               GV100               GP102-400           GP102-400
Process Technology   12 nm FinFET+       12 nm FinFET+       16 nm FinFET        16 nm FinFET
Die Size             815 mm²             815 mm²             471 mm²             471 mm²
Tensor Cores         672                 640                 None                None
CUDA Cores           5376                5120                3840                3584
Texture Units        336                 320                 240                 224
ROPs                 NA                  NA                  96                  96
L2 Cache Size        6 MB                4.5 MB              3 MB                4 MB
GPU Core Clock       NA                  1200 MHz            1405 MHz            1417 MHz
GPU Boost Clock      NA                  1455 MHz            1582 MHz            1531 MHz
Texture Fillrate     NA                  384.0-465.6 GT/s    355.2-379.7 GT/s    317.4-342.9 GT/s
Pixel Fillrate       NA                  NA                  142.1-151.9 GP/s    136.0-147.0 GP/s
Memory Type          HBM2                HBM2                GDDR5X              GDDR5X
Memory Size          NA                  12 GB               12 GB               12 GB
Memory Bus           3072-bit            3072-bit            384-bit             384-bit
Memory Clock         NA                  850 MHz             1426 MHz            1250 MHz
Memory Bandwidth     NA                  652.8 GB/s          547.7 GB/s          480.0 GB/s
TDP                  300 watts           250 watts           250 watts           250 watts
Multi GPU            NVLink              NVLink              SLI                 SLI
Launch Price         NA                  US$ 2999            US$ 1200            US$ 1200
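
You can verify the memory bandwidth row from the bus width and memory clock. In this Python sketch, the data-rate multipliers – 2X for HBM2 (double data rate) and 8X for GDDR5X (8n prefetch) – are standard for these memory types, but note that they are my addition, not values from the table :

```python
# Bandwidth (GB/s) = bus width (bytes) x effective data rate (GT/s).
def mem_bw_gb_s(bus_bits: int, clock_mhz: float, multiplier: int) -> float:
    return bus_bits / 8 * clock_mhz * multiplier / 1000

print(f"TITAN V  (HBM2)   : {mem_bw_gb_s(3072, 850, 2):.1f} GB/s")   # 652.8
print(f"TITAN Xp (GDDR5X) : {mem_bw_gb_s(384, 1426, 8):.1f} GB/s")   # ~547.6
print(f"TITAN X  (GDDR5X) : {mem_bw_gb_s(384, 1250, 8):.1f} GB/s")   # 480.0
```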

 

The NVIDIA TITAN Vp?

In case you are wondering, the TITAN Vp does not exist. It is merely a hypothetical future model that we think NVIDIA may introduce mid-cycle, like the NVIDIA TITAN Xp.

Our TITAN Vp is based on the full capabilities of the NVIDIA GV100 GPU. That means it will have 5376 CUDA cores with 336 Texture Units, 672 Tensor cores and 6 MB of L2 cache. It will also have a higher TDP of 300 watts.


 

The Official NVIDIA TITAN V Press Release

December 9, 2017 – NVIDIA today introduced TITAN V, the world’s most powerful GPU for the PC, driven by the world’s most advanced GPU architecture, NVIDIA Volta.

Announced by NVIDIA founder and CEO Jensen Huang at the annual NIPS conference, TITAN V excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, and extreme energy efficiency.

“Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” said Huang. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. I can’t wait to see their breakthrough discoveries.”

NVIDIA Supercomputing GPU Architecture, Now for the PC

TITAN V’s Volta architecture features a major redesign of the streaming multiprocessor that is at the center of the GPU. It doubles the energy efficiency of the previous generation Pascal design, enabling dramatic boosts in performance in the same power envelope.

New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads with a mix of computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improve performance while also simplifying programming.

Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customised for NVIDIA, TITAN V also incorporates Volta’s highly tuned 12GB HBM2 memory subsystem for advanced memory bandwidth utilisation.

 

Free AI Software on NVIDIA GPU Cloud


TITAN V’s incredible power is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.

Users of TITAN V can gain immediate access to the latest GPU-optimised AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. This container registry includes NVIDIA-optimised deep learning frameworks, third-party managed HPC applications, NVIDIA HPC visualisation tools and the NVIDIA TensorRT inferencing optimiser.

More Details : Now Everyone Can Use NVIDIA GPU Cloud!

 

Immediate Availability

TITAN V is available to purchase today for US$2,999 from the NVIDIA store in participating countries.


Mellanox Technologies Expands Presence In Malaysia

KUALA LUMPUR, April 12, 2017 – Mellanox Technologies, Ltd. (NASDAQ: MLNX) today unveiled its expansion plans for Malaysia. The announcement, which is in line with the country’s ambitions of becoming the leading Big Data Analytics (BDA) solutions hub in South East Asia, reiterated Mellanox’s commitment to Malaysia through its strategic investment roadmap.

 

Mellanox Technologies Expands Presence In Malaysia

“Malaysia’s investment in Big Data, data centers and the Cloud is impressive,” said Charlie Foo, Vice President and General Manager, Asia Pacific Japan, Mellanox Technologies. “With a year-over-year growth of more than 20 percent in the last five years, the field of digital data management is maturing rapidly. Mellanox’s investment in Malaysia looks to complement Malaysia’s advancing digital economy by providing intelligent 10, 25, 40, 50 and 100Gb/s interconnect solutions that serve today’s and future needs in Malaysia. This will enable organizations to be less concerned about today’s technological demands while concentrating on running their business, resulting in unparalleled operating efficiency for these organizations.”

Mellanox’s investment into Malaysia’s digital economy comes at a time when the country is ramping up its efforts to see its ICT roadmap to fruition. The country’s ICT custodian, the Malaysia Digital Economy Corporation (MDEC), noted that MSC Malaysia – a national initiative designed to attract world-class technology companies to the country – reported U.S. $3.88 billion in export sales in 2015, representing an 18 percent increase over 2014.

Today, the MSC Malaysia footprint has expanded to include 42 locations across the country, hosting more than 3,800 companies from more than 40 countries, and employing more than 150,000 high-income knowledge workers, 85 percent of whom are Malaysians. This has propelled Malaysia to a top-three ranking in AT Kearney’s Global Services Location Index since 2005, with only China and India ahead of Malaysia.


Mellanox’s Open Ethernet switch family delivers the highest performance and port density with a complete chassis and fabric management solution, enabling converged data centers to operate at any scale while reducing operational costs and infrastructure complexity.

Mellanox InfiniBand solutions have already been chosen to accelerate large High Performance Computing (HPC) deployments in Malaysia. HPC customers use supercomputers and parallel processing techniques to solve complex computational problems, and to perform research through computer modeling, simulation and analysis. These HPC customers span various industries, including education, bioscience, government, finance, media and entertainment, oil and gas, pharmaceuticals and manufacturing.

The company is actively seeking partnerships and collaboration opportunities to support customers from different industries, primarily within Big Data, data centers and the Cloud.

 


New Dell HPC Systems & Collaborations Launched

Kuala Lumpur, 4 July 2016 – Dell has announced advancements to its high performance computing (HPC) portfolio, including the availability of new Dell HPC Systems, and technology partner collaborations for early access to innovative HPC technologies.

“While traditional HPC has been critical to research programs that enable scientific and societal advancement, Dell is mainstreaming these capabilities to support enterprises of all sizes as they seek a competitive advantage in an ever increasing digital world,” said William Tan, head of Enterprise Solutions, Dell Malaysia. “As a clear leader in HPC, Dell now offers customers highly flexible, precision built HPC systems for multiple vertical industries based upon years of experience powering the world’s most advanced academic and research institutions. With Dell HPC Systems, our customers can deploy HPC systems more quickly and cost effectively and accelerate their speed of innovation to deliver both breakthroughs and business results.”

 

Dell HPC Systems Portfolio Simplifies Powerful, Traditional HPC System for Enterprises of All Sizes

Available in Malaysia and globally, the Dell HPC Systems portfolio is a family of HPC and data analytics solutions that combine the flexibility of customised HPC systems with the speed, simplicity and reliability of pre-configured systems. Dell engineers and domain experts designed and tuned the new systems for specific science, manufacturing and analytics workloads with fully tested and validated building block systems, backed by a single point of hardware support and additional service options across the solution lifecycle.

With simplified configuration and ordering, organisations can more quickly select and deploy updated Dell HPC Systems at any scale today. As an Intel Scalable System Framework configuration, these systems, available today, include the latest Intel Xeon processor families, support for Intel Omni-Path Architecture (Intel OPA) fabric, and software in the Dell HPC Lustre Storage and Dell HPC NFS Storage solutions:

  • Dell HPC System for Life Sciences – Designed to meet the needs of life sciences organisations, this enables bioinformatics and genomics centers to deliver results and identify treatments in clinically relevant timeframes while maintaining compliance and protecting confidential data.
  • Dell HPC System for Manufacturing – Enables manufacturing and engineering customers to run complex design simulations, including structural analysis and computational fluid dynamics.
  • Dell HPC System for Research – Enables research centers to quickly develop HPC systems that match the unique needs of a wide variety of workloads, involving complex scientific analysis.

 

Dell Leads HPC Technology Advancements with Industry Partners to Help Accelerate Customer Innovation Cycles

Dell has instituted a customer early access program for early development and testing, in preparation for Dell’s next server offering in the HPC solutions portfolio – the Dell PowerEdge C6320p server, which will be available in the second half of 2016 with the Intel Xeon Phi processor (formerly code-named Knights Landing). The PowerEdge C6320p’s unique server engineering and design will enable customers to:

  • Gain insights faster with a modular building block design, engineered to deliver faster insights for data-intensive computations and scale-up parallel processing.
  • Accelerate performance in dense and highly parallel HPC environments with 72 cores that are specifically optimised for parallel computing.
  • Simplify and automate systems management with the integrated Dell Remote Access Controller 8 (iDRAC8) with Lifecycle Controller. Customers can deploy, monitor and update PowerEdge C6320p servers faster and ensure higher levels of service and availability.

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin has partnered with Dell and Intel to deploy an upgrade to its Stampede supercomputing cluster with Intel Xeon Phi processors and Intel OPA via Dell’s early access program.


Stampede, one of the main clusters for the Extreme Science and Engineering Discovery Environment (XSEDE), is a multi-use cyberinfrastructure resource offering large memory, large data transfer, and GPU capabilities for data-intensive, accelerated or visualisation computing, for thousands of projects ranging from cancer cure research to severe weather modeling.

This month, the U.S. National Science Foundation awarded US$30 million to TACC to acquire and deploy Stampede 2 as a strategic national resource to provide HPC capabilities for thousands of researchers in the U.S. The new Dell HPC System is expected to deliver a peak performance of up to 18 petaflops, more than twice the system performance of the current Stampede system. Three and a half years since its installation, Stampede ranks as the 12th most powerful supercomputer in the world, according to the June 2016 TOP500 list.

Additionally, Dell continues to bring HPC capabilities to mainstream enterprises through a series of evolving solutions and services designed to deliver a range of HPC-as-a-Service capabilities, giving HPC sites a choice of local or remote management services with deployment on-premise, off-premise or a hybrid of the two.


NVIDIA Tesla P100 For PCIe-Based Servers Overview

On June 20, 2016, NVIDIA officially unveiled their Tesla P100 accelerator for PCIe-based servers. This is a long-expected PCI Express variant of the Tesla P100 accelerator that was launched in April using the NVIDIA NVLink interconnect. Let’s check out what’s new!

 

NVIDIA Tesla P100

The NVIDIA Tesla P100 was originally unveiled at the GPU Technology Conference on April 5, 2016. Touted as the world’s most advanced hyperscale data center accelerator, it was built around the new NVIDIA Pascal architecture and the proprietary NVIDIA NVLink high-speed GPU interconnect.

Like all other Pascal-based GPUs, the NVIDIA Tesla P100 is fabricated on the 16 nm FinFET process technology. Even with the much smaller process technology, the Tesla P100 is the largest FinFET chip ever built.

Unlike the Pascal-based GeForce GTX 1080 and GTX 1070 GPUs designed for desktop gaming though, the Tesla P100 uses HBM2 memory. In fact, the P100 die is mounted together with its HBM2 memory chips in a single package. This new packaging technology, Chip on Wafer on Substrate (CoWoS), allows for a 3X boost in memory bandwidth, to 720 GB/s.
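
As a rough check of that figure, here is the arithmetic in Python. The 4096-bit bus (four 1024-bit HBM2 stacks) is the published GP100 configuration; the ~1.4 GT/s data rate is my assumption :

```python
# HBM2 bandwidth : 4096-bit bus across four 1024-bit stacks.
bus_bits = 4096
data_rate_gt_s = 1.4  # assumed ~0.7 GHz clock, double data rate

print(f"~{bus_bits / 8 * data_rate_gt_s:.0f} GB/s")  # ~717 GB/s, in line with the ~720 GB/s claim
```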

The NVIDIA NVLink interconnect allows up to eight Tesla P100 accelerators to be linked in a single node. This allows a single Tesla P100-based server node to outperform 48 dual-socket CPU server nodes.

 

Now Available With PCIe Interface

To make Tesla P100 available for HPC (High Performance Computing) applications, NVIDIA has just introduced the Tesla P100 with a PCI Express interface. This is basically the PCI Express version of the original Tesla P100.

 

Massive Leap In Performance

Such High Performance Computing servers can already make use of NVIDIA Tesla K80 accelerators, which are based on the older NVIDIA Kepler architecture. The new NVIDIA Pascal architecture, coupled with much faster HBM2 memory, allows for a massive leap in performance. Check out these results that NVIDIA provided :

Ultimately, the NVIDIA Tesla P100 for PCIe-based servers promises to deliver “dramatically more” performance for your money. As a bonus, the energy cost of running Tesla P100-based servers is much lower than CPU-based servers, and those savings accrue over time.


 

Two Configurations

The NVIDIA Tesla P100 for PCIe-based servers will be slightly (~11-12%) slower than the NVLink version, turning out up to 4.7 teraflops of double-precision performance, 9.3 teraflops of single-precision performance, and 18.7 teraflops of half-precision performance.

The Tesla P100 will be offered in two configurations. The high-end configuration will have 16 GB of HBM2 memory with a maximum memory bandwidth of 720 GB/s. The lower-end configuration will have 12 GB of HBM2 memory with a maximum memory bandwidth of 540 GB/s.
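
Those three figures follow Pascal’s usual 1 : 2 : 4 precision ratios, and you can roughly reconstruct them from the GP100’s published core count. The ~1.3 GHz boost clock in this Python sketch is my assumption – NVIDIA did not state it here :

```python
# Tesla P100 (PCIe) : FP64 : FP32 : FP16 = 1 : 2 : 4 on Pascal.
cuda_cores = 3584   # GP100, PCIe variant
boost_ghz = 1.30    # assumed boost clock

fp32 = cuda_cores * 2 * boost_ghz / 1000  # 2 FLOPs per FMA
print(f"FP32 : ~{fp32:.1f} TFLOPS")       # ~9.3
print(f"FP64 : ~{fp32 / 2:.1f} TFLOPS")   # ~4.7
print(f"FP16 : ~{fp32 * 2:.1f} TFLOPS")   # ~18.6 (NVIDIA quotes 18.7)
```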

 

Complete NVIDIA Slides

For those who are interested in more details, here are the NVIDIA Tesla P100 for PCIe-based Servers slides.
