Tag Archives: HPC

AMD Instinct MI100 : 11.5 TFLOPS In A Single Card!

AMD just announced the Instinct MI100 – the world’s fastest HPC GPU accelerator, delivering 11.5 TFLOPS in a single card!

 

AMD Instinct MI100 : 11.5 TFLOPS In A Single Card!

Powered by the new CDNA architecture, the AMD Instinct MI100 is the world’s fastest HPC GPU, and the first to break the 10 TFLOPS FP64 barrier!

Compared to the last-generation AMD accelerators, the AMD Instinct MI100 offers HPC applications almost 3.5X faster performance (FP32 matrix), and AI applications a nearly 7X boost in throughput (FP16).

  • up to 11.5 TFLOPS of FP64 performance for HPC
  • up to 46.1 TFLOPS of FP32 Matrix performance for AI and machine learning
  • up to 184.6 TFLOPS of FP16 performance for AI training

2nd Gen AMD Infinity Fabric

It also leverages the 2nd Gen AMD Infinity Fabric technology to deliver twice the peer-to-peer IO bandwidth of PCI Express 4.0. Thanks to its three Infinity Fabric Links, it offers up to 340 GB/s of aggregate bandwidth per card.

In a server, MI100 GPUs can be configured as two fully-connected quad GPU hives, each providing up to 552 GB/s of P2P IO bandwidth.

Ultra-Fast HBM2 Memory

The AMD Instinct MI100 comes with 32 GB of HBM2 memory that delivers up to 1.23 TB/s of memory bandwidth to support large datasets.

PCI Express 4.0 Interface

The AMD Instinct MI100 supports PCI Express 4.0, allowing for up to 64 GB/s of peak bandwidth from CPU to GPU when paired with 2nd Gen AMD EPYC processors.

AMD Instinct MI100 : Specifications

Specifications AMD Instinct MI100
Fab Process 7 nm
Compute Units 120
Stream Processors 7,680
Peak BFLOAT16 92.3 TFLOPS
Peak INT4 | INT8 184.6 TOPS
Peak FP16 184.6 TFLOPS
Peak FP32 46.1 TFLOPS
Peak FMA32 23.1 TFLOPS
Peak FP64 | FMA64 11.5 TFLOPS
Memory 32 GB HBM2
Memory Interface 4,096 bits
Memory Clock 1.2 GHz
Memory Bandwidth 1.2 TB/s
Reliability Full Chip ECC, RAS Support
Scalability 3 x Infinity Fabric Links
OS Support Linux 64-bit
Bus Interface PCIe Gen 3 / Gen 4
Board Form Factor Full Height, Dual Slot
Board Length 10.5-inch long
Cooling Passively Cooled
Max Board Power 300 W TDP
Warranty 3-Years Limited
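
The peak figures in the table follow from the stream processor count and the clock speed. Here is a rough sketch of that arithmetic. It assumes a peak engine clock of 1,502 MHz, which is AMD's published figure but is not stated in this article, and the per-format multipliers are simply inferred from the ratios in the table, so treat them as illustrative.

```python
# Sketch: derive the MI100 peak-throughput figures from the spec table.
# ASSUMPTION: 1,502 MHz peak engine clock (not given in the article).
# ASSUMPTION: per-format multipliers are inferred from the table's own ratios.

COMPUTE_UNITS = 120
STREAM_PROCESSORS_PER_CU = 64      # 120 x 64 = 7,680 stream processors
PEAK_CLOCK_GHZ = 1.502             # assumed peak engine clock


def peak_fma32_tflops() -> float:
    """One fused multiply-add (2 FLOPs) per stream processor per clock."""
    stream_processors = COMPUTE_UNITS * STREAM_PROCESSORS_PER_CU
    return stream_processors * 2 * PEAK_CLOCK_GHZ / 1000


# Throughput of each format relative to vector FP32 (FMA32)
MULTIPLIERS = {
    "FP64": 0.5,
    "FMA32": 1,
    "FP32 matrix": 2,
    "BFLOAT16": 4,
    "FP16": 8,
}


def peak_table() -> dict:
    """Peak TFLOPS per format, rounded to one decimal like the spec table."""
    base = peak_fma32_tflops()
    return {name: round(base * m, 1) for name, m in MULTIPLIERS.items()}


if __name__ == "__main__":
    for name, tflops in peak_table().items():
        print(f"{name:12s} {tflops} TFLOPS")
```

Under those assumptions, the results line up with the 11.5, 23.1, 46.1, 92.3 and 184.6 TFLOPS entries above.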

 

AMD Instinct MI100 : Availability

The AMD Instinct MI100 will be available in systems by the end of 2020 from OEM/ODM partners like Dell, Gigabyte, Hewlett Packard Enterprise (HPE), and Supermicro.

 


Support Tech ARP!

If you like our work, you can help support us by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


AMD CDNA Architecture : Tech Highlights!

In addition to the gaming-centric RDNA architecture, AMD just introduced a new CDNA architecture that is optimised for compute workloads.

Here are some key tech highlights of the new AMD CDNA architecture!

 

AMD CDNA Architecture : What Is It?

Unlike the fixed-function graphics accelerators of the past, GPUs are now fully-programmable accelerators using what’s called the GPGPU (General Purpose GPU) Architecture.

GPGPU allowed the industry to leverage their tremendous processing power for machine learning and scientific computing purposes.

Instead of continuing down the GPGPU path, AMD has decided to introduce two architectures :

  • AMD RDNA : optimised for gaming to maximise frames per second
  • AMD CDNA : optimised for compute workloads to maximise FLOPS.

Designed to accelerate compute workloads, AMD CDNA augments scalar and vector processing with new Matrix Core Engines, and adds Infinity Fabric technology for scale-up capability.

This allows the first CDNA-based accelerator – AMD Instinct MI100 – to break the 10 TFLOPS (FP64) barrier.

The GPU is connected to its host processor using a PCI Express 4.0 interface that delivers up to 32 GB/s of bandwidth in each direction.

 

AMD CDNA Architecture : Compute Units

The command processor and scheduling logic receive API-level commands and translate them into compute tasks.

These compute tasks are implemented as compute arrays and managed by the four Asynchronous Compute Engines (ACE), each of which maintains its own independent stream of commands to the compute units.

Its 120 compute units (CUs) are derived from the earlier GCN architecture, and organised into four compute engines that execute wavefronts that contain 64 work-items.

The CUs are, however, enhanced with new Matrix Core Engines that are optimised for matrix data processing.

Here is the block diagram of the AMD Instinct MI100 accelerator, showing how its main blocks are all tied together with the on-die Infinity Fabric.

Unlike the RDNA architecture, CDNA removes all of the fixed-function graphics hardware for tasks like rasterisation, tessellation, graphics caches, blending and even the display engine.

CDNA retains the dedicated logic for HEVC, H.264 and VP9 decoding that is sometimes used for compute workloads that operate on multimedia data.

The new Matrix Core Engines add a new family of wavefront-level instructions – the Matrix Fused Multiply-Add (MFMA). The MFMA instructions perform mixed-precision arithmetic and operate on KxN matrices using four different types of input data :

  • INT8 – 8-bit integers
  • FP16 – 16-bit half-precision
  • BF16 – 16-bit brain floating point (bfloat16)
  • FP32 – 32-bit single-precision

The new Matrix Core Engines have several advantages over the traditional vector pipelines in GCN :

  • the execution unit reduces the number of register file reads, since many input values are reused in a matrix multiplication
  • narrower datatypes create opportunity for workloads that do not require full FP32 precision, e.g. machine learning – saving energy.
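
To make the two advantages above more concrete, here is a minimal sketch of what a matrix fused multiply-add computes (D = A x B + C), with inputs rounded to FP16 and accumulated at a wider precision. It only illustrates the arithmetic and the operand reuse. It is not AMD's MFMA instruction behaviour, the function names are hypothetical, and Python's double-precision float merely stands in for the wider accumulator.

```python
# Sketch of mixed-precision matrix fused multiply-add: D = A x B + C.
# ASSUMPTION: illustrative only; names and block sizes are hypothetical,
# and Python floats (FP64) stand in for the wider hardware accumulator.
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def mfma(a, b, c):
    """Multiply FP16-rounded A (m x k) by B (k x n), accumulating onto C."""
    m, k, n = len(a), len(b), len(b[0])
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = [row[:] for row in c]          # the accumulator starts as a copy of C
    for i in range(m):
        for j in range(n):
            for p in range(k):
                d[i][j] += a16[i][p] * b16[p][j]
    return d


def operand_reads(m: int, n: int, k: int):
    """Input fetches if every FMA reads its operands vs. reading each once."""
    per_fma = 2 * m * n * k       # one A value and one B value per multiply-add
    per_block = m * k + k * n     # each A and B element fetched once, then reused
    return per_fma, per_block
```

For a hypothetical 4 x 4 x 4 block, `operand_reads(4, 4, 4)` gives 128 fetches for the per-FMA approach against 32 when each input element is read once and reused, which is the kind of register file saving the first bullet describes.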

 

AMD CDNA Architecture : L2 Cache + Memory

Most scientific and machine learning data sets are gigabytes or even terabytes in size. Therefore L2 cache and memory performance is critical.

In CDNA, the L2 cache is shared across the entire chip, and physically partitioned into multiple slices.

The MI100, specifically, has an 8 MB cache that is 16-way set-associative and made up of 32 slices. Each slice can sustain 128 bytes per clock, for an aggregate bandwidth of over 6 TB/s across the GPU.

The CDNA memory controller can drive HBM2 stacks that are four or eight dies high at 2.4 GT/s, for a maximum throughput of 1.23 TB/s.

The memory contents are also protected by hardware ECC.
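
The bandwidth figures in this section can be checked with some simple arithmetic. The sketch below uses the 4,096-bit memory interface from the MI100 specifications, and assumes that each L2 slice moves 128 bytes per clock at a 1,502 MHz peak engine clock. The clock speed is AMD's published figure, not something stated in this article.

```python
# Sketch: reproduce the HBM2 and L2 cache bandwidth figures.
# ASSUMPTION: each L2 slice moves 128 bytes every clock.
# ASSUMPTION: 1,502 MHz peak engine clock (not given in the article).


def hbm2_bandwidth_gbs(bus_width_bits: int, transfer_rate_gts: float) -> float:
    """Bytes per transfer across the bus x transfers per second, in GB/s."""
    return bus_width_bits / 8 * transfer_rate_gts


def l2_bandwidth_tbs(slices: int, bytes_per_clock: int, clock_ghz: float) -> float:
    """Aggregate L2 bandwidth in TB/s if every slice is busy each clock."""
    return slices * bytes_per_clock * clock_ghz / 1000


hbm2 = hbm2_bandwidth_gbs(4096, 2.4)     # about 1,228.8 GB/s, or 1.23 TB/s
l2 = l2_bandwidth_tbs(32, 128, 1.502)    # a little over 6 TB/s
```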

 

AMD CDNA Architecture : Communication + Scaling

CDNA is also designed for scaling up, using the high-speed Infinity Fabric technology to connect multiple GPUs.

AMD Infinity Fabric links are 16-bits wide, and operate at 23 GT/s, with three links in CDNA to allow for full connectivity in a quad-GPU configuration.

While the last generation Radeon Instinct MI50 GPU only uses a ring topology, the new fully-connected Infinity Fabric topology boosts performance for common communication patterns like all-reduce and scatter / gather.

Unlike PCI Express, Infinity Fabric links support coherent GPU memory, which lets multiple GPUs share an address space and tightly work on a single task.
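
The link width and transfer rate are enough to work out the 340 GB/s per-card and 552 GB/s per-hive figures quoted for the MI100. This is a rough sketch that assumes both directions of each link are counted, and that the per-card aggregate includes the 64 GB/s PCI Express 4.0 interface. That breakdown is our reading of the numbers, not an AMD statement.

```python
# Sketch: Infinity Fabric bandwidth arithmetic from link width and rate.
# ASSUMPTION: bandwidth is counted in both directions, and the per-card
# aggregate includes the 64 GB/s PCIe 4.0 x16 interface.
from math import comb

LINK_WIDTH_BITS = 16
LINK_RATE_GTS = 23
LINKS_PER_GPU = 3
PCIE4_X16_GBS = 64          # bidirectional peak


def link_bandwidth_gbs() -> float:
    """Bidirectional bandwidth of one Infinity Fabric link, in GB/s."""
    one_way = LINK_WIDTH_BITS * LINK_RATE_GTS / 8    # 46 GB/s per direction
    return one_way * 2


def card_aggregate_gbs() -> float:
    """Three Infinity Fabric links plus the PCIe interface."""
    return LINKS_PER_GPU * link_bandwidth_gbs() + PCIE4_X16_GBS


def hive_p2p_gbs(gpus: int = 4) -> float:
    """A fully-connected hive has one link for every pair of GPUs."""
    return comb(gpus, 2) * link_bandwidth_gbs()
```

With four GPUs there are six GPU pairs, so each GPU needs exactly three links to reach the other three, which is why three links are enough for a fully-connected quad-GPU hive.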

 



AMD EPYC : Four Supercomputers In Top 50, Ten In Top 500!

AMD is on a roll, announcing more supercomputing wins for their 2nd Gen EPYC processors, including four supercomputers in the top 50 list, and ten in the top 500!

 

2nd Gen AMD EPYC : A Quick Primer

The 2nd Gen AMD EPYC family of server processors is based on the AMD Zen 2 microarchitecture and fabricated on the latest 7 nm process technology.

According to AMD, they offer up to 90% better integer performance and up to 79% better floating-point performance than the competing Intel Xeon Platinum 8280 processor.

Here is a quick 7.5 minute summary of the 2nd Gen EPYC product presentations by Dr. Lisa Su, Mark Papermaster and Forrest Norrod!

 

AMD EPYC : Four Supercomputers In Top 50, Ten In Top 500!

Thanks to the greatly improved performance of their 2nd Gen EPYC processors, they now power four supercomputers in the top 50 list :

Top 50 Rank | Supercomputer | Processor
7 | Selene (NVIDIA DGX A100 SuperPOD) | AMD EPYC 7742
30 | Belenos (Atos BullSequana XH2000) | AMD EPYC 7H12
34 | Joliot-Curie (Atos BullSequana XH2000) | AMD EPYC 7H12
48 | Mahti (Atos BullSequana XH2000) | AMD EPYC 7H12

On top of those four supercomputers, there are six other supercomputers in the Top 500 ranking powered by AMD EPYC.

In addition to powering supercomputers, AMD EPYC 7742 processors will soon power Gigabyte servers selected by CERN to handle data from their Large Hadron Collider (LHC).

 

3rd Gen AMD EPYC Supercomputers

AMD also announced that two universities will deploy Dell EMC PowerEdge servers powered by the upcoming 3rd Gen AMD EPYC processors.

Indiana University

Indiana University will deploy Jetstream 2 – an eight-petaflop distributed cloud computing system, powered by the upcoming 3rd Gen AMD EPYC processors.

Jetstream 2 will be used by researchers in a variety of fields like AI, social sciences and COVID-19 research.

Purdue University

Purdue University will deploy Anvil – a supercomputer powered by the upcoming 3rd Gen AMD EPYC processors, for use in a wide range of computational and data-intensive research.

AMD EPYC will also power Purdue University’s community cluster “Bell”, scheduled for deployment in the fall.

 



Dell EMC Ready Solutions for AI + vHPC on VMware vSphere!

Dell Technologies just introduced Dell EMC Ready solutions for both AI and virtualised HPC workloads on VMware vSphere 7!

Join us for the tech briefing on both new Dell EMC computing solutions for VMware, and find out how they can simplify your advanced computing needs!

 

Simplified Advanced Computing With Dell EMC Ready Solutions

Let’s start with the Dell Technologies briefing on the two new Dell EMC Ready solutions for both AI and virtualised HPC workloads.

Based on VMware Cloud Foundation, they are designed to make AI easier to deploy and consume, with new features from VMware vSphere 7, including Bitfusion.

 

 

Dell EMC Ready Solutions for AI : GPU-as-a-Service (GaaS)

GPUs in individual workstations or servers are often under-utilised at less than 15% of capacity. The new Dell EMC Ready Solutions for AI : GPU-as-a-Service fixes that and maximises your investment with virtual GPU pools.

The newest design includes the latest VMware vSphere 7 with Bitfusion, making it possible to virtualise GPUs on-premise. Factory-installed by Dell, VMware vSphere 7 with Bitfusion will let developers and data scientists pool IT resources and share them across datacenters.

Dell EMC Ready Solutions for AI : GPU-as-a-Service also uses the latest VMware Cloud Foundation with VMware vSphere 7 support for Kubernetes and containerised applications to run AI workloads anywhere. Containers make it easier to bring cloud-native applications into production, with the ability to move workloads.

 

Dell EMC Ready Solutions for Virtualised HPC

Most HPC workloads run on dedicated systems that require specialised skills to deploy and manage. Dell EMC Ready Solutions for Virtualised HPC can include VMware Cloud Foundation with VMware vSphere 7 featuring Bitfusion.

That should make it simpler and more economical to use VMware environments for HPC and AI applications in computational chemistry, bioinformatics and computer-aided engineering. IT teams can quickly provision hardware as needed, speed up initial deployment and configuration, and save time with simpler centralised management and security.

For very large HPC implementations, Dell EMC Ready Solutions for vHPC can include VMware vSphere Scale-Out Edition for additional cost savings.

 

Dell EMC OpenManage for Dell EMC Ready Solutions

The new Dell EMC Ready Solutions for AI and Virtualised HPC ship with the Dell EMC OpenManage systems management software, which helps administrators improve system uptime, keep data insights flowing and prepare for AI operations.

New Dell EMC OpenManage improvements include :

  • OpenManage Integration for VMware vCenter, supporting vSphere Lifecycle Manager, automates software, driver and firmware updates holistically to save time and simplify operations.
  • The enhanced OpenManage Mobile app gives administrators the ability to view power and thermal policies, perform emergency power reduction and monitor internal storage from anywhere in the world.

 



2nd Gen EPYC – Everything You Need To Know Summarised!

Leveraging their new Zen 2 microarchitecture and 7 nm process technology, AMD just introduced their 2nd Gen EPYC processors.

Designed to challenge Intel Xeon in the enterprise, cloud and HPC markets, the 2nd Gen EPYC processors promise to deliver “record-setting performance”, while reducing TCO (Total Cost of Ownership) by up to 50%.

Here is everything you need to know about the new 2nd Gen EPYC processors… summarised!

 

The Official 2nd Gen EPYC Product Presentation Summary

Let’s start with a quick 7.5 minute summary of the 2nd Gen EPYC product presentations by Dr. Lisa Su, Mark Papermaster and Forrest Norrod!

Now, let’s take a look at its key features and specifications!

AMD Infinity Architecture Explained

The AMD Infinity Architecture is a fancy name for their new modular chiplet-based design. It allows them to combine up to eight processor dies with a single I/O die on the same package, faster and at lower cost.

The processor dies are fabricated with the industry-leading 7 nm process technology for the best performance at the lowest power consumption and thermal output.

The I/O die, on the other hand, can be fabricated on the much cheaper 14 nm process technology, with a much higher yield.

2nd Gen EPYC Is Built On 7nm

The 2nd Gen EPYC processor cores are fabricated on the 7nm process technology. This allows AMD to fit more transistors into a smaller space.

By doubling the transistor density, coupled with microarchitectural optimisations, the 2nd Gen EPYC delivers 4X the floating point performance of the 1st Gen EPYC processors.

The smaller process also increases energy efficiency, reducing both power consumption and heat output. According to AMD, 2nd Gen EPYC will use half the power of the 1st Gen EPYC at the same performance level.

Industry-Leading Performance

AMD claims they will offer up to 90% better integer performance and up to 79% better floating-point performance, than the competing Intel Xeon Platinum 8280 processor.

On top of significantly better performance per socket, they also come with hardware memory encryption, and a dedicated security processor.

Baked-In Security On Multiple Levels

The 2nd Gen EPYC processors have multiple levels of security features built in, to harden them against cyberattacks.

  • They have a secure root of trust designed to validate the initial BIOS boot without corruption.
    In virtualised environments, you can use it to cryptographically check that your entire software stack is booted without corruption.
  • They have memory encryption engines built into their memory channels to hardware-encrypt data in the memory, preventing cold boot attacks.
  • In the 2nd Gen EPYC, every virtual machine is now encrypted with one of up to 509 unique encryption keys known only to the processor.
    This protects your data even if a malicious VM finds its way into your virtual machine memory, or if a compromised hypervisor gains access into a guest VM.

2nd Gen EPYC Is PCI Express Gen 4 Ready!

Like the 3rd Gen Ryzen processors, the 2nd Gen EPYC is PCI Express Gen 4 ready.

PCIe 4.0 doubles the bandwidth over PCIe 3.0, and every EPYC processor has 128 lanes to tie together HPC clusters, or connect to GPU accelerators and NVMe drives.
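
Here is a quick sketch of what that doubling means in numbers. It uses the standard PCIe signalling rates of 8 GT/s for Gen 3 and 16 GT/s for Gen 4, both with 128b/130b encoding. Those rates come from the PCIe specifications rather than this article, and real-world throughput will be lower once protocol overhead is counted.

```python
# Sketch: PCIe bandwidth per lane and for a group of lanes, per direction.
# PCIe 3.0 and 4.0 both use 128b/130b encoding; 4.0 doubles the transfer rate.
# ASSUMPTION: raw link bandwidth only, protocol overhead is ignored.

TRANSFER_RATE_GTS = {"3.0": 8.0, "4.0": 16.0}
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_gbs(gen: str) -> float:
    """Usable bandwidth of one lane, in one direction, in GB/s."""
    return TRANSFER_RATE_GTS[gen] * ENCODING_EFFICIENCY / 8


def total_bandwidth_gbs(gen: str, lanes: int) -> float:
    """Bandwidth of a group of lanes, in one direction, in GB/s."""
    return lane_bandwidth_gbs(gen) * lanes


x16_gen4 = total_bandwidth_gbs("4.0", 16)     # roughly 31.5 GB/s per direction
all_lanes = total_bandwidth_gbs("4.0", 128)   # roughly 252 GB/s per direction
```

An x16 slot therefore offers roughly 32 GB/s in each direction, which is where the commonly quoted 64 GB/s bidirectional figure for PCIe 4.0 x16 comes from.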

 

2nd Gen EPYC Model, Specifications + Price Summary

For your convenience, we summarised the specifications and prices of the 2nd Gen EPYC models!

64-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7742 | 64 / 128 | 2.25 GHz | 3.4 GHz | 256 MB | 225 W | $6,950
EPYC 7702 | 64 / 128 | 2.0 GHz | 3.35 GHz | 256 MB | 200 W | $6,450
EPYC 7702P | 64 / 128 | 2.0 GHz | 3.35 GHz | 256 MB | 200 W | $4,425
48-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7642 | 48 / 96 | 2.3 GHz | 3.3 GHz | 256 MB | 225 W | $4,775
EPYC 7552 | 48 / 96 | 2.2 GHz | 3.3 GHz | 192 MB | 200 W | $4,025
32-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7542 | 32 / 64 | 2.9 GHz | 3.4 GHz | 128 MB | 225 W | $3,400
EPYC 7502 | 32 / 64 | 2.5 GHz | 3.35 GHz | 128 MB | 180 W | $2,600
EPYC 7502P | 32 / 64 | 2.5 GHz | 3.35 GHz | 128 MB | 180 W | $2,300
EPYC 7452 | 32 / 64 | 2.35 GHz | 3.35 GHz | 128 MB | 155 W | $2,205
24-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7402 | 24 / 48 | 2.8 GHz | 3.35 GHz | 128 MB | 180 W | $1,783
EPYC 7402P | 24 / 48 | 2.8 GHz | 3.35 GHz | 128 MB | 180 W | $1,250
EPYC 7352 | 24 / 48 | 2.3 GHz | 3.2 GHz | 128 MB | 155 W | $1,350
16-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7302 | 16 / 32 | 3.0 GHz | 3.3 GHz | 128 MB | 155 W | $978
EPYC 7302P | 16 / 32 | 3.0 GHz | 3.3 GHz | 128 MB | 155 W | $825
EPYC 7282 | 16 / 32 | 2.8 GHz | 3.2 GHz | 64 MB | 120 W | $650
12-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7272 | 12 / 24 | 2.9 GHz | 3.2 GHz | 64 MB | 120 W | $625
8-Core Models | Cores / Threads | Base Clock | Boost Clock | L3 Cache | TDP | 1K Price
EPYC 7262 | 8 / 16 | 3.2 GHz | 3.4 GHz | 128 MB | 155 W | $575
EPYC 7252 | 8 / 16 | 3.1 GHz | 3.2 GHz | 64 MB | 120 W | $475
EPYC 7232P | 8 / 16 | 3.1 GHz | 3.2 GHz | 32 MB | 120 W | $450

 

2nd Gen EPYC Is Already Changing The Industry

AMD appears to have shipped the 2nd Gen EPYC processors early to Google, where they were deployed in production servers for their internal datacenter infrastructure.

Google also plans to use the 2nd Gen EPYC processors in new general-purpose machines that are part of the Google Cloud Compute Engine.

Twitter has also announced that they are already using the 2nd Gen EPYC processors to reduce their datacenter TCO (total cost of ownership) by 25%.

 



FIVE Dell AI Experience Zones Launched Across APJ!

In partnership with Intel, Dell Technologies announced the launch of five Dell AI Experience Zones across the APJ region!

Here is a quick primer on the new Dell AI Experience Zones, and what they mean for organisations in the APJ region!

 

The APJ Region – Ripe For Artificial Intelligence

According to the Dell Technologies Digital Transformation Index, Artificial Intelligence (AI) will be amongst the top spending priorities for business leaders in APJ.

Half of those surveyed plan to invest in AI in the next one to three years, as part of their digital transformation strategy. However, 95% of companies face a lack of in-house expertise in AI.

This is where the five new Dell AI Experience Zones come in…

 

The Dell AI Experience Zones

The new AI Experience Zones are designed to offer both customers and partners a comprehensive look at the latest AI technologies and solutions.

Built into the existing Dell Technologies Customer Solution Centres, they will showcase how the Dell EMC High-Performance Computing (HPC) and AI ecosystem can help them address business challenges and seize opportunities.

All five AI Experience Zones are equipped with technology demonstrations built around the latest Dell EMC PowerEdge servers. Powered by the latest Intel Xeon Scalable processors, they are paired with advanced, open-source AI software like OpenVINO, as well as Dell EMC networking and storage technologies.

Customers and partners who choose to leverage the new AI Experience Zones will receive help in kickstarting their AI initiatives, from design and AI expert engagements, to masterclass training, installation and maintenance.

“The timely adoption of AI will create new opportunities that will deliver concrete business advantages across all industries and business functions,” says Chris Kelly, vice president, Infrastructure Solutions Group, Dell Technologies, APJ.

“Companies looking to thrive in a data-driven era need to understand that investments in AI are no longer optional – they are business critical. Whilst complex in nature, it is imperative that companies quickly start moving from theoretical AI strategies to practical deployments to stay ahead of the curve.”

 

Dell AI Experience Zones In APJ

The five new AI Experience Zones that Dell Technologies and Intel announced are located within the Dell Technologies Customer Solution Centres in these cities :

  • Bangalore
  • Seoul
  • Singapore
  • Sydney
  • Tokyo

 



NVIDIA TITAN V – The First Desktop Volta Graphics Card!

NVIDIA CEO Jensen Huang (recently anointed as Fortune 2017 Businessperson of the Year) made a surprise reveal at the NIPS conference – the NVIDIA TITAN V. This is the first desktop graphics card to be built on the latest NVIDIA Volta microarchitecture, and the first to use HBM2 memory.

In this article, we will share with you everything we know about the NVIDIA TITAN V, and how it compares against its TITANic predecessors. We will also share with you what we think could be a future NVIDIA TITAN Vp graphics card!

Updated @ 2017-12-10 : Added a section on gaming with the NVIDIA TITAN V [1].

Originally posted @ 2017-12-09

 

NVIDIA Volta

NVIDIA Volta isn’t exactly new. Back in GTC 2017, NVIDIA revealed NVIDIA Volta, the NVIDIA GV100 GPU and the first NVIDIA Volta-powered product – the NVIDIA Tesla V100. Jensen even highlighted the Tesla V100 in his Computex 2017 keynote, more than 6 months ago!

Yet there has been no desktop GPU built around NVIDIA Volta. NVIDIA continued to churn out new graphics cards built around the Pascal architecture – GeForce GTX 1080 Ti and GeForce GTX 1070 Ti. That changed with the NVIDIA TITAN V.

 

NVIDIA GV100

The NVIDIA GV100 is the first NVIDIA Volta-based GPU, and the largest they have ever built. Even using the latest 12 nm FFN (FinFET NVIDIA) process, it is still a massive chip at 815 mm²! Compare that to the GP100 (610 mm² @ 16 nm FinFET) and GK110 (552 mm² @ 28 nm).

That’s because the GV100 is built using a whopping 21.1 billion transistors. In addition to 5376 CUDA cores and 336 Texture Units, it boasts 672 Tensor cores and 6 MB of L2 cache. All those transistors require a whole lot more power – to the tune of 300 W.

 

The NVIDIA TITAN V

That’s V for Volta… not the Roman numeral V or V for Vendetta. Powered by the NVIDIA GV100 GPU, the TITAN V has 5120 CUDA cores, 320 Texture Units, 640 Tensor cores, and a 4.5 MB L2 cache. It is paired with 12 GB of HBM2 memory (3 x 4GB stacks) running at 850 MHz.

The blowout picture of the NVIDIA TITAN V reveals even more details :

  • It has 3 DisplayPorts and one HDMI port.
  • It has 6-pin + 8-pin PCIe power inputs.
  • It has 16 power phases, and what appears to be the Founders Edition copper heatsink and vapour chamber cooler, with a gold-coloured shroud.
  • There is no SLI connector, only what appears to be an NVLink connector.

Here are more pictures of the NVIDIA TITAN V, courtesy of NVIDIA.

 

Can You Game On The NVIDIA TITAN V? New!

Right after Jensen announced the TITAN V, the inevitable question was raised on the Internet – can it run Crysis / PUBG?

The NVIDIA TITAN V is the most powerful GPU for the desktop PC, but that does not mean you can actually use it to play games. NVIDIA notably did not mention anything about gaming, only that the TITAN V is “ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing”.

In fact, the TITAN V is not listed in their GeForce Gaming section. The most powerful graphics card in the GeForce Gaming section remains the TITAN Xp.

Then again, the TITAN V uses the same NVIDIA Game Ready Driver as GeForce gaming cards, starting with version 388.59. Even so, it is possible that some or many games may not run well or properly on the TITAN V.

Of course, all this is speculative in nature. All that remains to crack this mystery is for someone to buy the TITAN V and use it to play some games!


The NVIDIA TITAN V Specification Comparison

Let’s take a look at the known specifications of the NVIDIA TITAN V, compared to the TITAN Xp (launched earlier this year), and the TITAN X (launched late last year). We also inserted the specifications of a hypothetical NVIDIA TITAN Vp, based on a full GV100.

Specifications | Future TITAN Vp? | NVIDIA TITAN V | NVIDIA TITAN Xp | NVIDIA TITAN X
Microarchitecture | NVIDIA Volta | NVIDIA Volta | NVIDIA Pascal | NVIDIA Pascal
GPU | GV100 | GV100 | GP102-400 | GP102-400
Process Technology | 12 nm FinFET+ | 12 nm FinFET+ | 16 nm FinFET | 16 nm FinFET
Die Size | 815 mm² | 815 mm² | 471 mm² | 471 mm²
Tensor Cores | 672 | 640 | None | None
CUDA Cores | 5376 | 5120 | 3840 | 3584
Texture Units | 336 | 320 | 240 | 224
ROPs | NA | NA | 96 | 96
L2 Cache Size | 6 MB | 4.5 MB | 3 MB | 4 MB
GPU Core Clock | NA | 1200 MHz | 1405 MHz | 1417 MHz
GPU Boost Clock | NA | 1455 MHz | 1582 MHz | 1531 MHz
Texture Fillrate | NA | 384.0 GT/s to 465.6 GT/s | 355.2 GT/s to 379.7 GT/s | 317.4 GT/s to 342.9 GT/s
Pixel Fillrate | NA | NA | 142.1 GP/s to 151.9 GP/s | 136.0 GP/s to 147.0 GP/s
Memory Type | HBM2 | HBM2 | GDDR5X | GDDR5X
Memory Size | NA | 12 GB | 12 GB | 12 GB
Memory Bus | 3072-bit | 3072-bit | 384-bit | 384-bit
Memory Clock | NA | 850 MHz | 1426 MHz | 1250 MHz
Memory Bandwidth | NA | 652.8 GB/s | 547.7 GB/s | 480.0 GB/s
TDP | 300 watts | 250 watts | 250 watts | 250 watts
Multi GPU Capability | NVLink | NVLink | SLI | SLI
Launch Price | NA | US$ 2999 | US$ 1200 | US$ 1200
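
The derived TITAN V figures in the comparison (texture fillrate and memory bandwidth) follow directly from the unit counts and clocks. Here is a small sketch of the calculation, assuming one texel per texture unit per clock and two transfers per HBM2 memory clock. The function names are just for illustration.

```python
# Sketch: derive the TITAN V texture fillrate and memory bandwidth.
# ASSUMPTION: one texel per texture unit per clock, and double data rate
# HBM2 (two transfers per memory clock).


def texture_fillrate_gts(texture_units: int, clock_mhz: float) -> float:
    """Texture fillrate in gigatexels per second."""
    return texture_units * clock_mhz / 1000


def hbm2_bandwidth_gbs(bus_width_bits: int, memory_clock_mhz: float) -> float:
    """Memory bandwidth in GB/s for a double data rate HBM2 interface."""
    transfers_per_second = memory_clock_mhz * 1e6 * 2
    return bus_width_bits / 8 * transfers_per_second / 1e9


base_fillrate = texture_fillrate_gts(320, 1200)    # about 384.0 GT/s
boost_fillrate = texture_fillrate_gts(320, 1455)   # about 465.6 GT/s
bandwidth = hbm2_bandwidth_gbs(3072, 850)          # about 652.8 GB/s
```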

 

The NVIDIA TITAN Vp?

In case you are wondering, the TITAN Vp does not exist. It is merely a hypothetical future model that we think NVIDIA may introduce mid-cycle, like the NVIDIA TITAN Xp.

Our TITAN Vp is based on the full capabilities of the NVIDIA GV100 GPU. That means it will have 5376 CUDA cores with 336 Texture Units, 672 Tensor cores and 6 MB of L2 cache. It will also have a higher TDP of 300 watts.

 

The Official NVIDIA TITAN V Press Release

December 9, 2017—NVIDIA today introduced TITAN V, the world’s most powerful GPU for the PC, driven by the world’s most advanced GPU architecture, NVIDIA Volta.

Announced by NVIDIA founder and CEO Jensen Huang at the annual NIPS conference, TITAN V excels at computational processing for scientific simulation. Its 21.1 billion transistors deliver 110 teraflops of raw horsepower, 9x that of its predecessor, and extreme energy efficiency.

“Our vision for Volta was to push the outer limits of high performance computing and AI. We broke new ground with its new processor architecture, instructions, numerical formats, memory architecture and processor links,” said Huang. “With TITAN V, we are putting Volta into the hands of researchers and scientists all over the world. I can’t wait to see their breakthrough discoveries.”

NVIDIA Supercomputing GPU Architecture, Now for the PC

TITAN V’s Volta architecture features a major redesign of the streaming multiprocessor that is at the center of the GPU. It doubles the energy efficiency of the previous generation Pascal design, enabling dramatic boosts in performance in the same power envelope.

New Tensor Cores designed specifically for deep learning deliver up to 9x higher peak teraflops. With independent parallel integer and floating-point data paths, Volta is also much more efficient on workloads with a mix of computation and addressing calculations. Its new combined L1 data cache and shared memory unit significantly improve performance while also simplifying programming.

Fabricated on a new TSMC 12-nanometer FFN high-performance manufacturing process customised for NVIDIA, TITAN V also incorporates Volta’s highly tuned 12GB HBM2 memory subsystem for advanced memory bandwidth utilisation.

 

Free AI Software on NVIDIA GPU Cloud

TITAN V’s incredible power is ideal for developers who want to use their PCs to do work in AI, deep learning and high performance computing.

Users of TITAN V can gain immediate access to the latest GPU-optimised AI, deep learning and HPC software by signing up at no charge for an NVIDIA GPU Cloud account. This container registry includes NVIDIA-optimised deep learning frameworks, third-party managed HPC applications, NVIDIA HPC visualisation tools and the NVIDIA TensorRT inferencing optimiser.

More Details : Now Everyone Can Use NVIDIA GPU Cloud!

 

Immediate Availability

TITAN V is available to purchase today for US$2,999 from the NVIDIA store in participating countries.


NVIDIA Among 6 Companies In Exascale Computing Project

June 16, 2017 — NVIDIA is among six technology companies to receive funding from the U.S. Department of Energy’s Exascale Computing Project (ECP) to accelerate the development of next-generation supercomputers.

 

The Exascale Computing Project

The ECP mission is to facilitate the delivery of at least two exascale computing systems, with an aim to deliver at least one by 2021. Such systems would be approximately 50x more powerful than the nation’s fastest supercomputer, Titan, located at Oak Ridge National Laboratory, in use today.

The goal of the ECP PathForward programme is to find solutions that maximise the energy efficiency and overall performance of future large-scale supercomputers critical to areas such as national security, manufacturing, industrial competitiveness, and energy research.

In addition to performance, the DOE has ambitious goals for improving power efficiency, to achieve exascale performance using only 20-30 megawatts. By comparison, an exascale system built with CPUs alone could consume hundreds of megawatts.
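
To put the power target in perspective, here is a rough sketch of the efficiency an exascale system would need inside that power budget. It takes exascale to mean 10^18 floating-point operations per second, and the 5 GFLOPS per watt used for the CPU-only comparison is a purely hypothetical figure, chosen only to show how a system could end up in the hundreds of megawatts.

```python
# Sketch: the energy efficiency implied by the DOE exascale power target.
# ASSUMPTION: exascale = 1e18 FLOPS; the 5 GFLOPS/W CPU-only efficiency
# is a hypothetical illustration, not a measured figure.

EXAFLOPS = 1e18


def required_gflops_per_watt(power_megawatts: float) -> float:
    """Efficiency needed to reach one exaFLOPS within a power budget."""
    return EXAFLOPS / (power_megawatts * 1e6) / 1e9


def power_needed_megawatts(gflops_per_watt: float) -> float:
    """Power drawn by a one-exaFLOPS system at a given efficiency."""
    return EXAFLOPS / (gflops_per_watt * 1e9) / 1e6


at_20_mw = required_gflops_per_watt(20)     # about 50 GFLOPS per watt
at_30_mw = required_gflops_per_watt(30)     # about 33.3 GFLOPS per watt
cpu_only = power_needed_megawatts(5)        # about 200 MW at the assumed 5 GFLOPS/W
```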

 

NVIDIA In The Exascale Computing Project

NVIDIA has been researching and developing faster, more efficient GPUs for high performance computing for more than a decade. This is its sixth DOE research and development subcontract, which will help accelerate its efforts to develop highly efficient throughput computing technologies to ensure U.S. leadership in HPC.

NVIDIA’s R&D will focus on critical areas including energy-efficient GPU architectures and resilience. Its findings may be incorporated into future generation GPU architectures after Volta (which will be used in the DOE’s upcoming flagship Summit and Sierra supercomputers, scheduled to go online in 2018).

The DOE has placed a high priority on supercomputer research. Its PathForward technical requirements state, “The U.S. faces serious and urgent economic, environmental, and national security challenges based on energy, climate, and growing security threats. High performance computing is a requirement for addressing such challenges, and the need for the development of capable exascale computers has become critical for solving these problems.”

To facilitate and test its technology, NVIDIA research teams will collaborate closely with six national DOE laboratories: Argonne National Laboratory, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, and Sandia National Laboratories.


 

Support Tech ARP!

If you like our work, you can help support our work by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!

Mellanox Technologies Expands Presence In Malaysia

KUALA LUMPUR, April 12, 2017 – Mellanox Technologies, Ltd. (NASDAQ: MLNX) today unveiled its expansion plans for Malaysia. The announcement, which is in line with the country’s ambitions of becoming the leading Big Data Analytics (BDA) solutions hub in South East Asia, reiterated Mellanox’s commitment to Malaysia through its strategic investment roadmap.

 


“Malaysia’s investment in Big Data, data centers and the Cloud is impressive,” said Charlie Foo, Vice President and General Manager, Asia Pacific Japan, Mellanox Technologies. “With a year-over-year growth of more than 20 percent in the last five years, the field of digital data management is maturing rapidly. Mellanox’s investment in Malaysia looks to complement Malaysia’s advancing digital economy by providing intelligent 10, 25, 40, 50 and 100Gb/s interconnect solutions that serve today’s and future needs in Malaysia. This will enable organizations to be less concerned about today’s technological demands while concentrating on running their business, resulting in unparalleled operating efficiency for these organizations.”

Mellanox’s investment in Malaysia’s digital economy comes at a time when the country is ramping up its efforts to bring its ICT roadmap to fruition. The country’s ICT custodian, Malaysia Digital Economy Corporation (MDEC), noted that MSC Malaysia, a national initiative designed to attract world-class technology companies to the country, reported US$3.88 billion in export sales in 2015, an 18 percent increase over 2014.

Today, the MSC Malaysia footprint has expanded to include 42 locations across the country, hosting more than 3,800 companies from more than 40 countries and employing more than 150,000 high-income knowledge workers, 85 percent of whom are Malaysians. This has helped Malaysia hold a top-three ranking in AT Kearney’s Global Services Location Index since 2005, with only China and India ahead of it.


Mellanox’s Open Ethernet switch family delivers the highest performance and port density with a complete chassis and fabric management solution, enabling converged data centers to operate at any scale while reducing operational costs and infrastructure complexity.

Mellanox InfiniBand solutions have already been chosen to accelerate large High Performance Computing (HPC) customers in Malaysia. HPC customers use supercomputers and parallel processing techniques to solve complex computational problems and carry out research through computer modeling, simulation and analysis. These HPC customers span various industries including education, bioscience, government, finance, media and entertainment, oil and gas, pharmaceuticals and manufacturing.

The company is actively seeking partnerships and collaboration opportunities to support customers from different industries, primarily within Big Data, data centers and the Cloud.

 


New Dell HPC Systems & Collaborations Launched

Kuala Lumpur, 4 July 2016 – Dell has announced advancements to its high performance computing (HPC) portfolio, including the availability of new Dell HPC Systems and technology partner collaborations for early access to innovative HPC technologies.

“While traditional HPC has been critical to research programs that enable scientific and societal advancement, Dell is mainstreaming these capabilities to support enterprises of all sizes as they seek a competitive advantage in an ever increasing digital world,” said William Tan, head of Enterprise Solutions, Dell Malaysia. “As a clear leader in HPC, Dell now offers customers highly flexible, precision built HPC systems for multiple vertical industries based upon years of experience powering the world’s most advanced academic and research institutions. With Dell HPC Systems, our customers can deploy HPC systems more quickly and cost effectively and accelerate their speed of innovation to deliver both breakthroughs and business results.”

 

Dell HPC Systems Portfolio Simplifies Powerful, Traditional HPC System for Enterprises of All Sizes

Available in Malaysia and globally, the Dell HPC Systems portfolio is a family of HPC and data analytics solutions that combine the flexibility of customised HPC systems with the speed, simplicity and reliability of pre-configured systems. Dell engineers and domain experts designed and tuned the new systems for specific science, manufacturing and analytics workloads with fully tested and validated building block systems, backed by a single point of hardware support and additional service options across the solution lifecycle.

With simplified configuration and ordering, organisations can more quickly select and deploy updated Dell HPC Systems at any scale. As Intel Scalable System Framework configurations, these systems are available today and include the latest Intel Xeon processor families, support for Intel Omni-Path Architecture (Intel OPA) fabric, and software in the Dell HPC Lustre Storage and Dell HPC NFS Storage solutions:

  • Dell HPC System for Life Sciences – Designed to meet the needs of life sciences organisations, this system enables bioinformatics and genomics centers to deliver results and identify treatments in clinically relevant timeframes while maintaining compliance and protecting confidential data.
  • Dell HPC System for Manufacturing – Enables manufacturing and engineering customers to run complex design simulations, including structural analysis and computational fluid dynamics.
  • Dell HPC System for Research – Enables research centers to quickly develop HPC systems that match the unique needs of a wide variety of workloads, involving complex scientific analysis.

 

Dell Leads HPC Technology Advancements with Industry Partners to Help Accelerate Customer Innovation Cycles

Dell has instituted a customer early access program for early development and testing in preparation for its next server offering in the HPC solutions portfolio: the Dell PowerEdge C6320p server with the Intel Xeon Phi processor (formerly code-named Knights Landing), which will be available in the second half of 2016. The PowerEdge C6320p’s unique server engineering and design will enable customers to:

  • Gain insights faster with a modular building block design, engineered for data-intensive computations and scale-up parallel processing.
  • Accelerate performance in dense and highly parallel HPC environments with 72 cores that are specifically optimised for parallel computing.
  • Simplify and automate systems management with the integrated Dell Remote Access Controller 8 (iDRAC8) with Lifecycle Controller. Customers can deploy, monitor and update PowerEdge C6320p servers faster and ensure higher levels of service and availability.

The Texas Advanced Computing Center (TACC) at The University of Texas at Austin has partnered with Dell and Intel to deploy an upgrade to its Stampede supercomputing cluster with Intel Xeon Phi processors and Intel OPA via Dell’s early access program.


Stampede, one of the main clusters for the Extreme Science and Engineering Discovery Environment (XSEDE), is a multi-use cyberinfrastructure resource offering large memory, large data transfer, and GPU capabilities for data-intensive, accelerated or visualisation computing. It supports thousands of projects ranging from cancer cure research to severe weather modeling.

This month, the U.S. National Science Foundation awarded US$30 million to TACC to acquire and deploy Stampede 2 as a strategic national resource to provide HPC capabilities for thousands of researchers in the U.S. The new Dell HPC System is expected to deliver a peak performance of up to 18 petaflops, more than twice the system performance of the current Stampede system. Three and a half years since its installation, Stampede ranks as the 12th most powerful supercomputer in the world, according to the June 2016 TOP500 list.

Additionally, Dell continues to bring HPC capabilities to mainstream enterprises through a series of evolving solutions and services designed to deliver a range of HPC-as-a-Service capabilities, giving HPC sites a choice of local or remote management services with deployment on-premise, off-premise or a hybrid of the two.


 
