Tag Archives: PCI Express

AMD Smart Access Memory : How To Enable It?

AMD Smart Access Memory (Resizable BAR) Guide

Find out what AMD Smart Access Memory is all about, and how to turn it on for a FREE BOOST in performance!

 

Smart Access Memory : PCIe Resizable BAR for AMD!

Smart Access Memory is AMD’s marketing term for their implementation of the PCI Express Resizable BAR (Base Address Registers) capability.

What does that mean exactly?

CPUs are traditionally limited to a 256 MB I/O memory address region for the GPU frame buffer. Think of it as a “data dump” for stuff like textures, shaders and geometry.

Since this “data dump” is limited to 256 MB, the CPU can only send texture, shader and geometry data as and when the GPU requires them.

This introduces some latency – a delay between when the GPU requires the data and when the CPU sends it.

Turning on Resizable BAR or Smart Access Memory greatly expands the size of that data dump, letting the CPU directly access the GPU’s entire frame buffer memory.

Instead of transferring data when requested by the GPU, the CPU processes and stores the data directly in the graphics memory.

Graphics assets can be transferred to graphics memory in full, instead of in pieces. In addition, multiple transfers can occur simultaneously, instead of being queued up.

While AMD’s graphic suggests that Smart Access Memory will widen the memory path (and thus memory bandwidth) between the CPU and GPU, that is not true.

Smart Access Memory / Resizable BAR will not increase memory bandwidth.

What it does is let the CPU directly access the entire GPU frame buffer memory, instead of using the usual 256 MB “dump”. That reduces latency because the graphics assets are now accessible by the GPU at all times.
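If you happen to run Linux, you can get a rough idea of how large the CPU-visible window into your graphics card is by reading the card's `resource` file in sysfs. Here is a minimal sketch, assuming the usual layout of one `start end flags` line in hexadecimal per region. The PCI address in the example is hypothetical, so look up your own card's address with `lspci` first.

```python
from pathlib import Path

LEGACY_WINDOW = 256 * 1024 * 1024  # the traditional 256 MB window


def bar_sizes(resource_text):
    """Return the size in bytes of each populated region in a sysfs 'resource' file."""
    sizes = []
    for line in resource_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip anything that is not "start end flags"
        start, end, _flags = (int(field, 16) for field in fields)
        if start == 0 and end == 0:
            continue  # unused region slot
        sizes.append(end - start + 1)
    return sizes


def exceeds_legacy_window(resource_text):
    """True if any region is larger than the traditional 256 MB window."""
    return any(size > LEGACY_WINDOW for size in bar_sizes(resource_text))


if __name__ == "__main__":
    # Hypothetical PCI address, used here only for illustration.
    path = Path("/sys/bus/pci/devices/0000:03:00.0/resource")
    if path.exists():
        text = path.read_text()
        for size in bar_sizes(text):
            print(f"{size / 2**20:.0f} MB")
        print("Larger than 256 MB:", exceeds_legacy_window(text))
```

A region much larger than 256 MB (close to the card's full memory size) suggests that Resizable BAR is active, but treat this as a hint rather than a definitive test.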

 

AMD Smart Access Memory : Performance Gains

According to AMD, enabling Smart Access Memory will give you a small but free boost of 5% to 11% in gaming performance.

Here is a summary of the test results from our article, RX 6800 XT Smart Access Memory Performance Comparison!

You can expect up to 16% better performance in some games, but no effect in certain games. But overall, you get a free boost in performance. There is simply no reason not to enable Smart Access Memory.

1080p Resolution (1920 x 1080)

1440p Resolution (2560 x 1440)

2160p Resolution (3840 x 2160)

 

AMD Smart Access Memory : Requirements

Since Smart Access Memory is just an AMD implementation of PCI Express Resizable BAR, it can be implemented for all PCI Express 3.0 and PCI Express 4.0 graphics cards and motherboards.

However, AMD is currently limiting it to a small subset of components, having validated it only for their new Ryzen 5000 series CPUs, select Ryzen 3000 Series Processors and Radeon RX 6000 series graphics cards.

So this is what you currently require to enable AMD Smart Access Memory :

Hardware

Software

  • AMD Radeon Software Driver 20.11.2 or newer
  • Latest Motherboard BIOS (AMD AGESA 1.1.0.0 or newer)

AMD currently recommends these X570 motherboards, because they have updated BIOS available :

 

AMD Smart Access Memory : How To Enable It?

If you have all of those supported components above, and updated your motherboard BIOS, you need to manually enable Smart Access Memory.

Now, the method will vary from motherboard to motherboard, and it probably won’t even be called Smart Access Memory.

Instead, look for variations of Above 4G Decoding, or Resizing BAR, or Resizable BAR, or Re-Size BAR Support.

AMD Generic Method

AMD has provided these generic steps to enable Smart Access Memory :

  1. Enter the System BIOS by pressing <DEL> or <F12> during system startup.
  2. Navigate to the Advanced Settings or Advanced menu.
  3. Enable “Above 4G Decoding” and “Re-Size BAR Support”.
  4. Save the changes and restart the computer.

Step-by-Step Method For ASUS Crosshair VIII Hero

In our guide, we are using the ASUS CROSSHAIR VIII Hero (AMD X570) motherboard, as an example :

  1. First, you will need to turn off CSM (Compatibility Support Module), or make sure it’s disabled. Go to the Boot menu and look for a CSM / Compatibility Support Module option.

  2. Set CSM (Compatibility Support Module) to Disabled.

  3. Go to the Advanced menu and look for the PCI Subsystem. On other motherboards, look for PCIe / PCI Express configuration options.

  4. Enable Above 4G Decoding.

  5. This will give you access to the Re-Size BAR Support option. Set it to Auto.

  6. Now go to the Exit menu, and select Save Changes & Reset.

  7. It will ask you to confirm the changes. Just verify both, and click OK.

After the motherboard reboots, AMD Smart Access Memory (PCIe Resizable BAR) will be enabled for your Ryzen 5000 series CPU and Radeon RX 6000 series graphics card!

 

CSM Warning For GIGABYTE AORUS X570 Master

CSM is disabled by default for the ASUS, ASRock and MSI motherboards. However, it is enabled by default in the GIGABYTE AORUS X570 Master.

If you installed Windows without first turning CSM off, it will be configured as non-UEFI. It will NOT boot if you enable Resizable BAR Support (Smart Access Memory).

You will need to reinstall Windows with CSM support disabled.
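Before you flip the switch, it helps to know whether your current installation was booted in UEFI mode. On Windows, the System Information tool lists this under “BIOS Mode”. If you have a Linux installation or live USB on the same machine, the small sketch below does a similar check by looking for the `firmware/efi` directory that Linux exposes in sysfs when booted through UEFI. The function name and the `sysfs_root` parameter are our own, added only so the check can be tried against a test directory.

```python
import os


def booted_in_uefi(sysfs_root="/sys"):
    """Linux exposes <sysfs_root>/firmware/efi only when it was booted via UEFI."""
    return os.path.isdir(os.path.join(sysfs_root, "firmware", "efi"))


if __name__ == "__main__":
    mode = "UEFI" if booted_in_uefi() else "legacy BIOS / CSM"
    print(f"This system booted in {mode} mode")
```

Note that this only tells you how the running operating system was booted. It does not tell you how a separate Windows installation on the same PC was set up.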

 

 

Support Tech ARP!

If you like our work, you can help support us by visiting our sponsors, participating in the Tech ARP Forums, or even donating to our fund. Any help you can render is greatly appreciated!


Smart Access Memory Now Enabled For Ryzen 3000 CPUs!

AMD just enabled Smart Access Memory for select Ryzen 3000 desktop processors!

Find out what Smart Access Memory does, and how to enable it for a FREE boost in performance!

 

Smart Access Memory Now Enabled For Ryzen 3000 CPUs!

When AMD launched the Radeon RX 6700 XT graphics card, they also mentioned that Smart Access Memory is now enabled for Ryzen 3000 desktop processors, except :

  • AMD Ryzen 5 3400G
  • AMD Ryzen 3 3200G

This would give those older processors a small but FREE boost in performance, when paired with Radeon RX 6000 series graphics cards and AMD 500 series motherboards.

To enable Smart Access Memory for your Ryzen 3000 / 5000 series PC, please follow the steps in our AMD Smart Access Memory (Resizable BAR) Guide!

Unfortunately, AMD has not enabled Smart Access Memory for Radeon RX 5000 series graphics cards, or AMD 400 series motherboards yet.

Recommended : AMD Smart Access Memory (Resizable BAR) Guide

 

Smart Access Memory : How Does It Boost Ryzen 3000 Performance?

Smart Access Memory is AMD’s marketing term for their implementation of the PCI Express Resizable BAR (Base Address Registers) capability.

What does that mean exactly?

CPUs are traditionally limited to a 256 MB I/O memory address region for the GPU frame buffer. Think of it as a “data dump” for stuff like textures, shaders and geometry.

Since this “data dump” is limited to 256 MB, the CPU can only send texture, shader and geometry data as and when the GPU requires them.

This introduces some latency – a delay between when the GPU requires the data and when the CPU sends it.

Turning on Resizable BAR or Smart Access Memory greatly expands the size of that data dump, letting the CPU directly access the GPU’s entire frame buffer memory.

Instead of transferring data when requested by the GPU, the CPU processes and stores the data directly in the graphics memory.

Graphics assets can be transferred to graphics memory in full, instead of in pieces. In addition, multiple transfers can occur simultaneously, instead of being queued up.

While AMD’s graphic suggests that Smart Access Memory will widen the memory path (and thus memory bandwidth) between the CPU and GPU, that is not true.

Smart Access Memory / Resizable BAR will not increase memory bandwidth.

What it does is let the CPU directly access the entire GPU frame buffer memory, instead of using the usual 256 MB “dump”. That reduces latency because the graphics assets are now accessible by the GPU at all times.

 

Smart Access Memory For Ryzen 3000 : Requirements

This is what you currently require to enable AMD Smart Access Memory for Ryzen 3000 desktop processors :

Hardware

Software

  • AMD Radeon Software Driver 20.11.2 or newer
  • Latest Motherboard BIOS (AMD AGESA 1.1.0.0 or newer)

AMD currently recommends these X570 motherboards, because they have updated BIOS available :

 



NVIDIA To Introduce Resizable BAR In February 2021!

NVIDIA just announced that they will introduce Resizable BAR support at the end of February 2021!

Find out what Resizable BAR is all about, and why it matters!

 

Resizable BAR : What Is It?

Resizable BAR is an optional PCI Express feature that can deliver a small but free boost in performance for the graphics card.

CPUs are traditionally limited to a 256 MB I/O memory address region for the GPU frame buffer. The CPU can only transfer data like textures, shaders and geometry to the GPU through that small 256 MB “window”.

Turning on Resizable BAR expands that small access window, letting the CPU directly access the GPU’s entire frame buffer memory.

Those graphics assets can thus be sent in full, instead of in pieces. In addition, multiple transfers can occur simultaneously, instead of being queued up.
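To get a feel for how small that 256 MB window is, here is a quick back-of-the-envelope sketch. It simply counts how many window-sized pieces it takes to cover an entire frame buffer. The 16 GB card size is an illustrative assumption, and the function name is our own.

```python
WINDOW_BYTES = 256 * 2**20  # the traditional 256 MB window


def windows_needed(vram_bytes, window_bytes=WINDOW_BYTES):
    """Number of window-sized pieces needed to cover the whole frame buffer."""
    return -(-vram_bytes // window_bytes)  # ceiling division


vram = 16 * 2**30  # illustrative 16 GB graphics card (assumption)
print(windows_needed(vram))  # 64
```

In other words, without Resizable BAR the CPU sees only 1/64th of such a card's memory at any one time.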

 

NVIDIA To Introduce Resizable BAR In February 2021!

AMD was first out the door with Resizable BAR in November 2020, launching it as Smart Access Memory.

It gave their Radeon RX 6800 XT graphics card a free performance boost of up to 16% in some games, but no effect in other games.

You can check out the performance difference in our article, RX 6800 XT Smart Access Memory Performance Comparison!

On 12 January 2021, NVIDIA announced that they will introduce Resizable BAR support in GeForce drivers from late February 2021 onwards.

It will be limited to their GeForce RTX 30 series graphics cards and laptops, but will work with both Intel and AMD CPUs.

The newly announced GeForce RTX 3060 will ship with support for Resizable BAR. However, older GeForce RTX 30 series cards will need to have their VBIOS updated from March 2021 onwards.

The motherboard must also be updated with Resizable BAR support. According to Intel, this will be limited to 11th Gen platforms, and select 10th Gen platforms.

So ironically, Resizable BAR will first work on GeForce RTX 30 series graphics cards paired with AMD Ryzen 5000 processors and AMD 500-series motherboards!

 



RX 6800 XT Smart Access Memory Performance Comparison!

Find out what AMD Smart Access Memory is all about, and how much of a performance effect it really has on the Radeon RX 6800 XT graphics card!

 

RX 6800 XT Smart Access Memory : How Does It Improve Performance?

Smart Access Memory is really a marketing term for AMD’s implementation of the PCI Express Resizable BAR (Base Address Registers) capability.

CPUs are traditionally limited to a 256 MB I/O memory address “window” for the GPU frame buffer.

Turning on Resizable BAR or Smart Access Memory expands that small access window, letting the CPU directly access the Radeon RX 6800 XT’s graphics memory.

While the AMD graphics above suggest that Smart Access Memory will widen the memory path, and thus memory bandwidth, between the CPU and GPU, that’s not true.

It does not increase memory bandwidth. Instead, it speeds up CPU to GPU communications, by letting the CPU directly access more of GPU memory, instead of using the usual 256 MB “window”.

Recommended : AMD Smart Access Memory – How To Enable It?

 

RX 6800 XT Smart Access Memory : 3DMark

The 3DMark benchmark results don’t show any significant performance difference, with Smart Access Memory enabled.

 

RX 6800 XT Smart Access Memory : Game Performance Summary

But let’s look at its effect on real-world gaming performance…

Let’s start with a bird’s eye look at the effect of Smart Access Memory on the Radeon RX 6800 XT’s performance.

For a more detailed look at Smart Access Memory’s effect on each game, please see the next page.

1080p Resolution (1920 x 1080)

At 1080p, Smart Access Memory improved frame rates by about 4.33% on average, but it did not always give the Radeon RX 6800 XT a performance boost.

It had virtually no performance effect in World War Z, The Division 2 and Star Control : Origins.

On the other hand, it delivered up to 16% better frame rates in Total War : Troy.

1440p Resolution (2560 x 1440)

Smart Access Memory had a bigger (5.22% average) effect on the Radeon RX 6800 XT at 1440p.

It had no effect in four games – Metro Exodus, World War Z, The Division 2 and Star Control : Origins.

But it delivered a large 10%-11% performance boost in F1 2019, Total War : Troy, Dirt 5 and Gears Tactics.

2160p Resolution (3840 x 2160)

At the 4K resolution though, the average performance boost from Smart Access Memory dropped to just 3.11%.

Most of the games had insignificant boosts in frame rates of 2-3%. Oddly enough, World War Z received a significant 4% boost in frame rate at 4K.

F1 2019 received the biggest boost from Smart Access Memory – a large 14% boost in frame rate!
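For readers who want to run the same comparison on their own PC, here is a minimal sketch of how a percentage boost and an average across games can be worked out from frame rates measured with the feature off and on. The frame rates below are hypothetical values for illustration, not our test results, and the function names are our own.

```python
def percent_gain(fps_off, fps_on):
    """Percentage change in frame rate when a feature is switched on."""
    return (fps_on - fps_off) / fps_off * 100.0


def average_gain(results):
    """Mean of per-game percentage gains; results is a list of (fps_off, fps_on)."""
    gains = [percent_gain(off, on) for off, on in results]
    return sum(gains) / len(gains)


# Hypothetical frame rates, NOT measurements from this article.
sample = [(100.0, 114.0), (80.0, 80.0), (120.0, 118.2)]
for off, on in sample:
    print(f"{off:.1f} -> {on:.1f} fps : {percent_gain(off, on):+.1f}%")
print(f"Average : {average_gain(sample):+.2f}%")
```

A simple mean like this treats every game equally, which is why one game with a large gain can pull the average up even when several games show no change.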



RX 6800 XT Smart Access Memory : Gaming Performance

F1 2019

F1 2019 really benefited from Smart Access Memory, with significant boosts in frame rates :

  • 1080p : +6.0%
  • 1440p : +10.8%
  • 2160p : +14.0%

Metro Exodus

On the other hand, Smart Access Memory had no effect on Metro Exodus.

World War Z

World War Z had uneven results with Smart Access Memory, with the greatest effect at 4K :

  • 1080p : -1.3%
  • 1440p : +0.5%
  • 2160p : +4.0%

Total War : Troy

Total War : Troy benefited greatly from Smart Access Memory, especially at the 1080p and 1440p resolutions.

  • 1080p : +15.7%
  • 1440p : +10.0%
  • 2160p : +4.0%

The Division 2

The Division 2 actually performed slightly worse with Smart Access Memory enabled :

  • 1080p : -0.6%
  • 1440p : No difference
  • 2160p : -1.5%

Dirt 5

Dirt 5 benefited the most at the 1440p resolution :

  • 1080p : +4.0%
  • 1440p : +10.3%
  • 2160p : +2.5%

Shadow of the Tomb Raider

Shadow of the Tomb Raider benefited the most at the 1080p and 1440p resolutions :

  • 1080p : +7.9%
  • 1440p : +5.5%
  • 2160p : +4.0%

Gears Tactics

Gears Tactics benefited the most at the 1080p and 1440p resolutions :

  • 1080p : +5.7%
  • 1440p : +9.6%
  • 2160p : +3.3%

Star Control: Origins

Smart Access Memory had no effect on Star Control: Origins.

 



PEG Port VC1/Map from The Tech ARP BIOS Guide!

PEG Port VC1/Map

Common Options : Disabled, TC1, TC2, TC3, TC4, TC5, TC6, TC7

 

PEG Port VC1/Map : A Quick Review

Unlike the sideband signals used to prioritize traffic on the AGP or PCI bus, PCI Express uses virtual channels and traffic classes (also called transaction classes) to decide what traffic gets higher priority to bandwidth on the bus at any particular time.

The PEG Port VC1/Map BIOS feature allows you to manually map a specific traffic class to the second (VC1) virtual channel of the PCI Express graphics port.

This is the higher-priority virtual channel, so mapping a specific traffic class to it will increase bandwidth allocation priority for that traffic class. However, this is not a requirement.

When set to Disabled, no traffic class will be manually mapped to the VC1 virtual channel.

When set to TC1, the TC1 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC0 to be given access to a higher priority virtual channel.

When set to TC2, the TC2 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC1 to be given access to a higher priority virtual channel.

When set to TC3, the TC3 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC2 to be given access to a higher priority virtual channel.

When set to TC4, the TC4 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC3 to be given access to a higher priority virtual channel.

When set to TC5, the TC5 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC4 to be given access to a higher priority virtual channel.

When set to TC6, the TC6 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC5 to be given access to a higher priority virtual channel.

When set to TC7, the TC7 traffic class will be manually mapped to the VC1 virtual channel. This allows only traffic with the highest priority to be given access to a higher priority virtual channel.

Generally, it is recommended that you leave this BIOS feature at the default setting of TC7. This allows only the highest priority traffic to be given access to the higher priority VC1 channel.

 

PEG Port VC1/Map : The Full Details

Unlike the sideband signals used to prioritize traffic on the AGP or PCI bus, PCI Express uses virtual channels and traffic classes (also called transaction classes) to decide what traffic gets a higher priority to bandwidth on the bus at any particular time.

PCI Express requires each port to support at least one, and up to eight Virtual Channels (VC0 to VC7). Each port is also required to support at least one, and up to eight Traffic Classes (TC0 to TC7).

In short, each port must support at least VC0 and TC0. It can support additional virtual channels or traffic classes up to VC7 and TC7, but that is optional.

Virtual channels are used to allow easy division of bandwidth according to demand and availability. Each virtual channel has its own set of queues, buffers and control logic, which allow independent flow control between multiple virtual channels.

If more than one virtual channel is supported, each subsequent virtual channel has a higher priority than the default VC0 channel. In other words, VC1 has a higher priority than VC0, but a lower priority than VC3. The last virtual channel, VC7, has the highest priority.

Traffic classes, on the other hand, are used to separate system traffic into different priority levels. If more than one traffic class is supported, each subsequent traffic class is higher in priority than the default TC0 class.

In other words, TC1 traffic is higher in priority than TC0, but lower in priority than TC3. The last traffic class, TC7, is the highest in priority.

The PCI Express specifications require TC0 to be mapped to VC0 at the very least. This is essentially hardwired. The other virtual channels and traffic classes can be assigned to each other as required. There are just some considerations to note :

  • A single virtual channel can be shared by multiple traffic classes (i.e. VC0 can be shared by TC0, TC1 and TC2)
  • Each traffic class must be assigned to a virtual channel. There can be no unassigned traffic class.
  • Each traffic class can be assigned to only one virtual channel. It cannot be shared by multiple virtual channels. (i.e. TC1 cannot be assigned to both VC0 and VC1)

The PEG Port VC1/Map BIOS feature allows you to manually map a specific traffic class to the second (VC1) virtual channel of the PCI Express graphics port. This is the higher-priority virtual channel, so mapping a specific traffic class to it will increase bandwidth allocation priority for that traffic class. However, this is not a requirement.
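The mapping rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not how any BIOS actually implements the feature. The function names are our own, and we assume that every traffic class not moved to VC1 simply stays on VC0.

```python
TRAFFIC_CLASSES = range(8)  # TC0 to TC7


def peg_vc1_map(setting):
    """Build a TC -> VC table for a setting such as "Disabled" or "TC7".

    Assumption: every traffic class not mapped to VC1 stays on VC0.
    """
    mapping = {tc: 0 for tc in TRAFFIC_CLASSES}
    if setting != "Disabled":
        if not (setting.startswith("TC") and setting[2:].isdigit()):
            raise ValueError(f"unknown setting: {setting!r}")
        tc = int(setting[2:])
        if not 1 <= tc <= 7:
            raise ValueError("only TC1 to TC7 can be moved; TC0 is fixed to VC0")
        mapping[tc] = 1
    return mapping


def is_valid(mapping):
    """Check the rules: TC0 sits on VC0 and every TC is assigned to a VC (0 to 7).

    Using a dict keyed by traffic class means a TC can never be assigned
    to more than one virtual channel at the same time.
    """
    return (
        mapping.get(0) == 0
        and set(mapping) == set(TRAFFIC_CLASSES)
        and all(vc in range(8) for vc in mapping.values())
    )


print(peg_vc1_map("TC7"))
```

With the TC7 setting, only the highest-priority traffic class lands on VC1, while TC0 to TC6 share VC0.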

When set to Disabled, no traffic class will be manually mapped to the VC1 virtual channel.

When set to TC1, the TC1 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC0 to be given access to a higher priority virtual channel.

When set to TC2, the TC2 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC1 to be given access to a higher priority virtual channel.

When set to TC3, the TC3 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC2 to be given access to a higher priority virtual channel.

When set to TC4, the TC4 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC3 to be given access to a higher priority virtual channel.

When set to TC5, the TC5 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC4 to be given access to a higher priority virtual channel.

When set to TC6, the TC6 traffic class will be manually mapped to the VC1 virtual channel. This allows traffic with a higher priority than TC5 to be given access to a higher priority virtual channel.

When set to TC7, the TC7 traffic class will be manually mapped to the VC1 virtual channel. This allows only traffic with the highest priority to be given access to a higher priority virtual channel.

Generally, it is recommended that you leave this BIOS feature at the default setting of TC7. This allows only the highest priority traffic to be given access to the higher priority VC1 channel.

 



PCIE Spread Spectrum from The Tech ARP BIOS Guide!

PCIE Spread Spectrum

Common Options : Down Spread, Disabled

 

PCIE Spread Spectrum : A Quick Review

Spread spectrum clocking works by continuously modulating the clock signal around a particular frequency. This “spreads out” the power output and “flattens” the spikes of signal waveform, keeping them below the FCC limit.

The PCIE Spread Spectrum BIOS feature controls spread spectrum clocking of the PCI Express interconnect.

When set to Down Spread, the motherboard modulates the PCI Express interconnect’s clock signal downwards by a small amount. Because the clock signal is modulated downwards, there is a slight reduction in performance.

The amount of modulation is not revealed and depends on what the manufacturer has qualified for the motherboard. However, the greater the modulation, the greater the reduction of EMI and performance.

When set to Disabled, the motherboard disables any modulation of the PCI Express interconnect’s clock signal.

Generally, frequency modulation via this feature should not cause any problems. Since the motherboard only modulates the signal downwards, system stability is not compromised.

However, spread spectrum clocking can interfere with the operation of timing-critical devices like clock-sensitive SCSI devices. If you are using such devices on the PCI Express interconnect, you must disable PCIE Spread Spectrum.

System stability may also be compromised if you are overclocking the PCI Express interconnect. Therefore, it is recommended that you disable this feature if you are overclocking the PCI Express interconnect.

Of course, if EMI reduction is still important to you, enable this feature by all means, but you may have to reduce the PCI Express interconnect frequency a little to provide a margin of safety.

If you are not overclocking the PCI Express interconnect, the decision to enable or disable this feature is really up to you. If you have electronic devices nearby that are affected by the EMI generated by your motherboard, or have sensitive data that must be safeguarded from electronic eavesdropping, enable this feature.

Otherwise, disable it to remove even the slightest possibility of stability issues.

 

PCIE Spread Spectrum : The Full Details

All clock signals have extreme values (spikes) in their waveform that create EMI (Electromagnetic Interference). This EMI interferes with other electronics in the area. There are also claims that it allows electronic eavesdropping of the data being transmitted.

To prevent EMI from causing problems to other electronics, the FCC enacted Part 15 of the FCC regulations in 1975. It regulates the power output of such clock generators by limiting the amount of EMI they can generate. As a result, engineers use spread spectrum clocking to ensure that their motherboards comply with the FCC regulation on EMI levels.

Spread spectrum clocking works by continuously modulating the clock signal around a particular frequency. Instead of generating a typical waveform, the clock signal continuously varies around the target frequency within a tight range. This “spreads out” the power output and “flattens” the spikes of signal waveform, keeping them below the FCC limit.

Clock signal (courtesy of National Instruments)

The same clock signal, with spread spectrum clocking

The PCIE Spread Spectrum BIOS feature controls spread spectrum clocking of the PCI Express interconnect.

When set to Down Spread, the motherboard modulates the PCI Express interconnect’s clock signal downwards by a small amount. Because the clock signal is modulated downwards, there is a slight reduction in performance.

The amount of modulation is not revealed and depends on what the manufacturer has qualified for the motherboard. However, the greater the modulation, the greater the reduction of EMI and performance.
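To see why down spreading costs a little performance, here is a small worked example. Since the actual amount of modulation is not revealed, the 100 MHz reference clock and the 0.5% down spread used below are illustrative assumptions, and the averaging assumes a symmetric (for example, triangular) modulation profile.

```python
def down_spread_range(nominal_hz, spread_percent):
    """Return (lowest, highest) clock frequency under down-spread modulation."""
    lowest = nominal_hz * (1 - spread_percent / 100.0)
    return lowest, nominal_hz


def average_frequency(nominal_hz, spread_percent):
    """Mean frequency, assuming a symmetric modulation profile (assumption)."""
    lowest, highest = down_spread_range(nominal_hz, spread_percent)
    return (lowest + highest) / 2


# Illustrative values only: 100 MHz clock with a 0.5% down spread.
low, high = down_spread_range(100e6, 0.5)
print(f"{low / 1e6:.2f} MHz to {high / 1e6:.2f} MHz")
print(f"average {average_frequency(100e6, 0.5) / 1e6:.3f} MHz")
```

Because the clock never goes above its nominal frequency, the average ends up slightly below it, which is where the slight reduction in performance comes from.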

When set to Disabled, the motherboard disables any modulation of the PCI Express interconnect’s clock signal.

Generally, frequency modulation via this feature should not cause any problems. Since the motherboard only modulates the signal downwards, system stability is not compromised.

However, spread spectrum clocking can interfere with the operation of timing-critical devices like clock-sensitive SCSI devices. If you are using such devices on the PCI Express interconnect, you must disable PCIE Spread Spectrum.

System stability may also be compromised if you are overclocking the PCI Express interconnect. Of course, this depends on the amount of modulation, the extent of overclocking and other factors like temperature, voltage levels, etc. As such, the problem may not readily manifest itself immediately.

Therefore, it is recommended that you disable this feature if you are overclocking the PCI Express interconnect. You will be able to achieve better overclockability, at the expense of higher EMI.

Of course, if EMI reduction is still important to you, enable this feature by all means, but you may have to reduce the PCI Express interconnect frequency a little to provide a margin of safety.

If you are not overclocking the PCI Express interconnect, the decision to enable or disable this feature is really up to you. If you have electronic devices nearby that are affected by the EMI generated by your motherboard, or have sensitive data that must be safeguarded from electronic eavesdropping, enable this feature.

Otherwise, disable it to remove even the slightest possibility of stability issues.

 



Intel Nervana NNP-T1000 PCIe + Mezzanine Cards Revealed!

The new Intel Nervana NNP-T1000 neural network processor comes in PCIe and Mezzanine card options designed for AI training acceleration.

Here is EVERYTHING you need to know about the Intel Nervana NNP-T1000 PCIe and Mezzanine card options!

 

Intel Nervana Neural Network Processors

Intel Nervana neural network processors, NNPs for short, are designed to accelerate two key deep learning technologies – training and inference.

To target these two different tasks, Intel created two AI accelerator families – Nervana NNP-T that’s optimised for training, and Nervana NNP-I that’s optimised for inference.

They are both paired with a full software stack, developed with open components and deep learning framework integration.

Recommended : Intel Nervana AI Accelerators : Everything You Need To Know!

 

Intel Nervana NNP-T1000

The Intel Nervana NNP-T1000 is not only capable of training even the most complex deep learning models, it is highly scalable – offering near linear scaling and efficiency.

By combining compute, memory and networking capabilities in a single ASIC, it allows for maximum efficiency with flexible and simple scaling.

Each Nervana NNP-T1000 is powered by up to 24 Tensor Processing Clusters (TPCs), and comes with 16 bi-directional Inter-Chip Links (ICL).

Its TPC supports 32-bit floating point (FP32) and brain floating point (bfloat16) formats, allowing for multiple deep learning primitives with maximum processing efficiency.

Its high-speed ICL communication fabric allows for near-linear scaling, directly connecting multiple NNP-T cards within servers, between servers and even inside and across racks.

  • High compute utilisation using Tensor Processing Clusters (TPC) with bfloat16 numeric format
  • Both on-die SRAM and on-package High-Bandwidth Memory (HBM) keep data local, reducing movement
  • Its Inter-Chip Links (ICL) glueless fabric architecture and fully-programmable router achieve near-linear scaling across multiple cards, systems and PODs
  • Available in PCIe and OCP Open Accelerator Module (OAM) form factors
  • Offers a programmable Tensor-based instruction set architecture (ISA)
  • Supports common open-source deep learning frameworks like TensorFlow, PaddlePaddle and PyTorch

 

Intel Nervana NNP-T1000 Models

The Intel Nervana NNP-T1000 is currently available in two form factors – a dual-slot PCI Express card, and an OAM Mezzanine Card, with these specifications :

| Specifications | Intel Nervana NNP-T1300 | Intel Nervana NNP-T1400 |
|---|---|---|
| Form Factor | Dual-slot PCIe Card | OAM Mezzanine Card |
| Compliance | PCIe CEM | OAM 1.0 |
| Compute Cores | 22 TPCs | 24 TPCs |
| Frequency | 950 MHz | 1100 MHz |
| SRAM | 55 MB on-chip, with ECC | 60 MB on-chip, with ECC |
| Memory | 32 GB HBM2, with ECC | 32 GB HBM2, with ECC |
| Memory Bandwidth | 2.4 Gbps (300 MB/s) | 2.4 Gbps (300 MB/s) |
| Inter-Chip Link (ICL) | 16 x 112 Gbps (448 GB/s) | 16 x 112 Gbps (448 GB/s) |
| ICL Topology | Ring | Ring, Hybrid Cube Mesh, Fully Connected |
| Multi-Chassis Scaling | Yes | Yes |
| Multi-Rack Scaling | Yes | Yes |
| I/O to Host CPU | PCIe Gen3 / Gen4 x16 | PCIe Gen3 / Gen4 x16 |
| Thermal Solution | Passive, Integrated | Passive Cooling |
| TDP | 300 W | 375 W |
| Dimensions | 265.32 mm x 111.15 mm | 165 mm x 102 mm |
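The bits-versus-bytes figures in these specifications are easy to mix up, so here is a short worked conversion. Note that 16 links at 112 Gbps each come to 224 GB/s in one direction. Reading the quoted 448 GB/s as both directions combined is our assumption, based on the links being described as bi-directional.

```python
def gbps_to_gigabytes_per_s(gbps):
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


links, gbps_per_link = 16, 112
one_way = gbps_to_gigabytes_per_s(links * gbps_per_link)
both_ways = one_way * 2  # assumption: the quoted figure counts both directions
print(one_way, both_ways)  # 224.0 448.0

# The 2.4 Gbps figure, expressed in megabytes per second.
print(f"{gbps_to_gigabytes_per_s(2.4) * 1000:.0f} MB/s")
```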

 

Intel Nervana NNP-T1000 PCIe Card

This is what the Intel Nervana NNP-T1000 (also known as the NNP-T1300) PCIe card looks like :

 

Intel Nervana NNP-T1000 OAM Mezzanine Card

This is what the Intel Nervana NNP-T1000 (also known as NNP-T1400) Mezzanine card looks like :

 



PCI-E Max Read Request Size – The Tech ARP BIOS Guide

PCI-E Max Read Request Size

Common Options : Automatic, Manual – User Defined

 

Quick Review of PCI-E Max Read Request Size

This BIOS feature can be used to ensure a fairer allocation of PCI Express bandwidth. It determines the largest read request any PCI Express device can generate. Reducing the maximum read request size reduces the hogging effect of any device with large reads.

When set to Automatic, the BIOS will automatically select a maximum read request size for PCI Express devices. Usually, this would be a manufacturer-preset value that’s designed with maximum “fairness”, rather than performance, in mind.

When set to Manual – User Defined, you will be allowed to enter a numeric value (in bytes). Although it appears as though you can enter any value, you must only enter one of these values :

128 – This sets the maximum read request size to 128 bytes. All PCI Express devices will only be allowed to generate read requests of up to 128 bytes in size.

256 – This sets the maximum read request size to 256 bytes. All PCI Express devices will only be allowed to generate read requests of up to 256 bytes in size.

512 – This sets the maximum read request size to 512 bytes. All PCI Express devices will only be allowed to generate read requests of up to 512 bytes in size.

1024 – This sets the maximum read request size to 1024 bytes. All PCI Express devices will only be allowed to generate read requests of up to 1024 bytes in size.

2048 – This sets the maximum read request size to 2048 bytes. All PCI Express devices will only be allowed to generate read requests of up to 2048 bytes in size.

4096 – This sets the maximum read request size to 4096 bytes. This is the largest read request size currently supported by the PCI Express protocol. All PCI Express devices will be allowed to generate read requests of up to 4096 bytes in size.

It is recommended that you set this BIOS feature to 4096, as it maximizes performance by allowing all PCI Express devices to generate as large a read request as they require. However, this will be at the expense of devices that generate smaller read requests.

Even so, this is generally not a problem unless they require a certain degree of quality of service. For example, you may experience glitches with the audio output (e.g. stuttering) of a PCI Express sound card when its reads are delayed by a bandwidth-hogging graphics card.

If such problems arise, reduce the maximum read request size. This reduces the amount of bandwidth any PCI Express device can hog at the expense of the other devices.

 

Details of PCI-E Max Read Request Size

Arbitration for PCI Express bandwidth is based on the number of requests from each device. However, the size of each request is not taken into account. As such, if some devices request much larger data reads than others, the PCI Express bandwidth will be unevenly allocated between those devices.

This can cause problems for applications that have specific quality of service requirements. These applications may not have timely access to the requested data simply because another PCI Express device is hogging the bandwidth by requesting very large data reads.

This BIOS feature can be used to correct that and ensure a fairer allocation of PCI Express bandwidth. It determines the largest read request any PCI Express device can generate. Reducing the maximum read request size reduces the hogging effect of any device with large reads.

However, doing so reduces the performance of devices that generate large reads. Instead of generating large but fewer reads, they will have to generate smaller reads but in greater numbers. Because arbitration is done according to the number of requests, they will have to wait longer for the data requested.
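
To make the trade-off concrete, here is a minimal sketch that counts how many read requests a device would need for one transfer at each setting. The function name and the 64 KB transfer size are illustrative assumptions, and the model ignores everything except the request count.

```python
# The values this BIOS feature accepts, in bytes.
VALID_READ_REQUEST_SIZES = (128, 256, 512, 1024, 2048, 4096)


def read_requests_needed(transfer_bytes: int, max_read_request: int) -> int:
    """Return how many read requests are needed to fetch transfer_bytes."""
    if max_read_request not in VALID_READ_REQUEST_SIZES:
        raise ValueError(f"unsupported max read request size: {max_read_request}")
    if transfer_bytes < 0:
        raise ValueError("transfer_bytes must not be negative")
    # Ceiling division: a partial chunk still needs its own request.
    return -(-transfer_bytes // max_read_request)


if __name__ == "__main__":
    transfer = 64 * 1024  # hypothetical 64 KB read
    for size in VALID_READ_REQUEST_SIZES:
        count = read_requests_needed(transfer, size)
        print(f"max read request {size:>4} bytes -> {count:>3} requests")
```

Because arbitration counts requests rather than bytes, a device forced from 4096-byte to 128-byte reads has to win arbitration 32 times as often to move the same amount of data.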

When set to Automatic, the BIOS will automatically select a maximum read request size for PCI Express devices. Usually, this would be a manufacturer-preset value that’s designed with maximum “fairness”, rather than performance, in mind.

When set to Manual – User Defined, you will be allowed to enter a numeric value (in bytes). Although it appears as though you can enter any value, you must only enter one of these values :

128 – This sets the maximum read request size to 128 bytes. All PCI Express devices will only be allowed to generate read requests of up to 128 bytes in size.

256 – This sets the maximum read request size to 256 bytes. All PCI Express devices will only be allowed to generate read requests of up to 256 bytes in size.

512 – This sets the maximum read request size to 512 bytes. All PCI Express devices will only be allowed to generate read requests of up to 512 bytes in size.

1024 – This sets the maximum read request size to 1024 bytes. All PCI Express devices will only be allowed to generate read requests of up to 1024 bytes in size.

2048 – This sets the maximum read request size to 2048 bytes. All PCI Express devices will only be allowed to generate read requests of up to 2048 bytes in size.

4096 – This sets the maximum read request size to 4096 bytes. This is the largest read request size currently supported by the PCI Express protocol. All PCI Express devices will be allowed to generate read requests of up to 4096 bytes in size.

It is recommended that you set this BIOS feature to 4096, as it maximizes performance by allowing all PCI Express devices to generate as large a read request as they require. However, this will be at the expense of devices that generate smaller read requests.

Even so, this is generally not a problem unless they require a certain degree of quality of service. For example, you may experience glitches with the audio output (e.g. stuttering) of a PCI Express sound card when its reads are delayed by a bandwidth-hogging graphics card.

If such problems arise, reduce the maximum read request size. This reduces the amount of bandwidth any PCI Express device can hog at the expense of the other devices.


PCI-E Reference Clock from The Tech ARP BIOS Guide

PCI-E Reference Clock

Common Options : 100 MHz, adjustable in 1 MHz steps

 

Quick Review of PCI-E Reference Clock

All PCI Express slots use a 100 MHz reference clock to generate their clocking signals. This is where the PCI-E Reference Clock BIOS option comes in. It controls the frequency of the PCI Express reference clock.

By default, the PCI-E Reference Clock is set to 100 MHz. This is the official reference clock speed for the PCI Express interface. Some BIOSes allow you to adjust this reference clock, usually in steps of 1 MHz.

Adjusting the PCI Express reference clock changes its signalling rate and bandwidth. However, because the PCI Express x16 interface already has such high bandwidth, overclocking it would only have a small effect on real world performance.

In motherboards that suffer from the PCI Express x1 bug, adjusting the reference clock speed up or down can potentially “trick” the motherboard to restore the PCI Express slot to its full x16 mode. However, raising the PCI Express reference clock to 120 MHz can cause timing-sensitive PCI Express devices like SATA controllers to fail. Therefore, it is recommended that you do not exceed 115 MHz, should you choose to overclock the PCI Express reference clock.

 

Details of PCI-E Reference Clock

The PCI Express interface is made up of a series of unidirectional, serial point-to-point links. Each PCI Express lane consists of a pair of those links, making it bidirectional. In its slowest form (PCI Express 1.x), each PCI Express lane has a data transfer rate of 250 MB/s in each direction. The newer PCI Express 2.0 doubles the data transfer rate to 500 MB/s per lane.

For high-bandwidth applications, multiple PCI Express lanes are used to greatly increase the data transfer rate. A PCI Express slot can support anywhere from just one lane (x1) up to 32 lanes (x32). At the moment though, the “widest” slot available is the PCI Express x16.

In motherboards that support the PCI Express 1.x standard, the x16 slot delivers a maximum bandwidth of 4 GB/s with a signalling rate of 2.5 gigatransfers per second. The new PCI Express 2.0 standard doubles the signalling rate and the x16 slot’s bandwidth to 8 GB/s.

Whether your motherboard supports the PCI Express 1.x standard or the newer PCI Express 2.0 standard, all PCI Express slots use a 100 MHz reference clock to generate their clocking signals. This is where the PCI-E Reference Clock BIOS option comes in. It controls the frequency of the PCI Express reference clock.

By default, the PCI-E Reference Clock is set to 100 MHz. This is the official reference clock speed for the PCI Express interface. Some BIOSes allow you to adjust this reference clock, usually in steps of 1 MHz.

Adjusting the PCI Express reference clock changes its signalling rate and bandwidth. For example, increasing the reference clock frequency to 110 MHz would raise the PCI Express signalling rate by 10% to 2.75 gigatransfers/s (PCI Express 1.x) or 5.5 gigatransfers/s (PCI Express 2.0). However, because the PCI Express x16 interface already has such high bandwidth, overclocking it would only have a small effect on real world performance.
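
The arithmetic above can be sketched in a few lines. This is a rough model rather than a description of any particular chipset: it assumes the signalling rate scales linearly with the reference clock, and that 8B/10B encoding is in use (as on PCI Express 1.x and 2.0), so 10 bits on the wire carry one byte of data. The function names are made up for illustration.

```python
BASE_REF_CLOCK_MHZ = 100.0

# Signalling rate per lane at the standard 100 MHz reference clock.
BASE_RATE_GT_S = {"1.x": 2.5, "2.0": 5.0}


def signalling_rate_gt_s(generation: str, ref_clock_mhz: float = BASE_REF_CLOCK_MHZ) -> float:
    """Per-lane signalling rate, assuming it scales linearly with the reference clock."""
    return BASE_RATE_GT_S[generation] * ref_clock_mhz / BASE_REF_CLOCK_MHZ


def bandwidth_gb_s(generation: str, lanes: int, ref_clock_mhz: float = BASE_REF_CLOCK_MHZ) -> float:
    """Bandwidth in GB/s per direction; 8B/10B spends 10 transfers on each data byte."""
    return signalling_rate_gt_s(generation, ref_clock_mhz) * lanes / 10.0


if __name__ == "__main__":
    for gen in ("1.x", "2.0"):
        for clock in (100.0, 110.0):
            rate = signalling_rate_gt_s(gen, clock)
            bw = bandwidth_gb_s(gen, 16, clock)
            print(f"PCIe {gen} @ {clock:.0f} MHz: {rate:.2f} GT/s, x16 = {bw:.1f} GB/s")
```

At the stock 100 MHz clock this reproduces the 4 GB/s and 8 GB/s figures quoted for the x16 slot, and a 10% higher clock raises both by the same 10%.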

In motherboards that suffer from the PCI Express x1 bug, adjusting the reference clock speed up or down can potentially “trick” the motherboard to restore the PCI Express slot to its full x16 mode. However, raising the PCI Express reference clock to 120 MHz can cause timing-sensitive PCI Express devices like SATA controllers to fail. Therefore, it is recommended that you do not exceed 115 MHz, should you choose to overclock the PCI Express reference clock.



PCI-E Maximum Payload Size – The BIOS Optimization Guide

PCI-E Maximum Payload Size

Common Options : 128, 256, 512, 1024, 2048, 4096

 

Quick Review

The PCI-E Maximum Payload Size BIOS feature determines the maximum TLP (Transaction Layer Packet) payload size used by the PCI Express controller. The TLP payload size determines the amount of data transmitted within each data packet.

When set to 128, the PCI Express controller will only use a maximum data payload of 128 bytes within each TLP.

When set to 256, the PCI Express controller will only use a maximum data payload of 256 bytes within each TLP.

When set to 512, the PCI Express controller will only use a maximum data payload of 512 bytes within each TLP.

When set to 1024, the PCI Express controller will only use a maximum data payload of 1024 bytes within each TLP.

When set to 2048, the PCI Express controller will only use a maximum data payload of 2048 bytes within each TLP.

When set to 4096, the PCI Express controller uses the maximum data payload of 4096 bytes within each TLP. This is the maximum payload size currently supported by the PCI Express protocol.

It is recommended that you set PCI-E Maximum Payload Size to 4096, as this allows all PCI Express devices connected to send up to 4096 bytes of data in each TLP. This gives you maximum efficiency per transfer.

However, this is subject to the PCI Express device connected to it. If that device only supports a maximum TLP payload size of 512 bytes, the motherboard chipset will communicate with it with a maximum TLP payload size of 512 bytes, even if you set this BIOS feature to 4096.

On the other hand, if you set this BIOS feature to a low value like 256, it will force all connected devices to use a maximum payload size of 256 bytes, even if they support a much larger TLP payload size.

 

Details of PCI-E Maximum Payload Size

The PCI Express protocol transmits data as well as control messages on the same links. This distinguishes the PCI Express interconnect from the PCI bus and the AGP port, which make use of separate sideband signalling for control messages.

Control messages are delivered as Data Link Layer Packets or DLLPs, while data packets are sent out as Transaction Layer Packets or TLPs. However, TLPs are not pure data packets. They have a header which carries information like packet size, message type, traffic class, etc.

In addition, everything sent over a PCI Express 1.x or 2.0 link, including the actual data (known as the “payload”), is encoded with the 8B/10B encoding scheme. This replaces 8 uncoded bits with 10 encoded bits. This itself results in a 20% “loss” of bandwidth. The TLP overhead is further exacerbated by a 32-bit LCRC error-checking code.

Therefore, the size of the data payload is an important factor in determining the efficiency of the PCI Express interconnect. As the data payload gets smaller, the TLP becomes less efficient, because the overhead will then take up a more significant amount of bandwidth. To achieve maximum efficiency, the TLP should be as large as possible.
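
A quick calculation shows how the fixed overhead shrinks in relative terms as the payload grows. The 20-byte per-packet overhead used here is an assumption for illustration (a 12-byte header, a 2-byte sequence number, the 4-byte LCRC and 2 bytes of framing); real packets may carry a larger header or an extra checksum, and the function name is made up.

```python
# Assumed fixed cost per TLP in bytes: header + sequence number + LCRC + framing.
ASSUMED_OVERHEAD_BYTES = 12 + 2 + 4 + 2

PAYLOAD_SIZES = (128, 256, 512, 1024, 2048, 4096)


def tlp_efficiency(payload_bytes: int, overhead_bytes: int = ASSUMED_OVERHEAD_BYTES) -> float:
    """Fraction of the bytes in one TLP that are actual payload."""
    if payload_bytes <= 0:
        raise ValueError("payload_bytes must be positive")
    return payload_bytes / (payload_bytes + overhead_bytes)


if __name__ == "__main__":
    for size in PAYLOAD_SIZES:
        print(f"{size:>4}-byte payload: {tlp_efficiency(size) * 100:.1f}% of the packet is data")
```

This only models packet framing. The 20% cost of 8B/10B encoding applies on top of it, regardless of the payload size.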

The PCI Express specifications define the following TLP payload sizes :

  • 128 bytes
  • 256 bytes
  • 512 bytes
  • 1024 bytes
  • 2048 bytes
  • 4096 bytes

However, it is up to the manufacturer to set the maximum TLP payload size supported by the PCI Express device. It determines the maximum TLP payload size the device can send or receive. When two PCI Express devices communicate with each other, the largest TLP payload size supported by both devices will be used.

The PCI-E Maximum Payload Size BIOS feature determines the maximum TLP (Transaction Layer Packet) payload size used by the PCI Express controller. The TLP payload size, as mentioned earlier, determines the amount of data transmitted within each data packet.

When set to 128, the PCI Express controller will only use a maximum data payload of 128 bytes within each TLP.

When set to 256, the PCI Express controller will only use a maximum data payload of 256 bytes within each TLP.

When set to 512, the PCI Express controller will only use a maximum data payload of 512 bytes within each TLP.

When set to 1024, the PCI Express controller will only use a maximum data payload of 1024 bytes within each TLP.

When set to 2048, the PCI Express controller will only use a maximum data payload of 2048 bytes within each TLP.

When set to 4096, the PCI Express controller uses the maximum data payload of 4096 bytes within each TLP. This is the maximum payload size currently supported by the PCI Express protocol.

It is recommended that you set PCI-E Maximum Payload Size to 4096, as this allows all PCI Express devices connected to send up to 4096 bytes of data in each TLP. This gives you maximum efficiency per transfer.

However, this is subject to the PCI Express device connected to it. If that device only supports a maximum TLP payload size of 512 bytes, the motherboard chipset will communicate with it with a maximum TLP payload size of 512 bytes, even if you set this BIOS feature to 4096.

On the other hand, if you set this BIOS feature to a low value like 256, it will force all connected devices to use a maximum payload size of 256 bytes, even if they support a much larger TLP payload size.


PCI Express Burn-in Mode – The BIOS Optimization Guide

PCI Express Burn-in Mode

Common Options : Default, 101.32MHz, 102.64MHz, 103.96MHz, 105.28MHz, 106.6MHz, 107.92MHz, 109.24MHz

 

Quick Review

The PCI Express Burn-in Mode BIOS feature allows you to overclock the PCI Express bus, even if Intel stamps its foot petulantly and insists that it is not meant for this purpose. While it does not give you direct control of the bus clocks, it allows some overclocking of the PCI Express bus.

When this BIOS feature is set to Default, the PCI Express bus runs at its normal speed of 100MHz.

When this BIOS feature is set to 101.32MHz, the PCI Express bus runs at a higher speed of 101.32MHz.

When this BIOS feature is set to 102.64MHz, the PCI Express bus runs at a higher speed of 102.64MHz.

When this BIOS feature is set to 103.96MHz, the PCI Express bus runs at a higher speed of 103.96MHz.

When this BIOS feature is set to 105.28MHz, the PCI Express bus runs at a higher speed of 105.28MHz.

When this BIOS feature is set to 106.6MHz, the PCI Express bus runs at a higher speed of 106.6MHz.

When this BIOS feature is set to 107.92MHz, the PCI Express bus runs at a higher speed of 107.92MHz.

When this BIOS feature is set to 109.24MHz, the PCI Express bus runs at a higher speed of 109.24MHz.

For better performance, it is recommended that you set this BIOS feature to 109.24MHz. This overclocks the PCI Express bus by about 9%, which should not cause any stability problems with most PCI Express devices. But if you encounter any stability issues, use a lower setting.

 

Details of PCI Express Burn-in Mode

While many motherboard manufacturers allow you to overclock various system clocks, Intel officially does not condone or support overclocking. Therefore, motherboards sold by Intel lack BIOS features that allow you to directly modify bus clocks.

However, some Intel motherboards come with a PCI Express Burn-in Mode BIOS feature. This ostensibly allows you to “burn-in” PCI Express devices with a slightly higher bus speed before settling back to the normal bus speed.

Of course, you can use this BIOS feature to overclock the PCI Express bus, even if Intel stamps its foot petulantly and insists that it is not meant for this purpose. While it does not give you direct control of the bus clocks, it allows some overclocking of the PCI Express bus.

When this BIOS feature is set to Default, the PCI Express bus runs at its normal speed of 100MHz.

When this BIOS feature is set to 101.32MHz, the PCI Express bus runs at a higher speed of 101.32MHz.

When this BIOS feature is set to 102.64MHz, the PCI Express bus runs at a higher speed of 102.64MHz.

When this BIOS feature is set to 103.96MHz, the PCI Express bus runs at a higher speed of 103.96MHz.

When this BIOS feature is set to 105.28MHz, the PCI Express bus runs at a higher speed of 105.28MHz.

When this BIOS feature is set to 106.6MHz, the PCI Express bus runs at a higher speed of 106.6MHz.

When this BIOS feature is set to 107.92MHz, the PCI Express bus runs at a higher speed of 107.92MHz.

When this BIOS feature is set to 109.24MHz, the PCI Express bus runs at a higher speed of 109.24MHz.

As you can see, this BIOS feature doesn’t allow much play with the clock speed. You can only adjust the clock speeds upwards by about 9%.

For better performance, it is recommended that you set this BIOS feature to 109.24MHz. This overclocks the PCI Express bus by about 9%, which should not cause any stability problems with most PCI Express devices. But if you encounter any stability issues, use a lower setting.
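
The “about 9%” figure can be checked against each option with a short calculation. This sketch assumes the standard 100 MHz PCI Express reference clock as the baseline, and the function name is made up for illustration.

```python
# Assumed baseline: the standard PCI Express reference clock.
BASE_CLOCK_MHZ = 100.0

# The non-default options offered by this BIOS feature.
BURN_IN_OPTIONS_MHZ = (101.32, 102.64, 103.96, 105.28, 106.6, 107.92, 109.24)


def overclock_percent(clock_mhz: float, base_mhz: float = BASE_CLOCK_MHZ) -> float:
    """Percentage increase of clock_mhz over the baseline clock."""
    return (clock_mhz - base_mhz) / base_mhz * 100.0


if __name__ == "__main__":
    for mhz in BURN_IN_OPTIONS_MHZ:
        print(f"{mhz:6.2f} MHz -> +{overclock_percent(mhz):.2f}%")
```

Each option is 1.32 MHz above the previous one, so the seven settings step evenly from roughly 1.3% up to the maximum overclock.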


Maximum TLP Payload – The BIOS Optimization Guide

Maximum TLP Payload

Common Options : 128, 256, 512, 1024, 2048, 4096

 

Quick Review of Maximum TLP Payload

The Maximum TLP Payload BIOS feature determines the maximum TLP (Transaction Layer Packet) payload size that the motherboard’s PCI Express controller should use. The TLP payload size determines the amount of data transmitted within each data packet.

When set to 128, the motherboard’s PCI Express controller will only support a maximum data payload of 128 bytes within each TLP.

When set to 256, the motherboard’s PCI Express controller will only support a maximum data payload of 256 bytes within each TLP.

When set to 512, the motherboard’s PCI Express controller will only support a maximum data payload of 512 bytes within each TLP.

When set to 1024, the motherboard’s PCI Express controller will only support a maximum data payload of 1024 bytes within each TLP.

When set to 2048, the motherboard’s PCI Express controller will only support a maximum data payload of 2048 bytes within each TLP.

When set to 4096, the motherboard’s PCI Express controller supports the maximum data payload of 4096 bytes within each TLP. This is the maximum payload size currently supported by the PCI Express protocol.

It is recommended that you set the Maximum TLP Payload BIOS feature to 4096, as this allows all PCI Express devices connected to send up to 4096 bytes of data in each TLP. This gives you maximum efficiency per transfer.

However, this is subject to the PCI Express device connected to it. If that device only supports a maximum TLP payload size of 512 bytes, the PCI Express controller will communicate with it with a maximum TLP payload size of 512 bytes, even if you set this BIOS feature to 4096.

On the other hand, if you set the Maximum TLP Payload BIOS feature to a low value like 256, it will force all connected devices to use a maximum payload size of 256 bytes, even if they support a much larger TLP payload size.

 

Details of Maximum TLP Payload

The PCI Express protocol transmits data as well as control messages on the same links. This distinguishes the PCI Express interconnect from the PCI bus and the AGP port, which make use of separate sideband signalling for control messages.

Control messages are delivered as Data Link Layer Packets or DLLPs, while data packets are sent out as Transaction Layer Packets or TLPs. However, TLPs are not pure data packets. They have a header which carries information like packet size, message type, traffic class, etc.

In addition, everything sent over a PCI Express 1.x or 2.0 link, including the actual data (known as the “payload”), is encoded with the 8B/10B encoding scheme. This replaces 8 uncoded bits with 10 encoded bits. This itself results in a 20% “loss” of bandwidth. The TLP overhead is further exacerbated by a 32-bit LCRC error-checking code.

Therefore, the size of the data payload is an important factor in determining the efficiency of the PCI Express interconnect. As the data payload gets smaller, the TLP becomes less efficient, because the overhead will then take up a more significant amount of bandwidth. To achieve maximum efficiency, the TLP should be as large as possible.

The PCI Express specifications define the following TLP payload sizes :

  • 128 bytes
  • 256 bytes
  • 512 bytes
  • 1024 bytes
  • 2048 bytes
  • 4096 bytes

However, it is up to the manufacturer to set the maximum TLP payload size supported by the PCI Express device. It determines the maximum TLP payload size the device can send or receive. When two PCI Express devices communicate with each other, the largest TLP payload size supported by both devices will be used.

The Maximum TLP Payload BIOS feature determines the maximum TLP (Transaction Layer Packet) payload size that the motherboard’s PCI Express controller should use. The TLP payload size, as mentioned earlier, determines the amount of data transmitted within each data packet.

When set to 128, the motherboard’s PCI Express controller will only support a maximum data payload of 128 bytes within each TLP.

When set to 256, the motherboard’s PCI Express controller will only support a maximum data payload of 256 bytes within each TLP.

When set to 512, the motherboard’s PCI Express controller will only support a maximum data payload of 512 bytes within each TLP.

When set to 1024, the motherboard’s PCI Express controller will only support a maximum data payload of 1024 bytes within each TLP.

When set to 2048, the motherboard’s PCI Express controller will only support a maximum data payload of 2048 bytes within each TLP.

When set to 4096, the motherboard’s PCI Express controller supports the maximum data payload of 4096 bytes within each TLP. This is the maximum payload size currently supported by the PCI Express protocol.

It is recommended that you set the Maximum TLP Payload BIOS feature to 4096, as this allows all PCI Express devices connected to send up to 4096 bytes of data in each TLP. This gives you maximum efficiency per transfer.

However, this is subject to the PCI Express device connected to it. If that device only supports a maximum TLP payload size of 512 bytes, the PCI Express controller will communicate with it with a maximum TLP payload size of 512 bytes, even if you set this BIOS feature to 4096.

On the other hand, if you set the Maximum TLP Payload BIOS feature to a low value like 256, it will force all connected devices to use a maximum payload size of 256 bytes, even if they support a much larger TLP payload size.
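
The interaction between the BIOS setting and the device’s own limit boils down to taking the smaller of the two values. The sketch below models only that rule. The function name is hypothetical, and real hardware negotiates this through configuration registers rather than a function call.

```python
# Payload sizes defined by the PCI Express specifications, in bytes.
VALID_PAYLOAD_SIZES = (128, 256, 512, 1024, 2048, 4096)


def effective_max_payload(bios_setting: int, device_max: int) -> int:
    """Largest TLP payload that both the BIOS setting and the device allow."""
    for name, value in (("bios_setting", bios_setting), ("device_max", device_max)):
        if value not in VALID_PAYLOAD_SIZES:
            raise ValueError(f"{name} must be one of {VALID_PAYLOAD_SIZES}, got {value}")
    # Whichever side has the lower limit wins.
    return min(bios_setting, device_max)


if __name__ == "__main__":
    # BIOS set to 4096, device limited to 512: the device is the bottleneck.
    print(effective_max_payload(4096, 512))
    # BIOS set to 256, device capable of 4096: the BIOS setting is the bottleneck.
    print(effective_max_payload(256, 4096))
```

Setting the BIOS feature to 4096 therefore never forces a device beyond what it supports. It simply stops the BIOS from being the limiting factor.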


The Plextor M8Se NVMe SSD Series Launched

Plextor just announced the debut of the Plextor M8Se series solid state drives (SSDs) that are built around the ultra-high-speed NVMe interface. The drives also use the industry’s flagship TLC NAND and control chip components, and specialized heat sinks with streamlined aesthetic design.

 

The Plextor M8Se NVMe SSD Series

The Plextor M8Se series also caters to a variety of user requirements for SSD upgrades or system assembly, with both PCIe and M.2 2280 specifications. The PCIe version adopts lines drawn from fluid mechanics and a professional blue and black high-performance heat sink design. This presents dynamic ultra-fast speed aesthetics, along with more efficient thermal conductivity that can quickly eliminate the heat generated by M8Se’s high speed transmission and help the computer system maintain optimal operating efficiency.

The M8Se series has the most trustworthy service life and stability of any SSD. The Marvell control chip and Toshiba Super-High-Performance TLC NAND flash memory also allow the M8Se series to have outstanding performance, read/write service life, capacity, and stability, even better than the usual TLC SSD.

The M8Se series is powered by the latest generation NVMe PCIe Gen 3 x4 super high-speed transmission interface, which delivers high bandwidth and low latency, allowing sequential read/write speeds of up to 2,450/1,000 MB/s and random read/write speeds of up to 210,000/175,000 IOPS. Whether it’s for work, fun, or multimedia applications, the M8Se is the perfect solution to speed up your system and provide the ultimate user experience.

Equipped with Plextor firmware technology and advanced LDPC error-correction capability, the M8Se series has greatly enhanced read/write reliability. Its exclusive PlexNitro write cache technology can ensure the most reliable read/write performance, and its extended SSD service life guarantees peace of mind to long-term users.

 

The Plextor M8Se Series Availability

The Plextor M8Se series is anticipated to officially reach the market in June 2017. In addition to providing the PCIe expansion card and M.2 2280 specifications, capacities of 128GB, 256GB, 512GB, and 1TB are also available to satisfy players’ diverse needs for system construction and expansion.

The M8Se series products have passed Plextor’s stringent quality tests, with a 1.5 million hour MTBF (mean time between failures) guarantee in addition to a 3-year warranty period.

 


Init Display First – The BIOS Optimization Guide

Init Display First

Common Options : AGP or PCIe, PCI

 

Quick Review

The Init Display First BIOS feature allows you to select whether to boot the system using the PCIe / AGP graphics card or the PCI graphics card. This is important if you have both PCIe / AGP and PCI graphics cards.

If you are only using a single graphics card, the BIOS will ignore this BIOS setting and boot the computer using that graphics card. However, there may be a slight reduction in the time taken to detect and initialize the card if you select the proper setting. For example, if you only use a PCIe / AGP graphics card, then setting Init Display First to PCIe or AGP may speed up your system’s booting-up process.

If you are only using a single graphics card, it is recommended that you set the Init Display First feature to the proper setting for your system :

  • PCIe for a single PCIe card,
  • AGP for a single AGP card, and
  • PCI for a single PCI card.

But if you are using multiple graphics cards, it is up to you which card you want to use as your primary display card. It is recommended that you select the fastest graphics card as the primary display card.

 

Details

Although the AGP bus was designed exclusively for the graphics subsystem, and PCI Express x16 slots are used mainly for graphics cards, some users still have to use PCI graphics cards for multi-monitor support. This was more common with AGP motherboards because there can be only one AGP port, while PCI Express motherboards can have multiple PCIe slots.

If you want to use multiple monitors on AGP motherboards, you must either get an AGP graphics card with multi-monitor support, or use PCI graphics cards. PCI Express motherboards usually have multiple PCIe slots, but there may still not be enough PCIe slots, and you may need to install PCI graphics cards.

For those who upgraded from a PCI graphics card to an AGP graphics card, it is certainly enticing to use the old PCI graphics card to support a second monitor. The PCI card would do the job just fine as it merely sends display data to the second monitor. You don’t need a powerful graphics card to run the second monitor, if it’s merely for display purposes.

When a PCI Express or AGP graphics card works in tandem with a PCI graphics card, the BIOS has to determine which one is the primary graphics card. The default would be the PCIe or AGP graphics card, since it would naturally be the faster of the two.

However, there are situations in which you may want to manually select the PCI graphics card instead. For example – you have a PCIe / AGP graphics card as well as a PCI graphics card, but only one monitor. This is where the Init Display First BIOS feature comes in. It allows you to select whether to boot the system using the PCIe / AGP graphics card or the PCI graphics card.

If you are only using a single graphics card, the BIOS will ignore this BIOS setting and boot the computer using that graphics card. However, there may be a slight reduction in the time taken to detect and initialize the card if you select the proper setting. For example, if you only use a PCIe / AGP graphics card, then setting Init Display First to PCIe or AGP may speed up your system’s booting-up process.

If you are only using a single graphics card, it is recommended that you set the Init Display First feature to the proper setting for your system :

  • PCIe for a single PCIe card,
  • AGP for a single AGP card, and
  • PCI for a single PCI card.

But if you are using multiple graphics cards, it is up to you which card you want to use as your primary display card. It is recommended that you select the fastest graphics card as the primary display card.
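
The selection behaviour described above can be summarised as a small decision function. This is only a sketch of the logic as the article describes it, not actual BIOS code. The function name, the bus labels and the fallback order used when the setting names a bus with no card on it are all assumptions.

```python
from typing import Iterable

# Hypothetical fallback order if the setting names a bus that has no card installed.
FALLBACK_ORDER = ("PCIe", "AGP", "PCI")


def primary_display(init_display_first: str, installed: Iterable[str]) -> str:
    """Pick the bus whose graphics card is initialised first at boot."""
    cards = set(installed)
    if not cards:
        raise ValueError("no graphics card installed")
    if len(cards) == 1:
        # With a single card, the BIOS setting is ignored.
        return next(iter(cards))
    if init_display_first in cards:
        # With several cards, the setting decides.
        return init_display_first
    for bus in FALLBACK_ORDER:
        if bus in cards:
            return bus
    raise ValueError(f"unrecognised bus types: {sorted(cards)}")


if __name__ == "__main__":
    print(primary_display("PCI", ["PCIe"]))         # single card: setting ignored
    print(primary_display("PCI", ["PCIe", "PCI"]))  # two cards: setting is honoured
```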

 


True Performance Of The Radeon RX 480 Examined

After the Radeon RX 480 was found to draw excessive power from the PCI Express bus, AMD released the Radeon Software 16.7.1 driver. This is a non-WHQL driver that promises to reduce the Radeon RX 480’s power draw from the PCI Express bus. It also promises to improve the Radeon RX 480’s performance to offset the expected drop in performance.

The reduction in power consumption is not enabled by default though, because it reduces performance. Instead, AMD has added a Compatibility Mode option in Radeon Settings, which you must manually enable. Check out what the new Radeon Software 16.7.1 driver offers :

  • The Radeon RX 480’s power distribution has been improved for AMD reference boards, lowering the current drawn from the PCIe bus.
  • A new “compatibility mode” UI toggle has been made available in the Global Settings menu of Radeon Settings. This option is designed to reduce total power with minimal performance impact if end users experience any further issues.  This toggle is “off” by default.
  • Performance improvements for the Polaris architecture that yield performance uplifts in popular game titles of up to 3%. These optimizations are designed to improve the performance of the Radeon RX 480, and should substantially offset the performance impact for users who choose to activate the “compatibility” toggle.

In this article, we will examine the drop in performance caused by the reduced power consumption. Then we will compare it to the boost in performance from the Radeon Software 16.7.1 driver. Check it out!

 

3DMark (1920 x 1080)

We started testing the graphics cards using 3DMark at the most common gaming resolution – 1920 x 1080.

At the lower resolution of 1920 x 1080, the Radeon RX 480 received a performance boost of 3% to 3.8%. That was sufficient to completely erase the 2.4% to 3% drop in performance due to the reduced power consumption.

 

3DMark (2560 x 1440)

Then we took 3DMark up a notch to the resolution of 2560 x 1440. According to AMD, this is the sweet spot for the Radeon RX 480. Let’s take a look!

When we increased the resolution to 2560 x 1440 though, the performance boost from the Radeon Software 16.7.1 driver dropped to just 2.3% to 2.9%. It just about erased the drop in performance from the reduced power consumption.

 

3DMark (3840 x 2160)

This is torture, even for the 8 GB version of the Radeon RX 480.

At the 4K resolution, the 2.3% to 2.85% boost in performance from the Radeon Software 16.7.1 driver was not enough to offset the 3.7% to 4% drop in performance from the lower TDP. The Radeon RX 480 ended up 1% to 1.8% slower.
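To see why a boost of roughly 2.3% to 2.85% cannot cancel a drop of 3.7% to 4%, note that percentage changes compound multiplicatively. The sketch below is only a rough check: the function name is hypothetical, and pairing the smallest drop with the largest boost (and vice versa) is an assumption, since the per-test pairing is not restated here.

```python
def net_change(drop_pct, boost_pct):
    """Net % change after a performance drop followed by a driver boost."""
    factor = (1 - drop_pct / 100) * (1 + boost_pct / 100)
    return (factor - 1) * 100

# 4K 3DMark figures quoted above; how they pair up is an assumption.
best = net_change(drop_pct=3.7, boost_pct=2.85)
worst = net_change(drop_pct=4.0, boost_pct=2.3)
print(f"best case:  {best:.2f}%")
print(f"worst case: {worst:.2f}%")
```

Under these assumptions, both cases come out negative and land close to the 1% to 1.8% deficit reported above.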

Fallout 4 (1920 x 1080)

This chart shows you the minimum and maximum frame rates, as well as the average frame rate, FRAPS recorded in Fallout 4.

In Fallout 4, the new Radeon Software 16.7.1 driver boosted the average frame rate enough to make up for the drop in performance from the reduced power consumption.

 

The Witcher 3 (1920 x 1080)

This chart shows you the minimum and maximum frame rates, as well as the average frame rate, FRAPS recorded in The Witcher 3.

In The Witcher 3, the performance boost was substantial enough to give the Radeon RX 480 a small 1.3% boost in average frame rate, even with the Compatibility Mode enabled.

 

Total War : Warhammer (1920 x 1080)

This chart shows you the minimum and maximum frame rates, as well as the average frame rate, recorded by the Total War : Warhammer benchmark.

Surprisingly, the Radeon Software 16.7.1 driver did even better in Total War : Warhammer. Even with the reduced power consumption, the Radeon RX 480 received a nice 2.2% boost in the average frame rate!

 

Conclusion

The Radeon Software 16.7.1 driver does two things. First, it reprograms the Radeon RX 480’s power controller so that it pulls more current from the 6-pin PCI Express power cable, and less from the PCI Express bus. This fix does not reduce performance. However, it still means that the Radeon RX 480 will exceed its rated 150 W TDP.

Second, it adds a Compatibility Mode switch in Radeon Settings. The higher TDP should not normally cause any concern, but those who want their Radeon RX 480 to adhere to the rated 150 W TDP can enable this switch, which reduces the card’s TDP to 150 W.

The reduction in power consumption reduces performance, of course. But for all of the furore over the Radeon RX 480 power draw controversy, it looks like the performance boost that the Radeon RX 480 received from the higher-than-rated TDP was less than 4%.

We will be correcting our AMD Radeon RX 480 Review to reflect this change. Yes, 4% may be small, but it is still a significant change, and we have to be accurate.

The good news though is that the small drop in performance is virtually offset by performance optimisations for the AMD Polaris architecture in the new Radeon Software 16.7.1 driver. So if you are a Radeon RX 480 user, go get it now!

 

Radeon Software 16.7.1 Performance Comparison

Radeon Software 16.7.1

Following the Radeon RX 480 power draw controversy, AMD released the Radeon Software 16.7.1 driver. This is a non-WHQL driver that was pushed out quickly to fix the Radeon RX 480‘s excessive power draw from the PCI Express bus. However, it also comes with a 3% boost in performance for the Polaris architecture.

Finally, we’ve implemented a collection of performance improvements for the Polaris architecture that yield performance uplifts in popular game titles of up to 3%. These optimizations are designed to improve the performance of the Radeon RX 480, and should substantially offset the performance impact for users who choose to activate the “compatibility” toggle.

So we decided to take a look at the performance improvements it delivers in the Radeon RX 480. We also took a look at how it affects the AMD Radeon R9 380 graphics card, which is based on the previous-generation Tonga GPU. Check it out!

 

3DMark (1920 x 1080)

We started testing the graphics cards using 3DMark at the most common gaming resolution – 1920 x 1080.

The Radeon RX 480 received a 3.15% boost in the Overall Score, a 3.77% boost in the Graphics Score and a 3% boost in the Combined Score. Very nice! The Radeon R9 380, however, did not benefit from the newer Radeon Software 16.7.1 driver at all.

The frame rate breakdown shows the Radeon RX 480 edging even further away from its predecessor, the Radeon R9 380. It is now 44-48% faster than the Radeon R9 380, thanks to the Radeon Software 16.7.1 driver.

 

3DMark (2560 x 1440)

Then we took 3DMark up a notch to the resolution of 2560 x 1440. According to AMD, this is the sweet spot for the Radeon RX 480. Let’s take a look!

At this higher resolution, the Radeon RX 480 received a smaller performance boost of 2.6% in the Overall Score, 2.9% in the Graphics Score and 2.3% in the Combined Score. The Radeon R9 380’s performance actually suffered slightly (by 1%) with the Radeon Software 16.7.1 driver.

The small boost in performance from the Radeon Software 16.7.1 driver only gave the Radeon RX 480 a small 0.5-1 fps boost in frame rate. Coupled with the slight drop in the Radeon R9 380’s performance, the Radeon RX 480 is now 40-50% faster than the Radeon R9 380.

 

3DMark (3840 x 2160)

This is torture, even for the 8 GB version of the Radeon RX 480. The Radeon R9 380 would do even worse, with just 4 GB of GDDR5 memory.

For some reason, the Radeon Software 16.7.1 driver caused the benchmark to fail while running on the Radeon R9 380. However, we can see that it gives the Radeon RX 480 a small 2.4% boost in the Overall Score, a 2.3% boost in the Graphics Score and a 2.85% boost in the Combined Score.

Based on the Radeon R9 380 running on the earlier Radeon Software 16.6.2 driver, the Radeon RX 480 is now 36-49% faster than the Radeon R9 380 at this resolution.

Fallout 4 (1920 x 1080)

This chart shows you the minimum and maximum frame rates, as well as the average frame rate, FRAPS recorded in Fallout 4.

The new Radeon Software 16.7.1 driver seems to greatly increase the frame rate range for the Radeon RX 480, and slightly for the Radeon R9 380. However, only the Radeon RX 480 saw a small 1.9% boost in the average frame rate.

 

The Witcher 3 (1920 x 1080)

This chart shows you the minimum and maximum frame rates, as well as the average frame rate, FRAPS recorded in The Witcher 3.

The Radeon Software 16.7.1 driver gave both the Radeon RX 480 and the Radeon R9 380 a small boost in frame rate of 3% and 1% respectively.

 

Total War : Warhammer (1920 x 1080)

This chart shows you the minimum and maximum frame rates, as well as the average frame rate, recorded by the Total War : Warhammer benchmark.

Surprisingly, the Radeon R9 380 saw an appreciable boost in the frame rate range, although the average frame rate only crept slightly higher. The Radeon RX 480, though, received a more substantial 2.8% boost in average frame rate.

 

Conclusion & Downloads

If you are using the new AMD Radeon RX 480 graphics card, you should download and use the new Radeon Software 16.7.1, even if you don’t care about its excessive power draw from the PCI Express bus.

In the 3 games we tested, the Radeon RX 480 enjoyed a small boost of 2-3% in frame rate. Not earth-shattering, to be sure, but still a nice boost. The performance boost alone is worth upgrading to Radeon Software 16.7.1, even though it’s not WHQL-certified. You can download it here :

However, if you are using a Tonga-based graphics card like the Radeon R9 380 we tested, you should not waste your time with the new Radeon Software 16.7.1. You will not see any improvement in performance. In fact, it may even deteriorate a little, or worse, fail to run properly when rendering in 4K.

We also investigated how much performance is lost when the Radeon RX 480 is set to its Compatibility Mode to comply with the PCI Express standard. Check it out in our article – True Performance of the Radeon RX 480 Examined!

 

AMD Radeon RX 480 Power Draw Controversy Rev. 3.0

After the AMD Radeon RX 480 was officially launched, several websites reported that their cards were drawing substantially more power from the PCI Express bus than the 75 W allowed by the PCI Express specifications. AMD has now come up with responses to this developing controversy.

2016-07-06 : Added a new page on the AMD driver solution, and our take on it.

2016-07-09 : Added a new section on the Radeon Software Crimson Edition 16.7.1 driver.

 

Excessive RX 480 Power Draw

The PCI Express specification allows for up to 66 W of power to be supplied by the 12 V line (12 V x 5.5 A) of the PCI Express bus. However, reviewers who have the necessary equipment to measure the power draw from the PCI Express slot have noted that the Radeon RX 480 draws 78-88 W of power from that 12 V line.

If their measurements are correct, the Radeon RX 480 exceeds the PCI Express power draw specification by a minimum of 18% and up to 33%. It also means that the Radeon RX 480 is exceeding its thermal design power (TDP) of 150 watts.
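The 18% and 33% figures follow directly from the 12 V limit and the measured range quoted above. Here is a minimal sketch of the arithmetic; the constant and function names are ours, chosen only for illustration.

```python
# Slot limit on the 12 V line, as stated above: 12 V x 5.5 A
SLOT_12V_VOLTS = 12.0
SLOT_12V_MAX_AMPS = 5.5

def overshoot_pct(measured_watts, limit_watts):
    """How far a measured draw exceeds the limit, in percent."""
    return (measured_watts / limit_watts - 1) * 100

limit = SLOT_12V_VOLTS * SLOT_12V_MAX_AMPS  # 66 W
for measured in (78, 88):  # range reported by reviewers
    pct = overshoot_pct(measured, limit)
    print(f"{measured} W is {pct:.0f}% over the {limit:.0f} W limit")
```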

 

The Implications

The AMD Radeon RX 480 has to be certified to meet the PCI Express specifications to qualify the card as a PCI Express card, for branding and legal purposes. If the Radeon RX 480 does not fulfil its certification requirements, AMD has to fix the issue within 3 months. Failure to do so will result in the Radeon RX 480 being denied the right to be branded and sold as a PCI Express card.

For certain, AMD would have certified the Radeon RX 480 to be PCI Express-compliant before the launch. However, independent testing has revealed that the Radeon RX 480 can and does exceed the power draw specifications. Why there is a discrepancy between the pre- and post-launch results is not yet known.

 

AMD Responds

Initially, Raja Koduri, Senior Vice President and Chief Architect, Radeon Technologies Group, responded on Reddit that :

Great question and I am really glad you asked.

We have extensive testing internally on our PCIE compliance and RX480 passed our testing. However we have received feedback from some of the reviewers on high current observed on PCIE in some cases.

We are looking into these scenarios as we speak and reproduce these scenarios internally. Our engineering team is fully engaged.

Just two days ago, AMD’s Communications Lead, Garrath Johnson, issued an update on their ongoing investigation of the issue :

As you know, we continuously tune our GPUs in order to maximize their performance within their given power envelopes and the speed of the memory interface, which in this case is an unprecedented 8Gbps for GDDR5.

Recently, we identified select scenarios where the tuning of some RX 480 boards was not optimal. Fortunately, we can adjust the GPUs tuning via software in order to resolve this issue.

We are already testing a driver that implements a fix, and we will provide an update to the community on our progress on Tuesday (July 5, 2016).

We will keep you updated on this developing Radeon RX 480 power draw story, so stay tuned!

New Driver To Correct RX 480 Power Draw

At 3:14 PM on July 6 (GMT+8), AMD’s Communications Lead, Garrath Johnson, emailed us the solution that AMD has developed – a new driver to correct the excessive Radeon RX 480 power draw from the PCI Express bus. Check it out :

We promised an update today (July 5, 2016) following concerns around the Radeon RX 480 drawing excess current from the PCIe bus. Although we are confident that the levels of reported power draws by the Radeon RX 480 do not pose a risk of damage to motherboards or other PC components based on expected usage, we are serious about addressing this topic and allaying outstanding concerns. Towards that end, we assembled a worldwide team this past weekend to investigate and develop a driver update to improve the power draw. We’re pleased to report that this driver—Radeon Software 16.7.1—is now undergoing final testing and will be released to the public in the next 48 hours.

In this driver we’ve implemented a change to address power distribution on the Radeon RX 480 – this change will lower current drawn from the PCIe bus.

Separately, we’ve also included an option to reduce total power with minimal performance impact. Users will find this as the “compatibility” UI toggle in the Global Settings menu of Radeon Settings. This toggle is “off” by default.

Finally, we’ve implemented a collection of performance improvements for the Polaris architecture that yield performance uplifts in popular game titles of up to 3% [1]. These optimizations are designed to improve the performance of the Radeon RX 480, and should substantially offset the performance impact for users who choose to activate the “compatibility” toggle.

AMD is committed to delivering high quality and high performance products, and we’ll continue to provide users with more control over their product’s performance and efficiency. We appreciate all the feedback so far, and we’ll continue to bring further performance and performance/W optimizations to the Radeon RX 480.

1: Based on data running ’Total War: Warhammer’, ultra settings, 1080p resolution. Radeon Software 16.6.2 74.2FPS vs Radeon Software 16.7.1 78.3FPS; Metro Last Light, very high settings, 1080p resolution, 80.9FPS vs 82.7 FPS. Witcher 3, Ultra settings, 1440p, 31.5FPS vs 32.5, Far Cry 4, ultra settings, 1440p, 54.65FPS vs 56.38FPS, 3DMark11 Extreme, 22.8 vs 23.7  System config: Core i7-5960X, 16GB DDR4-2666MHz, Gigabyte X99-UD4, Windows 10 64-bit. Performance figures are not average, may vary from run-to-run.
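For readers who want to turn the before-and-after results in AMD's footnote into percentages, here is a small sketch. The figures are copied from the footnote; the dictionary layout and function name are just illustrative choices, and AMD itself notes that these results are not averages and may vary from run to run.

```python
# (Radeon Software 16.6.2, Radeon Software 16.7.1) results from AMD's footnote
results = {
    "Total War: Warhammer (1080p)": (74.2, 78.3),
    "Metro Last Light (1080p)": (80.9, 82.7),
    "Witcher 3 (1440p)": (31.5, 32.5),
    "Far Cry 4 (1440p)": (54.65, 56.38),
    "3DMark11 Extreme": (22.8, 23.7),
}

def uplift_pct(before, after):
    """Percentage gain going from the old driver to the new one."""
    return (after / before - 1) * 100

uplifts = {name: uplift_pct(b, a) for name, (b, a) in results.items()}
for name, pct in uplifts.items():
    print(f"{name}: {pct:+.1f}%")
```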

 

Radeon Software Crimson Edition 16.7.1

The Radeon Software Crimson Edition 16.7.1 driver is now available! Here are the compatibility and performance updates that promise to solve the Radeon RX 480 power draw problem :

  • The Radeon RX 480’s power distribution has been improved for AMD reference boards, lowering the current drawn from the PCIe bus.
  • A new “compatibility mode” UI toggle has been made available in the Global Settings menu of Radeon Settings. This option is designed to reduce total power with minimal performance impact if end users experience any further issues.  This toggle is “off” by default.
  • Performance improvements for the Polaris architecture that yield performance uplifts in popular game titles of up to 3%. These optimizations are designed to improve the performance of the Radeon RX 480, and should substantially offset the performance impact for users who choose to activate the “compatibility” toggle.

Also, the Radeon Software Crimson Edition 16.7.1 driver appears to fix the limited PCI Express bandwidth on the Radeon RX 480, giving it a further boost in performance :

  • Radeon RX 480 limited PCI-E Bandwidth (PCI-E bandwidth is now at the correct speed on the Radeon RX 480) with Radeon Software Crimson Edition 16.7.1.

You can download the new drivers below :

 

Our Opinion

AMD has basically acknowledged that the Radeon RX 480 does indeed draw more power over the PCI Express bus than is allowed by the PCI Express specifications. That is also a tacit acknowledgement that the Radeon RX 480 has a thermal design power (TDP) in excess of 150 W.

They claim that the excessive Radeon RX 480 power draw will not damage the motherboard or related components. However, they also qualify that as limited to “expected usage” – that means using the Radeon RX 480 as is, and not overclocking it.

The Radeon Software Crimson Edition 16.7.1 driver they just released offers 3 changes :

  • shifting the excessive power draw from the PCI Express bus to the 6-pin PCIe power cable,
  • reducing the power consumption of the Radeon RX 480 through a “compatibility” toggle in the driver, and
  • improving the Radeon RX 480’s performance by up to 3%, to offset the reduced performance when the “compatibility” toggle is enabled.

We have a dedicated article covering the Radeon Software 16.7.1, which looks at its performance improvements. You can check it out here -> Radeon Software 16.7.1 Performance Comparison.

Although AMD implied that the performance impact of the “compatibility” toggle is substantially less than 3%, we examined its real impact and how much of it the Radeon Software 16.7.1 driver’s performance improvements offset. Check it out in our article – True Performance of the Radeon RX 480 Examined.

Going forward, we expect the Radeon RX 480 cards to eventually ship with an 8-pin PCI Express power connector for “compatibility” reasons.

 

NVIDIA Tesla P100 For PCIe-Based Servers Overview

On June 20, 2016, NVIDIA officially unveiled their Tesla P100 accelerator for PCIe-based servers. This is a long-expected PCI Express variant of the Tesla P100 accelerator that was launched in April using the NVIDIA NVLink interconnect. Let’s check out what’s new!

 

NVIDIA Tesla P100

The NVIDIA Tesla P100 was originally unveiled at the GPU Technology Conference on April 5, 2016. Touted as the world’s most advanced hyperscale data center accelerator, it was built around the new NVIDIA Pascal architecture and the proprietary NVIDIA NVLink high-speed GPU interconnect.

Like all other Pascal-based GPUs, the NVIDIA Tesla P100 is fabricated on the 16 nm FinFET process technology. Even with the much smaller process technology, the Tesla P100 is the largest FinFET chip ever built.

Unlike the Pascal-based GeForce GTX 1080 and GTX 1070 GPUs designed for desktop gaming though, the Tesla P100 uses HBM2 memory. In fact, the P100 GPU and its HBM2 memory stacks sit together in a single package. This new packaging technology, Chip on Wafer on Substrate (CoWoS), allows for a 3X boost in memory bandwidth to 720 GB/s.

The NVIDIA NVLink interconnect allows up to eight Tesla P100 accelerators to be linked in a single node. This allows a single Tesla P100-based server node to outperform 48 dual-socket CPU server nodes.

 

Now Available With PCIe Interface

To make Tesla P100 available for HPC (High Performance Computing) applications, NVIDIA has just introduced the Tesla P100 with a PCI Express interface. This is basically the PCI Express version of the original Tesla P100.

 

Massive Leap In Performance

Such High Performance Computing servers can already make use of the NVIDIA Tesla K80 accelerators, which are based on the older NVIDIA Kepler architecture. The new NVIDIA Pascal architecture, coupled with much faster HBM2 memory, allows for a massive leap in performance. Check out these results that NVIDIA provided :

Ultimately, the NVIDIA Tesla P100 for PCIe-based servers promises to deliver “dramatically more” performance for your money. As a bonus, the energy cost of running Tesla P100-based servers is much lower than CPU-based servers, and those savings accrue over time.

 

Two Configurations

The NVIDIA Tesla P100 for PCIe-based servers will be slightly (~11-12%) slower than the NVLink version, turning out up to 4.7 teraflops of double-precision performance, 9.3 teraflops of single-precision performance, and 18.7 teraflops of half-precision performance.

The Tesla P100 will be offered in two configurations. The high-end configuration will have 16 GB of HBM2 memory with a maximum memory bandwidth of 720 GB/s. The lower-end configuration will have 12 GB of HBM2 memory with a maximum memory bandwidth of 540 GB/s.
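As a rough check on the "~11-12% slower" figure, the sketch below compares the PCIe throughput numbers above against the NVLink version. Note that the NVLink figures (5.3, 10.6 and 21.2 teraflops) do not appear in this article; they are an assumption taken from NVIDIA's published specifications for the NVLink-based Tesla P100. The same sketch also compares the two memory configurations.

```python
# PCIe figures are from the article. The NVLink figures are an assumption,
# taken from NVIDIA's published specifications for the NVLink Tesla P100.
PCIE_TFLOPS = {"double": 4.7, "single": 9.3, "half": 18.7}
NVLINK_TFLOPS = {"double": 5.3, "single": 10.6, "half": 21.2}

def slowdown_pct(pcie, nvlink):
    """How much slower the PCIe card is than the NVLink card, in percent."""
    return (1 - pcie / nvlink) * 100

slowdowns = {p: slowdown_pct(PCIE_TFLOPS[p], NVLINK_TFLOPS[p]) for p in PCIE_TFLOPS}
for precision, pct in slowdowns.items():
    print(f"{precision}-precision: {pct:.1f}% slower")

# Memory configurations from the article: (capacity in GB, bandwidth in GB/s)
high_end, low_end = (16, 720), (12, 540)
capacity_ratio = low_end[0] / high_end[0]
bandwidth_ratio = low_end[1] / high_end[1]
print(f"capacity ratio: {capacity_ratio:.2f}, bandwidth ratio: {bandwidth_ratio:.2f}")
```

The second part shows that the lower-end configuration gives up capacity and bandwidth in the same proportion.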

 

Complete NVIDIA Slides

For those who are interested in more details, here are the NVIDIA Tesla P100 for PCIe-based Servers slides.
