GDDR Archives - Rambus
At Rambus, we create cutting-edge semiconductor and IP products, providing industry-leading chips and silicon IP to make data faster and safer.

All You Need to Know About GDDR7
https://www.rambus.com/blogs/all-you-need-to-know-about-gddr7/
Thu, 29 May 2025

In this blog post, we explore everything you need to know about Graphics Double Data Rate, most commonly known as GDDR. Since its introduction in 2000, GDDR has become the primary memory technology for graphics cards, evolving through several generations—from GDDR2 up to the latest GDDR7—to provide ever-increasing speed and efficiency for advanced visual and computational applications.

Let’s dive right into everything you need to know about GDDR in the blog below.

What Does GDDR Stand For?

GDDR stands for Graphics Double Data Rate. It is a specialized type of memory designed specifically for graphics processing units (GPUs) and is engineered to deliver high bandwidth for the demanding data transfer needs of modern graphics rendering and computation. Unlike standard DDR (Double Data Rate) memory, which is used for general system tasks and CPUs, GDDR is optimized for the parallel processing and rapid data throughput required by tasks such as gaming, 3D rendering, and AI workloads.

Today, GDDR has evolved into a state-of-the-art memory solution, with the latest GDDR7 specification offering speeds of up to 48 Gbps per pin and a bandwidth of 192 GB/s per device. Beyond gaming, GDDR has become a solution for AI accelerators and GPUs requiring high memory bandwidth to handle demanding workloads such as AI inference. The latest generation of GPUs and AI systems now leverage GDDR7 to meet the performance needs of these advanced applications.

Which is Faster, DDR or GDDR?

GDDR memory is faster than DDR memory when it comes to bandwidth and data transfer rates. GDDR is specifically engineered for graphics cards and GPUs, prioritizing high bandwidth to handle large volumes of graphical data, such as high-resolution textures and complex 3D models. In contrast, DDR memory is optimized for general-purpose computing tasks managed by the CPU, focusing on lower latency rather than raw bandwidth.

For example, the latest GDDR7 memory can achieve per-pin speeds up to 48 Gbps and overall memory subsystem bandwidths reaching 1.5 terabytes per second, while DDR5, the fastest mainstream DDR standard, typically operates at data rates between 4.8 and 8.4 Gbps per pin. This makes GDDR significantly faster for GPU workloads, though DDR retains an advantage in latency and energy efficiency for CPU and multitasking environments.
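These peak numbers follow directly from the per-pin data rate multiplied by the interface width. Below is a minimal sketch of that arithmetic, assuming a 32-bit GDDR7 device interface, a 256-bit (eight-device) graphics memory subsystem, and a 32-bit DDR5 sub-channel; the actual bus widths of any given product will vary.

```python
def peak_bandwidth_gb_s(per_pin_gbps: float, interface_bits: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate (Gbps) x interface width / 8 bits per byte."""
    return per_pin_gbps * interface_bits / 8

gddr7_device = peak_bandwidth_gb_s(48, 32)   # 192 GB/s per GDDR7 device
subsystem    = gddr7_device * (256 // 32)    # 1536 GB/s (~1.5 TB/s) on a 256-bit GPU bus
ddr5_channel = peak_bandwidth_gb_s(8.4, 32)  # 33.6 GB/s per DDR5 sub-channel

print(f"GDDR7 device:        {gddr7_device:.0f} GB/s")
print(f"Eight-device system: {subsystem:.0f} GB/s")
print(f"DDR5 sub-channel:    {ddr5_channel:.1f} GB/s")
```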

When Did GDDR7 Launch?

JEDEC published the GDDR7 standard in March 2024, with memory vendors reaching mass production in 2025.

Key Features of GDDR7

  • Ultra-High Speed: GDDR7 supports data rates up to 32 Gbps per pin in its initial rollout, with a roadmap extending to 48 Gbps. At the top of that range, it delivers more than double the practical speed of previous generations like GDDR6X, which tops out at 21 Gbps.
  • Exceptional Bandwidth: Each GDDR7 device can deliver 128 GB/s of bandwidth at 32 Gbps (and 192 GB/s at 48 Gbps), providing the throughput needed for data-intensive workloads such as AI inference and next-generation graphics.
  • Advanced Signaling (PAM3): GDDR7 introduces three-level pulse amplitude modulation (PAM3) signaling, which transmits 50% more data per clock cycle compared to the NRZ (PAM2) used in previous generations. This innovation enables higher data rates without requiring higher clock speeds, improving efficiency and reducing signal integrity challenges.
  • Lower Voltage and Improved Efficiency: Operating at 1.2V, GDDR7 is more power-efficient than GDDR6X (1.35V), helping to manage overall system power consumption while delivering higher performance.
  • Enhanced Reliability and RAS Features: GDDR7 incorporates advanced data integrity features such as on-die ECC with real-time reporting, data poison detection, error check and scrub, and command address parity with command blocking. These features improve reliability, availability, and serviceability (RAS), which are critical for mission-critical AI and graphics applications.
  • Increased Channel Parallelism: GDDR7 moves from two 16-bit channels (in GDDR6) to four 10-bit channels (8 bits data, 2 bits error reporting), enabling greater parallelism and more efficient data handling.
  • JEDEC Standardization: GDDR7 is a JEDEC-approved open standard, ensuring broad industry support and interoperability.

These features make GDDR7 a state-of-the-art memory solution, delivering the high bandwidth, efficiency, and reliability required for the latest AI, gaming, and graphics workloads.
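The PAM3 and per-device bandwidth figures above can be sanity-checked with a short sketch. It assumes GDDR7's 3-bits-over-2-symbols PAM3 mapping and a 32-bit device interface, which is how the 50% and 128/192 GB/s numbers are usually derived.

```python
# Bits carried per symbol (per unit interval) for each line code.
NRZ_BITS_PER_SYMBOL = 1.0     # two levels, one bit per symbol
PAM3_BITS_PER_SYMBOL = 3 / 2  # three levels, three bits spread over two symbols

gain = PAM3_BITS_PER_SYMBOL / NRZ_BITS_PER_SYMBOL - 1
print(f"PAM3 carries {gain:.0%} more data per symbol than NRZ")  # 50%

# Per-device bandwidth at the initial and roadmap GDDR7 data rates (32-bit device).
for per_pin_gbps in (32, 48):
    print(f"{per_pin_gbps} Gbps/pin -> {per_pin_gbps * 32 / 8:.0f} GB/s per device")
# 32 Gbps -> 128 GB/s, 48 Gbps -> 192 GB/s
```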

Jump to: GDDR Solutions »

What is the Difference Between GDDR6 and GDDR7?

GDDR7 represents a significant upgrade over GDDR6, offering higher performance, improved efficiency, and advanced features. The most notable difference is its speed: GDDR7 delivers data rates of up to 48 Gbps per pin, compared to GDDR6’s maximum of 24 Gbps. This results in up to 2x the bandwidth, enabling faster data access and processing for demanding applications like AI inference, gaming, and high-resolution rendering. Additionally, GDDR7 utilizes PAM3 signaling, which transmits 50% more data per clock cycle than GDDR6’s NRZ encoding. It also operates at a lower voltage (1.1–1.2V vs. GDDR6’s 1.35V), improving energy efficiency per bit. Furthermore, GDDR7 features four 8-bit channels per chip (compared to GDDR6’s two 16-bit channels), enhancing parallelism and reducing latency for real-time workloads.

Features | GDDR6 | GDDR7
Data Rate | Up to 24 Gbps | Up to 48 Gbps
Bandwidth per Device | Up to 96 GB/s | Up to 192 GB/s
Voltage | 1.35V | 1.1–1.2V
Signaling | NRZ (PAM2) | PAM3
Channels/Chip | Two 16-bit channels | Four 8-bit channels
Use Cases | High-end gaming, VR | AI workloads, 8K+ gaming

What is the Difference Between GDDR6X and GDDR7?

To understand the difference between GDDR6X and GDDR7, we first need to take a closer look at GDDR6X.

What is GDDR6X?

GDDR6X is a high-performance graphics memory standard designed to deliver faster data transfer rates and greater memory bandwidth (vs. GDDR6) for demanding GPU applications. The key innovation in GDDR6X is its use of PAM4 (Pulse Amplitude Modulation 4-level) signaling, which allows it to transmit two bits of data per clock cycle—double what traditional NRZ (Non-Return to Zero) signaling achieves in GDDR6. This enables GDDR6X to reach data rates of up to 21 Gbps per pin, significantly increasing overall memory bandwidth, which can reach up to 672 GB/s on a 256-bit bus.
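As a rough check of those figures, the sketch below computes peak GDDR6X bandwidth from per-pin rate and bus width; the 256-bit and 384-bit widths are illustrative examples, not a statement about any specific graphics card.

```python
def bus_bandwidth_gb_s(per_pin_gbps: float, bus_bits: int) -> float:
    """Peak bus bandwidth in GB/s."""
    return per_pin_gbps * bus_bits / 8

# GDDR6X at its 21 Gbps per-pin maximum
print(bus_bandwidth_gb_s(21, 256))  # 672.0 GB/s on a 256-bit bus
print(bus_bandwidth_gb_s(21, 384))  # 1008.0 GB/s on a 384-bit bus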

GDDR7 and GDDR6X are both high-performance graphics memory standards, but they differ in several key technical aspects:

  • Data Rate: GDDR7 offers a much higher maximum speed, reaching up to 48 Gbps per pin in the future, compared to GDDR6X’s maximum of 21 Gbps per pin.
  • Signaling Technology: GDDR7 uses PAM3 (Pulse Amplitude Modulation with 3 levels), while GDDR6X relies on PAM4 (4 levels).
  • Bandwidth per Device: At 48 Gbps, a GDDR7 device can deliver 192 GB/s of bandwidth (128 GB/s at its initial 32 Gbps), whereas GDDR6X at 21 Gbps achieves 84 GB/s per device.
  • Voltage and Efficiency: GDDR7 operates at a lower voltage (1.2V) compared to GDDR6X (1.35V), resulting in better power efficiency.
  • Standardization: GDDR7 is a JEDEC-approved open standard, ensuring broad industry support, while GDDR6X is a proprietary technology developed by Micron and NVIDIA.
Parameter | GDDR7 | GDDR6X
Max Speed | Up to 48 Gbps | 21 Gbps
Signaling | PAM3 | PAM4
Bandwidth per Device | 192 GB/s (at 48 Gbps) | 84 GB/s (at 21 Gbps)
Voltage | 1.1–1.2V | 1.35V

What is the Difference Between GDDR7 and HBM3?

GDDR7 and High Bandwidth Memory 3 (HBM3) are advanced memory technologies designed for GPUs and AI accelerators, but they serve distinct purposes and excel in different scenarios. GDDR7, the latest iteration of Graphics Double Data Rate memory, is optimized for high-speed, cost-effective applications, such as gaming and edge AI. On the other hand, HBM3 is tailored for ultra-high-bandwidth workloads in data centers, HPC (High-Performance Computing), and AI training, where efficiency and scalability are critical.

Feature | GDDR7 | HBM3
Bandwidth per Device | 192 GB/s (at 48 Gbps) | 819 GB/s (at 6.4 Gbps)
Bus Width | 32-bit | 1024-bit
Memory Configuration | Soldered onto PCB | Stacked DRAM in package
Use Cases | Gaming, edge AI | AI training, HPC, data center GPUs
Voltage | 1.1–1.2V | 1.1V core voltage
Cost | Cost-effective | Expensive due to silicon interposers and stacking technology
Scalability | Flexible | Limited configurability
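The table illustrates the narrow-and-fast versus wide-and-slow trade-off. Here is a short sketch of how the per-device numbers fall out, assuming a 32-bit GDDR7 device interface and a 1024-bit HBM3 stack interface.

```python
def device_bandwidth_gb_s(per_pin_gbps: float, interface_bits: int) -> float:
    """Peak per-device bandwidth in GB/s."""
    return per_pin_gbps * interface_bits / 8

gddr7 = device_bandwidth_gb_s(48, 32)     # 192 GB/s: few pins, very high per-pin rate
hbm3  = device_bandwidth_gb_s(6.4, 1024)  # 819.2 GB/s: modest per-pin rate, very wide interface

print(f"GDDR7 device: {gddr7:.0f} GB/s, HBM3 stack: {hbm3:.0f} GB/s")
print(f"HBM3 runs each pin {48 / 6.4:.1f}x slower but is {1024 // 32}x wider")
```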

Use Case Differences:

  • GDDR7 is designed for consumer-grade GPUs used in gaming PCs and edge devices where cost-effectiveness and high-speed performance are priorities.
  • HBM3 is reserved for flagship GPUs in data centers or HPC environments where bandwidth requirements far exceed those of mainstream applications.

While GDDR7 excels in delivering high-speed performance at a lower cost for gaming and edge AI applications, HBM3 dominates in scenarios demanding extreme bandwidth and efficiency, such as AI training or HPC workloads. Choosing between these memory types depends on the specific requirements of the application, balancing performance needs against cost considerations.

Jump to: HBM Solutions »

What is the Difference Between GDDR7 and LPDDR5?

GDDR7 and LPDDR5 are both advanced memory technologies, but they are designed for very different use cases and have distinct technical characteristics.

GDDR7 is the latest generation of graphics memory, primarily used in GPUs for high-performance computing, AI inference, and gaming. It is engineered to deliver extremely high bandwidth and data rates, making it ideal for applications that require rapid movement of large amounts of data, such as real-time graphics rendering and AI model inference. GDDR7 achieves this by employing advanced PAM3 signaling, supporting data rates up to 32 Gbps per pin initially (with a roadmap to 48 Gbps), and offering per-chip bandwidths as high as 128–192 GB/s. Its interface and architecture are optimized for speed and throughput, with moderate power efficiency.

LPDDR5, on the other hand, stands for Low Power DDR5 and is optimized for energy efficiency and compactness, making it the memory of choice for mobile devices, laptops, and other battery-powered systems. LPDDR5 typically supports data rates of 6.4–8.5 Gbps per pin, with a focus on minimizing power consumption through features like Dynamic Voltage Scaling and multiple low-power modes. While LPDDR5 is highly efficient and supports reasonable bandwidth for mobile and embedded applications, it cannot match the raw speed and throughput of GDDR7.

Key Differences Table:

Feature | GDDR7 | LPDDR5
Primary Use | GPUs, AI accelerators | Smartphones, laptops
Max Data Rate (per pin) | 32–48 Gbps | Up to 6.4–8.5 Gbps
Bandwidth per Device | 128–192 GB/s | ~34 GB/s
Signaling | PAM3 | NRZ (PAM2)
Voltage | 1.1–1.2V | 1.05V/0.9V (core), 0.5V/0.3V (I/O)
Power Efficiency | Moderate | High
Prefetch | 32n | 16n
Interface Width | 32 bits | 32 bits
Typical Application | Graphics cards, AI edge servers | Mobile devices, ultrabooks

GDDR7 excels in scenarios where maximum bandwidth and low latency are critical, such as AI inference and high-end gaming, but it consumes more power and is less suited for compact, battery-powered devices.

LPDDR5 is optimized for energy efficiency and space, making it ideal for mobile and portable applications, but it cannot deliver the same level of bandwidth as GDDR7.

Jump to: LPDDR Solutions »

How GDDR7 Supercharges AI Inference Performance

GDDR7 memory delivers transformative improvements for AI inference workloads through groundbreaking advancements in bandwidth, efficiency, and signaling technology.

Here’s how it achieves this:

  1.  Unmatched Bandwidth for Data-Intensive Models
    • Speed: GDDR7 operates at 32–48 Gbps per pin, up to more than double GDDR6X’s 21 Gbps limit. At 48 Gbps, each GDDR7 device provides 192 GB/s of bandwidth, enabling AI accelerators to process trillion-parameter models (e.g., LLMs) without data bottlenecks.
    • Scalability: A system requiring 500 GB/s bandwidth needs just 3 GDDR7 chips—compared to 13 LPDDR5X modules—reducing latency and complexity for edge AI deployments.
  2. Power Efficiency for Sustainable AI
    • Voltage: GDDR7 runs at 1.2V (vs. GDDR6X’s 1.35V), reducing power consumption by >10% per bit.
    • Dynamic Voltage Scaling: Adjusts power based on workload demands, critical for energy-constrained edge devices running continuous inference tasks.
  3. Advanced Signaling for Lower Latency
    • PAM3 Encoding: Transmits 50% more data per cycle than GDDR6’s NRZ, enabling faster throughput without higher clock speeds. This reduces inference latency for real-time applications like autonomous driving.
  4. Reliability for Mission-Critical Inference
    • On-Die ECC: Corrects errors in real time, ensuring data integrity for sensitive applications like medical diagnostics.
    • Standardization: As a JEDEC-approved technology, GDDR7 ensures broad compatibility and optimization across AI hardware ecosystems.
  5. Real-World Impact
    • Generative AI: GDDR7’s bandwidth handles large language models (LLMs) like GPT-4, enabling faster text/image generation.
    • Autonomous Systems: Low latency ensures rapid sensor data processing for real-time decision-making.
    • Edge Servers: Compact GDDR7-based systems deliver data center-level performance in retail, healthcare, and IoT.

By combining raw speed with intelligent power management, GDDR7 is redefining what’s possible for AI inference at the edge and beyond.
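The scalability point above (a handful of GDDR7 devices versus roughly a dozen LPDDR5X devices for 500 GB/s) comes straight from dividing the target bandwidth by the per-device bandwidth. A minimal sketch, assuming 48 Gbps GDDR7 and 8.533 Gbps LPDDR5X, each on a 32-bit interface; the exact LPDDR5X count shifts with the assumed data rate.

```python
import math

def devices_needed(target_gb_s: float, per_device_gb_s: float) -> int:
    """Smallest number of devices whose combined peak bandwidth meets the target."""
    return math.ceil(target_gb_s / per_device_gb_s)

TARGET_GB_S = 500

gddr7_per_device   = 48 * 32 / 8     # 192 GB/s per device
lpddr5x_per_device = 8.533 * 32 / 8  # ~34 GB/s per device

print(devices_needed(TARGET_GB_S, gddr7_per_device))    # 3
print(devices_needed(TARGET_GB_S, lpddr5x_per_device))  # 15 with these assumptions
                                                        # (the post cites 13 modules)
```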

Keep on reading:
GDDR7 Memory Supercharges AI Inference

Conclusion

As AI inference models grow in size and complexity, the need for memory solutions that deliver both high bandwidth and low latency has never been greater. GDDR7 rises to this challenge, offering a leap in performance over previous memory technologies. With data rates starting at 32 Gbps per pin and a roadmap to 48 Gbps, GDDR7 provides up to 192 GB/s of bandwidth per device—more than double that of its predecessors and well ahead of alternatives like LPDDR5X. Its adoption of PAM3 signaling, enhanced reliability features, and improved power efficiency at 1.2V make it uniquely suited for the demands of next-generation GPUs and AI accelerators.

Compared to other memory types, GDDR7 stands out for its ability to efficiently feed data-hungry AI inference engines, enabling faster processing of large language models, real-time analytics, and advanced edge applications. Its balance of performance, scalability, and reliability ensures that designers can meet the requirements of both today’s and tomorrow’s AI workloads without compromise. As the industry moves forward, GDDR7 is set to become a cornerstone of high-performance computing, powering innovations in AI, gaming, and beyond.

Explore more resources:
GDDR Memory for High-Performance AI Inference
Supercharging AI Inference with GDDR7
From Training to Inference: HBM, GDDR & LPDDR Memory

Rambus Advances AI 2.0 with GDDR7 Memory Controller IP
https://www.rambus.com/blogs/rambus-advances-ai-2-0-with-gddr7-memory-controller-ip/
Mon, 22 Apr 2024

As the latest addition to the Rambus portfolio of industry-leading interface and security digital IP for AI 2.0, the GDDR7 memory controller will provide the breakthrough memory throughput required by servers and clients in the next wave of AI inference.

Memory Solutions for AI 2.0

AI 2.0 represents the revolutionary world of generative AI. AI 2.0 leverages the enormous growth in Large Language Models (LLMs) and their kin to create new multimodal content. Multimodality means text, images, speech, music, and video can be combined as inputs to create outputs in all these media and more. Examples include creating a 3D model from an image or a video from a text prompt.

LLMs have scaled to over a trillion parameters with data sets in the billions of samples. Training LLMs requires enormous computational power supported by the latest high-performance memory solutions.

Supercharging AI Inference with GDDR7

The output of the AI 2.0 training process is an inference model that can be employed to create new multimodal content from a user’s prompts. Since accuracy and fidelity increase with model size, there is an ongoing push to larger and larger inference models. And as AI inference becomes increasingly pervasive and moves out from the data center to the edge and endpoints, it drives the need for more powerful processing engines with tailored high-performance memory solutions across the entire computing landscape.

GPUs have been the inference engines of choice, and in the case of edge and endpoint applications, such as servers and desktops, these have been GPUs using GDDR6 memory. GDDR6, however, has reached the practical limit of standard NRZ signaling at 24 Gigabits per second (Gbps) data rates. To meet the bandwidth needs of future GPUs, a new generation of GDDR using a new signaling scheme is required. Enter GDDR7 memory, which uses PAM3 signaling to boost data rates to 40 Gbps and higher.

Rambus Silicon IP for AI 2.0

As the preferred silicon IP supplier for AI 2.0, Rambus offers industry-leading HBM, PCIe and CXL  Controller IP and now the industry’s first GDDR7 Memory Controller IP. The Rambus GDDR7 Controller provides a full-featured, bandwidth-efficient solution for GDDR7 memory implementations. It supports 40 Gbps operation providing 160 Gigabytes per second (GB/s) throughput for a GDDR7 memory device, a 67% increase over the industry’s highest throughput GDDR6 Controller (also from Rambus). The Rambus GDDR7 Controller enables a new generation of GDDR memory deployments for cutting-edge AI accelerators, graphics and high-performance computing (HPC) applications.
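The 160 GB/s and 67% figures quoted for the controller can be reproduced from the per-pin rates alone, assuming the standard 32-bit GDDR device interface.

```python
def device_throughput_gb_s(per_pin_gbps: float, interface_bits: int = 32) -> float:
    """Peak per-device throughput in GB/s for a 32-bit GDDR interface."""
    return per_pin_gbps * interface_bits / 8

gddr7_ctrl = device_throughput_gb_s(40)  # 160 GB/s at 40 Gbps
gddr6_ctrl = device_throughput_gb_s(24)  # 96 GB/s at 24 Gbps

print(f"GDDR7 controller: {gddr7_ctrl:.0f} GB/s")
print(f"GDDR6 controller: {gddr6_ctrl:.0f} GB/s")
print(f"Increase: {gddr7_ctrl / gddr6_ctrl - 1:.0%}")  # ~67%
```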

“Delivering greater memory performance is mission critical as AI 2.0 workloads push bandwidth requirements higher than ever before,” said Neeraj Paliwal, general manager of Silicon IP, at Rambus. “With our breakthrough GDDR7 Controller IP solution, designers can quickly take advantage of this latest generation of GDDR memory at industry-leading throughput.”

“GDDR7 memory offers significant performance gains over GDDR6,” said Soo-Kyoum Kim, vice president, memory semiconductors at IDC. “The Rambus GDDR7 Controller IP solution will be a vital tool for anyone that wants to take advantage of the improved speed and latency features offered by GDDR7.”

Rambus GDDR7 Controller key features:

  • Supports all GDDR7 link features including PAM3 and NRZ signaling
  • Supports broad range of GDDR7 device sizes and speeds
  • Optimized for high efficiency and low latency across a wide variety of traffic scenarios
  • Flexible AXI interface support
  • Low-power support (self-refresh, hibernate self-refresh, dynamic frequency scaling, etc.)
  • Reliability, Availability and Serviceability (RAS) features – such as end-to-end data path parity, parity protection for stored registers, etc.
  • Comprehensive memory test support
  • Integration support for third-party PHYs available
  • Validated utilizing the latest GDDR7 VIP and memory vendor memory models


The Rambus GDDR7 Memory Controller IP is available now. Learn more about the Rambus GDDR7 Controller here or download our white paper, Supercharging AI Inference with GDDR7.

[Infographic]: The Powerful Technologies that Enable Systems like ChatGPT to Thrive
https://www.rambus.com/blogs/infographic-the-powerful-technologies-that-enable-systems-like-chatgpt-to-thrive/
Tue, 12 Mar 2024

Generative AI has been making waves in the tech industry. The capability to understand context and perform tasks like creating and summarizing content with astonishing accuracy in seconds showcases the cutting-edge potential that generative AI has to transform business processes.

Have you ever thought about the technologies that enable generative AI, including ChatGPT and Google Bard? Semiconductor technologies like DDR5, High-bandwidth Memory (HBM), GDDR, and PCI Express are critical in the training and deployment of generative AI.

Security will be another essential requirement as Generative AI proliferates to the edge and increasingly to client systems and smart end points. Safeguarding AI data and assets will require security anchored in hardware.

Check out the Rambus infographic below, “The Powerful Technologies that Enable Systems like ChatGPT to Thrive” to learn more.

The Powerful Technologies that Enable Systems like ChatGPT to Thrive - Infographic

Powering the Next Wave of AI Inference with the Rambus GDDR6 PHY at 24 Gb/s
https://www.rambus.com/blogs/powering-the-next-wave-of-ai-inference-with-the-rambus-gddr6-phy-at-24-gbs/
Wed, 19 Apr 2023

Rambus is, once again, leading the way in memory performance solutions with today’s announcement that the Rambus GDDR6 PHY now reaches performance of up to 24 Gigabits per Second (Gb/s), the industry’s highest data rate for GDDR6 memory interfaces!

AI/ML inference models are growing rapidly in both size and sophistication, and because of this we are seeing increasingly powerful hardware deployed at the network edge and in endpoint devices. For inference, memory throughput speed and low latency are critical. GDDR6 memory offers an impressive combination of bandwidth, capacity, latency and power that makes it ideal for these applications.

The GDDR6 interface supports 2 channels, each with 16 bits for a total data width of 32 bits. With speeds up to 24 Gb/s per pin, the Rambus GDDR6 PHY offers a maximum bandwidth of up to 96 GB/s. This represents a 50% increase in available bandwidth, compared with the previous generation 16G GDDR6 PHY.
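Those figures follow from the 2 x 16-bit (32-bit total) GDDR6 interface; a quick sketch of the 96 GB/s and 50% claims:

```python
INTERFACE_BITS = 2 * 16  # two 16-bit channels per GDDR6 device

prev_gen = 16 * INTERFACE_BITS / 8  # 64 GB/s with the previous 16 Gbps PHY
this_gen = 24 * INTERFACE_BITS / 8  # 96 GB/s with the 24 Gbps PHY

print(f"16 Gbps PHY: {prev_gen:.0f} GB/s, 24 Gbps PHY: {this_gen:.0f} GB/s")
print(f"Increase: {this_gen / prev_gen - 1:.0%}")  # 50%
```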

Of course, hitting such high data rates also comes with some challenges. Maintaining signal integrity (SI) at speeds of 24 Gb/s, particularly at lower voltages, requires significant expertise. Designers face tighter timing and voltage margins, while the number of loss sources and the magnitude of their effects rise rapidly. This is where the long-standing Rambus expertise in SI comes in and allows customers to maintain the SI of their system, even at these new 24G data rates.

Check out our “From Data Center to End Device: AI/ML Inference with GDDR6” white paper for a detailed look at GDDR6 memory capabilities and discover why it is ideally suited to meet the challenges of AI inference applications.

Powering the Next Wave of AI Applications
https://www.rambus.com/blogs/powering-the-next-wave-of-ai-applications/
Thu, 29 Apr 2021

Artificial Intelligence/Machine Learning (AI/ML) grows at a blistering pace. The size of the largest training models has passed 100 billion parameters and is on pace to hit a trillion in the next year. The impact of AI/ML is being felt across the industry landscape, in higher education, and in financial markets. Underpinning this growth is the rapid advancement in computer hardware technology with specific emphasis on AI/ML-tailored memory solutions that provide extremely high bandwidth. Check out this new infographic that captures some of the high level trends and highlights two high-performance memories, HBM2E and GDDR6 DRAM, that are powering the next wave of AI applications.

Powering the Next Wave of AI Applications - Infographic

AI Requires Tailored DRAM Solutions: Part 4
https://www.rambus.com/blogs/ai-requires-tailored-dram-solutions-part-4/
Wed, 14 Oct 2020

Frank Ferro, Senior Director Product Management at Rambus, and Shane Rau, Senior Research Executive at IDC, recently hosted a webinar that explores the role of tailored DRAM solutions in advancing artificial intelligence. Part three of this four-part series touched on a wide range of topics including the impact of AI on specific hardware systems, training versus inference, and selecting the most appropriate memory for AI/ML. This blog post (part four) takes a closer look at the evolution of HBM and GDDR6, as well as the design tradeoffs and challenges of the two memory types.

The Evolution of HBM

Although HBM2 is currently shipping, standardization and market requirements are pushing HBM to go faster and faster. HBM2, says Ferro, started off with a two gigabits per second data rate, and has since moved to 3.2 Gbps, with some companies announcing versions that are even faster.

“We don’t see any end in sight for faster HBM speeds. The next generation of the HBM standard is already being worked on. The industry is pushing very hard to continue to drive the bandwidth up on HBM,” he adds.

Despite its ability to achieve extremely high speeds and bandwidth, Ferro emphasizes that HBM, which utilizes a very wide interface, is still based on DRAM (DDR3-DDR4) technology.

Evolution of HBM

Note: On Sep. 9, 2020, Rambus announced its HBM2E interface could operate at 3.6 Gbps raising bandwidth to 461 GB/s.

“In the slide [above], you can see there’s 1,024 bits that are being used for HBM. Essentially, we are taking really a traditional DRAM and we’re going to go very, very wide on the interface,” he explains.
“You’re going wide and slow if you want to think of it that way, which gives you very good power efficiency and very high bandwidth. These are very good attributes, but you also have to deal with its 2.5D structure.”

Despite its challenges, says Ferro, HBM is the optimal solution for bandwidth and power efficiency. It also offers excellent density.

“HBM stacks more traditional DRAMs in a 3D structure. And by stacking, you get both capacity and bandwidth. On this slide [below] in the bottom left hand corner, you can see what an HBM system looks like. You have the HBM DRAM, that’s sitting on top of the silicon interposer connected to the processor,” he elaborates.

HBM2E Memory System

“The HBM DRAM stack, depending on the density you need, could have up to 8 DRAMs in the stack. With HBM2E, you can go to a stack of 12 DRAMs. Then you connect the DRAM to the CPU through a silicon interposer. Remember you have 1024 data lines that are running through the interposer along with all the control and power and ground lines.”

As well, says Ferro, there are many traces that need to be routed, which is best achieved with a silicon interposer.

“You can see that in a very small footprint you get quite a bit of memory and processing power. On the right-hand side in the slide [above], you see a picture of the processor from NVIDIA that uses four HBM DRAM stacks. You can also see the four DRAMs are sitting alongside the processor. Again, even with a very small footprint, you get a very large amount of processing power.”

HBM2E Benefits and Challenges

HBM: Design Tradeoffs and Challenges

The slide below, says Ferro, illustrates the trade-offs, benefits and challenges of developing an HBM system. On the left side, there are dedicated GPUs and processors like Google’s TPUs that use HBM for their processing power.

“Again, the benefits are very high bandwidth, high capacity and a small footprint. However, you have to build that system. You’ve got to deal with all of these I/Os. You also have to deal with the complexities of the 2.5D structure,” he explains. “Although the industry is maturing from a manufacturing standpoint, HBM is still relatively new because you have to take the silicon die of the DRAM and the silicon die of your processors. You have these two known good dies, and those two known good dies are going to sit on a silicon interposer, which goes into package. It is still relatively new and expensive to develop HBM. Because of the expense, it has given rise to GDDR6.”

The Evolution of GDDR6

According to Ferro, GDDR6 offers a “really good” trade-off between speed, performance, and design complexity versus HBM.

“In the slide [below], you can see it’s a 32-bit interface, again, more traditional interface, very similar to LPDDR, also very similar to DDR as well,” he states.

Evolution of GDDR

“You handle these with your more traditional PCB type manufacturing process. But 16 gigabits per second is very high speed – and four or five times higher than traditional DDR. So, you do have to be very careful when you design with GDDR6 from a signal integrity standpoint.”

As Ferro notes, there are graphics cards that have anywhere from 10 or 11 GDDR6s on a board to deliver very high bandwidth.

“Most GDDR systems are more bandwidth intensive in density, but you do have some options for additional density,” he elaborates. “With GDDR, the current density goes up to about 32 gigabits per device. GDDR6 also has a mode known as clamshell, which allows you to put two GDDR6s on a board opposite each other to give you double density. GDDR provides lots of flexibility for performance.”

GDDR6 Memory System: Taking A Closer Look

The slide below, says Ferro, offers a close-up look at a GDDR system.

“The GDDR is connected to your processor through a traditional PCB at 16 gigabits per second. However, you do have to be careful about your PCB materials and the physical design and placement of the signals,” he cautions.

GDDR6 Memory System

“With GDDR6 running at high speeds, you have to keep an eye on signal integrity. You must take care to avoid a crosstalk and you have to think about issues like insertion loss. You also have to work very closely with your DRAM provider and your physical PHY provider who should provide reference information on how to carefully design those systems.”

Ferro also highlights the NVIDIA GeForce card in the slide above.

“Up in the top right, you have a block diagram of a GDDR system that has four GDDRs. You can see the 32-bit interfaces and this type of system gives you 256 gigabytes per second (GB/s) of bandwidth. So that is very good for applications like different AI accelerators – where you can implement a PCIe card or even AI in automotive ADAS systems. 256 GB/s will give you very good performance if you’re implementing an application like a level two or level three ADAS,” he adds.

GDDR6: Benefits and Challenges

As Ferro notes, GDDR6 offers very high bandwidth, good latency, and is well understood from an engineering standpoint. However, he emphasizes that system designs with GDDR6 must take potential signal integrity issues into account.

GDDR6 Benefits and Challenges

“There are GDDR6 systems even now that are pushing the speed limit beyond 16 gigabits per second. As well, some companies have announced 18 gigabits per second. So, the signal integrity challenges are only going to grow,” he states. “Working closely with the DRAM and the PHY manufacturers is going to be critical for customers designing their PCBs. GDDR6 speeds will continue to push up into and beyond the 18 gigabits per second range. The need for bandwidth is not going away, and we’re seeing constant pressure from the industry to move faster and faster with these systems.”

In summary, says Ferro, the demand for higher processing power and more memory bandwidth will only become more pronounced in the coming years.

“Today, the CPU is outstripping the ability of the memory to keep up. So, there is a lot of pressure on the memory suppliers to continue to grow the bandwidth,” he explains. “In particular, we’re looking at a higher bandwidth on the AI training side, which is where HBM is being used. We’re seeing HBM continue to push for higher and higher speeds to keep up with the needs of AI training, while GDDR6 is being used for AI inference.”

As the industry sees more and more processing power pushed out into the network and performed locally, higher GDDR speeds and high-speed memory interfaces will be necessary to support AI inference algorithms and applications such as automotive ADAS.

“This is why we are working very closely with the industry, with customers, with DRAM manufacturers, processors, all the way across the entire industry to look at memory architectures that can service the need of the growing amount of data in the system,” he concludes.

AI Requires Tailored DRAM Solutions: Part 3
https://www.rambus.com/blogs/ai-requires-tailored-dram-solutions-part-3/
Thu, 01 Oct 2020

Frank Ferro, Senior Director Product Management at Rambus, and Shane Rau, Senior Research Executive at IDC, recently hosted a webinar that explores the role of tailored DRAM solutions in advancing artificial intelligence. Part two of this four-part series touched on multiple topics including how AI enables useful data processing, various types of AI silicon, and the evolving role of DRAM. This blog post (part three) takes a closer look at the impact of AI on specific hardware systems, training versus inference, and selecting the most appropriate memory for AI/ML applications.

Impact of AI on Specific Hardware Systems

According to Ferro, a lot of the hardware driving the internet is general-purpose compute hardware, along with some graphics GPUs. These, says Ferro, have adapted over the years to support the requirements of AI processing. However, we are now seeing many companies design specific processors that are custom tailored to support AI applications.

“For example, if you look at the cloud, there are high compute-intensity types of applications with data that is gathered and processed. We’re calling these AI training type algorithms, where you have different neural networks that need to be trained to understand the data,” he explained. “As you move through the network, you have endpoints that are pumping the data up. A good example is Amazon Alexa, which processes the data up into the cloud and moves the data across the network. Moving forward, this paradigm can potentially overwhelm the network as data loads increase. So, we need to do much more inline processing.”

The requirements of local processing at home via devices such as Alexa, says Ferro, will ultimately require significantly more processing capability and memory bandwidth capacity.

“For example, at an endpoint, you’re going to need 50-100 megabytes of processing to deal with all that data locally. To do that, you are going to need memory systems that are a little bit beyond what is available and what has been traditionally available,” he elaborated. “If you look at the edge of the network, 5G as well, there will be much more inline processing. If you move the data from your endpoint through the network, you will want to do as much processing at each of those end points. So, more memory capacity will be needed.”

Typically, says Ferro, the data is moved through a 5G base station. However, some data will ultimately be processed in the base station, as well as in the cloud, where a significant amount of processing power will be required to handle terabytes of data.

Training Versus Inference

To illustrate the difference between AI training and inference, Ferro points to the slide below that highlights training and inference requirements.

Neural Network Training and Inference: Definitions

“Training relates to inputting a large amount of data into the network to make sure it understands what you’re specifically trying to identify. If you’re dealing with a car or some other critical real time system, for example, you want to make sure that you get the answer right,” he explains. “This can take many, many iterations of data and good quality data processing. Some of these networks can take days and even weeks to train and requires a lot of bandwidth and a lot of capacity on the DRAM.”

Once those models are trained, says Ferro, they can be pushed out into the network or used for local processing.
AI/ML needed across the evolving internet

“That’s what we call inference, which has a much lower processing requirement,” he adds.

Choosing the Correct Memory for AI/ML

In terms of choosing memory systems for AI, Ferro said some memory systems are more effective for training, while others are better suited for inference.

“For inference, you can think of an endpoint, whether it is a car or consumer device. There will also be a number of different requirements. For example, you are going to want the endpoint and memory system to be power efficient, cost efficient, and process very quickly,” he elaborates. “Up in the network, it may not be the same level of real-time processing, and you’re going to need more processing power. So, there are different costs related to memory density and performance.”

To illustrate the various types of memory systems for AI applications, Ferro points to the slide below and highlights LPDDR4X.

“LPDDR4X, which was traditionally used in mobile applications, provides high speeds at 17 gigabytes per second of bandwidth. It offers low power and very good efficiency. It has migrated away from only being used in mobile devices. In addition to mobile, LPDDR4X is now used in applications like automotive. This is because LPDDR4X provides some additional processing and power efficiency in automotive applications – and even in some low-end AI inference applications.”
Choosing the Correct Memory: Comparison Data

Note: On Sep. 9, 2020, Rambus announced its HBM2E interface could operate at 3.6 Gbps raising bandwidth to 3,686 Gbps (461 GB/s).

According to Ferro, GDDR6 has also stepped out of the graphics and gaming worlds and is now used for “in-between applications” such as AI inference and automotive systems. HBM, says Ferro, is used in higher-end applications, offering gigabytes per second of processing power.
Two Important High Bandwidth Memories for AI/ML - HBM and GDDR
“As of today, most of the systems are very heavily slanted towards DDR because that was the only available memory standard. But you can see at 3.2 gigabits per second, DDR is just starting to run out of steam,” he explains. “It’s a great DRAM still being used in many, many systems, and very price efficient. However, these AI systems are simply demanding more and more bandwidth, and in some cases, capacity as well. This gives rise to the need for new memory systems.”

In the slide above, Ferro provides more details about where HBM and GDDR6 are deployed in the marketplace.

“HBM’s focus market has been high performance computing, and the more performance you need, the more likely HBM is the memory of choice. As well, HBM is used for AI training and network applications. 5G systems need to do much more processing and HBM is starting to emerge as a solution in the network anywhere you need very high bandwidth and good capacity,” he elaborates. “As well, GDDR6 is emerging from the graphics world. Not because HBM wasn’t a good solution, but customers and systems needed something that was a little bit more cost efficient, maybe a little bit less complex to manufacture.”

According to Ferro, GDDR6 offers a “really good” trade-off between performance, cost and system complexity. As such, GDDR6 is a popular choice for AI inference, graphics and automotive applications.

“The processing requirements for automotive AI are going up, but you can’t put HBM into a car right now because of some of the reliability concerns with the manufacturing of these 3D structures. However, GDDR6 is a really good solution – giving you both the speeds you need and easier design complexity, which matches nicely for automotive,” he concludes.

AI Requires Tailored DRAM Solutions: Part 2
https://www.rambus.com/blogs/ai-requires-tailored-dram-solutions-part-2/
Wed, 16 Sep 2020

Written by Rambus Press

Frank Ferro, Senior Director Product Management at Rambus, and Shane Rau, Senior Research Executive at IDC, recently hosted a webinar that explores the role of tailored DRAM solutions in advancing artificial intelligence. Part one of this four-part series reviewed a range of topics including the interconnected system landscape, the impact of COVID-19 on the data center, and how AI is helping to make sense of the data deluge. This blog (part two), takes a closer look at how AI enables useful data processing, various examples of AI silicon, and the evolving role of DRAM in advancing artificial intelligence.

How AI Enables Useful Data Processing

According to Rau, the sheer amount of data that has been generated in recent years more than justifies the need for AI. As well, AI requires a significant amount of processing power to handle data complexity.

“When you have more distributed data across the landscape generated by different system types, you have different data types. Some of that data is more important than other data. For example, the non-entertainment imaging share of total data, think a lot of static images, is declining,” he explains.

“In contrast, entertainment remains a huge part of the data being created and think a lot of moving images, think videos, think Netflix, where the data cannot be interrupted, no one wants the sound or the video of their Netflix movie to be interrupted. That is a form of critical data that cannot be interrupted. It is real time data.”

As Rau emphasizes, AI needs to know how to prioritize data as well as process it.

“You have a combination of AI dealing with a lot of data, a lot of distributed data across that landscape, having to assess whether it is critical or not, and then how sophisticated that data type is. But immediately, the utility for AI is identifying the useful data and bringing that data to the surface for human attention,” he elaborates. “AI is stepping in between our data that we can no longer process because of the amounts and the sophistication of that data and doing the job, and also bringing the data to our attention so we can make good decisions.”

Examples of AI Silicon

As Rau observes, the industry will be working for the next decade or more to advance AI algorithms and processors.

“They will also be developing memory, memory capacity, and various memory types to adapt to the needs that AI will have across this period and across this whole data landscape,” he states.

To more clearly illustrate the market growth of various chips, Rau points to the slide below that aggregates different types of silicon including FPGAs and GPUs, along with specialized AI ASICs/ASSPs.

“You can see significant growth in devices like PCs, phones and tablets. Phones drive a lot of early AI data processing because many of the phone manufacturers, think Apple, think Huawei or their chip providers like Qualcomm, have AI-specific processing capabilities they can put in a phone,” he elaborates.

“This is critical for a phone when it’s doing AI, it’s right in front of you and can process a lot of incoming data through your voice or the video that you’re creating. It can then determine what is important to process and what needs to move on further through edge infrastructure, communications infrastructure, and then into the data center.”

According to Rau, PCs, helped by GPUs, will also be processing AI applications and data; together, PCs, phones and tablets are a huge driver of the AI data processing silicon opportunity. Networking infrastructure is “very small” at this point, says Rau, but it too will be processing AI. Specifically, packet processing, compression, and decompression of data as it moves through infrastructure will require AI.

“We have a silicon opportunity that’s pervasive across the AI landscape (or the system landscape) from the IoT endpoints into the data center and cloud. So, we have established the need for AI, the need for processing of AI, and the opportunity for the processing silicon to do AI,” he adds. “With these processors comes the need for memory and most often DRAM. TPUs, GPUs, FPGAs, and other data processing silicon types need DRAM attached to them to bring the data close to the processing so it can be done quickly, such as in real time processing of video.”

The Evolving Role of DRAM in Advancing Artificial Intelligence

As Rau points out, DRAM is a core technology that has proven itself to be extremely adaptive over time, enabling it to support a wide range of system types.

“DRAM has adapted to the needs of new systems, starting with the 90s when personal computing and PCs drove the need for processing and large amounts of memory to meet the general-purpose needs of PCs. When DRAM was, what we called a commodity DRAM, it was ubiquitous, but then as graphics and gaming and other applications came in DRAM adapted into more specialized forms of DRAM for servers, for graphics, and GDDR for example,” he explained.

“More recently with the advent of smartphones, specialized low power LPDRAM is used for those devices. As well, cloud servers sometimes use commodity DRAM or specialized DRAM in large quantities and even some memory modules with their own data buffer chips are used to put more intelligence on the DRAM, next to the DRAM on the module so that DRAM can make some decisions on its own and offload some functions from the CPU. In this way, DRAM is also adapting.”

In terms of the next decade, says Rau, DRAM will once again adapt to new applications across multiple verticals like automotive, video surveillance, and smart homes.

“These applications will need DRAM and processing and heavy amounts of AI. Think in the smart home, think of Alexa, for example, when you talk to Alexa right now, Alexa sends your requests back to the data center,” he elaborates. “In the future, we think that smart home systems will be more locally intelligent to process your request in real time. That means more memory, that means more DRAM.”

Although the smart home space is different than automotive, says Rau, vehicles will also need significant amounts of memory and processing to support AI systems with varying levels of autonomous driving capabilities. In terms of video surveillance, says Rau, AI processing is used to process the event, analyze its severity, and determine what will happen next.

“Will that event need to be reported through the video surveillance endpoint, through infrastructure, into the data center in the cloud for some form of more detailed analysis and response?”

The Ubiquity of AI and DRAM

According to Rau, AI will ultimately become ubiquitous in electronic systems.

“As human beings, we use those systems, conceivably seven billion human beings on the planet. There will be seven billion types of AI solutions – especially when you consider all the potential combinations of training versus inferencing, different processing types, the different DRAM types and capacities and configurations that will support that AI,” he explains. “With all of this, DRAM technology continues to adapt. We have GDDR6, which formerly was just graphics DRAM, now being applied in automotive, for example. And HBM, which is high bandwidth memory. Again, formerly just for graphics, but now applied to applications outside of graphics that require very high performance, low latency, as well as a non-proprietary, cost-effective form of performance. There are also memory buffers that go on modules that help the DRAM be more intelligent to respond and offload the needs of CPUs.”

For the next decade and beyond, Rau concludes, DRAM will continue to adapt to new applications driven by AI and data processing.

High-Performance Memory for AI/ML and HPC: Part 2
https://www.rambus.com/blogs/high-performance-memory-for-ai-ml-and-hpc-part-2/
Wed, 13 May 2020

In part one of this two-part series, Semiconductor Engineering Editor in Chief Ed Sperling and Rambus Sr. Director of Product Management Frank Ferro took a closer look at the various types of memory that system designers are using to support artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC) applications. In this blog post, Sperling and Ferro explore how bandwidth-hungry AI/ML and HPC applications are driving new chip, system, and memory architectures.

“We’ve seen that HBM2 and HBM2E are frequently used in applications that demand high performance computing. These include AI/ML training in particular – along with certain networking applications that have locked in on HBM as a solution of choice,” Ferro confirms. “What is shipping out in the market today are solutions based on HBM2. HBM2 gives you a quite a bit of bandwidth. Although 2Gbps per pin isn’t very fast from a per pin standpoint, you’ve got 1024‑bit wide access, which gives you a total of 256 gigabytes per second of bandwidth.”

Memory Solutions for AI and HPC

As Ferro notes, this 256 GB/s represents the throughput that is delivered by HBM2.

“If you look at some of the applications in AI/ML training, where you string multiple HBM stacks together, this paradigm will give you the kind of performance you are looking for,” he explains. “For example, four stacks of HBM2 will get you up to a terabyte per second of data throughput.”

Another advantage of HBM2 and HBM2E, says Ferro, is that the memory comprises a small portion of the total system area – with relatively low levels of power consumption.

“With HBM2 and HBM2E, you’re only taking up a small area on the system, along with very low levels of power consumption,” he elaborates. “In short, high bandwidth memory offers a compact solution that delivers very high bandwidth, low power, optimal density, high performance and a small footprint.”

Moreover, says Ferro, HBM2 or HBM2E memories can be linked together in HPC systems.

“How one daisy chains everything together, how the data is moved to the CPU, that is the secret sauce for companies architecting their respective systems,” he states. “Perhaps they have a CPU with its own HBM, or multiple CPUs feeding off a single HBM. These are all potential system designs.”

Commenting on the future beyond HBM2E, Ferro notes that the need for more bandwidth and higher memory densities is only increasing.

“HBM2E uses wide I/O and TSV technologies to support densities up to 24 GB per device at speeds up to 307 GB/s. This bandwidth is delivered across a 1024-bit wide device interface, which is divided into 8 independent channels on each DRAM stack,” he explains. “HBM2E can support 2-high, 4-high, 8-high, and 12-high TSV stacks of DRAM at full bandwidth to allow systems flexibility on capacity requirements from 1 GB – 24 GB per stack.”

As well, says Ferro, HBM2E extends the per pin bandwidth to 3.2 Gbps, adds a new footprint option to accommodate the 16 Gb-layer and 12-high configurations for higher density components, and updates the MISR polynomial options for these new configurations.
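The HBM2E bandwidth and capacity figures in this series all derive from the 1024-bit stack interface and from die density times stack height. A minimal sketch follows; note that the 2.4 Gbps entry is an inference from the 307 GB/s figure above, not a rate stated in the post.

```python
STACK_INTERFACE_BITS = 1024  # wide I/O interface per HBM2E stack

# Per-stack bandwidth at several per-pin data rates mentioned in this series.
for per_pin_gbps in (2.4, 3.2, 3.6):
    gb_s = per_pin_gbps * STACK_INTERFACE_BITS / 8
    print(f"{per_pin_gbps} Gbps/pin -> {gb_s:.1f} GB/s per stack")
# 2.4 -> 307.2, 3.2 -> 409.6, 3.6 -> 460.8 GB/s

# Per-stack capacity: die density (Gb) x stack height, converted to GB.
die_density_gb, stack_height = 16, 12  # e.g. 16 Gb dies in a 12-high stack
print(f"{stack_height}-high stack: {die_density_gb * stack_height / 8:.0f} GB")  # 24 GB
```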

HBM2E PHY

Nevertheless, Ferro emphasizes that AI/ML data sets continue to increase at almost exponential rates.

“Even with the improvements offered by HBM2E, system designers are having difficulty keeping up with the massive data sets for AI/ML training. HBM3 is anticipated to deliver much higher bandwidth and densities.”

Commenting on SoC size and chiplets, Ferro points out that costs associated with increasing SoC size has become a significant consideration for the industry.

“We have to look at ways to reduce those costs. We have seen a certain level of disaggregation which has given rise to chiplet technology. There are many reasons aside from a silicon reticle perspective to leverage chiplets,” he elaborates. “One example could be high speed IP that was designed in an earlier node. By using chiplets, a company can take advantage of all that work that was previously developed.”

More specifically, a CPU can be fabbed in the most advanced node, with I/Os in an older node.

“You can also have chips that are I/O bound and then you don’t want to waste silicon. There are a lot of different applications for chiplets going forward. However, system designers will have to determine how a chiplet communicates with the SoC,” he says. “If you are talking to a DRAM, you will have the same HBM PHY you would have with a monolithic design. But then, as you go back to the CPU, there are different SerDes technologies that are being explored: high speed, very high-speed, and short reach.”

Ferro concludes by noting that high bandwidth memory presents a number of design challenges, as it combines 2.5D and 3D technologies.

“There are a lot of advantages associated with HBM: low power, low area and high bandwidth. However, you must deal with fitting these 2.5D structures in a small area. In addition, maintaining signal integrity is a challenge with traces running at high speeds, as is thermal dissipation from the heat generated by the SoC and the memory itself. Last, but certainly not least, the HBM interposer is a fairly large piece of silicon that presents warping concerns. Nevertheless, even with these design challenges, the performance benefits of HBM are well worth the extra design effort,” he adds.

High-Performance Memory for AI/ML and HPC: Part 1
https://www.rambus.com/blogs/high-performance-memory-for-ai-ml-and-hpc-part-1/
Thu, 07 May 2020

Semiconductor Engineering Editor in Chief Ed Sperling recently spoke with Rambus Sr. Director of Product Management Frank Ferro about designing high-performance memory subsystems for artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC) applications.

As Ferro notes, there is plenty of compute (CPU) power available today to support the above-mentioned markets.

“[However], the advances in computing are now outstripping the ability to feed those [compute] engines with memory. [So, we are often asked (by customers) how to solve this memory bottleneck]– how do we keep all these compute engines fed for applications like HPC and AI/ML?”

More specifically, says Ferro, the above-mentioned bottleneck is in the memory subsystem, with compute speeds routinely outstripping the memory.

“In the past, system designers had limited choices for their memory subsystem. Essentially, DDR4 was the only choice for a time. [Since DDR4 could only hit a max speed of 3.2Gbps, it gave rise to the need for new solutions],” he explains. “High bandwidth memory (HBM), which was one of the first to emerge, is based on DDR technology. It runs at about 3.2Gbps today and gives you the same DDR technology – only with a very wide interface.”

Another memory type based on DDR, says Ferro, is GDDR6, the latest generation of a graphics memory family originally created for the graphics market some 20 years ago. GDDR has undergone several major evolutions, with the latest iteration running at 16 Gbps. In addition, LPDDR5, which has broken out of the mobile market, is running at speeds of 6.4 Gbit/s/pin.

“Designers can now look at all these different technologies to architect the memory subsystem. Depending on the application, any of these – HBM2E, GDDR6, and LPDDR – could potentially be a good solution to keep those CPUs fed with data.”

Selecting the most appropriate memory type, continues Ferro, depends on multiple tradeoffs that are illustrated in the image below.

Tradeoffs for High Performance Computing

“There are a number of applications that require HPC capabilities in the cloud. For example, HPC is often a key requirement for complex applications such as genome sequencing and graphics rendering,” Ferro states. “The trade-off for memory to support HPC applications is paying more in terms of power consumption, as well as dollars to access the highest levels of performance computing.”

For AI/ML, says Ferro, there are several market segments, including training with large, complex data sets that can take weeks to process and refine.

“Nevertheless, the tradeoffs for AI/ML memory can be more practical, as these applications typically require one to two terabytes per second of bandwidth. Power consumption is one of the biggest expenses in the data center, so a more balanced approach makes sense,” he explains. “Certainly, the more processing power you can access in the data center the better. At the same time, you must be conscious of the power and cost elements. Even within the data center you see several market segments where accelerator cards are limited by power. You have some cards that are down as low as 75 watts, but you’ll also have cards that are up to 250 to 300 watts – so you can draw on additional power for those highest performance applications.”

Ferro also notes that although networking isn’t typically associated with high bandwidth computing, the increase of in-line processing means that companies can no longer rely completely on the cloud.

“As you go through the edge of the network all the way up through the cloud, you’re going to want to do processing in-line. So that is why we are now seeing networking applications that require very high bandwidth,” he adds.
