AI & Machine Learning Archives - Rambus

At Rambus, we create cutting-edge semiconductor and IP products, providing industry-leading chips and silicon IP to make data faster and safer.

DEEPX, Rambus, and Samsung Foundry Collaborate to Enable Efficient Edge Inferencing Applications

As artificial intelligence (AI) continues to proliferate across industries – from smart cities and autonomous vehicles to industrial automation, robotics, edge servers, and consumer electronics – edge inferencing has become a cornerstone of next-generation computing. Delivering real-time, low-power AI processing at the edge requires close coordination across AI compute architectures, memory subsystems, and silicon platforms. To meet these demands, DEEPX is collaborating with Rambus and Samsung Foundry to deliver a highly optimized solution that combines efficient AI compute, high-bandwidth memory interfaces, and advanced logic process technology.

A Proven Foundation Scaling Forward

As the foundation of this collaboration, DEEPX worked with Rambus and Samsung Foundry on the DX-M1 AI processor, fabricated using Samsung Foundry’s 5nm technology and integrating silicon-proven LPDDR5 controller IP from Rambus. DX-M1 has been deployed across a range of edge applications, including robotics, edge servers, AI-enabled IT services, smart cameras, and factory automation. Looking to the next generation of edge AI, DEEPX is developing the DX-M2 processor for ultra-low-power generative AI inference on edge devices using Samsung Foundry’s 2nm process technology. Samsung Foundry’s GAA-based 2nm platform is designed to deliver further improvements in power efficiency and performance scaling as edge AI workloads grow in complexity.

Through the Samsung Advanced Foundry Ecosystem (SAFE™) IP Alliance, Rambus works closely with Samsung Foundry to optimize its memory controller IP for advanced Samsung process technologies, enabling DEEPX to integrate proven IP more efficiently, lower design risk, and accelerate time to production for next-generation designs.

A Unified Solution for Edge AI

The collaboration between DEEPX, Rambus, and Samsung Foundry brings together three core pillars of edge inferencing:

  • AI Inference Technology: DEEPX contributes its ultra-efficient AI inference processors, designed to deliver high performance with minimal power consumption, making them ideal for endpoint devices such as AI PCs, AIoT devices, automotive systems, edge servers, robotics, and industrial sensors.
  • High Performance Memory: Rambus enhances memory performance with its LPDDR5/5X memory controller IP, which supports data rates up to 9.6 Gbps and features advanced bank management, command queuing, and look-ahead logic to maximize throughput and minimize latency.
  • Advanced Process Technology: Samsung Foundry provides the silicon platform and ecosystem enablement that support DEEPX’s edge AI development, helping reduce integration complexity and improve design predictability through advanced logic processes and the SAFE™ Alliance. Samsung Foundry’s 2nm GAA process technology represents a key next step for DEEPX’s DX-M2 processor, supporting further gains in power efficiency and performance scaling.

Together, these technologies empower edge devices to run complex AI workloads locally, with low power consumption and high performance efficiency, setting the stage for the next generation of edge inferencing.

Optimized Memory for AI Inference

The Rambus LPDDR5/5X memory controller IP is purpose-built for applications requiring high memory throughput at low power. It supports features such as:

  • Queue-based user interface with reordering scheduler
  • Look-ahead activate, precharge, and auto-precharge logic
  • Support for burst lengths BL16 and BL32
  • Parity protection and in-line ECC
  • Compatibility with LPDDR5T, LPDDR5, and LPDDR5X devices
  • Interoperability with Samsung LPDDR5/5X PHY

These capabilities are essential for AI inference, where memory bandwidth and latency directly impact model responsiveness and accuracy.
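To make those data rates a little more concrete, here is a minimal back-of-the-envelope sketch in Python that estimates peak theoretical LPDDR5X bandwidth. The 32-bit channel width and the channel counts are illustrative assumptions, not DEEPX or Rambus product specifications, and real designs lose some of this peak to protocol overhead.

```python
# Peak theoretical LPDDR5X bandwidth, a back-of-the-envelope sketch.
# Assumes a 32-bit channel and the channel counts below (illustrative only);
# actual achievable bandwidth is lower due to protocol overhead.

def peak_bandwidth_gbs(data_rate_gbps: float, channel_bits: int, channels: int) -> float:
    """Peak bandwidth in GB/s = per-pin rate x total pins / 8 bits per byte."""
    return data_rate_gbps * channel_bits * channels / 8

if __name__ == "__main__":
    for channels in (1, 2, 4):
        bw = peak_bandwidth_gbs(9.6, 32, channels)  # 9.6 Gbps per pin (LPDDR5X)
        print(f"{channels} x32 channel(s) at 9.6 Gbps: {bw:.1f} GB/s peak")
    # 1 channel -> 38.4 GB/s, 2 -> 76.8 GB/s, 4 -> 153.6 GB/s
```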

The Value of Samsung Foundry’s “One-Stop-Shop” Model

Samsung Foundry brings together advanced logic process technology and a tightly aligned SAFE™ IP ecosystem through a vertically integrated technology stack that simplifies complex programs. By coordinating cutting-edge logic processes, IP readiness, and manufacturing considerations earlier in the design cycle, Samsung Foundry helps reduce multi-vendor friction, improve integration efficiency, and accelerate time-to-market.

For edge AI applications such as DEEPX’s DX-M roadmap, Samsung Foundry’s scalable process portfolio – from FinFET to leading-edge 2nm GAA – supports aggressive power-performance targets while maintaining manufacturability. Through collaboration with the SAFE™ ecosystem, memory controller IP from partners like Rambus can be efficiently integrated, helping reduce risk and accelerate time to silicon.

This ecosystem-driven model allows customers to focus on AI architecture and application differentiation, while relying on a stable and scalable silicon platform to support current and future edge AI designs.

Empowering the AI Revolution at the Edge

This collaboration exemplifies the power of ecosystem synergy. By combining DEEPX’s AI compute innovation, Samsung Foundry’s manufacturing excellence and ecosystem enablement, and Rambus’ memory interface leadership, the trio is enabling a new generation of edge devices that are smarter, faster, and more secure.

Whether it’s enabling real-time object detection in smart cameras, predictive maintenance in industrial systems, or intelligent navigation in autonomous drones, the joint solution is poised to transform how AI is deployed at the edge.

Looking Ahead: Pushing the Boundaries with LPDDR6

Looking ahead, DEEPX and Rambus are extending their collaboration to the next frontier: LPDDR6 & LPDDR6-PIM (Processing In Memory). As AI models grow in complexity and demand even greater memory bandwidth, LPDDR6 is poised to deliver speeds exceeding 9.6 Gbps, while reducing operational power by up to 30% compared to LPDDR5X.

DEEPX, with its roadmap for next-generation AI chips like the DX-M2, is aligning its architecture to take full advantage of LPDDR6’s capabilities.

This forward-looking collaboration underscores the trio’s commitment to redefining what’s possible in edge AI—delivering smarter, faster, and more efficient solutions that scale with the future of computing.

High Bandwidth Memory (HBM): Everything You Need to Know

[Updated on October 30, 2025] In an era where data-intensive applications, from AI and machine learning to high-performance computing (HPC) and gaming, are pushing the limits of traditional memory architectures, High Bandwidth Memory (HBM) has emerged as a high-performance, power-efficient solution. As industries demand faster, higher throughput processing, understanding HBM’s architecture, benefits, and evolving role in next-gen systems is essential.

In this blog, we’ll explore how HBM works, how it compares to previous generations, and why it’s becoming the cornerstone of next-generation computing.

What is High Bandwidth Memory (HBM) and How is it Reshaping the Future of Computing?

As computing races toward higher speeds and greater efficiency, memory bandwidth has emerged as a major bottleneck for workloads like AI, high-performance computing, and data analytics. This is where High Bandwidth Memory (HBM) comes in. HBM is a cutting-edge 2.5D and 3D memory architecture designed with an exceptionally wide data path, enabling massive throughput and performance gains. Unlike traditional memory architectures that rely on horizontal layouts and narrow interfaces, HBM takes a vertical approach: stacking memory dies atop one another and connecting them through through-silicon vias (TSVs). This 3D-stacked design drastically shortens data travel paths, enabling higher bandwidth and lower power consumption in a compact footprint.

HBM operates at incredible multi-gigabit speeds. When you combine that speed with a very wide data path, the result is staggering bandwidth, often measured in hundreds of gigabytes per second (GB/s) and even reaching into the terabytes per second (TB/s) range.

To put this into perspective, an HBM4 device running at 8 Gb/s delivers 2.048 TB/s of bandwidth (2048 bits × 8 Gb/s ÷ 8 bits per byte = 2,048 GB/s). That level of performance is what makes HBM4 a leading choice for AI training hardware.

What is a 2.5D/3D Architecture?

2.5D and 3D architectures refer to advanced integration techniques that improve performance, bandwidth, and power efficiency by bringing components closer together—literally.

HBM4 Uses a 2.5D/3D Architecture

3D Architecture
The “3D” part is easy to see. In 3D architecture, chips are stacked vertically and connected through TSVs (vertical electrical connections that pass through the silicon dies). An HBM memory device is a packaged 3D stack of DRAM, forming a compact, high-performance memory module. Think of it as a high-rise building of chips with elevators (TSVs) connecting the floors.

2.5D Architecture
In a 2.5D setup, multiple chips (such as a CPU, GPU, and, in our case, HBM device stacks) are placed side-by-side on a silicon interposer, a thin substrate of silicon that acts as a high-speed communication bridge. The interposer contains the fine-pitch wiring that enables fast, low-latency connections between the chips.

Why do we need to use a silicon interposer? The data path between each HBM4 memory device and the processor requires 2,048 “wires” or traces. With the addition of command and address, clocks, etc., the number of traces necessary grows to about 3,000.

Thousands of traces are far more than can be supported on a standard PCB. Therefore, a silicon interposer is used as an intermediary to connect memory device(s) and processor. As with an integrated circuit, finely spaced traces can be etched in the silicon interposer enabling the desired number of wires needed for the HBM interface. The HBM device(s) and the processor are mounted atop the interposer in what is referred to as a 2.5D architecture.

HBM uses both 2.5D and 3D architectures described above, so it’s a 2.5D/3D architecture memory solution.

How is HBM4 Different from HBM3E, HBM3, HBM2, or HBM (Gen 1)?

HBM4 represents a significant leap forward from its predecessors—HBM3E, HBM3 and earlier generations—in terms of bandwidth, capacity, efficiency and architectural innovation. With each generation, we’ve seen an upward trend in data rate, 3D-stack height, and DRAM chip density. That translates to higher bandwidth and greater device capacity with each upgrade of the specification.

When HBM launched, it started with a 1 Gb/s data rate and a 1024-bit wide interface. HBM delivered 128 GB/s of bandwidth, a huge step forward at the time.

Since then, every generation has pushed the limits a little further. HBM2, HBM3, and now HBM3E have all scaled bandwidth primarily by increasing the data rate. For example, HBM3E runs at 9.6 Gb/s, enabling 1229 GB/s of bandwidth per stack. That’s impressive, but HBM4 takes things to an entirely new level. HBM4 doesn’t just tweak the speed; it doubles the interface width from 1024 bits to 2048 bits. This architectural shift means that even at a modest 8 Gb/s data rate, HBM4 can deliver 2.048 TB/s of bandwidth per stack. That’s nearly double what HBM3E offers.

Chip architects aren’t stopping at one stack. In fact, they’re designing systems with higher attach rates to feed the insatiable appetite of AI accelerators and next-gen GPUs. Imagine a configuration with eight HBM4 stacks, each running at 8 Gb/s. The result? A staggering 16.384 TB/s of memory bandwidth. That’s the kind of throughput needed for massive AI models and high-performance computing workloads.

The table below summarizes the key differences between HBM4 and earlier generations.

| Generation | Data Rate (Gb/s) | Interface Width (b) | Bandwidth per Device (GB/s) | Stack Height | Max. DRAM Capacity (Gb) | Max. Device Capacity (GB) |
|---|---|---|---|---|---|---|
| HBM | 1.0 | 1024 | 128 | 8 | 16 | 16 |
| HBM2 | 2.0 | 1024 | 256 | 8 | 16 | 16 |
| HBM2E | 3.6 | 1024 | 461 | 12 | 24 | 36 |
| HBM3 | 6.4 | 1024 | 819 | 16 | 32 | 64 |
| HBM3E | 9.6 | 1024 | 1229 | 16 | 32 | 64 |
| HBM4 | 8.0 | 2048 | 2048 | 16 | 32 | 64 |
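The bandwidth column follows directly from the data rate and interface width. Here is a quick Python check that reproduces the table’s per-stack numbers (rounded as in the table) and the eight-stack accelerator example above:

```python
# Per-stack HBM bandwidth: data rate (Gb/s per pin) x interface width (bits) / 8.
generations = {
    "HBM":   (1.0, 1024),
    "HBM2":  (2.0, 1024),
    "HBM2E": (3.6, 1024),
    "HBM3":  (6.4, 1024),
    "HBM3E": (9.6, 1024),
    "HBM4":  (8.0, 2048),
}

for name, (rate_gbps, width_bits) in generations.items():
    gbs = rate_gbps * width_bits / 8
    print(f"{name}: {gbs:.0f} GB/s per stack")

# Eight HBM4 stacks, as in the accelerator example above:
print(f"8 x HBM4: {8 * 8.0 * 2048 / 8 / 1000:.3f} TB/s")  # 16.384 TB/s
```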

What are the Additional Features of HBM4?

But that’s not all. HBM4 also introduces enhancements in power, memory access and RAS over HBM3E.

    • Double the Memory Channels: HBM4 doubles the number of independent channels per stack to 32 with 2 pseudo-channels per channel. This provides designers more flexibility in accessing the DRAM devices in the stack.
    • Improved Power Efficiency: HBM4 supports VDDQ options of 0.7V, 0.75V, 0.8V or 0.9V and VDDC of 1.0V or 1.05V. The lower voltage levels improve power efficiency.
    • Compatibility and Flexibility: The HBM4 interface standard ensures backwards compatibility with existing HBM3 controllers, allowing for seamless integration and flexibility in various applications.
    • Directed Refresh Management (DRFM): HBM4 incorporates Directed Refresh Management (DRFM) for improved Reliability, Availability, and Serviceability (RAS) including improved row-hammer mitigation.

Rambus HBM Memory Controller Cores for AI and High-Performance Workloads

Rambus delivers a comprehensive portfolio of HBM controller cores engineered for maximum speed and efficiency. Designed for high bandwidth and ultra-low latency, these controllers enable cutting-edge performance for AI training, machine learning, and advanced computing applications.

The lineup includes our industry-leading HBM4 memory controller, supporting data rates up to 10 Gb/s and offering exceptional flexibility for next-generation workloads. With Rambus HBM controllers, designers can achieve superior throughput, scalability, and reliability for demanding AI and HPC environments.

Summary

As computing demands continue to skyrocket, HBM stands out as a transformative technology that addresses the critical bottleneck of memory bandwidth. By leveraging advanced 2.5D and 3D architectures, HBM delivers massive throughput, exceptional power efficiency, and scalability for next-generation workloads. With HBM4 doubling interface width and introducing new features for flexibility and reliability, it is poised to become the backbone of AI, HPC, and data-intensive applications. Understanding this evolution is key to achieving the performance required for tomorrow’s most demanding systems.

Explore more resources:
HBM4 Memory: Break Through to Greater Bandwidth
Unleashing the Performance of AI Training with HBM4
Ask the Experts: HBM3E Memory Interface IP

From Dorm Room Beginnings to a Pioneer in the AI Chip Revolution: How Etched is Collaborating with Rambus to Achieve Their Vision

In the fast-paced world of Artificial Intelligence (AI), a groundbreaking startup is making waves with its innovative approach to AI chip design. Etched, founded by Harvard dropouts Gavin Uberti, Chris Zhu, and Robert Wachen in 2022, has set out to transform the AI industry with a bold vision: creating specialized chips designed to accelerate AI inference.

The Birth of a Vision

What began as a school project quickly evolved into a mission to revolutionize AI computing. Bonding over a shared passion for building cool things and a fascination with operating models at scale, the founders stumbled upon an idea that would change their lives forever. The trio realized that the future of AI lay in specialized chips for running generative AI models, delivering higher performance than existing solutions. Driven by their conviction that transformers would dominate the AI landscape, they made the daring decision to drop out of Harvard and fully commit to their startup. Their vision was clear: to create a chip that could outperform general-purpose GPUs in running transformer models, revolutionizing the AI industry.

Early Challenges

Like many early-stage startups, Etched faced numerous challenges as they embarked on their journey. Some of the most pressing issues included:

  1. Building the Right Team: Assembling a group of talented engineers and AI experts who shared their vision and could bring their ideas to life was crucial.
  2. Product Definition: Designing a chip specifically for transformer models required overcoming intricate technical challenges and pushing the boundaries of existing technology.
  3. Market Validation: Convincing potential customers and partners to get behind their conviction was essential to maintain the course.
  4. Resource Management: Balancing limited resources while striving for rapid development and innovation posed a constant challenge.

Enter Rambus: A Collaborative Partnership

As Etched worked to overcome these challenges, they found a valuable partner in Rambus, a leader in high-performance chip and silicon IP solutions. The expertise of Rambus in memory and interface technologies proved instrumental in helping Etched achieve their ambitious goals for their system-on-chip (SoC) design. With their silicon-proven, high-performance memory controller IP cores, Rambus provided Etched with the IP they needed to optimize their chip for AI/ML applications.

Overcoming Technical Hurdles

One of the key challenges Etched faced was achieving the right balance of Power, Performance, and Area (PPA) for their SoC while solving the memory bottleneck problem. Rambus offered Etched an HBM memory controller along with integration services to integrate the controller with a PHY, realizing a complete memory sub-system. This integrated solution significantly reduced implementation complexity, allowing Etched to focus on their core innovation.

Etched Transformer Accelerator

Achieving Goals and Pushing Boundaries

With support from Rambus, Etched was able to:

  1. Enhance Performance: The high-bandwidth, low-latency memory solutions provided by Rambus enabled Etched to achieve the performance targets necessary for running complex transformer models.
  2. Optimize Power Consumption: By leveraging efficient interface IP from Rambus, Etched could design their chip to deliver exceptional performance while consuming less energy than traditional GPUs.
  3. Minimize Chip Area: Etched optimized their chip’s footprint without sacrificing functionality by leveraging the expertise of Rambus in designing compact, high-performance interfaces.
  4. Accelerate Time-to-Market: The silicon-proven IP and comprehensive support from Rambus significantly reduced development time and risks, allowing Etched to bring their innovative chip to market faster.

A Bright Future Ahead

The Etched story is a testament to the power of innovation and perseverance. With continued support from Rambus and its cutting-edge IP solutions, Etched is well-positioned to challenge established players in the AI chip market and potentially reshape the future of artificial intelligence computing. Their first chip, Sohu, promises to deliver 10x performance gains compared to the leading GPU in the market today.

As AI continues to evolve and transform industries across the globe, collaborations between innovative startups like Etched and established technology leaders like Rambus will be crucial in driving the next wave of technological advancements. The success story of Etched serves as an inspiration to other early-stage startups, demonstrating that with the right vision, partners, and technology, it’s possible to turn bold ideas into reality and make a lasting impact by delivering innovative solutions in a rapidly changing AI landscape.

Ask the Experts and the Future of Client Computing

In our latest episode of Ask the Experts, we had the opportunity to explore the evolving world of client computing with insights from Carlos Weissenberg, senior product marketing manager for Memory Interface Chips at Rambus. The client computing market encompasses both desktop and notebook PCs. This market is witnessing significant trends and innovations, particularly driven by the increasing demands of AI applications. Carlos shares his perspectives on these trends and the implications for the client computing architecture.

Watch the video below or scroll down to read some highlights.

Trends in the PC Market

The PC remains the premier platform for productivity, content creation, and gaming. Despite the rise of mobile devices, there is still a substantial segment of users and businesses that prefer desktop PCs due to their flexibility, upgradeability, and lower costs. However, over the past decade, there has been a steady shift towards notebook PCs. Users are increasingly seeking the portability of notebooks without compromising on computing power.

AI’s Impact on Client Computing Architecture

AI is driving a renaissance in the PC market. The demands of AI applications are pushing the industry to innovate and offer new memory module form factors that deliver higher memory performance. AI inferencing, which requires high memory capacity and bandwidth, is becoming a key feature in PCs. The rapid development of AI models and applications is driving the need for higher data rates and higher density DRAM chips.

Introducing LPCAMM2: A New Memory Module Form Factor

One of the notable innovations in the client computing space is the LPCAMM2 memory module. LPCAMM2 enables the use of LPDDR memory in notebook PCs with the flexibility and upgradability of a module. Originally developed for mobile phones, LPDDR memory was designed to be soldered down close to the processor to maintain signal integrity. LPCAMM2 overcomes the limitations of soldered-down LPDDR memory by providing a compact, low-height form factor that supports ultra-thin notebooks. It also allows for memory expansion and repairs, which were not possible with LPDDR mounted directly to the motherboard.

New Client Memory Interface Chipsets for DDR5 DIMMs and LPCAMM2

Carlos discussed the new memory interface chipsets for DDR5 client DIMMs and LPCAMM2. Traditionally, desktop PCs used UDIMMs and notebook PCs used SODIMMs. However, starting at 6400 MT/s, desktop PCs now use CUDIMMs and notebook PCs use CSODIMMs; the “C” standing for Clocked. These new modules feature a Client Clock Driver (CKD) to improve signal integrity and enable higher memory bandwidth. Additionally, the CUDIMM/CSODIMM memory interface chipset includes a Power Management IC (PMIC5120) and a Serial Presence Detect Hub (SPD Hub). For the LPCAMM2, the chipset includes a specifically tailored PMIC (PMIC5200) and the SPD Hub.

Why Choose Rambus for Memory Solutions?

Rambus has been a pioneer in the memory space for over 30 years, developing innovative memory subsystems to support both servers and client CPUs. With a strong track record and high-volume experience, Rambus has established itself as a first-class semiconductor product company. The company has strong partnerships across the memory ecosystem, making it a reliable choice for system and module makers seeking advanced memory solutions.

Expert

Carlos Weissenberg, senior product marketing manager for Memory Interface Chips at Rambus. His extensive industry experience includes technical marketing leadership positions at Intel and Supermicro.

Key Takeaways

  • PC Market Trends: The shift towards notebook PCs driven by the demand for portability without compromising computing power.
  • AI’s Impact: The increasing role of AI in driving innovations for higher performance in the client computing architecture.
  • LPCAMM2: A new memory module form factor that brings the higher data rates and lower power consumption of LPDDR memory to a flexible, upgradeable module.
  • Memory Interface Chipsets: Rambus offers new memory interface chipsets for DDR5 CSODIMMs and CUDIMMs and LPDDR5 LPCAMM2 which support higher capacity and bandwidth.
  • Rambus: A trusted innovator in the memory space with a strong track record and broad industry experience.

Stay tuned for more insights from our experts as we continue to explore the evolving landscape of client computing.

Chain-of-Thought and the State of AI in this episode of Ask the Experts

On this episode of Ask the Experts, we had the opportunity to chat with Steven Woo, Rambus Fellow & Distinguished Inventor, about the latest developments in AI and the implications for hardware and computing architecture. Specifically, he discussed the exciting new inference technique called chain-of-thought and the innovations needed to support its computational demands.

Key Topics of this episode of Ask The Experts

What is Chain-of-Thought?

Chain-of-thought is an improvement in the way large language models operate, where complex questions and prompts are broken down into simpler steps to achieve the answer to a more complex problem. This mimics human reasoning, where smaller steps are taken to reason through a problem. Large language models in AI are now starting to mimic this process, allowing for higher quality answers.
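For readers who want a concrete picture, here is a minimal, hypothetical Python sketch of the idea: the same question asked directly versus with a chain-of-thought style prompt that asks the model to reason in steps. The `query_llm` function is a placeholder for whatever model API you use, not a real library call.

```python
# A minimal sketch of chain-of-thought prompting (illustrative only).
# query_llm is a hypothetical placeholder, not a real model API.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("Substitute your model API call here.")

question = ("A train travels 120 km in 1.5 hours, then 80 km in 0.5 hours. "
            "What is its average speed?")

# Direct prompt: the model must jump straight to the answer.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompt: ask the model to reason through simpler steps.
cot_prompt = (
    f"{question}\n"
    "Think step by step: first find the total distance, then the total time, "
    "then divide one by the other. Show each step before the final answer."
)

# answer = query_llm(cot_prompt)
# Each intermediate step generates additional tokens, which is why
# chain-of-thought raises compute, bandwidth, and capacity demands.
```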

How does Chain-of-Thought enable memory innovations?

Chain-of-thought requires more compute power as each step in the reasoning process takes its own amount of compute capability. This approach puts more bandwidth and capacity pressure on the computing architecture. Innovations such as HBM4 with 3D stacking, multi-level signaling (PAM3) in GDDR7, and multiplexing in MRDIMM are being developed and deployed to provide more memory bandwidth and capacity.

How can we manage the power demands of AI systems?

As AI systems scale up, the requirements for greater power also rise. Cooling systems are transitioning to liquid cooling due to its higher capacity to transfer and move heat compared to air. Additionally, power delivery systems are shifting from 12-volt to 48-volt infrastructure to manage the increased current demands. Future discussions are exploring even higher voltages, such as 400-volt infrastructure.

Power Management ICs (PMICs) and AI

Changes in voltage levels require power management ICs (PMICs) to convert the voltage close to the components that consume it. This ensures high-quality power with minimal losses. PMICs are expected to support larger numbers of voltage conversions and show up in more places within the chassis.

The Future of AI

AI is a game-changing technology, bringing new use cases and applications. Large language models are improving in accuracy, and robotics are being trained in virtual worlds to navigate and perform tasks. These assistive technologies make humans more productive and open up new possibilities for further improvements in technology.

Key Quote

“Chain-of-thought is really an improvement in the way some of these large language models operate, where very large and complex questions and prompts are broken down into simpler steps first to achieve the answer to a more complex problem. This really mimics human reasoning where in order to arrive at the answer to a complex problem, we take smaller steps, and we reason through it.”

Rambus, VIAVI and Samtec Demonstrate CXL® over Optics PoC at Upcoming SC24

The disruption of GenAI over the last few years has forced system architects and hardware designers to rethink data center topologies. While AI model sizes and compute capability are growing exponentially, I/O throughput and memory access are growing linearly. These trends create an unsustainable gap, one that needs to be addressed across the stack, from the physical layer at the chip level all the way to the network layer.

New external cabling solutions enable ever-changing data center topologies. Rack-scale connectivity, as an example, will define next-generation architectures over long-reach cables. Copper will work for a couple of meters, but optical solutions are needed for a rack-to-rack use case with a cable length of 7 meters and for cable lengths exceeding 10 meters for larger clustering use cases.

What is CXL?

CXL is a breakthrough high-speed CPU-to-Device and CPU-to-Memory interconnect designed to accelerate next-generation data center topologies.

CXL is an open industry standard offering high-bandwidth low-latency connectivity between the host processor and devices such as accelerators, memory controller/expander, and smart I/O devices for heterogeneous computing and disaggregation use cases.

The CXL® Consortium is an open industry standard group formed to develop technical specifications that facilitate breakthrough performance for emerging usage models while supporting an open ecosystem for data center accelerators and other high-speed enhancements. The CXL Consortium represents a wide range of industry expertise including leading cloud service providers, communications OEMs, IP/silicon/device providers and system OEMs.

Rambus CXL Controller IP

Rambus high-performance CXL controller IP is optimized for use in SoCs, ASICs and FPGAs. These industry-leading solutions for high-performance interfaces address AI/ML, data center and edge applications.

The Rambus CXL Controller IP leverages a silicon-proven PCIe controller architecture for the CXL.io path and adds CXL.cache and CXL.mem paths specific to the CXL standard. The controller IP exposes a native Tx/Rx user interface for CXL.io traffic as well as an Intel CXL-cache/mem Protocol Interface (CPI) for CXL.mem and CXL.cache traffic.

The provided Graphical User Interface (GUI) Wizard allows designers to tailor the IP to their exact requirements, by enabling, disabling, and adjusting a vast array of parameters, including CXL device type, PIPE interface configuration, buffer sizes and latency, low power support, SR-IOV parameters, etc. for optimal throughput, latency, size and power.
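As an illustration of the kinds of knobs such a wizard exposes, here is a hypothetical configuration sketch in Python. The parameter names and values are invented for illustration based on the options listed above; they are not the actual Rambus tool interface.

```python
# Hypothetical CXL controller configuration (illustrative parameter names only;
# not the actual Rambus GUI Wizard interface or its option names).
cxl_controller_config = {
    "device_type": "Type3",         # CXL Type 3: memory expander (CXL.io + CXL.mem)
    "pipe_width_bits": 64,          # PIPE interface width toward the SerDes
    "rx_buffer_depth": 512,         # deeper buffers trade area for throughput
    "low_power_states": ["L1"],     # enable low-power link states
    "sriov": {"enabled": True, "num_vfs": 8},  # SR-IOV virtual functions
    "ide_encryption": True,         # integrity and data encryption on the link
}
```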

The controller IP can be delivered standalone or integrated with the customer’s choice of CXL/PCIe compliant SerDes. It can also be provided with example reference designs for integration with FPGA SerDes.

VIAVI CXL Products

VIAVI Xgig Analyzer solutions for PCIe 5.0/6.0 support PCIe/CXL.io and CXL.cache/memory transactions with advanced trigger and filter templates that enable faster debugging and root cause analysis. The Xgig captures valuable real-time metrics and performs detailed analytics across multiple protocols.

VIAVI Xgig Exerciser solutions also support CXL compliance and traffic generation.

VIAVI PCIe interposers, such as the Xgig PCIe 16-lane CEM Interposer, can be used to capture CXL traffic running on a PCIe physical layer. The interposer creates a bi-directional interface between the protocol analyzer and system under test.

Samtec CXL Over Optics Technology

Samtec’s FireFly™ Micro Flyover System™ embedded and rugged optical transceivers carry data connections via optical cable over greater distances, or via copper for cost optimization. FireFly is the first interconnect system that gives a designer the flexibility of using optical and copper interconnects interchangeably with the same connector system.

Samtec’s PCUO series supports PCIe and CXL protocols via patented FireFly optical transceivers in x4, x8 and x16 configurations at PCIe 4.0/16 Gbps data rates. PCIe 5.0/32 Gbps and PCIe 6.0/64 Gbps PAM4 data rates are also under development. Additionally, Samtec offers a growing family of optically-enabled industry-standard PCB form factors (PCIe CEM AIC, OCP NIC 3.0, OCP OAI EXP, EDSFF E3.x 2T, etc.) for easy-to-use optical connectivity.

For More Information
For more information about VIAVI’s CXL product portfolio, please visit www.viavisolutions.com/cxl.
For more information about the Rambus CXL product portfolio, please visit www.rambus.com/cxl.
For more information about the Samtec CXL product portfolio, please visit www.samtec.com/cxl-interconnect.

Memory Bandwidth and DDR5 MRDIMMs Explained in this Ask the Experts

John Eble, Vice President of Product Marketing for Memory Interface Chips at Rambus, recently shared the latest developments on the MRDIMM (Multiplexed Rank DIMM) DDR5 memory module architecture. This cutting-edge technology brings significant advances to memory bandwidth and capacity to support compute-intensive workloads including generative AI.

What is MRDIMM?

MRDIMM builds upon the existing DDR5 infrastructure to ease implementation while providing a substantial performance boost. Its architecture is designed to double the data rate per signal pin, significantly enhancing bandwidth while preserving DDR5 signal routing between hosts and memory modules. It does so by introducing key innovations such as parallel activation and access of DRAM ranks and data stream multiplexing, effectively unlocking higher data transfer rates.
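Conceptually, the multiplexing resembles interleaving two rank-level data streams onto one host interface running at twice the per-pin rate. Here is a simplified Python sketch of that idea; real MRDIMM signaling is considerably more involved, so treat this purely as an illustration of the rate-doubling concept.

```python
# Conceptual sketch of MRDIMM-style multiplexing: two DRAM ranks are accessed
# in parallel at native speed, and their data streams are interleaved onto a
# host interface running at twice the per-pin rate. Illustration only.

def multiplex(rank_a: list[int], rank_b: list[int]) -> list[int]:
    """Interleave two equal-length rank streams into one double-rate stream."""
    out = []
    for a, b in zip(rank_a, rank_b):
        out.extend((a, b))
    return out

rank0 = [10, 11, 12, 13]              # fetched from rank 0 at 6400 MT/s
rank1 = [20, 21, 22, 23]              # fetched from rank 1 at 6400 MT/s, in parallel
host_stream = multiplex(rank0, rank1)  # delivered to the host at 12800 MT/s
print(host_stream)                     # [10, 20, 11, 21, 12, 22, 13, 23]
```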

Key Innovations of MRDIMM

The MRDIMM architecture enhances performance in important ways:

  1. Parallel DRAM Activation: MRDIMM enables pairs of DRAM ranks to be activated and accessed in parallel. This innovation plays a crucial role in increasing data throughput.
  2. Multiplexed Data Streams: By multiplexing data streams, MRDIMM can effectively double the data rate for each signal pin, resulting in a substantial improvement in memory bandwidth. MRDIMM 12800 provides a data rate of 12,800 MT/s using DDR5 DRAMs that operate at 6,400 MT/s.
  3. Increased Capacity: Unlike traditional memory modules, MRDIMM supports more than two ranks of DRAM. This capability allows for increased memory capacity in a cost-efficient manner.

New Components for MRDIMM

To support its high-performance architecture, MRDIMM requires several new components, designed to work together seamlessly:

  • MRCD (Multiplexing Registering Clock Driver): The MRCD extends the typical registering clock driver function to receive an interleaved stream of DRAM commands at twice the typical RDIMM rate. It deinterleaves the command data stream and then steers it correctly to its rank-specific outputs.
  • MDB (Multiplexing Data Buffer): Ten of these chips per MRDIMM provide the multiplexing and demultiplexing necessary to convert a 16-bit DRAM interface running at native DRAM speed to an 8-bit host interface running at twice that speed. MDBs also provide load isolation to the host or CPU, which is a key enabler for MRDIMM to increase the number of ranks and overall capacity of the module.
  • PMIC 5030 Power Management IC: Given the parallel activations of DRAM ranks and the additional chips added to the chipset, the absolute power envelope of the module is higher than a typical RDIMM. The new PMIC 5030 is designed to comfortably handle the amount of power required of such a high-bandwidth/high-capacity DIMM.

Flexibility and Compatibility

One of the standout features of MRDIMM is its compatibility as a drop-in replacement for server main memory upgrades. This design approach provides a high level of flexibility, allowing data centers and enterprises to adopt MRDIMM for enhanced memory performance while preserving DDR5 server architecture.

Rambus’ Expertise in High-Quality Memory Solutions

Eble emphasized Rambus’ long-standing commitment to developing reliable, interoperable memory solutions. With decades of expertise, Rambus is well positioned to lead advancements in memory technology, ensuring that MRDIMM modules meet the rigorous demands of modern computing environments.

Looking Ahead

As the memory demands of advanced workloads grow, innovations like MRDIMM represent critical enablers of the continued progression of computing performance. With its ability to increase both bandwidth and capacity while maintaining compatibility with existing server architecture, MRDIMM is poised to become an important element of cloud and enterprise data centers.

Watch the full video interview below or skip down the page to read the key takeaways.

Expert

John Eble, Vice President of Product Marketing for Memory Interface Chips, Rambus

Key Takeaways

1. Enhanced DDR5 Architecture: MRDIMM is a new DDR5 memory module architecture that significantly increases memory bandwidth and capacity by utilizing parallel access and multiplexing techniques.

2. Doubled Bandwidth: MRDIMM modules effectively double the data rate per signal pin which doubles the bandwidth available to the CPU per DIMM slot compared to standard DDR5 RDIMMs.

3. Increased DRAM Capacity: The new architecture allows for more than two ranks of DRAM, enabling cost-efficient capacity increases with configurations of up to 8 ranks of single or dual die packages.

4. New Memory Interface Chips: MRDIMM requires new and upgraded components, including the multiplexing registered clock driver (MRCD) and multiplexing data buffer (MDB), as well as a new power management IC (PMIC 5030) to handle higher power demands.

5. Future Roadmap: Servers utilizing MRDIMM 12800 are expected to launch in 2026, with future MRDIMM modules leveraging still faster DRAMs and advanced signaling innovations to achieve even higher speeds and capacities.

Key Quote

One of the nice things about this technology is that it can be a drop-in replacement. A single motherboard design will support both MRDIMM and RDIMM as the DIMM connector is the same and the routing topology, the physical layer, is the same from the host to the DIMM. So, users do not need to decide on MRDIMM or RDIMM when designing their servers, or even when initially deploying a server as they can always come back at a later time and upgrade. This provides a lot of flexibility through the life cycle of the server.

Challenges and Opportunities of Deploying AI at the Edge with Expedera

The rapid evolution of artificial intelligence (AI) is transforming edge computing, and Sharad Chole, Co-founder and Chief Scientist at Expedera, discusses the implications. Expedera, a neural network IP provider, focuses on neural processing units (NPUs) for edge devices, emphasizing low-power operation, optimized bandwidth, and cost efficiency. In our latest episode of Ask the Experts, Sharad shared his insights on the challenges and opportunities of deploying AI inference workloads at the edge.

The Exponential Growth in AI Model Complexity

Sharad began by noting the exponential growth in AI model sizes, from hundreds of millions to billions and now trillions of parameters. This explosive increase poses significant challenges, especially when deploying these complex models on edge devices with limited resources.

Overcoming Challenges in Edge AI: Memory and Bandwidth

Memory and bandwidth management emerged as central themes in Sharad’s talk. For edge devices to perform AI inference tasks efficiently, they need advanced memory management techniques to handle data processing without overwhelming system resources. Sharad emphasized the role of quantization techniques, which reduce the computational load of AI models, making them more suitable for edge deployment. He categorized AI applications into human task replacement, supervised agents, and tools, noting the industry is increasingly focused on supervised agents and tools for practical deployment.
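To make the quantization point concrete, here is a minimal NumPy sketch of symmetric per-tensor int8 weight quantization, one common way to shrink a model’s memory footprint for edge deployment. This is a simplified illustration, not Expedera’s implementation.

```python
import numpy as np

# Symmetric per-tensor int8 quantization: a simplified illustration of how
# quantization cuts memory footprint (float32 -> int8 is a 4x reduction).
def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    scale = np.abs(weights).max() / 127.0  # map the largest weight to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())  # small, bounded by ~scale/2
print("bytes:", w.nbytes, "->", q.nbytes)         # 64 -> 16
```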

The Road Ahead for AI at the Edge

Sharad concluded by outlining the critical challenges that lie ahead for AI hardware, particularly the need for efficient memory and bandwidth management for both training and inference. As AI continues to grow in complexity, so too will the demands on hardware.

For those interested in learning more about Expedera’s work in advancing edge AI technology, Sharad invites readers to visit Expedera’s website and connect with him on LinkedIn.

Watch the full video interview below or skip down the page to read the key takeaways.

Expert

  • Sharad Chole, Co-founder and Chief Scientist, Expedera

Key Takeaways

  1. Neural Processing Units: Expedera, founded in 2018, specializes in Neural Processing Units (NPUs) for edge devices, focusing on low-power, optimized bandwidth, and cost-effective solutions, instantiated in over 10 million customer devices already deployed.
  2. AI Model Challenges: The rapid growth in AI model sizes, such as Stable Diffusion and LLMs, presents significant challenges for edge deployment, particularly in managing memory and optimizing model weights through techniques like quantization and knowledge distillation.
  3. Multimodal AI Complexity: Multimodal AI, which integrates text, audio, video and other media, increases the complexity and memory demands of models, necessitating advanced methods like cross-attention layers to handle diverse data inputs efficiently.
  4. AI Workloads: AI applications can be thought of in three broad classes: human replacement, supervised agents, and unsupervised tools, with the latter two showing more immediate practical use, especially in tasks like translation and voice command processing.
  5. AI Hardware Challenges: The primary challenges for AI hardware include managing the high bandwidth and interconnect needs of large models and ensuring cost-effective, scalable memory solutions for inference, with a focus on balancing capacity and cost.

Key Quote

One thing to point out here is why we are going towards larger models. And this is very interesting to me, it’s because of scaling laws. So, there is a point at which the models start exhibiting interesting capabilities – it’s not just predicting the next token based on what is similar in the corpus, but it’s actually understanding your question and context, and you can ask it more complex questions.

Ask the Experts Explores Securing AI

The topic of this “Ask the Experts” episode is one that is much discussed right now: how to secure AI. We talked to Scott Best, Senior Director of Security Products at Rambus, to find out more.

The discussion focused on the challenges of securing AI systems, drawing parallels with FPGA systems. It highlighted the immense value that an AI inference model holds and how hardware-level security solutions are key to protecting it from potential adversaries.

The discussion also touched on the emerging threat of quantum computers, which could compromise public key cryptography. To counter these threats, Rambus offers a broad portfolio of security IP to protect AI silicon.

The interview concluded with a rather meta discussion on the potential of AI being used to attack AI systems, further highlighting the need for robust security measures.

Expert

  • Scott Best, Senior Director of Security Products, Rambus

Key Takeaways

  1. Securing Inference Models: Securing AI systems revolves around the protection of the inference model, which holds all the information the AI model was trained against. This model can be a potential target for adversaries or competitors, making it crucial to secure it whether it’s sitting in memory (data at rest) or being pulled into a chip (data in use).
  2. Hardware-Based AI Security: AI security needs to take place at the hardware level, and it’s up to chip manufacturers to implement a secure solution. This means securing data privacy and authenticity and making sure that these security measures do not hinder the system’s performance.
  3. Quantum Threats to Security: The advent of powerful quantum computers poses a threat to current public key cryptography. Systems being built today that are expected to be in the field for 5-10 years or more need to consider implementing quantum safe cryptography to ensure the privacy and authenticity of their data.
  4. Rambus Security IP: Rambus offers a broad portfolio of security IP that enables hardware-based security for AI silicon, as well as Root of Trust IP for data at rest protection, Inline Memory Encryption IP for data in use protection, and Quantum Safe Cryptography solutions to protect devices and data in the quantum era.
  5. AI-Driven Security Attacks: It’s possible that adversaries could potentially use AI to attack AI, particularly in power analysis side channel attacks where AI could be trained to find a small signal within a lot of noise. This highlights the need for robust security measures in AI systems.

Key Quote

In AI systems, there’s an inference model produced by a training system, and that inference model is then loaded into an AI chip, and that AI chip then executes that inference model. These inference models contain years of value to companies who created the training system and associated training data. If you’re an adversary or a competitor that wants to see what the “secret sauce” of a particular company is, then the inference model is of great interest.

Rambus Unveils PCIe 7.0 IP Portfolio for High-Performance Data Center and AI SoCs

The relentless innovation in Artificial Intelligence (AI) and High-Performance Computing (HPC) demands a cutting-edge hardware infrastructure capable of handling unprecedented data loads. To overcome these challenges and usher in a new era of performance, Rambus is proud to announce the launch of our PCI Express® (PCIe®) 7.0 IP portfolio, encompassing a comprehensive suite of IP solutions including:

  • PCIe 7.0 Controller designed to deliver the high bandwidth, low latency, and robust performance required for next-generation AI and HPC applications
  • PCIe 7.0 Retimer providing a highly-optimized, low-latency data path for signal regeneration
  • PCIe 7.0 Multi-port Switch that is physically aware to support numerous architectures
  • XpressAGENT™ to enable customers to rapidly bring up first silicon

“The burgeoning landscape of data center chip manufacturers, driven by the emergence of novel data center architectures, necessitates the availability of high-performance interface IP solutions to foster a robust and thriving ecosystem,” said Neeraj Paliwal, SVP & GM of Silicon IP at Rambus. “The Rambus PCIe 7.0 IP portfolio addresses this challenge by delivering unparalleled bandwidth, low latency, and security features. These components work together to provide a seamless, high-performance solution that meets the rigorous demands of AI and HPC applications.”

Rambus PCIe 7.0 Controller IP key features include:

  • Supports PCIe 7.0 specification including 128 GT/s data rate
  • Implementation of low-latency Forward Error Correction (FEC) for link robustness
  • Supports fixed-sized FLITs that enable high-bandwidth efficiency
  • Backward compatible to PCIe 6.0, 5.0, 4.0, etc.
  • State-of-the-art security with an IDE engine
  • Supports AMBA AXI interconnect
PCIe 7.0 Controller IP Block Diagram
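To put the 128 GT/s figure in perspective, here is a quick back-of-the-envelope Python calculation of raw link bandwidth. It treats GT/s as roughly Gb/s per lane per direction and ignores flit encoding and FEC overhead, so the numbers are upper bounds rather than delivered throughput.

```python
# Raw PCIe link bandwidth, ignoring flit encoding and FEC overhead.
# At PCIe 5.0/6.0/7.0 rates, GT/s is treated here as Gb/s per lane per direction.

def pcie_raw_bandwidth_gbs(gt_per_s: float, lanes: int) -> float:
    """Per-direction raw bandwidth in GB/s."""
    return gt_per_s * lanes / 8

for gen, rate in (("PCIe 5.0", 32), ("PCIe 6.0", 64), ("PCIe 7.0", 128)):
    x16 = pcie_raw_bandwidth_gbs(rate, 16)
    print(f"{gen} x16: {x16:.0f} GB/s per direction, {2 * x16:.0f} GB/s bidirectional")
# PCIe 7.0 x16 -> 256 GB/s per direction, 512 GB/s bidirectional
```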

Rambus PCIe 7.0 Retimer IP key features include:

  • Supports PCIe 7.0 specification x2 to x16 lanes
  • Pre-integrated Xpress Agent debug analysis IP
  • Highly-configurable equalization algorithms with adaptive behaviors
  • Power modes and intelligent clock gating to best manage controller IP
PCIe 7.0 Retimer IP Block Diagram

Rambus PCIe 7.0 Switch IP key features include:

  • Highly scalable up to 32 ports, configurable as external or internal endpoints
  • Physically aware to account for port placements across large die
  • Superior performance through non-blocking architecture
  • Allows seamless migration from FPGA prototyping design to ASIC/SoC production design with the same RTL
PCIe 7.0 Switch Block Diagram

Rambus PCIe XpressAGENT key features include:

  • Non-intrusive, intelligent, in-IP debug/logic analyzer for PCIe Controller, Retimer and Switch IP enabling rapid first-silicon bring-up
  • Integrates with any PIPE compliant SerDes
  • Provides unified access to PHY, MAC and Link Layers locally or remotely via a CPU-agnostic API
  • Provides pre-emptive monitoring and diagnosis via remote access for in-field products

In addition to the PCIe IP portfolio, Rambus also offers industry-leading interface IP for HBM, CXL, GDDR, LPDDR, and MIPI. For more information, visit www.rambus.com/interface-ip.
