DRAM Archives - Rambus

GDDR6 DRAM Signal Integrity: Taking on The New Design Challenges
https://www.rambus.com/blogs/gddr6-dram-signal-integrity-taking-on-the-new-design-challenges/
February 11, 2019

Signal integrity (SI) is a perennial issue for system and SoC designers. As design engineers begin work on next-generation systems with GDDR6 DRAMs, they will encounter signal integrity challenges they have not faced before. Rambus experts are currently taking a hard look at the problem, which involves a host of contributing factors, to come up with answers.

The Signal Integrity Journal defines signal integrity as follows: “Signal integrity covers all the issues about single ended and differential signal propagation from the transmitter to the receiver, including problems such as impedance control, discontinuities, reflections, topology, terminations, losses, ISI, jitter, eye diagrams, cross talk and ground bounce.”

That said, as most savvy SI experts know, with GDDR6 signal integrity goes well beyond the chip and core levels. Today, GDDR6 DRAM vendors and their technical research staffs are placing major emphasis not only on investigating problematic chip- and core-level signal integrity areas, but also on printed circuit board (PCB) design.

PCB design questions include choosing the correct surface finish and deciding whether to use blind or buried vias and, if so, understanding their effects on SI and the design tradeoffs that must be made. A multitude of similar SI questions arise throughout PCB design.

For starters, Rambus SI experts point to three main sources of SI loss: insertion loss, reflections, and crosstalk.

Insertion loss is due to dielectric loss and metal surface roughness. The rougher the surface, the greater the DC-like loss, and that loss increases at high frequency; on the dielectric side, the dielectric loss tangent attenuates the signal. The receiver can compensate for this type of loss, however: amplifiers, gain stages and filters such as continuous time linear equalization (CTLE) can correct it.
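
As a back-of-the-envelope illustration of how these mechanisms scale with data rate, here is a minimal Python sketch of the common first-order channel model, in which conductor loss grows with the square root of frequency and dielectric loss grows linearly with it. The coefficients and the 16Gbps GDDR6 data rate are illustrative assumptions, not Rambus measurements:

```python
import math

# Illustrative coefficients (assumptions, not measured GDDR6 channel data).
K_CONDUCTOR = 0.5   # dB per sqrt(GHz): skin effect and surface roughness
K_DIELECTRIC = 0.8  # dB per GHz: dielectric loss tangent

def insertion_loss_db(freq_ghz):
    """First-order model: conductor loss ~ sqrt(f), dielectric loss ~ f."""
    return K_CONDUCTOR * math.sqrt(freq_ghz) + K_DIELECTRIC * freq_ghz

# A 16 Gb/s NRZ link has its Nyquist frequency at 8 GHz.
for f in (1.0, 4.0, 8.0):
    print(f"{f:4.1f} GHz: {insertion_loss_db(f):5.2f} dB")
```

Note that the higher-frequency content of faster signaling sees markedly more attenuation, which is what equalization at the receiver must recover.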

Reflections occur as the signal traverses the channel: it transitions from the chip to the package, onto the PCB, and then on to the DRAM package. Through these transitions, the signal goes from a bump to the trace on the package, then through the vias, through the ball-grid array (BGA) package, and onto the PCB.

It is also inevitable that the signal passes over a void. Voids occur at the solder joint between a BGA ball and the package, or at the solder joint to the PCB. A void causes an impedance discontinuity, which in turn causes the signal to reflect. These reflections lead to eye closure and loss of signal margin.
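
The size of a reflection at such a discontinuity follows from the classic transmission-line relation Γ = (Z2 − Z1)/(Z2 + Z1). A minimal sketch, using assumed impedance values rather than figures from this post:

```python
def reflection_coefficient(z_incident, z_discontinuity):
    """Gamma = (Z2 - Z1) / (Z2 + Z1): fraction of the incident wave reflected."""
    return (z_discontinuity - z_incident) / (z_discontinuity + z_incident)

# Assumed for illustration: a nominal 50-ohm trace meets a void or via
# region whose local impedance has dropped to 40 ohms.
gamma = reflection_coefficient(50.0, 40.0)
print(f"Reflection coefficient: {gamma:+.3f}")      # -0.111
print(f"Reflected power: {100 * gamma ** 2:.1f}%")  # ~1.2%
```

Even a seemingly small mismatch reflects energy back into the channel, and multiple such discontinuities compound into the eye closure described above.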

But these three losses are only the tip of the iceberg when it comes to the GDDR6 DRAM signal integrity issues designers will confront. There’s more, so stay tuned.

 

Saving power with HBM
https://www.rambus.com/blogs/saving-power-with-hbm-2/
November 28, 2016

Ed Sperling of Semiconductor Engineering notes that power has always been a “global concern” in the design process because it affects every part of a chip. Nevertheless, partitioning for power rather than functionality or performance has not, historically, been seriously considered, although the status quo is beginning to change.

For example, says Sperling, the increasing use of system partitioning into multiple chips connected by high-speed buses rather than putting everything on a single chip offers some interesting possibilities for managing power.

According to Kelvin Low, senior director of foundry marketing at Samsung, system architects are now looking at power management in a different way rather than simply relying on silicon technology.

“You can partition a system to achieve system-level performance scaling,” Low told SemiEngineering. “So if you use a 2.5D approach with HBM2 (second-generation High-Bandwidth Memory), the system-level performance increases. It becomes a partition problem, but the distributed processing approach is an important enabler.”

As Sperling points out, this approach has a bearing on power as well, because it takes less power to drive signals through an interposer than through increasingly narrow wires on a single die at advanced nodes. As a result, there are significant power savings in addition to performance increases.

Frank Ferro, a senior director of product management for memory and interface IP at Rambus, expressed similar sentiments.

“One of the advantages of HBM2 is that you can move it closer to the processing, and you have 2 gig (gigatransfers/second per pin) rates,” Ferro told the publication. “The power of HBM2 is lower, too, and you can re-use quite a bit of technology. But it does require a new PHY design.”

As Ferro explained in a Semiconductor Engineering article earlier this year, HBM bolsters local available memory by placing low-latency DRAM closer to the CPU. In addition, HBM DRAM increases memory bandwidth by providing a very wide, 1024-bit interface to the SoC. At the maximum HBM2 pin speed of 2Gbits/s, that works out to a total bandwidth of 256Gbytes/s. Although the bit rate is similar to DDR3 at 2.1Gbps, the eight 128-bit channels provide HBM with about 15X more bandwidth.
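
As a quick sketch, the arithmetic behind those figures (using the numbers quoted above, plus the standard 64-bit width of a DDR3 DIMM interface) works out as follows:

```python
# HBM2 figures as quoted in the article.
channels = 8
bits_per_channel = 128
pin_rate_gbps = 2.0                                  # Gbits/s per pin

interface_bits = channels * bits_per_channel         # 1024-bit interface
hbm2_gbytes_s = interface_bits * pin_rate_gbps / 8   # bits -> bytes
print(hbm2_gbytes_s)                                 # 256.0 GB/s

# A standard 64-bit DDR3 interface at 2.1 Gb/s per pin, for comparison:
ddr3_gbytes_s = 64 * 2.1 / 8                         # 16.8 GB/s
print(round(hbm2_gbytes_s / ddr3_gbytes_s, 1))       # ~15.2, the "about 15X"
```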

Perhaps not surprisingly, mass-market deployment of HBM will present the industry with a number of challenges. This is because 2.5D-packaging technology, along with a silicon interposer, increases manufacturing complexities and cost. In addition, HBM routes thousands of signals (data + control + power/ground) via the interposer to the SoC (for each HBM memory used). Clearly, maximal yields will be critical to making HBM cost effective, especially since there are a number of expensive components being mounted to the interposer, including the SoC and multiple HBM die stacks.

Nevertheless, even with the above-mentioned challenges, having – for example – four HBM memory stacks in close proximity to the CPU, each delivering 256Gbytes/s, provides a significant increase in both memory density (up to 8GB per HBM stack) and bandwidth when compared with existing architectures.

Interested in learning more? The full text of “Partitioning for Power” by Ed Sperling is available on Semiconductor Engineering here.

Rambus inks license agreement with Xilinx
https://www.rambus.com/blogs/rambus-inks-license-agreement-with-xilinx-2/
October 3, 2016

Rambus has signed a license agreement with Xilinx that covers Rambus’ patented memory controller, SerDes and security technologies.

In addition, the two companies have agreed to evaluate potential collaboration on the use of Rambus’ CryptoManager platform, with Rambus also exploring the use of Xilinx FPGAs in its Smart Data Acceleration (SDA) research program.

“As a leader in the FPGA space, Xilinx has built compelling solutions that are necessary for the growing acceleration needs in the data center,” said Rambus CEO Dr. Ron Black. “Through collaboration, we also see great potential for our CryptoManager platform to serve as the secure foundation that enables remote, dynamic activation of features once the devices are deployed in the field. We look forward to the possibilities of engaging in these programs with the Xilinx teams and providing innovative solutions to our shared customers.”

As we’ve previously discussed on Rambus Press, the CryptoManager security platform creates a trusted path from the SoC manufacturing supply chain to downstream service providers with a complete silicon-to-cloud security solution. CryptoManager includes a Security Engine, which is a flexible root-of-trust implemented as hardware or software, for secure provisioning, configuration, keying and authentication throughout the lifecycle of a device. A local and cloud-based CryptoManager Infrastructure and Trusted Provisioning Services support the Security Engine, offering chipmakers, device OEMs, secure application developers and service providers a scalable and flexible trust management solution.

Meanwhile, the SDA research program focuses on architectures designed to offload computing closer to very large data sets at multiple points in the memory and storage hierarchy. Potential use case scenarios include real-time risk analytics, ad serving, neural imaging, transcoding and genome mapping. Comprising software, firmware, FPGAs and significant amounts of DRAM, the SDA platform operates as an effective test bed for new methods of optimizing and accelerating analytics in extremely large data sets. As such, the SDA’s versatile combination of hardware, software, firmware, drivers and bit files can be precisely tweaked to facilitate architectural exploration of specific applications.

Put simply, the SDA – powered by an FPGA paired with 24 DIMMs – offers high memory densities linked to a flexible computing resource. Currently, the SDA’s base extensible command set is targeted at accelerating and offloading the transformation of common data structures such as those found in Big Data analytics applications. However, the Smart Data Acceleration platform could ultimately be made available over a network where it would serve as a key offload agent in a more disaggregated scenario.

Interested in learning more? You can check out our article archive on CryptoManager here and the SDA research program here.

 

Smart Data Acceleration with FPGAs and DRAM
https://www.rambus.com/blogs/smart-data-acceleration-with-fpgas-and-dram-2/
August 18, 2016

The proliferation of connected devices has significantly increased the amount of data being captured, moved and analyzed. This trend is expected to continue well into the foreseeable future as the rapidly burgeoning Internet of Things (IoT) ramps up. Perhaps not surprisingly, the exponential increase in data has created a number of new bottlenecks in data centers, prompting the industry to examine fresh approaches to system architecture.

Currently, data centers aggregate numerous individual servers into a pool of processing units. Large, data-intensive tasks are distributed across multiple racks of servers. However, this one-size-fits-all approach, typically characterized by a relatively fixed amount of compute, memory, storage and I/O resources in each server, frequently leads to acute under-utilization of resources. This is because specific tasks may require a tailored amount of each compute resource in real time. Simply put, the legacy server architecture contributes to low CPU utilization rates, high latencies to access data, reduced power efficiency and increased TCO.

According to Steven Woo, VP of Systems and Solutions at Rambus, two of the most important  issues facing systems today are the impact of moving data over long distances to CPUs, and the inherent difficulty of optimizing the performance and power efficiency of data processing.

“This is why we launched our Smart Data Acceleration (SDA) Research Program. We want to address these and other issues by rethinking how systems should be architected in the future,” Woo told Rambus Press during a recent interview in Sunnyvale. “As part of this program, we’ve created the SDA engine – which pairs an FPGA with large capacities of DRAM.”

Essentially, says Woo, the FPGA provides flexible acceleration and offload capabilities, while the platform’s significant memory capacity enables low latency access to large amounts of data. Coupling the FPGA with high memory capacity minimizes data movement by bringing processing resources to the data, allowing applications to benefit effectively from near data processing.

As Woo confirms, the SDA program is currently focused on optimizing the performance and power efficiency of data-intensive workloads for servers and data centers.

“The HPC community – in particular – has identified a number of challenges related to accelerating performance and improving power efficiency. Initiatives to address these issues include the Exascale Computing Project, FastForward and DesignForward,” he said. “With a focus on dramatic improvements in these critical metrics, increasing emphasis is being placed on memory and storage hierarchies to optimize future systems for evolving workloads and tasks. Of course, such improvements will ultimately benefit standard data center workloads as well.”

Woo describes the FPGA and software architecture of the SDA engine as a flexible environment that allows engineers to experiment with near data processing while exploring the interaction between application software, drivers, firmware, FPGA bitfiles and memory. The software layer  enables the SDA engine to present itself to the rest of the system in various configurations, including as an ultra-fast solid-state disk, a Key-Value store and a large pool of memory.

“This means the SDA engine can be used across a wide range of applications that require high memory capacity, including transaction processing, in-memory databases, financial services, real-time analytics and risk analysis, imaging and transcoding,” Woo continued.  “The versatility of the SDA platform also facilitates a continuum of integration strategies that balance ease of integration with performance improvement.”

For example, says Woo, acting as an ultra-fast solid-state disk, the SDA engine can integrate with existing systems in a matter of minutes by simply loading a driver and mounting the device. Applications can also be modified to take full advantage of the acceleration and offload capabilities of the SDA engine to achieve higher performance gains.

“Testing of the SDA platform configured as an ultra-fast solid state disk confirms higher IOPS rates at much lower latencies – with significantly better latency under load – compared to state-of-the-art Enterprise NVMe SSDs,” Woo concluded. “Across a range of 4KB workloads, the SDA engine can deliver 1M IOPS at latencies of 30 μs.  Coupled with PCIe-based switches, multiple SDA engines can work together to provide scalable performance in a compact form factor.”
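
Those throughput and latency figures can be sanity-checked with Little’s law (outstanding requests = throughput × latency). A back-of-the-envelope sketch, not Rambus test code:

```python
iops = 1_000_000          # 1M IOPS, as quoted
latency_s = 30e-6         # 30 microseconds, as quoted
io_size_bytes = 4 * 1024  # 4KB workloads

outstanding = iops * latency_s                 # requests in flight
bandwidth_gb_s = iops * io_size_bytes / 1e9    # sustained data rate
print(f"{outstanding:.0f} outstanding I/Os, {bandwidth_gb_s:.2f} GB/s")
# -> 30 outstanding I/Os, 4.10 GB/s
```

In other words, the quoted numbers imply roughly 30 requests in flight and about 4GB/s of sustained 4KB transfers.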

Interested in learning more about our SDA platform? You can check out our research program page here.

ChipEstimate and Rambus look beyond DDR4
https://www.rambus.com/blogs/chipestimate-and-rambus-look-beyond-ddr4-2/
August 16, 2016

Frank Ferro, a senior director of product management at Rambus, has penned an article for ChipEstimate about the future of DRAM in the age of the IoT. According to Ferro, the semiconductor industry has traditionally relied on Dennard Scaling and Moore’s Law to ensure the creation of ever more advanced process nodes at a steady cadence.

“However, development costs at each advanced node continue to multiply as Moore’s Law begins to slow and Dennard Scaling fades into the distant past,” he explained.

“Consequently, many in the semiconductor industry are taking a closer look at the advantages of refining the silicon design process at an architectural level, rather than relying primarily on process geometries to solve thorny problems.”

For example, says Ferro, there are a number of distinct physical design challenges associated with architecting higher bandwidth memory and faster PHYs that can no longer be addressed by advanced process nodes alone.

“Indeed, the current generation of DDR4 memory deployed in datacenters runs at 2.4Gbps. The maximum speed grade – 3.2Gbps – is expected to start shipping later this year (2016),” he continued. “Perhaps not surprisingly, achieving a top speed of 3.2Gbps has introduced a number of challenges for both SoC and system designers. More specifically, as memory speeds exceed 2.4Gbps, precise signal integrity analysis of the memory channel is required. This is why there are only a handful of companies with working 3.2Gbps prototype hardware capable of supporting real-world server requirements.”

Over the next five years, says Ferro, server memory will likely demand a 33% increase in bandwidth capability per year to keep pace with processor improvements and avoid serious system bottlenecks. Simply put, DRAM of all variants will have to achieve speeds of over 12Gbps by 2020 for optimal performance.
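
Compounding that projected 33% annual growth shows how such a figure is reached; a short sketch using the speed grades quoted above (the choice of base year and starting grade is an assumption):

```python
def projected_gbps(base_gbps, years, annual_growth=0.33):
    """Compound the projected ~33%/year bandwidth growth."""
    return base_gbps * (1 + annual_growth) ** years

# Starting from the 3.2 Gb/s DDR4 top speed grade shipping in 2016:
for years in (4, 5):
    print(2016 + years, f"{projected_gbps(3.2, years):.1f} Gb/s")
# 2020 10.0 Gb/s
# 2021 13.3 Gb/s  -- in the ballpark of the >12 Gb/s projection
```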

“Although this figure represents a 4X performance increase over the current DDR4 standard, Rambus Beyond DDR4 silicon has demonstrated that even traditional DRAM signaling still has ample headroom for growth. Such speeds, within reasonable power envelopes, are indeed possible,” Ferro explained. “For example, Rambus’ Beyond DDR4 demo silicon offers a 25% improvement in power efficiency while hitting data transfer rates up to 6.4Gbps in a multi-rank, multi-DIMM configuration. This means the memory interface is three times faster than current DIMMs topping out at 2.133Gbps – and two times the maximum speed specified for DDR4 at 3.2Gbps.”

The 25% power savings, says Ferro, can be attributed to several factors. First, the low-swing signaling reduces the I/O power required on the interface. The design is also ‘asymmetric,’ meaning that the complex timing and equalization circuits are all implemented in the PHY, greatly simplifying the DRAM interface and reducing cost. Removing complex timing circuits such as PLLs and DLLs from the DRAM makes it extremely agile, facilitating rapid entrance to and exit from power-down mode. And because the memory controller is the originator of all memory requests, it can implement an aggressive and granular DRAM power management scheme.
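
The benefit of low-swing signaling follows from the first-order dynamic power model for a switched I/O, P = C·V²·f, where power scales with the square of the voltage swing. A sketch with assumed, purely illustrative values:

```python
def io_switching_power_mw(cap_pf, v_swing, freq_ghz):
    """P = C * V^2 * f, returned in milliwatts."""
    return (cap_pf * 1e-12) * v_swing ** 2 * (freq_ghz * 1e9) * 1e3

# Assumed for illustration: 2 pF of loaded I/O capacitance toggling at
# 3.2 GHz, comparing a full 1.2 V swing against a 0.4 V low swing.
print(f"{io_switching_power_mw(2.0, 1.2, 3.2):.1f} mW")  # 9.2 mW
print(f"{io_switching_power_mw(2.0, 0.4, 3.2):.1f} mW")  # 1.0 mW, 9x lower
```

Cutting the swing to a third of its original value cuts the switching power by roughly 9x, which is why low-swing I/O is such an effective lever.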

Interested in learning more? The full text of “Looking Beyond DDR4 in the Age of the IoT” by Frank Ferro can be read on ChipEstimate here, while our Beyond DDR4 page is available here.

A look back at the Nintendo 64 (N64)
https://www.rambus.com/blogs/a-look-back-at-the-nintendo-64-n64-2/
June 27, 2016

The long-awaited Nintendo 64 hit the hot neon city streets of Japan back in June 1996. Powered by a 64-bit NEC VR4300 CPU clocked at 93.75 MHz, the fifth generation console was one of the first to implement a unified memory subsystem and packed 4 megabytes of Rambus RDRAM (subsequently expandable to 8MB).

Image Credit: Wikipedia (via Evan-Amos)

According to Rambus Chief Scientist Craig Hampel, the N64 was also the first console to bring workstation-class graphics to the consumer. In addition, the N64 motherboard comprised just 2 layers – using co-planar waveguides – with the memory interface operating at 500MHz, which was 10x faster than any other DRAM at the time.

“For Rambus engineers, our primary challenge was to provide a significant amount of bandwidth with a relatively small amount of memory,” Hampel explained.

Image Credit: Wikipedia (via Yaca2671)

“We managed to do so and helped change gaming forever. The N64 implemented the first high volume application of RDRAM, which was designed, architected and developed by a tiny startup in Mountain View, California.”

As Hampel notes, he experienced an intense ‘aha moment’ about the significance of the N64 when playing Wave Race.

“It was the very first time a game actually felt real to me. It was spectacular, with amazing lighting models, physics and realistic 3D,” he said.

“Initially, at least, I don’t think many of us realized how the N64 was going to redefine gaming forever. However, the Nintendo 64 ultimately became one of the highest revenue consumer electronics products in prior history. In fact, mine still works, and every 6 months or so I fire it up with Wave Race or Super Mario and quietly say to myself ‘Wow!’”

Indeed, as Devin Coldewey of TechCrunch recently opined, the N64 was a “rock-solid” gaming console.

“[It] not only made a lot of us very happy for years and years, but also did important work in the history of gaming. It brought us classic titles that pushed the boundaries of what was expected of games, and made 3D worlds fundamental and integral with gameplay ideas rather than set dressing,” he reminisced.

“If Mario 64 proved that 3D games could be great, the rest of the N64’s lineup showed that 3D could be used in surprising and powerful ways. Wave Race 64 brought phenomenal water physics that wouldn’t be surpassed for years. Mario Kart 64 brought depth and verticality to madcap racing (though I still prefer the original). Ocarina of Time had you exploring a world almost too huge and complex to comprehend.”

Of course, the N64 was not the only console to use Rambus memory, as the Sony PlayStation® 2 (PS2) and PlayStation® 3 (PS3) also helped to define a new generation of gaming – while significantly raising the performance bar for future systems. Indeed, memory solutions and related innovations developed by Rambus engineers helped advance 3D realism across a number of gaming platforms.

Image Credit: Wikipedia (via Evan-Amos)

“In short, RDRAM played a critical role in defining what was possible in a video game – and I’m not just talking about photo quality realism for the sake of graphics alone,” Hampel concluded. “Rather, it helped push the limit in terms of rendering more accurate, physics-based effects, such as interacting with running water and depicting realistic vehicle impacts, complete with explosions and scattering debris.”

Exploring 2.5D packaging and beyond
https://www.rambus.com/blogs/exploring-2-5d-packaging-and-beyond-2/
May 2, 2016

Frank Ferro, a Senior Director of Product Marketing at Rambus, recently participated in a Semiconductor Engineering roundtable discussion about 2.5D and advanced packaging.

According to Ferro, 2.5D can succeed if customer demand overcomes the additional engineering costs associated with the packaging process.

“Back in the mobile days when we started seeing 3D packaging, it was because we needed space. We needed to get more memory into a smaller footprint,” he explained. “Today we’re seeing bandwidth as a driver in the case of high bandwidth memory (HBM). There’s a need to create another tier in the hierarchy, so customers are interested in looking at the cost tradeoffs of 2.5D using silicon interposer, and HBM versus traditional DRAM. Are the economics there? Yes, for the people who really want it and need it. For the masses, we still have a way to go.”

Ferro also commented on the power aspect of 2.5D packaging.

“Power is important. If you look at HMC (Hybrid Memory Cube), it was really hot two years ago, but has fizzled since then. By serializing all those signals you need high-speed SerDes,” he continued. “And then you have to look at the power of high-speed SerDes, versus HBM, which is wide and relatively slow. Power in HMC might have been less complex when seen from a 2D to 3D evolution standpoint because it was similar, but HBM won out because of lower power and lower complexity.”

Perhaps most importantly, says Ferro, it is essential for the industry to fully understand the 2.5D supply chain.

“If you’re just delivering a chip, then you can ship that chip to the customer. But now you’ve got a memory vendor, an SoC vendor, and an interposer vendor,” Ferro explained. “How do you test that memory? If something breaks, who’s responsible for it? Now there are pins you can’t physically get to anymore. Three companies have to work together, so you have to get all these companies talking together.”

Simply put, there is a real problem if something goes wrong.

“You’re responsible to the end customer, but in your supply chain, you are still subject to those effects. You make it easier for your end customer, but you still have to deal with it,” he added.

As we’ve previously discussed on Rambus Press, HBM design and implementation can be challenging, as 2.5D-packaging technology inevitably adds various manufacturing complexities, along with silicon interposer costs. To be sure, there are numerous expensive components mounted to the interposer, such as the SoC and multiple HBM devices. Another significant challenge involves routing thousands of signals (data + control + power/ground) via the interposer to the SoC for each HBM memory used.

Despite the above-mentioned challenges, HBM offers a number of distinct capabilities for a new digital age dominated by the IoT. These include moving memory closer to the CPU, while increasing both density and bandwidth. Indeed, HBM takes advantage of existing technologies to create another tier of memory, thus bolstering server memory architecture.

Interested in learning more about 2.5D and advanced packaging? The full text of the roundtable discussion is available on Semiconductor Engineering here (Part 1) and here (Part 2).

From consoles to VR
https://www.rambus.com/blogs/from-consoles-to-vr-2/
March 31, 2016

The Atari 2600 (or VCS) – which hit the nascent video game market back in 1977 – packed 128 bytes of RAM and an 8-bit MOS 6507 CPU clocked at a mere 1.19 MHz. According to Wikipedia, the RAM was tasked with handling run-time data, which included the call stack and the state of the game world. There was no frame buffer.

Image Credit: Wikipedia (via Evan-Amos)

Fast-forward to 1996 and the launch of Nintendo’s N64. Powered by a 64-bit NEC VR4300 CPU clocked at 93.75 MHz, the fifth generation console was one of the first to implement a unified memory subsystem and packed 4 megabytes of Rambus RDRAM (subsequently expandable to 8MB).

Image Credit: Wikipedia (via Yaca2671)

Driven by Moore’s Law and the demands of gaming enthusiasts, consoles were rapidly becoming ever more sophisticated as they pushed – and sometimes even shattered – the limits of bandwidth, capacity and graphics. By the time 2006 rolled around, Sony’s PlayStation® 3 boasted a 3.2 GHz Cell Broadband Engine with 1 PPE & 7 SPEs, 256 MB of XDR DRAM and 256 MB GDDR3 DRAM.

Image Credit: Wikipedia (via Evan-Amos)

A decade later, all eyes in the gaming world are fixed on virtual reality (VR) headsets such as Facebook’s Oculus Rift, Samsung’s Gear VR (powered by Oculus), HTC’s Vive and Sony’s PlayStation VR. Perhaps not surprisingly, implementation and requirements vary wildly for each device. For example, Samsung’s Gear VR is designed to work with a compatible Galaxy device, which acts as the headset’s display and processor. Meanwhile, the actual Gear VR unit is designated as a controller, offering a wide field of view as well as an inertial measurement unit (IMU) for rotational tracking.

Image Credit: Oculus Rift

In contrast, system requirements for the Rift (which is contingent upon a PC) stipulate an NVIDIA GTX 970 or AMD 290 GPU, Intel i5-4590, 8GB RAM and HDMI 1.3 video output supporting a 297MHz clock.

“A traditional 1080p game at 60Hz requires 124 million shaded pixels per second. In contrast, the Rift runs at 2160×1200 at 90Hz split over dual displays, consuming 233 million pixels per second,” Oculus’ Atman Binstock explained in a recent blog post.

Video: https://www.youtube.com/watch?v=amtBUkmHS0w

“At the default eye-target scale, the Rift’s rendering requirements go much higher: around 400 million shaded pixels per second. This means that by raw rendering costs alone, a VR game will require approximately 3x the GPU power of 1080p rendering.”
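
Binstock’s pixel-rate figures can be reproduced directly from the resolutions and refresh rates he cites; a quick sketch:

```python
pixels_1080p60 = 1920 * 1080 * 60  # 124,416,000 shaded pixels/s
pixels_rift    = 2160 * 1200 * 90  # 233,280,000 pixels/s over dual displays
eye_target     = 400e6             # ~400M shaded pixels/s at default eye-target scale

print(f"1080p@60Hz: {pixels_1080p60 / 1e6:.0f}M pixels/s")
print(f"Rift raw:   {pixels_rift / 1e6:.0f}M pixels/s")
print(f"GPU cost vs 1080p: {eye_target / pixels_1080p60:.1f}x")  # ~3.2x
```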

In the future, says Binstock, successful consumer VR will likely drive changes in GPUs, OSs, drivers, 3D engines, and apps, ultimately enabling much more efficient low-latency VR performance.

“It’s an exciting time for VR graphics, and I’m looking forward to seeing this evolution,” he added.

Indeed, VR has certainly come a long way since 1991, when the $60,000 Virtuality 1000CS made its way into the arcade scene. The unit featured an HMD to display video and play audio, while players moved and used a 3D joystick to interact with the VR world. According to Tom’s Hardware, the system relied upon an Amiga 3000 to handle most of the game processing.

“Gaming may have significantly evolved over the years, but there is one constant that remains unchanged. Players are always seeking a more immersive experience, enabled by improvements in AI and more realistic and responsive graphics,” Steven Woo, Vice President of Systems and Solutions at Rambus, explained. “As such, gaming continues to be at the forefront, pushing the very limits of numerous technologies, including memory, processing and graphic capabilities.”

Woo, an engineer who participated in the development of memory technologies adopted in Sony’s PlayStation 2 and PlayStation 3 game consoles, also pointed out that VR, although quickly evolving, is still in a relatively nascent stage.

“As Oculus’ Atman Binstock noted, successful consumer VR is likely to drive changes in GPUs, OSs, drivers, 3D engines, and apps, ultimately enabling much more efficient low-latency VR performance,” Woo added. “I’m looking forward to seeing how VR will ultimately take advantage of new memory technology as it evolves over the next few years.”

Balancing cores and memory with Smart Data Acceleration
https://www.rambus.com/blogs/balancing-cores-memory-smart-data-acceleration/
March 29, 2016

Ed Sperling of Semiconductor Engineering recently noted that adding more cores to a processor doesn’t necessarily improve system performance. In fact, designing the wrong size or type of core may actually waste power.

“This has set the stage for a couple of broad shifts in the semiconductor industry,” Sperling explained. “Memory architectures can play an important role here. Most of the current approaches use on-chip SRAM and off-chip DRAM. But different packaging options, coupled with different memory architectures, can change the formula.”

According to Steven Woo, VP of Solutions Marketing at Rambus, memory is certainly capable of “rebalancing” system architectures and addressing bottlenecks.

“You can aggregate memory and make that available to a processor, which allows you to use less processors in a system,” Woo told the publication. “If you look at a data center, CPUs are often heavily underutilized, sometimes to the tune of 10% utilization. Multicore CPUs often have trouble getting enough memory capacity to keep the cores working, causing cores to be starved out. If you have enough memory capacity, it means you can improve CPU utilization to the point that you potentially need to purchase fewer CPUs.”

As Woo previously pointed out, the industry is experiencing a move towards near-data processing, where the data sets are so large that it’s actually cheaper to move the processing closer to the data than to move the data closer to the processing.

“You used to drag data to the processor, sometimes over long distances from storage arrays connected via slower networks. This works when there isn’t much data to be processed. But now that we’re processing data sets that are terabytes to petabytes in size, moving the data to the processing is a major bottleneck. It’s much more efficient to move the computation to the data,” he said. “Semantic awareness is another method that helps to minimize data movement, by allowing processing elements close to the data to understand the structure of that data and process it in a meaningful way without needing to move it to a server first.”

This is precisely why Rambus’ Smart Data Acceleration (SDA) research platform focuses on architectures designed to offload computing closer to very large data sets at multiple points in the memory and storage hierarchy. Potential use case scenarios include in-memory databases, real-time risk analytics, ad serving, neural imaging, transcoding and genome mapping.

Comprising software, firmware, FPGAs and significant amounts of memory, the platform operates as an effective test bed for new methods of optimizing and accelerating analytics in extremely large data sets. As such, the SDA’s versatile combination of hardware, software, firmware, drivers and bit files can be tailored to facilitate architectural exploration and optimization of specific applications.

Interested in learning more? You can check out our official Smart Data Acceleration page here.

Architecting new memory for the IoT
https://www.rambus.com/blogs/architecting-new-memory-iot/
March 28, 2016

The once indefatigable Moore’s Law is beginning to slow, even as data, driven by a burgeoning Internet of Things (IoT), continues to increase exponentially. Consequently, a slew of new memory architectures, including those utilizing 2.5D and 3D packaging, are evolving to meet the demands of a new digital age.

Nevertheless, as Ed Sperling of Semiconductor Engineering recently pointed out, there are still more questions than answers about the future of memory, perhaps due to the salient lack of an obvious successor to DDR4.

“[It is unclear] which type of memories to use for what, how they should be packaged and used, and how those new memories will impact data storage further downstream at the disk level,” he explained. “What comes next may be a new memory type, or it may be a new architectural approach using the same technology [as DRAM].”

According to Frank Ferro, a Senior Director of Product Management at Rambus, potential solutions and directions for a beyond-DDR4 paradigm include leveraging existing memory system I/O and architectures to support higher frequencies and multiple memory types on the DDR channel.

“The next generation memory needs to consider advanced I/O techniques; new data bus topologies and the use of improved, lower swing, power efficient, single-ended signaling to reduce bottlenecks,” he said.

Rambus, notes Ferro, already has a prototype memory interface system running at 6.4Gbps (at 2 DIMMs per channel), which is more than 2x existing DDR4 data rates. “By doubling the speed of the memory interface and increasing DIMM performance, Rambus has demonstrated that there is still a strong roadmap for traditional DDR interfaces.”

In addition, Rambus continues to actively participate in industry conversations about various trends, such as 2.5D/3D packaging and high bandwidth memory (HBM), the latter of which stacks up to 8 DRAM dies.

“From our perspective, 2.5D and 3D packaging is primarily being driven by HBM, which is designed for use in server and network devices,” Ferro explained. “At this point in time, the cost-benefit of HBM varies based on specific use cases, such as those that demand higher DRAM density.”

In turn, says Ferro, HBM is being driven by an insatiable need for more bandwidth, met by bringing memory closer to the processor.

“The maximum speed for HBM is 2Gbits/s per pin – for a total bandwidth of 256Gbytes/s,” he confirmed. “And while the bit rate may be somewhat similar to DDR3 at 2.1Gbps, the eight 128-bit channels give HBM approximately 15x more bandwidth.”

As Ferro emphasizes, HBM design and implementation can also be challenging, as 2.5D-packaging technology inevitably adds various manufacturing complexities, along with silicon interposer costs.

“There are numerous expensive components mounted to the interposer, such as the SoC and multiple HBM devices,” said Ferro. “Another significant challenge involves routing thousands of signals (data + control + power/ground) via the interposer to the SoC for each HBM memory used. Therefore, a good yield is certainly a critical factor in making the system cost effective.”

Despite the above-mentioned challenges, says Ferro, HBM offers a number of distinct capabilities for a new digital age dominated by the IoT. These include moving memory closer to the CPU, while increasing both density and bandwidth.

“In short,” Ferro added, “HBM takes advantage of existing technologies to create another tier of memory, thus bolstering the overall server memory architecture. At the same time, continued enhancements are needed to the underlying memory and system topology to provide even greater performance as we look out to 2019 and beyond.”
