bandwidth Archives - Rambus
At Rambus, we create cutting-edge semiconductor and IP products, providing industry-leading chips and silicon IP to make data faster and safer.
Thu, 13 Jun 2019 15:28:02 +0000

Optimizing data centers with DDR4 buffer chips
https://www.rambus.com/blogs/optimizing-data-centers-with-ddr4-buffer-chips-2/
Wed, 21 Dec 2016 16:47:25 +0000

DDR4 memory delivers up to a 1.5x performance improvement over DDR3, running at 2.4–3.2 Gbps while reducing power on the memory interface by 25%. However, the shift to higher speeds degrades electrical signal integrity, especially when multiple modules are added to a system. Consequently, achieving higher capacities at more advanced speeds has become quite challenging.
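Those per-pin speed grades translate into module-level bandwidth in a straightforward way. A minimal sketch, assuming the standard 64-bit (non-ECC) DIMM data bus width, which the post does not state:

```python
# Peak theoretical bandwidth of a DDR4 DIMM interface at the cited speed
# grades. The 64-bit bus width is the usual non-ECC DIMM width (assumption).
def peak_bandwidth_gbs(data_rate_gbps, bus_width_bits=64):
    """Peak bandwidth in GB/s: per-pin rate x bus width / 8 bits per byte."""
    return data_rate_gbps * bus_width_bits / 8

for rate in (2.4, 3.2):
    print(f"{rate} Gbps per pin -> {peak_bandwidth_gbs(rate):.1f} GB/s")
```

At 3.2 Gbps per pin this works out to 25.6 GB/s per module, which is why signal integrity on a multi-module channel becomes the limiting factor.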

To overcome these limitations, specialized clocks and dedicated DDR4 memory buffer chips have been integrated onto DIMMs. Put simply, buffer chips allow server designers to maintain high speeds with DDR4 while enabling the higher capacity demanded by Big Data applications, since multiple loads (DRAM devices) on a channel tend to reduce the maximum speed the bus can reach.

Buffer chips help improve system signal integrity by reconditioning the signal coming from the CPU and forwarding it to DRAM, thereby enabling higher operating data rates. In addition, buffer chips facilitate optimized RAS (reliability, availability and serviceability), with the silicon verifying the correctness of commands and data.

As expected, DDR4 buffer chips offer several distinct advantages over the previous generation (DDR3), including faster speeds, higher usable bandwidth, higher device density and more banks (16). In addition, DDR4 defines chip ID (CID) signals for addressing die stacks up to 8 high, while smaller DDR4 row sizes (in x4 devices) lower power consumption and improve performance for multi-threaded applications.

Moreover, DDR4 buffer chips operate from a lower 1.2 V supply, use pseudo open drain (POD) VDDQ termination to reduce I/O current draw, and eliminate the need for an on-die voltage pump by supplying VPP externally. In terms of specific RAS improvements, DDR4 buffer chips support register parity checks and command blocking, optional CRC at high data rates, boundary scan (connectivity test mode), MRS readout via the MPR, and native ECC support for DDR4 SODIMMs.
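The optional write CRC mentioned above is a per-burst checksum computed over the data beats. As a rough illustration only: the exact polynomial, data framing and bit ordering are defined by the JEDEC DDR4 specification, and the generic CRC-8 routine below (polynomial x^8 + x^2 + x + 1) is our assumption for demonstration, not the literal DDR4 circuit.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 over a byte string (init 0, no reflection, no final XOR)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift left; XOR in the polynomial whenever the top bit falls out.
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

burst = bytes(range(8))  # stand-in for one lane's 8-beat write burst
print(f"CRC-8 over burst: 0x{crc8(burst):02X}")
```

The receiver recomputes the same checksum over the arriving burst and flags a mismatch via the alert mechanism, which is what lets the silicon "verify correctness" of data in flight.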

Clearly, increased memory capacity and performance are critical for data centers tasked with storing and analyzing the huge data sets behind today’s complex problems. This is precisely why our DDR4 Data Buffer (DB) — iDDR4DB2-GS02 — enables DDR4 Load Reduced Dual Inline Memory Modules (LRDIMMs) to deliver high-bandwidth performance (when combined with our DDR4 RCD) at twice the capacity of DDR4 Registered DIMMs (RDIMMs).

Designed to meet the demanding requirements of real-time, memory-intensive applications, the DB delivers enhanced performance and margin at 2400 Mbps, with built-in support for future data rates up to 2666 Mbps. This enables top speeds and robust operation when multiple LRDIMMs populate the memory channel for maximum system capacity.

The iDDR4DB2-GS02 dual 4-bit bidirectional data register with differential strobes is designed for 1.2 V VDD operation. The device has a dual 4-bit host bus interface connected to a memory controller and a dual 4-bit DRAM interface that is connected to two x4 DRAMs. It also has an input-only control bus interface that is connected to a DDR4 Register. This interface consists of a 4-bit control bus, two dedicated control signals, a voltage reference input and a differential clock input.

All DQS inputs are pseudo-differential with an internal voltage reference, while all DQ outputs are VDD terminated drivers optimized to drive single or dual terminated traces in DDR4 LRDIMM applications. The differential DQS strobes are used to sample the DQ inputs and are regenerated in the DDR4 DB for driving out the DQ outputs on the opposite side of the device. The iDDR4DB2-GS02 also supports dedicated pins for ZQ calibration and for parity error alerts.

Interested in learning more? You can check out our server DIMM chipsets product page here.

Intel says DDR4 is ramping quickly
https://www.rambus.com/blogs/intel-says-ddr4-is-ramping-quickly-2/
Wed, 24 Aug 2016 16:36:48 +0000

Last week at IDF 2016, Intel executive Geof Findley presented a comprehensive overview of the memory industry ecosystem. According to Findley, DDR4 is ramping quickly and should hit 31% of shipments during the second quarter of 2016.

With volume shipments kicking off in 2014, almost all servers are now shipping with DDR4, while most PCs will ship with DDR4 by the end of 2016. In addition, says Findley, the DDR4 volume and price crossover should occur in the first half of 2016, with the upcoming 8GB transition tied to DDR4 adoption.


Image Credit: Intel

As Findley notes, the transition to DDR4 is “inevitable,” with the latest iteration of DDR offering lower power and higher bandwidth headroom than DDR3. More specifically, DDR4 provides up to 35% power savings compared to DDR3L; up to a 100% bandwidth boost over the life of the product; and 2X density, moving from 4Gb to 8Gb and ultimately 16Gb devices.
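Part of that 35% savings falls directly out of the supply-voltage drop, since dynamic (CV^2f) switching power scales with the square of the voltage; the remainder comes from interface changes such as POD termination. A back-of-envelope split of the contributions:

```python
# Dynamic power scales ~V^2: compare DDR3L (1.35 V) against DDR4 (1.2 V).
V_DDR3L, V_DDR4 = 1.35, 1.2
scaling = (V_DDR4 / V_DDR3L) ** 2
print(f"Voltage drop alone: ~{1 - scaling:.0%} dynamic power reduction")
```

Voltage alone accounts for roughly a 21% reduction, so the rest of the quoted savings has to come from the rest of the interface design.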

As we’ve previously discussed on Rambus Press, server memory buffer chipsets are playing a critical role in high-speed DDR4 designs, as they allow system architects to fully exploit the high speeds offered by DDR4 – while also enabling the high-capacity designs that current Big Data applications demand.

Real-world benefits of DDR4 buffer chips include time savings measured in nanoseconds, as well as optimized signal integrity facilitated by shorter trace lines. Both translate into improved performance for a wide range of time-sensitive applications, including analytics and real-time language translation via cloud-based services.

This is precisely why Rambus’ DDR4 chipsets for RDIMM and LRDIMM server modules are designed to deliver the top-of-the-line performance and capacity needed to meet the growing demands placed on enterprise and data center systems. More specifically, our JEDEC-compliant DDR4 chipsets feature industry-leading I/O performance and margin, while utilizing advanced power management techniques.

Additional key DDR4 server DIMM chipset features include support for DDR4 up to 2666 Mbps, multi-setting frequency-based power optimization, an operating temperature range of -40°C to 125°C, full RoHS compliance and improved ESD/EOS protection beyond JEDEC requirements.

Interested in learning more about Rambus’ server DIMM chipsets? You can check out our official product page here.

ChipEstimate and Rambus look beyond DDR4
https://www.rambus.com/blogs/chipestimate-and-rambus-look-beyond-ddr4-2/
Tue, 16 Aug 2016 16:47:13 +0000

Frank Ferro, a senior director of product management at Rambus, has penned an article for ChipEstimate about the future of DRAM in the age of the IoT. According to Ferro, the semiconductor industry has traditionally relied on Dennard Scaling and Moore’s Law to ensure the creation of ever more advanced process nodes at a steady cadence.

“However, development costs at each advanced node continue to multiply as Moore’s Law begins to slow and Dennard Scaling fades into the distant past,” he explained.


“Consequently, many in the semiconductor industry are taking a closer look at the advantages of refining the silicon design process at an architectural level, rather than relying primarily on process geometries to solve thorny problems.”

For example, says Ferro, there are a number of distinct physical design challenges associated with architecting higher bandwidth memory and faster PHYs that can no longer be addressed by advanced process nodes alone.

“Indeed, the current generation of DDR4 memory deployed in datacenters runs at 2.4Gbps. The maximum speed grade – 3.2Gbps – is expected to start shipping later this year (2016),” he continued. “Perhaps not surprisingly, achieving a top speed of 3.2Gbps has introduced a number of challenges for both SoC and system designers. More specifically, as memory speeds exceed 2.4Gbps, precise signal integrity analysis of the memory channel is required. This is why there are only a handful of companies with working 3.2Gbps prototype hardware capable of supporting real-world server requirements.”

Over the next five years, says Ferro, server memory will likely demand a 33% increase in bandwidth capability per year to keep pace with processor improvements and avoid serious system bottlenecks. Simply put, DRAM of all variants will have to achieve speeds of over 12Gbps by 2020 for optimal performance.

“Although this figure represents a 4X performance increase over the current DDR4 standard, Rambus Beyond DDR4 silicon has demonstrated that even traditional DRAM signaling still has ample headroom for growth. Such speeds, within reasonable power envelopes, are indeed possible,” Ferro explained. “For example, Rambus’ Beyond DDR4 demo silicon offers a 25% improvement in power efficiency while hitting data transfer rates up to 6.4Gbps in a multi-rank, multi-DIMM configuration. This means the memory interface is three times faster than current DIMMs topping out at 2.133Gbps – and two times the maximum speed specified for DDR4 at 3.2Gbps.”

The 25% power savings, says Ferro, can be attributed to several factors. Firstly, the low-swing signaling reduces the I/O power required on the interface. This design is also ‘asymmetric,’ meaning that complex timing and the equalization circuits are all implemented in the PHY, thus greatly simplifying the DRAM interface and reducing cost. Removing complex timing circuits such as PLLs and DLLs from the DRAM makes it extremely agile, facilitating the rapid entrance and exit from power down mode. Because the memory controller is the originator of all memory requests, it is capable of implementing an aggressive and granular DRAM power management scheme.
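Ferro’s 33%-per-year projection compounds quickly. A quick check, taking DDR4’s 3.2 Gbps top speed grade as the 2015 baseline (our assumption; the article does not fix a starting point):

```python
# Compounding the projected 33%-per-year server memory bandwidth requirement.
rate = 3.2  # Gbps: DDR4's maximum speed grade, used as the 2015 baseline
for year in range(2016, 2021):
    rate *= 1.33
    print(f"{year}: {rate:.1f} Gbps")
```

Five years of compounding lands above 13 Gbps, consistent with the article’s "over 12 Gbps by 2020" figure.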

Interested in learning more? The full text of “Looking Beyond DDR4 in the Age of the IoT” by Frank Ferro can be read on ChipEstimate here, while our Beyond DDR4 page is available here.

EE Times takes a closer look at Rambus’ 14nm R+ DDR4 PHY
https://www.rambus.com/blogs/ee-times-takes-a-closer-look-at-rambus-14nm-r-ddr4-phy-2/
Mon, 01 Aug 2016 16:36:51 +0000

Gary Hilson of the EE Times has covered Rambus’ recent announcement about the development of its R+ DDR4 PHY on GLOBALFOUNDRIES 14nm LPP process. As the journalist notes, the silicon is the first production-ready 3200 Mbps DDR4 PHY available on GLOBALFOUNDRIES Inc.’s FX-14 ASIC platform using its power-performance optimized 14nm LPP process.


“The Rambus R+ DDR4 PHY intellectual property uses Rambus’ proprietary R+ architecture, based on the DDR industry standard,” Hilson explained. “The PHY is part of the Rambus’ suite of memory and SerDes interface offerings for networking and data center applications. Meeting the performance and capacity demands of those segments [is] a heavy focus for the company.”

Frank Ferro, a senior director of product marketing at Rambus, told the publication that the DFI 4.0-compatible R+ DDR4 PHY will enable customers to differentiate their offerings with improved performance while still maintaining full compatibility with industry standard DDR4 and DDR3/3L/3U interfaces.

“This gets them ahead of the curve in terms of memory performance,” Ferro said.

Indeed, the R+ DDR4 PHY delivers data rates from 800 to 3200 Mbps in multiple memory sub-system options, including die down, DIMM and 3DS. It also supports 16 to 72-bit interfaces, along with single and multi-rank configurations. The overall goal, says Ferro, is to provide system designers with flexibility for both high performance and low power, which is where the GLOBALFOUNDRIES 14nm process comes in. Nevertheless, as Ferro emphasizes, while DDR4 provides a significant performance boost over DDR3, engineers are still finding it challenging to improve the interface between memory and the CPU.

“The CPUs can run faster, and they [have] multiple channels of local DRAM they are accessing, but the CPUs are only as good as the access to the memory,” he explained. “The interface is the key bottleneck in the system.”

Ferro told the EE Times that Rambus is using internal tools to analyze the physical connections between the CPU and the DIMMs. “That’s where the limits come in. I think there’s still a ways to go,” he added.
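The configuration range quoted above (16- to 72-bit interfaces at 800 to 3200 Mbps) spans a wide peak-bandwidth envelope. A quick sketch of the extremes, with the pairing of widest interface and fastest rate being our assumption:

```python
def phy_bandwidth_gbs(width_bits, rate_mbps):
    """Peak bandwidth in GB/s for one PHY configuration."""
    return width_bits * rate_mbps / 8 / 1000

# The extremes of the configuration range quoted for the R+ DDR4 PHY.
print(f"16-bit @  800 Mbps: {phy_bandwidth_gbs(16, 800):.1f} GB/s")
print(f"72-bit @ 3200 Mbps: {phy_bandwidth_gbs(72, 3200):.1f} GB/s")
```

That spread, from under 2 GB/s up to nearly 29 GB/s, is what lets one PHY serve both low-power and high-performance designs.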

Another challenge, he notes, is balancing the trade-offs between density and bandwidth by looking at the physical loading onto the bus. Rambus, Ferro confirmed, is currently exploring technology to minimize the loading effect of DIMMs.

From a broader perspective, says Ferro, while the high performance computing (HPC) segment might be what comes to mind first, Rambus is looking to meet the needs of the Facebooks and Instagrams of the world as their data center requirements trickle down to the chip companies. To be sure, Rambus recently announced its intention to acquire Inphi’s memory interconnect business as well as Semtech’s Snowbush serial interface IP.

A look back at the Nintendo 64 (N64)
https://www.rambus.com/blogs/a-look-back-at-the-nintendo-64-n64-2/
Mon, 27 Jun 2016 15:36:27 +0000

The long-awaited Nintendo 64 hit the hot neon city streets of Japan back in June 1996. Powered by a 64-bit NEC VR4300 CPU clocked at 93.75 MHz, the fifth generation console was one of the first to implement a unified memory subsystem and packed 4 megabytes of Rambus RDRAM (subsequently expandable to 8MB).


Image Credit: Wikipedia (via Evan-Amos)

According to Rambus Chief Scientist Craig Hampel, the N64 was also the first console to bring workstation-class graphics to the consumer. In addition, the N64 motherboard comprised just 2 layers – using co-planar waveguides – with the memory interface operating at 500MHz, which was 10x faster than any other DRAM at the time.
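Those figures imply a healthy peak bandwidth for a mid-1990s console. A rough sketch, noting that the 9-bit RDRAM bus width is our assumption (the post states only the 500 MHz rate):

```python
# Rough peak bandwidth of the N64's 500 MHz RDRAM interface.
# The 9-bit bus width is an assumption drawn from RDRAM's narrow-bus design.
bus_bits, effective_rate_mhz = 9, 500
bandwidth_mbs = bus_bits * effective_rate_mhz / 8
print(f"~{bandwidth_mbs:.0f} MB/s peak")
```

Delivering on the order of half a gigabyte per second from just 4 MB of memory is exactly the "significant bandwidth from a small amount of memory" challenge Hampel describes below.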

“For Rambus engineers, our primary challenge was to provide a significant amount of bandwidth with a relatively small amount of memory,” Hampel explained.


Image Credit: Wikipedia (via Yaca2671)

“We managed to do so and helped change gaming forever. The N64 implemented the first high volume application of RDRAM, which was designed, architected and developed by a tiny startup in Mountain View, California.”

As Hampel notes, he experienced an intense ‘aha moment’ about the significance of the N64 when playing Wave Race.

“It was the very first time a game actually felt real to me. It was spectacular, with amazing lighting models, physics and realistic 3D,” he said.

“Initially, at least, I don’t think many of us realized how the N64 was going to redefine gaming forever. However, the Nintendo 64 ultimately became one of the highest revenue consumer electronics products in prior history. In fact, mine still works, and every 6 months or so I fire it up with Wave Race or Super Mario and quietly say to myself ‘Wow!’”

Indeed, as Devin Coldewey of TechCrunch recently opined, the N64 was a “rock-solid” gaming console.

“[It] not only made a lot of us very happy for years and years, but also did important work in the history of gaming. It brought us classic titles that pushed the boundaries of what was expected of games, and made 3D worlds fundamental and integral with gameplay ideas rather than set dressing,” he reminisced.

“If Mario 64 proved that 3D games could be great, the rest of the N64’s lineup showed that 3D could be used in surprising and powerful ways. Wave Race 64 brought phenomenal water physics that wouldn’t be surpassed for years. Mario Kart 64 brought depth and verticality to madcap racing (though I still prefer the original). Ocarina of Time had you exploring a world almost too huge and complex to comprehend.”

Of course, the N64 was not the only console to use Rambus memory, as the Sony PlayStation® 2 (PS2) and PlayStation® 3 (PS3) also helped to define a new generation of gaming – while significantly raising the performance bar for future systems. Indeed, memory solutions and related innovations developed by Rambus engineers helped advance 3D realism across a number of gaming platforms.


Image Credit: Wikipedia (via Evan-Amos)

“In short, RDRAM played a critical role in defining what was possible in a video game – and I’m not just talking about photo quality realism for the sake of graphics alone,” Hampel concluded. “Rather, it helped push the limit in terms of rendering more accurate, physics-based effects, such as interacting with running water and depicting realistic vehicle impacts, complete with explosions and scattering debris.”

Optimizing memory bandwidth
https://www.rambus.com/blogs/optimizing-memory-bandwidth-2/
Mon, 20 Jun 2016 16:23:55 +0000

Frank Ferro, a senior director of product management at Rambus, recently sat down with Ed Sperling of Semiconductor Engineering and other industry participants to discuss the slew of new memory initiatives and entrants.


According to Ferro, the initiatives were prompted by the need for improved efficiency from a latency standpoint in the memory hierarchy.

“You have more bandwidth needs, but how do you get that bandwidth more efficiently? Everyone has been using DDR, and maybe getting HBM as another layer in the hierarchy,” he told Sperling. “Right now there’s a big gap with flash. There is a lot of activity trying to fill the gap between DDR and flash with RRAM or XPoint.”

As Ferro notes, the industry is also exploring various server architectures to fill the above-mentioned gap.

“[This] gets more into the system challenges. There are all these multiple processors, and the question is how do we utilize memory more efficiently. That’s the big bottleneck right now,” he observed.

In addition, says Ferro, the industry is also going to need some fast memory at the local level.

“At the extreme level, for an MCU you have a very small amount of ROM and RAM that you have to fit everything into,” he said. “The ability to expand that and not go off-chip will require SRAM. As you get bigger CPUs, that’s more about caches than SRAM.”

In terms of embedded DRAM, Ferro says the concept has been “kicking around” for a long time.

“There are technical advantages to embedded DRAM, but the economics don’t seem to work well. The size is too big and the cost is too high. If it’s vertically integrated, then embedded DRAM could work because you don’t necessarily care,” he concluded. “If I sell a chip that’s bigger than a competitor’s chip, I’m going to lose. But if it’s all vertical, maybe you can take advantage of power and performance savings with embedded DRAM. But we don’t see it.”

Note: The full text of “The Future of Memory” by Ed Sperling can be read on Semiconductor Engineering here.

Exploring 2.5D packaging and beyond
https://www.rambus.com/blogs/exploring-2-5d-packaging-and-beyond-2/
Mon, 02 May 2016 16:13:10 +0000

Frank Ferro, a Senior Director of Product Marketing at Rambus, recently participated in a Semiconductor Engineering roundtable discussion about 2.5D and advanced packaging.

According to Ferro, 2.5D can succeed if customer demand overcomes the additional engineering costs associated with the packaging process.


“Back in the mobile days when we started seeing 3D packaging, it was because we needed space. We needed to get more memory into a smaller footprint,” he explained. “Today we’re seeing bandwidth as a driver in the case of high bandwidth memory (HBM). There’s a need to create another tier in the hierarchy, so customers are interested in looking at the cost tradeoffs of 2.5D using silicon interposer, and HBM versus traditional DRAM. Are the economics there? Yes, for the people who really want it and need it. For the masses, we still have a way to go.”

Ferro also commented on the power aspect of 2.5D packaging.

“Power is important. If you look at HMC (Hybrid Memory Cube), it was really hot two years ago, but has fizzled since then. By serializing all those signals you need high-speed SerDes,” he continued. “And then you have to look at the power of high-speed SerDes, versus HBM, which is wide and relatively slow. Power in HMC might have been less complex when seen from a 2D to 3D evolution standpoint because it was similar, but HBM won out because of lower power and lower complexity.”

Perhaps most importantly, says Ferro, it is essential for the industry to fully understand the 2.5D supply chain.

“If you’re just delivering a chip, then you can ship that chip to the customer. But now you’ve got a memory vendor, an SoC vendor, and an interposer vendor,” Ferro explained. “How do you test that memory? If something breaks, who’s responsible for it? Now there are pins you can’t physically get to anymore. Three companies have to work together, so you have to get all these companies talking together.”

Simply put, there is a real problem if something goes wrong.

“You’re responsible to the end customer, but in your supply chain, you are still subject to those effects. You make it easier for your end customer, but you still have to deal with it,” he added.

As we’ve previously discussed on Rambus Press, HBM design and implementation can be challenging, as 2.5D-packaging technology inevitably adds various manufacturing complexities, along with silicon interposer costs. To be sure, there are numerous expensive components mounted to the interposer, such as the SoC and multiple HBM devices. Another significant challenge involves routing thousands of signals (data + control + power/ground) via the interposer to the SoC for each HBM memory used.

Despite the above-mentioned challenges, HBM offers a number of distinct capabilities for a new digital age dominated by the IoT. These include moving memory closer to the CPU, while increasing both density and bandwidth. Indeed, HBM takes advantage of existing technologies to create another tier of memory, thus bolstering server memory architecture.

Interested in learning more about 2.5D and advanced packaging? The full text of the roundtable discussion is available on Semiconductor Engineering here (Part 1) and here (Part 2).

From consoles to VR
https://www.rambus.com/blogs/from-consoles-to-vr-2/
Thu, 31 Mar 2016 16:23:36 +0000

The Atari 2600 (or VCS) – which hit the nascent video game market back in 1977 – packed 128 bytes of RAM and an 8-bit MOS 6507 CPU clocked at a mere 1.19 MHz. According to Wikipedia, the RAM was tasked with handling run-time data, which included the call stack and the state of the game world. There was no frame buffer.


Image Credit: Wikipedia (via Evan-Amos)

Fast-forward to 1996 and the launch of Nintendo’s N64. Powered by a 64-bit NEC VR4300 CPU clocked at 93.75 MHz, the fifth generation console was one of the first to implement a unified memory subsystem and packed 4 megabytes of Rambus RDRAM (subsequently expandable to 8MB).

Image Credit: Wikipedia (via Yaca2671)

Driven by Moore’s Law and the demands of gaming enthusiasts, consoles were rapidly becoming ever more sophisticated as they pushed – and sometimes even shattered – the limits of bandwidth, capacity and graphics. By the time 2006 rolled around, Sony’s PlayStation® 3 boasted a 3.2 GHz Cell Broadband Engine with 1 PPE & 7 SPEs, 256 MB of XDR DRAM and 256 MB GDDR3 DRAM.


Image Credit: Wikipedia (via Evan-Amos)

A decade later, all eyes in the gaming world are fixed on virtual reality (VR) headsets such as Facebook’s Oculus Rift, Samsung’s Gear VR (powered by Oculus), HTC’s Vive and Sony’s PlayStation VR. Perhaps not surprisingly, implementation and requirements vary wildly for each device. For example, Samsung’s Gear VR is designed to work with a compatible Galaxy device, which acts as the headset’s display and processor. Meanwhile, the actual Gear VR unit is designated as a controller, as it includes a high field of view, as well as an inertial measurement unit (IMU) for rotational tracking.


Image Credit: Oculus Rift

In contrast, system requirements for the Rift (which relies on a tethered PC) stipulate an NVIDIA GTX 970 or AMD 290 GPU, an Intel i5-4590, 8GB RAM and HDMI 1.3 video output supporting a 297MHz clock.

“A traditional 1080p game at 60Hz requires 124 million shaded pixels per second. In contrast, the Rift runs at 2160×1200 at 90Hz split over dual displays, consuming 233 million pixels per second,” Oculus’ Atman Binstock explained in a recent blog post.

[youtube https://www.youtube.com/watch?v=amtBUkmHS0w]

“At the default eye-target scale, the Rift’s rendering requirements go much higher: around 400 million shaded pixels per second. This means that by raw rendering costs alone, a VR game will require approximately 3x the GPU power of 1080p rendering.”
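Binstock’s arithmetic is easy to reproduce from the quoted figures:

```python
# Reproducing Binstock's pixel-throughput arithmetic from the quoted figures.
def mpix_per_sec(width, height, hz):
    """Shaded-pixel throughput in millions of pixels per second."""
    return width * height * hz / 1e6

flat_1080p = mpix_per_sec(1920, 1080, 60)   # a traditional 60 Hz game
rift       = mpix_per_sec(2160, 1200, 90)   # Rift's dual displays combined
print(f"1080p@60: {flat_1080p:.0f} Mpix/s, Rift: {rift:.0f} Mpix/s")
print(f"Eye-target scale: 400 Mpix/s -> {400 / flat_1080p:.1f}x the 1080p load")
```

The raw display figures match his 124 and 233 million pixels per second, and 400 million shaded pixels per second against the 1080p baseline gives the roughly 3x GPU requirement he cites.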

In the future, says Binstock, successful consumer VR will likely drive changes in GPUs, OSs, drivers, 3D engines, and apps, ultimately enabling much more efficient low-latency VR performance.

“It’s an exciting time for VR graphics, and I’m looking forward to seeing this evolution,” he added.

Indeed, VR has certainly come a long way since 1991, when the $60,000 Virtuality 1000CS made its way into the arcade scene. The unit featured an HMD to display video and play audio, while players moved and used a 3D joystick to interact with the VR world. According to Tom’s Hardware, the system relied upon an Amiga 3000 to handle most of the game processing.


“Gaming may have significantly evolved over the years, but there is one constant that remains unchanged. Players are always seeking a more immersive experience, enabled by improvements in AI and more realistic and responsive graphics,” Steven Woo, Vice President of Systems and Solutions at Rambus, explained. “As such, gaming continues to be at the forefront, pushing the very limits of numerous technologies, including memory, processing and graphic capabilities.”

Woo, an engineer who participated in the development of memory technologies adopted in Sony’s PlayStation 2 and PlayStation 3 game consoles, also pointed out that VR, although quickly evolving, is still in a relatively nascent stage.

“As Oculus’ Atman Binstock noted, successful consumer VR is likely to drive changes in GPUs, OSs, drivers, 3D engines, and apps, ultimately enabling much more efficient low-latency VR performance,” Woo added. “I’m looking forward to seeing how VR will ultimately take advantage of new memory technology as it evolves over the next few years.”

Optimizing memory for next-gen computing
https://www.rambus.com/blogs/mid-optimizing-memory-for-next-gen-computing/
Tue, 22 Dec 2015 16:11:38 +0000

Semiconductor Engineering Editor in Chief Ed Sperling recently noted that getting data in and out of memory is just as important as optimizing the speed and efficiency of a processor.

“[Nevertheless], for years design teams managed to skirt the issue because it was quicker, easier and less expensive to boost processor clock frequencies with a brute-force approach,” he explained. “That worked well enough prior to 90nm, and adding more cores at lower clock speeds filled the gap starting at 65nm.”


According to Sperling, the subsequent solution of choice amounted to packing more SRAM around processors. To be sure, some SoCs are now up to 80% memory, which is not considered the most efficient way to design chips.

For one thing, says Sperling, it puts the onus on operating system, middleware and embedded software teams to integrate the flow of data and make it all work. Indeed, even though this approach has been well tested and market-proven, it too is beginning to run out of steam.

“That puts chipmakers back in front of the original challenge of getting data in and out of memory more efficiently, but with some new hurdles and options,” he confirmed.

Such hurdles include interconnects, wires, thinning gate oxides and an increasing number of cores. Fortunately, the commercialization of fan-outs and 2.5D approaches has allowed chipmakers to rethink how and where to add memory on the Z-axis. Concurrently, new memory types are offering new options to balance cost, performance and data reliability.

As Rambus VP of solutions marketing Steven Woo puts it, there are a number of possible scenarios that could influence the architecture and integration of various IP blocks to evolve a new memory paradigm.

“You could make processors with lots and lots of cache, or you can use smaller die, more processors, and higher-bandwidth memory,” he explained. “For the low power community, you want to look at where power is being wasted. Moving data long distances is wasteful, which is why in phones you see the memories located very close to the processor.”

With something like Wide I/O, says Woo, a system would use a lot of wires at lower speed, rather than a few wires at high speed.

“You get better performance characteristics that way. What we’re witnessing is that the physical design envelope is changing. It’s no longer design in isolation. Packaging is changing, and that’s changing other things,” he continued. “TSVs are a change to the value chain and the way DRAMs are sold. When they’re assembled, there are a host of other issues, like where it can go bad and whether it went bad in assembly. Packaging changes the relationships between the players, too.”
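Woo’s wide-and-slow versus narrow-and-fast tradeoff can be illustrated with hypothetical numbers (ours, not from the article): both interfaces below reach roughly the same peak bandwidth, but the wide one runs each wire an order of magnitude slower, which relaxes per-wire signaling power and complexity.

```python
# Hypothetical numbers illustrating Woo's point: a wide, slow interface
# can match a narrow, fast one on peak bandwidth.
wide_gbps   = 512 * 0.266   # 512 wires at 266 Mbps each (Wide I/O-style)
narrow_gbps = 32 * 4.266    # 32 wires at 4.266 Gbps each (SerDes-style)
print(f"wide: {wide_gbps / 8:.1f} GB/s, narrow: {narrow_gbps / 8:.1f} GB/s")
```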

In addition, says Woo, there are established methods for determining where things went wrong, but those signals might now be inaccessible. He describes it as a big equation: “When the benefits outweigh the cost of assembly, test and manufacturing, then people adopt it.”

Woo also confirmed that the industry has seen a move to near-data processing – with data sets so large that it has become cheaper to move the processor closer to the data, rather than shift the data closer to the processor.

“There’s also a movement to minimize the data through semantic awareness, where you understand the structure of the data and you walk down a list of what’s done right in memory and an FPGA instead of having to return to a Xeon processor,” he added.

As Sperling points out, one concept has become crystal clear despite a still-evolving model.

“Memory is no longer just a checklist item in any advanced designs. It’s now an integral part of the design, and it can be tweaked, bent and twisted in ways that were largely ignored in the past to improve performance, reduce power, and create differentiation,” he concluded.

Interested in learning more? The full text of “Rethinking Memory” by Ed Sperling is available on Semiconductor Engineering.

When memory and storage converge
https://www.rambus.com/blogs/mid-when-memory-and-storage-converge/
Thu, 15 Oct 2015 19:46:36 +0000

Earlier this week, Rambus Chief Scientist Craig Hampel gave a keynote presentation at MemCon 2015 that explored the increasingly blurred lines between memory and storage.

As Hampel notes, devices used as memory are typically volatile, byte addressable and directly writable, with deterministic latency and an endurance greater than 10^15 operations. In contrast, storage devices are non-volatile and block addressable, require an erase operation, have varied latency, and have endurance limits far below those of memory devices.
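The access-model difference Hampel draws, byte-addressable direct writes versus block-addressable erase-before-write, can be illustrated with a minimal sketch. All class names and sizes here are hypothetical, chosen only to make the contrast concrete:

```python
# Minimal sketch of the access-model difference between memory and storage.
# Names and sizes are illustrative, not from the keynote.

class ByteAddressableMemory:
    """Directly writable at byte granularity, like DRAM."""
    def __init__(self, size: int):
        self.data = bytearray(size)

    def write(self, addr: int, value: int) -> None:
        self.data[addr] = value  # single-byte write, no erase needed


class BlockStorage:
    """Block-addressable with an explicit erase step, like NAND flash."""
    BLOCK_SIZE = 4096  # hypothetical block size

    def __init__(self, num_blocks: int):
        self.blocks = [None] * num_blocks  # None marks an erased block

    def erase(self, block: int) -> None:
        self.blocks[block] = None

    def write_block(self, block: int, payload: bytes) -> None:
        if self.blocks[block] is not None:
            raise RuntimeError("block must be erased before rewrite")
        if len(payload) != self.BLOCK_SIZE:
            raise ValueError("must write a full block")
        self.blocks[block] = payload
```

In this model, updating a single byte of storage forces a read, erase, and full-block rewrite, which is one reason latency and endurance diverge so sharply from memory.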


The two also differ in terms of system integration. Memory, says Hampel, is hardware device parallel, hardware state controlled and contextually unaware; in addition, the CPU waits for memory, and memory isn't power coherent. Storage devices, meanwhile, are abstracted and software state controlled, with the context of the data resident in the storage system. Storage is also power coherent, and the CPU typically context switches during a storage operation.

Lastly, memory supports a direct CPU interface, 100s of GB/s per CPU, fewer outstanding transactions and fixed scheduling. In contrast, storage features an abstracted interface, 10s of GB/s per CPU, 100s of outstanding transactions and split transaction/dynamic scheduling.

“The application view is probably the most definitive,” Hampel told conference attendees. “Memory is most often associated with partial and intermediate data, while storage is designated for complete and final data, as well as saved and persistent data.”

Despite the differences, says Hampel, the lines between memory and storage are beginning to blur. Indeed, data movement increasingly limits TCO, performance and power efficiency. An alternative paradigm could see compute offload to storage that is both data structure-aware and connected directly to a memory-like storage device.


“Memory and storage will begin to share numerous characteristics at the boundary. As expected, there are numerous software and hardware opportunities for these properties,” he explained. “Memory interfaces (like DDR), with extensions for storage support, are the likely deployment for emerging storage-class memories.”

According to the chief scientist, future converged memory interface requirements include speeds of up to 6.4+ Gbps, support for 2DPC (two DIMMs per channel, with DRAM and SCM modules sharing the same channel), and efficient allocation and scheduling of SCM and DRAM bandwidth. In addition, future converged standards should minimize latency and power, while maintaining similar economics and infrastructure for low-risk industry adoption.
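The 6.4 Gbps figure translates into a concrete bandwidth budget that a converged-channel scheduler would have to apportion between DRAM and SCM. The 64-bit bus width and the 70/30 split below are illustrative assumptions, not figures from the keynote:

```python
# Back-of-the-envelope peak bandwidth for a converged channel.
# The 64-bit bus width and 70/30 DRAM/SCM split are hypothetical assumptions;
# only the 6.4 Gb/s per-pin rate comes from the source.

PIN_RATE_GBPS = 6.4      # per-pin data rate cited by Hampel
BUS_WIDTH_BITS = 64      # assumed DDR-style data bus width

peak_gbytes = PIN_RATE_GBPS * BUS_WIDTH_BITS / 8   # Gb/s -> GB/s
print(f"peak channel bandwidth: {peak_gbytes:.1f} GB/s")  # ~51.2 GB/s

# With DRAM and SCM sharing the channel (2DPC), the scheduler must
# apportion that peak; e.g. a hypothetical 70/30 DRAM/SCM allocation:
dram_share = 0.7 * peak_gbytes
scm_share = 0.3 * peak_gbytes
```

The point of the sketch is that whatever the split, DRAM and SCM traffic draw from one fixed channel budget, which is why efficient allocation and scheduling appear explicitly in the requirements list.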


“In this context, potential DDR5 directions could include revamped control buses that provide more general purpose control paths – while removing primary and secondary bottlenecks,” he continued. “Extended LRDIMM architecture would support higher data and control rates, as well as caching and mixed DRAM/SCM module types, along with address and data buffers to abstract memory types.”

Similarly, the chief scientist added, improved protocols could potentially support storage-class memory over a memory-type bus, as well as pipelining of mixed traffic that combines minimum-latency accesses with non-deterministic and varied latencies.


“An upgraded data bus would enable lower-swing, power-efficient, single-ended signaling such as regulated LVSTL (NGS). It would also help facilitate new data bus topologies to improve data rates,” he concluded.
