latency Archives - Rambus

Optimizing memory bandwidth
https://www.rambus.com/blogs/optimizing-memory-bandwidth-2/ (Mon, 20 Jun 2016)

Frank Ferro, a senior director of product management at Rambus, recently sat down with Ed Sperling of Semiconductor Engineering and other industry participants to discuss the slew of new memory initiatives and entrants.

According to Ferro, the initiatives are prompted by the need for better latency and efficiency across the memory hierarchy.

“You have more bandwidth needs, but how do you get that bandwidth more efficiently? Everyone has been using DDR, and maybe getting HBM as another layer in the hierarchy,” he told Sperling. “Right now there’s a big gap with flash. There is a lot of activity trying to fill the gap between DDR and flash with RRAM or XPoint.”

As Ferro notes, the industry is also exploring various server architectures to fill the above-mentioned gap.

“[This] gets more into the system challenges. There are all these multiple processors, and the question is how do we utilize memory more efficiently. That’s the big bottleneck right now,” he observed.

In addition, says Ferro, the industry is going to need fast memory at the local level.

“At the extreme level, for an MCU you have a very small amount of ROM and RAM that you have to fit everything into,” he said. “The ability to expand that and not go off-chip will require SRAM. As you get bigger CPUs, that’s more about caches than SRAM.”

In terms of embedded DRAM, Ferro says the concept has been “kicking around” for a long time.

“There are technical advantages to embedded DRAM, but the economics don’t seem to work well. The size is too big and the cost is too high. If it’s vertically integrated, then embedded DRAM could work because you don’t necessarily care,” he concluded. “If I sell a chip that’s bigger than a competitor’s chip, I’m going to lose. But if it’s all vertical, maybe you can take advantage of power and performance savings with embedded DRAM. But we don’t see it.”

Note: The full text of “The Future of Memory” by Ed Sperling can be read on Semiconductor Engineering.

The evolution of augmented reality (AR)
https://www.rambus.com/blogs/evolution-augmented-reality-ar/ (Thu, 14 Apr 2016)

Wearable augmented reality (AR) devices are still at a relatively nascent stage. As the technology progresses, augmented reality will likely face a number of obstacles, including evolving social mores, its reliance on continued Moore’s Law scaling and the challenge of maintaining a seamless user experience.

According to Rambus Fellow Dr. David G. Stork, it is difficult to ascertain whether social mores, or more specifically, the expectation of privacy, will serve as an impediment to the adoption of AR technology such as cameras mounted on eyeglasses.

“George Orwell, author of the dystopian masterpiece 1984 about Big Brother and constant surveillance by the state, could not have foreseen the exhibitionism that many people—especially younger ones—are eager to embrace and which fuels numerous surveillance-based reality TV shows including Real Housewives and, most appropriately, Big Brother,” he explained during a recent interview with Rambus Press in Sunnyvale.

“Likewise, many in Generation X seem comfortable in posting all manner of personal information on the web. These examples illustrate how those comfortable with sharing personal information can and will continue to do so. However, if augmented reality technology is to overcome social mores, it is [likely to be contingent upon] the non-users of head-mounted cameras—i.e., the people being watched and recorded. [Will they] submit to such surveillance, usually without prior consent? Again, it is difficult to know how this will play out.”

Stork also discussed the technical challenges facing augmented reality. Indeed, realistic rendering of three-dimensional figures and scenes for AR requires surprisingly powerful computation, and thus power.

“Will the relentless progress in computational power, summarized by Moore’s Law, be sufficient to meet this demand? It seems likely that the augmented-reality applications will continue to grow in sophistication, always using the maximum available computational resources,” he continued. “Recall that in the 1970s computer games such as Pong, Pac Man and Space Invaders—woefully primitive by today’s standards—were nevertheless compelling.”

The human visual system, says Stork, is one of the most extraordinarily sophisticated systems of any form—natural or artificial—and is exquisitely sensitive to certain visual phenomena.

“One such phenomenon is retinal slip, or the mismatch between the direct view of the scene through the glasses and the three-dimensional virtual figures and avatars that are meant to appear within the scene,” he added. “Such mismatch can be very unsettling and even lead to nausea. The mobile compute power will be used first and foremost to reduce such mismatch to tolerable levels, with the remaining computational power applied to rendering realistic figures. However, latency and accuracy must come first if an augmented reality system is ever to gain commercial success.”
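To put the latency point in rough perspective, here is a minimal frame-budget sketch. The refresh rates and the ~20 ms motion-to-photon comfort target are commonly cited rules of thumb for head-mounted displays, not figures from Stork, so treat all the numbers as illustrative assumptions.

```python
# Illustrative frame-budget arithmetic for head-mounted AR rendering.
# The refresh rates and the ~20 ms motion-to-photon comfort target are
# commonly cited rules of thumb, not figures from the article.
MOTION_TO_PHOTON_TARGET_MS = 20.0

for refresh_hz in (60, 90, 120):
    frame_budget_ms = 1000.0 / refresh_hz
    print(f"{refresh_hz} Hz display: {frame_budget_ms:.1f} ms per frame; "
          f"tracking, rendering and scan-out must fit well inside "
          f"~{MOTION_TO_PHOTON_TARGET_MS:.0f} ms to keep retinal slip tolerable")
```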

The importance of understanding bandwidth
https://www.rambus.com/blogs/mid-the-importance-of-understanding-bandwidth/ (Mon, 21 Sep 2015)

Did you know that the terms “latency” and “bandwidth” are frequently misused?

According to Loren Shalinsky, a Strategic Development Director at Rambus, latency refers to how long the CPU needs to wait before the first data is available. Meanwhile, bandwidth describes how fast additional data can be “streamed” after the first data point has arrived.

“Bandwidth becomes a bigger factor in performance when data is stored in ‘chunks’ rather than being randomly distributed,” Shalinsky wrote in a recently published Semiconductor Engineering article. “As an example, programming code tends to be random, as the code needs to respond to the specific input conditions. Large files, where perhaps megabytes or more of sequential data needs to be stored, would represent the other end of the spectrum.”

As Shalinsky points out, modern computer systems adhere to a 4K sector size, with large files broken up into easier-to-manage chunks of 4,096 bytes. Interestingly, the concept of a sector size is actually a holdover from the original hard disk drives (HDDs). Indeed, even solid-state drives (SSDs) adhere to this traditional paradigm, thereby maintaining compatibility with computer file systems.

To further illustrate the differences between bandwidth and latency, Shalinsky created a detailed chart (see below) that compares expected bandwidth with the bandwidth specified by manufacturers for common and up-and-coming memory solutions.

“For each of these examples, I assume the first access is to a random storage location and, therefore, the latency must be accounted for,” he explained. “Note that when accounting for latency, the calculated bandwidth often pales in comparison to the bandwidth specified in a product brief.”
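A back-of-the-envelope model makes the point concrete. The sketch below computes effective bandwidth as the bytes transferred divided by total access time (initial latency plus streaming time); the latency and peak-bandwidth figures are illustrative assumptions, not values from Shalinsky’s chart.

```python
# Minimal sketch: effective bandwidth of a single random access once the
# initial latency is accounted for. All latency/bandwidth figures are
# placeholder assumptions, not values from Shalinsky's chart.

def effective_bandwidth(transfer_bytes, latency_s, peak_bandwidth_bps):
    """Bytes moved divided by (initial latency + streaming time)."""
    streaming_time_s = transfer_bytes / peak_bandwidth_bps
    return transfer_bytes / (latency_s + streaming_time_s)

# Hypothetical devices: (latency in seconds, peak bandwidth in bytes/s).
devices = {
    "DRAM (assumed ~50 ns, ~12.8 GB/s)": (50e-9, 12.8e9),
    "SSD  (assumed ~100 us, ~500 MB/s)": (100e-6, 500e6),
    "HDD  (assumed ~5 ms, ~150 MB/s)":   (5e-3, 150e6),
}

for name, (latency, peak) in devices.items():
    eff = effective_bandwidth(4096, latency, peak)  # one random 4 KB access
    print(f"{name}: ~{eff / 1e6:,.1f} MB/s effective for a 4 KB access")
```

With these assumed numbers, a single random 4 KB read from the SSD or HDD achieves only a small fraction of the device’s specified peak bandwidth, which is exactly the gap the chart is meant to expose.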

Understanding the application’s use case, says Shalinsky, is critical to determining which type of memory is most appropriate. For example, imagine a server running a database application with small, 1-Kbyte records that are rarely accessed sequentially. In that case, latency dominates performance.

“[Yes], SSDs [do] provide a significant improvement over hard drives,” he continued. “However, their performance is still three orders of magnitude [lower] than that of any DRAM-based memory system. [Nevertheless], SSDs have continued to move closer to the CPU, reducing their latency along the way.”

However, while SSDs adhering to NVMe aim to lower latencies, the protocol does little to change the NAND devices inside the SSD, which carry an inherent latency of tens to hundreds of microseconds. Even the greater-than-50% latency reduction touted for NVMe is not enough to jump the memory gap.
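A quick illustration of why, using hypothetical numbers (the overhead and NAND latencies below are assumptions for the sake of the example, not figures from the article): cutting the protocol and controller overhead in half barely moves the total when the NAND array read itself takes tens of microseconds.

```python
# Illustrative only: why a protocol-level latency cut cannot close the gap
# to DRAM. The overhead and NAND figures below are assumptions.
nand_read_latency_us = 80.0                    # assumed NAND array read time
legacy_overhead_us = 20.0                      # assumed controller/protocol overhead
nvme_overhead_us = legacy_overhead_us * 0.5    # a >50% cut applied to overhead only

for label, overhead_us in (("Legacy stack", legacy_overhead_us),
                           ("NVMe stack", nvme_overhead_us)):
    total_us = nand_read_latency_us + overhead_us
    print(f"{label}: ~{total_us:.0f} us per read, versus ~0.05 us for a DRAM access")
```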

“For a database where the record size gets larger, say 8 Kbytes in size, the calculated bandwidth does improve markedly – as the system can now take better advantage of the max bandwidth and spread the ‘cost’ of the latency over more bytes,” Shalinsky confirmed. “By being very strategic in the placement of the data (e.g. for record sizes that are in the megabyte range), all of these systems have the capability of continuously streaming the data, and then bandwidths begin to approach the specified max bandwidth.”
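Continuing the toy model above with the same assumed SSD figures, larger records amortize the fixed latency and push effective bandwidth toward the specified maximum:

```python
# Continuing the sketch above with the assumed SSD figures
# (~100 us latency, ~500 MB/s peak): bigger records amortize the latency.
for record_bytes in (1 * 1024, 8 * 1024, 1024 * 1024):
    eff = effective_bandwidth(record_bytes, 100e-6, 500e6)
    print(f"{record_bytes // 1024:>5} KB record -> ~{eff / 1e6:.0f} MB/s effective")
```

With these assumed numbers, the 1 KB record reaches only a few percent of the 500 MB/s peak, the 8 KB record improves markedly, and the 1 MB record lands within roughly 5% of the specified maximum.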

As noted above, understanding the application’s access patterns is critical to choosing the most appropriate type of memory. DRAM-based memory systems, for example, are a good fit for maximizing performance on random operations.

“If you need memory for large records, consider what your budget allows and how much memory capacity and bandwidth you really need. Then you can make an informed decision,” Shalinsky concluded.

Understanding the memory-storage pyramid
https://www.rambus.com/blogs/understanding-the-memory-storage-pyramid-2/ (Thu, 27 Aug 2015)

Loren Shalinsky, a Strategic Development Director at Rambus, recently penned a detailed article for Semiconductor Engineering that explores the memory-storage hierarchy.

As he puts it, the hierarchy, or pyramid, is a particularly succinct method of understanding computer systems and the dizzying array of memory options available to the system designer.

“Many different parameters characterize the memory solution,” Shalinsky explained. “Among them are latency (how long the CPU needs to wait before the first data is available) and bandwidth (how fast additional data can be ‘streamed’ after the first data point has arrived), although by my count there are more than 10 different parameters to measure.”

[Image: the memory-storage pyramid]

As expected, no single memory sub-system can be considered “best” in all categories. As such, various memory solutions are routinely exploited at different levels of the hierarchy to achieve optimized results. For example, high-end systems, such as servers found in datacenters, are most likely to leverage solutions from every level in the hierarchy.
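As a rough sketch of how the hierarchy might be tabulated (the levels follow the pyramid, but the latency, bandwidth and capacity figures are broad-brush assumptions for illustration, not numbers from the article):

```python
from dataclasses import dataclass

@dataclass
class MemoryLevel:
    name: str
    latency: str     # how long before the first data arrives
    bandwidth: str   # how fast data streams once it is flowing
    capacity: str

# Illustrative pyramid, fastest/smallest at the top to slowest/largest at the
# bottom. The figures are broad-brush assumptions, not values from the article.
pyramid = [
    MemoryLevel("On-chip SRAM / cache", "~1-10 ns",             "very high",        "KB-MB"),
    MemoryLevel("DRAM DIMM",            "~tens of ns",          "tens of GB/s",     "GB"),
    MemoryLevel("SSD (NAND flash)",     "~tens-hundreds of us", "hundreds of MB/s", "hundreds of GB-TB"),
    MemoryLevel("Hard disk drive",      "~milliseconds",        "~100-200 MB/s",    "TB"),
]

for level in pyramid:
    print(f"{level.name:<22} latency {level.latency}, "
          f"bandwidth {level.bandwidth}, capacity {level.capacity}")
```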

While not changing the relative placement on the pyramid (see above), memory systems continue to evolve at a steady cadence. As such, future DIMM subsystem improvements are perhaps the easiest to imagine. To be sure, DRAM latency has not changed much over the years, although DRAM data rates continue to increase with an eye on more capacity and bandwidth.

New memory technologies such as HBM or HMC, says Shalinsky, can be sandwiched in between DIMMs and on-chip memories, with the ability to place gigabytes of data even closer to the CPU than a DIMM.

“Going back 5-10 years, Solid State Drives (SSDs) started to fill the huge gap that originally existed between DIMMs and hard drives,” he continued. “[However], the underlying NAND technology performance has somewhat leveled off (but made extraordinary progress in price reduction), and has therefore left the door open for additional technologies to fill the remaining gaps.”

To be sure, 3D XPoint technology, recently announced by Intel and Micron, seems to be targeting these very gaps.

“While technical details are scarce, we can piece together enough data points to surmise that 3D XPoint could fill one of the two blank levels currently in between SSDs and DIMMs,” he added. “Even with the addition of 3D XPoint, many gaps will continue to exist in the memory hierarchy, leaving no shortage of research avenues for companies in the memory industry.”

It should be noted that Shekhar Borkar, Intel Fellow and director of extreme-scale technologies, recently told The Platform that DRAM will be regarded as a first-level, high-capacity memory for years to come.

“The bottom line is that for the next ten years, if I am a node designer, I will rely on DRAM as a first-level, high capacity memory, followed by NAND or PCM as the next level for storage,” he said. “Everything else – keep working on it, and when it is ready, I will use it. Today, you are not ready.”

Future challenges for DDR4 and beyond
https://www.rambus.com/blogs/mid-future-challenges-for-ddr4-and-beyond/ (Tue, 25 Aug 2015)

Ely Tsern, VP and chief technologist for the Rambus Memory and Interfaces division, has identified five key trends driving future server memory. These include Big Data, additional cores per CPU, a DRAM scaling slowdown, the emergence of storage-class memory and the expectation that DDR4 will ultimately reach its speed limit.

“Rambus is working with customers and the industry to determine the most cost-effective ways to address these trends,” he confirmed during a recent presentation at IDF 2015.

“The good news? DDR4 should be capable of scaling to 3.2–3.7 Gbps. Simply put, this means DDR4 has ~50% more headroom than systems shipping today.”

According to Tsern, the paradigm required to achieve such speeds within the context of DDR4 includes a maximum of two DIMMs per channel (2DPC), aggressive I/O technologies and fully buffered LRDIMMs optimized for speed. However, DDR4 will ultimately run its course, says Tsern, with applications and systems expected to demand 2x to 3x more bandwidth – a requirement that is unlikely to be met by simply adding more channels.

“The real question? Can DDR architecture be extended to reach 2X again, with speeds up to 6.4+Gbps (at 2DPC)? Of course, a potential DDR5 standard must also support DRAM modules and modules using other memory types, such as NAND Flash and SCM (Storage-Class Memory) on the same channel,” he explained. “Plus, DDR5 should maintain similar economics and infrastructure for low risk industry adoption – while minimizing platform and system cost – vs. DDR4.”
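For a sense of scale, the standard DDR data bus is 64 bits wide, so per-pin data rate converts to per-channel bandwidth as sketched below; the 64-bit (non-ECC) bus width is the only assumption beyond the rates quoted above.

```python
# Rough arithmetic: per-pin data rate -> per-channel bandwidth, assuming the
# standard 64-bit (8-byte) DDR data bus with no ECC.
BUS_WIDTH_BYTES = 8

for label, gbps_per_pin in (("DDR4 at 3.2 Gbps", 3.2),
                            ("DDR5-class at 6.4 Gbps", 6.4)):
    channel_gb_per_s = gbps_per_pin * BUS_WIDTH_BYTES
    print(f"{label}: ~{channel_gb_per_s:.1f} GB/s per channel")
```

Doubling the per-pin rate from 3.2 to 6.4 Gbps thus doubles per-channel bandwidth from roughly 25.6 GB/s to roughly 51.2 GB/s without adding channels.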

Additional DDR5 requirements outlined by Tsern include minimal CPU and DRAM changes (especially pin count, controller learning), support for both DDP and 3DS, reducing latency and power, as well as the ability to facilitate a smooth transition from DDR4 to DDR5.

“We believe the industry needs to work together on developing next generation DDR solutions, with the goal of doubling current speed with minimal changes,” said Tsern. “Potential DDR5 solutions (or directions) include leveraging LRDIMM architecture to support higher frequencies and multiple memory types on the DDR channel; ASIC process scaling; less bus loading; advanced I/O techniques; new data bus topologies and the use of improved, lower swing, power efficient, single-ended signaling and revamped control buses in primary/secondary to remove bottlenecks.”

Rambus, confirms Tsern, already has a DDR5 prototype running at 6.4 Gbps (at 2 DIMMs per channel).

“In general, memory is an exciting business that faces significant challenges ahead. We at Rambus now have a new focus in server memory with the launch of our R+ DDR4 server memory chipset – RB26 – for RDIMMs and LRDIMMs,” he added. “We’ve also stepped up industry collaboration by joining JEDEC JC-40 and continue to maintain close customer and ecosystem engagements. We plan on delivering high value, JEDEC-standard products for server and datacenter markets. We feel our new chip business extends existing business with a ‘blended’ business model that helps facilitate our engagement with customers and industry on future server solutions.”
