How CXL 3.0 Fuels Faster, More Efficient Data Center Performance

Gary Ruggles, Madhumita Sanyal, Richard Solomon

Sep 05, 2022

Imagine if you could completely reinvent how you solve problems. Instead of breaking a large problem into smaller pieces, for example, what if you had a system that could address the issue in its entirety? These are the possibilities that Compute Express Link™ (CXL™) brings to high-performance computing (HPC) and cloud-based applications. Now that the latest version of the specification, CXL 3.0, has been announced, data centers and supercomputers are on their way to delivering even faster, more efficient performance.

In other words, these systems will be far less constrained by the limits that once forced designers to split large problems into smaller pieces.

It’s good news for anyone working to tackle some of the biggest problems of our time, from human genome mapping to climate change modeling and vaccine discovery. At the same time, CXL 3.0 offers promising memory-sharing advantages for disaggregated data centers, edge applications like autonomous vehicles, and smaller-scale applications that require fast, real-time responses, like video games. In this blog post, we’ll take a closer look at what CXL 3.0 can bring to data-driven applications that demand ever more memory capacity along with higher bandwidth, stronger security, and lower latency.


How Compute Express Link Addresses Growth in Data

Growth in unstructured data (video, images, text, and voice) is driving new compute architectures, with a greater portion of data center workloads running on accelerators. CXL is known as the “breakthrough” cache-coherent CPU-to-device interconnect for processors, memory expansion, and accelerators. Because CXL runs over the standard PCI Express® (PCIe®) physical layer, it can use standard PCIe PHYs. It relies on a flexible processor port that auto-negotiates to either the standard PCIe transaction protocol or the alternate CXL transaction protocols, which target extremely low latency for cache and memory transactions.
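The flexible-port behavior can be illustrated with a deliberately simplified Python model. It is a sketch under stated assumptions, not the link-training procedure the specification defines: the capability sets and the `negotiate` function are hypothetical, and the only rule it encodes is that CXL.io is required whenever a link runs CXL, with CXL.cache and CXL.mem used only when both link partners support them.

```python
from typing import FrozenSet

# Hypothetical labels for the link protocols (not identifiers from the CXL spec).
PCIE = "PCIe"
CXL_IO, CXL_CACHE, CXL_MEM = "CXL.io", "CXL.cache", "CXL.mem"


def negotiate(host_cxl: FrozenSet[str], device_cxl: FrozenSet[str]) -> FrozenSet[str]:
    """Toy model of a flexible port choosing between PCIe and CXL.

    Each argument is the set of CXL protocols a link partner advertises
    (empty for a PCIe-only partner). The link runs CXL only if both sides
    support CXL.io; it then uses every CXL protocol the two sides share.
    Otherwise it falls back to standard PCIe.
    """
    common = host_cxl & device_cxl
    if CXL_IO not in common:
        return frozenset({PCIE})
    return common


host = frozenset({CXL_IO, CXL_CACHE, CXL_MEM})
memory_expander = frozenset({CXL_IO, CXL_MEM})
print(sorted(negotiate(host, memory_expander)))  # ['CXL.io', 'CXL.mem']
print(sorted(negotiate(host, frozenset())))      # ['PCIe']
```

A host that supports all three CXL protocols and a memory-expansion device that advertises only CXL.io and CXL.mem end up using exactly those two, while a PCIe-only device falls back to plain PCIe.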

Used largely by designers of data center servers, supercomputers, and enterprise computing systems for applications like AI and machine learning, the protocol allows CPUs and accelerators to access each other’s memory. Its specification is developed by the CXL Consortium, an open industry-standards group. CXL 3.0 doubles the data rate of its predecessor to 64GT/s (the same as PCIe 6.0) without adding latency compared to previous generations. According to the CXL Consortium, the new specification also features:

  • Advanced switching and fabric capabilities
  • Efficient peer-to-peer communications
  • Fine-grained resource sharing across multiple compute domains

Together, these features deliver greater scalability and optimized system-level flows. CXL 3.0 is also backward compatible with earlier CXL generations, and, more importantly, designers can run CXL 3.0 at lower link speeds and still take advantage of many of its latest capabilities.
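To put the data rates in perspective, the short Python calculation below estimates raw per-direction bandwidth for an x16 link. It assumes one bit per transfer per lane and deliberately ignores encoding, FLIT, and protocol overhead, so delivered throughput will be lower; the function name is our own. It also shows the trade-off of running CXL 3.0 at a lower link speed: halving the data rate halves raw bandwidth, even though many of the new capabilities remain available.

```python
def raw_bandwidth_gb_s(rate_gt_s: float, lanes: int) -> float:
    """Raw per-direction link bandwidth in gigabytes per second.

    Assumes one bit per transfer per lane and ignores encoding, FLIT, and
    protocol overhead, so delivered throughput is lower than this figure.
    """
    return rate_gt_s * lanes / 8  # 8 bits per byte


# 32 GT/s is the top rate of earlier CXL generations (PCIe 5.0 signaling);
# 64 GT/s is the CXL 3.0 rate (PCIe 6.0 signaling).
for rate in (32, 64):
    print(f"{rate} GT/s x16: {raw_bandwidth_gb_s(rate, 16):.0f} GB/s per direction")
# 32 GT/s x16: 64 GB/s per direction
# 64 GT/s x16: 128 GB/s per direction
```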

What makes CXL ideal for SoCs used in compute-intensive applications is its ability to maintain memory coherency between the CPU memory space and memory on attached devices. Coherency paves the way for higher performance through resource sharing, a less complex software stack, and lower overall system cost. Designers can then focus on their application’s target workloads rather than on redundant memory-management hardware in their accelerators.

The current trend toward data center disaggregation is well served by CXL 3.0. In a disaggregated architecture, resources such as storage, compute, memory, and networking are organized into homogeneous pools connected via optical interconnects. This approach brings greater platform flexibility, higher density, and better resource utilization, because data center designers can draw on resources according to the needs of particular workloads. CXL 3.0 treats resources as interchangeable, enabling the flexible provisioning and management that disaggregated data centers require.
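The provisioning idea can be sketched as a shared pool that hands capacity to hosts on demand and takes it back when a workload finishes, instead of tying memory to a single server. The `MemoryPool` class below is a hypothetical, simplified Python illustration, not a CXL or Synopsys API; real pooling also involves fabric management, address mapping, and security, all of which the sketch omits.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class MemoryPool:
    """Hypothetical model of a disaggregated memory pool shared by many hosts."""

    capacity_gb: int
    allocations: Dict[str, int] = field(default_factory=dict)  # host -> GB granted

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - sum(self.allocations.values())

    def provision(self, host: str, size_gb: int) -> bool:
        """Grant size_gb to host if the pool has room; return whether it succeeded."""
        if size_gb <= 0 or size_gb > self.free_gb:
            return False
        self.allocations[host] = self.allocations.get(host, 0) + size_gb
        return True

    def release(self, host: str) -> int:
        """Return all of host's memory to the pool; return how much was freed."""
        return self.allocations.pop(host, 0)


pool = MemoryPool(capacity_gb=1024)
print(pool.provision("host-a", 768))  # True
print(pool.provision("host-b", 512))  # False: only 256 GB remain
pool.release("host-a")
print(pool.provision("host-b", 512))  # True: host-a's capacity went back to the pool
```

The point of the sketch is the last line: capacity released by one workload is immediately usable by another host, which is the kind of utilization gain disaggregation aims for.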

Think Big Right from the Start

By making its coherency more symmetric, CXL 3.0 allows accelerators in an SoC to take a more equal role with the host, so both can cache the same data at the same time rather than one after the other. This makes certain types of tasks substantially more efficient. Memory sharing builds on hardware coherency by allowing CXL-attached memory to be shared coherently across hosts: more than one host can access the same region of memory simultaneously, and every host sees the most up-to-date data at each location. With this capability, designers can build clusters of machines that solve big problems through shared memory constructs.
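The guarantee that every host sees the latest data can be illustrated with a toy write-invalidate model. In this hypothetical Python sketch, a directory records which hosts hold a cached copy of each address, and a write invalidates every other copy so the next read must fetch the new value. CXL 3.0 implements coherency in hardware with its own protocol messages; the class and method names here are illustrative only, and the model writes straight through to memory for simplicity.

```python
from typing import Dict, Set


class SharedRegion:
    """Toy write-invalidate model of memory shared coherently by several hosts.

    Hypothetical illustration only: it tracks which hosts cache each address
    and invalidates other copies on a write, so a read never returns stale data.
    """

    def __init__(self) -> None:
        self.memory: Dict[int, int] = {}             # address -> value in shared memory
        self.sharers: Dict[int, Set[str]] = {}       # address -> hosts holding a copy
        self.caches: Dict[str, Dict[int, int]] = {}  # host -> its locally cached lines

    def read(self, host: str, addr: int) -> int:
        cache = self.caches.setdefault(host, {})
        if addr not in cache:  # miss: fetch from shared memory and record the sharer
            cache[addr] = self.memory.get(addr, 0)
            self.sharers.setdefault(addr, set()).add(host)
        return cache[addr]

    def write(self, host: str, addr: int, value: int) -> None:
        # Invalidate every other host's cached copy before updating the value.
        for other in self.sharers.get(addr, set()) - {host}:
            self.caches[other].pop(addr, None)
        self.sharers[addr] = {host}
        self.memory[addr] = value  # write-through, a simplification
        self.caches.setdefault(host, {})[addr] = value


region = SharedRegion()
region.read("host-a", 0x100)
region.read("host-b", 0x100)         # both hosts now cache address 0x100
region.write("host-a", 0x100, 42)    # host-b's copy is invalidated
print(region.read("host-b", 0x100))  # 42: host-b re-fetches the latest value
```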

Support for storage-class memory means that data isn’t lost when a device is powered off. As a result, there’s less need to worry about explicitly saving data to disk. If the end application is a video game, for instance, the player could continue where they left off even if the gaming device’s battery died.

The protocol’s advanced fabric capabilities mark a shift away from the traditional tree-based architectures of previous generations. The new fabric supports up to 4,096 nodes, each able to communicate with the others via a port-based routing (PBR) addressing mechanism. A node can be a CPU host, a CXL accelerator (with or without memory), a PCIe device, or a Global Fabric Attached Memory (GFAM) device. GFAM devices open up a range of new possibilities for building systems whose compute and memory elements are arranged to satisfy the needs of specific workloads. For example, with access to terabytes or even petabytes of memory, it becomes possible to create entirely new models for complex challenges such as mapping the human genome. Rather than writing software and building a compute system around the philosophy that a large problem must be worked on in smaller pieces, you can start from the assumption that your system will handle the entire problem all at once.
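Supporting 4,096 nodes implies identifiers of at least 12 bits, since 2^12 = 4,096. The hypothetical Python sketch below assumes each node has a 12-bit ID and models a switch that forwards a message by looking up the destination ID in a table of egress ports. The table layout and names are our own simplification, not the data structures the CXL specification defines.

```python
from typing import Dict

PBR_ID_BITS = 12                 # assumed ID width: 2**12 == 4096 nodes
MAX_NODES = 1 << PBR_ID_BITS


class FabricSwitch:
    """Toy port-based routing switch: maps a destination node ID to an egress port.

    Hypothetical model only; it is not the routing structure the CXL spec defines.
    """

    def __init__(self) -> None:
        self.routes: Dict[int, int] = {}  # destination ID -> egress port number

    def add_route(self, dest_id: int, port: int) -> None:
        if not 0 <= dest_id < MAX_NODES:
            raise ValueError(f"destination ID {dest_id} does not fit in {PBR_ID_BITS} bits")
        self.routes[dest_id] = port

    def forward(self, dest_id: int) -> int:
        """Return the egress port for a message addressed to dest_id."""
        try:
            return self.routes[dest_id]
        except KeyError:
            raise LookupError(f"no route to node {dest_id}") from None


switch = FabricSwitch()
switch.add_route(0x00A, 3)    # node 10 is reachable through egress port 3
print(switch.forward(0x00A))  # 3
```

Because routing keys on a flat node ID rather than on a position in a tree, any node can in principle reach any other, which is what lets the fabric move beyond the tree-based topologies of earlier generations.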

CXL 3.0 IP Eases Adoption of New Protocol

To ease and accelerate adoption of the latest CXL protocol, Synopsys offers the industry’s first complete CXL 3.0 IP solution, encompassing the controller, PHY, and verification IP to deliver secure, low-latency, high-bandwidth interconnection for AI, machine learning, and cloud computing applications. The solution includes:

  • Synopsys CXL Controller IP, which implements the port logic for building a CXL device, host, or switch, and can be configured for dual-mode applications with runtime selection between device and host modes.
  • Synopsys Integrity and Data Encryption (IDE) Security IP Modules for CXL, which provide confidentiality, integrity, and replay protection for Flow Control Units (FLITs) in the case of the CXL.cache and CXL.mem protocols and for Transaction Layer Packets (TLPs)/FLITs in the case of CXL.io, and are designed for direct connection with the Synopsys controller IP.
  • Synopsys PCIe 5.0 (32GT/s) and 6.0 (64GT/s) PHY IP, which meets demands for higher bandwidth and power efficiency across network interface card (NIC), backplane, riser cards, retimers, and chip-to-chip interfaces; the PHY, available in advanced nodes, interoperates seamlessly with the CXL Controller IP.
  • Synopsys Verification IP (VIP) for CXL, which addresses new verification complexities at each layer for faster verification closure, while providing easy-to-use APIs for migrating from CXL 2.0/PCIe 6.0 to the CXL 3.0 domain. Synopsys verification solutions for CXL and PCIe provide IP-to-system-level verification closure using simulation, emulation, and prototyping platforms.

Built on silicon-proven Synopsys PCIe IP, our CXL IP solution lowers integration risks for device and host applications and helps designers achieve the benefits that CXL 3.0 brings to SoCs for data-intensive applications. As an early CXL contributor, Synopsys had advance access to the latest specification, enabling our engineers to deliver a more mature solution than competing offerings.

Summary

Data makes our digital world go ‘round, fueling applications that tackle some of today’s most pressing concerns, from disease management to a warming climate. New data center architectures, such as disaggregated designs and systems that integrate accelerators, are emerging in response to relentless demand for more bandwidth and lower latency in these data-heavy applications. The CXL 3.0 specification is ideal for these new approaches, providing twice the speed of its predecessor without added latency. With Synopsys’ complete CXL 3.0 IP and Verification IP solution, designers can count on a smooth design and verification process for their next data center, AI, networking, memory expansion, or acceleration design.
