Imagine if you could completely reinvent how you solve problems. Instead of breaking a large problem into smaller pieces, for example, what if you had a system that could address the issue in its entirety? These are the possibilities that Compute Express Link™ (CXL™) brings to high-performance computing (HPC) and cloud-based applications. Now that the latest version of the specification, CXL 3.0, has been announced, data centers and supercomputers are on their way to delivering even faster, more efficient performance.
In other words, these powerful systems will no longer be constrained by previous limitations.
It’s good news for anyone working to tackle some of the biggest problems of our time, from human genome mapping to climate change modeling and vaccine discovery. At the same time, CXL 3.0 offers promising memory-sharing advantages for disaggregated data centers, edge applications like autonomous vehicles, and relatively smaller-scale applications that require fast, real-time responses, like video games. In this blog post, we’ll take a closer look at the possibilities that CXL 3.0 can bring to a variety of data-driven applications that demand ever more memory capacity, along with higher bandwidth, more security, and lower latency.
Growth in unstructured data (think video, images, text, and voice) is driving new compute architectures, with a greater portion of data center workloads running on accelerators. CXL is known as the “breakthrough” CPU-to-device, cache-coherent interconnect for processors, memory expansion, and accelerators. CXL runs across the standard PCI Express® (PCIe®) physical layer (PHY) and uses a flexible processor port that can auto-negotiate to either the standard PCIe transaction protocol or the alternate CXL transaction protocols, targeting extremely low latency for new cache and memory transactions.
Largely used by designers of data center servers, supercomputers, and enterprise computing systems for applications like AI and machine learning, the protocol allows CPUs and accelerators to access each other’s memory. Its technical specifications are developed by the CXL Consortium, an open industry standards group. CXL 3.0 doubles the speed of its predecessor, providing data rates up to 64 GT/s (the same as PCIe 6.0) without any added latency compared to previous generations. According to the CXL Consortium, the newest specification also features advanced fabric capabilities, memory sharing, and enhanced coherency, among other additions.
These features mean more scalability and optimized system-level flows. CXL 3.0 is also backwards compatible with preceding generations of CXL and, more importantly, designers may choose to use CXL 3.0 at lower link speeds to take advantage of many of the latest capabilities.
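To put the link speeds in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a x16 link and counts only raw bits on the wire, one bit per lane per transfer, so encoding and protocol overhead are ignored and real throughput will be lower. The function name and the x16 default are our own choices for illustration.

```python
def raw_bandwidth_gbytes(rate_gt_s: float, lanes: int = 16) -> float:
    """Raw per-direction bandwidth in GB/s, ignoring encoding and protocol overhead.

    Each transfer moves one bit per lane, so GT/s * lanes gives Gb/s;
    dividing by 8 converts bits to bytes.
    """
    return rate_gt_s * lanes / 8


if __name__ == "__main__":
    for label, rate in [("32 GT/s (previous generation)", 32),
                        ("64 GT/s (CXL 3.0)", 64)]:
        gbytes = raw_bandwidth_gbytes(rate)
        print(f"{label}: x16 link, about {gbytes:.0f} GB/s raw per direction")
```

The same function can be called with a lower rate or a narrower link to estimate what a design gives up when it runs CXL 3.0 features at less than the top speed.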
What makes CXL ideal for SoCs used in compute-intensive applications is its ability to maintain memory coherency between the CPU memory space and memory on attached devices. Memory coherency paves the way for higher performance via resource sharing, less complexity of the software stack, and lower overall system cost. Designers can then focus on their application’s target workloads rather than redundant memory management hardware in their design’s accelerators.
The current trend toward data center disaggregation is well-served by CXL 3.0. In a disaggregated architecture, homogeneous resources such as storage, compute, memory, and networking are connected via optical interconnects. This approach leads to better platform flexibility, higher density, and better resource utilization, with data center designers able to tap into resources based on the needs of particular workloads. CXL 3.0 treats resources as interchangeable, allowing the more flexible provisioning and management of resources that disaggregated data centers require.
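The provisioning idea can be illustrated with a small toy model. This is only a sketch of pooled memory being handed out to hosts on demand and returned when a workload finishes. The class, method names, and sizes are hypothetical and do not correspond to any CXL management interface.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryPool:
    """Toy model of a shared memory pool (hypothetical, not a CXL API)."""
    capacity_gb: int
    allocations: dict = field(default_factory=dict)  # host name -> GB assigned

    def free_gb(self) -> int:
        """Capacity not currently assigned to any host."""
        return self.capacity_gb - sum(self.allocations.values())

    def assign(self, host: str, size_gb: int) -> bool:
        """Give size_gb to a host if the pool has room; report success."""
        if size_gb > self.free_gb():
            return False
        self.allocations[host] = self.allocations.get(host, 0) + size_gb
        return True

    def release(self, host: str) -> int:
        """Return everything a host holds to the pool; report how much."""
        return self.allocations.pop(host, 0)


if __name__ == "__main__":
    pool = MemoryPool(capacity_gb=1024)
    print(pool.assign("genomics-node", 768))   # large job takes most of the pool
    print(pool.assign("analytics-node", 512))  # does not fit until memory is freed
    pool.release("genomics-node")
    print(pool.assign("analytics-node", 512))  # fits once the first job is done
```

The point of the sketch is that capacity follows the workload: no host needs to be built with enough local memory for its worst case.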
By providing a degree of symmetry to its coherency, CXL 3.0 allows accelerators in an SoC to take a more equal role with the host, so both can cache the same data at the same time, rather than sequentially. This results in substantially more efficient performance for certain types of tasks. Memory sharing takes advantage of hardware coherency by allowing CXL-attached memory to be coherently shared across hosts. So, more than one host can simultaneously access a certain section of memory, while every host is able to see that location’s most up-to-date data. With this capability, designers can create clusters of machines to solve big problems through shared memory constructs.
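A toy model helps show what "every host sees the most up-to-date data" means in practice. The sketch below keeps per-host caches coherent by invalidating other hosts' copies on a write. It is a simplified software illustration under our own assumptions, not the CXL 3.0 protocol itself, which does this in hardware, and all names in it are hypothetical.

```python
class SharedMemory:
    """Toy shared region with per-host caches kept coherent by invalidation."""

    def __init__(self):
        self.memory = {}   # address -> value (the shared, authoritative copy)
        self.caches = {}   # host -> {address: cached value}

    def attach(self, host):
        """Register a host with an empty cache."""
        self.caches[host] = {}

    def read(self, host, addr):
        """Return the value at addr, filling the host's cache on a miss."""
        cache = self.caches[host]
        if addr not in cache:
            cache[addr] = self.memory.get(addr, 0)
        return cache[addr]

    def write(self, host, addr, value):
        """Write a value, first invalidating every other host's cached copy."""
        for other, cache in self.caches.items():
            if other != host:
                cache.pop(addr, None)
        self.memory[addr] = value
        self.caches[host][addr] = value


if __name__ == "__main__":
    shared = SharedMemory()
    shared.attach("host_a")
    shared.attach("host_b")
    # Both hosts can cache the same location at the same time.
    shared.read("host_a", 0x100)
    shared.read("host_b", 0x100)
    # A write by one host invalidates the other's stale copy.
    shared.write("host_a", 0x100, 42)
    print(shared.read("host_b", 0x100))
```

Because the stale copy is dropped before the write completes, the second host's next read misses in its cache and picks up the new value from shared memory.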
Support for storage-class memory means that when a device is powered off, data isn’t lost. As a result, there’s no longer a need to be concerned about saving data to a disk. If the end application is a video game, for instance, the player would be able to continue where they left off even if the gaming device’s battery died.
The protocol’s advanced fabric capabilities are a shift from previous generations and their traditional tree-based architectures. The new fabric supports up to 4,096 nodes, each able to communicate with the others via a port-based routing (PBR) addressing mechanism. A node can be any of several things: a CPU host, a CXL accelerator (memory included or not), a PCIe device, or a Global Fabric Attached Memory (GFAM) device. The GFAM device allows an array of new possibilities in building systems made up of compute and memory elements that are arranged to satisfy the needs of specific workloads. For example, with access to a terabyte or a petabyte of memory, it’s possible to create whole new models to tackle complex challenges such as mapping the human genome. Rather than writing software and building a compute system based on the philosophy that you must work on a large problem in smaller pieces, you can start with the idea that your system will be able to handle the entire problem all at once.
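The 4,096-node figure is 2 to the 12th power, so a 12-bit destination ID is enough to name every node in the fabric. The sketch below shows the general idea of routing by destination ID rather than by position in a tree: a switch looks up the destination and picks an egress port. The class, port numbers, and table layout are hypothetical and meant only to illustrate the concept, not the routing structures the specification defines.

```python
ID_BITS = 12                 # 2**12 = 4096 addressable nodes
MAX_NODES = 1 << ID_BITS


class ToySwitch:
    """Toy fabric switch that forwards by destination ID (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.routes = {}     # destination ID -> egress port number

    def add_route(self, dest_id, port):
        """Record which port leads toward dest_id; reject IDs outside 12 bits."""
        if not 0 <= dest_id < MAX_NODES:
            raise ValueError(f"destination ID {dest_id} does not fit in {ID_BITS} bits")
        self.routes[dest_id] = port

    def forward(self, dest_id):
        """Return the egress port for a destination (KeyError if no route)."""
        return self.routes[dest_id]


if __name__ == "__main__":
    switch = ToySwitch("edge-0")
    switch.add_route(dest_id=7, port=2)      # e.g. a host
    switch.add_route(dest_id=4095, port=5)   # e.g. a fabric-attached memory device
    print(switch.forward(7), switch.forward(4095))
```

Because any node can be addressed directly by its ID, a host, an accelerator, and a memory device can reach one another without the traffic having to follow a single tree rooted at one host.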
To ease and accelerate adoption of the latest CXL protocol, Synopsys offers the industry’s first complete CXL 3.0 IP solution, encompassing the controller, PHY, and verification IP to deliver secure, low-latency, high-bandwidth interconnection for AI, machine learning, and cloud computing applications.
Built on silicon-proven Synopsys PCIe IP, our CXL IP solution lowers integration risks for device and host applications and helps designers achieve the benefits that CXL 3.0 brings to SoCs for data-intensive applications. As an early CXL contributor, Synopsys had early access to the latest specification, enabling our engineers to deliver a more mature solution than competing offerings.
Data makes our digital world go ‘round, fueling applications that are tackling some of today’s most pressing concerns, from disease management to a warming climate. New data center architectures, such as the disaggregated approach as well as the integration of accelerators, are emerging in response to incessant demands for more bandwidth and lower latency for these data-heavy applications. The CXL 3.0 specification is ideal for these new approaches, providing twice the speed of its predecessor without any added latency. With Synopsys’ complete CXL 3.0 IP and Verification IP solution, designers can rest assured they’ll experience a smooth design and verification process for their next data center, AI, networking, memory expansion, or acceleration design.