Server architecture is undergoing a massive disruption, fundamentally shifting from compute-centric to data-centric designs. Over the past decade, CPUs gained ever more cores, but their memory bandwidth did not keep pace. Because a CPU’s overall bandwidth is shared among its cores, a higher core count reduces the effective bandwidth available to each core, which in turn constrains per-core performance. In contrast, memory and storage devices have increased their bandwidth by two to three orders of magnitude over the same period. As a result, CPUs cannot harness the full potential of modern memory devices without stalling application performance.
Just as the GPU expedites highly parallel workloads, the Data Processing Unit (DPU) was developed to offload the CPU’s data-management tasks to accelerators, storage and network devices across the bus. But even with processing delegated to multiple devices, the CPU is only truly offloaded if those devices can write back to system memory with minimal CPU involvement. Moreover, only a few terabytes of DDR and HBM memory can be mounted per CPU socket; capacity could go much further if PCIe ports could host memory. A new non-proprietary standard named CXL emerged in 2019 to address these issues.
What is CXL?
Compute Express Link (CXL) is a high-speed, low-latency, cache-coherent interconnect standard built on top of the ubiquitous PCIe standard. CXL extends PCIe’s capabilities, allowing the CPU and accelerators to perform loads and stores against each other’s memory. The standard reduces the CPU’s involvement in data movement and minimizes redundant transfers across the bus. CXL comprises three protocols – CXL.io, CXL.cache and CXL.mem – which can be combined in multiple ways to support different usage scenarios, such as those shown in the figure below.
Representative Use Cases of CXL
Source: CXL 2.0 Whitepaper
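As an illustration of how the three protocols combine, the CXL specification groups devices into three types according to which protocols they implement. The sketch below is purely illustrative – the dictionary and function names are ours, not any real API – but the Type 1/2/3 taxonomy itself comes from the specification:

```python
# Illustrative sketch (not a real API): the CXL spec's device taxonomy,
# expressed as the set of protocols each device type implements.
DEVICE_TYPES = {
    "Type 1": {"CXL.io", "CXL.cache"},             # e.g. SmartNICs
    "Type 2": {"CXL.io", "CXL.cache", "CXL.mem"},  # e.g. accelerators with local memory
    "Type 3": {"CXL.io", "CXL.mem"},               # memory expanders
}

def uses_protocol(device_type, protocol):
    """Return True if the given device type runs the given CXL protocol."""
    return protocol in DEVICE_TYPES[device_type]

# CXL.io carries discovery and configuration, so every type implements it.
assert all("CXL.io" in protocols for protocols in DEVICE_TYPES.values())
```

A Type 3 memory expander, for instance, needs CXL.mem for the host to address its memory but has no local cache of host memory, so it omits CXL.cache.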
CXL and Other Interconnects
The CXL Consortium has received remarkable industry support, with over 165 members covering virtually all the major manufacturers of CPUs, GPUs, memory, storage and networking equipment. CXL’s membership significantly exceeds that of other coherent interconnect standards such as CCIX and OpenCAPI. Being based on PCIe, CXL’s cable length is limited to 4 inches, which initially confined its scope to an intra-chassis interconnect. But after its recent merger with Gen-Z, CXL will be able to harness Ethernet, extending cable reach to a few tens of metres. Our report ‘CXL: Democratizing Server Disaggregation’ provides a more elaborate comparison of CXL with other coherent interconnects.
CXL Pushing Hyperscalers and HPC Expansion
As more services and business functions move to cloud-based platforms, data centres will grow bigger and more complex to serve the industry’s rising needs. Scaling server memory beyond a point becomes unattractive with DDR or HBM memory, due to physical, power and cost limitations. CXL enables PCIe-attached DRAM that can potentially scale to petabytes of memory while giving the CPU byte-level access, just like DDRx DRAM.
Furthermore, CXL removes the 15-watt power limit of DDR-mounted DRAM, opening avenues for faster, lower-latency and even liquid-cooled memory. CXL also creates a gateway for the future adoption of persistent memory (P-MEM), once P-MEM overcomes its teething issues and begins delivering cost-performance ratios that truly fall between memory and storage.
CXL Simplifies P-MEM Integration Into Server Architecture
Source: CXL 2.0 Whitepaper
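The byte-level access mentioned above is what distinguishes CXL-attached DRAM from block storage: software can simply map the region into its address space and issue ordinary loads and stores. A minimal sketch, assuming the expander is exposed on Linux as a DAX character device such as /dev/dax0.0 (that path is hypothetical and system-dependent):

```python
import mmap
import os

def map_byte_addressable(path, length):
    """Map a byte-addressable memory region (e.g. a CXL memory expander
    exposed as a DAX device) into the process address space."""
    fd = os.open(path, os.O_RDWR)
    try:
        # The returned mmap object supports ordinary byte-granularity
        # access, with no block-I/O read()/write() round trips.
        return mmap.mmap(fd, length)
    finally:
        os.close(fd)

# Hypothetical usage on a suitably configured system:
# region = map_byte_addressable("/dev/dax0.0", 1 << 30)
# region[0:4] = b"ABCD"   # a plain byte-level store
```

Once mapped, the region behaves like ordinary DRAM from the application’s point of view, which is why CXL memory can be adopted without rewriting software around a storage API.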
Despite addressing the cable-reach limit, CXL still faces challenges: latency, asymmetric coherence, the lack of peer-to-peer communication and the absence of multi-layer switch support. These are explained further in the report ‘CXL: Democratizing Server Disaggregation’.
CXL was built to simplify the interconnection and scalability of accelerators and memory expansion. It has a strong potential to dominate the server interconnect market in scenarios where memory expansion prioritizes cost and capacity over latency. CXL-based memory is likely to serve as a complement to HBM; together, they could threaten the dominance of DDR-based memory in the server landscape over the coming decade.