Die-to-Die Connectivity for High-Performance Computing 

Scott Durrant

Nov 30, 2020 / 7 min read

Moore’s Law may be slowing down, but the need for advanced high-performance computing (HPC) solutions is heating up. The semiconductor market for cloud computing and data centers is projected to rise to $177.6 billion in 2027, driven by more powerful servers, larger data storage solutions, and higher-performance AI accelerators, along with HPC applications such as natural language processing, image processing, and more. In the past, engineers addressed the need for more advanced HPC semiconductor functionality by adding cores to dies and increasing die size, but that approach no longer cuts it.

Integrating new and complex HPC functionality is pushing die sizes to – and beyond – their manufacturing limits. In addition, while huge monolithic dies offer high performance, they can drive up overall costs. To overcome cost and yield issues, and to gain scalability and flexibility, designers are splitting the system-on-a-chip (SoC) and turning to die-to-die connectivity.

Engineers are now looking to different architectures featuring die-to-die connectivity to fuel innovation in the HPC industry.

Die-to-Die Connectivity and Multi-Chip Modules

Increased workload demands and the need to move data faster have driven SoCs to become more complex, with advanced functionality and die sizes that approach the reticle limits of manufacturing equipment. To combat this, designers are partitioning SoCs into smaller dies and assembling them in multi-chip module (MCM) packages.

Disaggregating a single SoC by splitting it along core or I/O boundaries into smaller “chiplets” requires an MCM, in which the chiplets are typically mounted on an organic substrate or an interposer. The connections between chiplets are made through either heterogeneous or homogeneous die-to-die connectivity.

These disaggregated dies require ultra- and extra-short-reach (USR/XSR) SerDes links or parallel High-Bandwidth Interconnect (HBI) links to enable inter-die connectivity at high data rates. In addition to bandwidth, die-to-die connectivity must ensure reliable, power-efficient links with extremely low latency.

Heterogeneous Dies vs. Homogeneous Dies

There are two main approaches designers can take when splitting chips: homogeneous dies and heterogeneous dies, each of which yields its own advantages. The figure below shows these two converging trends for die-to-die connectivity in MCMs; the image on the left depicts die disaggregation (homogeneous dies) and the image on the right depicts package integration (heterogeneous dies).

Heterogeneous Dies vs. Homogeneous Dies | Synopsys

For instance, splitting the die into multiple homogeneous dies (same functionality in each die) reduces the size of individual dies, improving the fabrication yield and providing greater product flexibility.

On the other hand, integrating heterogeneous dies enables the use of process technologies that are cost- and performance-optimized for the implemented function. For example, analog and RF functions do not benefit from process scaling and are more efficiently implemented in older nodes.

In the cloud computing and HPC space, performance scaling dominates, which favors the homogeneous approach: as you scale circuitry within a package, you may use several identical dies to reach higher performance. In applications such as edge computing, 5G, and IoT devices, however, performance scaling matters less than reducing size and cost, so heterogeneous die applications are more popular.

In addition, MCM packaging is evolving, allowing designers to choose the options that best suit their needs. Organic substrates and wafer-level packaging enable low-cost, low-density connections between dies with a modest number of I/Os; silicon interposers are a more expensive and complex alternative, but allow for very high-density connections between dies.

How Die-to-Die Interfaces Reduce Total Costs

Parallel-based PHY IP supports die disaggregation into homogeneous dies, allowing large SoCs that approach the maximum reticle size to improve yield and die cost while increasing scalability.

As SoC dies increase in size to meet ever-increasing performance demands, manufacturing costs increase and yields decline. One mechanism for addressing the cost and complexities of growing die sizes is to build SoCs from multiple smaller, interconnected dies.

Illustrated in the diagram below, this approach can reduce die cost by yielding more usable SoCs at a given manufacturing defect rate, while also providing greater flexibility in performance scaling. On the left, the monolithic approach yields 26 good dies per wafer; using the 4x-dies-per-package approach on the right increases the yield to 53 good SoCs per wafer.

Cost comparison of Monolithic Die vs. 4X Smaller Die | Synopsys
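
To make the yield arithmetic concrete, here is a minimal sketch in Python assuming a simple Poisson yield model. The wafer size, die areas, and defect density are illustrative assumptions, not the figures behind the Synopsys diagram, and assembly yield for the multi-die package is ignored.

    import math

    WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2   # 300 mm wafer; edge loss ignored
    DEFECT_DENSITY = 0.001                      # defects per mm^2 (assumed)

    def good_dies(die_area_mm2):
        # Gross dies per wafer times Poisson yield exp(-D * A).
        gross = WAFER_AREA_MM2 // die_area_mm2
        return int(gross * math.exp(-DEFECT_DENSITY * die_area_mm2))

    monolithic_socs = good_dies(800)    # one ~800 mm^2 die per SoC
    split_socs = good_dies(200) // 4    # four ~200 mm^2 dies per SoC

    print(f"Monolithic SoCs per wafer: {monolithic_socs}")
    print(f"4-die SoCs per wafer:      {split_socs}")

With these assumed numbers, the smaller dies roughly double the usable SoCs per wafer, matching the direction (if not the exact counts) of the diagram above.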

Another way die-to-die interfaces help manage total costs is by enabling lower-cost manufacturing processes. For example, if an SoC contains I/O IP that is most efficiently manufactured on a 14nm process (much lower cost than a 7nm process) alongside CPU cores in 7nm, these dies can be manufactured on their separate processes and interconnected in a single package using die-to-die interfaces, yielding a much more affordable solution.
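
A similar back-of-the-envelope sketch shows why the mixed-process approach can be cheaper. All wafer prices, die areas, and yield fractions below are hypothetical, and packaging and test costs are ignored.

    def cost_per_good_die(wafer_cost, die_area_mm2, yield_fraction,
                          wafer_area_mm2=70_000):
        # Spread the wafer cost over the good dies it yields.
        gross = wafer_area_mm2 // die_area_mm2
        return wafer_cost / (gross * yield_fraction)

    # Monolithic: everything on an expensive 7nm wafer.
    mono = cost_per_good_die(wafer_cost=9_000, die_area_mm2=600, yield_fraction=0.55)

    # Split: 7nm CPU-core die plus a cheaper 14nm I/O die in one package.
    core = cost_per_good_die(wafer_cost=9_000, die_area_mm2=400, yield_fraction=0.70)
    io = cost_per_good_die(wafer_cost=4_000, die_area_mm2=200, yield_fraction=0.85)

    print(f"Monolithic 7nm SoC silicon:  ${mono:,.0f}")
    print(f"7nm core + 14nm I/O silicon: ${core + io:,.0f}")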

Finally, chiplets are more cost-effective because they can be re-used. Instead of a set of circuitry that only gets used in one chip, you can re-use that circuitry in multiple systems and reduce development costs.

Advantageous Applications of Die-to-Die in HPC

Choosing the right IP for die-to-die connectivity is dependent on the end use case. In high-performance computing and AI applications, large SoCs may be divided into two or more homogeneous dies. However, in networking SoCs, I/O interface IP and processing cores may be implemented on entirely separate dies.

In the case of HPC servers, die sizes typically approach 550 mm² to 800 mm², which is near the reticle limit of modern lithography machines and puts a strain on die cost. SoCs for HPC servers can benefit from design approaches that use multiple smaller homogeneous dies and leverage die-to-die interfaces with extremely low latency and bit-error rate.

Ethernet switch SoCs are tasked with moving data at rates between 12 Tbps and 25 Tbps, requiring 256 lanes of 100G SerDes interfaces. That much circuitry simply wouldn’t fit within an 800 mm² reticle. To meet the demand, designers configure chiplets in a pattern where a core die is surrounded by I/O dies, and the connections between the core and I/O dies use a die-to-die transceiver.
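
The lane math is simple to check: 256 lanes at 100 Gbps each corresponds to 25.6 Tbps of aggregate bandwidth, as the snippet below confirms.

    TARGET_TBPS = 25.6    # aggregate switch bandwidth
    LANE_GBPS = 100       # per-lane SerDes rate

    lanes = int(TARGET_TBPS * 1000 / LANE_GBPS)
    print(f"{TARGET_TBPS} Tbps / {LANE_GBPS}G per lane = {lanes} lanes")  # 256 lanes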

Lastly, AI SoC dies include intelligence processing units (IPUs) with distributed SRAMs placed in close proximity to each IPU for low-latency, short-reach access to the SRAM. Synopsys is enabling the scaling of AI systems with die-to-die interfaces that improve the five key metrics shown in the figure below.

Key Care-Abouts for Die-to-Die Interfaces | Synopsys

DesignWare® Die-to-Die PHY IP

Synopsys’ DesignWare® Portfolio offers IP for both serial and parallel interfaces in multiple manufacturing nodes, which enables designers to choose from a wide variety of solutions.

The DesignWare Die-to-Die PHY IP solutions provide high-bandwidth, short-reach connectivity. They also increase yield – the percentage of dies on a wafer that aren’t discarded during manufacturing – making them a more cost-effective IP solution. The IP supports NRZ and PAM-4 signaling at data rates from 2.5G to 112G, delivering maximum throughput per die edge for large MCM designs. The parallel-based PHY IP targets advanced 2.5D packaging, which takes advantage of much finer-pitch die-to-die connections than traditional flip-chip organic substrates.
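
As a quick illustration of the signaling difference, NRZ carries one bit per symbol while PAM-4 carries two, so PAM-4 halves the symbol (baud) rate on the channel at the same line rate; the sketch below shows the numbers for 56G and 112G lanes.

    def baud_rate(line_rate_gbps, bits_per_symbol):
        # Symbol rate on the channel, in gigabaud.
        return line_rate_gbps / bits_per_symbol

    for rate in (56, 112):
        print(f"{rate}G NRZ   (1 bit/symbol):  {baud_rate(rate, 1):.0f} GBd")
        print(f"{rate}G PAM-4 (2 bits/symbol): {baud_rate(rate, 2):.0f} GBd")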

DesignWare IP solutions are available for an array of applications, including 56G/112G USR/XSR SerDes and HBI/AIB. The 56G/112G USR/XSR SerDes leverages a low-cost organic substrate, delivering high data rates per lane (up to 112 Gbps) with low-density package routing. The DesignWare USR/XSR PHY IP is compliant with the OIF CEI-112G and CEI-56G standards for USR and XSR links. The HBI PHY IP delivers 4 Gbps-per-pin die-to-die connectivity with low latency.
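
Those per-lane and per-pin rates make the serial-versus-parallel trade-off easy to quantify. A minimal sketch, assuming an illustrative 1 Tbps die-to-die bandwidth target, shows that the SerDes approach uses a few fast lanes (suited to low-density organic-substrate routing) while HBI spreads the load across many slower pins, which relies on the finer-pitch routing of 2.5D packaging.

    import math

    TARGET_GBPS = 1_000    # 1 Tbps die-to-die bandwidth target (assumed)

    serdes_lanes = math.ceil(TARGET_GBPS / 112)   # USR/XSR SerDes at 112 Gbps/lane
    hbi_pins = math.ceil(TARGET_GBPS / 4)         # HBI parallel PHY at 4 Gbps/pin

    print(f"112G SerDes lanes needed: {serdes_lanes}")   # 9
    print(f"HBI data pins needed:     {hbi_pins}")       # 250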

Senior Product Marketing Manager Manuel Mota provides more information about Synopsys’ DesignWare Die-to-Die PHY IP solutions for SerDes-based 112G USR/XSR and parallel-based HBI interfaces in an accompanying video.

DesignWare Die-to-Die PHY IP offers a number of benefits:

  • Enables ultra- and extra-short reach connectivity in large MCM designs
  • Delivers less than 1 pJ/bit for optimal energy efficiency in hyperscale data centers (see the power sketch after this list)
  • Enables, via compact analog front-end, reliable links up to 50 millimeters for large MCM designs
  • Provides, via flexible architecture, partitioning of the core logic across multiple dies with extremely low latency and bit error rate
  • Combined with the DesignWare 112G/56G Ethernet, HBM2/2E, DDR5/4, and PCI Express 5.0 IP, provides a comprehensive solution for HPC and networking SoCs
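
For a sense of what the sub-1 pJ/bit figure means in practice, the sketch below converts energy per bit into link power; the lane count is an assumption for illustration.

    PJ_PER_BIT = 1.0    # upper bound quoted for the PHY
    LANE_GBPS = 112     # per-lane data rate
    LANES = 16          # illustrative lane count (assumed)

    watts_per_lane = PJ_PER_BIT * 1e-12 * LANE_GBPS * 1e9   # J/bit x bits/s = W
    print(f"Per 112G lane: {watts_per_lane:.3f} W")    # 0.112 W
    print(f"{LANES} lanes: {watts_per_lane * LANES:.2f} W")  # 1.79 W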

Overall, increasing requirements for performance and functionality are forcing designers to split SoCs for hyperscale data center, AI, and networking applications into smaller dies, creating the need for reliable die-to-die PHY IP solutions. Designers now have multiple PHY options to choose from, each with their own characteristics and advantages.
