Designing Thermal Management Solutions for Multi-Die Systems

Synopsys Editorial Staff

May 22, 2023

With compute demands increasing and Moore’s law waning, integrating multiple dies into a single package to form a multi-die system offers semiconductor companies a more efficient way of meeting aggressive power, performance, area (PPA), and time-to-market requirements. By targeting specific workloads with disaggregated dies, or chiplets, multi-die systems enable designers to scale functionality and rapidly create customized silicon for a wide range of applications, including high-performance computing (HPC), automotive, and mobile.

As the broader ecosystem around multi-die systems matures with new electronic design automation (EDA) tools and industry specifications such as UCIe, semiconductor companies are finding greater opportunities to make cost trade-offs and to succeed. Despite these positive developments, designers must still contend with many architectural challenges, including thermal limitations. Indeed, chips packed closely together at high integration densities (typically 10,000 to one million I/Os per mm²) generate a considerable amount of heat that doesn’t dissipate easily or quickly.

Preventing Thermal Management Issues

Excessive heat can cause mechanical stress and warping, particularly if temperatures continually exceed optimal ranges. Although heat sinks and other cooling structures dissipate heat, these components also add to device area and cost. Planning a multi-die system through an iterative design process that includes integrated thermal management solutions is a far more efficient way of avoiding potential thermal issues.

In addition, design teams can simulate and analyze thermal behavior in diverse deployment scenarios while implementing architectural revisions to optimize chip layout and improve heat dissipation. Since each die has its own software stack, holistically modeling hardware and software is essential to implementing a design that’s fundamentally robust and thermally sound. Moreover, chip companies must monitor thermal behavior throughout the design process, including during RTL, synthesis, and place and route.

Optimizing heat dissipation and minimizing thermal challenges will ultimately evolve into more of an automated process. Harnessing new EDA tools to build better, faster, and more efficient multi-die systems will become a crucial prerequisite for designers as Dennard scaling fades into the distant past and silicon complexity continues to increase almost exponentially.

Accelerating Multi-Die Systems Adoption

AI-driven EDA and industry collaboration were two key themes highlighted during a recent SNUG Silicon Valley 2023 panel titled “Multi-Die Systems: The Biggest Disruption in Computing for Years.” As panel participants emphasized, the coherent convergence of innovations across the semiconductor industry by EDA, IP, foundry, and OSAT leaders is critical to overcoming thermal design challenges and accelerating the adoption of multi-die systems.

According to John Lee, Ansys GM and VP, packing more and more transistors into multi-die systems creates a steep multi-physics learning curve for designers running thermal simulations. Nevertheless, Lee said the semiconductor world would soon see trillion-transistor chip architectures, with AI and machine learning (ML) helping design teams simulate thermal behavior for over 10 trillion geometries meshed into more than 100 trillion elements. As Lee explained, these advanced designs will “require a village” to architect and test successfully, which is why industry collaboration and additional specifications for multi-die systems are crucial.

François Piednöel, distinguished mSoC chief architect at Mercedes-Benz Research, stated that thermal challenges offer an important opportunity for new multi-die standards and specifications, as well as further cross-industry collaboration. As Piednöel explained, these items are essential for advanced driver-assistance systems (ADAS), which typically include silicon components manufactured by different vendors. A collaborative approach, said Piednöel, will make it significantly easier for automotive design teams to leverage AI-driven EDA and holistically analyze the thermal behavior of multi-die systems and systems on chips (SoCs) throughout an entire vehicle.

Javier De La Cruz, fellow and senior director of system integration at Arm, expressed similar sentiments. According to De La Cruz, multi-die systems create new variations of traditional design issues, such as thermal limitations, that require cross-industry collaboration and partnership. De La Cruz also pointed out that while thermal behavior may be somewhat easier for AI-driven EDA to analyze on multi-die systems with 2.5D packaging, 3D stacking introduces additional temperature variations that can be challenging to simulate and model.
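
A lumped thermal-resistance model gives a rough feel for why stacked dies are harder to keep cool than dies placed side by side. The sketch below is a deliberately simplified, one-dimensional steady-state estimate, not a description of how any EDA thermal solver works. The function names, power figures, and thermal resistances are hypothetical values chosen only for illustration.

```python
from typing import List


def side_by_side_temps(powers_w: List[float], r_die_to_ambient: float,
                       ambient_c: float) -> List[float]:
    """2.5D-style estimate: each die is assumed to have its own path to ambient.

    Temperature rise = dissipated power (W) * thermal resistance (degC/W).
    """
    return [ambient_c + p * r_die_to_ambient for p in powers_w]


def stacked_temps(powers_w: List[float], r_die_to_ambient: float,
                  r_between_dies: float, ambient_c: float) -> List[float]:
    """3D-style estimate: dies are listed from the top (nearest the heat sink) down.

    Heat from a lower die must pass through every die above it, so each
    interface carries the combined power of all dies at or below it.
    """
    # The top die's path to ambient carries the power of the whole stack.
    temp = ambient_c + sum(powers_w) * r_die_to_ambient
    temps = [temp]
    for i in range(1, len(powers_w)):
        # The interface above die i carries the power of dies i..bottom.
        temp += sum(powers_w[i:]) * r_between_dies
        temps.append(temp)
    return temps


if __name__ == "__main__":
    powers = [10.0, 10.0]  # watts per die (hypothetical)
    print(side_by_side_temps(powers, r_die_to_ambient=2.0, ambient_c=25.0))
    print(stacked_temps(powers, r_die_to_ambient=2.0,
                        r_between_dies=0.5, ambient_c=25.0))
```

With these assumed numbers, the two side-by-side dies each settle at 45 °C, while the stacked pair reaches 65 °C at the top die and 70 °C at the bottom die. Real packages have lateral heat spreading, shared heat sinks, and non-uniform power maps, which is why full multi-physics simulation is needed in practice.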

Henry Sheng, Synopsys R&D group director, noted that engineers can overcome multi-die system design challenges such as thermal issues with new EDA tools, additional standards, and wider cooperation across the semiconductor world. According to Sheng, multi-die systems share a host of interdependencies that designers can’t efficiently or cost-effectively resolve in isolation. That’s why multi-die systems require a holistic approach spanning EDA, IP, foundries, and OSATs.

Sheng highlighted the UCIe specification as an example of industry collaboration that can help further accelerate the adoption of multi-die systems. Introduced in March 2022, UCIe standardizes multi-die systems by streamlining interoperability between dies on different process technologies from various suppliers. The specification currently supports 2D, 2.5D, and bridge packages, with 3D packaging support expected in future iterations. UCIe, which supports up to 32 Gbps of bandwidth per pin, is the only standard with a complete stack for die-to-die interfaces.
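
To put the per-pin figure in context, the raw bandwidth of a die-to-die link scales with the number of data lanes. The sketch below is simple arithmetic only. The 32 Gbps rate comes from the text above, while the lane counts (16 and 64) and the function names are assumptions made for illustration, and protocol overhead is ignored.

```python
def raw_link_bandwidth_gbps(lanes: int, gbps_per_pin: float = 32.0) -> float:
    """Raw one-direction bandwidth of a link in Gbps, ignoring protocol overhead."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return lanes * gbps_per_pin


def gbps_to_gigabytes_per_s(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8.0


if __name__ == "__main__":
    for lanes in (16, 64):  # hypothetical lane counts
        gbps = raw_link_bandwidth_gbps(lanes)
        print(f"{lanes} lanes: {gbps:.0f} Gbps = "
              f"{gbps_to_gigabytes_per_s(gbps):.0f} GB/s")
```

Under these assumptions, a 16-lane link carries 512 Gbps (64 GB/s) of raw bandwidth in one direction and a 64-lane link carries 2,048 Gbps (256 GB/s).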

2023: An Inflection Point for Multi-Die Systems

Companies such as AMD, Apple, Amazon Web Services, and Intel already have multi-die system designs on the market, while other key industry players are making serious inroads in the space. Design starts for multi-die systems are anticipated to grow significantly over the next several years, and 2023 is widely considered an inflection point for this versatile chip architecture.

Given their functional integration and scaling advantages, multi-die systems are rolling out across all application segments, including HPC, mobile, and automotive. According to Synopsys president and COO Sassine Ghazi, Synopsys is currently tracking more than 100 unique designs built around multi-die systems. “We’re seeing [innovation] across the chain—from architecture all the way to manufacturing—and collaborating to optimize the whole technology stack so multi-die systems can reach scale across global markets.”
