Faster Chiplet & SoC Design: Collaboration with Arm 

Michael Posner, Neel Desai

May 25, 2021

Chiplets are fast becoming a cost-effective way to deliver the high transistor counts at smaller geometries demanded by burgeoning applications like artificial intelligence (AI), cloud and edge computing, high-performance computing (HPC), and 5G infrastructure. By combining multiple silicon dies in a single package, chiplets provide another way to extend Moore’s Law while enabling product modularity and letting designers select the best process node for each function. However, meeting power, performance, and area (PPA) targets for chiplets, as well as for larger, faster, and more complex SoCs, remains a race as designers strive to meet increasingly stringent time-to-market goals.
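To make the cost argument concrete, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-A·D0), where A is die area and D0 is defect density. The 800 mm² design, the four-way split, and D0 = 0.1 defects/cm² are hypothetical values chosen for illustration, and the model deliberately ignores die-to-die interface overhead, assembly yield, and test cost, all of which matter in a real chiplet cost analysis.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Fraction of defect-free dies under the simple Poisson model Y = exp(-A * D0)."""
    die_area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-die_area_cm2 * defects_per_cm2)


def silicon_per_good_system(total_area_mm2: float, num_dies: int, defects_per_cm2: float) -> float:
    """Wafer area (mm^2) consumed per working system.

    Assumes the logic is split evenly across num_dies, each die is tested
    before assembly (known-good die), and ignores die-to-die interface
    overhead, packaging yield, and wafer-edge loss.
    """
    die_area = total_area_mm2 / num_dies
    return total_area_mm2 / poisson_yield(die_area, defects_per_cm2)


D0 = 0.1  # hypothetical defect density, defects per cm^2
monolithic = silicon_per_good_system(800.0, 1, D0)
chiplets = silicon_per_good_system(800.0, 4, D0)
print(f"monolithic: {monolithic:.0f} mm^2, 4 chiplets: {chiplets:.0f} mm^2 per good system")
```

With these assumed numbers, the four-chiplet version consumes roughly 1.8x less wafer area per working system, because a defect scraps only one small die instead of one large one.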

The Arm® Neoverse™ V1 and N2 platforms give these designers an answer, supporting the workloads, performance demands, and power-efficiency requirements of chiplets and SoCs used in high-performance and exascale computing. Thanks to a long-standing, strategic collaboration between Arm and Synopsys, designers working with these high-performance processor platforms are equipped to achieve their PPA targets while shortening time to tapeout from weeks to days. As part of the collaboration, Synopsys provides optimized design, verification, silicon IP, software security, and quality solutions, along with reference flows, for Arm-based SoCs, including the Neoverse V1 and N2 platforms.

Armed for High-Performance, Hyperscale Computing Demands

The Arm Neoverse V1 platform features a CPU microarchitecture suited to HPC-type workloads. The first Arm-designed core to support the Scalable Vector Extension (SVE), Neoverse V1 delivers a 50% improvement in instructions per cycle (IPC) over the previous-generation Neoverse N1, enabling excellent speedups for HPC and machine-learning workloads. The Arm Neoverse N2 platform, the first Neoverse platform based on Armv9, delivers a 40% IPC improvement over N1 at similar power efficiency, making it well suited to 5G and scale-out cloud deployments.
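A quick way to read the IPC figures: throughput in instructions per second is IPC times clock frequency, so an IPC gain maps directly to a speedup only when the clock stays the same. The sketch below assumes exactly that simplification; the 10% lower clock in the last case is a hypothetical, not a published N2 figure.

```python
def relative_performance(ipc_gain: float, clock_ratio: float = 1.0) -> float:
    """Per-core throughput relative to a baseline core.

    Instructions per second = IPC * clock frequency, so a fractional IPC gain
    translates one-for-one into throughput only when the clock is unchanged.
    """
    return (1.0 + ipc_gain) * clock_ratio


v1_vs_n1 = relative_performance(0.50)  # same clock as N1
n2_vs_n1 = relative_performance(0.40)  # same clock as N1
# Hypothetical: the same 40% IPC gain at a 10% lower clock
n2_lower_clock = relative_performance(0.40, clock_ratio=0.90)
print(f"V1 vs N1: {v1_vs_n1:.2f}x, N2 vs N1: {n2_vs_n1:.2f}x, "
      f"N2 at 0.9x clock: {n2_lower_clock:.2f}x")
```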

Extracting maximum performance from these Arm cores calls for an integrated, convergent RTL-to-GDSII design flow. The Synopsys design flow, tuned to the needs of the Arm Neoverse platforms, fits the bill. Central to the flow is the Synopsys Fusion Compiler digital implementation solution, whose RTL-to-GDSII architecture is built on a single database and a single data model. Different algorithms come into play at different points in the process to optimize PPA for each end application, but every tool in the flow, from synthesis through implementation to signoff, communicates through a common infrastructure. Tightly integrated synthesis provides full-flow correlated design exploration, so designers can converge quickly on their optimal SoC architecture. Together, these capabilities provide the head start needed to reach target PPA sooner than with competitive flows.

The platform also features multi-objective optimization technology, built around golden signoff engines, that optimizes all PPA targets concurrently, further contributing to the efficiency gain. With today’s processor cores running at 4 GHz and above and being up to 2x larger than those of the previous generation, the higher throughput, scalability, and productivity of a common data model are especially valuable.
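Optimizing all PPA targets concurrently means there is usually no single best implementation, only a set of trade-offs where improving one metric costs another. The sketch below filters hypothetical design points down to that Pareto front: the points that no other point beats or matches on every metric. It illustrates the general idea of multi-objective optimization only; it is not how Fusion Compiler works internally, and the DesignPoint fields and values are invented for the example.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    name: str
    power_mw: float  # lower is better
    delay_ps: float  # critical-path delay; lower means higher achievable frequency
    area_um2: float  # lower is better


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b on every objective and strictly better on at least one."""
    pa = (a.power_mw, a.delay_ps, a.area_um2)
    pb = (b.power_mw, b.delay_ps, b.area_um2)
    return all(x <= y for x, y in zip(pa, pb)) and any(x < y for x, y in zip(pa, pb))


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep only the points that no other point dominates (a point never dominates itself)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


candidates = [
    DesignPoint("A", 100.0, 250.0, 1000.0),
    DesignPoint("B", 90.0, 260.0, 1000.0),
    DesignPoint("C", 110.0, 260.0, 1050.0),  # worse than A on all three metrics
    DesignPoint("D", 95.0, 240.0, 1100.0),
]
front = pareto_front(candidates)
print([p.name for p in front])  # ['A', 'B', 'D']
```

Each surviving point is a legitimate answer; which one to pick depends on the end application's priorities, which is why the flow optimizes toward the targets jointly rather than one metric at a time.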

As an example from Arm, consider these results for a 2.1-million-instance Neoverse N1 implemented at 7nm with Fusion Compiler:

  • 8x faster time to results
  • 4% higher frequency
  • 8% smaller area
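Assuming performance tracks clock frequency, the frequency and area results above can be combined into a rough frequency-per-area figure. This is a back-of-the-envelope sketch, not a metric Arm reported.

```python
def frequency_per_area_gain(freq_gain: float, area_reduction: float) -> float:
    """Relative clock frequency per unit of silicon area versus the baseline."""
    return (1.0 + freq_gain) / (1.0 - area_reduction)


gain = frequency_per_area_gain(0.04, 0.08)  # 4% higher frequency, 8% smaller area
print(f"{gain:.3f}x frequency per unit area")  # prints 1.130x frequency per unit area
```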

Arm and Synopsys typically begin working on core-specific reference flows very early in the Arm core development cycle. The early start lets Synopsys engineers develop new tool capabilities and methodologies while the core is still in development, and the unified Synopsys architecture allows these targeted capabilities to be added seamlessly to meet performance and performance-per-watt requirements. By the time the core is released to early adopters, we have a solution, including recipes and methodologies, that helps customers achieve optimal PPA on these cores quickly. These optimized reference flows are available through our QuickStart Implementation Kits (QIKs).

The Synopsys full-flow solutions included in this collaboration are:

Mitigating Performance Bottlenecks and Design Risks with IP

Getting the highest performance out of the processor is only part of the challenge. After all, if you can’t move data on and off the chip and across the interconnect, the overall system design fails. This is where the broad Synopsys IP portfolio comes into play. On the IP side, our deep collaboration with Arm covers Arm’s CMN-700 coherent mesh interconnect and our DesignWare Interface IP. We continue to add capabilities to our IP that enable the highest-performance functionality, and we strive to ensure that our IP solutions support the latest specifications, such as die-to-die, PCI Express® 5.0 and 6.0, DDR5, and CXL. For example, the integration between CMN-700 and the Synopsys DesignWare DDR5 Controller offers end-to-end quality of service (QoS), allowing efficient data flow for optimal performance and quality of silicon.
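End-to-end QoS means the priority attached to a request is honored at each arbitration point between the requester and memory. As a generic, hypothetical illustration (not the arbitration logic of CMN-700 or the DesignWare DDR5 Controller), the sketch below grants pending requests in order of a 4-bit QoS value, similar in width to the QoS fields in AMBA protocols, with first-come, first-served ordering among equal priorities. The requester names are invented.

```python
import heapq
from itertools import count
from typing import List, Optional, Tuple


class QosArbiter:
    """Grants the pending request with the highest QoS value; FIFO among equal values.

    Hypothetical illustration only. Starvation avoidance is deliberately omitted.
    """

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = count()  # monotonically increasing tie-breaker

    def request(self, qos: int, name: str) -> None:
        if not 0 <= qos <= 15:
            raise ValueError("QoS must fit in a 4-bit field (0-15)")
        # heapq is a min-heap, so store -qos to pop the highest QoS first
        heapq.heappush(self._heap, (-qos, next(self._arrival), name))

    def grant(self) -> Optional[str]:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


arb = QosArbiter()
for qos, name in [(2, "bulk_dma"), (14, "cpu_read"), (14, "display"), (0, "logging")]:
    arb.request(qos, name)
print([arb.grant() for _ in range(4)])  # ['cpu_read', 'display', 'bulk_dma', 'logging']
```

A strict-priority scheme like this can starve low-priority traffic, which is why production memory schedulers typically combine QoS with mechanisms such as aging.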

Arm provides customers with Neoverse reference design material, along with end-to-end testing to ensure interoperability and compliance at the system level. Synopsys ensures interoperability and compliance with protocol specifications; most importantly, our IP maximizes performance. For example, our IP supports PCIe 6.0 at 64 GT/s, DDR5 at 6400 Mb/s, and die-to-die links at 112 Gb/s. Together, these efforts help mitigate performance bottlenecks and lower customer risk while making the final solution easier to use.
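The interface rates quoted above translate into raw bandwidth as lane rate × lane count ÷ 8. The sketch below assumes a PCIe 6.0 x16 link, a 64-bit DDR5 DIMM data path (two 32-bit subchannels, ECC bits excluded), and a hypothetical 16-lane die-to-die link. Usable bandwidth is lower once encoding, FLIT or protocol overhead, and DRAM refresh and bus turnaround are accounted for.

```python
def raw_bandwidth_gbytes(gbps_per_lane: float, lanes: int) -> float:
    """Raw peak bandwidth in GB/s, before encoding and protocol overhead."""
    return gbps_per_lane * lanes / 8.0  # 8 bits per byte


# PCIe 6.0: 64 GT/s per lane (PAM4), x16 link, per direction
pcie6_x16 = raw_bandwidth_gbytes(64.0, 16)
# DDR5-6400: 6.4 Gb/s per data pin, 64 data bits per DIMM (ECC excluded)
ddr5_dimm = raw_bandwidth_gbytes(6.4, 64)
# Die-to-die: 112 Gb/s per lane; the 16-lane count is a hypothetical example
d2d_16 = raw_bandwidth_gbytes(112.0, 16)

for name, bw in [("PCIe 6.0 x16", pcie6_x16), ("DDR5-6400 DIMM", ddr5_dimm), ("112G D2D x16", d2d_16)]:
    print(f"{name}: {bw:.1f} GB/s")
```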

IP blocks also play an essential role in chiplet architectures. Die-to-die interfaces provide the connections between chiplets in advanced multi-die chips. These disaggregated designs rely on ultra- and extra-short reach (USR/XSR) or high-bandwidth interconnect (HBI) links to achieve high inter-die data rates in any packaging technology, and the die-to-die links in modern processing chips often need to support very high bandwidth. Die-to-Die PHY IP for SerDes-based 112G USR/XSR and parallel-based HBI interfaces can meet these connectivity demands with high edge efficiency (a small amount of die edge consumed for a given bandwidth), while providing reliable, power-efficient links with very low latency.
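Edge efficiency can be expressed as bandwidth per millimeter of die edge, which also tells you how much shoreline a target bandwidth consumes. The sketch below uses the 112G lane rate mentioned above with a hypothetical density of 4 lanes per millimeter and a hypothetical 2 Tb/s target; neither is a DesignWare specification.

```python
import math


def edge_efficiency_gbps_per_mm(gbps_per_lane: float, lanes_per_mm: float) -> float:
    """Bandwidth delivered per millimeter of die edge (shoreline)."""
    return gbps_per_lane * lanes_per_mm


def edge_needed_mm(target_gbps: float, gbps_per_lane: float, lanes_per_mm: float) -> float:
    """Die edge consumed to carry target_gbps in one direction, rounded up to whole lanes."""
    lanes = math.ceil(target_gbps / gbps_per_lane)
    return lanes / lanes_per_mm


LANES_PER_MM = 4.0  # hypothetical lane density
print(edge_efficiency_gbps_per_mm(112.0, LANES_PER_MM))  # 448.0 (Gb/s per mm)
print(edge_needed_mm(2000.0, 112.0, LANES_PER_MM))       # 4.5 (18 lanes -> mm of edge)
```

Raising either the per-lane rate or the lane density raises edge efficiency, which is the lever that lets a multi-die design fit more inter-die bandwidth along a fixed die perimeter.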

In addition, DesignWare IP is designed with capabilities specific to Neoverse use cases, so that designers can tap into the full performance of their interfaces through the Arm subsystem. For example, the Die-to-Die IP solution for XSR links implements an optimized interface to CMN-700, enabling a very low-latency CXL/CCIX link between two Neoverse interconnects on separate dies.

Delivering the Bandwidth and Performance for Today’s Demands

Self-driving cars, social media platforms, cryptocurrencies, and automated factories are just a few examples of the compute-intensive applications placing greater demands on SoCs to handle heavier workloads and move data faster. In this environment, chiplets have emerged as an answer. Designing chiplets and other complex SoCs to meet aggressive PPA and time-to-market targets calls for processor platforms, a design and verification flow, and IP that are specifically tuned to these demands. Through our close collaboration, Synopsys and Arm provide the technologies to accomplish this. With Synopsys design, verification, and IP solutions optimized for Arm processor cores, you get the bandwidth and performance needed for hyperscale computing applications to thrive.

Continue Reading