Silicon-Proven IP for SoC Design in HPC Applications

Scott Durrant

Aug 16, 2021 / 8 min read

The use of silicon-proven IP is a well-established practice in the world of chip design. This timesaving, quality-enhancing approach to developing complex systems-on-chip (SoCs) is being adopted in greater volume and across a growing range of applications. Particularly in high-growth, dynamic market segments, IP-based design has proven itself as a means to significantly reduce development time, ensure better quality of results, and free engineering teams to focus on their unique value-add and differentiation.

Nowhere is this truer than in the fast-paced high-performance computing (HPC) space. The scope of HPC has expanded over the past several years as chip developers find innovative and more efficient ways to pack additional horsepower into smaller, more energy-efficient chips and more interconnected silicon architectures. Once the domain of large-scale, super high-end computing use cases, HPC-enabled applications now run the gamut of market sectors, from enterprise to consumer to automotive and even into edge-based applications.

At the core of the HPC market’s growth is the massive and relentless increase in data consumption. While hyperscalers building large data centers to manage this huge increase in digital traffic are the most visible manifestation of this, it’s a trend that permeates all areas of our hyper-connected society. We see tremendous data traffic growth from online collaboration, smartphones and other IoT devices, video streaming, augmented and virtual reality (AR/VR) applications, and connected artificial intelligence (AI) devices.

Chip developers of all types—including the companies directly providing the data center resources—are driving new chip architectures for these data-intensive needs. The most obvious needs are in the traditional critical areas of compute, storage, and networking, which must scale to unprecedented levels. On top of that, data consumption is pushing the need for innovative approaches in other emerging areas. For example, the expansion of cloud services to the edge of the network requires new compute and storage models. The same goes for the broad deployment of AI for processing and extracting insights from extreme quantities of data, a trend similarly pushing the envelope of chip performance, capacity, and interconnect. In addition, as machine-to-machine communication, streaming video, AR and VR, and other applications generate increasing amounts of data, the entire cloud infrastructure must be re-thought.

All of this is driving a new generation of approaches to simultaneously minimize data movement and maximize the speed at which data is transferred from one location to another, whether that data transfer is across long distances or from one chip to another within a server.

In all cases, SoC and system developers are looking to proven, scalable, and quick-to-integrate IP to deliver the key attributes they need to manage the data processing, networking, storage, and security demands inherent in state-of-the-art HPC applications. Performance is the underlying must-have, and designers building SoCs for HPC applications need a combination of high-performance, low-latency IP solutions to deliver the total system throughput that brings the benefits of HPC to many different application areas.

Let’s look at some of the key functions that IP can help enable in the world of HPC.


Compute Processing—Especially in Data Servers

As data volumes increase, there is an insatiable need for faster server interfaces to move data within and between servers. Any impediment or less-than-optimal connectivity creates unacceptable latency with numerous downstream consequences for users. Minimizing data movement as much as possible, and providing high-bandwidth, low-latency interfaces for moving data when it is required, are key to maximizing performance and minimizing both latency and power consumption.

Some of the important areas where IP can be leveraged to meet these requirements include:

  • Implementing DDR5 interfaces that are moving to 6400 Mbps
  • Supporting the evolution of PCI Express (PCIe) interfaces as they move from PCIe 4.0 at 16 GT/s to PCIe 5.0 at 32 GT/s and PCIe 6.0 at 64 GT/s
  • Streamlining the development of NVMe SSDs moving from PCIe 3.0 to PCIe 5.0 for a 4x bandwidth improvement
  • Utilizing Compute Express Link (CXL) to reduce the amount of data movement required in a system by allowing multiple processors/accelerators to share data and memory efficiently
  • Leveraging new high-speed SerDes technology at 56 Gbps and 112 Gbps that uses PAM4 encoding, along with supporting protocols, to enable faster interfaces between devices, including die, chips, accelerators, and backplanes

In addition to the interfaces listed above, various types of memory can meet the capacity, power, and performance requirements of different use cases. Where memory capacity is the primary concern, DDR5 is the logical memory choice. Where memory bandwidth is the most important factor, HBM2E provides high-speed access to data in memory.
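
To make these trade-offs concrete, here is a rough back-of-the-envelope sketch in Python of how per-lane PCIe bandwidth scales across generations and how DDR5 compares with HBM2E on raw bandwidth. The rates are nominal figures from the published standards; real designs lose some throughput to protocol overhead, so treat the results as upper bounds.

```python
# Nominal per-lane signaling rates (GT/s) by PCIe generation.
PCIE_GEN_GTS = {"3.0": 8, "4.0": 16, "5.0": 32, "6.0": 64}

def pcie_lane_gbs(gen: str) -> float:
    """Approximate usable GB/s per lane for a PCIe generation."""
    # Gens 3.0-5.0 use 128b/130b encoding; 6.0 moves to PAM4 with
    # FLIT-based framing, approximated here as ~1b/1b efficiency.
    efficiency = 128 / 130 if gen != "6.0" else 1.0
    return PCIE_GEN_GTS[gen] * efficiency / 8  # 8 bits per byte

for gen in PCIE_GEN_GTS:
    print(f"PCIe {gen}: ~{pcie_lane_gbs(gen):.2f} GB/s per lane, "
          f"~{16 * pcie_lane_gbs(gen):.0f} GB/s for a x16 link")

# Memory: capacity-oriented DDR5 vs. bandwidth-oriented HBM2E.
ddr5_channel = 6400e6 * 64 / 8 / 1e9    # 6400 MT/s x 64-bit channel
hbm2e_stack = 3.2e9 * 1024 / 8 / 1e9    # 3.2 Gb/s/pin x 1024-bit stack
print(f"DDR5-6400 channel: ~{ddr5_channel:.0f} GB/s")
print(f"HBM2E stack:       ~{hbm2e_stack:.0f} GB/s")
```

The numbers show why the pairing works: DDR5 delivers large capacity at roughly 51 GB/s per channel, while a single HBM2E stack delivers roughly 410 GB/s when bandwidth, not capacity, is the constraint.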

Networking Infrastructure

The growth in data creation and consumption is generating demand for faster network speeds. Ethernet has become the de facto standard for server-to-server communication in modern HPC applications, notably data centers, with Ethernet frames traveling between servers and switches over a variety of channels and media types. Integrating the MAC and PHY in an Ethernet subsystem reduces design turnaround time and offers differentiated performance. Many data centers are increasing network interface speeds from the server to the top-of-rack (ToR) switch from 25 GbE to 100 GbE, while 400-GbE infrastructure is being installed from the ToR switch to leaf and spine switches, and between data center facilities.

Leading Ethernet switch vendors are already developing 800-Gbps switches based on 112G SerDes, and 1.6-Tbps Ethernet will likely be introduced within the next few years as data volume continues to increase. Infrastructure switches supporting 400-Gbps Ethernet ports can be implemented with 56G x 8 or 112G x 4 SerDes electrical interfaces.
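
As a rough illustration, the lane arithmetic behind these port speeds can be sketched in a few lines of Python. The nominal PAM4 lane rates (53.125 and 106.25 Gb/s for "56G-class" and "112G-class" SerDes) come from the IEEE 802.3 family of specifications; the headroom above the port rate absorbs FEC and framing overhead.

```python
import math

# Nominal PAM4 electrical lane rates in Gb/s.
LANE_RATES = {"56G-class": 53.125, "112G-class": 106.25}

def lanes_needed(port_gbps: int, lane_class: str) -> int:
    """Minimum SerDes lanes to carry a given Ethernet port speed."""
    return math.ceil(port_gbps / LANE_RATES[lane_class])

for port, cls in [(400, "56G-class"), (400, "112G-class"),
                  (800, "112G-class")]:
    print(f"{port}GbE over {cls} SerDes: {lanes_needed(port, cls)} lanes")
# 400GbE over 56G-class SerDes: 8 lanes
# 400GbE over 112G-class SerDes: 4 lanes
# 800GbE over 112G-class SerDes: 8 lanes
```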

All of this is driving the need for reliable IP to improve designer efficiency. To that end, Synopsys provides a complete 200G/400G/800G Ethernet controller and PHY IP solution that includes the PCS, PMD, PMA, and auto-negotiation functionality.

Security

In addition to faster interfaces and more efficient memories, protecting data is critical for cloud computing. To protect the confidentiality and integrity of data, and its availability to authorized users, standards organizations are incorporating security requirements into data interface protocols. Implementing the requisite security algorithms in these high-speed interfaces requires high-quality cryptography IP for data encryption and decryption, security protocol accelerator IP to implement high-speed secure protocols, and trusted execution environments to provide a root of trust and secure key management.
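
As a software analogue of what inline cryptography IP does at line rate, the sketch below uses AES-256-GCM, a common authenticated-encryption algorithm in interface security standards such as MACsec, via Python's third-party cryptography package. Hardware security IP performs the same class of transform in silicon at interface speeds; the key handling here merely stands in for what a root of trust and key-management hardware would provide.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# In hardware, the key would come from secure key management
# anchored in a root of trust, never from application code.
key = AESGCM.generate_key(bit_length=256)
aead = AESGCM(key)

nonce = os.urandom(12)          # must be unique per protected frame
frame = b"payload crossing the link"
header = b"frame header"        # authenticated but sent in the clear

ciphertext = aead.encrypt(nonce, frame, header)
assert aead.decrypt(nonce, ciphertext, header) == frame
```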

Synopsys offers a broad portfolio of highly integrated security IP solutions that use a common set of standards-based building blocks and security concepts to enable the most efficient silicon design and highest levels of security in HPC applications. These security IP solutions help prevent a wide range of evolving threats to connected devices, such as theft, tampering, side-channel attacks, malware, and data breaches.

Storage

Recent advances in the storage industry are facilitating the management of growing amounts of data, as well as the use of accelerators to process data. These advances include the use of computational storage, persistent memory, cache coherent interfaces to persistent storage, and next-generation NVMe interfaces for higher data transfer speeds.

NVMe-based solid-state drives (SSDs) can utilize a PCIe interface to connect directly to the server CPU and function as a cache accelerator, allowing frequently accessed “hot” data to be cached and served extremely quickly. High-performance PCIe-based NVMe SSDs with highly efficient input/output operation and low read latency improve server efficiency and avoid the need to access data through an external storage device.

The use of cache coherent interfaces in storage applications improves performance and reduces data movement by enabling multiple devices to maintain cache coherency across shared memory. CXL is one such interface. Built on PCIe 5.0, CXL 2.0 provides data transfer at 32 GT/s for cache, memory, and I/O devices. NVMe storage devices are adopting PCIe 5.0 interfaces to increase SSD throughput to 4 GB/s per PCIe lane. This is a 4x speed increase from PCIe 3.0, which has typically been implemented in x86 servers to date.
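
Extending the per-lane arithmetic from earlier to whole drives makes the 4x claim concrete. This small sketch assumes the typical x4 link width of an NVMe SSD and 128b/130b encoding.

```python
def ssd_link_gbs(gts: float, lanes: int = 4) -> float:
    """Approximate link bandwidth in GB/s for an x4 NVMe SSD."""
    return gts * (128 / 130) / 8 * lanes

print(f"PCIe 3.0 x4 SSD link: ~{ssd_link_gbs(8):.1f} GB/s")   # ~3.9
print(f"PCIe 5.0 x4 SSD link: ~{ssd_link_gbs(32):.1f} GB/s")  # ~15.8
```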

The Synopsys DesignWare Interface IP portfolio supports high-speed protocols such as PCIe, USB, and DDR and is optimized to help designers meet their high-throughput, low-power, and low-latency connectivity requirements for cloud computing storage applications.

Visual Computing

Cloud applications have evolved to include more visual content. As a result, support for visual computing, including streaming video for business applications (e.g., online collaboration) and entertainment (e.g., movies and AR/VR), and image analysis (e.g., ADAS, security, and other applications that require real-time image recognition), has emerged as an additional function of cloud infrastructure. The proliferation of visual computing as a cloud service has led to the integration of high-performance GPUs into cloud servers, connected to the host CPU infrastructure via high-speed accelerator interfaces.

Embedded vision applications typically introduce unique power and cost challenges compared to data center use cases, and they need a degree of flexibility to meet the demands of rapidly evolving markets, use cases, and standards.

Synopsys offers a suite of fully programmable and configurable IP cores that are optimized for embedded vision applications, combining the flexibility of software solutions with the low cost and low power consumption of hardware. The embedded vision processors integrate an optional high-performance deep neural network (DNN) accelerator for fast, accurate execution of convolutional neural networks (CNNs) or recurrent neural networks (RNNs). The vision processors can easily be integrated into an SoC platform, work with any host processor, and operate in parallel with the host.

Edge Infrastructure

A growing trend in HPC is to move compute closer to the data and where it is collected. This not only improves efficiency and performance, but provides security benefits as well. To this end, cloud service providers are partnering with telecommunications companies to deliver cloud services on multi-access edge compute (MEC) platforms. However, deployment of cloud services in the edge infrastructure requires that the equipment on which the cloud services run be tolerant of the edge environment. The edge does not necessarily have the same physical space, environmental controls, or power availability as a typical cloud data center. As a result, the lower a service’s allowable latency, the closer to the edge it must be deployed, and the lower its allowable power consumption is likely to be.

Synopsys IP products for the critical compute, networking, storage, and memory functions described above are optimized for the performance, low-latency, and low-power requirements of edge applications.

Machine Learning and AI Accelerators

AI accelerators designed for edge computing are intended to minimize latency and power consumption while providing the high compute capability that artificial intelligence requires, including massive matrix multiplication and tensor operations. Performance per watt (TOPS/W) is a metric often used to benchmark these solutions, which utilize a variety of memory configurations, including on-chip SRAM, LPDDR, DDR, and HBM, to support the processing elements.
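
The metric itself is simple division, as the sketch below shows. The accelerator figures here are hypothetical placeholders, not measured results, chosen only to illustrate why edge parts are compared on efficiency rather than peak throughput.

```python
# TOPS/W = peak throughput (tera-operations/s) / power (W).
# Both datapoints below are hypothetical, for illustration only.
accelerators = {
    "edge NPU (hypothetical)":         {"tops": 8.0,   "watts": 2.0},
    "data-center card (hypothetical)": {"tops": 250.0, "watts": 300.0},
}

for name, spec in accelerators.items():
    print(f"{name}: {spec['tops'] / spec['watts']:.2f} TOPS/W")
```

The edge part wins on efficiency (4 TOPS/W vs. under 1 TOPS/W) even though the data-center card dwarfs it on absolute throughput, which is exactly the trade-off the metric is designed to expose.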

PCIe and CXL provide high-bandwidth channels for connecting accelerators to application processors to offload compute-intensive operations. For both edge and cloud systems, processing loads often require more performance than a single accelerator die can deliver, requiring multiple accelerators to be installed. High-speed chip-to-chip interfaces, such as 56-Gbps/112-Gbps SerDes and the HBI parallel interface, enable AI accelerator solutions to scale to address these needs.

The Open Compute Project (OCP) organization is driving an effort to standardize chip-to-chip interconnects, which could simplify and improve interoperability of the accelerator scaling interface. The intent is to provide a common interface for heterogeneous “chiplets” to enable development of SoCs using common functional building blocks. If adopted by the industry, this effort will streamline the process of developing accelerator SoCs by reducing development time and associated costs.

IP-Enabled Design Meets the Demands of HPC Applications

Meeting the demands of HPC applications requires a robust, proven offering of processing, interface, and foundation IP optimized for high performance, low latency, and low power. Synopsys provides a comprehensive portfolio of high-quality, silicon-proven IP that enables designers to develop SoCs supporting today’s and tomorrow’s HPC applications.
