AI Hardware Accelerates Innovation 

Stelios Diamantidis

Jan 06, 2021 / 3 min read

While software has long been a driver of innovation in a variety of applications, hardware is fast becoming a core enabler in the artificial intelligence (AI) world. Facial recognition, self-driving cars, virtual assistance, and many others are relying on AI hardware, whose market is projected to reach $65 billion by 2025.

Why has hardware become such a dominant force in the AI space? It comes down to neural networks, which process massive amounts of data and train themselves iteratively through highly parallel computation. This design-by-optimization paradigm is a poor match for traditional architectures built to execute sequential software.
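To make that workload concrete, here is a minimal sketch (my own illustration in Python/NumPy, not from the article) of the kind of iterative, matrix-heavy training loop described above. The dense matrix multiplies that dominate each step are exactly what accelerator cores execute in parallel.

```python
# Minimal sketch: a tiny one-layer network trained with mini-batch gradient
# descent. Every step is dominated by dense matrix math, the workload that
# AI accelerators parallelize across thousands of cores.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 256))                 # synthetic input data
y = (X @ rng.standard_normal(256) > 0).astype(float)   # synthetic labels

W = np.zeros(256)
lr, batch = 0.1, 512
for step in range(200):                       # iterative training loop
    idx = rng.integers(0, len(X), size=batch) # sample a mini-batch
    xb, yb = X[idx], y[idx]
    pred = 1.0 / (1.0 + np.exp(-(xb @ W)))    # forward pass: matrix multiply
    grad = xb.T @ (pred - yb) / batch         # backward pass: matrix multiply
    W -= lr * grad                            # parameter update
```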

In this environment marked by voluminous amounts of data, hardware systems such as AI accelerators take center stage. An AI accelerator is a high-performance parallel computation machine designed specifically to process AI workloads, such as neural networks, efficiently. AI accelerators contribute a number of benefits:

  • Significantly better energy efficiency compared to general-purpose compute machines
  • Low latency of computation to enable real-time applications
  • Scalability, with performance gains that can scale linearly with the number of cores used
  • Heterogeneous architecture, which allows a given system to accommodate multiple specialized processors for specific tasks

AI Accelerators Support Data Centers and the Edge

AI accelerators operate in two key realms: data centers and the edge. Today’s data centers—particularly hyperscale data centers that may support thousands of physical servers and millions of virtual machines—demand massively scalable compute architectures. This has prompted some in the chip industry to go big in the name of accelerating AI workloads. For example, Cerebras has created the Wafer-Scale Engine (WSE) for its Cerebras CS-1 deep-learning system. At 46,225 mm² with 1.2 trillion transistors and 400,000 AI-optimized cores, the WSE is the largest chip built to date. By providing more compute, memory, and communication bandwidth, the WSE can support AI research at speeds and scale that were previously impossible.

At the other end of the spectrum is the edge, where real estate for hardware is limited and energy efficiency is essential. Here, edge SoCs with AI accelerator IP integrated inside can quickly deliver the intelligence needed to support applications such as interactive programs that run on smartphones or robotics in automated factories. Given the variety of applications where intelligence resides at the edge, the AI accelerators that support them must be optimized for characteristics such as real-time computational latency, ultra-high energy efficiency, fail-safe operation, and high reliability.

Not every AI application needs a chip as large as the WSE. Other types of hardware AI accelerators include GPUs, tensor processing units (TPUs), and coarse-grained reconfigurable arrays (CGRAs).

Each of these types of chips can be combined by the tens or the hundreds to form larger systems that process large neural networks. For example, Google’s TPUs can be combined in pod configurations that deliver more than 100 petaFLOPS of processing power for training neural network models. Megatron, from the Applied Deep Learning Research team at NVIDIA, is an 8.3-billion-parameter transformer language model for natural language processing, trained with 8-way model parallelism and 64-way data parallelism. Running models at this scale benefits from GPUs such as the NVIDIA A100, which delivers 312 teraFLOPS of FP16 compute power. Another emerging hardware type is the CGRA, which offers an attractive trade-off between performance/energy efficiency and the flexibility to program different networks.
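As a rough, back-of-envelope illustration (my own calculation, assuming every GPU runs at its published peak, which real training never achieves), the parallelism figures above combine as follows:

```python
# Back-of-envelope sketch combining the figures quoted above.
# Assumes ideal scaling and peak utilization -- an upper bound only.
model_parallel = 8                     # 8-way model parallelism (Megatron example)
data_parallel = 64                     # 64-way data parallelism
gpus = model_parallel * data_parallel  # 512 GPUs in total

peak_fp16_tflops_per_gpu = 312         # NVIDIA A100 peak FP16 throughput
aggregate_pflops = gpus * peak_fp16_tflops_per_gpu / 1_000

print(f"{gpus} GPUs -> ~{aggregate_pflops:.0f} petaFLOPS peak FP16")
# 512 GPUs -> ~160 petaFLOPS peak FP16
```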

In this discussion of AI hardware, one cannot neglect the software stack that enables system-level performance and ensures that the AI hardware is fully utilized. Open-source platforms such as TensorFlow provide tools, libraries, and other resources for developers to easily build and deploy machine learning applications. Machine learning compilers, such as Facebook’s Glow, are emerging to bridge the gap between these high-level frameworks and the many different AI accelerators underneath.
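For a sense of what that developer-facing layer looks like, here is a minimal sketch using the TensorFlow/Keras API (a generic example of my own, not tied to any particular accelerator). The same high-level code can be dispatched to a CPU, GPU, or TPU, with the framework and compiler stack handling the mapping onto the hardware.

```python
# Minimal TensorFlow/Keras sketch: define, train, and run a tiny model.
# The framework decides how to place the work on available accelerators.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Synthetic data just to make the example runnable end to end.
x = tf.random.normal((1024, 32))
y = tf.cast(tf.reduce_sum(x, axis=1) > 0, tf.float32)
model.fit(x, y, batch_size=64, epochs=3, verbose=0)

print("Accelerators visible to TensorFlow:", tf.config.list_physical_devices("GPU"))
```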

Comprehensive AI Design Tools

While hardware has become a critical component in AI applications, designing these components continues to be uniquely challenging, especially as the cloud and the edge push the power, performance, and area (PPA) limits of current silicon technologies. For data centers, hardware designs are marked by multiple levels of physical hierarchy, globally asynchronous and locally synchronous architectures, massive dimensions, and fragmented floorplans. At the edge, AI designs must handle hundreds of design corners, ultra-low power requirements, heterogeneous integration, and extreme variability.

By offering the industry’s most comprehensive AI design portfolio, Synopsys can help AI hardware designers overcome some of these challenges. Our products run the gamut from IP for edge devices, to the ZeBu® Server 4 emulation system for fast bring-up of complex workloads, to the Fusion Design Platform for full-flow, AI-enhanced quality-of-results (QoR) and time-to-results (TTR) in IC design. Synopsys has also introduced DSO.ai™ (Design Space Optimization AI), the industry’s first autonomous AI application for chip design. DSO.ai searches for optimization targets in the very large solution spaces of chip design, automating less consequential decisions in design workflows and thus substantially accelerating the design of specialized AI accelerators.

As AI applications become more deeply integrated in our lives, hardware such as AI accelerators will continue to be critical to enable the real-time responses that make intelligent devices and systems valuable.
