Edge AI Chip Design: Meeting PPA Targets for AI SoCs 

Gordon Cooper

Oct 25, 2021 / 4 min read

Thinking about artificial intelligence (AI) commonly conjures up images of robots working on a manufacturing line or in a warehouse. Or big data analytics systems that rapidly generate actionable insights, such as climate change models and healthcare forecasting. But increasingly, AI is making smaller-scale devices smarter, too.

AI at the edge, or on-device edge AI, executes AI algorithms directly on devices like drones, smart speakers, augmented reality/virtual reality (AR/VR) headsets, and smartphones. The algorithms use data collected in real time by the sensors inside the devices.

Implementing AI on edge devices calls for SoCs delivering high performance within limited power and area budgets to support real-time, data-intensive computing. It’s a challenging proposition, for sure, but one that deserves attention given the billions of such devices already out in the world and the new ones that are introduced each day. In this blog post, I’ll focus on the needs of embedded vision applications and discuss various techniques to increase bandwidth and reduce latency and power consumption for the processors that play an integral role in edge AI SoCs.

AI Everywhere

According to analysis by Tractica, AI is shifting from the domain of data center and cloud applications to the edge. The market intelligence firm projects that AI edge device shipments will increase from 161.4 million units in 2018 to 2.6 billion units by 2025. Deloitte estimates that roughly 750 million edge AI chips were sold in 2020, with consumer devices representing 90% of that share. In our smart everything era, we are indeed seeing greater intelligence in the things we use each day, from phones we can unlock with our faces, to super-resolution technology in surveillance cameras and multi-function printers, to cars that can brake, park, and even drive themselves without our intervention.

It’s no wonder that AI processing at the edge is so appealing. For one thing, it requires little investment in infrastructure compared to traditional AI processing that relies on massive cloud-based data centers. With a microprocessor, sensors, and algorithms, a device can almost instantly—and efficiently—process data and make predictions.

AI Processor Options

There are many ways to bring AI to your edge vision-processing application. While a GPU certainly has the horsepower to serve as an AI accelerator, it's not the most efficient choice for space-constrained and/or battery-operated devices. Another option is a low-cost, off-the-shelf microprocessor, but only if the application does not require a real-time response, as these processors generally don't provide the speed or latency needed. You can also develop your own processor or license one from an established vendor. The performance you'll need, along with your available resources, will guide you to the right choice. Will you be able to meet your frames-per-second target within a given power budget? Do you have the in-house skill set to do so? Does the volume of end devices you'll produce justify the expense of a dedicated chip development team? Are you also prepared to invest heavily in software resources to create the development tools needed to support your SoC?
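
To make the frames-per-second question concrete, here is a minimal back-of-envelope sizing sketch in Python. Every number in it (MACs per frame, target frame rate, processor efficiency in TOPS/W) is an illustrative assumption, not a figure for any particular network or chip.

    # Back-of-envelope sizing for an edge vision workload.
    # All numbers are illustrative assumptions, not product data.
    MACS_PER_FRAME = 5e9         # assumed multiply-accumulates per inference
    TARGET_FPS = 30              # desired real-time frame rate
    EFFICIENCY_TOPS_PER_W = 2.0  # assumed processor efficiency

    ops_per_second = MACS_PER_FRAME * 2 * TARGET_FPS   # 2 ops per MAC
    required_tops = ops_per_second / 1e12
    estimated_watts = required_tops / EFFICIENCY_TOPS_PER_W  # compute only

    print(f"Required compute: {required_tops:.2f} TOPS")
    print(f"Estimated power:  {estimated_watts:.2f} W")

If a rough estimate like this lands well above your power budget, that argues for a more efficient accelerator, a smaller network, or a lower frame rate before any silicon decisions are made.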

Edge AI chips run models trained to deliver insights for their particular end applications. Running a neural network (NN) algorithm on your architecture will give you an indication of the processor's performance. For real-time embedded systems, you'll also need to factor in parameters like power, latency, bandwidth, and area for a more realistic benchmarking picture.
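
As a sketch of what that benchmarking might look like in practice, the Python below times repeated inferences and reports mean latency, tail latency, and the implied frame rate. Here run_inference and frame are hypothetical placeholders for whatever inference call and input your toolchain actually provides.

    import time
    import statistics

    def benchmark(run_inference, frame, iters=100, warmup=10):
        # run_inference is a placeholder for your framework's inference call.
        for _ in range(warmup):          # warm caches and allocators first
            run_inference(frame)
        latencies = []
        for _ in range(iters):
            start = time.perf_counter()
            run_inference(frame)
            latencies.append(time.perf_counter() - start)
        mean_s = statistics.mean(latencies)
        p99_s = sorted(latencies)[int(0.99 * iters) - 1]  # tail latency
        return {"mean_ms": mean_s * 1e3, "p99_ms": p99_s * 1e3, "fps": 1.0 / mean_s}

Mean latency tells you about throughput, but for real-time systems the tail (p99 here) determines whether you can guarantee a frame deadline; on real hardware you would pair these timings with power and bandwidth measurements.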

Designing your own processor gives you the freedom to differentiate, provided you have the team and expertise to do so. However, you need to ensure that your processor can support new AI algorithms as they become available. For very custom requirements, application-specific instruction-set processor (ASIP) tools let you create processors tailored to a specific domain or set of applications, simplifying the design process.

Licensing processor IP shortens your time-to-market and eliminates the need to invest in a design team. Proven and tested processor IP that can be programmed and configured provides the flexibility to support new AI algorithms as they emerge.

More Bandwidth, Lower Latency and Power

As requirements for real-time AI performance increase, so do the challenges of bandwidth, latency, and power. A variety of techniques is available to increase bandwidth and reduce latency and power in embedded vision processors. For higher bandwidth, there is a trade-off between on-chip SRAM and external memories like LPDDR. Adding external LPDDR channels increases bandwidth, but also increases power due to greater data movement on the external bus, and at higher performance levels that power becomes a real issue. Increasing on-chip SRAM improves both bandwidth and power, but adding more memory to the chip to hold growing amounts of data drives up cost and area. A technique called tiling reduces memory utilization and power by bringing in segments of a sub-divided image for processing rather than the entire image at once. Compression, which minimizes the image size, also makes more efficient use of memory resources while improving latency. And frame-based partitioning of an image takes advantage of parallel processing to provide throughput and latency benefits.
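
To illustrate the tiling idea, here is a minimal NumPy sketch that walks a frame in fixed-size tiles so that only one tile's working set must be resident at a time; process_tile is a hypothetical stand-in for the per-tile vision kernel.

    import numpy as np

    def process_tiled(image, tile_h=64, tile_w=64, process_tile=lambda t: t):
        # Process an image tile by tile to bound peak local-memory use.
        h, w = image.shape[:2]
        out = np.empty_like(image)
        for y in range(0, h, tile_h):
            for x in range(0, w, tile_w):
                # Only this tile needs to sit in fast on-chip memory at once.
                tile = image[y:y + tile_h, x:x + tile_w]
                out[y:y + tile_h, x:x + tile_w] = process_tile(tile)
        return out

    frame = np.zeros((480, 640), dtype=np.uint8)  # ~300 KB full frame
    result = process_tiled(frame)                 # 80 tiles of at most 4 KB each

On a real SoC the tile size would be chosen to fit the processor's on-chip SRAM, and tiles are typically double-buffered with DMA so that data movement overlaps with compute.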

With our long history of developing flexible, high-performance embedded vision processors, Synopsys can help you meet the bandwidth, power, latency, and time-to-market challenges of your edge AI applications. Synopsys DesignWare® ARC® EV Embedded Vision Processors integrate seamlessly into SoCs and are optimized for embedded vision applications, supporting any neural network graph with real-time performance. The processor IP is fully programmable and configurable and includes an optional high-performance deep neural network (DNN) accelerator for fast, accurate execution of convolutional neural networks (CNNs) and recurrent neural networks. Also included is a robust software development toolchain with a neural network compiler.

Summary

Edge devices with capabilities like always-on facial recognition, super resolution, and object detection are becoming ubiquitous in our everyday lives. Given the form factor of these devices, the underlying technologies must deliver high performance within very tight power and area budgets. Embedded vision processor IP can help you address the technology challenges and deliver SoCs that are fueling our smart everything world.
