How Cameras Use AI & Neural Network Image Processing 

Ron Lowman

Jun 29, 2022

Cameras help keep us connected with each other, maintain our safety in our cars and inside buildings, and even help to enhance the quality and throughput of the goods that come out of our automated factories. With today’s digital cameras, it is typical for camera manufacturers and enthusiasts to seek out lenses that offer the highest resolution images possible. AI algorithms, however, are changing the landscape, providing a new, and less costly, way to greatly enhance the clarity, sharpness, and overall quality of images captured. This applies to every camera, from the camera you take on vacation, to the cameras used in cars, drones, surveillance, and even doorbells and ATMs.

With cameras being embedded into more electronic systems, the emergence of AI in cameras is welcome news. In this blog post, I’ll examine the various neural networks used in camera applications, the balancing act between camera lens choice and neural networks implemented, and how IP and embedded vision processors help optimize the designs.

[Image: security camera]

Neural Networks for Image Processing

Digital imaging technology replaces film with bits and bytes, with image quality measured in terms of the number of pixels. The more of these tiny colored dots in an image, the higher its resolution. Lenses in a traditional camera focus light on film to create the image. In a digital camera, an image sensor, typically either a complementary metal-oxide-semiconductor (CMOS) or charge-coupled device (CCD) sensor, converts light into electrical charges. With a CMOS image sensor, commonly used in smartphones, a color-filter layer provides color, while photodiodes convert the light into electrical signals that ultimately form the digital image. For some applications, like artificial vision and image recognition, a CMOS sensor works with an on-chip image processing circuit to produce the final image. CCD image sensors, which are popular in machine-vision systems, are transistorized light sensors on an IC that integrate the light they receive and convert the accumulated charge into electrical signals, which the camera ultimately outputs as video or still images.
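To make the color-filter step concrete, here is a minimal sketch (in Python, not from the original post) of how raw values from an RGGB Bayer mosaic might be interpolated into an RGB image. The pattern layout, kernel, and function name are illustrative assumptions rather than how any particular sensor pipeline works.

```python
import numpy as np
from scipy.ndimage import convolve

def demosaic_rggb(raw: np.ndarray) -> np.ndarray:
    """Bilinear-style demosaic of an assumed RGGB Bayer mosaic into an RGB image."""
    h, w = raw.shape
    planes = np.zeros((h, w, 3))
    mask = np.zeros((h, w, 3))

    # Scatter each photodiode sample into its color plane (RGGB layout assumed).
    planes[0::2, 0::2, 0] = raw[0::2, 0::2]; mask[0::2, 0::2, 0] = 1  # red sites
    planes[0::2, 1::2, 1] = raw[0::2, 1::2]; mask[0::2, 1::2, 1] = 1  # green sites
    planes[1::2, 0::2, 1] = raw[1::2, 0::2]; mask[1::2, 0::2, 1] = 1  # green sites
    planes[1::2, 1::2, 2] = raw[1::2, 1::2]; mask[1::2, 1::2, 2] = 1  # blue sites

    # Fill the unsampled sites by averaging the nearest sampled neighbors.
    kernel = np.array([[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]])
    rgb = np.empty_like(planes)
    for c in range(3):
        num = convolve(planes[:, :, c], kernel, mode="mirror")
        den = convolve(mask[:, :, c], kernel, mode="mirror")
        rgb[:, :, c] = np.where(mask[:, :, c] == 1, planes[:, :, c],
                                num / np.maximum(den, 1e-8))
    return rgb

# Usage: a fake 8x8 sensor readout stands in for real raw data.
raw = np.random.rand(8, 8)
print(demosaic_rggb(raw).shape)  # (8, 8, 3)
```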

The Vienna University of Technology (TU Wien) has developed an ultra-fast image sensor with a built-in neural network that can be trained to recognize certain objects in nanoseconds. Without requiring a computer to read and process the entirety of the image data, this chip, according to its creators, has potential in scientific experiments or other specialized applications. For now, though, neural networks are typically run on embedded vision processors or neural processing units (NPUs) to perform functions such as image quality improvement and object, people, or facial identification, with some advanced region-of-interest isolation.

Deep-learning neural networks are used in a variety of applications: speech recognition for smart speakers, facial recognition for mobile devices, and pedestrian detection in autonomous vehicles, to name just a few. Their value lies in their keen ability to identify patterns within data sets, often even better than humans can. A variety of neural network types are applicable to camera-based applications, helping to sharpen blurry images, deliver more vivid colors, and clean up pixel bleeding. They can also perform specific tasks, like isolating a region of interest. For example, in surveillance systems, neural networks can build feature maps that highlight the most relevant parts of an image and provide a sharp visual of a person’s face, or count pedestrians in, say, a street scene while ignoring the sky. By processing only the parts of the image that are of interest, the algorithms can reduce the amount of memory and compute resources needed, a key consideration for edge applications.
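As a rough sketch of that region-of-interest idea (the frame size, boxes, and helper below are illustrative assumptions, not code from any surveillance product), cropping the detected region before running a heavier network keeps most of the frame out of the compute path:

```python
import numpy as np

def crop_region_of_interest(image: np.ndarray, box: tuple) -> np.ndarray:
    """Return only the (x0, y0, x1, y1) region so downstream networks never
    touch the rest of the frame, saving memory and compute at the edge."""
    x0, y0, x1, y1 = box
    return image[y0:y1, x0:x1]

# Hypothetical usage: a lightweight detector (not shown) proposes pedestrian
# boxes on a 1080p street-scene frame; only those crops go to a heavier model.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
pedestrian_boxes = [(400, 600, 520, 900), (1500, 550, 1620, 880)]
crops = [crop_region_of_interest(frame, b) for b in pedestrian_boxes]

full_pixels = frame.shape[0] * frame.shape[1]
roi_pixels = sum(c.shape[0] * c.shape[1] for c in crops)
print(f"Processing {roi_pixels} of {full_pixels} pixels "
      f"({100 * roi_pixels / full_pixels:.1f}% of the frame)")
```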

A few key neural networks relevant for vision applications include:

  • Convolutional neural networks (CNNs): One of the most popular image processing algorithms, CNNs provide high-accuracy object classification while learning from relatively small datasets
  • Recurrent neural networks (RNNs): With the ability to learn very complex relationships from data sequences, RNNs model sequences and, in image processing, can be used for image classification
  • Transformers: Initially used in voice and natural language processing, transformers can provide excellent vision results via self-attention and learn more about the image than CNNs; however, they must train on much larger data sets (often in the cloud)

Each type has its pros and cons for camera-based applications. CNNs have proven to be the most production-worthy implementation over the past several years, especially at the edge. Now, proposals that leverage regions of interest using both CNNs and transformers together are obtaining more accurate results.
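To ground the CNN option above, here is a minimal PyTorch-style sketch of the kind of small classifier that runs well on embedded vision processors and NPUs; the layer widths, input size, and class count are arbitrary assumptions for illustration.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """A small convolutional classifier of the kind commonly deployed at the edge."""
    def __init__(self, num_classes: int = 10):  # class count is an arbitrary choice
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

# One 64x64 RGB frame; the sizes are illustrative, not from the article.
logits = TinyCNN()(torch.randn(1, 3, 64, 64))
print(logits.shape)  # torch.Size([1, 10])
```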

Balancing Cost and Power in the Quest for Quality

Deciding what type of neural network(s) to use, along with choosing a camera lens, is a complex proposition that calls for balancing some tradeoffs in power and cost. More expensive lenses deliver higher resolutions (today’s smartphones offer 100MP+ resolutions), providing more accuracy and the ability for machine vision systems to see more at longer distances. This is also where AI can step in. Rather than incorporating the costly 100MP+ lens, designers can opt for a lower cost, lower resolution lens and pair it with a neural network that upscales the images for greater sharpness and clarity.
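Below is a hedged sketch of that upscaling idea in PyTorch: interpolate a lower resolution frame up, then let a few convolution layers restore sharpness. The network is an SRCNN-style stand-in with made-up layer sizes, not a production super-resolution model, and it would still need to be trained on pairs of low- and high-resolution images.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyUpscaler(nn.Module):
    """SRCNN-style sketch: interpolate up, then let small conv layers restore detail.
    Layer widths and the 2x factor are illustrative assumptions."""
    def __init__(self, scale: int = 2):
        super().__init__()
        self.scale = scale
        self.refine = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv2d(32, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 3, kernel_size=5, padding=2),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Cheap upscale first, then a learned correction for sharpness.
        up = F.interpolate(x, scale_factor=self.scale, mode="bicubic", align_corners=False)
        return up + self.refine(up)

low_res = torch.randn(1, 3, 540, 960)      # e.g. a frame from a lower-cost sensor
high_res = TinyUpscaler(scale=2)(low_res)  # 1080x1920 output
print(high_res.shape)
```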

Determining your neural network and lens options calls for answering a few key questions: What kind of visual acuity does your application require? What types of upscaling are needed to enlarge and/or enhance the quality of your images? Is low power consumption a key criterion for the end application? Trial and error, as well as modeling and simulation, can lead you to answers on the optimal neural network(s) and lens for your end application.

Power estimation for neural networks, which involves understanding how the hardware running them will perform, is not an easy task because there are many configurable pieces. This is where modeling comes in. When modeling for answers, it’s important to model as close to the final product as possible. Some designers will run a broad range of custom and industry-standard algorithms to get an idea of the breadth and range of power values for their specific hardware. To better understand system performance, prototyping and emulation technologies can support the modeling exercise. For example, the Synopsys ZeBu® emulation system can be used to accurately model the power consumption of SoC systems running neural networks.
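As a first-pass illustration only (the energy-per-MAC figure and layer shapes below are assumptions; accurate numbers come from emulation flows such as ZeBu), a common starting point is to multiply a network’s multiply-accumulate count by an assumed energy cost per operation:

```python
# Rough first-pass energy model: MAC count x assumed energy per MAC.
# All numbers below are illustrative assumptions, not measured values.
ENERGY_PER_MAC_PJ = 1.0  # picojoules per multiply-accumulate (assumed)

def conv_macs(h, w, c_in, c_out, k):
    """MACs for one stride-1, 'same'-padded convolution layer."""
    return h * w * c_in * c_out * k * k

layers = [
    ("conv1", conv_macs(1080, 1920, 3, 16, 3)),
    ("conv2", conv_macs(540, 960, 16, 32, 3)),
]

total_macs = sum(m for _, m in layers)
energy_mj = total_macs * ENERGY_PER_MAC_PJ * 1e-12 * 1e3  # pJ -> mJ
print(f"~{total_macs / 1e9:.2f} GMACs per frame, "
      f"~{energy_mj:.2f} mJ per inference (assuming 1 pJ/MAC)")
```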

The complexity of neural networks is increasing, as is the amount of input data to these systems (in the form of pixels and color resolution). To accommodate these demands for higher bandwidth, efficient data transfer, and large compute and memory arrays, Synopsys offers deep expertise in AI designs along with a broad portfolio of underlying technologies.

Summary

From smartphones that can capture beautiful portraits to cars that provide a second pair of “eyes” on the road, sharp-eyed surveillance systems, and highly productive automated factories, more applications are making use of cameras. Camera-based applications, in turn, are increasingly tapping into AI algorithms to enhance image quality. A variety of neural networks has emerged to support embedded vision requirements—often generating the high image resolution that high-cost lenses have traditionally provided.

Ultimately, it’s all about finding the right balance of neural network type, camera lenses, sensor and memory interfaces, efficient processors, and other underlying technologies. The optimal mix lets us all enjoy beautiful photographs, safer cars and buildings, and an array of goods produced with high quality and efficiency.
