Fault Simulation Techniques for Growing Chip Complexity

Brian Davenport, Rimpy Chugh

Jun 01, 2022 / 6 min read

For a team working hard to create a silicon design that meets its target applications’ aggressive specs, the moment of truth often arrives during chip verification. Fault simulation, in particular, is the verification step in which silicon designers get an indication of how resilient their chip might be to faults or errors. This step, however, has become more difficult as chip complexity increases to meet the demands of mission-critical applications such as autonomous vehicles, medical equipment, and military/aerospace systems.

How can fault simulation techniques continue to address the needs of today’s complex chips? And how can they do so in the face of ambitious safety and resiliency goals?

In this blog post, we’ll discuss the challenges of fault simulation, what’s new in the technologies available, why there’s a need to consider functional safety in the equation, and how a unified verification approach presents a good step forward.

Deterring Chip Design Bugs

Fault simulation is often conducted when the design has already gone through tapeout; ideally, not many changes should happen at this point. The desire is to shift the simulation effort left so that engineers can enhance their test patterns when such changes will be most impactful. Effective fault simulation spans three primary phases in the semiconductor lifecycle:

  1. In the development phase, fault simulation should demonstrate and document that design and verification flows are robust. In other words, the technology provides assurance that implementation tools and flows won’t introduce design bugs (systematic faults) and that functional verification tools and flows won’t fail to report design bugs. This also involves ensuring the design verification methodologies are robust enough, thereby providing high confidence for zero defects.
  2. In manufacturing, fault simulation helps to reduce defective parts per million (DPPM) stemming from random faults by observing functional patterns in the design for test.
  3. Finally, in operation, fault simulation demonstrates and documents that safety mechanisms operate properly: they are triggered in the presence of faulty behavior (and not otherwise), and they’re effective in bringing the design to a safe state.

The fault simulation process looks at all potential failures in the design and determines whether they can be detected. The coverage goal against which a design should be evaluated, its diagnostic coverage (DC), is based on how safety-critical the design is. For example, an SoC for an advanced driver assistance system (ADAS), whose reliability has a direct impact on lives, will have a higher coverage goal (perhaps as high as 99%) than an SoC for a pair of wireless earbuds. But what if you can only achieve 97% coverage? How do you close the coverage gap?
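To make the 97% versus 99% question concrete, here is a minimal Python sketch of the arithmetic. It assumes coverage is a plain ratio of detected faults to total injected faults; real functional safety flows typically weight faults by failure rate and set aside faults classified as safe, so treat the function names and the simple count-based formula as illustrative assumptions rather than a standard's definition.

```python
from fractions import Fraction
import math


def diagnostic_coverage(detected: int, total: int) -> Fraction:
    """Simplified coverage: detected faults divided by all injected faults.

    Assumption: every fault counts equally (no failure-rate weighting).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return Fraction(detected, total)


def faults_to_close_gap(detected: int, total: int, target: str) -> int:
    """How many additional faults must be detected to reach the target.

    The target is passed as a decimal string such as "0.99" so the
    arithmetic stays exact instead of relying on binary floats.
    """
    required = math.ceil(Fraction(target) * total)
    return max(required - detected, 0)


# Hypothetical campaign: one million injected faults, 97% detected so far.
coverage = diagnostic_coverage(970_000, 1_000_000)
gap = faults_to_close_gap(970_000, 1_000_000, "0.99")
print(f"coverage = {float(coverage):.2%}, faults still to detect = {gap}")
```

Under these assumptions, a two-point coverage gap on a million-fault campaign means 20,000 additional faults must be detected, either through new test patterns or through design changes.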

Achieving high diagnostic coverage while determining whether potential faults in a design can be detected is a challenging task. The process requires deciding whether a myriad of testbenches or stimuli, each exercising the design under different scenarios, is enough. That is a seemingly endless proposition without a clear way to determine how valuable each testbench or stimulus is for fault coverage.

What’s more, as chip designs become larger and more complex, simulation runs take longer. Imagine having to simulate up to a few million faults to measure diagnostic coverage! Then there’s the need to ensure functional safety compliance for certain safety-critical designs. This process has been known to add up to 30% to the entire functional verification effort, of which fault simulation is an integral part.

How Fault Simulation and Functional Verification Work Hand-in-Hand

Functional verification involves testing various functionalities of a chip design to ensure that the design is operating within the target parameters and getting the right results based on these parameters. In other words, is the design behaving as intended? In fault simulation, the question becomes: if I inject an error into the design that causes the design to fail, can I detect whether this will happen or, even better, can my design still behave the way it’s supposed to? Do I have a valid test environment? If a fault occurred, will my design be resilient enough to endure?
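The "inject an error and see whether it is detected" question can be illustrated with a toy example. The Python sketch below is an assumption-laden illustration, not how any commercial fault simulator works: it models a hypothetical three-input circuit, y = (a AND b) OR c, injects stuck-at-0 and stuck-at-1 faults on every net one at a time (serial rather than concurrent fault simulation), and marks a fault as detected when any test vector makes the faulty output differ from the fault-free output.

```python
from itertools import product

# Nets of the hypothetical circuit y = (a AND b) OR c.
NETS = ("a", "b", "c", "n1", "y")


def simulate(a, b, c, fault=None):
    """Evaluate the circuit; `fault` is an optional (net, stuck_value) pair."""
    def drive(net, value):
        # If this net carries the injected fault, force its stuck value.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a = drive("a", a)
    b = drive("b", b)
    c = drive("c", c)
    n1 = drive("n1", a & b)
    return drive("y", n1 | c)


def run_campaign(tests):
    """Inject every stuck-at fault and record which ones the tests detect."""
    faults = [(net, value) for net in NETS for value in (0, 1)]
    detected = set()
    for fault in faults:
        for vector in tests:
            # Detected when the faulty output differs from the good output.
            if simulate(*vector, fault=fault) != simulate(*vector):
                detected.add(fault)
                break
    return faults, detected


# Two functional tests leave some faults undetected.
faults, detected = run_campaign([(1, 1, 0), (0, 0, 0)])
print(f"{len(detected)}/{len(faults)} faults detected")
print("undetected:", sorted(set(faults) - detected))

# Exhaustive stimulus over all eight input combinations.
faults, detected = run_campaign(list(product((0, 1), repeat=3)))
print(f"{len(detected)}/{len(faults)} faults detected")
```

With only the two functional vectors, the faults a stuck-at-1, b stuck-at-1, and c stuck-at-0 go unnoticed; the exhaustive stimulus catches all ten. Exhaustive stimulus is only feasible for a circuit this small, which is why the value of each additional test matters so much on real designs.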

Both functional verification and fault simulation have their own coverage metrics. However, in the interest of efficiency, designers typically look to see how they can leverage test mechanisms from functional verification for fault simulation. In both situations, there’s an almost limitless number of tests that can be run to exhaustively verify the design. Of course, “limitless” doesn’t really support time-to-market targets, given how labor-intensive manually writing software test libraries can be. As such, any technologies that automate functional verification and fault simulation can be a significant benefit to design productivity.

Functional safety compliance adds another twist. Safety-critical automotive applications, for instance, adhere to the ISO 26262 functional safety standard. ISO 26262 outlines a risk classification system called Automotive Safety Integrity Levels (ASILs), whose aim is to mitigate potential hazards stemming from the malfunctioning behavior of electrical and electronic (E/E) systems. ASIL D represents the strictest level and is applied to automotive applications such as ADAS. From a fault simulation standpoint, verification engineers require robust diagnostic tests to verify safety mechanisms will adhere to the requirements of ISO 26262 as well as the IEC 61508 industrial safety standard. At more critical levels, such as ASIL D, coverage demands will also be at higher levels and associated safety mechanisms should be more resilient and, hence, reliable.
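As a rough illustration of how coverage demands tighten with the ASIL, the sketch below checks measured metrics against the hardware architectural metric targets commonly cited from ISO 26262 Part 5: the single-point fault metric (SPFM) and the latent fault metric (LFM). These metric names and threshold values are not taken from this post, and the function and dictionary names are hypothetical; confirm the numbers against the standard itself before relying on them.

```python
# Commonly cited ISO 26262-5 targets (assumption: verify against the standard).
# ASIL A is omitted because no numeric target is commonly cited for it.
ASIL_TARGETS = {
    "B": {"spfm": 0.90, "lfm": 0.60},
    "C": {"spfm": 0.97, "lfm": 0.80},
    "D": {"spfm": 0.99, "lfm": 0.90},
}


def meets_asil(asil: str, spfm: float, lfm: float) -> bool:
    """Return True when both measured metrics reach the targets for `asil`."""
    targets = ASIL_TARGETS[asil]
    return spfm >= targets["spfm"] and lfm >= targets["lfm"]


# Hypothetical measured values from a fault campaign.
measured = {"spfm": 0.97, "lfm": 0.95}
for level in ("B", "C", "D"):
    verdict = "met" if meets_asil(level, **measured) else "not met"
    print(f"ASIL {level}: {verdict}")
```

In this hypothetical case, a design measuring 97% would satisfy the ASIL C target but fall short of ASIL D, which mirrors the coverage-gap question raised earlier.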

Accelerating Fault Simulation Through Automation

Ultimately, verification engineers, faced with continual pressure for faster turnaround times, need techniques to minimize the fault injection effort. A uniform platform that extends automated capabilities from functional verification to fault simulation, such as the Synopsys unified functional safety verification platform, could be the answer. The platform consists of:

  • Synopsys VC Z01X™ concurrent fault simulation solution, which injects faults throughout digital automotive devices, simulating effects to facilitate the development of robust diagnostic tests and verify that safety mechanisms comply with fault injection requirements outlined by ISO 26262 and IEC 61508. The Synopsys VC Z01X solution features various reporting mechanisms that help you understand why and where your design has low coverage. Armed with this insight, you’ll have a better sense of whether you need to write new test patterns or make a design change.
  • Synopsys VC Functional Safety Manager, a scalable, automated, and comprehensive Failure Modes and Effects Analysis (FMEA) and Failure Modes, Effects and Diagnostic Analysis (FMEDA) solution.
  • Synopsys VC Formal™ Functional Safety App, which provides comprehensive analysis and debug to quickly identify root causes using the Synopsys Verdi® automated debug system. The Synopsys Verdi debug system reviews schematics and annotates where faults have occurred, helping to make the debug process more efficient.
  • Synopsys TestMAX FuSa functional safety analysis solution, which performs analysis early in the design flow—in RTL or gate-level netlists—to help improve ISO 26262 functional safety metrics.
  • Synopsys PrimeSim™ Continuum unified workflow of next-gen circuit simulation technologies, which provides analog fault injection.
  • Synopsys ZeBu® emulation system, which delivers the speed to facilitate fault emulation.

Synopsys unified functional safety verification platform.

With the platform’s Synopsys VC Z01X component, you can reuse your functional verification testbench for fault simulation, so there’s no need for a separate logic simulation bring-up effort. In addition to accelerating the path to coverage closure, the Synopsys VC Z01X solution also provides a unified fault environment covering simulation, formal, and emulation. The functional safety verification platform provides intelligence on the effectiveness of your testbenches. For example, say you’ve got 100 test cases. After some analysis of the testbench and circuit activity, the platform might determine that only a handful of them provide valuable fault coverage, while the others are essentially wasted simulation cycles. For functional safety compliance, the integrated solution helps to verify that your target ASIL has been achieved.
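One generic way to reason about which test cases "provide valuable fault coverage" is a greedy ranking by marginal contribution. The Python sketch below is not the platform's algorithm, which this post does not describe; it is a standard greedy set-cover heuristic, and the test names and fault IDs are made up for illustration. It assumes you already know which faults each test detects.

```python
def rank_tests(coverage):
    """Order tests by how many not-yet-covered faults each one adds.

    `coverage` maps a test name to the set of fault IDs that test detects.
    Tests that add nothing new are left out of the result.
    """
    remaining = set().union(*coverage.values())
    candidates = dict(coverage)
    ranked = []
    while remaining and candidates:
        # Pick the test that detects the most still-uncovered faults.
        best = max(candidates, key=lambda name: len(candidates[name] & remaining))
        gain = candidates.pop(best) & remaining
        if not gain:
            break
        ranked.append((best, len(gain)))
        remaining -= gain
    return ranked


# Hypothetical per-test detection results from a fault campaign.
coverage = {
    "t_reset": {1, 2, 3},
    "t_burst": {3, 4, 5, 6},
    "t_idle": {2, 3},
    "t_parity": {7},
}
for name, new_faults in rank_tests(coverage):
    print(f"{name}: +{new_faults} faults")
```

Here `t_idle` never appears in the ranking because every fault it detects is already caught by another test, so under these assumptions its simulation cycles add nothing to coverage. A greedy pass does not guarantee the smallest possible test set, but it is simple and usually close.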

What’s Next in Fault Simulation Technologies?

Over the years, fault simulation technologies have evolved to provide more effective simulation, from the optimization of cells to speed up the process to more efficient simulation of memories. There’s also a move now toward different fault models. Synopsys technologies inject model-based faults, designed to mimic hardware defects. As chip geometries decrease, new models are emerging that mimic the new effects we are seeing, such as slow vias, bridging, and electromagnetic interference.

Growing chip design complexity will necessitate new ways to use available memory and to accelerate simulation time. Synopsys’ unified functional safety verification platform provides a foundation on which to build. While a critical step, fault simulation alone can’t cover everything, especially when it comes to functional safety. A complete solution that takes advantage of fault simulation along with all the other flavors of verification and debug, from architecture to synthesis to layout, will be essential for the road ahead.

Learn More: New Webinar

Dive deeper into fault simulation by attending our upcoming webinar, “Functional Verification to Fault Simulation: Considerations and Efficient Bring-Up,” at 10 a.m. PDT on June 15. Register today!