From Silicon To Software

 

Integration Challenges for Multi-Billion-Gate ASICs: Part 1 – Clock Domain Crossing

By Rimpy Chugh, Sr. Product Marketing Manager, Synopsys Silicon Realization Group 

Clock Domain Crossing (CDC) Errors Can Break Your ASIC!

Driven by multiple third-party IP blocks, external interfaces, and variable-frequency power-saving functions, today’s multi-billion-gate ASICs have dozens or sometimes even hundreds of asynchronous clock domains. Conventional RTL simulation is not designed to verify the metastability effects that cause data transfer issues across asynchronous clock boundaries. Static timing analysis (STA) does not address asynchronous clock domain issues either. As a result, you cannot rely on either of these methods to catch CDC problems.

CDC is a well-documented and well-understood problem for digital designers, essentially arising from the four common clock domain crossing scenarios below. Metastability arising from jitter between asynchronous clock domains can result in functional failures if the appropriate clock synchronizers are not present. What’s more, there are more complex paths and scenarios that can be buried deep in the design, such as the re-convergence problem, where logic that combines multiple synchronized paths can see a timing mismatch due to synchronizer uncertainty. CDC errors are a class of bug that typically cannot be worked around in the final silicon, so getting it wrong can cost you a chip re-spin. An expensive mistake!

[Figure: Clock domain crossing verification]

But why is this well-understood problem becoming an increasingly difficult challenge for developers of multi-billion-gate ASICs? What can you do to scale up your CDC verification approach to meet that challenge? CDC verification is a critical signoff criterion for tape-out, but what are the key challenges in achieving this for modern ASIC development teams?

The Turnaround Time Problem

CDC-clean is a must-have for release signoff, and the time and effort needed to achieve it are non-trivial and must be fully accounted for in the product development lifecycle. Of course, that effort scales with design size. A modern multi-billion-gate ASIC, with hundreds of clocks and potentially millions of clock domain crossings, might take days of compute time and require terabytes of memory to run a full-chip, flat-level CDC analysis. Turnaround time can be a significant issue here. What can be done about that?

Firstly, as with everything in the development of large ASICs, you need to take a divide-and-conquer approach. A hierarchical, bottom-up approach allows you to run CDC analysis one block at a time, just as you would for synthesis and static timing analysis. This way, CDC analysis can shift left to earlier stages of the development flow, with an iterative approach to cleaning CDC issues as you go, block by block, rather than leaving CDC analysis until just before release, when fixing CDC bugs can be costly and disruptive. Then, as you move up to the next level of hierarchy, you can substitute the cleaned blocks with abstract CDC models that contain only the clock paths relevant to the next level of integration and abstract away all internal-only clock crossing paths. When that sub-system is CDC-clean, do it again at the next level of hierarchy, and so on. Synopsys VC SpyGlass® CDC supports an efficient hierarchical approach with the CDC signoff abstraction model (SAM) flow, which can yield 3x or greater turnaround time improvements, along with multi-factor reductions in memory requirements and no degradation of quality of results (QoR).
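To make the abstraction step concrete, here is a minimal Python sketch of the idea. It is not the SAM format or any tool's API: the `Crossing` record, the `abstract_block` helper, and all signal and clock names are hypothetical, chosen only to show how internal-only crossings drop out of the model that the next level of hierarchy sees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Crossing:
    """One clock domain crossing path inside a block (hypothetical record)."""
    src: str        # driving signal
    dst: str        # receiving signal
    src_clock: str
    dst_clock: str


def abstract_block(crossings, ports):
    """Keep only the crossings that start or end on a block port.

    Crossings between purely internal signals were already cleaned at
    block level, so the parent level does not need to see them.
    """
    ports = set(ports)
    return [c for c in crossings if c.src in ports or c.dst in ports]


# Hypothetical block with three crossings, one of them internal-only.
block = [
    Crossing("rx_data_in", "rx_sync_ff1", "clk_ext", "clk_core"),
    Crossing("fifo_wr_ptr", "fifo_rd_sync", "clk_core", "clk_bus"),
    Crossing("status_reg", "irq_out", "clk_core", "clk_bus"),
]
model = abstract_block(block, ports={"rx_data_in", "irq_out"})
for crossing in model:
    print(crossing)
```

In this toy block the FIFO pointer crossing never reaches a port, so it is absent from the abstract model. That is where the capacity saving comes from: the parent-level run only has to load the boundary paths.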

The White Noise Problem

We’ve talked about the compute time cost, but what about the human engineering cost? The next significant challenge with CDC analysis is the violation white noise problem. When you have millions of clock domain crossing paths in your design, the volume of violations can be overwhelming, and identifying the real problems can be a bit like looking for a needle in a haystack. The risk, of course, is that you miss an important violation, leading to a CDC bug escape. This is a genuinely critical concern for developers of large ASICs, and a purely manual analysis approach is unreliable. Thankfully, data science comes to the rescue. Machine learning (ML) approaches are well suited to this type of categorization problem and can be used to cluster violations into a manageable number of signatures with common root causes. When you do this, the violation analysis problem suddenly becomes much more tractable, as you can immediately see that hundreds or thousands of violations are attributable to the same issues. In some cases, you may find that more than 95% of the violations are covered by the top five clusters. Fixing those top five issues will dramatically reduce the violation white noise and make it far less likely that you will miss the remaining needles in the haystack.

VC SpyGlass CDC solves this white noise problem by using ML to perform root-cause analysis (RCA) on the violation output data. Not only does this ML RCA approach identify the violation clusters, but it also identifies the possible causes, with a debug clue and a root cause to guide the developer towards the solution. The corrective action could be an RTL change due to missing synchronization, for example, but often it will be a refinement or addition to the CDC constraints file. This process of constraints refinement quickly iterates towards a huge reduction in violations and fast identification of genuine CDC issues that need to be fixed in the design.
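The effect of clustering can be illustrated with a small Python sketch. Note that this is plain grouping by a hand-picked signature, not ML and not the tool's RCA algorithm; the violation fields (`rule`, `src_clock`, `dst_clock`) and the rule names are assumptions made for the example.

```python
from collections import Counter


def cluster_by_signature(violations):
    """Count violations that share the same (rule, source clock, dest clock)."""
    return Counter(
        (v["rule"], v["src_clock"], v["dst_clock"]) for v in violations
    )


def top_k_coverage(clusters, k=5):
    """Fraction of all violations that fall into the k largest clusters."""
    total = sum(clusters.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in clusters.most_common(k))
    return covered / total


# Hypothetical report: one dominant signature plus a few isolated violations.
report = (
    [{"rule": "missing_sync", "src_clock": "clk_a", "dst_clock": "clk_b"}] * 970
    + [{"rule": "reconvergence", "src_clock": "clk_a", "dst_clock": "clk_c"}] * 20
    + [{"rule": "glitch", "src_clock": f"clk_{i}", "dst_clock": "clk_b"}
       for i in range(10)]
)
clusters = cluster_by_signature(report)
print(len(report), "violations in", len(clusters), "clusters")
print(f"top 5 clusters cover {top_k_coverage(clusters):.1%}")
```

Reviewing a dozen clusters is a very different job from reviewing a thousand individual violations, which is the point of the approach.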

Are My Constraints Correct?

Since we are talking about constraints, this is another area requiring care. CDC analysis is a constraint-driven flow. Developers write the constraints and, of course, incorrect constraints could lead to incorrect CDC analysis, with genuine CDC violations being masked by a constraint error. A masked violation could lead to broken silicon. Whenever design workflows require an input constraint file, e.g. in the form of a Synopsys Design Constraints (SDC) file, it is critical to review these important input files for correctness. One way to double-check these constraints is to convert them into dynamic assertions that can be validated in your normal dynamic verification environments such as simulation. This approach gives you an added level of constraints validation.
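As a rough illustration of turning a constraint into a dynamic check, the Python sketch below takes a hypothetical assumption ("this configuration signal is static once setup has finished") and tests it against sampled simulation values. The trace format, the `check_static_after` helper, and the signal name are all invented for the example; in a real flow the check would more likely be an assertion running inside the simulator.

```python
def check_static_after(trace, signal, settle_time):
    """Return the times at which `signal` changes after `settle_time`.

    trace: list of (time, {signal_name: value}) samples, sorted by time.
    An empty result means the "static" assumption held for this trace.
    """
    violations = []
    previous = None
    for time, values in trace:
        current = values[signal]
        if previous is not None and time > settle_time and current != previous:
            violations.append(time)
        previous = current
    return violations


# Hypothetical trace: cfg_mode is programmed at t=5, then toggles again at t=20.
trace = [
    (0, {"cfg_mode": 0}),
    (5, {"cfg_mode": 1}),
    (10, {"cfg_mode": 1}),
    (20, {"cfg_mode": 0}),
]
print(check_static_after(trace, "cfg_mode", settle_time=10))
```

If the constraint file told the CDC tool to ignore crossings from `cfg_mode` because it is static, the change reported at time 20 shows that the assumption does not hold in this test, and the constraint needs another look.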

Are My Waivers Correct?

In addition to constraints, you may have violation waivers. Again, this is typically an input file (manually generated from analysis) to the CDC analysis workflow. Getting the waivers wrong could lead to masking of genuine CDC errors. Even when waivers are initially correct, the designer might need to make a late RTL change or a late netlist ECO, for example to address a functional or a performance issue. When this happens, waivers need to be checked, as a condition that was previously valid may no longer hold true.
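One simple way to notice that a waiver needs re-review is to record a fingerprint of the RTL it was granted against and compare it after every change. The Python sketch below assumes a hypothetical waiver record with `id`, `module`, and `fingerprint` fields; it is not a tool feature, just an illustration of the bookkeeping.

```python
import hashlib


def fingerprint(source_text):
    """Stable hash of a module's source text."""
    return hashlib.sha256(source_text.encode("utf-8")).hexdigest()


def stale_waivers(waivers, current_rtl):
    """Return IDs of waivers whose module has changed or disappeared.

    waivers: list of dicts with 'id', 'module', and 'fingerprint' keys.
    current_rtl: dict mapping module name to its current source text.
    """
    stale = []
    for waiver in waivers:
        source = current_rtl.get(waiver["module"])
        if source is None or fingerprint(source) != waiver["fingerprint"]:
            stale.append(waiver["id"])
    return stale


# Hypothetical example: the waiver was written against the original RTL.
original = {"uart_rx": "always @(posedge clk) q <= d;"}
waivers = [{"id": "W1", "module": "uart_rx",
            "fingerprint": fingerprint(original["uart_rx"])}]

after_eco = {"uart_rx": "always @(posedge clk) q <= d & en;"}
print(stale_waivers(waivers, original))
print(stale_waivers(waivers, after_eco))
```

A stale flag does not mean the waiver is wrong, only that the condition it relied on should be checked again before signoff.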

More Difficult Convergence Problems

Although most CDC issues can be analyzed statically, there are some cases where a dynamic approach is necessary, such as more complex re-convergence scenarios that occur through much deeper paths within the design. For example, in sequential re-convergence the depth is not defined and issues can exist at any depth.
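To see why re-convergence is hazardous, consider a toy Python model in which each bit of a multi-bit value crosses through its own synchronizer, and each synchronizer may take either one or two destination cycles to pass a change. The delay values and function names are assumptions for illustration; the model just enumerates every word the destination logic could sample.

```python
import itertools


def sync_outputs(old, new, delay, cycles=4):
    """Output of one synchronizer per destination cycle.

    The source bit changes from `old` to `new` at cycle 0, and the
    output switches after `delay` destination cycles.
    """
    return [new if t >= delay else old for t in range(cycles)]


def observed_words(old_bits, new_bits, delays=(1, 2)):
    """All words the destination could sample, over every combination
    of per-bit synchronizer delays."""
    seen = set()
    for combo in itertools.product(delays, repeat=len(old_bits)):
        per_bit = [
            sync_outputs(o, n, d) for o, n, d in zip(old_bits, new_bits, combo)
        ]
        seen.update(zip(*per_bit))
    return seen


# Binary count 01 -> 10: two bits change at once.
print(sorted(observed_words((0, 1), (1, 0))))
# Gray-coded step 01 -> 11: only one bit changes.
print(sorted(observed_words((0, 1), (1, 1))))
```

In the binary case the destination can briefly see 00 or 11, values the source never produced, because the two bits do not necessarily arrive in the same cycle. In the Gray-coded case only the old and new values are ever visible. The deeper the logic between the synchronizers and the point where the paths recombine, the harder this is to spot by inspection.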

A good approach for these more problematic cases is simulation with metastability injection. VC SpyGlass CDC generates a CDC database of metastability models that dynamically inject random jitter at simulation runtime based on configurable probabilities. Synopsys VCS® simulation natively reads the database at runtime. Failures can be debugged in Synopsys Verdi® automated debug, where metastability-injected signals can be probed, and a coverage report guides the user to the signals where CDC was monitored and reports how many jitter insertions happened.
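The principle behind metastability injection can be sketched in a few lines of Python. This is a toy two-flop synchronizer model, not the metastability models the tools generate: the `synchronize` function, its `inject_prob` parameter, and the choice to model an injection as "the first flop misses the new value for one cycle" are all assumptions for illustration.

```python
import random


def synchronize(samples, inject_prob=0.0, seed=0):
    """Toy two-flop synchronizer with random one-cycle jitter.

    samples: input value seen at each destination clock edge.
    Whenever the input changes, the first flop captures the old value
    instead of the new one with probability `inject_prob`, delaying
    that transition by one cycle.
    Returns (output per cycle, number of injections).
    """
    rng = random.Random(seed)
    ff1 = ff2 = prev_in = samples[0]
    out = []
    injections = 0
    for cur in samples:
        out.append(ff2)          # value visible to destination logic
        ff2 = ff1                # second flop follows the first
        if cur != prev_in and rng.random() < inject_prob:
            ff1 = prev_in        # injected: new value is missed this cycle
            injections += 1
        else:
            ff1 = cur            # normal capture
        prev_in = cur
    return out, injections


stimulus = [0, 0, 1, 1, 1, 1]
print(synchronize(stimulus, inject_prob=0.0))
print(synchronize(stimulus, inject_prob=1.0))
```

With injection disabled the rising edge appears at the output two cycles after the input; with injection forced it appears three cycles after. Running regressions with a probability between those extremes exercises both timings, which is what exposes downstream logic that silently depends on a fixed synchronizer latency. The injection count plays the role of the coverage figure mentioned above.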

Dealing with Third-Party IP Blocks

As mentioned, most multi-billion gate ASICs will be constructed from many third-party IP blocks. How do you deal with CDC in the case of these IPs? What approach should you take? Has your supplier given you CDC-clean IP?

They may have provided you with CDC constraints that can be integrated into your flow but may not have provided you with a signoff abstraction model. In this case, one solution is to create a wrapper level around the IP block and use that to generate the SAM that will flow into your hierarchical approach. You certainly don’t want to run flat CDC across all of the third-party IP blocks that make up your ASIC.

The Debug Productivity Problem

As with all verification workflows, the effectiveness of your debug tools will greatly affect your productivity. In the case of CDC debug, a combination of good schematic visualization with waveform analysis is the most effective solution. Moreover, you want this debug environment to be familiar and consistent across multiple verification platforms. The Verdi debug solution provides the consistency for cross-platform standardization that leads to high debug productivity for CDC.

How to Handle MBIST?

One final challenge to consider is how to handle MBIST insertion. Usually done towards the end of the product development lifecycle, MBIST can account for around 3% of the total logic in your final design. It’s no surprise that MBIST insertion can lead to a large uptick in CDC crossings for your design. This must not be forgotten when looking at CDC signoff for final release.

There is a pragmatic solution to this problem. Iterate your design towards CDC-clean before MBIST insertion, insert MBIST, and then iterate towards CDC-clean again after insertion. Treating the MBIST clock crossing paths separately ensures that this additional problem is contained and tractable.

A Static Solution: VC SpyGlass CDC

VC SpyGlass CDC provides a comprehensive CDC signoff methodology with scalable performance and capacity and high debug productivity.

It is one of several static analysis solutions from Synopsys that are integrated into the Synopsys Verification Continuum® platform. It works natively with other tools such as VCS simulation and delivers a consistent, high-productivity debug experience thanks to integration with the Verdi debugger.

 

In Part 2 of Integration Challenges for Multi-Billion Gate ASICs we will talk about reset domain crossing (RDC) challenges, because RDC errors can ALSO break your ASIC. Stay tuned!
