RTL Debugging via FPGA Prototyping: SoC Design Challenges 

Rob Parris

Oct 06, 2021 / 5 min read

Challenges of RTL Debug Via FPGA Prototyping

In “High Debug Productivity Is the FPGA Prototyping Game Changer: Part 1,” we talked about how the debug capabilities of a modern FPGA prototyping platform like the Synopsys HAPS®-100 prototyping system are transformational, enabling developers to achieve high debug productivity levels previously only achievable with emulation and simulation platforms.

That’s not to say that FPGA debug has the same capabilities as simulation and emulation, but debug on your FPGA prototyping system is now a realistic proposition. There is no need to spend days or weeks re-creating cut-down testcases that are small enough to replay on an emulator or simulator, where debug is generally easier and faster. After all, your FPGA prototyping validation environment may be unique and not reproducible on emulation or simulation. Even if you can tolerate the long wait times of a slower environment, it may not be feasible to fully replicate the stimulus sequences leading up to a failure, and the slower environment may not scale to the sheer number of cycles required to find deeply hidden bugs in the first place. Therefore, debug on a fast FPGA prototyping system is a necessity for fast bug resolution.

In part 2, we consider the key challenges of performing RTL debug on an FPGA prototyping system.


Challenge #1: Sample Depth of Signals Is Insufficient

When you’re running test payloads at speeds of up to 20-50 MHz for complex SoCs and up to 500 MHz for interface IP, which is realistic with a HAPS-100 prototyping system, a lot can happen in a short space of time! This is especially true when you debug with the at-speed capabilities of Synopsys ProtoCompiler® DTD (Deep Trace Debug), which captures a recycling trace buffer that can be used for waveform-based debug analysis. It all comes down to the number of cycles between the point of failure detection and the debug window of interest, and whether your trace buffer is deep enough to capture that time-slice. If the trace buffer is too shallow, you have to refine the triggers so that they fire earlier, successively moving the captured trace window back to earlier time-slices.

To meet this challenge, HAPS-100 provides 4x the performance and 4x deeper sample queues to trace signals of interest, using on-chip memory, when compared with previous generations.
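
The relationship between trace depth, trigger point, and the visible window can be illustrated with a small model. This is a minimal sketch in Python, not ProtoCompiler or HAPS code: the function names, cycle numbers, and buffer depth are hypothetical values chosen for illustration, and the buffer stores cycle numbers as a stand-in for sampled signal values.

```python
from collections import deque


def capture_window(total_cycles, trigger_cycle, depth):
    """Return (first, last) cycle held in a recycling trace buffer of
    `depth` samples when capture freezes at `trigger_cycle`."""
    buf = deque(maxlen=depth)          # oldest samples are overwritten
    for cycle in range(total_cycles):
        buf.append(cycle)              # stand-in for sampled signal values
        if cycle == trigger_cycle:
            break                      # trigger fires: freeze the buffer
    return buf[0], buf[-1]


def window_seconds(depth, clock_hz):
    """Wall-clock time spanned by `depth` samples at one sample per cycle."""
    return depth / clock_hz


# Hypothetical scenario: the root cause happens 10,000 cycles before
# the failure is detected, but the buffer only holds 4,096 samples.
bug_cycle = 90_000
failure_cycle = 100_000
depth = 4096

first, last = capture_window(200_000, failure_cycle, depth)
print(first <= bug_cycle <= last)      # window misses the root cause

# Triggering earlier moves the window back so it covers the bug.
first, last = capture_window(200_000, 92_000, depth)
print(first <= bug_cycle <= last)

# At 50 MHz, 4,096 samples span only about 82 microseconds.
print(window_seconds(depth, 50e6))
```

In this model, a deeper buffer widens the window that a single trigger can reach, which reduces the number of trigger refinements needed to walk back to the root cause.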

Challenge #2: Not Knowing How Broad to Instrument Signals When the Design Is Failing for Unknown Reasons

Needless to say, design sizes are growing exponentially. Design complexity is increasing, and debug becomes correspondingly more complex. DTD can get you a long way in debug and will probably meet the needs of most of your debugging scenarios with at-speed trace capture. If you determine that additional signals need to be traced while performing debug with the selected DTD probes, you have the option to reconfigure the probe points, perform a fast incremental rebuild of the relevant FPGA, and run again.

Sometimes, however, you don’t know what portions of the design are active, so full signal visibility is needed. This means you require functionality that dumps the state of every register in the FPGA, with little to no overhead in FPGA resources. You then want to be able to stop clocks, download all relevant register data, advance one cycle, and repeat this loop N times. FSDB processing needs to be done independently on the host, so that it does not impact runtime performance. Data expansion technology can then map the FPGA registers to the RTL source code and extrapolate all intermediate signal values. Synopsys ProtoCompiler GSV (Global Signal Visibility) addresses these challenges, so you now have full-visibility RTL debug capability.
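
The stop, download, advance, repeat loop and the host-side expansion step described above can be sketched as follows. This is a toy Python model under stated assumptions, not the GSV implementation: `ToyDesign`, its two registers, and the `expand` function are hypothetical stand-ins for a real design, its register readback, and data expansion.

```python
class ToyDesign:
    """Hypothetical stand-in for a design running on the FPGA."""

    def __init__(self):
        self.regs = {"count": 0, "acc": 0}
        self.clocks_running = True

    def stop_clocks(self):
        self.clocks_running = False

    def read_all_registers(self):
        return dict(self.regs)         # snapshot copy of every register

    def step(self):
        # Advance exactly one clock cycle of next-state logic.
        count, acc = self.regs["count"], self.regs["acc"]
        self.regs = {"count": (count + 1) % 16, "acc": (acc + count) % 256}


def capture_snapshots(design, n_cycles):
    """Stop clocks, then download registers and single-step, N times."""
    design.stop_clocks()
    snapshots = []
    for _ in range(n_cycles):
        snapshots.append(design.read_all_registers())
        design.step()
    return snapshots


def expand(snapshot):
    """Host-side expansion: derive combinational signals from register
    state, so they never need to be probed on the FPGA."""
    expanded = dict(snapshot)
    expanded["count_is_odd"] = snapshot["count"] % 2 == 1
    expanded["acc_next"] = (snapshot["acc"] + snapshot["count"]) % 256
    return expanded


design = ToyDesign()
raw = capture_snapshots(design, 4)     # register values only
waves = [expand(s) for s in raw]       # registers plus derived signals
for cycle, values in enumerate(waves):
    print(cycle, values)
```

The point of the split is that only register state crosses from the hardware to the host. Everything between registers is recomputed on the host from the RTL, which is why the on-FPGA overhead can stay small.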

Designs also have multiple clocks running at different frequencies, as well as interfaces to the real world. ProtoCompiler allows you to generate an IICE (intelligent in-circuit emulator) model and debug each clock independently.

Challenge #3: Turnaround Time Between Debug Sessions

FPGAs present a challenge of turnaround time (TAT); i.e., the time taken to re-spin the FPGA to either modify the RTL design or to adjust the instrumentation necessary for debug. Each re-spin can take multiple hours to complete, depending on whether a full rebuild is necessary or a faster incremental compile can be performed.

What’s needed are tool features that help reduce debug TAT by reducing re-spins. One example is the capability to add mux groups, which reduce the memory footprint and maximize the number of signals that can be instrumented. This means you can create and instrument multiple different mux groups using a single IICE.
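
The memory saving from mux groups can be illustrated with simple arithmetic. The Python sketch below assumes that all mux groups share one sample memory and that one group is selected at run time; that is an interpretation of the description above rather than documented ProtoCompiler behavior. The group names, signal widths, and the `MuxedProbe` class are hypothetical.

```python
def sample_memory_bits(depth, width):
    """Bits of sample storage for `depth` samples of `width` bits each."""
    return depth * width


# Hypothetical mux groups: group name -> {signal name: width in bits}.
groups = {
    "axi": {"awaddr": 32, "wdata": 64},
    "ddr_ctrl": {"cmd": 8, "rdata": 120},
    "cpu_pipe": {"pc": 32, "instr": 32},
}
group_widths = {name: sum(sigs.values()) for name, sigs in groups.items()}
depth = 8192

# Tracing every signal at once needs storage for the sum of all widths.
flat_bits = sample_memory_bits(depth, sum(group_widths.values()))
# With shared storage, only the widest group sets the memory width.
muxed_bits = sample_memory_bits(depth, max(group_widths.values()))


class MuxedProbe:
    """Selects which group feeds the shared sample memory (assumed model)."""

    def __init__(self, groups):
        self.groups = groups
        self.selected = next(iter(groups))

    def select(self, name):
        if name not in self.groups:
            raise KeyError(f"unknown mux group: {name}")
        self.selected = name           # run-time choice, no rebuild modeled

    def visible_signals(self):
        return sorted(self.groups[self.selected])


probe = MuxedProbe(groups)
probe.select("ddr_ctrl")
print(flat_bits, muxed_bits, probe.visible_signals())
```

Under these assumptions the trade-off is that only one group is visible per run, but switching groups is a selection rather than a re-spin.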

Often the debug requirement is for a single FPGA in a multi-FPGA design. In this case, you must be able to instrument that single FPGA for fast TAT, which requires software capable of incremental single-FPGA instrumentation.

Along with capabilities to address the TAT challenges described above, ProtoCompiler also offers incremental compile, so you can recompile only the portion of the design that changed, further improving TAT.

Challenge #4: Ease of Use

As an RTL designer, your goal is to verify and debug RTL code using RTL waveform analysis. This is how you operate when executing test cycles on all other verification platforms, be it simulation, formal, or emulation. The same holds true for FPGAs. You do not want to have to debug your design at the FPGA netlist level, especially when the design has been partitioned across multiple FPGAs, with debug instrumentation logic in place and FPGA partitioning structures such as time division multiplexing (TDM) managing signal bandwidth between the partitioned FPGAs. The familiar Verdi® debug GUI makes efficient RTL-level debug with waveform analysis possible.

ProtoCompiler and HAPS-100 work together to make design debug easier, and most of this is done under the hood for the user.

Figure: Design debug process chart

It is important that a single .fsdb file is generated for a partitioned multi-FPGA design, because this elevates debug to the RTL design level. Users can instrument their design using $dumpvars in Unified Compiler. In addition, instrumented signal names are back-annotated from the compiled FPGA netlist to match the original RTL.

Verdi integration provides an automated way for setting up the complete Verdi debug environment for design analysis in the ProtoCompiler flow and provides improved RTL signal correlation for data expansion.

Summary

So, performing RTL design debug on FPGA prototyping systems is a tractable problem after all. It is essential to be able to run enough pre-silicon testing cycles so that bugs which are normally only ever found post-silicon can be avoided. Scaled-out FPGA prototyping platforms can approach silicon levels of throughput when running system validation test payloads, but you need high debug productivity capabilities to use them as a practical design validation and verification platform. These game-changing debug capabilities have been present in the HAPS prototyping family for many years. With HAPS-100, things recently got even better!
