Analog Insights: Analog/Mixed-Signal Design and Verification Blog

Q&A with STE: How to speed up your Mixed Signal Verification with no accuracy loss??

Posted by Hélène Thibiéroz on March 21st, 2013

Attractive title, isn't it? :) I wanted to share with you some benchmarks we conducted with key partners using Discovery-AMS multi-core technology. This feature is available in the 2013.3 release for mixed-signal verification and lets you speed up your simulations considerably. Two key advantages of Discovery-AMS are performance and versatility. By combining the efficiency of a Fast-Spice solver with multithreading, we were able to boost performance by up to 10X.

We implemented this feature first in CustomSim/XA and then in Discovery-AMS. Using a revolutionary technique based on Newton-Raphson, CustomSim/XA can significantly speed up large partitions across multiple CPUs without loss of accuracy. The main targets of CustomSim/XA multi-core are analog circuits with large partitions and synchronous groups.

During our beta testing, XA's multi-core technology showed its value in simulations such as IREM, high-accuracy full-chip simulation, and final full-chip verification. These simulations usually take two or three days; with multi-core technology, many of our beta customers could complete them within one day.

Here is, for example, one comment from our initial beta testers within our Solution Designs Group regarding the Fast-Spice feature for analog designs (CustomSim/XA only):

“We use XA only for top-level transient analysis. We see that the simulation speed-up is proportional to the number of cores we are using; e.g., the simulation is ~8 times faster if we use 8 cores. This means that we are able to simulate the top level within one week; without multi-core, the CustomSim/XA simulation runs for ~30 days.”

In this post, ST-Ericsson demonstrates a similarly significant performance trend on a PMU (Power Management Unit) while using Discovery-AMS for mixed-signal verification. I asked Francois Ravatin to share his honest and unbiased opinion (yes, I promise :)) on this new feature.

Francois Ravatin joined Thomson Consumer Electronic Components (a joint venture between STMicroelectronics and Thomson Multimedia) in 1994, in the digital libraries and tools support team. In 1998 he joined STMicroelectronics as an analog/mixed-signal designer in the Wireless division and then the Display division. In 2007 he moved to ST-Ericsson as an AMS verification engineer in the analog & RF design flow group.

Q- Can you describe the results you have seen using Discovery-AMS multi-core technology? Did it further improve your verification flow?

Our Discovery-AMS flow is based on a spice-on-top netlist. The design under test (DUT) is spice, with leaf cells in VHDL-RN for the analog IPs and Verilog or VHDL for the digital RTL. The stimulus is defined in VHDL-RN. Figure 1 below describes one of the scenarios used in our flow.

Figure 1: spice on top flow description

So the VHDL-RN stimulus drives (a minimal sketch follows this list):

  • All ports of the DUT:
    • All analog and digital signals, with sources.
    • Interfaces: serial links (I2C, SPI), USB.
  • External component values (R, L, C, etc.).
  • Spying of analog and digital nets in the DUT to create assertions.
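
Below is a minimal, self-contained sketch of what such a VHDL-RN stimulus can look like. The entity and signal names, voltage levels, and delays are illustrative assumptions, not the actual ST-Ericsson bench; in the real flow the spied net would be bound to a node inside the DUT rather than driven by the local stub used here.

    library ieee;
    use ieee.std_logic_1164.all;

    entity stimulus_rn is
    end entity stimulus_rn;

    architecture bench of stimulus_rn is
      signal vbat     : real := 0.0;       -- analog DUT port, driven as a real-number source
      signal en_dcdc  : std_logic := '0';  -- digital DUT port (DC-DC enable)
      signal c_load   : real := 4.7e-6;    -- external component value the bench would pass down
      signal vout_spy : real := 0.0;       -- analog net that would be spied inside the DUT
    begin
      -- power-up sequence applied to the DUT ports
      stim : process
      begin
        wait for 1 us;
        vbat <= 3.6;                       -- battery present
        wait for 10 us;
        en_dcdc <= '1';                    -- start the DC-DC
        wait;
      end process;

      -- stand-in for the spied DUT net, only so that the sketch runs on its own
      dut_stub : vout_spy <= 1.8 after 50 us;

      -- assertion built on the spied net: the output must sit in its regulation window
      check : process
      begin
        wait until en_dcdc = '1';
        wait for 200 us;                   -- assumed settling budget
        assert vout_spy > 1.7 and vout_spy < 1.9
          report "DC-DC output outside its regulation window" severity error;
        wait;
      end process;
    end architecture bench;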

Early this year, Synopsys gave us the possibility to test multi-core on the Discovery-AMS real-number flow. I decided to try it on a complex PMU to increase verification coverage. Our verification process includes different scenarios in which we mix, on demand, transistor-level and digital blocks. We know that adding more and more sensitive spice analog IPs, such as DC-DCs, increases the runtime, so our first test was to put all the DC-DCs in spice and then check the runtime. I did not observe any accuracy degradation, and the gain was around 3.5x on 12 cores.

Several tests were run on this test case with different numbers of cores. The results are shown below:

After this first promising result, I decided to raise the bar by adding the digital core in spice too. This configuration now includes around 1.3 million transistors, for a total of 5 million devices.

The figure below represents CPU time versus transient time.

Figure 2: multi-core vs one core comparison

The yellow/white curve shows the CPU time using one core, while the green curve shows the CPU time using twelve cores.

How should we interpret this change of slope?

Technically speaking, a high clock frequency starts at 37 ms and propagates directly into the digital core. A new event, the DC-DC power-up, also appears at 38 ms. This additional activity can be observed immediately in the different slope values after 37 ms, and even more after 38 ms. The slope of the green curve does not change as much as that of the yellow one, which shows the efficiency of multi-core.

In the end, the runtime on a single core was close to 27 days (estimated) and less than 9 days with 12 cores. We again observed a 3X speed-up, even though the total number of transistors increased by 2x compared with the first test. Nine days is a very short runtime given the complexity and activity of the test case :).

The speed-up depends directly on two points. The first is the partitioning, and more precisely the number of devices in the large partitions. The second is the testbench itself and how it exercises those partitions, in other words whether or not there is activity in those parts. When these two conditions are met, multi-core delivers its full power, so you will probably adopt it, since this kind of verification generally has to be done just before tape-out, when every day gained is crucial. This improvement of "full spice" simulation (with the testbench still in digital) opens a new area of verification before tape-out, and is also useful for design debug after silicon validation.

Q- Besides performance, did this feature allow you to further improve your verification methodology?

Yes, in our case multi-core is complementary to VHDL-RN modelling. The goal of the models is to speed up simulations so that complex sequences can be run in an acceptable runtime, by using a linear model of the switched blocks; however, some functionality can be missed, because behavioural models may drift out of alignment with spice. So there is a need to check the start-up of the blocks in spice at IC level, in order to match the silicon measurement reference, which might not be the case when verification is done only at IP level. The first multi-core results are promising, and we will continue in this direction.
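
To give a feel for what such a simplified block can look like, here is a small, self-contained VHDL-RN sketch of a switched regulator output modelled as a piece-wise-linear ramp controlled by its enable. The entity name, generics, and numeric values are illustrative assumptions, not actual ST-Ericsson library code.

    library ieee;
    use ieee.std_logic_1164.all;

    entity dcdc_rn is
      generic (
        v_target : real := 1.8;      -- assumed regulation target (V)
        t_step   : time := 1 us;     -- discrete update period of the real-number model
        dv_step  : real := 0.02      -- volts added per update while ramping
      );
      port (
        en   : in  std_logic;
        vout : out real := 0.0
      );
    end entity dcdc_rn;

    architecture rn of dcdc_rn is
      signal v_int : real := 0.0;
    begin
      -- piece-wise-linear behaviour: ramp toward the target while enabled, drop when disabled
      process
      begin
        wait for t_step;
        if en = '1' then
          if v_int + dv_step < v_target then
            v_int <= v_int + dv_step;   -- linear ramp segment
          else
            v_int <= v_target;          -- clamp at the regulation level
          end if;
        else
          v_int <= 0.0;                 -- simple discharge when disabled
        end if;
      end process;

      vout <= v_int;
    end architecture rn;

A model of this kind keeps the start-up shape roughly right while staying purely event-driven, which is exactly why the real start-up behaviour still needs to be checked in spice at IC level, as noted above.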

Q- Have you tested Discovery-AMS multi-core technology on other circuits? What performance gain did you see?

Yes, after the PMU, this new technology was tested on other design types, for example a display design that includes a PLL. The gain was even more impressive, as we reached a 6.6X speed-up on 8 cores (from 32 hours to less than 5 hours). Such a number allows us to do several runs in a single working day, which is really a key point.

Q- A lot of EDA Fast-Spice simulators require extensive, circuit-specific settings. Can you comment on ease of use? When testing this feature, did you spend a lot of time setting simulator options?

Clearly this was another point for us to test, which we did on a highly sensitive analog design where we replaced our current CustomSim settings (local settings on the IPs) with only one command (a global setting). The results were really positive, as we reached a 2x performance improvement using 8 cores.

Merci Francois!

I hope you enjoyed this post. As usual, comments/questions/feedback are welcome.


4 Responses to “Q&A with STE: How to speed up your Mixed Signal Verification with no accuracy loss??”

  1. Martin says:

    Hello Hélène,

    Interesting article with the experts from STE. I see you use VHDL-RN (real numbers) to model the analog blocks by abstracting them into what you call a “linear model of switched blocks”. Can you be a bit more specific on what this means? Real-number modeling has nothing to do with analog; it is only discrete-event, and there is no notion of linear behaviour as such.

    I also do not understand the improvements you achieve with multi-core. A good load-balancing mechanism will distribute multi-user activities over all available cores. In efficient production simulation environments there are no cores “doing nothing”. If your improvements scale linearly over the cores, it looks like you are on an exec host whose other cores are running idle?

    Could you elaborate a bit more on what the actual Synopsys simulation kernel improvements in XA are? With VHDL-RN and multi-core techniques, it seems XA does not add much to the equation; it is more the simplification of the DUT with real-value representations and the offloading of the problem to the simulation farm.

    Regards,
    Martin

  2. Francois says:

    Hello Martin,

    “Linear model” is, in some way, a misuse of language. The goal, for IC top-level verification, is to have simplified models, such as a PWL source controlled by the inputs. The purpose here is not to reproduce the in-depth behavior of the spice cell.

    Regards,
    Francois

  3. Hélène says:

    Hello Martin,

    To further answer some of your questions: the linear speed-up can only be achieved when the other cores are not heavily loaded. This multi-core capability offers a way to finish one long simulation faster when you have the computing resources. Without available computing resources (or when they are heavily loaded), the speed-up is certainly limited. I am, however, not sure I fully understand your question. Are you associating multi-core with distributed computing? Our multi-core feature is meant to finish one long simulation faster. From your comment, you seem to be referring to the distributed computing used for large simulation tasks (for example, corners or Monte Carlo), which is a different feature.

    From an algorithm standpoint, this feature is based on a revolutionary technique built around Newton-Raphson that speeds up large partitions on multiple cores without loss of accuracy. I am afraid I can't say more without getting in trouble :)

    Best,
    Helene

  4. Martin says:

    Hello Helene and Francois,

    Many thanks for the clarifications and explanations.

    @Francois: I fully agree; we misuse the HDL languages here to model analog functions in a non-analog way. It is not piece-wise linear; real-number modeling follows a sample-and-hold regime, which results in strange effects at the A/D boundary of mixed-signal/mixed-level designs. I agree we do not want to include spice level here. But there are smarter techniques available to deal with analog abstractions, which maintain some of the signal-flow and/or continuous-time properties of the analog signals.

    @Helene: My comment was discussing multi-core, not distributed computing. Yes, the simulator should indeed prepare and resolve the analog equation system in a different manner to benefit from the multi-core and multi-threading capabilities of the OS. In practice, however, the actual load on the machine determines the gain of such an approach.

    Regards,
    Martin
