Posted by Hélène Thibiéroz on March 21, 2013
Attractive title, isn’t it? 🙂 I wanted to share with you some benchmarks we conducted with key partners using Discovery-AMS multi-core technology. This feature is available in the 2013.3 release for Mixed Signal Verification and allows you to speed up your simulations considerably. Two key advantages of Discovery-AMS are performance and versatility. By combining the efficiency of a Fast-Spice solver with multithreading, we were able to boost performance by up to 10X.
We implemented this feature first in CustomSim/XA and then in Discovery-AMS. Using a revolutionary technique based on Newton-Raphson, CustomSim/XA can significantly speed up large partitions on multiple CPUs without loss of accuracy. The primary target of CustomSim/XA multi-core is analog circuits with large partitions and synchronous groups.
During our beta testing, XA’s multi-core technology showed its value in simulations such as IREM, high-accuracy full-chip simulation and final full-chip verification. Those simulations usually take two or three days; with multi-core technology, many of our beta customers could complete them within one day.
Here, for example, is a comment from one of our initial beta testers within our Solution designs Group regarding our FastSpice feature for analog designs (CustomSim/XA only):
“We use XA only for top level transient analysis. We see that simulation speed-up is proportional to the number of cores we are using, e.g. simulation is ~8 times faster if we use 8 cores. This means that we are able to simulate top level within one week; without mt, the CustomSim/XA simulation runs for ~30 days.”
In this post, ST Ericsson demonstrates a similarly significant performance trend on a PMU (Power Management Unit) while using Discovery-AMS for Mixed Signal verification. I asked Francois Ravatin to share his honest and unbiased opinion (yes, I promise:)) on this new feature.
Francois Ravatin joined Thomson Consumer Electronic Components (an ST Microelectronics – Thomson Multimedia joint venture) in 1994, in the digital libraries and tools support team. In 1998 he joined ST Microelectronics as an analog mixed-signal designer in the Wireless division and then the Display division. In 2007 he moved to ST Ericsson as an AMS verification engineer in the analog & RF design flow group.
Q – Can you describe the results you have seen using Discovery-AMS multi-core technology? Did it further improve your verification flow?
Our Discovery-AMS flow is based on a spice-on-top netlist. The design under test (DUT) is in spice, with leaf cells in VHDL-RN for analog IPs and in Verilog or VHDL for digital RTL. The stimulus is defined in VHDL-RN. Figure 1 below describes one of the scenarios used in our flow.
Figure 1: spice on top flow description
So stimulus VHDL-RN is driving:
Early this year, Synopsys gave us the opportunity to test multi-core on the Discovery-AMS Real Number flow. I decided to try it on a complex PMU to increase verification coverage. Our verification process includes different scenarios in which we mix transistor-level and digital blocks on demand. We know that adding more and more sensitive spice analog IPs, such as the DC-DC blocks, increases the runtime, so our first test was to put all the DC-DC blocks in spice and then check the runtime. I did not observe any accuracy degradation, and the gain was around 3.5x on 12 cores.
Several tests were run on this test case with different numbers of cores. Results are below:
After this first promising result, I decided to raise the bar by adding the digital core in spice too. This configuration now includes around 1.3 million transistors, for a total of 5 million devices.
The figure below represents CPU time versus transient time.
Figure 2: multi-core vs one core comparison
The yellow/white curve shows the CPU time using one core, while the green curve shows the CPU time using twelve cores.
How should we interpret this change in slope?
Technically speaking, a high-frequency clock starts at 37 ms and propagates directly into the digital core. New events, such as the DC-DC power-up, also appear at 38 ms. This additional activity is immediately visible in the change of slope after 37 ms, and even more so at 38 ms. The slope of the green curve did not change to the same extent as that of the yellow one, which shows the efficiency of multi-core.
In the end, the run time was close to 27 days (estimated) on a single core and less than 9 days on 12 cores. Here again we observed a speed-up of about 3X, even though the total number of transistors doubled compared to the first test. Nine days is a very short run time given the complexity and the activity of the test case :).
The speed-up depends directly on two points. The first is the partitioning, and more precisely the number of devices in the large partitions. The second is linked to the test bench itself and how it interacts with those partitions; in other words, whether or not there is activity in those parts. When both conditions are met, multi-core shows its full power, and you will probably adopt it, since this kind of verification generally has to be done just before tape-out, when every day gained is crucial. This improvement in “full spice” simulation (with the test bench still digital) opens a new area of verification before tape-out, and is also useful for design debug after silicon validation.
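As a quick check on the figures above, here is a minimal Python sketch that turns the quoted run times into a speed-up and a parallel efficiency (speed-up divided by core count). The function names are illustrative and not part of any Synopsys tool. The 27-day single-core figure is the estimate quoted above, and since the 12-core run took less than 9 days, the computed speed-up is a lower bound.

```python
def speedup(single_core_time, multi_core_time):
    """Ratio of single-core to multi-core run time (both in the same unit)."""
    return single_core_time / multi_core_time


def parallel_efficiency(observed_speedup, cores):
    """Fraction of ideal linear scaling actually achieved (1.0 = perfect)."""
    return observed_speedup / cores


# Figures quoted in the interview (run times in days).
full_chip = speedup(27.0, 9.0)
print(f"Full-chip PMU speed-up on 12 cores: at least {full_chip:.1f}x")
print(f"Full-chip PMU efficiency: {parallel_efficiency(full_chip, 12):.0%}")

# First test (DC-DC blocks in spice): about 3.5x on 12 cores.
print(f"DC-DC test efficiency: {parallel_efficiency(3.5, 12):.0%}")
```

Efficiency well below 100% is expected here, since the speed-up depends on how much of the circuit sits in large, active partitions.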
Q- Besides performance, did this feature allow you to further improve your verification methodology?
Yes, in our case multi-core is complementary to VHDL-RN modelling. The goal of the models is to speed up simulations so that complex sequences can be run in an acceptable time, by using a linear model of switched blocks. However, some functionality can be missed, because behavioural models may be misaligned with spice. So we need to check block start-up in spice at IC level, to make sure it matches the silicon measurement reference, which might not be the case when verification is done at IP level. The first multi-core results are promising, and we will continue in this direction.
Q- Have you tested Discovery-AMS multi-core technology on other circuits? What performance gain did you see?
Yes, after the PMU this new technology was tested on other design types, for example a display that includes a PLL. The gain was even more impressive: we reached a 6.6X speed-up on 8 cores (from 32 hours to less than 5 hours). Such a number allows us to do several runs in a single working day, which is really a key point.
Q- A lot of EDA Fast-spice simulators require extensive and circuit-specific settings. Can you comment on ease of use? When testing this feature, did you spend a lot of time setting simulator options?
Clearly this was another point for us to test, and we did so on a highly sensitive analog design, where we replaced our current CustomSim settings (local settings on IPs) with a single command (a global setting). The results were really positive: we reached a 2x performance improvement using 8 cores.
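One way to put the speed-ups quoted in this interview (3.5x on 12 cores for the PMU, 6.6x on 8 cores for the display) on a common footing is Amdahl's law. This is a textbook model, not something used in the original benchmarks. It assumes a fixed fraction p of the work parallelises perfectly while the rest stays serial, so the speed-up on n cores is 1 / ((1 - p) + p / n). The sketch below inverts that formula to estimate the p implied by each result. It is a rough illustration only, the function names are hypothetical, and real scaling depends on partitioning and circuit activity.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speed-up predicted by Amdahl's law for a given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def implied_parallel_fraction(observed_speedup, cores):
    """Invert Amdahl's law: the parallel fraction that would give this speed-up."""
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / cores)


# Speed-ups quoted in the interview: (label, speed-up, cores).
results = [
    ("PMU, DC-DC blocks in spice", 3.5, 12),
    ("Display with PLL", 6.6, 8),
]

for label, s, n in results:
    p = implied_parallel_fraction(s, n)
    print(f"{label}: {s}x on {n} cores -> implied parallel fraction {p:.1%}")
```

Under this simplified model, the display test case behaves as if roughly 97% of its work ran in parallel, against roughly 78% for the PMU DC-DC test.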
Merci Francois !
I hope you enjoyed this post. As usual, comments/questions/feedback are welcome.