HOME    COMMUNITY    BLOGS & FORUMS    Breaking The Three Laws
Breaking The Three Laws
 
  • About

    Breaking the Three Laws is dedicated to discussing technically challenging ASIC prototyping problems and sharing solutions.
  • About the Author

    Michael (Mick) Posner joined Synopsys in 1994 and is currently Director of Product Marketing for Synopsys' FPGA-Based Prototyping Solutions. Previously, he has held various product marketing, application consultant and technical marketing manager positions at Synopsys. He holds a Bachelor Degree in Electronic and Computer Engineering from the University of Brighton, England.

Archive for the 'ASIC Verification' Category

Expectation setting for FPGA-based prototyping

Posted by Michael Posner on 8th May 2015

The myths of FPGA-based prototyping are still being proliferated and I have taken on a quest to educate the world on what FPGA-based prototyping really delivers. The latest myth propagation is seen below and was cut from a recent posting on a well-known industry website. You are all smart, you will be able to work out where it comes from. Fun read.

Myths of FPGA-based prototyping being propogated

Charlie hits the nail on the head, you need both speed (for Real World IO and OS boot) along with accuracy.

However the comments around it are what I am talking about, they continue to propagate the myths. The top 3 recapped :-

  • Capacity limited to less than 100 Million ASIC Gates
  • It takes months to get prototype working
  • Limited debug visibility

I’ve busted these myths a number of times but you know what they say, tell’em once, tell’em twice, tell’em three times then hit’em with a stick. (ok so I added the stick bit to the saying but that was to increase the humor level of my blogs as some have complained I’m become far too serious)

Capacity is not limited to 100 Million ASIC gates. Proof point: HAPS-70 scales today to twenty four (24) FPGA’s supporting 288 Million ASIC gates and we even have customers successful in prototyping with more FPGA’s. High performance is maintained with operational performance only limited by pin-multiplexing ratio’s. These large sized prototypes can operate in the 10’s of MHz ranges. If I look into my crystal ball (which is very accurate) I see a vision of a HAPS platform supporting in excess of 1 Billion ASIC gates while maintaining scalability and flexibility to support a complete range of designs.

Time to first prototype does not take months, customer usage results with HAPS ProtoCompiler exhibits proven success in as little as one week. The note in the screen shot above says that RTL needs to be rewritten for FPGA, this is not a true fact. Many transformations for FPGA are fully automated in HAPS ProtoCompiler so you maintain a single “Golden” RTL code base.

Debug capabilities have leapfrogged over the last couple of years with HAPS Deep Trace Debug delivering visibility of 1000’s of debug signal bits across multiple FPGA’s and across multiple system setups. While this is not the same level as you have in a simulator or in an emulator you have to remember that you don’t actually need full visibility as typically the RTL being prototyped is more stable as it has passed the block level tests. And just wait and see what my crystal ball is predicting for debug in the next generation of HAPS systems.

So join with me on the quest to rid the world of the myths of FPGA-based prototyping.

  • Become educated: Read the whitepaper dispelling the myths of FPGA-based prototyping
  • Subscribe to my blog and stay up to date with the latest in FPGA-based prototyping

To SUBSCRIBE use the Subscribe link in the left hand navigation bar.

Another option to subscribe is as follows:

• Go into Outlook

• Right click on “RSS Feeds”

• Click on “Add a new RSS Feed”

• Paste in the following “http://feeds.feedburner.com/synopsysoc/breaking”

• Click on “Accept” or “Yes” or whatever the dialogue box says.

Posted in ASIC Verification, Bug Hunting, Debug, Early Software Development, HW/SW Integration, Man Hours Savings, Performance Optimization, System Validation, Use Modes | No Comments »

Comparison of Prototyping bridge vs. Hybrid Prototypes

Posted by Michael Posner on 1st May 2015

DesignWare Hybrid IP Prototyping Kits

This week Synopsys’ announced the availability of the DesignWare Hybrid IP Prototyping Kits: http://www.synopsys.com/IP/ip-accelerated/Pages/hybrid-ip-prototyping-kits.aspx The Synopsys DesignWare® Hybrid IP Prototyping Kits pre-integrate a Virtualizer™ Development Kit (VDK) and a DesignWare IP Prototyping Kit to accelerate IP prototyping, software development and integration of DesignWare IP in 64-bit ARM®-based designs. Hybrid IP Prototyping Kits enable designers to accelerate hardware/software integration and full system validation, thus reducing the overall product design cycle. The included Linaro® Linux® software stack, reference drivers, and pre-verified DesignWare IP reference design allow users to start implementing and validating IP in an SoC context in minutes.

Now I’ve spoken about Hybrid Prototyping a number of times, the most recent was Valuable Software Driven Validation where I discussed how users are deploying Hybrid Prototyping to accelerate IP validation. The DesignWare Hybrid IP Prototyping Kit of course comes with validated IP, Synopsys does that work, but many IP’s required customized drivers which are application specific. It’s this software development, within the context of a CPU subsystem, which the kit focuses on accelerating.

I was asked a question this week which I think is important to clarify, what is the difference between a prototyping bridge and Hybrid Prototyping. A prototyping bridge is a native PCIe host to prototype physical connection with standard interfaces such as AMBA AXI to connect to the user design under test in the hardware. This is what I have previously called a memory mapped interface. Synopsys provides such a prototyping bridge example, you can find it in SolvNet buried in the HAPS documentation

HAPS Prototyping bridge example on SolvNet

A prototyping bridge like this is good for test cases where you want to stream data to a design under test on the prototyping hardware. You will need to write a customized PCIe driver, which is memory mapped on the host workstation, which you build on top of to create the custom application test code.

HAPS PCIe Prototyping Bridge example

Within the small context of this need the prototyping bridge works well. However its usefulness reduces very quickly due to the following limited capabilities

  • Only provides  1x AXI master and 1x AXI slave interface, what happens if you need more?
  • No support for mixing with other AMBA protocols or sideband signals (interrupt, GPIO, etc.)
  • No control over the AMBA protocol parameters (you get what you get through the PCIe interface)
  • Manually effort to instantiate into design (Might require changes in golden RTL to fix interfaces offered)

I’m not saying there is not a place for this type of prototyping bridge, our own DesignWare USB IP team use such a bridge to enable the standard, off the shelf PCIe based USB host drivers to be tested against the IP. It’s a standalone environment and as the driver is PCIe based it’s not directly reusable when the IP is integrated into a  the end ARM/ARC/MIPS/Tensilica/Other SoC.

Enter Hybrid Prototyping from Synopsys. Hybrid has none of the restrictions as noted above with a prototyping bridge. You can insert multiple transactors into a design, configure them to your direct need in an automated fashion. With Hybrid the software that you are running is the same as the end SoC software, as in in the example of an ARM-based SoC you are executing ARM code.

Hybrid Prototyping design with CPU subsystem and RTL design

The key to Hybrid Prototyping is the transactors. A transactor translate between the Virtual SystemC abstract level across to cycle accurate protocol specific pin level interface. Synopsys delivers off the shelf transactors for the AMBA protocols and more. There are two sides of a transactors, the software side and the RTL hardware side.

HAPS high level view of Hybrid Prototyping transactors

On the software side, the interface which is exposed to the user is the abstracted SystemC level interface, read/write etc. This is what software engineers understand, the transactor looks just like the software that they are used to coding against. All the nitty gritty engineering of the transactors is done by Synopsys so the user can become immediately productive.

HAPS Transactor, Software side

On the hardware side, the interface is RTL, again all the deep protocol stuff is done by Synopsys so the engineers instantiate the transactor into their design just like any other AMBA based design block.

HAPS Transactor Hardware side

The HAPS ProtoCompiler flow automatically understands the transactors and seamlessly connects up the physical interface, UMRBus, with no user intervention.

The Virtualizer environment delivers amazing software debug as well, here is a view of the software debug capabilities which the DesignWare Hybrid IP Prototyping kit delivers.

VPExplorer, amazing software debug

So in a comparison of a prototyping bridge and Hybrid Prototyping, Hybrid wins hands down.

  • Predefined environment, fully supported, configurable, automatic hook-up in user design
  • Runs SoC specific software which can be directly run on the final product
  • Multiple protocols, multiple instances per design
  • Amazing software debug, especially for multi-core

In most cases a Hybrid Prototyping environment can replace the use of a prototyping bridge because it can also be driven via a C/C++ or native TCL interface, just like what you would do with the prototyping bridge. The additional advantage is that the same environment can easily be expanded to a full Hybrid Prototype with Virtual Prototype connection without having to change the design.

Posted in ASIC Verification, Debug, DWC IP Prototyping Kits, Early Software Development, HW/SW Integration, Hybrid Prototyping, In-System Software Validation, IP Validation, Use Modes | No Comments »

SYNOPSYS SETS NEW STANDARDS FOR FPGA-BASED PROTOTYPING WITH COMPLETE PROTOTYPING PLATFORM

Posted by Michael Posner on 23rd April 2015

SYNOPSYS SETS NEW STANDARDS FOR FPGA-BASED PROTOTYPING WITH COMPLETE PROTOTYPING PLATFORM…. Yes, we did this way back in 2010 with the launch of the HAPS-60 complete solution, and then raised the bar in 2012 with the launch of the evolutionary HAPS-70 complete solution. Synopsys HAPS is a proven integrated solution delivering the fastest time to operational prototype, highest system performance, superior debug and advanced capabilities including Hybrid Prototyping and global server farm access.

HAPS Solution

This week I’m feeling feisty so my blog is going to be a little more edgy than normal. The launch of the HAPS-60 series in 2010 delivered the first integrated FPGA-Based prototyping solution with key capabilities such as automated deployment of unique HAPS High Speed Time-Domain Multiplexing (pin-multiplexing) schemes in Synopsys’ Certify. Host connected, globally accessible hardware with the HAPS Universal Multi-Resource Bus, UMRBus, as well as advanced data streaming and platform connectivity. The solution included integrated superior debug visualization for bug hunting. Yes, in 2010, over 5 years ago, Synopsys set the new standard for FPGA-based prototyping with a comprehensive prototyping platform. Since then, our solution has rapidly evolved delivering far greater value.

HAPS-70 hardware, just becuase I like the picture

The HAPS-70 (which, by the way, was selected as Electronic Design http://electronicdesign.com/ “Best of 2012” recipient) with fully integrated HAPS ProtoCompiler, the prototyping implementation environment, accelerated the deployment of prototypes by providing advances in automation including time to first prototyping modes and timing biased partitioning.

HAPS ProtoCompiler, the leading FPGA-based prototyping implementation tool

Synopsys has always been the leader in debug visibility and the HAPS integrated debug capabilities enables at speed debug across multiple-FPGA’s in addition to integration with the leading Synopsys Verdi debug visualization software.

HAPS Debug, superiour debug visualization

The HAPS UMRBus has for multiple generations enabled the hardware to be a globally accessible resource for server farm and multi-user scenarios in addition to enabling data streaming modes and Hybrid Prototyping capabilities.

HAPS UMRBus, global access, farm usage, advanced data streaming modes

At about the same time as the HAPS-70, Synopsys launched the first commercial Hybrid Prototyping solution. HAPS Hybrid Prototyping enables HAPS to be connected with Virtualizer, Virtual Prototype delivering early prototyping capabilities, IP and in context validation scenarios.

HAPS Hybrid Prototyping, accellerating the availability of prototypes

Talking of IP, Synopsys is the leader in interface IP and offers DesignWare IP Prototyping kits for immediate software development and prototyping of key IP titles.

DesignWare IP Prototyping Kits, immediate availability

All this wrapped with the global expert support, eco-system of HAPS Connect partners, professional services. This is how we define a complete solution. What I am trying to illustrate is that Synopsys is now, and will continue to be the technology leader in FPGA-based prototyping. Synopsys continues to invest and the HAPS next generation solution will raise the bar again ensuring that our integrated FPGA-based prototyping products meet your requirements today and way into the future.

Posted in Admin and General, ASIC Verification, Bug Hunting, DWC IP Prototyping Kits, Early Software Development, HW/SW Integration, Hybrid Prototyping, In-System Software Validation, IP Validation, Man Hours Savings, Milestones, Performance Optimization, Real Time Prototyping, Support, System Validation, Use Modes | 1 Comment »

Reduce WNS by up to 60%, sometime more

Posted by Michael Posner on 17th April 2015

Bats with White Nose Syndrome. Please help reduce the spread of this and wash boots, clothes and equipment between caves

The WNS I am talking about is Worst Case Negative Slack and not White Nose Syndrome, a disease in North American bats which, as of 2012, was associated with at least 5.7 million to 6.7 million bat deaths. Please help and stop the spread of this nasty disease. Poor little bats have no defense against it. The WNS I’m going to talk about is Worst Case Negative Slack of a prototyping design, reduce WNS and prototype execution performance increases.

A couple of weeks back I blogged on Timing Biased Partitioning and received a number of follow up questions and comments. This blog is to hopefully answer those and provide more information on the Synopsys capabilities to optimize for the highest system performance on your HAPS-based prototype.

The first question, actually statement was from one of the Synopsys engineers who correctly pointed out that my blog title only covers a fraction of what HAPS ProtoCompiler does in the area of prototype performance optimization. In addition to reducing the number of multi-hop paths during the automated partition stage, ProtoCompiler can also reduce the path length and automatically use a lower pin mux ration on multi-hop paths. The combination of these result in the highest performance prototype. In essence timing biased capabilities cross the partition, system route and system generate stages of the prototyping design flow.

Something that I did not mention in the previous blog was the recommendations for pin mux ratios for optimized performance, so here they are.

  • All paths are not critical
    • Some paths don’t need to be fast
    • False paths and asynchronous clock crossing
    • Slow clocks and debug paths
  • Some paths are just fast, pipeline paths with little logic
  • Don’t use one HAPS HSTDM ratio everywhere
    • Lower ratios on critical paths
    • Higher ratios on non-critical path
    • HAPS ProtoCompiler supports ratios up to 128:1
  • HAPS Hardware Traces are precious
    • High ratios on non-critical paths, frees up traces for critical paths (HAPS flexible interconnect)
  • No cost to mixing ratios with HAPS HSTDM
    • Source sync clock is shared across ratios
    • No overhead of mixing ratios

Much of this is automated in HAPS ProtoCompiler but the 2nd question was why these timing biased capabilities are not default “ON”. The answer is that typically the goal at the start of the project is Time to First Prototype (TTFP), and you sacrifice performance optimization to get a valid solution in the least amount of time. Optimization for performance, while automated, increases the runtime of the tool. The recommendation is that you utilize the HAPS ProtoCompiler TTFP mode to generate a feasible solution and hand this off to your developers. While it might not be performance optimized your developers will thank you as you delivered it very quickly. They can be very productive debugging the initial HW/SW integration, board support software and completing initial OS boot procedures. With your developers busy and happy you have an extra day or so to optimize the platform for performance. Now you turn on timing biased capabilities as you can afford the slightly longer runtime to a feasible solution. This is an iterative process as you play with partition, route and physical interconnect on the HAPS systems.

The results of HAPS ProtoCompiler timing biased capabilities are astonishing and I was able to get my hands on the results of these capabilities from a suite of test designs. This suite of designs consist of real customer designs which we have gather over time (with permission). The goal of this testing was to judge the automated capabilities of the tools.

HAPS & ProtoCompiler test suite of designs for timing bias optimization benchmarks

First the “hop” reduction with multi_hop_path optimization enabled is amazing. It’s hard to see in the picture but all designs yielded multi-hop path reduction with the capability enabled.

HAPS ProtoCompiler multi-hop reduction. Less hops = higher system execution performance

Second, the effect to worse case negative slack showed up to 60% reduction. Reduce WNS and performance is improved !!!!

HAPS ProtoCompiler timing bias optimization WNS reduction yields up to 60% execution performance improvement

The funny thing is that the effect on runtime is not huge so while above we recommend a TTFP flow first and then a timing optimized flow you can be successful in generating a timing optimized solution right out of the starting gate. Well at least a version where you have enabled the capabilities but spend no time analyzing the output. Remember, to get the most out of the HAPS solution you should tailor the HAPS hardware flexible interconnect to the SoC partition needs.

I’ve not had much time for projects recently and the next couple of months are busy, busy, busy with business stuff but I have been making slow progress on my new gaming console in a briefcase. Below you can see pictures of the custom controllers, I had to make them small to ensure they fit inside a briefcase. The second picture is a mock up of the monitor and controllers in the briefcase. You open the case and the monitor pulls up and can be rotated for vertical and horizontal play. The whole system is powered by two 7 ah 12v sealed batteries which based on the current draw should enable 5 hours of play before needing to be charged. There are 912 games installed, all the old school favorites like pacman, donkey kong, street fighter, 1945 etc…

If you like this or other previous posts, send this URL to your friends and tell them to Subscribe to this Blog.To SUBSCRIBE use the Subscribe link in the left hand navigation bar.

Another option to subscribe is as follows:

• Go into Outlook

• Right click on “RSS Feeds”

• Click on “Add a new RSS Feed”

• Paste in the following “http://feeds.feedburner.com/synopsysoc/breaking”

• Click on “Accept” or “Yes” or whatever the dialogue box says.

Mick Built Toys, new gaming console in a briefcase controllers

Mick Built Toys, prototype of monitor and controllers in a briefcase

Posted in ASIC Verification, Early Software Development, HW/SW Integration, In-System Software Validation, Man Hours Savings, Mick's Projects, Milestones, Performance Optimization, Project management, System Validation, Use Modes | No Comments »

Prototyping Over 700 Million ASIC Gates Using Xilinx Virtex-7 2000T FPGAs

Posted by Michael Posner on 10th April 2015

HAPS Super Chain Testing at Synopsys HAPS Lab, 64 FPGA's operating together

You read the title correctly, this blog discusses prototyping over 700 Million ASIC gates using the Xilinx Virtex-7 2000T FPGA’s. To get to this capacity you need to utilize sixty four (64) FPGA’s. The picture above was taken in the Synopsys HAPS lab and shows part of our Super Chain testing. As noted previously, HAPS documented seamless deployable capacity is 288 Million ASIC gates, which is six HAPS-70 (four) FPGA systems chained together, a total of twenty four (24) FPGA’s. However we have customers where this is not enough. The HAPS solution is modular and scalable with base building blocks of x1, x2 and x4 FPGA systems and supported with a HAPS-Aware design tool flow.

The HAPS capabilities and software infrastructure enables the solution to scale with ease but Synopsys does not claim support for capabilities until they have been tested and validated. Once the HAPS systems are configured in the Super Chain they act as one unified massive prototyping system. Thanks to our synchronized clocking capabilities the prototyped design still utilizes the HAPS High Speed Time-Domain Pin Multiplexing, HSTDM, which enables the highest system performance. The above picture super chain models sixty four FPGA’s operating synchronously with HSTDM between all FPGA’s ensuring the highest system performance. That’s over 700 Million ASIC gates (12 million ASIC gates per FPGA)…

While HAPS can scale to these huge systems that does not mean that users of just x1, x2, x4 or x8 FPGA’s do not benefit from this testing. Testing of such large systems ensures that the communication, clock synchronization, HSTDM and other capabilities are bullet proof which benefits the smaller system usage ensuring maximum reliability and uptime when used in server farms or on the user’s desk.

Off subject, while visiting Mountain View CA I noticed that one of our creative R&D engineers had come in over the weekend and decorated their Cube space for Easter

Easter Cube decoration

Absolutely amazing don’t you think! Anyway I introduced myself to the R&D engineer and congratulated them. Apparently they do this once in a while and shared a couple of pictures of their previous cube creations.

Cube decoration Cube2 Cube3 Cube4

Crazy cool right!!! I also think that this R&D engineer might have just a little too much time on their hands. Or maybe they are just like me and maximizes every second or every day. I personally think there is a business here, cubicle decoration in a box…. Would you buy a box of decorations to jazz up your cube?

Posted in ASIC Verification, Debug, Early Software Development, Humor, HW/SW Integration, In-System Software Validation, Man Hours Savings, Project management, Support | No Comments »

Success Prototyping with UltraScale VU440 devices

Posted by Michael Posner on 3rd April 2015

UltraScale based HAPS system operation in Synopsys lab

It’s been a while since Xilinx shipped the first UltraScale VU440 engineering sample devices to Synopsys so I thought it time to deliver a short update on development progress. It might be hard to see in the above but that is a picture of one of the new development HAPS systems for the UltraScale VU440 devices. I say hard to see not only as the picture quality is low but also because we have the system completely configured with intelligent interconnect as part of our stringent characterization and functional validation process.

Each module is individually tested, see picture below as an example, this is the controller module in standalone test. The controller module hosts the HAPS supervisor which controls the system and manages advanced capabilities such as the Universal Multi-Resource Bus, UMRBus for short. Once all individual modules are signed-off they are assembled and the system is validated.

New HAPS Control module in standalone test jig

So far both the Xilinx VU440 devices and the new systems are functioning well. Xilinx has posted an errata on the engineering sample VU440 devices but these issues do not preclude the devices from being useful for system development or actual usage as part of a production prototyping project. All IO’s are operational as well as the transceiver GTH links. We have been filling the devices with high speed toggle designs as part of the performance and power characterization and smaller IP designs for other test purposes so we have not compared the utilization between V7 and UltraScale devices yet. We still predict that the UltraScale VU440 devices will deliver ~26 Million ASIC gate capacity, about 2.2X increase over the V7 2000T devices.

As a teaser for future blogs, the new integrated solution is expected to deliver

  • Highest performance w/superior partitioning & new time domain pin-multiplexing schemes
  • Always available debug with deep trace storage
  • Fastest time-to-first-prototype with HAPS aware prototyping software
  • Rapid Turn-around Time from RTL to Bit file with incremental flows
  • Native integration for regression farm & remote accessibility
  • Both HW and SW tool flow is Modular & scalable to over 24 FPGA’s (Over 600 Million ASIC Gate capacity)
  • Hybrid Prototyping ready

Preserves existing HAPS investment

  • Interoperable with HAPS-70 & HAPS-DX, (mix and match HAPS V7-based systems with UltraScale systems) same form factor, I/O voltages, HT3 connectors, daughter boards, cables

In the coming months I’ll post more information on these new and unique capabilities.

HAPS Integrated solution

Posted in Admin and General, ASIC Verification, Bug Hunting, Daughter Boards, Debug, DWC IP Prototyping Kits, Early Software Development, FPGA-Based Prototyping, HW/SW Integration, Hybrid Prototyping, In-System Software Validation, IP Validation, Man Hours Savings, Milestones, Project management, Real Time Prototyping, System Validation, UltraScale, Use Modes | Comments Off

Automated Timing Biased Partitioning

Posted by Michael Posner on 20th March 2015

I spent a lot of time this week talking about timing biased automated partitioning which is the ability of the ProtoCompiler partition engine to generate a partition and pin multiplexing implementation with the goal of maximizing performance of the final prototype implementation. As it’s getting close to Easter I thought it was fitting to discuss ProtoCompiler’s Multi-Hop optimization algorithm. (Easter, Easter bunny, hop, you see what I did there?)

Firstly what is a Hop you ask? In the image below you can see the ASIC design example with combinatorial logic between two register points

ASIC Design with logic clouds between register points

It’s possible during partitioning for FPGA-based prototyping that these two registers end up in different FPGA’s as seen in the image below.

Example of a single "Hop" between FPGAs on a physical prototype

This is what is known as a single hop. The design has been split up with the starting and ending register points in separate FPGA’s. However, without timing biased capabilities it’s possible that a partition is created that spans multiple FPGA’s. An example of this Multi-Hop is represented in the image below

Example of what a multi-hop is

In our example above FPGA B has basically become a feed through FPGA and while this helps a partition engine get to a partition solution it has a dramatic effect on the overall performance of a prototype platform. To explain the timing impact lets go back to our original ASIC design and review the timing between the register points

Timing delay of this logic in ASIC implementation

In our made up example the combinatorial logic has a timing impact of about 10ns. However in our FPGA partition where you have the critical path split up over multiple FPGA’s timing becomes much more significant. The reason for this is that you typically have to apply pin multiplexing between FPGA’s as you have more signals than you have physical pins (One of the three ASIC Prototyping three laws which are the grounding facts driving this blog) Now if you review the timing impact with a typical pin multiplexing scheme inserted you suddenly see the impact of a Multi-Hop

Tming impact of multi-hops and pin multiplexing

But not to worry if you are using ProtoCompiler as the partition engine is not only fast but it’s timing biased and includes a Multi-Hop optimization algorithm.

When Multi-Hop optimization is enabled ProtoCompiler partition engine will:

  • Focus on reducing the number of multi-hops, with a goal of zero
  • If multi-hops are needed to complete the partition the focus turns to reducing the path length of the multi-hop
  • Avoids pin-multiplexing on multi-hop path
  • If pin-multiplexing is needed the focus turns to using the lowest pin-multiplexing ratio on the multi-hop path
  • Selects pin-multiplexing ratio based on timing slack

Knowing that eliminating multi-hops would lead to higher prototype performance you might think that by default the partition engine should not allow any multi-hops. However multi-hops do play a vital role sometimes in enabling an automated partition solution to be found.

ProtoCompiler’s timing biased multi-hop optimization is making a huge impact on the resulting HAPS prototyping performance. Across a suite of over 40 ASIC designs ProtoCompilers timing biased optimizations improved the clock period by an average of 50ns. HUGE improvement in resulting performance. Across this suite of designs, ProtoCompiler reduced the number of nets that are included in multi-hop paths of length two or greater by up to 80%. For most designs in the suite, the number of paths of length three or greater was reduced to ZERO. Also, for most designs in the suite, the pin multiplexing ratio of the multi-hop path nets required to get feasible automated partition was reduced to one (i.e. no pin-multiplexing required). Fantastic. Not only is ProtoCompiler’s partition engine super-fast running in minutes for multi-million ASIC gate design but the out of the box results are phenomenal.

I’m out on vacation for a week (yes even I need time off once in a while) so no blog next week.

Posted in ASIC Verification, FPGA-Based Prototyping, Tips and Traps | Comments Off

Valuable Software Drive Validation

Posted by Michael Posner on 6th March 2015

Software Driven Verification of a CMOS sensor encoder design using Hybrid Prototyping

Software driven validation is becoming very popular as it enables the same SW code you are developing for the final product to be used to verify the product under development. This has multiple benefits such as reduced verification effort from minimizing duplicated effort to create test scenarios in addition to writing the actual SW code. It also flushes out more bugs as you are running the real SW code, or close to it, to verify the design under test so inherently it’s covering much of the user space. So why is not everyone verifying designs like this?

While you can use co-simulation and run SW code against an RTL model it’s going to be slow as the RTL executes in the simulator magnitudes slower than real hardware. The benefit is that you get great RTL debug in this mode. Emulation helps as it’s magnitudes faster than simulation but what if the design requires a physical hardware driver or the design interfaces requires real world input such as from a sensor? Not to worry, this is exactly what HAPS Hybrid Prototyping was designed for.

Review the picture above. On the left hand side the customer used a Virtual Prototype of a ARM Cortex-A15 to run Linux, drivers and firmware. This Virtual Prototype is executing on Synopsys’ Virtualizer platform. On the right hand side the customer implemented their sensor and encoder subsystem on the HAPS FPGA-based prototyping platform. The physical prototype included the design under test RTL as well as physical daughter boards to enable the real CMOS sensor to be used to “feed” the design.

In the middle is the key to Hybrid Prototyping, the transactor. Transactors translate the high level transactors to real protocol based pin-toggles in the RTL. In this HAPS Hybrid Prototype the software running on Virtualizer instructs the design running on HAPS to grab an image from the CMOS sensor. This real world data is process by the DUT then transferred onto the Virtual prototype, just like it would in a real system for further processing. The output image and any artifacts of the manipulation can be viewed from this host side.

The customer was able to accelerate the verification of the DUT by weeks using Synopsys’ Hybrid Prototyping. The customer continued to use this platform extending the their early software development efforts resulting in greater quality and capabilities in preparation for the test chip. But there is more… The environment was setup is immediately re-usable for other projects and designs under test. If you look at the Virtual side you can see that this subsystem should run all sorts of software code so can be loaded for multiple scenarios across multiple design targets. On the HAPS prototype side you can see that different design under test quickly plug into the standard AMBA bus infrastructure, even larger subsystems. Hybrid Prototyping is supported across all the HAPS hardware, HAPS-70 and HAPS-DX.

There are many off-the-shelf Virtualizer Virtual Development Kits (VDK’s) to start from. In addition on the physical prototype side the complete software infrastructure and transactors come neatly packaged in ProtoCompiler so no need to try and piece together lots of different parts.

HAPS ProtoCompiler. Includes all the software and transactor IP needed to deploy Hybrid Prototyoing for most ARM based designs

Hybrid Prototyping is highly valuable for design verification and system validation in addition to being easy to develop and deploy.

Posted in ASIC Verification, Early Software Development, HW/SW Integration, Hybrid Prototyping, In-System Software Validation, IP Validation, Real Time Prototyping, System Validation, Use Modes | Comments Off

Fight Club: Automated vs. Hand Crafted Pin Multiplexing

Posted by Michael Posner on 27th February 2015

HAPS HT3 connectors and cables

This week a prototyping engineer challenged me that “his” customized and hand crafted pin multiplexing capability was “better” than the HAPS High Speed Time-Domain Multiplexing, HSTDM. My response, “Faster, maybe, better NO”. This blog explains why the HAPS HSTDM capability will beat out a custom coded multiplexing capability hands down every time.

First let’s list of the positives and negatives of a custom coded pin multiplexing capability

Positives

  • It’s tailored to the exact design requirements
  • It’s tuned for a specific set of FPGA pins and inter-FPGA connections to push performance to the limits

Negatives

  • It’s hand crafted meaning effort to develop
  • It has to be manually inserted into the design
  • It breaks if the design requirements change
  • It’s tuned so will need to be customized to different FPGA pins and inter-FPGA connections
  • Might be unreliable leading to mystery ghost bugs to chase down

I am sure the list of negatives is longer but I would bet that you already get the idea. While it’s possible to craft a pin-multiplexing block that eeks out every possible drip of performance the overhead of insertion, modification to different design and hardware requirements and testing makes it inferior to the HAPS HSTDM capability.

HAPS HSTDM

HAPS HSTDM was designed to deliver an automated, cycle accurate, highest performance, reliable, modular and scalable pin multiplexing solution for the HAPS systems. Automated insertion through ProtoCompiler ensures that the usage is as unobtrusive as possible. HAPS HSTDM is tested to run on every qualified IO pin across the HAPS-70 system. It will reliably run on any HAPS platform, we can claim this as the HAPS hardware itself is performance tested as part of the production manufacturing tests ensuring that all systems and interconnects meet the minimum required performance for HSTDM operation. It supports multiple ratios meeting the need of many different design requirements. It is very high performance using the latest differential signaling and training techniques with built in error detection. Just looking at this list it’s clear that HAPS HSTDM has many advantages over custom.

But wait, there is more…. I would challenge that using the flexible capabilities of the HAPS hardware interconnect combined with the HAPS HSTDM capabilities that the overall HAPS prototype will run at a higher system performance. I’ve talked about this capabilities a couple of times. The HAPS systems do not have any dedicated PCB traces between FPGA’s. All interconnect is done via intelligent cabling. This method enables the HAPS hardware to be customized to better match the DUT’s interconnect needs. This means you can create more interconnect density where the DUT needs it. More dense interconnect can help reduce the overall pin multiplexing ratio required resulting in higher performance system operation. Remember your prototype is only as fast as the slowest link.

HAPS Flexible Interconnect for increased performance with HSTDM

This HAPS flexible interconnect combined with the HAPS HSTDM automated and deployed by ProtoCompiler is a very powerful solution and this is why I claim that it’s “better” than a hand crafted scheme.

More HAPS and HSTDM results

Do you agree?

Posted in ASIC Verification, Bug Hunting, Daughter Boards, Debug, Project management, Support, Use Modes | Comments Off

Part Deux: How many ASIC Gates does it take to fill an FPGA?

Posted by Michael Posner on 20th February 2015

Last week’s blog How many ASIC Gates does it take to fill an FPGA? definitely stirred the pot. Part Deux (two?) goes back to basics filling in some gaps and follows up with data supplied by my good friends over at Xilinx.

So first for clarification, apparently not everyone who reads my blog understands what a Xilinx Logic Cell, LC, is or what a Look Up Table, LUT, is. I was going to create a nice set of slides explaining but thanks again to Xilinx, here is one they prepared earlier.  It should be noted that Xilinx provided me with written permission to re-use these as part of my blog. The full (PUBLIC) Xilinx presentation can be found here: http://www.xilinx.com/training/downloads/what-is-the-difference-between-an-fpga-and-an-asic.pptx

The Xilinx Logic Cell basics, a cell including combinatorial logic, arithmetic logic and a register. Your RTL source code is “mapped” into these Logic Cells.

XIlinx Logic Cell

A Logic Cell is the basic building block within the Xilinx devices and there is a *lot* of these per device, 4.4 Million in the new Xilinx Virtex UltraScale VU440. Yes that’s right, they are very, very, very small.

Part of the Logic Cell is a Look Up Table which is used to implement combinatorial logic.

Xilinx LUT

My contact over at Xilinx also provided me the Xilinx used standard calculation of equivalent ASIC gates of the Xilinx devices.

–//– From Xilinx

Here are the basic principles that we’ve tried to stick to when generating ASIC gate count numbers:

  • –       6 to 24 gates per LUT (depending on the number of inputs used)
  • –       RAM bits are equivalent
  • –       Up to 100 ASIC gates per I/O;
  • –       7 gates per register

So for the VU440 here’s how this shakes down:

VU440

LUTs :

  • Minimum    6 x 2,532,960 LUTs = 15,197,760
  • Maximum 24 x 2,532,960 LUTs =  60, 791, 040 ASIC gates

IOs :  100 x 1456 = 145600 ASIC gates

Registers : 7 x 5.065,920 = 35,461,440 ASIC gates

So by this math we could have claimed anywhere from 50,804,800 to 96,398,080 ASIC gate equivalent in the VU440.

–//– End From Xilinx

So great, this passes what I call the basic “Stupid” test, as in there is sound logic and defendable data behind the calculation. It also highlights the large variance in “gate counts” which is significantly influenced by the number of inputs used as part of the look up table calculation. It’s exactly as I stated in last week’s blog, the conversion from ASIC gates to FPGA ASIC gate equivalent is design specific. Some designs will map well, some not so much.

In the material that Xilinx supplied I also spotted this slide which reinforces this point.

ASIC to FPGA mapping considerations

Specifically for Prototypers, the conclusion is very important. The Prototypers goal is to NOT modify the golden RTL source for prototyping otherwise you will not be validating a true representation of the design. Yes, some modifications are needed such as RAM’s, gated clock conversion but these can be verified as identical and are an acceptable trade-off. Outside of this the RTL is not customized for FPGA meaning that many of the dedicated resources cannot be directly mapped to. This leads to inefficient mapping of the RTL code to FPGA and is why a “fudge” factor is required in the ASIC gate equivalent calculations. The restriction is typically the Look Up Tables for combinatorial logic mapping, you run out of those before anything else. (not all the time but most of the time). By being conservative in the ASIC gate count capacity claims the vendor can ensure that expectations are met for a majority of designs

Posted in ASIC Verification, FPGA-Based Prototyping, Man Hours Savings, Milestones, Use Modes | Comments Off