Breaking The Three Laws
 
  • About

    Breaking the Three Laws is dedicated to discussing technically challenging ASIC prototyping problems and sharing solutions.
  • About the Author

    Michael (Mick) Posner joined Synopsys in 1994 and is currently Director of Product Marketing for Synopsys' FPGA-Based Prototyping Solutions. Previously, he has held various product marketing, application consultant and technical marketing manager positions at Synopsys. He holds a Bachelor Degree in Electronic and Computer Engineering from the University of Brighton, England.

Prototyping a PowerVR Series6XT GPU using an optimized flow from Synopsys

Posted by Michael Posner on July 24th, 2015

Block diagram of Imagination PowerVR Series6XT GPU

I ran across this blog on Imagination’s website, which covers details on prototyping the PowerVR Series6XT on HAPS: http://blog.imgtec.com/powervr/prototyping-a-powervr-series6xt-gpu-using-an-optimized-flow-from-synopsys

I highly recommend reviewing the material as it provides insight into not only how to prototype large GPUs but also how to quickly scale multi-FPGA prototypes.

Short blog this week as I’m off to do a little camping and when I camp I like to camp in style.

Tepui tent installed on top of my Toyota truck

I love my little retro-style teardrop camper and my tent on top of my truck. Enjoy.


Posted in Early Software Development, HW/SW Integration, In-System Software Validation, IP Validation, Man Hours Savings, Performance Optimization, Project management, Real Time Prototyping, System Validation, Use Modes | No Comments »

Xilinx UltraScale VU440 based HAPS Solution Shipping (HAPS/ProtoCompiler)

Posted by Michael Posner on July 17th, 2015

HAPS PMM Polishes HAPS Xilinx UltraScale VU440 based system during photo shoot

That’s right, Synopsys’ new FPGA-based prototyping solution of HAPS hardware with integrated ProtoCompiler software is shipping…………… to the lucky early adopters.

This week I popped into one of the Synopsys conference rooms and snapped off a couple of mobile phone pictures during the professional photo shoot of the next generation of HAPS systems utilizing the Xilinx UltraScale VU440 FPGA devices. The systems are fully operational, powered by the built-in HAPS supervisor firmware and fully integrated ProtoCompiler software. Before the systems ship off to early adopter customers we wanted to snap off a couple of pictures for our official launch materials. Above, the HAPS product marketing manager polishes one of the new systems to remove R&D fingerprints. (Note the ESD protection before doing this!)

Below is a view of the 4-FPGA and 1-FPGA versions. I didn’t like the way the 2-FPGA picture turned out, but we offer that form factor as well.

Mick snaps off a mobile phone picture of the new 4-FPGA HAPS Xilinx UltraScale VU440 based systems

Extreme close up of HAPS 1-FPGA Xilinx UltraScale VU440 based system

We also populated a set of 19” server rack frames with systems to capture the look and feel of the HAPS hardware in a remotely managed server farm scenario. We support in excess of 1.5 billion ASIC gates, which is sixty-four (64) FPGAs operating within a system-synchronous chain with an automated ProtoCompiler design flow. Sadly my mobile phone picture did not look great, really fuzzy, but I’ll see if I can get another picture to post at a later date.

It should be noted that these HAPS systems are classed as Pre-Production: Synopsys production candidate hardware with Xilinx Engineering Sample FPGAs, as that is all that is available from Xilinx at this time. Full Production systems (hardware and ProtoCompiler software) will be officially available, timed with the production availability of FPGA devices from Xilinx. As you can see though, if you want early access, Synopsys can service your need now with these pre-production systems and early access to the ProtoCompiler software.

As a refresher, the key benefits of the new fully integrated solution include but are not limited to:

  • The fastest time to operational prototype, on average 2 weeks from initial RTL, and the fastest RTL-to-Bit file flow for rapid incremental turn-around
  • Highest performance with an end-to-end timing-driven multi-FPGA flow & new high-speed TDM-based pin-multiplexing schemes
  • Global synchronous (no pin multiplexing) operation of up to 100 MHz
  • Always available, built-in debug delivers superior debug visualization over thousands of RTL-level signals
  • Global accessibility, regression farm support and multi-design capabilities
  • Modular & scalable to over 1.5B ASIC gates – 1 to 64 Xilinx UltraScale VU440 devices.
  • Preserves existing HAPS investment, interoperable with HAPS-70, mix and match design flow and hardware, same form factor, I/O voltages, HT3 connectors, daughter boards, cables
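
To put the capacity numbers above in perspective, here is a rough back-of-the-envelope sketch. It simply divides the quoted 1.5 billion ASIC gates by the quoted 64 devices to get an average per-FPGA figure. That average is my own derived assumption, not a Synopsys or Xilinx specification, and the function name is hypothetical. Real utilization depends heavily on the design and the partition.

```python
import math

# Figures quoted in the post: over 1.5 billion ASIC gates across 64 VU440 devices.
TOTAL_ASIC_GATES = 1_500_000_000
MAX_FPGAS = 64

# Derived assumption: average ASIC gates per FPGA (about 23.4 million).
GATES_PER_FPGA = TOTAL_ASIC_GATES / MAX_FPGAS


def fpgas_needed(design_gates: int) -> int:
    """Rough estimate of how many FPGAs a design of the given size needs."""
    if design_gates <= 0:
        raise ValueError("design_gates must be positive")
    count = math.ceil(design_gates / GATES_PER_FPGA)
    if count > MAX_FPGAS:
        raise ValueError("design exceeds the quoted 64-FPGA capacity")
    return count


if __name__ == "__main__":
    for gates in (20_000_000, 100_000_000, 1_500_000_000):
        print(f"{gates:>13,} gates -> {fpgas_needed(gates)} FPGA(s)")
```

Treat the result as a first guess for sizing a system; the partitioner has the final say.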

As a 2nd refresher, (or the first time for those who missed them) here is a list of my blogs around the next generation Xilinx UltraScale-Based HAPS solution, hardware and ProtoCompiler SW.

Xilinx UltraScale solution focused.

In between these blogs you will find a stack of other posts with great information (if I do say so myself).

To SUBSCRIBE use the Subscribe link in the left hand navigation bar.

Another option to subscribe is as follows:

• Go into Outlook

• Right click on “RSS Feeds”

• Click on “Add a new RSS Feed”

• Paste in the following “http://feeds.feedburner.com/synopsysoc/breaking”

• Click on “Accept” or “Yes” or whatever the dialogue box says.


Posted in UltraScale | 2 Comments »

Welcome new HAPS Connect Partners, Sunstone Circuits, Screaming Circuits and Radrix Co., Ltd

Posted by Michael Posner on July 13th, 2015

HAPS Connect Partner Services for HAPS systems

I’m back from vacation (which was great, thanks for asking) and happy to announce three new HAPS Connect partners: Sunstone Circuits, Screaming Circuits and Radrix Co., Ltd. A list of all the HAPS Connect partners can be found following this link.

Sunstone Circuits is the established leader in providing innovative and reliable printed circuit board (PCB) solutions for the electronic design industry. With over 40 years of experience in delivering high quality, on-time PCB prototypes, Sunstone Circuits is committed to delivering production-grade small-quantity PCB manufacturing. From layout to fabrication, from board test to assembly (in partnership with Screaming Circuits), Sunstone Circuits can help you build out your HAPS system quickly and effectively. http://www.sunstone.com

Screaming Circuits assembles prototype and small volume production PC boards. We’ll build as few as one board for a quick-turn prototype, and up to thousands for low-volume production. We’ll build from your kit, or, with our long-time PC board fab partner, Sunstone Circuits, and parts suppliers, such as Digi-key, Arrow, and others, we’ll handle the whole procurement and build process for you. We’ve built boards that have gone up into space, down into the ocean, and everywhere in between. ITAR and IPC Class III are available. http://www.screamingcircuits.com

Radrix Co., Ltd. is a university venture company based in Japan. Radrix provides various engineering solutions, including system design and development for the signal processing and wireless communication field. Radrix has developed chipsets, ASIC and FPGA level design and system simulators for next-generation wireless LAN 802.11n/ac and other next-generation digital communication systems. http://www.radrix.com/?lang=en

Please join me in welcoming these three new HAPS Connect Partners. The HAPS Connect Program expands the choice of HapsTrak and Multi-Gigabit board and service offerings available for HAPS systems. The HAPS Connect Program helps customers to: Develop HAPS prototypes faster by leveraging compatible daughter boards from leading industry hardware vendors; Reduce project risk by taking advantage of hardware and services from vendors with HAPS system expertise; Save on prototype development costs and resources by using products and services tailored for HAPS systems. Interested in being a Synopsys HAPS Connect Partner? Contact us for more information on how to join the HAPS Connect Program, hapsconnect@synopsys.com

I love nature and Hawaii has a lot of it. Check out this huge snail found in Maui.

Check out the size of this snail... and this was one of the small ones

These guys turned their backs to me, apparently they don’t like the Posner paparazzi.

These little animals didn't like their picture being taken


Posted in HAPS Connect Program | No Comments »

Exploratory Place and Route improves timing by up to 10%

Posted by Michael Posner on July 1st, 2015

Example of a routing hot spot in FPGA

This week I am going to discuss a new and unique capability in the Synopsys FPGA/FPGA-based prototyping tools that gets you a double whammy: it helps solve FPGA routing congestion and improves FPGA performance. The new capability is called Exploratory place and route (PAR). Oh, before I forget, I’m off on vacation for a week so no blog next week. I know this is going to break my posting flow, and even though FPGA-based prototyping is always on my mind even when I’m hiking or snorkeling or kayak surfing or just sitting on the beach relaxing, I’m going to resist the urge to post by not traveling with my laptop.

While the new FPGA devices have more routing, designs sometimes run into congestion, especially ASIC designs being prototyped, as they are not specifically designed for FPGA resources. Mitigating routing congestion is a difficult task, and runtime scales with design size and complexity. One size does not fit all: there is no single silver bullet for solving routing congestion because it has many causes. In prototyping, one way to reduce congestion is to simply partition across more FPGAs. In ProtoCompiler it’s easy to do this, but of course it comes at the cost of additional hardware. Another means of tackling the problem is to parallelize the place and route task, driving the tools with different seeds. The Synopsys Exploratory Place & Route feature exploits parallelism along with design characterization to make ‘smart’ choices for P&R configurations.

ProtoCompiler’s Exploratory Place and Route feature first scans the design and generates a signature based on the architecture. ProtoCompiler then invokes parallel Vivado P&R jobs and learns from each result so as to determine the optimum P&R settings to meet timing and route the design.

Flow for the new Synopsys exploratory place and route capability

During synthesis, the Synopsys mapper generates design characteristics and design statistics. These are used to select Vivado Place & Route configurations and launch parallel P&R jobs. The jobs are monitored, and subsequent jobs are launched as long as the Worst Negative Slack (WNS) for each individual job is < 0. Once a P&R job has WNS > 0, the monitor terminates all remaining jobs. Hey presto, you rapidly find the best P&R settings for your design. No longer do you have to rely on guesswork or default to what worked well in the past, as that strategy might not be suitable for the new design.

Synopsys Exploratory PAR results, up to 10% improvement in timing

Jobs from the set of possible configurations are run until the best result (based on the worst negative slack of the design) is achieved. The best results are written to the Place & Route directory (including .dcp files, log files etc.) along with a par_explorer.log, which contains a description of each place and route job run. Synopsys’ Exploratory Place & Route parallelizes the problem across multiple cores or multiple machines. When a satisfactory result is achieved, all remaining jobs are terminated and the successful run is saved. We have seen up to 10% timing improvement from utilizing this new capability (not represented in the pic above).
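
For readers who like to see the idea in code, here is a minimal sketch of the monitor loop described above: launch jobs in parallel, log each WNS, and stop once one job meets timing. This is not ProtoCompiler’s implementation and it does not call Vivado. The `explore` function and the `run_job` callable are hypothetical stand-ins, and I am assuming a WNS of exactly 0 counts as timing met.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def explore(configs: List[str],
            run_job: Callable[[str], float],
            max_parallel: int = 4) -> Tuple[str, float, Dict[str, float]]:
    """Run P&R jobs in parallel and stop early once one meets timing.

    run_job(config) is a hypothetical callable that returns the worst
    negative slack (WNS, in ns) for one place-and-route configuration.
    Returns (best_config, best_wns, log of every job that was recorded).
    """
    log: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        futures = {pool.submit(run_job, cfg): cfg for cfg in configs}
        for done in as_completed(futures):
            cfg = futures[done]
            log[cfg] = done.result()
            if log[cfg] >= 0:  # timing met (assumption: WNS == 0 is a pass)
                for other in futures:
                    other.cancel()  # only stops jobs that have not started yet
                break
    # If no job met timing, fall back to the least-negative WNS seen.
    best = max(log, key=log.get)
    return best, log[best], log


if __name__ == "__main__":
    # Made-up WNS values standing in for real place-and-route results.
    fake_wns = {"seed_1": -0.42, "seed_2": -0.10, "seed_3": 0.05}
    print(explore(list(fake_wns), fake_wns.get, max_parallel=2))
```

Note that `Future.cancel()` only prevents queued jobs from starting. Terminating a job that is already running, as the monitor in the post does, would need process-level control that this sketch leaves out.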

Super cool, it basically takes the guesswork out of configuring the place and route. This capability is an integral part of the complete timing driven flow delivered in ProtoCompiler, which I discussed in an earlier blog. In summary, to achieve the highest possible prototype performance you need a complete RTL to Bit file, end to end flow. ProtoCompiler is the only tool in the market able to offer this.

HAPS ProtoCompiler end-to-end timing driven flow for highest performance operation

Average results at each stage of HAPS ProtoCompiler’s timing driven flow optimizations


Posted in Performance Optimization, Use Modes | No Comments »

Intel’s FPGA-Based Prototyping presentations from SNUG Israel

Posted by Michael Posner on June 27th, 2015

Recently at SNUG in Israel I was lucky enough to attend two presentations created and delivered by Intel teams on their use of FPGA-based prototyping. The first, “Methodology and Best Practices deployed by Intel for FPGA-based prototyping”, discussed various techniques they employ to streamline the creation of an FPGA-based prototype. It’s like a mini methodology guide so I highly recommend you review the material.

http://www.synopsys.com/community/snug/pages/proceedingLp.aspx?loc=Israel&locy=2015

Intel presentation from SNUG Israel on FPGA-based Prototyping of SoC's

The second paper, titled “Large Scale IP Prototyping”, is a great example of multi-FPGA designs using Synopsys’ HAPS/ProtoCompiler solution and specifically the HAPS High Speed Time Domain Multiplexing to pass ~25K signals between FPGAs. The material presents Intel’s usage and results and again I recommend downloading and reviewing the material.

http://www.synopsys.com/community/snug/pages/proceedingLp.aspx?loc=Israel&locy=2015

Intel presentation on Large IP Prototyping using HAPS and ProtoCompiler

Oh, you need to have a Synopsys SolvNet ID to download….. Oh#2, I just noticed the proceedings are not posted yet. I am reliably informed that they will be posted shortly.

Many of you know that I travel internationally on business on a regular basis and have asked how I cope with the constant time changes. I employ two simple methods to manage jet lag: #1, no alcohol at all while traveling, which helps when you are only getting 3-5 hours of sleep, and #2, coffee.

Best jet lag #2 Coffeeeeee

Luckily while in the UK they serve up vats/buckets of coffee that require two handles to hold the weight. This is a six shot “eye opener”


Posted in ASIC Verification, Debug, Early Software Development, FPGA-Based Prototyping, FPMM Methods, Getting Started, HW/SW Integration, In-System Software Validation, IP Validation, Man Hours Savings, Milestones, Performance Optimization, Project management, System Validation, Technical, Tips and Traps, Use Modes | No Comments »

Multi-Design Use modes, Farming out UltraScale FPGA-based Prototypes (and Dead Sea Salt Cubes)

Posted by Michael Posner on June 22nd, 2015

Globalization driving the need for new prototyping usage models

Shalom! (I was in Israel this week, read below about Dead Sea Salt Cubes). One of the ever changing dynamics of electronic design is the globalization of design teams. More and more project team boundaries have been broken down with team members separated across the globe. This affects the design, verification and validation process as the teams need to share resources for efficiency and cost reasons. This is also true for FPGA-based prototyping hardware. The HAPS systems have supported global remote access for a number of generations via the Synopsys Universal Multi-Resource Bus, UMRBus.

The HAPS global access capability has enabled users to remotely use a system regardless of where they are in the world, as long as the HAPS host server can be accessed. Self-checking verification regressions are by far the most popular execution mode, where the design is loaded, a test is executed and the results are saved to the host for pass/fail confirmation. Many users use the dynamic capabilities of the UMRBus to interactively monitor and debug the systems remotely. We even have users who configure a web cam along with the system so they can view live, real time video output from the system.

In this usage mode many customers “share” a pool of HAPS systems installed in an FPGA-based prototyping server farm. This farm of HAPS systems is utilized across multiple projects for multiple levels of development, from the smaller block and IP all the way up to the largest SoC.

HAPS Multi-Design mode supporting IP, Subsystem and SoC

The key to supporting a farm based usage model is the ability to easily create FPGA images which are unique and portable across systems. When you manage a farm of prototypes you want to be able to get high utilization of the available resources, so the images need to be compatible across the farm infrastructure. HAPS ProtoCompiler supports a defined methodology and automated capabilities to enable the easy creation of these portable images. These images are single or multi-FPGA spanning designs using the differentiated HAPS capabilities such as high speed time domain multiplexing for highest performance operation, debug for highest visibility and even Hybrid Prototyping where the HAPS systems connect to the Virtualizer based Virtual Prototypes.

Using the HAPS multi-design mode you can load multiple copies of the same design image, multiple separate design images or of course a mix of these. The benefit of loading the same design image multiple times comes when you are running verification regression tests on the same design. You can get far higher regression throughput from the high performance of the HAPS prototype multiplied by the parallel instances of the design executing. This is a great usage model for those overnight regression tests. To maximize resource utilization, multiple different designs can be loaded on the available resources across multiple projects.
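
The throughput argument above is simple arithmetic, and a small sketch makes it concrete. The function name and the example numbers (120 tests at 10 minutes each) are purely hypothetical, and the model assumes every test takes the same time and every instance runs at the same speed.

```python
import math


def regression_wall_time(num_tests: int,
                         minutes_per_test: float,
                         parallel_instances: int) -> float:
    """Wall-clock minutes for a regression spread over identical design instances."""
    if parallel_instances < 1:
        raise ValueError("need at least one design instance")
    # Tests run in rounds; each round runs one test per loaded instance.
    rounds = math.ceil(num_tests / parallel_instances)
    return rounds * minutes_per_test


if __name__ == "__main__":
    for instances in (1, 2, 4):
        minutes = regression_wall_time(120, 10, instances)
        print(f"{instances} instance(s): {minutes / 60:.1f} hours")
```

Under these assumptions, four copies of the same image turn a 20 hour regression into a 5 hour one, which is the difference between missing and making the overnight window.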

The HAPS systems themselves are highly configurable which provides for a highly flexible farm deployment model either tailored for specific projects or generically in a more traditional server farm global resource.

HAPS Farm based usage models

We have seen that you can utilize the combination of the HAPS ProtoCompiler with HAPS flexible routing interconnect to tailor the hardware and partition to a specific SoC architecture to achieve the highest performance operation. ProtoCompiler determines the best routing interconnect for HAPS systems and this topology is deployed in the farm. The project(s) design iterations with similar architecture use the same hardware configuration and all subsequent iterations are targeted at this “locked” configuration. Highest performance is achieved. There would be stacks of systems configured for specific hardware architectures. Even better, under peak project demands or emergency schedule requirements the systems can be reconfigured and redeployed rapidly to support a specific project’s increased need.

Some users prefer to “lock” the hardware architecture across all hardware platforms for easy replication and management in the server farm. This means they configure all the hardware in exactly the same way: same routing, same daughter boards, cookie cutter replication. HAPS and ProtoCompiler also support this deployment model. In this mode of usage ProtoCompiler is fed the locked configuration and then it will “fit” the design to the set architecture. This mode has the advantage of easy hardware replication and management in the farm but could come at the expense of a little performance if the design cannot be efficiently partitioned across the set hardware topology. We typically see a mix of these deployment modes being used. Some projects insist on the highest performance, others either achieve the performance they require or have lax performance needs.

Don’t forget, with HAPS you get the best of both worlds because the software and systems are highly flexible. Both tailoring for the highest performance and cookie cutter replication are supported. When the project is finished the systems can be reconfigured and redeployed to match the needs of the next set of projects for the highest return on investment.

Finally and sometimes the only reason folks read my blog, the off topic section of my blog. During my visit to Israel I was lucky enough to spend a weekend in this beautiful country and take in a little relaxation personal time. I’ve visited Israel many times before but usually it’s in and out with only business. This time I got a whole two days for personal time between business trips. I chose to spend one day enjoying the beach, hiking and swimming, (pic below of the beach) and the other to go on a quest to the Dead Sea. This was no normal tour, this was to be a search for the elusive Dead Sea Salt Cube which the local Synopsys Israeli team thought was a myth.

View of the beach in Israel

The Dead Sea Salt Cubes are a natural phenomenon where the salt of the Dead Sea crystallizes in almost perfect cubes, naturally. Below is a picture from the internet of the so-called Dead Sea Salt Cubes.

Dead Sea Salt Cubes picture from the internet

Impossible, you think, right? Cubes don’t form in nature. I did my research up front and the science is supportive of the phenomenon: salt crystals form in a unified positive/negative architecture resulting in cubes. But could this really happen out in the wild? The Dead Sea is the only place where the salinity of the water is high enough for such a phenomenon to be possible, so I went on a search. The first challenge was finding a private tour guide willing to drive me to various locations around the Dead Sea and not the typical tourist traps. At a high cost I found a 65 year old guide who was happy to do something out of the ordinary (even though he thought that the whole salt cube phenomenon was fake and made up on the internet, but he was happy to take my money).

Well, I am pleased to announce that the phenomenon is proven plausible: I found evidence (see below) of this elusive formation. OK, so it’s not the perfect cube seen in the online video and pictures, but it still proves that it’s possible for cubes with straight lines to form in nature. Amazing!!! The tour guide was astounded and he was very happy that I taught him something new about the Dead Sea after such a long time as a tour guide.

Mick Finds Dead Sea Salt Cube out in the wild

If for some reason you go looking for the Dead Sea salt cube, a word of warning. The Dead Sea has been shrinking at an alarming rate and the edge of the water is getting far away from the road and resorts. Sink holes are a huge danger. This is where water dissolves the subterranean layers and the upper layers of ground literally fall into the earth. There is significant sink hole danger if you choose to go outside the resorts of the Dead Sea. All along the Northern region down to the Southern tip are warning signs and visible landmarks of the sink holes. There is even what looks to have been a beautiful Dead Sea resort which is now closed, and you can see sink holes all around it as well as parts of the buildings subsided into large holes. It’s prohibited to venture to the East of the road that runs parallel to the Dead Sea (Israeli side) outside of the official tourist spots. The cube formation pictured above was not found at the run of the mill tourist spot where people float around in the water and cover themselves in mud.

Dead Sea Chunk

My tour guide got very excited by my finding and started talking to the local folks who work and live by the Dead Sea. What we found out is that I was very lucky to find a cube in the summer months as they typically only occur in the colder winter months. This one was up a little from the shore in a dry area. The hotter water temperatures of the summer months dissolve the cubes and no new cubes form. The best time to find the cubes is in the winter months when the water is colder, which generates the right conditions for the salt crystal cubes to form. However, it’s not that simple. It has to be the right conditions, cold but not too cold, wet but not too wet, so the cubes don’t form every winter. Every time I visit Israel from now on in the winter period I plan to continue my quest for the Dead Sea Salt cube.

Mick finds other Dead Sea salt formations

While I only found a small number of cube formations I did find a load of other amazing salt crystal formations. These are actually more beautiful than the cube but I am an engineer and the idea of cubes forming in nature is just plain amazing.

If you like this or other previous posts, send this URL to your friends and tell them to Subscribe to this Blog.


Posted in UltraScale | No Comments »

Highest Performance Xilinx UltraScale-based Prototypes

Posted by Michael Posner on June 12th, 2015

High Frequency Operation

While individually the Xilinx UltraScale VU440 devices deliver increased performance (Xilinx quotes the (-1) speed grade as having the same logic performance as the Xilinx Virtex-7 2000T (-2) speed grade), this unfortunately has very little effect on multi-FPGA prototype performance. The reason is that the performance bottleneck is not the speed of the logic in the FPGA device itself; it’s the overall multi-FPGA interconnectivity. We like to call this the pin-multiplexing bottleneck. Take the simple example below where an SoC is partitioned across multiple FPGAs.

Representation of IO bottleneck after multi-fpga partition

In an FPGA-based prototype you are limited by physical IO, so when the number of signals that need to pass between FPGAs exceeds the number of physical IOs you need to utilize pin-multiplexing to share IOs. The higher the pin-multiplexing ratio, the lower the overall system frequency achievable. This blog is all about minimizing the pin-multiplexing ratio.

To achieve the highest FPGA-based prototyping system performance you first need to start with a timing driven implementation flow. The HAPS ProtoCompiler tool delivers an end-to-end timing driven flow for this purpose. It’s critical to have timing driven capabilities at each stage of the design flow otherwise you have an uncontrollable open loop resulting in sub-optimal results. HAPS ProtoCompiler delivers timing driven capabilities in the area of partitioning, system route, system level timing analysis, system level time budgeting, FPGA synthesis and optimization and finally forward constraint generation to guide FPGA place and route.

HAPS ProtoCompiler end-to-end timing driven flow for highest performance operation

The HAPS ProtoCompiler timing driven flow addresses the need for speed at each level of the flow:-

  • Partition: Reduce the number and length of multi-hop paths
  • System route: Optimizes the total path & pin mux ratios
  • System level timing analysis: Provide early and accurate performance estimates
  • System level timing budgeting: Convert system-level constraints into timing constraints optimized for individual FPGAs
  • FPGA Synthesis & Optimization: Improve performance by reducing route congestion. Faster TaT from distributed compile & mapping
  • Guided & Optimized P&R: Pass FPGA constraints to Vivado for predictability & best performance

And the results speak for themselves:-

Average results at each stage of HAPS ProtoCompiler’s timing driven flow optimizations

HAPS ProtoCompiler delivers end to end timing optimization resulting in the highest performance operation. The main value of FPGA-based prototyping is accuracy, real world IO and performance which is needed to run stacks of software and accelerate the execution of regression tests. The Synopsys FPGA-based prototyping R&D team is relentless in their quest for the highest performance operation and with the introduction of the HAPS next generation UltraScale systems they turned the dial to 11. (I love the movie that this reference comes from). The engineers identified a bottleneck within pin-multiplexing and set out to address it.

Case study, IO bottleneck limits performance in multi-fpga prototype

The above picture describes the scenario. This was an actual customer case study (executed on HAPS-70 systems) highlighting the bottleneck. This part of the design was a closely coupled subsystem with over 11,000 signals required to cross between two FPGAs. The signals are split across two clock groups, one clock which is required to be greater than 10 MHz and the other a slower clock, sub-2 MHz. 11,040 signals across 480 IOs (240 differential pairs) results in a mux ratio of 46 (11,040 / 240) required to pass all the signals. Using the HAPS High Speed Time-Domain Multiplexing capability it was easy to meet and beat the 10 MHz performance goal. The HAPS HSTDMx48 ratio delivers over 11 MHz system operation. The customer was very happy with this high performance result. As you can see, it’s the number of physical IOs relative to the number of signals, and hence the pin-multiplexing ratio, that dictates the overall system performance.

One way to increase performance would be to apply more IO to the link, and HAPS has the greatest flexibility to enable this. However, in this design no more IO was available as the other HT3 connectors were populated with connections to other FPGAs and daughter boards. The modest increase in IO from the Xilinx UltraScale FPGAs does not change this situation in any significant fashion.

When we developed the HAPS next generation systems we built in the capability to deliver increased virtual IO to address the needs of designs just like this. With the new HAPS systems we built in dedicated Multi-Gigabit (MGB) interconnect routes and have developed new High Speed Time-Domain Multiplexing capabilities in ProtoCompiler to optimize its usage and automate the seamless deployment.

How HAPS and ProtoCompiler solves the IO bottleneck challenge. Offload slower clock group signals onto dedicated multi-gigabit TDM bus

HAPS ProtoCompiler is utilized to help prioritize the signals in the faster clock group. The signals in the slower clock group are offloaded onto the MGBTDM paths, which frees up valuable IOs. Now the situation has changed dramatically. As you have offloaded over 5500 signals, you are left with the signals in the fast clock group, which can utilize the same 480 IOs available. Now the calculation of pin-mux ratio is 5520 signals across 480 IOs (240 differential pairs), which results in a ratio of 23 (5520 / 240) to pass all the signals. The ratio is significantly reduced. The result: system performance increased to over 15 MHz, which is a 36% improvement.
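
The case study arithmetic can be checked with a few lines of Python. The function name is my own, and I use 11 MHz and 15 MHz as stand-ins for the “over 11 MHz” and “over 15 MHz” figures quoted above, so treat the percentage as approximate.

```python
import math


def pin_mux_ratio(signals: int, differential_pairs: int) -> int:
    """Signals that must share each differential pair, rounded up."""
    if differential_pairs < 1:
        raise ValueError("need at least one differential pair")
    return math.ceil(signals / differential_pairs)


# Before: all 11,040 signals share the 240 differential pairs (480 IOs).
ratio_before = pin_mux_ratio(11_040, 240)

# After: the slow clock group moves to the MGBTDM links, leaving 5,520 signals.
ratio_after = pin_mux_ratio(5_520, 240)

# Approximate system frequencies quoted in the post, in MHz.
freq_before_mhz = 11
freq_after_mhz = 15
improvement_percent = (freq_after_mhz - freq_before_mhz) / freq_before_mhz * 100

if __name__ == "__main__":
    print(f"mux ratio before offload: {ratio_before}")
    print(f"mux ratio after offload:  {ratio_after}")
    print(f"frequency improvement:    {improvement_percent:.0f}%")
```

Notice that halving the ratio from 46 to 23 did not double the frequency in this case study. The quoted gain is about 36%, so the ratio is a strong lever on performance but not the only factor.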

And don’t worry, the signals that utilize the MGBTDM links are still passed synchronously to the design maintaining the fidelity of the source design.

If you want the highest system performance it’s critical that you have an end-to-end, timing-driven implementation flow in addition to specific differentiated capabilities to manage pin multiplexing requirements. HAPS with integrated ProtoCompiler delivers both.

If you like this or other previous posts, send this URL to your friends and tell them to Subscribe to this Blog.

To SUBSCRIBE use the Subscribe link in the left hand navigation bar.

Another option to subscribe is as follows:

• Go into Outlook

• Right click on “RSS Feeds”

• Click on “Add a new RSS Feed”

• Paste in the following “http://feeds.feedburner.com/synopsysoc/breaking”

• Click on “Accept” or “Yes” or whatever the dialogue box says.

Off subject, a number of people asked what I had been up to lately in my spare time. Well I never have any spare time because I am always building stuff.

First was a project I started a while ago and finally finished: a video game console in a briefcase. You can play over 900 1990s-style games in vertical or horizontal mode. Everything packs up into this little briefcase and is 12V battery powered for anywhere usage. This is the second generation of such an idea; the first was in a larger box.

Briefcase closed

Mick Built Toys - Gaming console in a briefcase

Briefcase opened

Looking inside Mick built toys gaming console in a briefcase

Playtime

Mick built toys gaming console in a briefcase in action

I’ve also been working on some deck projects. The first was a built-in cabinet box for under our garden window; it’s the cedar built-in seen on the right hand side of the picture below. It opens up and becomes a table for when we have parties as well as being a storage area. The second project was a large rolling storage box; you can just see it at the end of the picture.

Micks deck projects

Finally, I had some scrap left over from the deck box projects and I hate to waste, so I turned the scrap into a set of bat boxes. If you read one of my recent blogs, you will know that I am a huge bat fan. These bat boxes will help ensure that our local bats have somewhere to hang out (pun intended).

Mick built bat boxes, Mick loves bats!

Posted in Performance Optimization, UltraScale | Comments Off

See Xilinx UltraScale VU440 based HAPS at DAC

Posted by Michael Posner on June 5th, 2015

HAPS UltraScale based system

Come talk to Synopsys about the Xilinx UltraScale VU440 based HAPS next generation system solution at DAC 2015.

Another view of HAPS UltraScale based system

Being successful with FPGA-based prototyping requires a combined solution including enterprise class hardware and prototyping implementation software supporting IP through SoC development. The new HAPS solution delivers:

  • The fastest time to operational prototype, on average 2 weeks from initial RTL, and the fastest RTL-to-Bit file flow for rapid incremental turn-around
  • Highest performance from an end-to-end timing-driven implementation flow & new high-speed TDM-based pin-multiplexing schemes
  • Always available, built-in debug delivers minimally-intrusive, superior debug visualization over thousands of RTL-level signals
  • Global accessibility, regression farm support and multi-design capabilities
  • Modular & scalable to over 1.5B ASIC gates – 1 to 64 Xilinx UltraScale VU440 devices.
  • Preserves existing HAPS investment, interoperable with HAPS-70, mix and match design flow and hardware, same form factor, I/O voltages, HT3 connectors, daughter boards, cables

Some of my recent blogs on the new HAPS systems

Remember, boards don’t solve the challenges of FPGA-based prototyping; Synopsys’ comprehensive, integrated HAPS FPGA-based prototyping solution does. Just check out the latest FPMM survey results. Mapping ASIC to FPGA is still the #1 challenge. This is why an integrated solution is needed. If you can’t get to a bit file, you don’t need hardware.

FPMM Survey Results 2015 data

Looking forward to seeing you at DAC 2015. Last eye candy picture of this blog. (Of course I realize that I’m only posting pictures of the hardware and not the software. Let’s face it, if you have seen one GUI you have seen them all; hardware is just more interesting to look at. I’ll try and post some pictures of the sexy HAPS ProtoCompiler front end in future blogs.)

Side dock of HAPS UltraScale based system

Posted in UltraScale | Comments Off

Xilinx UltraScale VU440 Integrated Design Implementation and Debug

Posted by Michael Posner on May 30th, 2015

HAPS System with Xilinx UltraScale VU440 devices in Synopsys lab

Pictured in the Synopsys lab, above, is one of the fully operational next generation HAPS systems. I was asked multiple times this week why Synopsys has not publicly announced the systems when the hardware is fully operational. There are a number of factors behind this, the most important being that hardware is only a fraction of the challenge of FPGA-based prototyping. You cannot be successful without an implementation tool flow, and that tool flow must be tested against the real hardware. We will announce when the complete solution is ready to go and can make customers immediately productive. That said, if you want early access to the HAPS ProtoCompiler software tool set and HAPS hardware with engineering sample Xilinx FPGAs then contact me or your local Synopsys representative. We are already collaborating with over 20 customers in preparation for full availability.

Synopsys is ahead of the curve in all areas of product development: hardware functionality, software tool flow and IP. The hardware is ready and we are doing final characterization and integrated feature testing; these are the capabilities that are built into the hardware and deployed via the software and IP flow. Of course, just like everyone else, this testing is being done using engineering sample FPGA silicon, as production Xilinx UltraScale VU440 devices are not available until late in the year. HAPS ProtoCompiler is operational, with the main focus of testing again being the integrated capabilities such as always available debug, new pin-multiplexing capabilities which can improve system performance by up to 50% or more (I’ll blog on this new feature over the next couple of weeks) and the timing driven flow, which is hardware aware and dependable as it’s based on the timing characteristics of the actual hardware and accessories.

Talking about integration, one of the existing HAPS ProtoCompiler capabilities which I’ve never talked about is the simple abstraction from FPGA pins to HAPS capabilities. This abstraction means that the engineer only needs to care about the HAPS hardware specific IOs and capabilities and does not have to be an FPGA expert. For example:

1. Top-level pins use HAPS names (applicable to HAPS-70 and HAPS-80):

define_haps_io {p:clk} -haps_io {GCLKP[1]} automatically assigns the port “clk” to HAPS GCLK 1. HAPS ProtoCompiler automatically inserts any required buffers, IO standard and constraints.

2. GPIO/LED pins can use virtual IO names (applicable ONLY to HAPS-80):

define_haps_io {p:red[1:0]} -haps_io {A_LED_RED[2:1]} means there is no need to manually work out where the LED is connected; HAPS ProtoCompiler automatically maps to it.

3. Once you are done, the HAPS ProtoCompiler IO Report provides detailed information about RTL names, HAPS names and equivalent Xilinx pin names.
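If you keep your port-to-HAPS mapping in a script or spreadsheet, lines in the form shown above are easy to generate. The Python helper below is a hypothetical convenience script, not part of HAPS ProtoCompiler; it only reproduces the textual form of the two `define_haps_io` examples above, and the function name and the mapping dictionary are made up for illustration.

```python
def haps_io_constraint(port: str, haps_io: str) -> str:
    """Format one constraint line in the style of the examples above.

    Doubled braces in the f-string emit literal '{' and '}' characters.
    """
    return f"define_haps_io {{p:{port}}} -haps_io {{{haps_io}}}"


# RTL port name -> HAPS IO name, taken from the two examples in this post.
port_map = {
    "clk": "GCLKP[1]",
    "red[1:0]": "A_LED_RED[2:1]",
}

for port, haps_io in port_map.items():
    print(haps_io_constraint(port, haps_io))
# define_haps_io {p:clk} -haps_io {GCLKP[1]}
# define_haps_io {p:red[1:0]} -haps_io {A_LED_RED[2:1]}
```

Check the generated lines against your own tool documentation before using them in a real constraint file.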

In the below screen shots you can see both the HAPS ProtoCompiler implementation GUI and runtime analysis GUI. In the implementation GUI I have highlighted the HAPS-IO abstraction and the debug instrumentor. You insert debug monitors and watchpoints using the source RTL just like you do with a normal RTL simulator.

HAPS ProtoCompiler for Xilinx UltraScale VU440 design implementation GUI

In the runtime GUI you can view the debug data overlaid on the original RTL source, regardless of partition. In this screen shot we are also exercising the Universal Multi-Resource Bus, UMRBus, capability. (I just noticed that the UMRBus report strings still say HAPS-70, I assure you this design is running on the new systems. Hey, I said we are still developing and testing the software)

HAPS ProtoCompiler for Xilinx UltraScale VU440 Design Analysis GUI

Do you want to talk to me at DAC? If yes, shoot me a note, comment or email and we can set up a time to meet.

Posted in Admin and General, UltraScale | Comments Off

Maximizing Debug Visibility in Xilinx UltraScale FPGA-Based Prototypes

Posted by Michael Posner on May 22nd, 2015

While the Xilinx UltraScale VU440 FPGA device looks to rocket FPGA-based prototyping forward in respect to ASIC gate capacity, it sadly does nothing to help you debug your prototype. If anything it amplifies the engineer’s debug challenge by enabling huge volumes of RTL to be modeled. The HAPS and ProtoCompiler integrated solution debugging capabilities have revolutionized FPGA-based prototype debug but engineers still want more. Proof of this can be seen below in the contrast between the 2012-2013 and 2013-2014 FPMM survey data. The question asked is to rate the most challenging aspect of FPGA-based prototyping. You can see that the priority on debug increased significantly in the 2013-2014 timeframe over the 2012-2013 range.

This is a long blog but I hope you read to the end as it’s filled with great technical data and a preview of what’s coming in the HAPS next generation systems.

Synopsys FPMM Survey data from 2012-2013

Synopsys FPMM Survey data 2013 to 2014

This shift could partly be due to HAPS ProtoCompiler successfully solving the main 2012-2013 challenge of Time to Prototype. It’s logical that the quicker you can get your DUT onto the prototype, the sooner you need debug. HAPS ProtoCompiler can generate results in as little as 5 days from initial RTL drop, and incremental FPGA images overnight, making HAPS prototypes highly productive. The other factor driving the increased need for debug is that the overall size (in ASIC gates and number of FPGAs) of the FPGA-based platform has increased significantly. Whereas 1, 2 and 4 FPGA platforms used to be the mainstream maximum sized prototype, this has been superseded by prototypes of 8, 12 or more FPGAs (we even have customers using HAPS systems consisting of 32 FPGAs) thanks to the HAPS seamless scalability and the modularity of the HAPS ProtoCompiler tool flow easing the modeling of the SoC DUT. When you add the Xilinx UltraScale VU440 device into the mix, with 2.2x the capacity of the Virtex-7 2000T, you can understand why debug is a priority for prototypers.

Debug must no longer be treated as an afterthought, yet I see a lot of this when I visit and chat with prototypers. HAPS ProtoCompiler has helped as it moved some decision making on debug to the top of the flow, ensuring that engineers don’t just focus on partitioning and implementation. Always accounting for debug is the key to success.

Here is a simple example of the impact of not accounting for debug upfront. In our picture below the SoC has been partitioned and implemented across four FPGAs. Note that debug was NOT considered when the partition was created. (I should note, this is a real life case and the engineer does not use HAPS or ProtoCompiler.) You can see the clouds of logic and the arrows of interconnect between FPGAs.

Traditional prototype without debug

However, the engineer needed to debug (of course) their DUT so they then tried to force fit debug into the partitioned design. Unfortunately, as they had not accounted for debug when they did the original partition, the addition of the debug logic, even though it was pretty light, still forced the partition and interconnect between FPGAs to change. In the picture below you can see that the logic in one FPGA was over-utilized and had to be split into another FPGA. This is hugely invasive to the prototype. Basically the engineer had to begin from scratch: re-partition the design, then re-do the interconnect and pin multiplexing. It was horrible.

Traditional prototype with debug inserted as an after-thought

Synopsys does not want other prototypers to face this challenge in the future so we have solved it as part of our next generation solution. In essence, a debug infrastructure which has the capability to capture thousands of debug signal bits per FPGA at prototype platform speed is built directly into the HAPS hardware, and the HAPS ProtoCompiler tool flow automates the implementation, ensuring it’s seamless and minimally intrusive to the user DUT. As with the solution today, engineers view the debug data in familiar RTL and waveform formats compatible with Synopsys’ Verification Continuum tool set.

HAPS Next Generation built-in, always available debug

Dedicated debug data storage memory and a dedicated debug interconnect communication scheme are built into the HAPS hardware, so you are not utilizing FPGA embedded memory blocks or precious FPGA standard IOs for debug. The HAPS debug IP infrastructure is pre-compiled and instantiated automatically by HAPS ProtoCompiler, making the addition of debug almost seamless to the user. As it’s pre-compiled and validated, we ensure that timing within the debug hub IP and infrastructure is always met. The design interface to the debug interconnect infrastructure is asynchronous and pipelined, minimizing any effect on timing closure of the user’s design and performance constraints.

HAPS Next Generation built-in debug infrastructure

As the debug infrastructure is automatically and always accounted for during the partition and implementation phase within HAPS ProtoCompiler, it’s minimally invasive and overall seamless to the engineers. This is in respect to both the debug logic IP and the debug interconnect communication, as that is all handled via dedicated routes. As noted above, these dedicated debug routes are not FPGA standard IOs. FPGA standard IOs are prioritized for user defined interconnect between FPGAs in the same flexible HAPS manner, maximizing system performance.

HAPS Next Generation built-in debug, minimally intrusive to DUT

Finally, this new solution is modular and scalable across systems, so as your prototype scales from 4 to 8 to 12 or more FPGAs, so does the debug. There was a little preview of this in last week’s blog when I posted a picture of one of the new systems. The systems include an external debug panel that enables debug to be chained across systems. Simply connect the dedicated debug routes between systems and the HAPS ProtoCompiler tool flow does the rest by providing a unified debug view back to the RTL golden source code regardless of the number of FPGAs in the prototype partition.

HAPS Next Generation built-in debug multi-system seamless modularity and scalability

There we go. Built-in, always available, minimally intrusive debug where the data is presented in the context of the original source RTL for a simulator-like debug experience. Don’t forget, you still have other great HAPS and HAPS ProtoCompiler debug capabilities such as Real Time Debug, where design signals are routed seamlessly to a daughter board for capture using a logic analyzer. You can also create custom debug capabilities utilizing the HAPS Universal Multi-Resource Bus, UMRBus.

I hope you found this blog interesting. To get my blogs sent directly to you subscribe now.

I’m traveling again next week so I might be tardy in respect to the next blog. Stick with me though as I have some more exciting progress and capabilities to share with you.

Oh, if you liked this blog, post a comment and let me know. If you didn’t like this blog, post a comment and let me know that as well. You only improve with feedback and I welcome it. Of course this is blog post ~150, so you might have wanted to post a comment sooner if you didn’t like what I write.

Posted in Debug, UltraScale | Comments Off