Breaking The Three Laws


Highest Performance Xilinx UltraScale-based Prototypes

High Frequency Operation

Individually, the Xilinx UltraScale VU440 devices deliver increased performance: Xilinx quotes the (-1) speed grade as having the same logic performance as the Xilinx Virtex-7 2000T (-2) speed grade. Unfortunately, this has very little effect on multi-FPGA prototype performance. The reason is that the performance bottleneck is not the speed of the logic in the FPGA device itself but the overall multi-FPGA interconnectivity, which we like to call the pin-multiplexing bottleneck. Take the simple example below, where an SoC is partitioned across multiple FPGAs.

Representation of IO bottleneck after multi-FPGA partition

An FPGA-based prototype is limited by physical IO, so when the number of signals that need to pass between FPGAs exceeds the number of physical IOs, you need to use pin-multiplexing to share IOs. The higher the pin-multiplexing ratio, the lower the achievable overall system frequency. This blog is all about minimizing the pin-multiplexing ratio.

To achieve the highest FPGA-based prototyping system performance, you first need to start with a timing-driven implementation flow. The HAPS ProtoCompiler tool delivers an end-to-end timing-driven flow for this purpose. It's critical to have timing-driven capabilities at each stage of the design flow; otherwise you have an uncontrollable open loop that produces sub-optimal results. HAPS ProtoCompiler delivers timing-driven capabilities in partitioning, system route, system-level timing analysis, system-level time budgeting, FPGA synthesis and optimization, and finally forward constraint generation to guide FPGA place and route.

HAPS ProtoCompiler end-to-end timing driven flow for highest performance operation

The HAPS ProtoCompiler timing driven flow addresses the need for speed at each level of the flow:-

  • Partition: Reduce the number and length of multi-hop paths
  • System route: Optimize the total path & pin mux ratios
  • System level timing analysis: Provide early and accurate performance estimates
  • System level timing budgeting: Convert system-level constraints into timing constraints optimized for individual FPGAs
  • FPGA Synthesis & Optimization: Improve performance by reducing route congestion. Get faster turnaround time (TaT) from distributed compile & mapping
  • Guided & Optimized P&R: Pass FPGA constraints to Vivado for predictability & best performance

And the results speak for themselves:-

Average results at each stage of HAPS ProtoCompiler's timing-driven flow optimizations

HAPS ProtoCompiler delivers end-to-end timing optimization, resulting in the highest performance operation. The main value of FPGA-based prototyping is accuracy, real-world IO and performance, which are needed to run stacks of software and accelerate the execution of regression tests. The Synopsys FPGA-based prototyping R&D team is relentless in their quest for the highest performance operation, and with the introduction of the HAPS next generation UltraScale systems they turned the dial to 11. (I love the movie that this reference comes from.) The engineers identified a bottleneck within pin-multiplexing and set out to address it.

Case study: IO bottleneck limits performance in multi-FPGA prototype

The picture above describes the scenario: an actual customer case study (executed on HAPS-70 systems) that highlights the bottleneck. This part of the design was a closely coupled subsystem with over 11,000 signals required to cross between two FPGAs. The signals are split across two clock groups: one clock is required to run faster than 10 MHz, and the other is a slower, sub-2 MHz clock. Passing 11,040 signals over 480 IOs (240 differential pairs) requires a mux ratio of 46. Using the HAPS High Speed Time-Domain Multiplexing capability it was easy to meet and beat the 10 MHz performance goal: the HAPS HSTDMx48 ratio delivers over 11 MHz system operation. The customer was very happy with this high performance result. As you can see, it's the number of signals relative to the number of physical IOs, that is, the pin-multiplexing ratio, which dictates the overall system performance.
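
The arithmetic behind that ratio can be sketched in a few lines of Python. This is only an illustration of the numbers quoted above, not anything ProtoCompiler exposes: the function name `required_mux_ratio` is made up for this example, and it assumes each differential pair carries one multiplexed stream.

```python
def required_mux_ratio(signals: int, io_pairs: int) -> int:
    """Smallest whole number of signals each differential pair must carry."""
    if signals < 0 or io_pairs <= 0:
        raise ValueError("signals must be >= 0 and io_pairs must be > 0")
    return -(-signals // io_pairs)  # integer ceiling division


signals = 11_040      # signals crossing between the two FPGAs (from the case study)
io_pairs = 480 // 2   # 480 IOs used as 240 differential pairs
selected_ratio = 48   # the HSTDM ratio named in the case study

required = required_mux_ratio(signals, io_pairs)
capacity = io_pairs * selected_ratio  # signals that fit at the selected ratio

print(required)            # 46
print(capacity - signals)  # 480 unused multiplexed slots at x48
```

In other words, a ratio of 46 is the minimum needed, and the x48 setting covers it with a little headroom.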

One way to increase performance would be to apply more IO to the link, and HAPS has the greatest flexibility to enable this. However, in this design no more IO was available, as the other HT3 connectors were populated with connections to other FPGAs and daughter boards. The modest increase in IO from the Xilinx UltraScale FPGAs does not change this situation in any significant fashion.

When we developed the HAPS next generation systems, we built in the capability to deliver increased virtual IO to address the needs of designs just like this. With the new HAPS systems we built in dedicated Multi-Gigabit (MGB) interconnect routes and have developed new High Speed Time-Domain Multiplexing capabilities in ProtoCompiler to optimize their usage and automate seamless deployment.

How HAPS and ProtoCompiler solve the IO bottleneck challenge: offload slower clock group signals onto dedicated multi-gigabit TDM bus

HAPS ProtoCompiler is used to prioritize the signals in the faster clock group. The signals in the slower clock group are offloaded onto the MGBTDM paths, which frees up valuable IOs. Now the situation has changed dramatically. Having offloaded over 5,500 signals, you are left with the signals in the fast clock group, which can use the same 480 IOs. The pin-mux calculation is now 5,520 signals across 480 IOs (240 differential pairs), which results in a ratio of 23 to pass all the signals, half the original ratio. The result: system performance increased to over 15 MHz, which is a 36% improvement.
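
For anyone who wants to check the before-and-after numbers, here is a rough Python sketch. The helper names are invented for this example, and the 11 MHz and 15 MHz figures are the rounded "over 11" and "over 15" values quoted in this post, so treat the percentage as approximate.

```python
def mux_ratio(signals: int, io_pairs: int) -> int:
    """Signals each differential pair must carry, rounded up."""
    return -(-signals // io_pairs)  # integer ceiling division


def percent_gain(before_mhz: float, after_mhz: float) -> float:
    """Relative improvement of after_mhz over before_mhz, in percent."""
    return (after_mhz - before_mhz) / before_mhz * 100.0


total_signals = 11_040
fast_signals = 5_520                          # stay on the 240 differential pairs
slow_signals = total_signals - fast_signals   # offloaded to the MGBTDM links
io_pairs = 240

ratio_before = mux_ratio(total_signals, io_pairs)
ratio_after = mux_ratio(fast_signals, io_pairs)
gain = percent_gain(11.0, 15.0)               # rounded frequencies, an approximation

print(ratio_before, ratio_after, round(gain))  # 46 23 36
```

Note that halving the ratio did not double the frequency in this case study; the sketch only reproduces the quoted figures and does not attempt to model the relationship between ratio and system clock.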

And don’t worry, the signals that utilize the MGBTDM links are still passed synchronously to the design maintaining the fidelity of the source design.

If you want the highest system performance it’s critical that you have an end to end timing driven implementation flow in addition to specific differentiated capabilities to manage pin multiplexing requirements. HAPS with integrated ProtoCompiler delivers both.

Off subject, a number of people asked what I had been up to lately in my spare time. Well I never have any spare time because I am always building stuff.

First was a project I started a while ago and finally finished: a video game console in a briefcase. You can play over 900 1990s-style games in vertical or horizontal mode. Everything packs up into this little briefcase and is 12 V battery powered for use anywhere. This is the second generation of the idea; the first was in a larger box.

Briefcase closed

Mick Built Toys - Gaming console in a briefcase

Briefcase opened

Looking inside Mick built toys gaming console in a briefcase


Mick built toys gaming console in a briefcase in action

I’ve also been working on some deck projects. The first was a built-in cabinet box for under our garden window; it’s the cedar built-in seen on the right-hand side of the picture below. It opens up and becomes a table for when we have parties, as well as being a storage area. The second project was a large rolling storage box; you can just see it at the end of the picture.

Mick's deck projects

Finally, I had some scrap left over from the deck box projects and I hate to waste, so I turned the scrap into a set of bat boxes. If you read one of my recent blogs, you will know that I am a huge bat fan. These bat boxes will help ensure that our local bats have somewhere to hang out (pun intended).

Mick built bat boxes, Mick loves bats!
