Breaking The Three Laws


Crossing the Boundaries (Combinatorially)

I looked it up in Webster; yes, “combinatorially” is a real adverb.

This post is about the added challenge of partitioning non-sequential logic.

What’s that I hear you say? “We should partition only at block boundaries and all of our blocks have FFs at the ins and outs”. If so, then you won’t have any combinatorial boundaries to worry about; you can advance to GO and collect £200 (or $200 but that’s not as much). I’m betting that there are others, however, who receive less optimal RTL and need to get the design onto the board any which way they can; am I right? For you, dear reader, here are a couple of thoughts...

First, let’s answer the Partition Poser from the previous post because it is related; we don’t just throw these blogs together, you know (grin). As Minh-Duc Doan of Lantiq posted, the best solution is to split the Mux into three smaller muxes, as shown in the diagram below.

This reduces the number of IO pins and traces between the FPGAs by around 2n (where n is the width of the mux). As it happens, our own Certify tool splits muxes like this automatically (along with the associated control logic). Drop me or your friendly Synopsys prototyper a line if you’d like to learn more.

As Keith at Octera pointed out, though, we still have the problem of constraining the combinatorial paths. How might we do that, not just for this example but for any combinatorial boundary in general?

A first-pass approximation is to halve the clock period of the destination FF and apply that as an IO constraint. This is very rough but at least it’s better than nothing (nothing is interpreted as a WHOLE clock period when it comes to synthesis and P&R, which is obviously not valid). Of course, it is much better to know how the combinatorial path delay is shared between the source and destination FPGAs, and to set IO constraints accordingly.

Timing budgeting provides more accurate IO constraints for P&R tools.

The timing engine in Certify estimates the components of the path delay in the source and destination FPGAs. If provided in the board description, it will also use the on-board flight time.

Consider the example below . . .

In this example, the estimates have been calculated as shown: 22ns in the source FPGA and 5ns in the destination FPGA. There is also 3ns allocated as on-board flight time. The timing constraint for the whole FF-to-FF path is given as 40ns in this example. We can see, then, that the total path delay of 22 + 3 + 5 = 30ns meets the overall 40ns constraint. So far, so good, but what IO constraint should be passed on to the P&R tool runs for each FPGA?

If we were to use the half-clock default mentioned above, i.e. 20ns, then we would be over-constraining the path in FPGA A (20ns allowed against a 22ns estimate) but dangerously under-constraining the path in FPGA B (20ns allowed against a 5ns estimate).
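To put numbers on that, here is a minimal Python sketch using the figures from the example above. The function and variable names are made up for illustration; this is just the arithmetic, not anything a tool actually runs.

```python
# Figures from the example in the text, all in ns.
CLOCK_CONSTRAINT = 40.0  # FF-to-FF constraint for the whole path
EST_SOURCE = 22.0        # estimated delay inside FPGA A (source)
EST_DEST = 5.0           # estimated delay inside FPGA B (destination)
FLIGHT = 3.0             # on-board flight time between the FPGAs


def half_clock_margins(clock, est_source, est_dest):
    """Return (margin_a, margin_b): the half-clock IO constraint minus
    each FPGA's estimated delay. Negative means the constraint is tighter
    than the estimate; a large positive value means it is very loose."""
    half = clock / 2.0
    return half - est_source, half - est_dest


margin_a, margin_b = half_clock_margins(CLOCK_CONSTRAINT, EST_SOURCE, EST_DEST)

# If each FPGA only just met its 20ns constraint, the whole path would take
# 20 + 3 + 20 ns, which is longer than the 40ns system-level constraint.
worst_case_total = CLOCK_CONSTRAINT / 2.0 + FLIGHT + CLOCK_CONSTRAINT / 2.0

print(f"FPGA A margin: {margin_a} ns")        # -2.0: over-constrained
print(f"FPGA B margin: {margin_b} ns")        # 15.0: far too loose
print(f"Worst-case path: {worst_case_total} ns vs {CLOCK_CONSTRAINT} ns")
```

The last figure is the worrying one: both P&R runs could report that timing is met while the path as a whole still fails on the board.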

Time budgeting, instead, has the ability to estimate timing in the whole path while it is still in the partitioner (in this case Certify). Not only does this warn if the overall path does not meet the 40ns constraint, but the IO constraints can then also be allocated according to their proportion of the overall path delay. FPGA A accounts for 22ns of the estimated 30ns, so in this example we forward annotate to the P&R for FPGA A an IO constraint of 22/30 of 40ns, which is 29.3ns. We can then allocate the remainder of the clock constraint, minus the on-board flight time, to FPGA B. So, in this example, that is the 10.7ns left over after FPGA A’s allocation minus the 3ns on-board flight time = 7.7ns.
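The arithmetic above can be sketched in a few lines of Python. This is only a reproduction of the sums in this post under the proportional-allocation assumption described; it is not Certify’s actual algorithm, and the function name and error handling are hypothetical.

```python
def budget_io_constraints(clock, est_source, est_dest, flight):
    """Split a FF-to-FF constraint across two FPGAs (all values in ns).

    The source FPGA gets a share of the clock constraint proportional to
    its share of the estimated total path delay; the destination FPGA gets
    whatever remains after the on-board flight time is subtracted.
    """
    total = est_source + flight + est_dest
    if total > clock:
        # Mirrors the warning described in the text: the estimated path
        # already fails the system-level constraint.
        raise ValueError(
            f"estimated path delay {total} ns exceeds constraint {clock} ns"
        )
    source_budget = clock * est_source / total
    dest_budget = clock - source_budget - flight
    return source_budget, dest_budget


# Figures from the example: 40ns constraint, 22ns in A, 5ns in B, 3ns flight.
budget_a, budget_b = budget_io_constraints(40.0, 22.0, 5.0, 3.0)
print(f"FPGA A: {budget_a:.1f} ns, FPGA B: {budget_b:.1f} ns")
# prints "FPGA A: 29.3 ns, FPGA B: 7.7 ns"
```

Note that both budgets come out larger than the raw estimates (29.3ns against 22ns, 7.7ns against 5ns), because the 10ns of slack in the overall path is shared out rather than thrown away.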

So now both FPGA A and FPGA B have reasonable IO constraints which, if met during the individual P&R runs, will allow the overall path to meet the system-level timing constraint.

I’m sure you would hate to do this manually, so look for tools that can use time budgeting to do this during partitioning and forward annotate realistic IO constraints to P&R as a result.

That’s all for now.

As ever, we’d love to hear from you with your experiences of these issues.


Doug and Mick, Dec 2nd
