BLOGS & FORUMS
Breaking The Three Laws
|Breaking The Three Laws|
Archive for the 'FPGA-Based Prototyping' Category
Posted by Michael Posner on 25th January 2015
Last week I spent a week in Japan visiting users to discuss their FPGA-based prototyping challenges and explain how Synopsys can help. My overall take-away of the visits was that many companies want to expand their current single FPGA-based prototyping to multi-FPGA but fear the challenges associated with this. The summary of what I explained was pretty simple but to the point. #1 Having a defined methodology, flow and tool set is key. #2 Yes Multi-FPGA is more complex but it’s not as steep of a learning curve as you may think. And that’s it…..
#1 Having a defined methodology, flow and tool set is key
In respect to methodology you of course can refer to the FPGA-based Prototyping Methodology Manual, FPMM. (English and Japanese versions, 英語版と日本語版) I used Google translate so I hope I didn’t just offend the whole of Japan. Read the FPMM and you will become an expert prototyper but we all know that in this age of technology a summary is always nice. This is why I blogged about 3 Phase Approach to Successful Prototyping a while back. Yes, this is the blog with the upside down pyramid which at the time I thought was a great way to show the progression down through a funnel. The three phases are “Make Design FPGA Ready”, this is all about making the ASIC RTL FPGA friendly. “Bring Up Functional Prototype”, this is where you want to get onto the hardware as quickly as possible so you can functionally validate the design. Finally “Optimize Prototyping Performance”, pretty self-explanatory really.
Along with a defined methodology you need to utilize a tool flow which is designed for prototyping. I was amazed in Japan that many companies still tried to use the FPGA vendor tools for prototyping. I have nothing against FPGA vendor tools, they are great for their job which is FPGA synthesis. When I asked about the challenges these customer faced it was the same story I have heard before, ASIC clock conversion, gated clocks, memories and for the few that did multi-FPGA the key challenge was clock/reset synchronization and pin multiplexing. If you want to go fast on the freeway you don’t buy a bicycle you buy a car, it’s the same with prototyping. If you want to prototype buy a tool set which is designed for the purpose. I blogged about ASIC Gated clock conversion a while back, Unlocking the Secrets of ASIC Clock Conversion, which is just one example of a tool set designed for prototyping. Search my blog and you will find a stack more examples of what is possible by the tools these days. You need a tool set which is more than just a FPGA synthesis tool, you need a tool set that understands your challenges and can help with automation and dedicated capabilities.
#2 Yes Multi-FPGA is more complex but it’s not as steep of a learning curve as you may think.
If you have never prototyped before I’m not going to tell you to start doing multi-FPGA prototyping straight away, too many variables to take on at once. Start small, create a single FPGA-based prototype of a subsystem of your design. Don’t fall into the trap of thinking you can get away with using the FPGA vendor tools (see above) use a dedicated FPGA-based prototyping tool. This is exactly why we provide ProtoCompiler DX as part of the HAPS-DX system. ProtoCompiler DX is everything you need to implement the prototype and more as it includes high visibility debug capabilities. Once you have a prototype up and running on a single FPGA, then it’s time to expand, but don’t bite off more than you can chew, again start small. My suggestion is that you do not add anything new to the design you already have operational. Simply select a block or two from the existing design and move them into a 2nd FPGA. Using your multi-FPGA prototype tool, such as ProtoCompiler, get familiar with partitioning and pin multiplex IP insertion. Get familiar with customizing the hardware to match the needs of your design. Abstract Partition Flow Advantage, this is an important step to ensure that you create the highest performance multi-FPGA prototype. Once you have the same design up and running on two FPGA’s and a design flow and expertise in place you are ready for greater things. Go off and be successful in multi-FPGA prototyping.
The weather was pretty nice in Japan, here is the view from the hotel I was staying at:
We had lots of nice meals out, can you guess what was served at this restaurant?
We did a day trip to Shin-Osaka via the bullet train, Shinkansen, what a mean looking train
Nice view of Mount Fuji on the way up
While the Japanese can build a train that goes 200 MPH they have the same problem as the rest of the world, horrible coffee. Luckily when we arrived the local team treated me to vending machine coffee
Of course many may question my judgment for even trying train coffee and vending machine coffee. You have to remember I was jet lagged and this was better than no coffee….. but only marginally.
I met the star of the film, Big Hero6, Bay Max, well at least his inflatable body double
The funny thing is that Bay Max the real character is also inflatable so what this really a body double or his clone?
Posted in FPGA-Based Prototyping, FPMM Methods, Getting Started, Technical, Tips and Traps, Use Modes | No Comments »
Posted by Michael Posner on 16th January 2015
In late 2013 I blogged about the newly announced Xilinx UltraScale devices, the VU440 specifically that will be the largest FPGA device on the market: http://blogs.synopsys.com/breakingthethreelaws/2013/12/xilinx-fpga%E2%80%99s-for-fpga-based-prototyping/
Well this week Xilinx officially announced that they have shipped the first samples of the VU440 devices: http://press.xilinx.com/2015-01-15-Xilinx-Delivers-the-Industrys-First-4M-Logic-Cell-Device-Offering-50M-Equivalent-ASIC-Gates-and-4X-More-Capacity-than-Competitive-Alternatives
And check out who received the first of these samples…………………………… ok, you don’t need to read it, Synopsys did…….. We have optimized every generation of our HAPS prototyping systems for the highest system performance, greatest capacity while adding significant capabilities on top delivering prototyping specific features. We all know the FPGA device is a required component within the FPGA-based prototyping hardware but it’s not what defines or makes the solution useful. Anyone can slap an FPGA on a board but this does not help a prototyper as the device alone does not deliver the capabilities they require. Prototypers rely on a solution which includes a software implementation tool flow, integration between hardware and software accelerating time to operation, built in capabilities such as high speed pin multiplexing and high visibility debug to ease bug hunting while being modular and scalable. (side note: Synopsys offers exactly this…. just in case you didn’t know)
I recommend you also check out the VU440 demo video. It stars my friend Kirk from Xilinx who introduces the new device and the demo running ten ARM Cortex-A9 CPU’s, pretty impressive. http://www.xilinx.com/products/silicon-devices/fpga/virtex-ultrascale.html#uniquePlayer1
Over the coming weeks I’m going to focus my blogs on the capabilities that the new Xilinx UltraScale devices deliver and the impact they have to prototypers. As noted above, an FPGA alone does not deliver FPGA-based prototyping so I will discuss how the device capabilities are expected to be integrated and leveraged delivering a solution.
Oh, and just because Synopsys has received Xilinx sample devices don’t expect a new HAPS next week. Delivering a solution requires hardware development, software development and a huge amount of validation. But I’m confident that when you are ready to adopt, Synopsys will be ready to deliver…..
Posted in Admin and General, ASIC Verification, Bug Hunting, Debug, Early Software Development, FPGA-Based Prototyping, HW/SW Integration, In-System Software Validation, Man Hours Savings, Milestones, Technical, Tips and Traps | No Comments »
Posted by Michael Posner on 10th January 2015
Let me be the last to wish you “Happy New Year” and all that….
Of course it was the Consumer Electronics Show, CES recently and no good blogger would let this opportunity pass without writing something about it. Regardless of this I’m still going to write something. (If you got that joke comment below)
Hot at CES was self-drive cars with Advanced Driver Assistance Systems (ADAS), Sling TV, SuperHD TV’s, robots and of course wearables and the Internet of Things, IoT, because everyone wants a cooker with built in internet browser (well I do but as I already have a full PC in the kitchen for streaming TV and web browsing I will hold off on buying a cooker with similar capabilities)
It was nice to see that a number of projects that I know have been prototyped were being debuted at CES. It’s very fulfilling seeing a product go from idea to reality. While the big buzz was around everything you could see it’s the things you could not see which I want to discuss this week. These little electronic gadgets power the world of self-drive cars, wearables and IoT…. Yes I’m talking about sensors!!!!!
Lets face it, the sensors themselves don’t make the product but without them the device would not be able to operate so they really are the little product hero’s. Managing and data acquisition from these sensors can be a complex project in itself which is why Synopsys made it easier with the DesignWare Sensor and Control Subsystem.
This nifty subsystem is optimized to process data from digital and analog sensors, offloading the host processor and enabling more power efficient processing of the sensor data. The DesignWare Sensor and Control IP Subsystem provides designers with a complete, pre-verified solution that optimizes sensor fusion and actuator/motor control functions increasingly prevalent in automotive, mobile, industrial and IoT markets.
If you look at the block diagram above you can see that the interface to these sensors is usually pretty simple, UART, DAC/ADC, SPI, I2C and basic GPIO with the most complex interface being AMBA APB. I worked with a user recently who wanted to incorporate sensor information directly into their FPGA-based Prototype. The solution we came up with was to utilize off-the-shelf Peripheral Module interface, PMOD, boards. PMOD is a standard defined by Digilent Inc : http://en.wikipedia.org/wiki/Pmod_Interface
Diligent have a wide range of these little modules: http://www.digilentinc.com/Products/Catalog.cfm?NavPath=2,401&Cat=9 covering various applications.
Note that the PMOD’s are typically a standard 6-Pin interface with 4 signals, one ground and one power pin so very low pin count. The user connected these to the new HAPS GPIO board, pictured below, in order to interface the modules into their custom design modeled within HAPS.
Note that the HAPS GPIO board is a vertical mounted board so just like the PMOD’s themselves do not take up much physical real estate.
We ended up making full use of the HAPS GPIO board with multiple PMODs connected, interface to the ARM debugger, LED’s to signal various operational states and push switches to simulate user input. The GPIO board is very versatile with 3 standard 3.3V HAPS 10-pin GPIO headers, (Compatible with HAPS BIO1), 2 high-speed VCCO level GPIO headers (Red), 20-pin ARM JTAG header, Micro-USB for UART, MMCX clock connectors, 5V TTL header for serial LCD display, Level shifted header for I2C and SPI, 6 buttons and 4 LEDs on mini-daughter board which is connected to 3.3V HAPS 10-pin GPIO header when needed.
Unfortunately I was sick during the break so I didn’t get to as many of my little projects as I would have liked. I did however manage to complete one of them that I had been excited about for a while now. I made myself a portable gaming station complete with 412 games from the 1980-1990 era.
The whole setup packs away in a study roller case (which I got used which is why it’s so beat up). Open it up and pull the control station to the top of the box, plug it in and you are off. Favorite games like Pacman, Space Invaders, 1945, Donkey Kong and many, many more. I can now waste away many hours into the wee night playing mindless video games.
Posted in Daughter Boards, DWC IP Prototyping Kits, FPGA-Based Prototyping, Getting Started, Humor, Mick's Projects | No Comments »
Posted by Michael Posner on 13th December 2014
Warning: Technical Content!
I read an online article this week which flagged an issue with FPGA-based prototyping, clock conversion. Clock conversion is one of the most important aspects to successful prototyping and this week’s blog is dedicated to sharing information enabling you to be successful. Specifically I will cover automated Gated Clock Conversion, GCC, as the user realizes the highest benefits. With automated gated clock conversion you don’t need to maintain a separate RTL code branch for prototyping, you use the golden RTL source. Clock conversion done right ensures the prototype runs at the highest performance functionally equivalent to the source. If you look at the data from the Channel Media survey that Synopsys had conducted you see that for large FPGA-based prototyping, clock conversion is a major challenge.
Hold on, I forgot to cover why clock conversion is needed for FPGA-based prototyping. It’s one of the three laws, ASIC are not the same as FPGA’s, specifically in this case FPGA’s do not have the same clock resources like ASIC’s do. ASIC designs often use clock gating to reduce dynamic power and have complex clock logic to generate numerous internal clocks. In the ASIC flow, users do Clock Tree Synthesis to balance all the clock paths between the sources and destinations and avoid clock skews between them. The problem is the number of ASIC clocks in the design typically exceed the number of global clock lines available in the FPGAs. There are a limited number of dedicated global clock lines in FPGA devices and Global clock lines cannot accommodate clock generating and gating logic. Thus GCC is needed to convert these ASIC clock structures to FPGA comparable structures. You could do this manually but that’s time consuming and error prone.
In Emulation oversampling type techniques are used to solve this problem, this is where all clocks are resolved to one synchronous clock source and all clocks are driven from derivatives of this clock. The benefit of this technique is that the conversion is fast and practically any type of clock tree structure can be handled. The main two disadvantages are #1 you tank performance as the system frequency is dictated by the slowest clock. #2 the design loses some fidelity as all clocks are synchronous to each other which may not reflect the true asynchronous clock behavior hiding issues in clock domain crossing circuits. Techniques similar to this such as the HAPS Clock Optimization in ProtoCompiler (I think I’ll blog about HAPS Clock Optimization in the future but for today I’ll focus on Gated Clock Conversion) which map clocks to HAPS hardware resources, are getting popular in FPGA-based prototyping as they can help reduce the time to first prototype enabling a prototype to be handed off to the software teams quickly. It will not have the high performance expected by the software engineers but it’s available quickly, handles almost any ASIC clock structures and this gives the prototype engineers a little breathing space to complete the high performance version.
Gated Clock Conversion, GCC, converts ASIC clock structures to FPGA friendly structures mapped to FPGA clock resources. GCC also needs to handle timing violations due to clock skew created because of clock gating logic. These violations are introduced on paths where the source and destination flops are driven by different clocks. Data from the source may reach the destination quicker/later than the clock resulting in hold/setup time violations in many paths. GCC has to ensure that there is no clock skew between two synchronous clock domains. The GCC capabilities of ProtoCompiler directly addresses both these conversion challenges in a fully automated fashion. ProtoCompiler maps to FPGA resources and moves the generated clock and gated clock logic from the clock pin of the sequential elements to the enable pins. This includes supporting implementation of these structures across block boundaries and across multiple FPGAs as part of partitioning.
Tool support of automated gated clock conversion is not a milestone it’s a journey as ASIC coding styles are continually evolving and the tools need to keep pace. ProtoCompiler has an extensive portfolio of supported structures including, but not limited to, Generated Clock, Gated Clock, Integrated Clock Gating Cells, Complex Sequential Cells, Instantiated Cells, Mixed Async Controls, Data Latches and MUX / XOR structures. With the correct clock constraints ProtoCompiler should automatically identify these gated clock structures and convert them to FPGA friendly structures. Below is an example of generated clock identification and the result of the automatic conversion.
In summary clock conversion is essential to successful prototyping. The ProtoCompiler Clock Conversion capabilities moves the generated clock and gated clock logic from the clock pin of the sequential elements to the enable pins, allowing sequential elements to be tied directly to the source clock, removing skew issues and reducing the number of clock sources in the design making it FPGA-based prototype friendly. Automated Gated Clock Conversion is just one of the many capabilities that ProtoCompiler delivers and is essential for prototypers.
Don’t forget that the FPGA-based Prototyping Methodology Manual, FPMM, has a section to help understand clock conversion and other ASIC design structure handling. Last week I had the pleasure to hang out with Rene Richter, one of the authors of the FPMM. Rene signed a book copy for me, this copy could be yours if you comment and answer the following question
How many HAPS systems has Synopsys shipped and to how many customers?
Answer the question by comment and if you get the answer right I will contact you to get your shipping addresss.
If you like this or other previous posts, send this URL to your friends and tell them to Subscribe to this Blog.
To SUBSCRIBE use the Subscribe link in the left hand navigation bar.
Another option to subscribe is as follows:
• Go into Outlook
• Right click on “RSS Feeds”
• Click on “Add a new RSS Feed”
• Paste in the following “http://feeds.feedburner.com/synopsysoc/breaking”
• Click on “Accept” or “Yes” or whatever the dialogue box says.
Posted in ASIC Verification, FPGA-Based Prototyping, Man Hours Savings, Technical, Tips and Traps | 2 Comments »
Posted by Michael Posner on 14th November 2014
Recently Synopsys promoted that “Synopsys Virtual Prototyping Book Achieves Milestone of More Than 3000 Copies in Distribution to Over 1000 Companies” – http://news.synopsys.com/2014-11-05-Synopsys-Virtual-Prototyping-Book-Achieves-Milestone-of-More-Than-3000-Copies-in-Distribution-to-Over-1000-Companies Virtual Prototyping continues to gain momentum including Hybrid Prototyping, which combines Virtual Prototyping with FPGA-based prototyping. The VP book statistics prompted me to have a look at the FPGA-based Prototyping Methodology Manual, FPMM statistics.
To date there have been over 6000 FPMM downloads across 2800 different companies and over 2500 free books handed out. WOW. I expect a bit of overlap between the two but still that’s got to be over 8000 copies distributed. I’m happy that the FPMM has been able to help so many engineers around the world.
Looking at the challenges facing these prototypers, below image, you can see that conversion of ASIC to FPGA is still rated as #1.
Actually the #2 challenge, clocking issues, really falls into the this same category. As previously blogged, these challenges are not solved with just software or just hardware changes, they are solved by integration. When the software has built in understanding of the hardware and when the hardware can be customized to the needs of the SoC many of the challenges disintegrate.
A great example of the value of integration is the DesignWare IP Prototyping Kits which are part of the Synopsys IP Accelerated Initiative. DesignWare IP prototyping kits deliver a comprehensive IP subsystem which enables immediate productivity for both Hardware and Software engineers. Individually IP & FPGA-based prototyping deliver value but when combined the value is increased. It’s like the 1+1 = 3.
Talking of IP, DesignWare USB 3.1 IP is now available. http://www.synopsys.com/Company/PressRoom/Pages/usb-3-1-news-release.aspx . I have been talking about USB 3.1 for a while now and blogged about it here. You can also find a video of the DesignWare IP for USB 3.1 running on the HAPS-70 systems here https://www.youtube.com/watch?v=isQ7cvuyoTw
Do you have a topic you would like me to blog about? If so, drop me a comment and I’ll pop it in the queue.
Posted in FPGA-Based Prototyping, FPMM Methods, Getting Started, IP Validation, Technical, Tips and Traps | 1 Comment »
Posted by Michael Posner on 30th October 2014
In previous blogs I have spoken a lot about automation, features and capabilities which accelerate time to operational prototype and deliver higher performance enabling you to run more software against your design representation. These capabilities are designed to reduce the need for prototyping expertise and effort…….. but not to zero. Anyone who tells you that no expertise or effort is needed is not telling you the whole truth. This was the basis of this blog, “Breaking the three laws” of which the first law is ASIC are FPGA Hostile! Who can tell me what the other two laws are? I know but this is like a quiz for my readers.
Pictures in the blog are posted large so they are easier to read, click on the picture to see the full view version.
Synopsys has created a simple three phase definition for FPGA-based prototyping, including methodology guidelines and I am happy to share them with you. The three phases split into 1. Make Design FPGA Ready. 2. Bring Up Functional Prototype. 3. Optimize Prototype Performance. Follow these three phases and you will be on a path for FPGA-based prototyping success.
Make Design FPGA Ready
This is probably the most important step as the rule of thumb is garbage in, garbage out. There is only so much automation a tool can deliver so understanding the basic needs and best practices for FPGA-based prototyping is essential. Synopsys ProtoCompiler can help here with automated ASIC to FPGA translation, clock conversion and replication as needed. However you should always follow the best practices defined here to yield better results in the final implementation. Don’t forget, full best practices can be found in the FPMM, FPGA-based Prototyping Methodology Manual.
Bring Up Functional Prototype
Once code is prepared the bring up functional prototype phase is entered. This is the phase with the goal of getting the prototype up and running as quickly as possible, TTFP, enabling the team to hand off a platform to the software developers. The faster they get a platform the most productive they can be. Even if you have traded off a little performance to get the fastest time to prototype your software team will thank you for the fast enablement. ProtoCompiler and HAPS helps here, especially in the partition phase, I recently blogged about this: Abstract Partition Flow Advantage. Another important best practice is to plan your debug needs upfront in this phase, don’t treat it as an afterthought. This is exactly why in the ProtoCompiler flow debug is highlighted ensuring you at least give it some thought.
Optimize Prototype Performance
As you have already delivered an operational prototype to your software team you have a little breathing space now to focus on performance optimizations. In the fast turn-around abstract partition flow ProtoCompiler might have identified some bottlenecks that you skipped past in order to achieve fastest time to prototype. Now you have time to focus on these and other areas of the FPGA-based prototype to squeeze the most out of the solution. An example of this was shared with me recently where the prototype was fully operational at 9 MHz but with a little more effort, new partition and careful analysis of critical paths, the prototype performance was increased to 13 MHz. What a great improvement.
So there it is, three simple phased approach ensuring successful prototyping, enjoy!
Happy Halloween, here is the costume that I built, I call it Atomic Dinosaur. I am a construction spray foam master and it has LED lights down it’s back too!
That’s some crazy eyes I’ve got going on…………….
Posted in FPGA-Based Prototyping, FPMM Methods, Getting Started, Technical, Tips and Traps | Comments Off
Posted by Michael Posner on 2nd May 2014
This week I was asked to clarify what the PCIe Gen3 protocol speed is, to confirm, PCIe Gen3 speed is 8Gb/s per lane. PCIe Gen 1 is 2.5Gb/s and PCIe Gen2 is 5Gb/s. Yes I know I’m usually seen as the prototyping guy, (Or Mick “I’m not dead yet” Posner thanks to my pneumonia) but I also happen to double up as a protocol expert. One of the key advantages of FPGA-Based prototyping is the ability to model real world interfaces at speed so to be a prototyping expert you basically have to be a protocol expert as well. The above eye diagram (click image to view full size) is measured on the HAPS-70 (-2) speed grade systems running one of the many DesignWare PCIe Gen3 controller validation tests for interoperability and compliance testing. That is a good looking wide open eye!
I sneaked into the lab and snapped off this picture. (Click on image to view full size)
The little HAPS-70 S12 (-2) speed grade system is perched on top and that large black cable sticking out is the PCIe Gen3 capable cable connection. The cable plugs into a PCIe Gen3 host adapter board that in turn plugs into the host machine. In this specific setup I was told that we have the DesignWare PCIe Gen3 End Point controller modeled in a x4 lane configuration. Yes that’s x4 lanes of PCIe Gen3 so x4 lanes of 8Gb/s. I’m going to try and have the R&D engineer do a little video for me of the system in action as it was very impressive.
Oh, when using the Xilinx built in transceivers (Rocket IO) as the physical link the (-2) speed grade systems are required to model PCIe Gen3. The (-1) Xilinx Virtex-7 2000T devices only support up to 6.6 Gb/s while the (-2) support up to 10.3125 Gb/s thus supporting PCIe Gen3 speeds. Looking for PCIe Gen3 expert advice, go check out the Express Yourself blog
Off topic, spring has sprung in Oregon !! Yay, it was here for a whole 4 days and now we are expecting rain again. This is fine, this is what you expect and grow to love when living in Oregon. The mix of sun and rain makes everything really green, just look at how beautiful this little baby fern is. This is growing in my yard along with a huge amount of moss which is also typical across Oregon.
Click on image to view full size
Posted in ASIC Verification, FPGA-Based Prototyping, IP Validation, System Validation, Technical, Use Modes | Comments Off
Posted by Michael Posner on 28th April 2014
Synopsys just announced ProtoCompiler which is automation and debug software for HAPS FPGA-Based Prototyping Systems. ProtoCompiler is the result of years of R&D effort to generate a tool that addresses prototypers challenges today and built on top of an architecture which can support the needs of prototypers long into the future. ProtoCompiler focuses on the needs of prototypers specifically addressing the need for accelerated bring up as well as providing capabilities which result in higher system performance as compared to existing solutions. In this blog I’ll discuss some of the technical details behind the main tool highlights. Below are the detailed highlihts.
Integrated HAPS hardware and ProtoCompiler software accelerate time to prototype bring-up and improves prototype performance
Automated partitioning across multiple FPGAs decreases runtime from hours to minutes for up to 250 million ASIC gate designs
Enables efficient implementation of proprietary pin multiplexing for 2x faster prototype performance
Captures seconds of trace data with gigabytes of storage capacity for superior debug visibility
(Read to the end of the blog if you also want an update on Mick’s Projects)
Highlight: Integrated HAPS hardware and ProtoCompiler software accelerate time to first prototype bring-up and improves prototype performance
As noted above the goal of ProtoCompiler is to accelerate the bring up of a prototype as well as providing a path to the fastest possible operating performance. ProtoCompiler is unique as it combines hardware/software expertise with intimate knowledge to deliver superior results. Think of it as delivering a HAPS hardware expert packaged up into a format that anyone using the tool can access. ProtoCompiler has deep technical knowledge of the HAPS hardware including configuration, clocking structures, interconnect architecture, pin multiplexing expertise and more. ProtoCompiler is not only a hardware expert, it’s also a software expert. ProtoCompiler is built on top of a state of the art Synopsys proprietary prototyping database that means RTL is effectively processed and incremental and multi-processing techniques can be deployed with ease.
All this results in blazingly fast processing speeds. As an example ProtoCompiler’s area estimation, essential for automated partitioning, can processed 36 Million ASIC gates in less than 4 hours as compared to 22 hours in existing solutions. Now that’s fast!. Thanks to the new data model and incremental modes all subsequent compiles are even quicker.
Highlight: Automated partitioning across multiple FPGAs decreases runtime from hours to minutes for up to 250 million ASIC gate designs
So there are actually two announcements packaged up in this highlight. Starting in reverse ProtoCompiler supports 250 Million ASIC gate and larger designs. Humm, this sounds a little suspect as when HAPS-70 was launched it only supported 144 Million ASIC gates, what does ProtoCompiler know that we don’t? Luckily I know, HAPS-70 can now be scaled to support 288 Million ASIC gates, 2x the capacity. HAPS-70 now supports chaining of any six systems so if you chain six HAPS-70 S48’s you get a total of 288 Million ASIC gates supported which is 24 Xilinx Virtex-7 2000T FPGA’s. All working in one synchronous system.
Any 3 HAPS systems can be chained via our standard control and data exchange cabling, when you go above 3 systems you utilize a synchronization module that manages the system synchronization. Managing clock skew, reset distribution and configuration is all handled automatically. ProtoCompiler understands the hardware capabilities thus making deployment of such a system a snap. No longer do your engineers have to worry about how to distribute clocking, we have done the hard work so you don’t have to. Other vendors “claim” scalability and modularity but if all they are delivering is boards then it’s nothing more than a wild claim. To deploy a scalable and modular system you need a complete solution of software and hardware. You can now prototype SoC designs you thought never possible
The first part of the highlight introduces the new partition technology deployed in ProtoCompiler. ASIC’s are bigger than a single FPGA so you need to quickly partition the design across multiple FPGA’s. Historically this has been a challenge but with ProtoCompiler that challenge has been overcome. The partition engine in ProtoCompiler requires minimal setup before you can apply it to your design. There are four simple steps to setup the partition engine #1 Create target system, basically which system(s) you are compiling to. #2 Establish basic constraints which are things like blocks of IO. #3 Define the design clocks. #4 Propose an interconnect structure. Actually #4 can either be defined telling the partition engine to use a set interconnect architecture or leave it open and let the tool do it. There are advantages of both. By letting the tool pick the needed architecture the resulting system should be higher performance as ProtoCompiler will maximize interconnect to reduce pin multiplexing ratio. In a previously deployed system you may have already set the interconnect and then want the tool to use the available resources so you don’t make any changes to the hardware in the field. ProtoCompiler has the flexibility to do both meeting the needs of new prototype creation and image re-spin after a new RTL code drop.
ProtoCompiler partition engine is FAST, using the same example as above, 36 Million ASIC gates, ProtoCompiler was able to come to an automated solution is 4 minutes!!! WOW. ProtoCompiler provides a huge amount of information as to what it automatically did so that the engineer can quickly review the results and maybe provide ProtoCompiler more guidance to optimize the partition. For example after the first run you might want to lock down select parts of the design and then incrementally run the engine to push it to find a better solution for the rest of the design. As it runs so fast you can do multiple of these optimization iterations in a matter of hours. I’ve played with the tool as I was interested in this particular capability and have to say it’s amazing. I’ve tried the open method and let the tool find a solution for itself, in this mode ProtoCompiler pretty much finds a solution every time. I also played with challenging the tool for example locking the tool to use only 100 IO’s (two HT3 connectors) between FPGA’s. ProtoCompiler quickly finishes and told me that I was crazy and that the design could never be partitioned with my selected interconnect architecture.
Highlight: Enables efficient implementation of proprietary pin multiplexing for 2x faster prototype performance
OK, this is simple, this basically says that ProtoCompiler can automatically deploy the HAPS High Speed Time-Domain Multiplexing (HSTDM). HSTDM is developed and optimized on HAPS and ProtoCompiler packages up this expertize and automated the deployment. The partition engine will automatically select HSTDM and instance it into the prototype design. HSTDM delivers high performance pin multiplexing between multiple FPGA’s. The signals are packaged up, sent across a high performance link and unpacked at the other side. This all happens within one system clock and is completely transparent to the user. No manual intervention, no additional latency, and it’s stable and reliable as HSTDM is tested as part f the HAPS production testing and every system has to pass the minimum HSTDM performance tests. This ensures that when you deploy am image with HSTDM that it runs on every system the image is loaded on. No need to tailor the pin multiplexing implementation for each board like you have to do with other vendors.
Highlight: Captures seconds of trace data with gigabytes of storage capacity for superior debug visibility
ProtoCompiler expands the debug capabilities and grows the HAPS Deep Trace Debug capability which utilizes off-FPGA memory to store debug data. ProtoCompiler provides seamless multi-FPGA debug capabilities on top of a set of other debug capabilities tailored to delivering visibility at the right level of the debug cycle.
In debug one size does not fit all, you need to deploy the right level of debug visibility capability dependent on what you are trying to debug and the specific point you are in the project cycle. Sometimes you want very wide debug visibility with fast incremental turn-around. Later in the design cycle you typically want very, very deep debug windows. ProtoCompiler delivers both, fully automated through the flow, seamless and transparent to the users. And when I say deep, I mean deep, the example below is very typical of the debug window where you can easily capture seconds of debug data.
As usual my blogs got really long. I wrote it in the car while driving from Portland to Eugene. Amazing that I could type all of this and drive at the same time (LOL, only joking I was in the passenger seat)
Anyway, ProtoCompiler is the bees knees and I personally think it revolutionizes FPGA-based prototyping using HAPS. What do you think of ProtoCompiler?
If you have managed to get this far into my blog then congratulations. I’ve been taking it easy this week while I recover from the pneumonia that I came down with. In the evenings I finished off the two mini RC tracked vehicles I had been working on. The basis of both are simple kits which I then modified and added RC receivers and motor controllers to. While I am a grown adult I must admit they are fun to play with. The first is a basic platform RC tracked vehicle which I attached a Lego sheet to. Little did I know that this would be so popular with my son. He has been building towers and all types of structures on top of it.
Why drive your car to a car park when the car park can come to you. No joke that’s what my son said.
Mobile tire store
Bulldozer and sweeper
At the same time I also built a kit that has a shovel that moves on the front. Again I modified it to be radio controlled, including the shovel. This vehicle is a HUGE hit with my son and he has been busy building towers, knocking them down, then tidying them up with the shovel.
There are a couple of video’s of these little things in action on my You Tube page: https://www.youtube.com/user/MrMickPosner (and a video of my chicken food winch system)
Posted in Admin and General, ASIC Verification, Bug Hunting, Debug, Early Software Development, FPGA-Based Prototyping, FPMM Methods, Getting Started, HW/SW Integration, In-System Software Validation, Man Hours Savings, Mick's Projects, Milestones, Project management, System Validation, Technical | Comments Off
Posted by Michael Posner on 5th April 2014
Not many people know this but I am a FPGA-based prototyping Ninja-Fu master. What super power do I have you ask? I have the power to enable higher performance prototype operation and in this week’s blog I am sharing this ancient secret power with you. Wow, the start of this blog sounds like the bio from a really bad “B” movie, it definitely seemed funnier in my mind, then again everything seems funnier in my mind. There is actually some seriousness to this blog as I really am going to share the not so secret method to enable higher performance in your FPGA-based prototypes.
First, let’s study a typical SoC, this case it’s ~40-50 Million ASIC gates. I chose this design as it’s easier to explain but the principle for higher performance operation is even more important for larger more complex SoC’s. Our example SoC includes a CPU with tightly coupled GPU and DDR3-based memory subsystem, PCIe high performance interface, SRAM scratch pad storage, global bus, custom logic block (your SoC’s special sauce for instance) and a number of lower performance peripherals
When you model such an SoC in an FPGA-based prototype, even with the largest FPGA’s, you need to partition the design. Partition is to split up the design across multiple FPGA’s. The challenge is that the SoC design blocks have more signals than you have FPGA pins (Hey that’s one of the three laws of the breaking the three laws blog). We all know that when you partition such a design you need to insert pin multiplexing to manage the many signals over the limited FPGA pins. As I am writing this I suddenly realized I have shared the secret of higher performance prototyping before, here, anyway, this blog is way cooler so I’ll continue writing.
The challenge of partitioning this design is that due to the tightly coupled CPU/GPU you end up with many signals spanning out from a small number of design blocks. Lets assume the CPU and GPU are partitioned across two FPGA’s. If all you are prototyping is these two blocks then with the use of pin multiplexing you can connect the two blocks together. The challenge of this prototyping project is that you are also modeling the other design blocks as you want to validate the software and use that to validate your RTL design blocks. This means you end up with the SoC partitioned across four FPGA’s which forces even more connections between FPGA’s.
The picture above is a representation of the partition, the raw IO interconnect usage and the number of external IO’s required for daughter boards. Suddenly you see not only the sheer volume of interconnect needed but also the number of individual connectors required to create such an partition. Just look at FPGA 2, it’s packed with IO and daughter boards. You could try and partition the design in a different way but it’s sure to tank the performance as the GPU needs to be tightly coupled to the DDR3 memory and the CPU requires a tight link to the PCIe interface. If you sacrifice physical IO between FPGA 1 and FPGA 2 you will end up with very high pin mux ratios resulting in very low system performance.
If you were to try and model this SoC on a board with a fixed interconnect between FPGA’s or forced to use a board with great big IO connectors you would physically not be able to support SoC designs like this. With the fixed interconnect board, even if you could work out a partition, you will have to force fit your SoC interconnect topology across a fixed number of IO’s resulting in high mux ratio’s thus low performance. In addition it’s unlikely that the board would have the number of available external connector IO to support the SoC’s external interfaces for daughter boards. It’s similarly bad on a board with high pin count connectors. Using our typical SoC as the example, if the FPGA-based prototyping board has FMC like connectors, ~150 IO’s per connector, you would need ten connectors to support the required interconnect and daughter boards for the tightly connected CPU/GPU. Whoops, I know of no board that has this many connectors. Again you would be forced to use very high pin multiplexing tanking the performance and making the platform worthless.
Now look at the HAPS-70 S48, the Synopsys four FPGA FPGA-based prototyping system. This type of typical SoC design is the reason why the HAPS-70 systems expose all the FPGA’s pins to HapsTrak 3 (HT3) connectors. HT3 granularity is 50 FPGA IO’s per connector and are bank matched to the Xilinx Virtex-7 2000T banks and Super Logic Regions (SLR’s). This granularity is the “not so secret” enabler for SoC prototyping and the key to higher performance operation.
Now you can see that not only do you have the connector granularity to tailor the interconnect to the requirements of the SoC design but you also have ample connectors to support the external IO daughter boards. You can create a very dense interconnect between FPGA 1 and FPGA 2 supporting the tightly coupled CPU/GPU and you don’t need pin multiplexing as you have the physical number of IO’s needed. At the same time you can support all the other interconnect requirements to the other FPGA’s and the required daughter boards.
Hold on, there’s more….
You have ample connectors to setup the prototype with the needed JTAG debugger daughter board connecting your software debugger to the CPU. You have ample connectors to add real time debug to the platform. Real time debug is when you extract signals from the design and route them to a debugger daughter board which you connect a logic analyzer to. Oh and you can also add on some HAPS Deep Trace Debug memory so you can capture seconds of debug visibility. So not only is the HAPS system higher performance but the hardware architecture is the enabler for prototyping typical SoC’s. If you are smart you will also understand that as the SoC grows in size and requires more FPGA partitions that the HAPS flexible interconnect architecture becomes even more important. Below you can see a picture of the HAPS-70 S96, eight FPGA system, deployed for SoC prototyping enabling earlier software development and system validation
Now you have the secret Ninja-Fu.
I’ve pretty much finished my large tracked vehicle project which I featured last week. I plan to add a controllable shovel and other attachments to the front but I got side tracked building a new project.
This is a very small shovel dozer, you can see how small it is as it’s sitting on top of my home-built tracked vehicle. (Or maybe my tracked vehicle is just very big). You this little shovel dozer come from a kit but rather than using the supplied hard wired connection I’m going to retro fit this model with some tiny radio controlled electronics. I have not built the RC control yet and with upcoming business travel I’m not sure when I am going to get a chance to. I’ll be sure to post an update when this latest project is finished.
Posted in Debug, Early Software Development, FPGA-Based Prototyping, HW/SW Integration, Mick's Projects, System Validation, Technical, Tips and Traps | Comments Off
Posted by Michael Posner on 27th March 2014
It was the Synopsys Users Group, SNUG in San Jose this week and on top of the Synopsys announcement on ICC 2.0 with the Powar (I know this is a typo but I enjoy saying the word like this, you try it Powarrrrrr) of 10x there was a whole track dedicated to FPGA-Based prototyping. Knowing that the whole world does not revolve around California I thought I would provide you with an overview of the days FPGA-base prototyping sessions. As a reminder if you have an active Synopsys SolvNet account you can download the session presentations and papers shortly after the event ends
The day started with : Automating SoC RTL to Operational Prototype – Synopsys R&D Presentation
This session delivered technical information on new Synopsys prototyping software that increases the level of automation and streamlines RTL to operational HAPS FPGA-based prototyping including ultrafast partitioning. I personally think this session might have been a little too technical (is there such a thing) as it jumped into how the software solved the problems but seemed to skip the explanation of the problem itself. Now for experienced prototypers they immediately saw the benefits being delivered but I think some of the more in-experienced prototypers were in over their heads. Even with this in mind I highly recommend this presentation. Using this software the user can see up to a 50% reduction in time to prototype, achieve on average 2X or more performance improvement in system performance and get greater debug visibility.
Next up: Achieving Maximum System Performance on Multi-FPGA designs using HAPS-70 System – SanDisk use of HAPS-70 for an enterprise SSD SoC
Two words, killer presentation. I love seeing real users present their experience with FPGA-based prototyping systems. This presentation included the various steps that SanDisk used to bring up the HAPS-70 FPGA-based prototype of their enterprise SSD design and then optimize it for the highest system performance possible within the limitation of the SoC design constraints. It was very interesting and I recommend the presentation as it includes lots of great real world information on what it takes to create a full SoC FPGA-based prototype. The picture above was snuck off without anyone noticing (until now) pictures/videos within the sessions is prohibited. (whoops)
Final presentation of the day was: Putting IP and Subsystem Prototyping on the Fast Track – Synopsys Mick Posner (yes that’s me) and Antonio Costa from the DesignWare IP R&D team.
I came prepared with giveaway’s to bribe the audience for good feedback, we will have to wait and see if that worked. I introduced HAPS-DX for complex IP and Subsystem prototyping and tried to explain the various validation use modes and best practices for IP prototyping to streamline hand-off to the SoC team streamlining their prototyping efforts. I then handed over to Antonio who provided details on how the DesignWare IP R&D utilize the many generations of HAPS systems to validate the IP. I think the bits that resonated the best with the audience was Antonio’s explanation of the validation scenarios that could only be reached using FPGA-based prototyping. Antonio also introduced the IP R&D teams use of HAPS Deep Trace Debug to capture seconds of debug visibility enabling long protocol scenarios to be successfully debugged.
I highly recommend that you download these SNUG presentations on FPGA-based prototyping from SolvNet.
Progress on my latest garage project has been slowed by business travel and sickness. But I have made some progress and am happy to say that basic functionally has been validated.
If you can’t recognize what it is from this picture then let me tell you that it’s a custom designed remote control tracked vehicle with full suspension tracks. I designed the track runner system and this picture is of revision 2.0. It’s not perfect but it does the job and I can see a 3.0 revision to get better articulation. When finished I’ll take a little video if it in action, it’s a lot of fun. Here is a preview of the vehicle in action: https://www.youtube.com/watch?v=yF4uRSzL5qQ
Some has asked me where I find time to do projects like this, the answer is I don’t find time but I manage to fit in a little here and there at the expense of other things like family time, house projects, eating…. I’m going to slow down on the project as family time, house projects and eating should really take priority.
Posted in ASIC Verification, Bug Hunting, Debug, Early Software Development, FPGA-Based Prototyping, HW/SW Integration, In-System Software Validation, IP Validation, Mick's Projects, System Validation, Use Modes | Comments Off
| © 2015 Synopsys, Inc. All Rights Reserved.