A global team of protocol experts who share their insights and technical expertise in the areas of AMBA, DDR, Ethernet, LPDDR, MIPI, PCIe, SAS, SATA, USB, and UFS. This team participates in standards committees and provides the latest information and updates relevant to your future design considerations.
The Internet of Things (IoT) is putting billions of intelligent “things” at our fingertips. The ability to sense vast amounts of information and communicate it to the cloud is driving innovation in IoT applications, and the servers powering the cloud will have to scale to handle these billions of intelligent things. In preparation, PCIe Gen 4 has been introduced, capable of 16 GT/s (gigatransfers per second) per lane. The primary market driver for PCIe Gen 4 currently appears to be server storage.
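For a quick sense of what those transfer rates mean in practice, here is a back-of-the-envelope calculation of per-lane throughput across the generations (the raw rates and encoding schemes come from the PCIe base specifications; the helper function itself is just an illustration):

```python
# Effective per-lane throughput for PCIe Gen1-Gen4.
# Gen1/Gen2 use 8b/10b encoding (20% overhead);
# Gen3/Gen4 use 128b/130b encoding (~1.5% overhead).
GENERATIONS = {
    # generation: (raw rate in GT/s, encoded bits, payload bits)
    1: (2.5, 10, 8),
    2: (5.0, 10, 8),
    3: (8.0, 130, 128),
    4: (16.0, 130, 128),
}

def effective_gbps(gen: int) -> float:
    """Usable Gb/s per lane after encoding overhead."""
    rate, encoded, payload = GENERATIONS[gen]
    return rate * payload / encoded

for gen in sorted(GENERATIONS):
    print(f"Gen{gen}: {effective_gbps(gen):.2f} Gb/s per lane")
```

Note how the jump from 8b/10b to 128b/130b at Gen 3 recovers most of the encoding overhead, which is why Gen 3 "only" needed 8 GT/s to double Gen 2's effective bandwidth.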
PCI-SIG and the MIPI Alliance are also collaborating to support PCIe over the MIPI M-PHY: M-PCIe is a version of the PCIe protocol for mobile interconnects.
PCIe has had its own evolution through Gen 1, Gen 2, Gen 3, and now Gen 4. With every new generation the speed has doubled, and so has the complexity. A proven PCIe Verification IP that supports all the speeds can significantly shorten the verification schedule. If such a Verification IP is also bundled with a test suite and a coverage suite, it can certainly reduce verification risk. What if it also came bundled with support for protocol-aware debug in Verdi?
Synopsys offers all these features in a single VC Verification IP for PCIe:
Support for Gen1, Gen2, Gen3 and Gen 4 speeds
Support for MPCIe
Support for the NVMe application
Includes a bundled test suite
Built-in support for protocol-aware debug in Verdi
Come experience our product hands-on at a PCIe workshop in your region. The workshop provides a unique opportunity to learn about:
The ease of the VC Verification IP programming interface for normal transfers, error injection, and low-power scenarios
The various outputs the VC Verification IP generates for debug, and how to use different levels of debug abstraction, from signals to text logs to protocol-aware debug, within a single Verification IP
How to integrate your DUT into the test suite environment and get it running quickly
Recently, PCIe workshops were held in Mountain View, California, and Bangalore, India. Participants told us they loved the new Verdi features for protocol-aware debug and coverage back-annotation. Error injection capabilities, coupled with debug capabilities at each layer, gave them the confidence to left-shift verification closure.
The MIPI Unified Protocol (UniPro) specification defines a layered protocol for interconnecting devices and components within mobile device systems. It is applicable to a wide range of component types including application processors, co-processors, and modems. MIPI UniPro powers the JEDEC UFS, MIPI DSI-2, and MIPI CSI-3 applications. So far, MIPI UniPro has been adopted most widely in the mobile storage segment through JEDEC UFS. Adopting MIPI UniPro and MIPI M-PHY provides lower-power, higher-performance solutions.
Many PCIe veterans may already have begun implementing MIPI UniPro in their designs. This blog post takes you through a quick view of the MIPI UniPro and MIPI M-PHY stack from a PCIe perspective. As you will notice, there are many similarities.
PCI Express provides switch-based point-to-point connections between chips, and MIPI UniPro does the same, though the current UniPro 1.6 specification does not yet support switches; that is planned for future revisions. Toshiba has already released detailed technical documentation for a UniPro bridge and switch supporting Google's Project Ara. Both are packet-switched, high-speed serial protocols.
PCI Express maintains backward compatibility and uses the load/store model of the PC world. Configuration, Memory, IO, and Message address space accesses are supported at the transaction level. MIPI UniPro, on the other hand, is brand new and does not carry the burden of backward compatibility. UniPro provides raw communication of data in the form of messages; the messages themselves have no imposed structure.
Both PCI Express and UniPro support the concept of multiple logical data streams at the transport level. PCI Express supports Traffic Classes (TCs) and Virtual Channels (VCs). A maximum of 8 TCs is supported, and TCs can be mapped to different VCs. This concept of TCs and VCs is aimed at providing deterministic bandwidth and latency.
UniPro has a similar concept. A PCI Express Traffic Class (TC) is equivalent to a MIPI UniPro CPort, and a PCI Express Virtual Channel (VC) is equivalent to a UniPro Traffic Class (TC). (Yes, both protocols use the term "TC", but the meanings differ.) CPorts are bidirectional logical channels, and each CPort has a set of properties that characterize the service it provides. Multiple CPorts can be mapped to a single TC. UniPro 1.6 supports two TCs, TC0 and TC1, with TC1 having higher priority than TC0. Additionally, UniPro provides End-to-End (E2E) flow control at the transport level. Note that UniPro E2E flow control is meant for application-level buffer management, not for the transport-layer buffers, whereas PCI Express implements flow control at the transport level as well.
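To make the terminology swap concrete, here is a toy sketch of the two mapping schemes (the table contents are made-up examples for illustration, not values from either specification):

```python
# Cross-mapping of channel terminology: PCIe maps Traffic Classes
# onto Virtual Channels; UniPro maps CPorts onto (differently named)
# Traffic Classes.
PCIE_MAX_TC = 8
UNIPRO_TCS = (0, 1)                       # UniPro 1.6: TC0 and TC1 only

pcie_tc_to_vc = {tc: 0 for tc in range(PCIE_MAX_TC)}  # default: all TCs on VC0
pcie_tc_to_vc[7] = 1                      # e.g. give TC7 its own VC

unipro_cport_to_tc = {0: 0, 1: 0, 5: 1}   # CPort 5 rides high-priority TC1

def check_unipro_mapping(mapping: dict) -> bool:
    """Multiple CPorts may share a TC, but only TC0/TC1 exist in v1.6."""
    return all(tc in UNIPRO_TCS for tc in mapping.values())

assert check_unipro_mapping(unipro_cport_to_tc)
```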
The PCI Express transaction layer implements end-to-end CRC (ECRC) and data poisoning, giving PCI Express higher sensitivity to error detection at this layer than MIPI UniPro 1.6. This, along with Advanced Error Reporting (AER), is what qualifies PCI Express for the server space, where reliability, accessibility, and serviceability are valued.
PCI Express does not have a separate network layer. MIPI UniPro has a very simple pass through network layer in Version 1.6.
Data Link Layer
The PCI Express data link layer is quite similar to MIPI UniPro's. Both solve the same problem: providing robust link communication to the next immediate hop. They use similar error recovery mechanisms, such as CRC protection, retransmission on NAK acknowledgement, multiple outstanding unacknowledged packets tracked with sequence numbers, and credit-based flow control. While PCI Express manages credits for Posted, Non-Posted, and Completion headers and data, UniPro flow control is per Traffic Class (TC); in UniPro, both credits and sequence numbers are managed independently per TC.
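As a rough illustration of the credit mechanism both link layers share, here is a simplified model (real PCIe tracks separate credit pools per packet type, and UniPro tracks them per TC; this sketch collapses everything into a single counter):

```python
class CreditedLink:
    """Toy model of credit-based link-layer flow control: the receiver
    advertises credits, the sender consumes one per packet and must
    stall when none remain, and credits are returned as the receiver
    drains its buffer."""

    def __init__(self, advertised_credits: int):
        self.credits = advertised_credits
        self.rx_buffer = []

    def send(self, packet) -> bool:
        if self.credits == 0:
            return False              # sender stalls: no credits left
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def receiver_drain(self) -> None:
        # Receiver processes one packet and returns a credit.
        if self.rx_buffer:
            self.rx_buffer.pop(0)
            self.credits += 1

link = CreditedLink(advertised_credits=2)
assert link.send("TLP0") and link.send("TLP1")
assert not link.send("TLP2")          # stalled: credits exhausted
link.receiver_drain()                 # a credit comes back
assert link.send("TLP2")
```

The key property this buys both protocols is that a sender can never overrun the receiver's buffers, because every in-flight packet is backed by a previously advertised credit.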
MIPI UniPro additionally supports pre-emption, wherein a high-priority frame can pre-empt a low-priority frame. This enables UniPro to provide even higher levels of latency determinism than PCI Express.
The PCI Express data link layer supports low-level, hardware-controlled power management. The link power states are L0, L0s, L1, L2, and L3, where L0 is the fully-on state and L3 is the link-off state. UniPro pushes this functionality down to the Physical Adapter layer.
PCI Express uses differential signaling with an embedded clock and can support up to 32 lanes. Reset and initialization mechanisms determine link speed, link width, and lane mapping. It uses 8b/10b and 128b/130b encoding, scrambling, and deskew patterns to aid clock recovery.
UniPro has gone a step further and partitioned the functionality here. The UniPro physical layer is divided into two sublayers: the Physical Adapter layer (L1.5) and the Physical layer (L1). MIPI has multiple physical layer specifications, D-PHY and the more recent M-PHY, though UniPro 1.6 is used only with M-PHY. The real role of L1.5 is to abstract the higher layers from the physical layer technology. UniPro is designed to use up to 4 M-PHY lanes, and reset and initialization mechanisms are used to determine capabilities, similar to PCI Express.
MIPI UniPro supports greater power optimization. Speed is divided into two categories, high speed and low speed, which are further subdivided into gears. The speed and number of lanes can be scaled dynamically based on bandwidth requirements; this process of changing the speed is called a power mode change, and it is initiated by application software. When the link is not in use, it can be put into a hibernate state for the highest power savings. The Physical Adapter layer can also autonomously save power by ending the active data burst and entering the sleep or stall states.
M-PHY also uses differential signaling and an embedded clock in one of its modes of operation. 8b/10b and 128b/130b encoding, scrambling, and deskew patterns are used to aid clock recovery in the high-speed mode of operation.
We were awed by the similarities. No wonder there are initiatives like Mobile PCI Express (M-PCIe); allowing PCI Express to operate over M-PHY makes sense.
Similar comparisons can be made between MIPI UniPro and SuperSpeed USB 3.0. Hence we are beginning to see initiatives to enable SuperSpeed Inter-Chip (SSIC) over M-PHY.
It will be interesting to see how these evolve and which of them emerges victorious. While we wait for the UniPro vs. M-PCIe battle to settle, one thing is clear: M-PHY has proved itself a clear winner.
NVM Express (NVMe), previously known as the Non-Volatile Memory Host Controller Interface (NVMHCI), is a host-based software interface designed to communicate with solid-state storage devices across a PCIe fabric. The current Synopsys NVMe Verification IP (VIP) is a comprehensive testing vehicle consisting of two main subsystems: the SVC (System Verification Component) and the SVT (System Verification Technology). The SVC layers correspond to the actual NVMe (and PCIe, etc.) protocol layers; the SVT provides a verification methodology interface to UVM and other methodologies such as VMM and OVM.
Here’s where you can learn more about Synopsys’ VC Verification IP for NVMe and for PCIe and M-PHY.
Although the VIP supports multiple versions of the NVMe specification, we will initially be version agnostic, speaking in generalities of the protocol in order to provide a 10,000-foot view of it and its support in the VIP. Future posts will delve deeper into particular details of NVMe and features of the Verification IP.
A Brief Glance at NVMe
Unlike PCIe, where the root and endpoint are essentially equals, NVMe defines an asymmetric relationship closer to that of other storage protocols (e.g. SATA, Fibre Channel).
An NVMe command (e.g. Identify, Read, Write) is initiated at the host and converted to an NVMe request, which is then appended to a particular submission queue that lives in host memory. Once the command is inserted into a queue, the host writes to a per-queue doorbell register on the controller (controllers live on PCIe endpoints). This doorbell write wakes the controller, which then probes the queue for the new request(s). It reads the queue entry, executes the command (potentially reading data buffers from host memory), and finally appends a completion to a completion queue, then notifies the host via an interrupt. The host wakes up, pops that completion off the queue, and returns the result to the user.
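The lifecycle described above can be sketched as a toy host/controller model (the class and field names here are purely illustrative; they are not the Synopsys VIP API or the NVMe register interface):

```python
# Minimal host/controller model of the NVMe command flow: enqueue a
# request, ring the doorbell, controller consumes it and posts a
# completion.
class ToyController:
    def __init__(self):
        self.sq, self.cq = [], []     # submission and completion queues

    def ring_doorbell(self):
        # The doorbell write wakes the controller: it pops each pending
        # submission entry, "executes" it, and posts a completion.
        while self.sq:
            cmd = self.sq.pop(0)
            self.cq.append({"cid": cmd["cid"], "status": "SUCCESS"})

ctrl = ToyController()
# Host side: append a request to the submission queue in host memory...
ctrl.sq.append({"opcode": "Read", "cid": 7})
# ...then write the per-queue doorbell register on the controller.
ctrl.ring_doorbell()
# Interrupt handler: pop the completion and return the result.
completion = ctrl.cq.pop(0)
assert completion == {"cid": 7, "status": "SUCCESS"}
```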
There are two main types of queues that are used:
Admin Queues – these are used for configuring and managing various aspects of the controller. There is only one pair of Admin queues per controller.
I/O Queues – these are used to move NVMe protocol-specific commands (e.g. Read, Write). There can be up to 64K I/O queues per controller.
Each queue has both a tail (producer) pointer and a head (consumer) pointer. The tail pointer points to the next available entry. After the producer adds an entry to a queue, it increments the tail pointer (wrapping back to zero upon reaching the end of the queue; these are all circular queues). The queue is empty when the head and tail pointers are equal.
The consumer uses the head pointer to determine where to start reading from the queue; after examining the tail pointer and determining that the queue is non-empty, it increments the head pointer after reading each entry.
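The pointer arithmetic is easy to model. Here is a minimal sketch of the shared circular-queue discipline (one slot is left unused so that "full" and "empty" remain distinguishable, a common convention):

```python
# Circular-queue arithmetic shared by NVMe submission and completion
# queues: a tail (producer) pointer, a head (consumer) pointer, and
# wrap-around at the end of the queue. Empty when head == tail.
QSIZE = 4
entries = [None] * QSIZE
head = tail = 0

def is_empty() -> bool:
    return head == tail

def push(entry):
    global tail
    assert (tail + 1) % QSIZE != head, "queue full"
    entries[tail] = entry
    tail = (tail + 1) % QSIZE     # producer increments, wrapping to 0

def pop():
    global head
    assert not is_empty(), "queue empty"
    entry = entries[head]
    head = (head + 1) % QSIZE     # consumer increments, wrapping to 0
    return entry

# Producer fills three slots, consumer drains them in order:
for cmd in ("Identify", "Read", "Write"):
    push(cmd)
assert pop() == "Identify" and pop() == "Read" and pop() == "Write"
assert is_empty()
```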
The submission queue’s tail pointer is managed by the host; after one or more entries have been pushed into the queue, the incremented tail pointer is written to the controller via the submission queue doorbell register. The controller maintains the head pointer and begins reading the queue once notified of the tail pointer update; it can continue reading until the queue is empty. As it consumes entries, it updates the head pointer and sends it back to the host via completion queue entries (see below).
Similarly, the completion queue’s tail is managed by the controller, but unlike the host, the controller keeps only a private copy of the tail pointer. The only indication of a new completion queue entry is a bit in the entry itself that the host can poll. Once the host determines an entry is available, it reads the entry and updates the head pointer. The controller is notified of head pointer updates by host writes to the completion queue doorbell register.
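That pollable bit is the Phase Tag: the controller inverts the value it writes on every pass through the circular queue, so an entry whose bit matches the current pass is fresh and one that doesn't is stale. A simplified sketch of the controller side (field names are illustrative):

```python
# The controller flips the phase value each time the queue wraps; an
# entry is "new" when its phase bit matches the current pass.
QSIZE = 4

def post_completions(n):
    """Yield (slot, phase) pairs the controller would write for n
    consecutive completions into a QSIZE-entry circular queue."""
    phase, slot = 1, 0        # queue memory starts zeroed, first pass writes 1
    for _ in range(n):
        yield slot, phase
        slot += 1
        if slot == QSIZE:     # wrapped: invert the phase for the next pass
            slot, phase = 0, phase ^ 1

writes = list(post_completions(6))
# First pass written with phase 1, second pass with phase 0:
assert writes[0] == (0, 1) and writes[3] == (3, 1)
assert writes[4] == (0, 0) and writes[5] == (1, 0)
```

The host simply compares each entry's phase bit against its expected value, so no extra "valid" flag or interrupt is strictly required to detect new completions.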
Note that all work done by an NVMe controller is either pulled into or pushed out of the controller by the controller itself. The host merely places work into host memory and rings the doorbell (“you’ve got a submission entry to handle”), then later collects results from the completion queue, again ringing the doorbell (“I’m done with these completion entries”). The controller is thus free to work in parallel with the host; for example, there is no ordering requirement for completions, so the controller can order its work any way it likes.
So what are these queue entries that we’re moving back and forth between host and controller?
The first is the Submission Queue Entry, a 64-byte data structure that the host uses to transmit command requests to the controller:
Command Dwords 15-10 (CDW15-10): 6 dwords of command-specific information.
PRP Entry 2 (PRP2): Pointer to a PRP entry or buffer, or (in conjunction with PRP1) the SGL Segment.
PRP Entry 1 (PRP1): Pointer to a PRP entry or buffer, or (in conjunction with PRP2) the SGL Segment.
Metadata Pointer (MPTR): This field contains the address of an SGL Segment or a contiguous buffer containing metadata.
Namespace Identifier (NSID): This field specifies the namespace ID that this command applies to.
Command Dword 0 (CDW0): This field is common to all commands and contains the Command Opcode (OPC), Command Identifier (CID), and various control bits.
One submission queue entry is enqueued per command to the appropriate Admin or I/O queue. The Opcode specifies the particular command to execute, and the Command Identifier (combined with the Submission Queue ID) uniquely identifies a command.
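As a small worked example of CDW0 packing: the opcode occupies the low byte of the dword and the 16-bit CID the upper half (the other CDW0 control bits are left at zero here for simplicity):

```python
# Packing and unpacking Command Dword 0 of a submission queue entry.
def pack_cdw0(opcode: int, cid: int) -> int:
    assert 0 <= opcode <= 0xFF and 0 <= cid <= 0xFFFF
    return (cid << 16) | opcode

def unpack_cdw0(cdw0: int):
    """Return (opcode, cid) from a packed CDW0 value."""
    return cdw0 & 0xFF, (cdw0 >> 16) & 0xFFFF

IDENTIFY_OPC = 0x06                  # Admin Identify command opcode
cdw0 = pack_cdw0(IDENTIFY_OPC, cid=0x1234)
assert unpack_cdw0(cdw0) == (IDENTIFY_OPC, 0x1234)
```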
In addition to using queue entries to move information back and forth, the host can also allocate data buffers in host memory. These buffers can either be contiguous (defined by their base address and length) or a set of data buffers spread throughout memory. The latter use data structures called PRP lists and scatter-gather lists (SGLs) to define their locations. When the host needs to move these buffers to or from the controller (e.g. for a read or write command), it allocates the appropriate data structure in host memory and writes information about it into the PRP1 and PRP2 fields above prior to writing the queue entry to the controller.
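A minimal sketch of how a buffer might be split into PRP entries (simplified: it assumes 4 KB pages and ignores the case where PRP2 must point to a PRP list rather than hold a second entry):

```python
# Splitting a data buffer into PRP entries: each entry is the physical
# address of one memory page; only the first may carry a page offset.
PAGE = 4096

def prp_entries(addr: int, length: int):
    """Page addresses covering the byte range [addr, addr + length)."""
    entries = [addr]                       # PRP1 keeps its offset
    next_page = (addr // PAGE + 1) * PAGE  # later entries are page-aligned
    while next_page < addr + length:
        entries.append(next_page)
        next_page += PAGE
    return entries

# A 10 KB buffer starting 512 bytes into a page spans three pages:
assert prp_entries(0x10200, 10 * 1024) == [0x10200, 0x11000, 0x12000]
```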
Metadata (e.g. end-to-end data protection) can also be passed along with the NVMe commands, in two ways. It can be sent either in-band with the data (i.e. it is contiguous with the data, per sector), or out-of-band (i.e. it is sent as a separate data stream). In SCSI parlance these are known as Data Integrity Field (DIF) and Data Integrity Extension (DIX), respectively. The latter of these uses the Metadata Pointer described above. We’ll discuss this in detail in future episodes.
When we actually write to or read from the non-volatile storage on the controller, we address namespaces. Other storage technologies have analogous containers, for example LUNs in SCSI. Namespaces can be unique to a controller or shared across multiple controllers. Regardless, the namespace ID field in the request determines which namespace is accessed. Some commands don’t use the namespace field (which is then set to 0); others may need to address all namespaces (the namespace ID is then set to 0xffff_ffff).
On the completion side, there is an analogous data structure, the Completion Queue Entry:
Command Specific Information: One dword of returned information. (Not always used.)
Submission Queue ID: The submission queue on which the associated command was sent. (16 bits)
In this video, Synopsys Applications Consultant Vijay Akkaraju describes the evolving storage ecosystem, the challenges of verifying storage-protocol-based systems, and how Synopsys’ SATA Verification IP can support you in verifying and debugging your designs efficiently and effectively.
You can learn more about VC Verification IP for SATA here.
In the blog Seamless Fast Initialization for DDR VIP Models, we discussed how important it is for Memory VIP simulations to have the option of going through the reset and initialization process quickly, reaching the IDLE state, and starting to read from and write to memory locations. We presented one way to achieve this: scaling down the required timings while still going through all the JEDEC-standard steps for reset and initialization.
In this blog, we will discuss how Synopsys Memory VIP allows skipping initialization altogether while maintaining the proper behavior of the model.
You can learn more about Synopsys Memory VIP here.
Using the Synopsys Memory VIP’s Skip Initialization feature ensures that the model will be in an IDLE state, bypassing the requirements for the reset process. In that state, the VIP is ready to accept commands like REF, MRS, and ACT. The allowed commands are illustrated below in Figure 1 – The DDR3 SDRAM JEDEC standard JESD79-3F State Diagram, and Figure 2 – The DDR4 SDRAM JEDEC Standard JESD79-4 State Diagram.
Figure 1 – The DDR3 SDRAM JEDEC standard JESD79-3F State Diagram
Figure 2 – The DDR4 SDRAM JEDEC Standard JESD79-4 State Diagram
The Skip Initialization feature is applicable to DDR3, DDR4, and LPDDR. Note that a reset issued after backdoor settings have been applied with skip init will wipe out those settings and restore the defaults.
For Discrete devices, we can use the following to set the VIP to skip initialization mode:
// dram_cfg is a handle of class svt_ddr_configuration
dram_cfg.skip_init = 1;
For DIMM devices, we can use the following steps to set the VIP to skip the initialization sequence on a DIMM Model:
// dimm_cfg is handle of svt_ddr_dimm_configuration and
// configuring the skip_init setting for individual DRAM
// configurations with DIMM structure
dimm_cfg.data_lane_cfg[i].rank_cfg[j].skip_init = 1;
// Skip initialization setting for RCD component within an
// RDIMM and LRDIMM
dimm_cfg.ca_buffer_cfg.skip_init = 1;
The skip initialization setting for Discrete as well as DIMM models should be made in the build phase, before passing the configuration object through the config_db mechanism.
These settings can also be made after the build phase, but the user must then call the reconfigure() method to update the model. This must be done before any command appears on the interface.
The following is the syntax for reconfigure() method call:
// For Discrete Device Model
// For DIMM Model
In subsequent blogs, we will discuss how Mode Registers can be set using Frontdoor and Backdoor accesses. So do come back and check it out.
Authored by Nasib Naser
You can learn more about Synopsys Memory VIP here.
Here, we describe how easy it is to integrate and validate a SoundWire design using Synopsys SoundWire VIP Test Suite.
Often, Verification IP and design integration require an in-depth understanding of the protocol and methodology, which demands a significant investment of time to build expertise in-house. To accelerate the process, Synopsys’ SoundWire VIP is written in 100% native SystemVerilog for ease of use, ease of integration, and high performance. In addition, we provide test suites that are complete, self-contained, design-proven testbenches, written in SystemVerilog UVM and targeted at protocol compliance testing. They are provided as source code, enabling users to easily customize or extend the environments with unique application-specific tests or corner-case scenarios. Using Synopsys VIPs and test suites, our users have reduced verification time from months to, in some cases, a few hours.
SoundWire Test Suite Architecture
Verification IP and RTL design integration is one of the areas where a good test suite architecture helps most. Plugging a design into the verification IP is easy when the test suite environment is designed with various design configurations in mind. The figure below illustrates our Test Suite architecture.
The intent of this architecture is to make the environment design-independent so that it can work with any design with little or no effort. In the figure, the dark and light purple blocks are provided with the Test Suite; user intervention is required only in the light purple boxes, to customize the environment for a specific DUT. These are one-time changes, and all existing tests and sequences should run as-is afterwards. This significantly reduces verification time for any design while giving users the flexibility to write their own tests and sequences for design-specific scenarios.
In this white paper, you can learn how a customer was able to integrate the SoundWire Test Suite to verify SoundWire Slave IP of a third party vendor and begin test runs within 8 hours. To learn more about the basics of digital audio transmission and MIPI Soundwire, you can download this whitepaper.
DDR verification is one of the most critical and complex tasks in any SoC, as it involves a controller sitting inside the DUT and an external DDR memory sitting outside the DUT on the board. Here we will discuss fast initialization for DDR VIP models.
You can learn more about Synopsys Memory VIP here.
As per the JEDEC standard JESD79-4, Section 3.3.1, RESET_n must be maintained for a minimum of 200 us. In simulation, that is a very long time. Furthermore, if the user’s testbench violates this timing, the Memory VIP flags it as a UVM_ERROR and fails the simulation, even though the violation does not affect the behavior of the VIP model.
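To see why 200 us is "a very long time," consider how many DRAM clock cycles it spans. A quick back-of-the-envelope calculation (the tCK value used for DDR4-2400 here is approximate, from the JEDEC speed-bin tables):

```python
# How long is the JEDEC-mandated 200 us RESET_n window in clock cycles?
RESET_NS = 200_000            # 200 us expressed in ns

def cycles(tck_ns: float) -> int:
    """Clock cycles covered by the 200 us reset window at a given tCK."""
    return round(RESET_NS / tck_ns)

# e.g. DDR4-2400 with tCK ~= 0.833 ns: roughly 240,000 cycles the
# simulator must grind through before a single read or write happens.
print(cycles(0.833))
```

That per-test overhead, multiplied across a regression of hundreds of tests, is exactly what the fast-initialization and skip-initialization features are designed to eliminate.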
There are a number of ways to get around this violation. In this blog, we will discuss one of these ways.
The Synopsys Memory VIP has an initialization feature called Fast Initialization, also known as scaled-down initialization. The intent of this feature is to allow the initialization parameters to be overridden to speed up the initialization process. The new values, whether defaults or customized by the user, enable faster initialization without triggering any checker violations, and the feature does not affect the initialization behavior of the model. It applies only to front-door access (as opposed to backdoor access); we will discuss the types of Memory VIP access in subsequent blog posts.
There are two ways to scale down the initialization parameters: one uses default values, the other customization.
As per the standard, the following are the expected values:
Using the default approach, one may call the function set_scaled_initialization_timings() from the build_phase of the configuration object. This call scales the timing parameters down to the assigned values below without triggering checker violations:
To customize the values, the user may set their own customized values and then set the flag “scaled_timing_flag”. The VIP will get configured to the user provided values. As such:
For Discrete Devices:
// cfg is a handle of the svt_ddr_configuration class
// Pass the cfg to the DDR Discrete Device component by using
// the config_db mechanism.
cfg.timing_cfg.min_cke_high_after_reset_deasserted_in_pu_and_res_init_time_ps = 500000;
cfg.timing_cfg.min_reset_pulse_width_in_pu_ps = 200000;
cfg.timing_cfg.tPW_RESET_ps = 100000;
cfg.timing_cfg.scaled_timing_flag = 1;
For DIMM Models:
// dimm_cfg is handle of svt_ddr_dimm_configuration
dimm_cfg.data_lane_cfg[i].rank_cfg[j].timing_cfg.min_cke_high_after_reset_deasserted_in_pu_and_res_init_time_ps = 500000;
dimm_cfg.data_lane_cfg[i].rank_cfg[j].timing_cfg.min_reset_pulse_width_in_pu_ps = 200000;
dimm_cfg.data_lane_cfg[i].rank_cfg[j].timing_cfg.tPW_RESET_ps = 100000;
dimm_cfg.data_lane_cfg[i].rank_cfg[j].timing_cfg.scaled_timing_flag = 1;
Authored by Nasib Naser
You can learn more about Synopsys Memory VIP here.
Today’s PCIe verification engineers must trade off verification completeness against shrinking time-to-market, complicated even further by the new Gen4 specification. Synopsys VC VIP for PCIe, fully compliant with the latest version of the Gen4 specification, can solve the riddle of completing verification while keeping to tight schedules.
This webinar will highlight enhancements to the PCIe specifications (Gen 1, 2, 3, and 4) as reported by PCI-SIG, and provide an overview of the complete PCIe solution offered by Synopsys: controller, PHY, and VIP. It will then dive deeper into the Synopsys Verification IP offering, including test suites, built-in error injection, and the passive monitor, and will also touch on NVMe support. We will conclude with a demo using the Verdi Protocol Analyzer to demonstrate advanced features for debugging complex verification scenarios.
Register for the web event: Learn How to Accelerate Verification Closure with PCIe Gen4 VIP. Date: August 19, 2015. Time: 10:00 AM PDT.