Express Yourself


The Goodies at Last: What’s new in PCIe 4.0

So back in what seems like another lifetime, I said:

“The official PCI Express Base Specification Revision 4.0 final specification has been released – get it here:

Stay tuned for the goodies of course!”

The only problem is that you’ve been staying tuned … and staying tuned … and staying tuned some more and I’ve not delivered on any goodies ☹ until now!

Higher Speed: 16GT/s

Well, yeah, you’ve known this piece for way too many years, but it’s important to note that the evolution from PCIe 8GT/s signaling to 16GT/s is similar to that from 2.5GT/s to 5GT/s: it’s “just” a new higher speed, negotiated at link initialization. What’s different is that getting to 16GT/s requires a two-stage process, where previous rates did not. First, the link is brought up to 8GT/s using the familiar 4-phase equalization process; then the same 4-phase process is repeated while running at 8GT/s to switch to 16GT/s. This requires some new arcs in the PCIe link state machine, but re-uses methods now well proven at 8GT/s. Happily, the 128/130 encoding scheme from PCIe 8GT/s is still used at 16GT/s, so designers can re-use virtually all of that logic. Of course, there are some minor changes needed in the main protocol state machine, the Link Training and Status State Machine (LTSSM), to accommodate the new equalization. A few other minor symbol and test pattern tweaks are also in there to ease operation at the higher speed, but overall a PCIe 4.0 16GT/s link is going to look almost unchanged to everyone familiar with 8GT/s operation.
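To put a number on what the higher speed buys you, here is a minimal Python sketch of per-lane payload bandwidth after line encoding. The 128/130 figure for 8GT/s and 16GT/s comes from the text above; the 8b/10b encoding for 2.5GT/s and 5GT/s is not mentioned in this post and is my own addition for comparison. The names RATES and lane_bandwidth_gbytes are made up for illustration, and packet and protocol overhead are ignored.

```python
# Signaling rate in GT/s, then (payload bits, total bits) of the line encoding.
# The 8b/10b rows are an added assumption for comparison; the post itself
# only discusses 128/130 at 8GT/s and 16GT/s.
RATES = {
    "2.5GT/s": (2.5, 8, 10),
    "5GT/s": (5.0, 8, 10),
    "8GT/s": (8.0, 128, 130),
    "16GT/s": (16.0, 128, 130),
}


def lane_bandwidth_gbytes(rate_gt, payload_bits, total_bits):
    """Payload bandwidth of one lane in one direction, in GB/s.

    Only the line-encoding overhead is removed; packet headers, DLLPs
    and other protocol overhead are not modeled.
    """
    return rate_gt * payload_bits / total_bits / 8


for name, (rate, payload, total) in RATES.items():
    gbytes = lane_bandwidth_gbytes(rate, payload, total)
    print(f"{name}: {gbytes:.3f} GB/s per lane")
```

Because the encoding is unchanged between 8GT/s and 16GT/s, the per-lane figure simply doubles.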

More Data Movement: Tags and Credits

When we were developing the PCIe 4.0 specification, some folks were concerned that certain devices with specific workloads might not be able to fully utilize the 16GT/s data rate with the existing limits on credits and outstanding transactions. To help those devices, PCIe 4.0 expanded the Tag[] field in the packet header from 8 bits to 10 bits. Note that one of the four combinations of the two new bits is reserved to help detect erroneous hierarchy configurations, which leaves 3 × 256 = 768 tags available rather than the full 1024. (Yeah, that reminds me of the 2×5=8 math, but such is life.) All devices implementing 16GT/s signaling are required to support receiving 10-bit tags, but may choose whether or not to generate them based on their own needs. Because of that, all designers of PCIe 4.0 16GT/s devices will need to expand their received tag-tracking logic to handle the larger tags, but they can continue to rely on header credits to throttle the total number of simultaneous requests they must accept.
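If the 768 figure looks odd, a quick Python sketch of the counting may help. It assumes the reserved combination is the one with both new bits clear; the text above only says that one combination is reserved, so treat that choice (and the names used here) as illustrative and check the specification for the actual rule.

```python
LEGACY_TAG_BITS = 8     # tag width before PCIe 4.0
EXTENDED_TAG_BITS = 10  # tag width with 10-bit tags

# Assumption for illustration: the reserved pattern of the two new
# (upper) bits is 0b00. The post only states that one pattern is reserved.
RESERVED_UPPER_PATTERN = 0b00


def usable_10bit_tags():
    """List the 10-bit tag values whose two new bits avoid the reserved pattern."""
    new_bits = EXTENDED_TAG_BITS - LEGACY_TAG_BITS
    upper_mask = (1 << new_bits) - 1
    return [
        tag
        for tag in range(1 << EXTENDED_TAG_BITS)
        if (tag >> LEGACY_TAG_BITS) & upper_mask != RESERVED_UPPER_PATTERN
    ]


tags = usable_10bit_tags()
print(f"{len(tags)} usable tags out of {1 << EXTENDED_TAG_BITS}")
```

Three of the four upper-bit patterns survive, each covering 256 values of the original 8-bit field.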

For the affected devices, more outstanding commands (and therefore more tags in use) are necessary, but not sufficient. To really make use of those additional tags, the PCIe 4.0 specification defines a scaling scheme for the flow-control credit mechanism. Devices requiring more credit than previously available can now advertise a scaling factor of 4X or 16X, whereby each numeric credit in the protocol actually represents 4 or 16 credits respectively. Here again, all devices implementing PCIe 4.0 16GT/s are required to support their link partner scaling by 4X or 16X, but are permitted to use 1X scaling for their own credits if desired. Using the new scaling factors, PCIe 3.1’s maximum of 127 header credits can be extended to 508 (using 4X scaling) or 2032 (using 16X scaling), independently for each Posted (PH), Non-Posted (NPH) or Completion (CPLH) credit type. Likewise, data credits can grow from PCIe 3.1’s 2047 (~32KB) to 8188 (~128KB) or 32,752 (~512KB) using 4X or 16X scaling respectively for each Posted (PD), Non-Posted (NPD) or Completion (CPLD) credit type. Whew, that’s a lot of credit!
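The credit arithmetic above is easy to reproduce. This is a minimal Python sketch that multiplies the PCIe 3.1 maximums quoted in the text by each scaling factor. The 16-bytes-per-data-credit constant is my assumption (it is what makes 2047 credits come out at roughly 32KB), and the function name is made up for illustration.

```python
MAX_HEADER_CREDITS = 127  # PCIe 3.1 maximum quoted in the text
MAX_DATA_CREDITS = 2047   # PCIe 3.1 maximum quoted in the text
DATA_CREDIT_BYTES = 16    # assumption: one data credit covers 16 bytes


def scaled_limits(scale):
    """Return (header credits, data credits, data bytes) for a scaling factor."""
    header = MAX_HEADER_CREDITS * scale
    data = MAX_DATA_CREDITS * scale
    return header, data, data * DATA_CREDIT_BYTES


for scale in (1, 4, 16):
    header, data, data_bytes = scaled_limits(scale)
    print(
        f"{scale:>2}X: {header} header credits, "
        f"{data} data credits (~{data_bytes / 1024:.0f}KB)"
    )
```

The same limits apply separately to each of the Posted, Non-Posted and Completion credit types.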

System Margin

Another significant item introduced by the 4.0 specification is “Lane Margining at the Receiver.” This feature uses software that runs on the PCIe system board to evaluate how much margin exists in each lane of the PCIe channel, or put another way, how close a given lane is to failing to transfer data reliably. The specification defines a set of registers and a basic command set whereby the host software can instruct each receiver in a PCIe channel to move its sampling point in time (and optionally voltage) to determine roughly how wide (and optionally how high) the signal eye is at the receiver.

It’s critical to understand that this feature is intended for use as a system diagnostic/evaluation tool to provide an approximate measurement of the PCIe channel, and is not a measurement of the receiver. Perhaps more important is that while supporting Lane Margining is required of all devices supporting PCIe 4.0 16GT/s, the use of Lane Margining is not required to run at 16GT/s. In other words, devices don’t use lane margining to get their link running at 16GT/s; systems use lane margining to find out how good their channels are in real life.
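To make the idea concrete, here is a toy Python sketch of a timing-margin sweep: host software nudges the sampling point one step at a time in each direction and stops when the error count gets too high. Everything in it is hypothetical. errors_at_offset is a stand-in for a real receiver, the step counts and error limit are invented, and none of it reflects the actual registers or command set defined in the specification.

```python
def errors_at_offset(offset_steps, true_half_width=9):
    """Hypothetical stand-in for a receiver under margining.

    Reports zero errors while the sampling point stays inside a made-up
    eye of +/- true_half_width steps, and a large count outside it.
    """
    return 0 if abs(offset_steps) <= true_half_width else 100


def margin_one_direction(direction, max_steps=20, error_limit=4):
    """Step the sampling point left (-1) or right (+1) of center.

    Returns the last step count at which errors stayed within the limit.
    """
    last_good = 0
    for step in range(1, max_steps + 1):
        if errors_at_offset(direction * step) > error_limit:
            break
        last_good = step
    return last_good


left = margin_one_direction(-1)
right = margin_one_direction(+1)
print(f"approximate eye width: {left + right} steps ({left} left, {right} right)")
```

A voltage sweep would follow the same pattern with a vertical offset instead of a timing offset, giving a rough eye height.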


Of course, actually implementing Lane Margining in an SoC requires close cooperation between a PCIe 4.0 16GT/s controller and 16GT/s PHY, but I’ll leave that for another day. (Hint: it’s been done in hardware, on Synopsys’ PCIe 4.0 Root Complex, and it’s working with multiple vendors’ implementations…)


If you have any more topics you’d like to see covered (either as “Flashback to Basics Fridays” or otherwise), please comment below! As always, please subscribe to ExpressYourself by RSS or email so you don’t miss out on any future updates!


