In 2021, PCI-SIG® released the latest version of the PCI Express® specification, PCIe® 6.0. PCIe 6.0 has a raw data rate of 64 GT/s, doubling the bandwidth of PCIe 5.0 (32 GT/s) to meet industry demand for a high-speed, low-latency interconnect. It is a scalable interconnect solution for data-intensive markets like data centers, AI/ML, high-performance computing (HPC), and automotive.
In our recent blog, "PCI Express Surges Forward: High Bandwidth Interconnect with PCIe 6.0," we talked about the changes introduced in PCIe 6.0. In this blog, we discuss the Forward Error Correction (FEC) mechanism in PCIe 6.0, why it is required, and what verification solution Synopsys offers to cover this feature.
What is PAM-4?
PCIe 6.0 uses PAM-4 (four-level pulse amplitude modulation) signaling at the 64 GT/s data rate instead of the non-return-to-zero (NRZ) signaling used at lower data rates. Transmitted and received signals now take one of four distinct voltage levels, encoding 2 bits per unit interval and producing three eyes. The motivation is channel loss: NRZ signaling at 64 GT/s would raise the Nyquist frequency to 32 GHz, at which point the channel loss could be huge. PAM-4 carries two bits per symbol, so at 64 GT/s it has the same Nyquist frequency (16 GHz) as NRZ at 32 GT/s.
However, this comes with a trade-off: the eye height and eye width are reduced, which makes the signal more susceptible to errors at the receiver. Errors are expected to occur in bursts within a lane, and some correlation across lanes is also expected. As a result, the bit error rate (BER) associated with PAM-4 signaling is expected to be much higher than the 10^-12 target of the lower data rates.
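The Nyquist comparison above can be checked with a few lines of arithmetic. This is a minimal sketch that treats the per-lane data rate as a bit rate and takes the Nyquist frequency as half the symbol rate; the function name `nyquist_ghz` is illustrative and not taken from any specification or tool.

```python
def nyquist_ghz(data_rate_gtps: float, bits_per_symbol: int) -> float:
    """Return the Nyquist frequency in GHz: half the symbol (baud) rate."""
    symbol_rate_gbaud = data_rate_gtps / bits_per_symbol
    return symbol_rate_gbaud / 2


nrz_32 = nyquist_ghz(32, bits_per_symbol=1)   # PCIe 5.0 with NRZ
nrz_64 = nyquist_ghz(64, bits_per_symbol=1)   # hypothetical NRZ at 64 GT/s
pam4_64 = nyquist_ghz(64, bits_per_symbol=2)  # PCIe 6.0 with PAM-4

print(nrz_32, nrz_64, pam4_64)  # 16.0 32.0 16.0
```

Doubling the bits per symbol is what lets the data rate double while the channel sees the same fundamental frequency as before.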
Why is FEC required and how is it done?
FEC is used to mitigate the high BER in the data stream. Because FEC works on a fixed code size, fixed-size FLITs (flow control units) are used to transmit the TLPs and DLPs in the data stream. The latency and complexity of the FEC increase exponentially with the number of bytes to be corrected. To keep the latency (under 2 ns) and complexity low, a lightweight FEC is used that can correct a single-byte error. This is coupled with a strong cyclic redundancy check (CRC) for error detection to produce a high-reliability result. Additionally, precoding can be used to minimize the errors in a burst.
A FLIT is 256 bytes in size with 236 bytes for TLPs, 6 bytes for DLPs, 8 bytes for CRC, and 6 bytes for error checking and correcting (ECC). The 8 bytes of CRC protect the TLP and DLP bytes, but not the ECC bytes. The 6 bytes of ECC protect the entire FLIT including CRC bytes.
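As a quick sanity check of the byte budget described above, the following sketch adds up the fields and derives the coverage of each protection layer. The dictionary keys and variable names are illustrative only; the byte counts are the ones given in the text.

```python
FLIT_BYTES = 256
FIELDS = {"tlp": 236, "dlp": 6, "crc": 8, "ecc": 6}  # byte counts from the text

# The four fields must fill the FLIT exactly.
assert sum(FIELDS.values()) == FLIT_BYTES

# CRC protects the TLP and DLP bytes only.
crc_protected = FIELDS["tlp"] + FIELDS["dlp"]

# ECC protects everything else in the FLIT, including the CRC bytes.
ecc_protected = crc_protected + FIELDS["crc"]

# Fraction of the FLIT available for TLP bytes.
tlp_efficiency = FIELDS["tlp"] / FLIT_BYTES

print(crc_protected, ecc_protected, f"{tlp_efficiency:.1%}")  # 242 250 92.2%
```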
The FEC code is 3-way interleaved as shown in the table below. Each color represents an ECC group, with the bytes of that group marked in the same color. Thus, three consecutive bytes in a lane belong to three different ECC groups. Hence, a burst error of up to 16 bits in a lane will not impact more than one byte in each group, and each ECC group can correct a single-byte error.
Table 1: FLIT interleaving on a x16 link
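The interleaving property can be illustrated with a small model. This is a sketch under two stated assumptions that are not spelled out in this blog: byte i of the FLIT travels on lane i mod (link width), and ECC groups are assigned round-robin by byte position (i mod 3). All function names are hypothetical. A 16-bit burst that starts mid-byte can touch at most three consecutive bytes on a lane, so the check slides a 3-byte window along every lane.

```python
FLIT_BYTES = 256
ECC_GROUPS = 3  # 3-way interleaving, as described in the text


def ecc_group(byte_index: int) -> int:
    # Assumption: groups are assigned round-robin by byte position in the FLIT.
    return byte_index % ECC_GROUPS


def lane_bytes(lane: int, width: int) -> list:
    # Assumption: byte i of the FLIT is carried on lane i % width.
    return list(range(lane, FLIT_BYTES, width))


def burst_hits_each_group_once(width: int, burst_bytes: int = 3) -> bool:
    """True if no run of `burst_bytes` consecutive bytes on any lane
    contains two bytes from the same ECC group."""
    for lane in range(width):
        groups = [ecc_group(b) for b in lane_bytes(lane, width)]
        for start in range(len(groups) - burst_bytes + 1):
            window = groups[start:start + burst_bytes]
            if len(set(window)) != len(window):
                return False
    return True


for width in (1, 2, 4, 8, 16):
    print(f"x{width}: {burst_hits_each_group_once(width)}")
```

Under these assumptions the property holds for every link width, because the widths are powers of two and therefore never multiples of three. A burst spanning four bytes on a lane would necessarily put two bytes in one group, which a single-byte-correcting code cannot repair.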
At the receiver, each ECC decoder performs correction for its corresponding code group and reports the error status as needed. This is followed by the CRC check, which decides whether the received FLIT is accepted. If the CRC check fails, the FLIT is replayed so that a correct copy is received.
If an uncorrectable error is detected, the CRC check fails, which results in a negative acknowledgment (NAK) followed by a replay. Optimizations are possible: for example, a FLIT containing only NOP TLPs need not be replayed, and it is also possible to replay only the FLIT in error.
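The accept, NAK, and replay decisions described above can be summarized in a toy model. This is a deliberately simplified sketch, not a model of a real decoder or of Synopsys Verification IP: it assumes that the CRC passes exactly when every ECC group has at most one corrupted byte, and the function names and the `skip_nop_only` flag are hypothetical.

```python
def receiver_verdict(bad_bytes_per_group) -> str:
    """Toy receiver: each ECC group can repair at most one bad byte.
    Simplifying assumption: the CRC check passes only if every group
    was fully repaired."""
    fec_repaired_all = all(count <= 1 for count in bad_bytes_per_group)
    return "ACCEPT" if fec_repaired_all else "NAK"


def transmitter_replays(verdict: str, nop_only: bool,
                        skip_nop_only: bool = True) -> bool:
    """Toy transmitter: replay on NAK, optionally skipping FLITs
    that carried only NOP TLPs."""
    if verdict != "NAK":
        return False
    return not (skip_nop_only and nop_only)


# One bad byte in a single group: corrected by FEC, FLIT accepted.
print(receiver_verdict((1, 0, 0)))                         # ACCEPT
# Two bad bytes in the same group: uncorrectable, NAK and replay.
print(receiver_verdict((2, 0, 0)))                         # NAK
print(transmitter_replays("NAK", nop_only=False))          # True
# The same error in a NOP-only FLIT may be skipped on replay.
print(transmitter_replays("NAK", nop_only=True))           # False
```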
Correctable & uncorrectable errors:
Below are snapshots from Synopsys Verification IP for PCIe 6.0 transcripts illustrating correctable and uncorrectable errors in the FLITs.
A correctable error is injected in the transmitted FLITs.
FEC at the link partner has corrected the error and the received FLIT is accepted.
An uncorrectable error is injected in the transmitted FLITs, and the DUT is expected to send a NAK.
FEC at the link partner detected an uncorrectable error.
It is very important to verify both the correctable and uncorrectable errors to ensure that the FEC, CRC, and replay mechanisms are working properly.
Ultimately, a lightweight FEC in conjunction with a strong 64-bit CRC works well for a first bit error rate (FBER) of 10^-6, even with a high lane correlation. The retry probability per FLIT is around 5×10^-6 and the Failure in Time (FIT) is almost 0.
Synopsys Verification IP for PCIe 6.0 is designed to address all the verification complexities required to close data reliability aspects of your SoC. Data reliability is a much-desired system attribute, and target users of PCIe 6.0 look for solutions to verify their SoCs at the system level. Running system-level payloads on SoCs requires a faster, hardware-based pre-silicon solution. Synopsys transactors based on Synopsys IP enable fast verification hardware solutions, including Synopsys ZeBu® emulation systems and Synopsys HAPS® prototyping systems, for validation use cases.
Synopsys protocol verification solutions are natively integrated with the Synopsys Verification Family of products including Synopsys Verdi® debugger and regression management and automation with Synopsys VC Execution Manager.
Synopsys has been one of the key contributors to the PCIe specification and continues to provide the industry’s first verification solutions with Synopsys Verification IP for PCIe 6.0 and Testsuite.
In addition, Synopsys DesignWare IP for PCIe 6.0 includes the controller and PHY solutions, enabling the early development of PCIe 6.0 system-on-chip (SoC) designs.
To learn more about Synopsys verification solutions, please visit us at: www.synopsys.com/vip