With unreliable RDMA transports (UC or UD), when can Mellanox FDR drop a packet during normal operation? By normal operation, I mean:
- The receiver always has sufficient RECVs on its receive queue
- No network fluctuations such as link up/down, network reconfiguration, switch reboot, etc.
I will put my thoughts on this topic here. Please let me know where I am wrong and what other information is relevant.
In this normal case, I can rule out the following types of packet loss:
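- Congestion-based: InfiniBand uses credit-based link-level flow control, so a sender only transmits when the next hop has advertised buffer space. The network therefore does not drop packets due to congestion.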
- Bit-error based: Mellanox FDR uses Forward Error Correction (FEC) at the physical layer and Link Layer Retransmission (LLR) at the link layer. Each link is therefore reliable, which makes the end-to-end path reliable.
- There is one caveat here. Mellanox's Link Layer Retransmission appears to be cell-based, so it might use a short per-cell CRC. I was unable to find the CRC length, but Intel's Omni-Path uses a 14-bit CRC for its link-layer retransmission; I assume Mellanox uses a cell CRC of similar size. For a random error pattern, such a CRC fails to detect the error with probability about 2^(-14) ≈ 6.1 × 10^(-5), which is relatively high. The error will then only be detected later by the stronger end-to-end CRC (the 32-bit ICRC). At that point, because the transport is unreliable, the packet will be dropped.
- However, this can only happen if a bit error that the physical-layer FEC cannot correct also causes a CRC false negative. I'd expect the physical-layer FEC to correct any single-bit error, so an uncorrectable cell requires at least 2 bit errors. Given Mellanox's advertised BER of < 10^(-15), and assuming independent bit errors, the probability of 2 or more errors in the same n-bit cell is roughly C(n, 2) · BER^2 < (n^2 / 2) · 10^(-30). For a hypothetical 1,000-bit cell, that is about 5 × 10^(-25).
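- This must be further multiplied by the probability of a CRC false negative given 2 bit errors. That factor may be much smaller than 2^(-14), or even zero: a CRC detects every 2-bit error in a block shorter than the order of its generator polynomial (up to 2^14 - 1 bits for a 14-bit CRC with a primitive polynomial).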
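The estimate in the points above can be reproduced with a few lines of Python. This is a minimal sketch that follows the post's own model: independent bit errors at the advertised BER bound, a FEC that corrects one error per cell, and a pessimistic per-cell CRC miss probability of 2^(-14). The 1,000-bit cell size and the function names are hypothetical; the real FDR cell size is not known here.

```python
from math import comb


def p_two_or_more_errors(cell_bits: int, ber: float, max_k: int = 10) -> float:
    """P(>= 2 bit errors in one cell), assuming independent bit errors at rate `ber`.

    Sums the binomial tail directly: computing 1 - P(0) - P(1) in floating point
    would cancel catastrophically when ber is around 1e-15. Terms beyond k = max_k
    are negligible for tiny BER and are dropped.
    """
    return sum(
        comb(cell_bits, k) * ber**k * (1.0 - ber) ** (cell_bits - k)
        for k in range(2, min(cell_bits, max_k) + 1)
    )


def expected_silent_drops(total_bytes: float, cell_bits: int, ber: float,
                          crc_bits: int = 14) -> float:
    """Expected number of cells the FEC cannot correct AND the cell CRC misses.

    Each such cell is assumed to be caught later by the end-to-end CRC, causing
    one dropped packet. The CRC miss probability 2**-crc_bits is the
    random-error-pattern figure, which is pessimistic for 2-bit errors.
    """
    p_cell = p_two_or_more_errors(cell_bits, ber) * 2.0 ** -crc_bits
    n_cells = total_bytes * 8 / cell_bits  # payload bits only; framing overhead ignored
    return n_cells * p_cell


if __name__ == "__main__":
    PETABYTE = 1e15   # bytes
    CELL_BITS = 1000  # hypothetical cell size
    BER = 1e-15       # advertised upper bound on the bit error rate

    print(f"P(>=2 errors per cell): {p_two_or_more_errors(CELL_BITS, BER):.3e}")
    print(f"Expected drops per PB:  {expected_silent_drops(PETABYTE, CELL_BITS, BER):.3e}")
```

With these inputs the expected count is on the order of 10^(-16) drops per petabyte; varying `CELL_BITS` and `BER` shows how sensitive the estimate is to the unknown parameters. Burst errors would violate the independence assumption and make the real figure larger.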
I would appreciate it if someone shared their packet-loss stories with Mellanox FDR. My experience has been good so far: I have transferred petabytes of data over UD without seeing a loss, but I am still looking for some theoretical backing.
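The zero-loss experience can also be turned into a rough empirical bound. The sketch below, assuming independent per-packet losses and that every loss would have been noticed, computes the upper confidence bound on the per-packet loss rate after observing zero losses in n packets (approximately 3/n at 95%, the "rule of three"). The 4096-byte payload per UD packet (the largest InfiniBand MTU), the 1 PB volume, and the function name are illustrative assumptions.

```python
import math


def zero_failure_upper_bound(n_trials: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on a per-trial failure probability after observing
    zero failures in n_trials independent trials.

    Solves (1 - p)**n = 1 - confidence for p. For large n this is approximately
    -ln(1 - confidence) / n, i.e. about 3 / n at 95% (the "rule of three").
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be in (0, 1)")
    # p = 1 - (1 - confidence)**(1 / n), written with log1p/expm1 to avoid
    # cancellation when n is huge and the bound is tiny.
    return -math.expm1(math.log1p(-confidence) / n_trials)


if __name__ == "__main__":
    PAYLOAD_BYTES = 4096       # assumption: every UD packet carries a full 4 KiB payload
    transferred_bytes = 1e15   # assumption: 1 PB observed with zero losses
    packets = transferred_bytes / PAYLOAD_BYTES
    bound = zero_failure_upper_bound(packets)
    print(f"{packets:.3e} packets, 95% upper bound on loss rate: {bound:.3e} per packet")
    print(f"rule-of-three approximation: {3 / packets:.3e}")
```

For 1 PB of full 4 KiB packets (about 2.4 × 10^11 packets), the 95% bound is roughly 1.2 × 10^(-11) per packet. This is far weaker than the model-based estimate, but it rests on observation rather than on assumed FEC and CRC parameters.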