I have two host machines connected by Mellanox InfiniBand HCAs. I'm running a simple RDMA application that performs RDMA write and RDMA read operations
from one machine (the client) on the other machine (the server). To find out which interrupts belong to the HCA on each machine, I ran the following command: less /proc/interrupts
67:  475880  50253   0  0  PCI-MSI-edge  mlx4-async@pci:0000:01:00.0
68:  399002      0  73  0  PCI-MSI-edge  mlx4_0-0
69:       0   3264  23  0  PCI-MSI-edge  mlx4_0-1
70:       0      0   0  0  PCI-MSI-edge  mlx4_0-2
71:       0      0   0  0  PCI-MSI-edge  mlx4_0-3
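To follow only the HCA counters instead of paging through the whole file, something like the sketch below might help. It is only an illustration under a few assumptions: the HCA interrupt names contain "mlx4" as in the output above, the file has the usual whitespace-separated /proc/interrupts layout (IRQ number, per-CPU counts, type, name), and the function name mlx4_irq_counts is made up for this example.

```python
def mlx4_irq_counts(text, tag="mlx4"):
    """Return {irq_name: total_count} for interrupt lines whose name contains tag.

    `text` is the content of /proc/interrupts; `tag` is an assumed substring
    of the HCA interrupt names (here "mlx4").
    """
    totals = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the CPU header line and anything that does not start with "NN:".
        if not fields or not fields[0].endswith(":"):
            continue
        name = fields[-1]
        if tag not in name:
            continue
        # The per-CPU counters are the leading numeric fields after "NN:".
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
        totals[name] = total
    return totals

# Example use on a live system (not executed here):
#   with open("/proc/interrupts") as fh:
#       print(mlx4_irq_counts(fh.read()))
```

Calling it twice, before and after a batch of RDMA operations, and comparing the totals would show which of these interrupts actually fire on the server during the test.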
On the server machine, I found by experiment that calling __disable_irq() on those 4 interrupts causes all RDMA read/write operations issued by the client to fail with the error message "transport retry counter exceeded".
My question is: why and when can RDMA read/write operations generate IRQs on the remote machine? I thought that they do not involve the remote CPU, and so would not raise any kind of IRQ there.
And why does disabling those interrupts cause these operations to fail?
Message was edited by: FOPA Léon Constantin