Hello again.
I've found a way to trigger the "H:0x0 D:0x8 P:0x0" errors quickly with fio stress testing. Deploy 2-4 VMs and run the 'fio --verify=md5 --rw=write --size=8000m --bs=4k --loops=60 --runtime=60m --group_reporting --sync=1 --direct=1 --directory=/mnt/sdb1 --ioengine=libaio --numjobs=32 --thread --name=srp' command on each of them. I/O works fine for 10-30 minutes, but after that you should see the 0x8 flood in the ESXi logs and the queues jumping. At first, I was convinced that the "DEVICE BUSY" status came from the target, which has logic to return such SCSI responses when the req_lim for the LUN is exceeded. However, after dumping the traffic with the ibdump tool, it seems that no such responses are sent between the target and initiator LIDs. You can look for them yourself in the pcap dump files with the Wireshark filter "infiniband.bth.opcode == 4 && data.data[0] == c1 && data.data[19] != 0", which matches RC SEND Only packets carrying an SRP_RSP IU (type 0xC1 in byte 0) with a non-zero SCSI status in byte 19.
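If you prefer to check the dump offline, here is a minimal Python sketch of what that filter selects. It assumes you have already extracted the RC SEND payloads (the bytes following the BTH) into Python `bytes` objects with whatever pcap tooling you like; the function and type names are made up for illustration. The field layout follows the SRP_RSP IU from the T10 SRP standard (the same layout as Linux's `struct srp_rsp`).

```python
import struct
from typing import Iterable, List, NamedTuple, Optional

SRP_RSP_IU = 0xC1        # IU type of an SRP_RSP (byte 0 of the payload)
SRP_RSP_HDR_LEN = 36     # fixed part of SRP_RSP before sense/response data
SCSI_STATUS_GOOD = 0x00
SCSI_STATUS_BUSY = 0x08  # the status ESXi logs as "D:0x8"


class SrpRsp(NamedTuple):
    req_lim_delta: int   # credits handed back to the initiator
    tag: int             # opaque tag echoed from the SRP_CMD
    flags: int           # SNSVALID/RSPVALID/residual flags
    status: int          # SCSI status byte


def parse_srp_rsp(payload: bytes) -> Optional[SrpRsp]:
    """Decode the fixed SRP_RSP header, or return None for any other IU."""
    if len(payload) < SRP_RSP_HDR_LEN or payload[0] != SRP_RSP_IU:
        return None
    # Big-endian: opcode, sol_not, 2 reserved, req_lim_delta (u32), tag (u64),
    # 2 reserved, flags, status -> the status byte lands at offset 19.
    # The tag is opaque, so decoding it as big-endian only affects display.
    _op, _sol, req_lim_delta, tag, flags, status = struct.unpack_from(
        ">BB2xIQ2xBB", payload, 0)
    return SrpRsp(req_lim_delta, tag, flags, status)


def non_good_responses(payloads: Iterable[bytes]) -> List[SrpRsp]:
    """Same selection as the Wireshark filter: SRP_RSP with status != GOOD."""
    found = []
    for p in payloads:
        rsp = parse_srp_rsp(p)
        if rsp is not None and rsp.status != SCSI_STATUS_GOOD:
            found.append(rsp)
    return found
```

An empty result from `non_good_responses` over the whole capture would mean the same thing as the empty Wireshark view: the target never sent a BUSY (or any other non-GOOD) status.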
Changing the ib_srp module parameters on ESXi doesn't help, whereas using a Linux initiator instead of ESXi triggers no such errors.
The "H:0x5 D:0x0 P:0x0" error was fixed by optimizing the latency, but "H:0x0 D:0x8 P:0x0" looks like a bug in the ESXi module, in how the initiator tracks SRP credits, according to a friend who helped me track down the issue. I hope it will be fixed in the upcoming ib_iser module.
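To illustrate the credit theory, here is a small model of how SRP flow control is meant to work: the initiator starts with the credit count granted in SRP_LOGIN_RSP, spends one credit per SRP_CMD, and gets credits back through the REQUEST LIMIT DELTA field of each SRP_RSP (or SRP_CRED_REQ). This is a hypothetical Python sketch, not ESXi's code; the class, the `simulate` helper and the "lost delta" bug are assumptions for illustration only. It shows how an initiator whose counter slowly leaks would run fine for a while and then start refusing commands locally, which would be consistent with BUSY statuses in the logs without any BUSY response on the wire.

```python
class SrpCreditTracker:
    """Hypothetical model of initiator-side SRP credit (req_lim) accounting."""

    def __init__(self, login_req_lim_delta: int) -> None:
        # The initial credit count comes from the SRP_LOGIN_RSP.
        self.req_lim = login_req_lim_delta

    def try_send_cmd(self) -> bool:
        """Consume one credit for an SRP_CMD; False means no credit is left."""
        if self.req_lim <= 0:
            return False  # refused locally; nothing is sent to the target
        self.req_lim -= 1
        return True

    def on_target_iu(self, req_lim_delta: int) -> None:
        """SRP_RSP (and e.g. SRP_CRED_REQ) return credits via REQUEST LIMIT DELTA."""
        self.req_lim += req_lim_delta


def simulate(initial: int, commands: int, lose_every: int = 0) -> int:
    """Issue `commands` SRP_CMDs, each answered with req_lim_delta=1.

    If `lose_every` is set, the delta of every `lose_every`-th command's response
    is dropped, mimicking a hypothetical accounting bug. Returns how many commands
    were refused for lack of credit.
    """
    tracker = SrpCreditTracker(initial)
    refused = 0
    for i in range(1, commands + 1):
        if not tracker.try_send_cmd():
            refused += 1
            continue
        if lose_every and i % lose_every == 0:
            continue  # response arrived, but its credit was never counted
        tracker.on_target_iu(1)
    return refused
```

With correct accounting, `simulate(4, 100)` never refuses a command. With `simulate(4, 100, lose_every=10)` the counter is drained after 40 commands and the remaining 60 are refused, even though the target never reported being busy.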
Regards