Hi There,
My setup is as follows:
1. Host server: RHEL 7.0 with virtualization (QEMU). I installed mlnx_fw_nic_3.0-1.0.1.2_rhel7_x86-64.bin with SR-IOV enabled on a Mellanox CX-3 card.
2. I created KVM guests and passed a VF through to each guest. I installed MLNX_OFED_LINUX-3.0-2.0.1-rhel7.1-x86_64.iso in the guests.
When I try an RDMA connect without any extra routes added, dtest works fine. But after I add a route to the network via an L3 IP that I created on the switch, the connect times out. I didn't set roce_mode, so it should still be RoCEv1. Why does the connect fail once the route is added?
Without the route (works):
# dtest -P roce1
2230 Running as server - roce1 v2
2230 Local Address AF_INET - 10.1.1.121 port 45248
2230 Server is waiting for client connection to send server info
2230 Server waiting for connect request on port b0c0
2230 Waiting for connect response
2230 CONNECTED!
2230 Send RMR msg to remote: r_key_ctx=0x60020b0d,va=0x7fb52c84b000,len=0x400000
2230 remote RMR data arrived!
2230 Received RMR from remote: r_iov: r_key_ctx=30020b0f,va=7f10bf6eb000,len=0x400000
2230 Query EP: LOCAL addr 10.1.1.121 port b0c0
2230 Query EP: REMOTE addr 10.1.1.122 port ae6b
2230 RDMA WRITE DATA with SEND MSG
2230 Sending RDMA WRITE completion message
2230 inbound rdma_write; send message arrived!
2230 Received RMR from remote: r_iov: r_key_ctx=30020b0f,va=7f10bf6eb000,len=0x400000
2230 SERVER: RDMA write buffer contains: client RDMA write data...
2230 RDMA READ DATA with SEND MSG
2230 Sending RDMA read completion message
2230 Waiting for inbound message....
2230 inbound rdma_read; send message arrived!
2230 Received RMR from remote: r_iov: r_key_ctx=30020b0f,va=7f10bf6eb000,len=0x400000
2230 SERVER: RCV RDMA read buffer contains: client RDMA read data...
2230 PING DATA with SEND MSG
2230: Message RTT: Total=1095998.05 usec, 100 bursts, itime=10959.98 usec, pc=0
2230: RDMA write (bi-direction): Total=367049.93 usec, itime=1835.25 us, poll=0, 200 x 4194304, 2285.41 MB/sec
2230: DAPL Test Complete. PASSED
[root@ps1vm2 ~]# dtest -P roce1 -h 10.1.1.121
12870 Running as client - waiting for server input
12870 Running as roce1 client v2
12870 Local Address AF_INET - 10.1.1.122 port 45248
12870 Server Name: 10.1.1.121
12870 Server Net Address: 10.1.1.121 port b0c0
12870 Waiting for connect response
12870 CONNECTED!
12870 Send RMR msg to remote: r_key_ctx=0x30020b0f,va=0x7f10bf6eb000,len=0x400000
12870 remote RMR data arrived!
12870 Received RMR from remote: r_iov: r_key_ctx=60020b0d,va=7fb52c84b000,len=0x400000
12870 Query EP: LOCAL addr 10.1.1.122 port ae6b
12870 Query EP: REMOTE addr 10.1.1.121 port b0c0
12870 RDMA WRITE DATA with SEND MSG
12870 Sending RDMA WRITE completion message
12870 inbound rdma_write; send message arrived!
12870 Received RMR from remote: r_iov: r_key_ctx=60020b0d,va=7fb52c84b000,len=0x400000
12870 CLIENT: RDMA write buffer contains: server RDMA write data...
12870 RDMA READ DATA with SEND MSG
12870 Sending RDMA read completion message
12870 Waiting for inbound message....
12870 inbound rdma_read; send message arrived!
12870 Received RMR from remote: r_iov: r_key_ctx=60020b0d,va=7fb52c84b000,len=0x400000
12870 CLIENT: RCV RDMA read buffer contains: server RDMA read data...
12870 PING DATA with SEND MSG
12870: Message RTT: Total=1092453.00 usec, 100 bursts, itime=10924.53 usec, pc=0
12870: RDMA write (bi-direction): Total=367846.97 usec, itime=1839.23 us, poll=0, 200 x 4194304, 2280.46 MB/sec
12870: DAPL Test Complete. PASSED
With the route added (times out):
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.1.1.0 10.1.1.3 255.255.255.0 UG 0 0 0 ens10
10.1.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens10
10.1.2.0 0.0.0.0 255.255.255.0 U 100 0 0 ens12
# dtest -P roce1
13902 Running as server - roce1 v2
13902 Local Address AF_INET - 10.1.1.121 port 45248
13902 Server is waiting for client connection to send server info
13902 Server waiting for connect request on port b0c0
# dtest -P roce1 -h 10.1.1.121
28726 Running as client - waiting for server input
28726 Running as roce1 client v2
28726 Local Address AF_INET - 10.1.1.122 port 45248
28726 Server Name: 10.1.1.121
28726 Server Net Address: 10.1.1.121 port b0c0
28726 Waiting for connect response
ps1vm2.torolab.ibm.com:CMA:7036:8065e700: 98893068 us(98893068 us!!!): dapl_cma_active: CONN_ERR event=0x7 status=-110 TIMEOUT DST 10.1.1.121, 45248
28726 Error unexpected conn event : 0x4008 DAT_CONNECTION_EVENT_UNREACHABLE
28726 Error connect_ep: DAT_ABORT
28726 ERR: Checking ASYNC EVD...
28726 ERR: Checking RECEIVE EVD...
28726 ERR: Checking REQUEST EVD...
28726: DAPL Test Complete. FAILED
[root@ps1vm2 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.1.1.0 10.1.1.3 255.255.255.0 UG 0 0 0 ens9
10.1.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens9
10.1.2.0 0.0.0.0 255.255.255.0 U 100 0 0 ens12
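One thing worth checking in the routing tables above is whether a gateway route to 10.1.1.0/24 (metric 0) takes precedence over the directly connected route for the same subnet (metric 100). RoCEv1 frames are plain Ethernet rather than IP, so they cannot cross an L3 hop; whether that is what breaks the connect here is an assumption, not a confirmed diagnosis. The sketch below parses `route -n` output and reports such pairs. The function names `parse_route_n` and `shadowed_onlink_routes` are made up for this example.

```python
def parse_route_n(text):
    """Parse `route -n` output into a list of dicts, skipping header lines."""
    routes = []
    for line in text.strip().splitlines():
        fields = line.split()
        # Data rows have 8 columns and start with a numeric destination.
        if len(fields) != 8 or not fields[0][0].isdigit():
            continue
        dest, gateway, genmask, flags, metric, _ref, _use, iface = fields
        routes.append({
            "dest": dest,
            "gateway": gateway,
            "genmask": genmask,
            "flags": flags,
            "metric": int(metric),
            "iface": iface,
        })
    return routes


def shadowed_onlink_routes(routes):
    """Return (gateway_route, direct_route) pairs for the same subnet and
    interface where the gateway route has the lower (preferred) metric."""
    pairs = []
    for gw in routes:
        if "G" not in gw["flags"]:
            continue
        for direct in routes:
            if (
                "G" not in direct["flags"]
                and direct["dest"] == gw["dest"]
                and direct["genmask"] == gw["genmask"]
                and direct["iface"] == gw["iface"]
                and gw["metric"] < direct["metric"]
            ):
                pairs.append((gw, direct))
    return pairs


# Routing table copied from the client output in this post.
SAMPLE = """\
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
10.1.1.0 10.1.1.3 255.255.255.0 UG 0 0 0 ens9
10.1.1.0 0.0.0.0 255.255.255.0 U 100 0 0 ens9
10.1.2.0 0.0.0.0 255.255.255.0 U 100 0 0 ens12
"""

if __name__ == "__main__":
    for gw, direct in shadowed_onlink_routes(parse_route_n(SAMPLE)):
        print(
            f"{gw['dest']}/{gw['genmask']} on {gw['iface']}: gateway "
            f"{gw['gateway']} (metric {gw['metric']}) is preferred over "
            f"the on-link route (metric {direct['metric']})"
        )
```

If a pair is reported, traffic to hosts on the same subnet is being sent to the gateway first instead of directly to the peer.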
For reference, the roce_mode module parameter description:
parm: roce_mode:Set RoCE modes supported by the port
A single value (e.g. 0) to define uniform preferred RoCE_mode value for all devices
or a string to map device function numbers to their RoCE mode value (e.g. '0000:04:00.0-0,002b:1c:0b.a-0').
Allowed values are 0: RoCEv1 (default), 1: RoCEv1.5, 2: RoCEv2, 3: RoCEv1.5+2 and 4: RoCEv1+2)
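To make the two accepted forms of roce_mode concrete, here is a minimal sketch that decodes a value according to my reading of the description above. It is not the driver's parser. The function name `parse_roce_mode` and the `"*"` key used for the uniform single-value case are my own conventions for this example.

```python
# Mode numbers and names as listed in the parameter description.
ROCE_MODES = {
    0: "RoCEv1",
    1: "RoCEv1.5",
    2: "RoCEv2",
    3: "RoCEv1.5+2",
    4: "RoCEv1+2",
}


def _mode_name(number):
    """Map a mode number to its name, rejecting values outside 0..4."""
    if number not in ROCE_MODES:
        raise ValueError(f"unknown RoCE mode: {number}")
    return ROCE_MODES[number]


def parse_roce_mode(value):
    """Decode a roce_mode string into {device: mode_name}.

    A single number applies to all devices and is returned under "*".
    Otherwise the value is a comma-separated list of "<device>-<mode>".
    """
    value = value.strip()
    if value.isdigit():
        return {"*": _mode_name(int(value))}

    mapping = {}
    for entry in value.split(","):
        # Split on the last "-" so the device address stays intact.
        device, sep, mode = entry.strip().rpartition("-")
        if not sep or not device or not mode.isdigit():
            raise ValueError(f"malformed roce_mode entry: {entry!r}")
        mapping[device] = _mode_name(int(mode))
    return mapping


if __name__ == "__main__":
    print(parse_roce_mode("0"))
    print(parse_roce_mode("0000:04:00.0-0,002b:1c:0b.a-0"))
```

With the default value of 0, every device decodes to RoCEv1, which matches the unconfigured setup described in this post.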