I spent a couple of hours trying to use ODP (on-demand paging) in my application, without success.
Issue 1: ibv_exp_query_device() returns no ODP capabilities in 'per_transport_caps'. The code snippet is as follows:
===============================================================================================
struct ibv_exp_device_attr attr;
memset(&attr, 0, sizeof(struct ibv_exp_device_attr));
/* Request the experimental capability flags and the ODP capabilities. */
attr.comp_mask = IBV_EXP_DEVICE_ATTR_EXP_CAP_FLAGS | IBV_EXP_DEVICE_ATTR_ODP;
TEST_NZ(ibv_exp_query_device(ctxt, &attr), "Could not query experimental device attributes.");
/* Device-level ODP flag, then whether the ODP bit is still set in comp_mask after the query. */
printf("ODP device support:\t0x%lx\n", attr.exp_device_cap_flags & IBV_EXP_DEVICE_ODP);
printf("ODP driver support:\t0x%x\n", attr.comp_mask & IBV_EXP_DEVICE_ATTR_ODP);
/* General and per-transport ODP capability masks. */
printf("general_odp_caps=\t0x%lx\n", attr.odp_caps.general_odp_caps);
printf("rc_odp_caps=\t0x%x\n", attr.odp_caps.per_transport_caps.rc_odp_caps);
printf("uc_odp_caps=\t0x%x\n", attr.odp_caps.per_transport_caps.uc_odp_caps);
printf("ud_odp_caps=\t0x%x\n", attr.odp_caps.per_transport_caps.ud_odp_caps);
printf("dc_odp_caps=\t0x%x\n", attr.odp_caps.per_transport_caps.dc_odp_caps);
printf("xrc_odp_caps=\t0x%x\n", attr.odp_caps.per_transport_caps.xrc_odp_caps);
printf("raw_eth_odp_caps=\t0x%x\n", attr.odp_caps.per_transport_caps.raw_eth_odp_caps);
===============================================================================================
Results:
ODP device support: 0x8000000000
ODP driver support: 0x400
general_odp_caps= 0x0
rc_odp_caps= 0x0
uc_odp_caps= 0x0
ud_odp_caps= 0x0
dc_odp_caps= 0x0
xrc_odp_caps= 0x0
raw_eth_odp_caps= 0x0
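For reference, here is a minimal sketch of how a per-transport capability mask could be turned into a readable list of operations. The ODP_CAP_* constants and describe_odp_caps() are hypothetical names of mine; the bit positions are assumed to mirror the IBV_EXP_ODP_SUPPORT_* values in verbs_exp.h and should be checked against the installed header. The sketch deliberately avoids the verbs headers so it compiles on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; ASSUMED to match IBV_EXP_ODP_SUPPORT_* in verbs_exp.h. */
enum odp_cap_bit {
    ODP_CAP_SEND     = 1 << 0,
    ODP_CAP_RECV     = 1 << 1,
    ODP_CAP_WRITE    = 1 << 2,
    ODP_CAP_READ     = 1 << 3,
    ODP_CAP_ATOMIC   = 1 << 4,
    ODP_CAP_SRQ_RECV = 1 << 5
};

/*
 * Write a space-separated list of the operations advertised in 'caps'
 * into 'buf' and return how many known bits were listed.
 */
int describe_odp_caps(uint32_t caps, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } table[] = {
        { ODP_CAP_SEND, "SEND" },     { ODP_CAP_RECV, "RECV" },
        { ODP_CAP_WRITE, "WRITE" },   { ODP_CAP_READ, "READ" },
        { ODP_CAP_ATOMIC, "ATOMIC" }, { ODP_CAP_SRQ_RECV, "SRQ_RECV" }
    };
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (!(caps & table[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", table[i].name);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partially written name */
            break;              /* buffer full: stop listing */
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Called with rc_odp_caps from the results above (0x0), this returns 0 and leaves the buffer empty, i.e. no ODP operation is advertised for RC.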
Issue 2: Ignoring Issue 1, I tried to register a memory region with ODP as follows:
===============================================================================================
struct ibv_exp_reg_mr_in in;
in.pd = ctxt.pd;
in.addr = ctxt.pages;
in.length = page_size*MAX_PAGE;
in.exp_access = IBV_EXP_ACCESS_ON_DEMAND |
                IBV_EXP_ACCESS_LOCAL_WRITE | IBV_ACCESS_LOCAL_WRITE |
                IBV_EXP_ACCESS_REMOTE_WRITE | IBV_ACCESS_REMOTE_WRITE |
                IBV_EXP_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_READ |
                IBV_EXP_ACCESS_REMOTE_ATOMIC | IBV_ACCESS_REMOTE_ATOMIC;
in.comp_mask = 0;
TEST_Z(ctxt.mr=ibv_exp_reg_mr(&in),"Could not register odp mr");
===============================================================================================
I then re-ran my RDMA transfer program (one-sided RDMA over an RC connection), which works without ODP, and got this error:
mlx5: compute26: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 a9005604 0800013b 0000e4d2
28855:server_routine: Completion with error at server:
28855:server_routine: Failed status 4: wr_id 3, qp_num = 315, vendor_err = 86
Simply put, ibv_poll_cq() returned a work completion with status 4 (IBV_WC_LOC_PROT_ERR). I also tried a file-mapped region and got the same error.
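To cross-check the hex dump against the reported fields, here is a small sketch that pulls the syndrome, vendor error and QP number out of the 16 dumped words. It assumes the words are printed in host order from a 64-byte mlx5 error CQE, with the vendor syndrome and syndrome in the low two bytes of word 13 and the QPN in the low 24 bits of word 14; that layout is my reading of the mlx5 provider's error-CQE structure, and err_cqe_fields / decode_err_cqe() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Fields of interest from the dumped error CQE (hypothetical helper type). */
struct err_cqe_fields {
    uint8_t  vendor_err;  /* vendor-specific syndrome byte */
    uint8_t  syndrome;    /* generic error syndrome byte */
    uint8_t  wqe_opcode;  /* opcode of the failing work request */
    uint32_t qpn;         /* 24-bit queue pair number */
};

/* 'words' holds the 16 32-bit values exactly as printed, row by row. */
struct err_cqe_fields decode_err_cqe(const uint32_t words[16])
{
    struct err_cqe_fields f;

    assert(words != NULL);
    /* ASSUMED layout: word 13 = [reserved:16][vendor_err:8][syndrome:8] */
    f.vendor_err = (uint8_t)((words[13] >> 8) & 0xff);
    f.syndrome   = (uint8_t)(words[13] & 0xff);
    /* ASSUMED layout: word 14 = [wqe_opcode:8][qpn:24] */
    f.wqe_opcode = (uint8_t)((words[14] >> 24) & 0xff);
    f.qpn        = words[14] & 0xffffff;
    return f;
}

/* Print the decoded fields in a form comparable to the log line. */
void print_err_cqe(const struct err_cqe_fields *f)
{
    assert(f != NULL);
    printf("syndrome 0x%02x, vendor_err %u, qp_num %u\n",
           (unsigned)f->syndrome, (unsigned)f->vendor_err, (unsigned)f->qpn);
}
```

Fed the last row of the dump above (00000000 a9005604 0800013b 0000e4d2, preceded by twelve zero words), this yields syndrome 0x04, vendor_err 86 and qp_num 315, which agrees with the "Failed status 4 ... qp_num = 315, vendor_err = 86" line.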
Issue 3: The pre-installed ib_send_bw tool from MLNX_OFED_LINUX-3.3-1.0.0.0-ubuntu16.04-x86_64.tgz works with the --odp flag enabled. But if I build the 'perftest' tool from the source in the same package and run its 'ib_send_bw' with the --odp flag, it reports the following error:
---------------------------------------------------------------------------------------
On-demand paging not supported by driver.
failed to create mr
Failed to create MR
local address: LID 0000 QPN 0x013d PSN 0xab38a3
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:09:25
remote address: LID 0000 QPN 0x013d PSN 0xae7d06
GID: 00:00:00:00:00:00:00:00:00:00:255:255:192:168:09:26
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
mlx5: compute25: got completion with error:
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 00000000 00000000 00000000
00000000 92003204 0000013d 000085e2
Completion with error at server
Failed status 4: wr_id 0 syndrom 0x32
rcnt=0
I wonder whether the pre-installed perftest binary and the perftest source in the package actually match each other. If I can get the source-built perftest to run with ODP, I think I can work out what I need to change in my own code.
Operating system:
weijia@compute26:~/workspace/odp$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04 LTS"
weijia@compute26:~/workspace/odp$ uname -a
Linux compute26 4.4.0-28-generic #47-Ubuntu SMP Fri Jun 24 10:09:13 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
HCA:
CA 'mlx5_1'
    CA type: MT4115
    Number of ports: 1
    Firmware version: 12.16.1006
    Hardware version: 0
    Node GUID: 0x7cfe90030080ab79
    System image GUID: 0x7cfe90030080ab78
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 100
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x3c010000
        Port GUID: 0x7efe90fffe80ab79
        Link layer: Ethernet