I'm not sure whether you have resolved the signal 11 (segfault) problem using my approach.
For reference, this is how I compiled Open MPI against my own UCX build:
./configure --prefix=/usr/local/openmpi-3.1.1 --with-wrapper-ldflags=-Wl,-rpath,/lib --disable-vt --enable-orterun-prefix-by-default --disable-io-romio --enable-picky --with-cuda=/usr/local/cuda --with-ucx=/opt/ucx-cuda --enable-mem-debug --enable-debug --enable-timing
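To confirm that a build like this actually picked up CUDA and UCX, something along these lines may help. It is only a sketch: the `ompi_info` path is assumed from the `--prefix` above, and `check_ompi_build` is a hypothetical helper name, not part of Open MPI.

```shell
# Assumed location of ompi_info, derived from the --prefix used above; override if yours differs.
OMPI_INFO="${OMPI_INFO:-/usr/local/openmpi-3.1.1/bin/ompi_info}"

# Hypothetical helper: returns 0 if both CUDA and UCX support are reported,
# 1 if either is missing, 2 if ompi_info cannot be found.
check_ompi_build() {
    if ! command -v "$OMPI_INFO" >/dev/null 2>&1; then
        echo "ompi_info not found at $OMPI_INFO" >&2
        return 2
    fi
    status=0
    # A CUDA-aware build should report mpi_built_with_cuda_support:value:true here.
    "$OMPI_INFO" --parsable --all | grep "mpi_built_with_cuda_support:value" || status=1
    # UCX-based components (for example the ucx PML) should be listed if UCX was compiled in.
    "$OMPI_INFO" | grep -i "ucx" || status=1
    return "$status"
}

check_ompi_build || echo "CUDA/UCX check did not pass; see output above" >&2
```

If the first grep shows `value:false` or the second prints nothing, the configure step probably did not find the CUDA or UCX installation it was pointed at.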
Actually, latency should be lower with GDR (GPUDirect RDMA). Which network card are you using, CX4 or CX3?
It would be great if you could share some test data and your test environment configuration.