I am probably making a stupid error, but I don't really know where to look.
This is all on RHEL 6.5.
I have previously used both HPC-X and my own build of OpenMPI compiled against libmxm (the yalla PML).
HPC-X 1.3.336 works well for me.
Now I am trying to install HPC-X 1.5.370 and also to compile OpenMPI 1.10.2. Every attempt has produced
binaries that hang shortly after MPI_Init(). I compiled the Intel IMB benchmark and ran it with 2 tasks using
the yalla PML; it hangs in the first MPI_Bcast(), which is the first communicating call after the initial
setup (MPI_Init/MPI_Comm_size/MPI_Comm_rank).
If I disable libmxm and use "-mca pml ob1 -mca btl openib,self,sm", the program runs correctly.
I have tried two different versions of libmxm:
HPC-X 1.3.336: MXM_VERNO_STRING "3.3.3055"
HPC-X 1.5.370: MXM_VERNO_STRING "3.4.3079"
If I build OpenMPI 1.10.2 against mxm 3.3, I get a working implementation with yalla.
If I use HPC-X 1.3.336, everything also works fine with yalla.
If I use HPC-X 1.5.370, or build OpenMPI 1.10.2 against mxm 3.4, I get the hang.
The software installed in /opt/mellanox, and related software, is at the same level as HPC-X 1.5.370.
Does anyone on this list have suggestions as to what my problem may be and/or how to diagnose it?