Channel: Mellanox Interconnect Community: Message List

OpenMPI MXM problem


I am probably making a stupid error, but I don't really know where I should look.

 

This is all on RHEL 6.5.

 

I have previously used both HPC-X and compiled OpenMPI against libmxm (yalla driver).

HPC-X 1.3.336 works well for me.

 

Now I am trying to install HPC-X 1.5.370 and also compile OpenMPI 1.10.2. All efforts have resulted in
code that hangs shortly after MPI_Init(). I compiled the Intel IMB benchmark and ran it on 2 tasks using the
yalla driver; it hangs in the first MPI_Bcast(), which is the first communicating routine after the initial
setup (MPI_Init/MPI_Comm_size/MPI_Comm_rank).


If I disable libmxm and use "-mca pml ob1 -mca btl openib,self,sm", the program runs correctly.
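
The comparison between the two transports can be scripted so that a hang is reported rather than left to block the terminal. The following is only a minimal sketch under stated assumptions: the helper names (build_cmd, run_with_timeout), the benchmark path ./IMB-MPI1, the "Bcast" argument, and the 120-second timeout are illustrative choices, not part of the original setup. Only the MCA options themselves come from the description above.

```python
import shlex
import subprocess


def build_cmd(pml, btl=None, np=2, exe="./IMB-MPI1", args=("Bcast",)):
    """Build an mpirun command line for a given PML (and optional BTL list).

    exe and args are assumptions: adjust them to wherever the IMB binary
    was built and to the benchmark that should be run.
    """
    cmd = ["mpirun", "-np", str(np), "-mca", "pml", pml]
    if btl:
        cmd += ["-mca", "btl", btl]
    return cmd + [exe, *args]


def run_with_timeout(cmd, timeout_s=120):
    """Run cmd and classify the outcome as 'ok', 'failed', or 'hang'.

    On timeout, subprocess.run kills the launched process; ranks started
    by mpirun may still need to be cleaned up by hand.
    """
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "hang"
    return "ok" if proc.returncode == 0 else "failed"


# The two configurations described in the post.
CONFIGS = {
    "yalla": build_cmd("yalla"),                    # libmxm path (hangs with mxm 3.4)
    "ob1": build_cmd("ob1", btl="openib,self,sm"),  # fallback that runs correctly
}

if __name__ == "__main__":
    # Only print the command lines here; call run_with_timeout(cmd) on a
    # node where mpirun and the benchmark are actually available.
    for name, cmd in CONFIGS.items():
        print(name, "->", shlex.join(cmd))
```

Calling run_with_timeout(CONFIGS["yalla"]) and run_with_timeout(CONFIGS["ob1"]) on a compute node would then give a quick "hang" versus "ok" answer for each libmxm version under test.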

 

I have tried two different versions of libmxm:

HPC-X 1.3.336:  MXM_VERNO_STRING "3.3.3055"

HPC-X 1.5.370:  MXM_VERNO_STRING "3.4.3079"

 

If I build OpenMPI 1.10.2 against version 3.3 of mxm, I get a working implementation with yalla.

If I use HPC-X 1.3.336, everything also works fine with yalla.

If I run HPC-X 1.5.370, or if I build OpenMPI 1.10.2 against version 3.4 of mxm, I get the problem.

 

The software installed in /opt/mellanox and related software is at the same level as HPC-X 1.5.370.

 

Does anyone on this list have a suggestion as to what my problem may be and/or how to diagnose it?

