Channel: Mellanox Interconnect Community: Message List

Re: Configuration settings for IPoIB performance


Pure CentOS 6.5 x86_64

 

1. echo "connected" > /sys/class/net/ib0/mode

2. ifconfig ib0 mtu 65520

 

No other tuning! No external OFED install!

 

service rdma start
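
(If you want the two settings above to survive a reboot, they can also be set in the interface config file; a minimal sketch, assuming the stock CentOS 6 rdma/network scripts and the addressing used in the tests below:)

# /etc/sysconfig/network-scripts/ifcfg-ib0  (sketch only)
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=static
IPADDR=172.10.11.2          # address used in the iperf run below
NETMASK=255.255.255.0       # assumption; adjust to your subnet
CONNECTED_MODE=yes          # same effect as: echo connected > /sys/class/net/ib0/mode
MTU=65520                   # same effect as: ifconfig ib0 mtu 65520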

 

iperf -P 1 -c 172.10.11.3

------------------------------------------------------------

Client connecting to 172.10.11.3, TCP port 5001

TCP window size:  630 KByte (default)

------------------------------------------------------------

[  3] local 172.10.11.2 port 43968 connected with 172.10.11.3 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  11.6 GBytes  10.0 Gbits/sec

 

netperf -H 172.10.11.3

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.10.11.3 () port 0 AF_INET

Recv   Send    Send                        

Socket Socket  Message  Elapsed            

Size   Size    Size     Time     Throughput

bytes  bytes   bytes    secs.    10^6bits/sec

87380  16384  16384    10.00    9954.79

 

UPDATE:

 

As you mentioned multiple threads, here it goes:

 

iperf  -c 172.20.20.3  -P 4

------------------------------------------------------------

Client connecting to 172.20.20.3, TCP port 5001

TCP window size:  645 KByte (default)

------------------------------------------------------------

[  5] local 172.20.20.2 port 53514 connected with 172.20.20.3 port 5001

[  3] local 172.20.20.2 port 53513 connected with 172.20.20.3 port 5001

[  4] local 172.20.20.2 port 53512 connected with 172.20.20.3 port 5001

[  6] local 172.20.20.2 port 53515 connected with 172.20.20.3 port 5001

[ ID] Interval       Transfer     Bandwidth

[  5]  0.0-10.0 sec  6.11 GBytes  5.25 Gbits/sec

[  3]  0.0-10.0 sec  5.42 GBytes  4.66 Gbits/sec

[  4]  0.0-10.0 sec  6.70 GBytes  5.75 Gbits/sec

[  6]  0.0-10.0 sec  6.55 GBytes  5.63 Gbits/sec

[SUM]  0.0-10.0 sec  24.8 GBytes  21.3 Gbits/sec

 

~21 Gbit/s seems to be the ceiling on this hardware (E5-2620, 128 GB) without further tuning, regardless of the number of threads (2+).

Adding more threads actually lowers the total bandwidth [SUM]; for example, 64 threads max out at only 17 Gbit/s.
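
To map out where the aggregate peaks, the thread count can be swept with a quick script (a sketch using the same iperf options as above):

# sweep iperf parallel streams; the last output line is the per-stream result for -P 1
# and the [SUM] line otherwise
for p in 1 2 4 8 16 32 64; do
    echo "threads=$p: $(iperf -c 172.20.20.3 -P $p -t 10 | tail -n 1)"
done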


Sys log message


Hi, does anyone know what these messages mean? Is this anything to worry about, and if not, how can I stop them from filling up my syslog files?

 

 

Apr 25 16:12:29 hksw1001 issd[4840]: TID 1299130464: [issd.ERR]: NPAPI_ERR: err SendIgmpPacketToIgmpModule Failed Processing RxGddProcessRecvInterruptEvent

Apr 25 16:12:29 hksw1001 issd[4840]: TID 1299130464: [issd.WARNING]: NPAPI_WRN: warning RxMgrLowMain __RxMgrProcessL2Packet failed

MXM ERROR failed to create send cq: Cannot allocate memory


I am trying to set up a small HPC cluster using Mellanox MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] cards and an M3601Q switch, to work with openMPI and SLURM.

 

I have read that I need to activate MXM support when compiling openMPI. I have solved a lot of small problems, but now I am stuck on this one: the openMPI jobs crash and report that they cannot allocate memory:

 

mpirun noticed that process rank 3 with PID 2208 on node node01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
[1398427878.955592] [node02:2198 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.957108] [node02:2198 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
[1398427878.955621] [node02:2199 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.957105] [node02:2199 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
[1398427878.955550] [node02:2200 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.957063] [node02:2200 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 15 (Terminated)
[1398427878.958152] [node02:2202 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.959497] [node02:2202 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 15 (Terminated)
[1398427878.963706] [node02:2204 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.965081] [node02:2204 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
4 total processes killed (some possibly by mpirun during cleanup)

 

Moreover, when I try to run the jobs with the SLURM command srun, I get the following error:

 

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

Local host:   node01
Local device: mlx4_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:


  Local host:    node01
  OMPI source:   btl_openib_component.c:1216
  Function:      ompi_free_list_init_ex_new()
  Device:        mlx4_0
  Memlock limit: 65536

[I cut the error message, it is repetitive for each node]

I have read here that I should modify the MTT values. I have followed the procedure, but I still get the same error. Does anyone know how to troubleshoot this?

 

 

Thanks in advance

Kind regards,

Andrea

 

NB: I have also compiled openMPI with SLURM support and MXM.
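
For reference, the kind of configure line I mean looks roughly like this (a sketch; the install prefixes are examples, not necessarily the ones on my systems):

# Open MPI built with MXM and SLURM support
./configure --prefix=/opt/openmpi \
            --with-mxm=/opt/mellanox/mxm \
            --with-slurm \
            --with-pmi=/usr        # PMI from the SLURM install, for direct srun launch
make -j && make install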

Re: MXM ERROR failed to create send cq: Cannot allocate memory


Regarding MTT values: you no longer need to set them if you are using MLNX_OFED 2.x. Which version of MLNX_OFED are you using?

 

If you are getting a memlock limit of 64 KB, it probably has something to do with the environment in which the job runs. I have seen something similar with other job schedulers (see Setting Up TORQUE for Running HPC Jobs), but I haven't run into it with SLURM. I thought SLURM would just use whatever limits you have set, unlike some other schedulers, so I suspect it is your environment.

 

Usually the MLNX_OFED install updates some values in /etc/sysctl.conf, and after a reboot you should be all set. So if you are running outside of a job scheduler, you should check ulimit -l.
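
For example, a quick sanity check outside the scheduler, run as the user that will launch the job, would be something like:

ulimit -l                    # should print "unlimited" (or at least a very large value)
grep -r memlock /etc/security/limits.conf /etc/security/limits.d/ 2>/dev/null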

 

Maybe upgrade to the latest MLNX_OFED and see if it improves?

Re: IPoIB performance issue!!


Thanks a lot "yairi" and "ingvar_j" for your responses. I was finally able to achieve ~20 Gb/s on FDR.

 

with

OFED : MLNX_OFED_LINUX-2.0-3.0.0 (OFED-2.0-3.0.0)

Firmware: 2.30.3000

InfiniBand card type: x4 FDR, rate: 56 Gb/s

 

CMD: netperf -H 192.168.180.101 -t TCP_STREAM

Re: Sys log message


Hi gandriano,

 

Can you provide the code level your Mellanox Ethernet Switch is running on?

I think it's a harmless message related to an older code level, but I would like to confirm your code level just to be sure.

Re: SX1012 Openflow => Defense4All via OpenDaylight Controller


Hi, thanks for that; I was really just looking at options. In my case the DC is going to put in some Thunder TPS boxes from A10 Networks, which are built specifically for DDoS mitigation.

But out-of-path DDoS protection is very interesting; I'm just hoping for more apps and documentation to come out around OpenFlow in this area.

Regards, Daniel

 

Re: Configuration settings for IPoIB performance


Thanks wesb!

 

I upgraded the IB drivers to the latest Mellanox 2.1, set the interface to "connected" mode and the MTU to 65520. Now I'm getting ~13.5 Gb/s.

That is considerably less than what you noted; I'm not sure whether it is a QDR vs. FDR difference. My setup is QDR with an X5650, 24 GB memory, and an x8 PCIe slot.


Re: Configuration settings for IPoIB performance


For the record, I measured it with:

ConnectX-2 cards

QDR switch

2 * E5-2620/128GB boxes

Again, no playing with external OFED - just yum groupinstall "Infiniband Support"

Re: MXM ERROR failed to create send cq: Cannot allocate memory


Good morning, thanks a lot for your answer.

I spent the whole day yesterday troubleshooting this problem. You are right: I had a system locked-memory limit of 64 KB, which was peanuts.

 

I have solved these problems. I had to:

- add knem to the kernel modules that are loaded

- add a line for root in /etc/security/limits.conf (apparently the MXM memory-related setup runs as root, even if a normal user launches the srun command); the line is "root - memlock unlimited", so now in this file I have:

  1. root - memlock unlimited
  2. * - memlock unlimited

- chmod 0666 /dev/knem, so that it is usable by anyone (a consolidated sketch of these changes follows below)
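
Put together, the changes amount to roughly the following (a sketch of what I described above; adding the modprobe/chmod to /etc/rc.local is just one way to make them persist across reboots and is an assumption, not something I verified):

# load knem and open up the device node now; add the same two lines to /etc/rc.local to persist
modprobe knem
chmod 0666 /dev/knem

# allow unlimited locked memory for root and for regular users
cat >> /etc/security/limits.conf <<'EOF'
root - memlock unlimited
*    - memlock unlimited
EOF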

 

So now my basic "hello world" program works. I still have other problems, but I think they are related to programming rather than to the cluster configuration.

 

Thanks a lot for your precious help.

Regards,

Andrea

Re: MXM ERROR failed to create send cq: Cannot allocate memory


Good to know it finally works for you. knem only improves large-message transfers between MPI processes within a node, so it must be the memlock limit you set in /etc/security/limits.conf that made it work. You can mark the question as answered. Thanks for the update!
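
If you want to confirm that knem is actually being picked up, something like this should help (the MCA parameter name is from memory, so verify it against your ompi_info output; the benchmark name is just a placeholder):

# was the shared-memory BTL built with knem support?
ompi_info --all | grep -i knem

# toggle it explicitly and compare large-message intra-node bandwidth
mpirun --mca btl_sm_use_knem 1 -np 8 ./your_mpi_benchmark
mpirun --mca btl_sm_use_knem 0 -np 8 ./your_mpi_benchmark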

Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi all,

 

we are trying to use a ConnectX-3 VPI adapter card with XenServer 6.2, and to that end we have followed these steps:

1) Install XenServer 6.2 on my system (Supermicro 1027GR-TRF)
2) Install XenServer 6.2 updates (service pack 1) following the directions from http://support.citrix.com/article/CTX138115#XenServer 6.2

3) Install MLNX_OFED 2.1-1.0.6 and update the firmware (./mlnxofedinstall --force-fw-update). The installation and update completed without errors; the firmware version is 2.30.8000.

After that, we restarted the system and the openibd service, but the InfiniBand network interfaces were not detected:

[root@xenserver ~]# /etc/init.d/openibd restart

hostname: `Host' unknown

Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
Setting up service network . . .                           [  done  ]

That is, the ib0 network interface has not been detected by the system.
Does anyone know whether XenServer 6.2 works with ConnectX-3? Could you please provide some information about that? We have devoted a lot of time to this and now wonder whether it really works.
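
For reference, the basic checks we would run to confirm that dom0 sees the HCA at all are something like this (standard OFED tools; output omitted):

# is the HCA visible on the PCI bus in dom0?
lspci | grep -i mellanox

# are the core modules loaded, and does the verbs layer see the device?
lsmod | grep -E 'mlx4_core|mlx4_ib|ib_ipoib'
ibv_devinfo    # should list mlx4_0 and its ports
ibstat         # port state and link layer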


Re: Configuration settings for IPoIB performance


You cannot expect much better performance than this; you are hitting CPU limits, as all TCP error correction with IPoIB is done in software, not in hardware.

 

You can try adjusting TCP window settings in sysctl to send larger frames, which may improve performance slightly, but realistically you can only reach the full potential of the InfiniBand fabric by using native IB protocols such as iSER or SRP.
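
A starting point for that kind of tuning would be something like the following in /etc/sysctl.conf, applied with sysctl -p (illustrative values only, not a tested recommendation):

# raise socket buffer ceilings so TCP over IPoIB can use larger windows
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216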

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Have you tried version 1_5_3-4_0_42 instead? You may need to contact Mellanox support for the XenServer driver.

Re: Strange performance problem with CX354A - ConnectX-3 QSFP with large messages


Have you tried to connect to a different port on the switch to see if this problem exists with the port or the cable?


Re: Strange performance problem with CX354A - ConnectX-3 QSFP with large messages


Yes, I even tried a different pair of servers with different cables and ports.

Re: Issues with SRP and unexplainable "ibping" behaviour.


Hey,

Have you tried upgrading the firmware version?

Re: Are ConnectX-3 VPI adapters compatible with the Red Hat MRG (i.e. real-time patch)?


Hi,

I've checked, and it does not yet support Red Hat Enterprise MRG.

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi, I have tried MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686 together with the firmware update, but I obtained the same results shown in the first post. So I have some questions:

Did you use XenServer 6.2 (Service Pack 1)?

What command did you use to install the OFED? (I used ./mlnxofedinstall --force-firmware-update)

Did you use any additional driver, or only the MLNX_OFED software stack?

 

 

Thank you in advance.

Re: PCIe Bracket for Connect-X EN2
