Channel: Mellanox Interconnect Community: Message List

Re: Configuration settings for IPoIB performance


Pure CentOS 6.5 x86_64

 

1. echo "connected" > /sys/class/net/ib0/mode

2. ifconfig ib0 mtu 65520

 

No other tuning! No external OFED install!

 

service rdma start
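
(If you want the two settings above to survive a reboot, they can also be set in the interface config file; a minimal sketch, assuming the stock CentOS 6 rdma/network scripts and the addressing used in the tests below:)

# /etc/sysconfig/network-scripts/ifcfg-ib0  (sketch only)
DEVICE=ib0
TYPE=InfiniBand
ONBOOT=yes
BOOTPROTO=static
IPADDR=172.10.11.2          # address used in the iperf run below
NETMASK=255.255.255.0       # assumption; adjust to your subnet
CONNECTED_MODE=yes          # same effect as: echo connected > /sys/class/net/ib0/mode
MTU=65520                   # same effect as: ifconfig ib0 mtu 65520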

 

iperf -P 1 -c 172.10.11.3

------------------------------------------------------------

Client connecting to 172.10.11.3, TCP port 5001

TCP window size:  630 KByte (default)

------------------------------------------------------------

[  3] local 172.10.11.2 port 43968 connected with 172.10.11.3 port 5001

[ ID] Interval       Transfer     Bandwidth

[  3]  0.0-10.0 sec  11.6 GBytes  10.0 Gbits/sec

 

netperf -H 172.10.11.3

MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.10.11.3 () port 0 AF_INET

Recv   Send    Send                        

Socket Socket  Message  Elapsed            

Size   Size    Size     Time     Throughput

bytes  bytes   bytes    secs.    10^6bits/sec

87380  16384  16384    10.00    9954.79

 

UPDATE:

 

As you mentioned multiple threads, here it goes:

 

iperf  -c 172.20.20.3  -P 4

------------------------------------------------------------

Client connecting to 172.20.20.3, TCP port 5001

TCP window size:  645 KByte (default)

------------------------------------------------------------

[  5] local 172.20.20.2 port 53514 connected with 172.20.20.3 port 5001

[  3] local 172.20.20.2 port 53513 connected with 172.20.20.3 port 5001

[  4] local 172.20.20.2 port 53512 connected with 172.20.20.3 port 5001

[  6] local 172.20.20.2 port 53515 connected with 172.20.20.3 port 5001

[ ID] Interval       Transfer     Bandwidth

[  5]  0.0-10.0 sec  6.11 GBytes  5.25 Gbits/sec

[  3]  0.0-10.0 sec  5.42 GBytes  4.66 Gbits/sec

[  4]  0.0-10.0 sec  6.70 GBytes  5.75 Gbits/sec

[  6]  0.0-10.0 sec  6.55 GBytes  5.63 Gbits/sec

[SUM]  0.0-10.0 sec  24.8 GBytes  21.3 Gbits/sec

 

~21 Gbit/s seems to be the ceiling on this hardware (E5-2620, 128 GB) without further tuning, regardless of the number of threads (2+).

Adding more threads actually lowers the total bandwidth [SUM]; for example, 64 threads max out at only 17 Gbit/s.
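
To map out where the aggregate peaks, the thread count can be swept with a quick script (a sketch using the same iperf options as above):

# sweep iperf parallel streams; the last output line is the per-stream result for -P 1
# and the [SUM] line otherwise
for p in 1 2 4 8 16 32 64; do
    echo "threads=$p: $(iperf -c 172.20.20.3 -P $p -t 10 | tail -n 1)"
done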


Sys log message


Hi, does anyone know what these messages mean? Is this anything to worry about, and if not, how can I stop them from filling up my syslog files?

 

 

Apr 25 16:12:29 hksw1001 issd[4840]: TID 1299130464: [issd.ERR]: NPAPI_ERR: err SendIgmpPacketToIgmpModule Failed Processing RxGddProcessRecvInterruptEvent

Apr 25 16:12:29 hksw1001 issd[4840]: TID 1299130464: [issd.WARNING]: NPAPI_WRN: warning RxMgrLowMain __RxMgrProcessL2Packet failed

MXM ERROR failed to create send cq: Cannot allocate memory


I am trying to set up a small HPC cluster using Mellanox MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] cards and an M3601Q switch, to work with openMPI and SLURM.

 

I have read that I need to activate MXM support when compiling openMPI. I have solved a lot of small problems, but now I am stuck on this one: the openMPI jobs crash and report that they cannot allocate memory:

 

mpirun noticed that process rank 3 with PID 2208 on node node01 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
MXM: Got signal 15 (Terminated)
[1398427878.955592] [node02:2198 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.957108] [node02:2198 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
[1398427878.955621] [node02:2199 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.957105] [node02:2199 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
[1398427878.955550] [node02:2200 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.957063] [node02:2200 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 15 (Terminated)
[1398427878.958152] [node02:2202 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.959497] [node02:2202 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
MXM: Got signal 15 (Terminated)
[1398427878.963706] [node02:2204 :0]      cib_ep.c:261  MXM  ERROR failed to create send cq: Cannot allocate memory
[1398427878.965081] [node02:2204 :0]      cib_ep.c:93   MXM  ERROR Failed to cancel async thread.
MXM: Got signal 11 (Segmentation fault)
==== backtrace ====
4 total processes killed (some possibly by mpirun during cleanup)

 

Moreover, when I try to run the jobs with the SLURM command srun, I get the following error:

 

    http://www.open-mpi.org/faq/?category=openfabrics#ib-locked-pages
--------------------------------------------------------------------------
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.

Local host:   node01
Local device: mlx4_0
--------------------------------------------------------------------------
--------------------------------------------------------------------------
The OpenFabrics (openib) BTL failed to initialize while trying to
allocate some locked memory.  This typically can indicate that the
memlock limits are set too low.  For most HPC installations, the
memlock limits should be set to "unlimited".  The failure occured
here:


  Local host:    node01
  OMPI source:   btl_openib_component.c:1216
  Function:      ompi_free_list_init_ex_new()
  Device:        mlx4_0
  Memlock limit: 65536

[I cut the error message, it is repetitive for each node]

I have read here that I should modify the MTT values. I have followed the procedure, but I still get the same error. Does anyone know how to troubleshoot this?

 

 

Thanks in advance

Kind regards,

Andrea

 

NB: I have also compiled openMPI with SLURM support and MXM.
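
For reference, the kind of configure line I mean looks roughly like this (a sketch; the install prefixes are examples, not necessarily the ones on my systems):

# Open MPI built with MXM and SLURM support
./configure --prefix=/opt/openmpi \
            --with-mxm=/opt/mellanox/mxm \
            --with-slurm \
            --with-pmi=/usr        # PMI from the SLURM install, for direct srun launch
make -j && make install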

Re: MXM ERROR failed to create send cq: Cannot allocate memory


Regarding MTT values: you no longer need to set them if you are using MLNX_OFED 2.x. Which version of MLNX_OFED are you using?

 

If you are getting a memlock limit of 64 KB, it probably has something to do with the environment in which the job runs. I have seen something similar with other job schedulers (see Setting Up TORQUE for Running HPC Jobs), but I haven't run into it with SLURM. I thought SLURM would just use whatever limits you have set, unlike some other schedulers, so I suspect it is your environment.

 

Usually the MLNX_OFED install updates some values in /etc/sysctl.conf, and after a reboot you should be all set. So if you are running outside of a job scheduler, you should check ulimit -l.
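
For example, a quick sanity check outside the scheduler, run as the user that will launch the job, would be something like:

ulimit -l                    # should print "unlimited" (or at least a very large value)
grep -r memlock /etc/security/limits.conf /etc/security/limits.d/ 2>/dev/null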

 

Maybe upgrade to the latest MLNX_OFED and see if it improves?

Re: IPoIB performance issue!!


Thanks a lot "yairi" and "ingvar_j" for your responses. I was finally able to achieve ~20 Gb/s on FDR.

 

with

OFED : MLNX_OFED_LINUX-2.0-3.0.0 (OFED-2.0-3.0.0)

Firmware: 2.30.3000

InfiniBand card type: x4 FDR, rate: 56 Gb/s

 

CMD: netperf -H 192.168.180.101 -t TCP_STREAM

Re: Sys log message


Hi gandriano,

 

Can you provide the code level your Mellanox Ethernet Switch is running on?

I think it's a harmless message related to an older code level, but I would like to confirm your code level just to be sure.

Re: SX1012 Openflow => Defense4All via OpenDaylight Controller


Hi, thanks for that; I was really just looking at options. In my case the DC is going to put in some Thunder TPS boxes from A10 Networks, which are built specifically for DDoS mitigation.

But out-of-path DDoS protection is very interesting; I'm just hoping for more apps and documentation to come out around OpenFlow in this area.

Regards, Daniel

 

Re: Configuration settings for IPoIB performance


Thanks wesb!

 

I upgraded the IB drivers to the latest Mellanox 2.1, set the interface to "connected" mode and the MTU to 65520. Now I'm getting ~13.5 Gb/s.

That is considerably less than what you noted; I'm not sure whether it is a QDR vs. FDR difference. My setup is QDR with an X5650, 24 GB memory, and an x8 PCIe slot.


Re: Configuration settings for IPoIB performance


For the record, I measured it with:

ConnectX-2 cards

QDR switch

2 * E5-2620/128GB boxes

Again, no playing with external OFED - just yum groupinstall "Infiniband Support"

Re: MXM ERROR failed to create send cq: Cannot allocate memory


Good morning, thanks a lot for your answer.

I spent the whole day yesterday troubleshooting this problem. You are right: I had a system locked-memory limit of 64 KB, which was peanuts.

 

I have solved these problems. I had to:

- add knem to the kernel modules that are loaded

- add a line for root in /etc/security/limits.conf (apparently the MXM memory-related setup runs as root, even if a normal user launches the srun command); the line is "root - memlock unlimited", so now in this file I have:

  1. root - memlock unlimited
  2. * - memlock unlimited

- chmod 0666 /dev/knem, so that it is usable by anyone (a consolidated sketch of these changes follows below)
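
Put together, the changes amount to roughly the following (a sketch of what I described above; adding the modprobe/chmod to /etc/rc.local is just one way to make them persist across reboots and is an assumption, not something I verified):

# load knem and open up the device node now; add the same two lines to /etc/rc.local to persist
modprobe knem
chmod 0666 /dev/knem

# allow unlimited locked memory for root and for regular users
cat >> /etc/security/limits.conf <<'EOF'
root - memlock unlimited
*    - memlock unlimited
EOF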

 

So now my basic "hello world" program works. I still have other problems, but I think they are related to programming rather than to the cluster configuration.

 

Thanks a lot for your precious help.

Regards,

Andrea

Re: MXM ERROR failed to create send cq: Cannot allocate memory


Good to know it finally works for you. knem only improves large-message transfers between MPI processes within a node, so it must be the memlock limit you set in /etc/security/limits.conf that made it work. You can mark the question as answered. Thanks for the update!
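
If you want to confirm that knem is actually being picked up, something like this should help (the MCA parameter name is from memory, so verify it against your ompi_info output; the benchmark name is just a placeholder):

# was the shared-memory BTL built with knem support?
ompi_info --all | grep -i knem

# toggle it explicitly and compare large-message intra-node bandwidth
mpirun --mca btl_sm_use_knem 1 -np 8 ./your_mpi_benchmark
mpirun --mca btl_sm_use_knem 0 -np 8 ./your_mpi_benchmark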

Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi all,

 

we are trying to use a ConnectX-3 VPI adapter card with XenServer 6.2, and to that end we have followed these steps:

1) Install XenServer 6.2 on my system (Supermicro 1027GR-TRF)
2) Install XenServer 6.2 updates (service pack 1) following the directions from http://support.citrix.com/article/CTX138115#XenServer 6.2

3) Install MLNX_OFED 2.1-1.0.6 and update the firmware (./mlnxofedinstall --force-fw-update). The installation and update completed without errors; the firmware version is 2.30.8000.

After that, we restarted the system and the openibd service, but the InfiniBand network interfaces were not detected:

[root@xenserver ~]# /etc/init.d/openibd restart

hostname: `Host' unknown

Unloading HCA driver:                                      [  OK  ]
Loading HCA driver and Access Layer:                       [  OK  ]
Setting up InfiniBand network interfaces:
Setting up service network . . .                           [  done  ]

That is, the ib0 network interface has not been detected by the system.
Does anyone know whether XenServer 6.2 works with ConnectX-3? Could you please provide some information about that? We have devoted a lot of time to this and now wonder whether it really works.
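
For reference, the basic checks we would run to confirm that dom0 sees the HCA at all are something like this (standard OFED tools; output omitted):

# is the HCA visible on the PCI bus in dom0?
lspci | grep -i mellanox

# are the core modules loaded, and does the verbs layer see the device?
lsmod | grep -E 'mlx4_core|mlx4_ib|ib_ipoib'
ibv_devinfo    # should list mlx4_0 and its ports
ibstat         # port state and link layer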


Re: Configuration settings for IPoIB performance


You cannot expect much better performance than this; you are hitting CPU limits, as all TCP error correction with IPoIB is done in software, not in hardware.

 

You can try adjusting TCP window settings in sysctl to send larger frames, which may improve performance slightly, but realistically you can only reach the full potential of the InfiniBand fabric by using native IB protocols such as iSER or SRP.
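
A starting point for that kind of tuning would be something like the following in /etc/sysctl.conf, applied with sysctl -p (illustrative values only, not a tested recommendation):

# raise socket buffer ceilings so TCP over IPoIB can use larger windows
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216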

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Have you tried version 1_5_3-4_0_42 instead? You may need to contact Mellanox support for the XenServer driver.

Re: Strange performance problem with CX354A - ConnectX-3 QSFP with large messages


Have you tried to connect to a different port on the switch to see if this problem exists with the port or the cable?


Re: Strange performance problem with CX354A - ConnectX-3 QSFP with large messages


Yes, I even tried a different pair of servers with different cables and ports.

Re: Issues with SRP and unexplainable "ibping" behaviour.


Hey,

Have you tried upgrading the firmware version?

Re: Are ConnectX-3 VPI adapters compatible with the Red Hat MRG (i.e. real-time patch)?


Hi,

I've checked, and it does not yet support Red Hat Enterprise MRG.

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi, I have tried MLNX_OFED_LINUX-2.2-1.0.1-xenserver6.x-i686 together with the firmware update, but I obtained the same results shown in the first post. So I have some questions:

Did you use XenServer 6.2 (Service Pack 1)?

What command did you use to install the OFED? (I used ./mlnxofedinstall --force-firmware-update)

Did you use any additional driver, or only the MLNX_OFED software stack?

 

 

Thank you in advance.

Re: PCIe Bracket for Connect-X EN2
