Channel: Mellanox Interconnect Community: Message List

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi!

You're right. After installing OFED 1.5.4, my system can detect ib0, and it works!

 

Thanks for your help.


Re: XenServer 6.2 SP1 can't find device ib0


Hi!

Installing OFED 1.5.4 solved this problem. Now my system can detect ib0, and IPoIB works!

Thanks!

Re: The MaxReadRequest size is set too low (512 bytes)


Hi Matt,

 

There is (or at least used to be) a variable in the firmware .ini configuration file where you could set this: 'default_max_read_request_size'. I'll have to check whether it is still supported. Thanks.
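
For illustration only, the entry would look something like the sketch below; the section name and value encoding are assumptions and should be verified against your board's actual .ini before burning anything.

; hypothetical excerpt from the adapter's firmware .ini -- section name and encoding are assumptions
[HCA]
default_max_read_request_size = 5   ; assumed to use the PCIe encoding, where 5 maps to 4096 bytes

The modified .ini would then typically be burned together with the firmware image (for example with mlxburn and its -conf option), but please treat this as a sketch rather than a verified procedure.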

create_ibv_flow() Create of QP flow ID failed


Dear Mellanox Support,

 

I'm having problems with OFED 2.2. Neither raw_ethernet_bw/raw_ethernet_lat nor libvma can create a QP flow via the verbs calls.

At some point this machine used to run fine, but then we added more network interfaces and that seems to have broken something.

 

I have all the latest Mellanox software and firmware. Flow steering is disabled, and there is nothing useful in dmesg. Could you please help me figure out what's wrong? See basic info below.

 

[root@aurarb01 log]# uname -a

Linux aurarb01 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

[root@aurarb01 log]# ofed_info

MLNX_OFED_LINUX-2.2-1.0.1 (OFED-2.2-1.0.0):

 

[root@aurarb01 log]# raw_ethernet_bw --server

Max msg size in RawEth is MTU 1518

Changing msg size to this MTU

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_1

Number of qps   : 1 Transport type : IB

Connection type : RawEth Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 1518[B]

Link type       : Ethernet

Gid index       : 0

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

MAC attached  : 24:BE:05:81:95:90

error: Function not implemented

Couldn't attach QP

[root@aurarb01 log]# LD_PRELOAD=libvma.so sockperf sr

VMA INFO   : ---------------------------------------------------------------------------

VMA INFO   : VMA_VERSION: 6.6.4-0 Release built on 2014-04-23-16:54:38

VMA INFO   : Cmd Line: sockperf sr

VMA INFO   : OFED Version: MLNX_OFED_LINUX-2.2-1.0.1:

VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]

VMA INFO   : ---------------------------------------------------------------------------

sockperf: == version #2.5.233 ==

VMA ERROR  : rfs[0x23888f0]:188:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.37:11111, src:0.0.0.0:0, protocol:UDP

VMA ERROR  : rfs[0x241aa70]:188:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.38:11111, src:0.0.0.0:0, protocol:UDP

VMA ERROR  : rfs[0x243e240]:188:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.36:11111, src:0.0.0.0:0, protocol:UDP

sockperf: [SERVER] listen on:

[ 0] IP = 0.0.0.0         PORT = 11111 # UDP

sockperf: Warmup stage (sending a few dummy messages)...


[root@aurarb01 log]# ibv_devinfo

hca_id: mlx4_1

  transport: InfiniBand (0)

  fw_ver: 2.31.5050

  node_guid: 24be:05ff:ff81:9590

  sys_image_guid: 24be:05ff:ff81:9593

  vendor_id: 0x02c9

  vendor_part_id: 4099

  hw_ver: 0x1

  board_id: HP_0280210019

  phys_port_cnt: 2

  port: 1

  state: PORT_ACTIVE (4)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: Ethernet

 

  port: 2

  state: PORT_ACTIVE (4)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: Ethernet

 

hca_id: mlx4_0

  transport: InfiniBand (0)

  fw_ver: 2.31.5050

  node_guid: 24be:05ff:ff94:fae0

  sys_image_guid: 24be:05ff:ff94:fae3

  vendor_id: 0x02c9

  vendor_part_id: 4099

  hw_ver: 0x1

  board_id: HP_0230240009

  phys_port_cnt: 2

  port: 1

  state: PORT_ACTIVE (4)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: Ethernet

 

  port: 2

  state: PORT_DOWN (1)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

 

[root@aurarb01 log]# ifconfig

eth0      Link encap:Ethernet  HWaddr 24:BE:05:94:FA:E1 

          inet addr:172.XX.XX.XX  Bcast:172.17.38.39  Mask:255.255.255.248

          inet6 addr: fe80::26be:5ff:fe94:fae1/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3987 errors:0 dropped:0 overruns:0 frame:0

          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:300512 (293.4 KiB)  TX bytes:812 (812.0 b)

 

 

eth2      Link encap:Ethernet  HWaddr 24:BE:05:81:95:90 

          inet addr:172.XX.XX.XX  Bcast:172.17.38.39  Mask:255.255.255.248

          inet6 addr: fe80::26be:5ff:fe81:9590/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3978 errors:0 dropped:0 overruns:0 frame:0

          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:299904 (292.8 KiB)  TX bytes:684 (684.0 b)

 

 

eth3      Link encap:Ethernet  HWaddr 24:BE:05:81:95:91 

          inet addr:172.XX.XX.XX  Bcast:172.17.38.39  Mask:255.255.255.248

          inet6 addr: fe80::24be:500:181:9591/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3972 errors:0 dropped:0 overruns:0 frame:0

          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:299492 (292.4 KiB)  TX bytes:684 (684.0 b)

 

 

eth5      Link encap:Ethernet  HWaddr 38:EA:A7:36:99:6A 

          inet addr:10.128.XX.XX  Bcast:10.128.178.63  Mask:255.255.255.224

          inet6 addr: fe80::3aea:a7ff:fe36:996a/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:869904 errors:0 dropped:0 overruns:0 frame:0

          TX packets:1318759 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:447971625 (427.2 MiB)  TX bytes:1615917933 (1.5 GiB)

          Memory:f3f00000-f4000000

 

 

lo        Link encap:Local Loopback 

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:747 errors:0 dropped:0 overruns:0 frame:0

          TX packets:747 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:70889 (69.2 KiB)  TX bytes:70889 (69.2 KiB)
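
For reference, this is roughly how flow steering would be enabled on these mlx4 cards if it turns out to be required; "Function not implemented" on QP flow attach is the typical symptom when device-managed flow steering is off. A hedged sketch (the modprobe file name is a placeholder; verify against the MLNX_OFED/VMA documentation for your release):

# enable device-managed flow steering for mlx4_core
echo "options mlx4_core log_num_mgm_entry_size=-1" > /etc/modprobe.d/mlx4-flow-steering.conf

# reload the stack so the option takes effect
/etc/init.d/openibd restart

# confirm the parameter is active (-1 expected)
cat /sys/module/mlx4_core/parameters/log_num_mgm_entry_size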

Firmware for Mellanox cards in Intel S2600JFQ


Hi,

 

I have a set of machines with Intel S2600JFQ motherboards that have Mellanox InfiniBand adapters (PSID: INCX-3I358C10501). Since I'm running RHEL/CentOS 6.5, I had to install the latest MLNX_OFED stack, but it always complains about the firmware version of the cards (it is 2.11.1308); the OFED stack nominally requires something like 2.30.XXX. Despite that, things were working reasonably well. Recently I noticed that the same cards on slightly newer S2600JFQ revisions (PSID: INCX-3I358C10551) have received a firmware update: http://www.mellanox.com/page/firmware_table_Intel?mtag=oem_firmware_download. Should I expect a firmware update for my cards as well, or is all hope lost? I'm also curious what people think about running recent OFED versions on slightly outdated firmware, since that seems to be the only alternative to actually updating it.
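
For reference, the PSID and current firmware version can be read off the card with a query along these lines (a hedged sketch; the PCI address is a placeholder):

# query the adapter directly over PCI
mstflint -d 02:00.0 query
# the output includes, among others, the "FW Version" and "PSID" fields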

 

Thank you,
    Sergey

error 104 on socket


Hi,

 

I am trying to run a working socket-based program over MT4099 in EN mode.

 

The connection is established, but recv() reports errno 104 (connection reset by peer) on the first four-byte read attempt. No actual RST is seen on the wire. What am I doing wrong?

 

The app is a server listening on port 445 (SMB).
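
For completeness, a capture filter along these lines is one way to confirm that no RST actually hits the wire (a hedged sketch; the interface name eth2 is a placeholder):

# watch port 445 for segments with the RST flag set
tcpdump -ni eth2 'tcp port 445 and (tcp[tcpflags] & tcp-rst != 0)'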

 

Mark

Does MHQH29C-XTR support port type management via connectx_port_config?


Hi all,

 

I'm thinking of purchasing an MHQH29C-XTR to use with IB infrastructure and a Linux host. However, I would like to retain the ability to change the port type, using a QSR adapter to switch to 10G Ethernet for testing. Does this card support that? If so, can it run Ethernet on one port and IB on the other?

 

Unfortunately, I have not found an example of this exotic configuration in the docs. There is a hint in the following document that the chip supports it:

 

http://www.mellanox.com/related-docs/user_manuals/ConnectX_2_%20VPI_UserManual_MHZH29.pdf

 

However, that card actually has two different ports (one QSFP and one SFP+), unlike the configuration I'm looking for.
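
For reference, on Linux the per-port type of ConnectX VPI adapters can usually be switched either with the connectx_port_config script or directly via sysfs; whether the MHQH29C-XTR accepts an Ethernet setting is exactly what I'm unsure about. A hedged sketch (the PCI address is a placeholder):

# set port 1 to InfiniBand and port 2 to Ethernet on the mlx4 device
echo ib  > /sys/bus/pci/devices/0000:05:00.0/mlx4_port1
echo eth > /sys/bus/pci/devices/0000:05:00.0/mlx4_port2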

 

Thanks in advance! 

How do I update the firmware on MT23108 when the kernel is 3.7.10?


I have an MT23108 with firmware 3.3.5, and I want to update the firmware. The card is working under openSUSE 12.3, but mstflint does this:

 

mstflint -d 5:00.0 -i fw-23108-3_5_000-380299-B21_A1.bin b

-E- Can not open 5:00.0:  MFE_OLD_DEVICE_TYPE

 

The card itself works:

# ibstat

CA 'mthca0'

        CA type: MT23108

        Number of ports: 2

        Firmware version: 3.3.5

        Hardware version: a1

        Node GUID: 0x0019bbffff007880

        System image GUID: 0x0019bbffff007883

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 10

                Base lid: 1

       ....

 

Do I have to build an old Linux system just to update the cards?


Re: Mellanox (old Voltaire) ISR9024D-M recover flash area

Re: Hyper-V vSwitch speed problems


I think that the 1.4 GB/s figure comes from SMB Direct!

But Hyper-V vSwitch performance is a different factor.

 

I remember there is a tuning guide for that scenario.

I'll search for it and get back to you. :)

Re: Hyper-V vSwitch speed problems


Any progress on finding those tips on how to set things up?

Re: Mellanox 40 GbE and Dell Powerconnect 8164

Re: Hyper-V vSwitch speed problems


Hi!

Unfortunately, I don't have any information that will help you.

I think Microsoft focuses on SMB Direct only: since the introduction of Hyper-V shared-nothing migration, Microsoft has been adding product support around it (e.g., MSSQL supports SMB shared folders).

 

I have experience with Live Migration over SMB Direct, but the Hyper-V vSwitch itself was just slightly faster than a 10 Gb network.

 

Maybe wait until Microsoft (and VMware, too) are ready to support RDMA in their hypervisors. :)

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi all,

 

The problem has finally been fixed. After several tests, it seems it was due to a hardware incompatibility.

We repeated the same steps on another node (with a different integrated board), and everything seems to be OK.

ib_send_bw test with ConnectX-3 VPI adapter card over XenServer 6.2 fails with message: Failed to modify QP 100 to RTR


Hi all,

 

 

We are testing a ConnectX-3 card with XenServer 6.2. For that we have two nodes: the first runs XenServer 6.2 dom0 and the second runs CentOS 6.4. On both nodes we have installed MLNX_OFED version 2.2-1.0.1.

 

 

If we run an ibping between the two nodes, everything seems to be OK. But when we try to run an ib_send_bw test, we have problems depending on which node is the server. If the server is the CentOS 6.4 node, the test completes correctly.

 

Server -> CentOS 6.4 and Client -> XenServer 6.2 dom 0

 

(Server output)

-bash-4.1$ ib_send_bw -d mlx4_0

************************************

* Waiting for client to connect... *

************************************

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 2048[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x13 QPN 0x0063 PSN 0x27cecf

remote address: LID 0x1b QPN 0x0865 PSN 0x54507

---------------------------------------------------------------------------------------

#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]

65536      1000           0.00               5946.02   0.095136

---------------------------------------------------------------------------------------

 

(Client output)

[root@xenserver ~]# ib_send_bw 192.168.1.14 -d mlx4_0

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

TX depth        : 128

CQ Moderation   : 100

Mtu             : 2048[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x1b QPN 0x0865 PSN 0x54507

remote address: LID 0x13 QPN 0x0063 PSN 0x27cecf

---------------------------------------------------------------------------------------

#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]

65536      1000           5906.23            5904.40   0.094470

---------------------------------------------------------------------------------------

 

However, if the server is the XenServer 6.2 dom0 node, the test fails:

 

Server -> XenServer 6.2 dom 0 and Client -> CentOS 6.4

 

(Server output)

[root@xenserver ~]# ib_send_bw -d mlx4_0

 

 

************************************

* Waiting for client to connect... *

************************************

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 2048[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x1b QPN 0x0866 PSN 0x301c00

remote address: LID 0x13 QPN 0x0064 PSN 0x86e97e

ethernet_read_keys: Couldn't read remote address

Unable to read to socket/rdam_cm

Failed to exchange data between server and clients

 

(Client output)

-bash-4.1$ ib_send_bw 192.168.1.17 -d mlx4_0

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

TX depth        : 128

CQ Moderation   : 100

Mtu             : 128[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x13 QPN 0x0064 PSN 0x86e97e

remote address: LID 0x1b QPN 0x0866 PSN 0x301c00

Failed to modify QP 100 to RTR

Unable to Connect the HCA's through the link

 

 

Can anyone help me with this error?
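
One detail that stands out in the failing direction is that the client reports Mtu: 128[B] instead of the 2048[B] seen in the working run. A hedged thing to try (assuming this perftest build supports the -m option) is to force the MTU explicitly on both sides:

# server side (XenServer dom0)
ib_send_bw -d mlx4_0 -m 2048

# client side (CentOS 6.4)
ib_send_bw 192.168.1.17 -d mlx4_0 -m 2048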


Re: OFED install firmware update fails on Ubuntu 14.04


Hi Cduced,

 

This is a lower-cost 10G card whose flash is half the size of most of our other cards' flash. Because of this, there is no room in the EEPROM to hold two images (which is required when doing a failsafe burn).

 

 

The MLNX_OFED installer uses the mlxfwmanager tool located in the firmware directory. This tool doesn't support burning non-failsafe images, which explains why it couldn't burn the new firmware onto the card.
You can work around this as follows:
1. Download the relevant .bin file (according to the NIC's PSID) from: Firmware for ConnectX®-3 EN - Mellanox Technologies
2. Burn the new firmware (using the --nofs switch):
   mst start
   flint -d mlx4_0 -i fw-ConnectX3-rel-2_31_5050-MCX311A-XCA_Ax-FlexBoot-3.4.225_ETH.bin --nofs burn
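
After the burn completes, the new image can be checked with a query (a hedged sketch, mirroring the device name used above):

flint -d mlx4_0 query
# check the "FW Version" line, then restart the driver or reboot so the new image is loaded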

ibping fails between Mellanox MT26428 and qlogic QDR on Centos6.5


Hi All,

I have a mix of Mellanox MT26428 and QLogic IBA7322 adapters, all connected to a QLogic 12300 switch.

Everything used to work fine with CentOS 6.4; after the upgrade to CentOS 6.5, ibping between the Mellanox and QLogic adapters stopped working, while it is still fine for the QLogic-QLogic and Mellanox-Mellanox paths.

Below the details for two hosts. Any hint would be much appreciated.

Kind Regards,

  Daniele.

 

HOST1:

[root@hpc-200-06-13-a ~]# lspci | grep Mellanox

02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

[root@hpc-200-06-13-a ~]# ibnodes

Ca      : 0x003048fffff49b1c ports 1 "hpc-200-06-13-b HCA-1"

Ca      : 0x0011750000708278 ports 1 "hpc-200-06-09 HCA-1"

Ca      : 0x00117500007080ca ports 1 "hpc-200-06-11 HCA-1"

Ca      : 0x0011750000702988 ports 1 "hpc-200-06-08 HCA-1"

Ca      : 0x00117500007028c4 ports 1 "hpc-200-06-07 HCA-1"

Ca      : 0x0011750000705360 ports 1 "hpc-200-06-06 HCA-2"

Ca      : 0x0011750000702966 ports 1 "hpc-200-06-05 HCA-2"

Ca      : 0x0011750000706e52 ports 1 "hpc-200-06-04 HCA-1"

Ca      : 0x0011750000702dfc ports 1 "hpc-200-06-03 HCA-1"

Ca      : 0x00117500007041a6 ports 1 "hpc-200-06-02 HCA-1"

Ca      : 0x003048fffff499c4 ports 1 "hpc-200-06-13-a HCA-1"

Switch  : 0x00066a00e3005938 ports 36 "QLogic 12300 GUID=0x00066a00e3005938" enhanced port 0 lid 1 lmc 0

[root@hpc-200-06-13-a ~]# ibstat

CA 'mlx4_0'

        CA type: MT26428

        Number of ports: 1

        Firmware version: 2.7.200

        Hardware version: b0

        Node GUID: 0x003048fffff499c4

        System image GUID: 0x003048fffff499c7

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 40

                Base lid: 12

                LMC: 0

                SM lid: 1

                Capability mask: 0x02510868

                Port GUID: 0x003048fffff499c5

                Link layer: InfiniBand

 

HOST2

(only qib0 is connected to the switch)

[root@hpc-200-06-05 ~]# lspci | grep -i infi

01:00.0 InfiniBand: Mellanox Technologies MT27600 [Connect-IB]

03:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)

 

[root@hpc-200-06-05 ~]# ibstat

CA 'mlx5_0'

        CA type: MT4113

        Number of ports: 2

        Firmware version: 10.10.1000

        Hardware version: 0

        Node GUID: 0xf4521403001b4b40

        System image GUID: 0xf4521403001b4b40

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 56

                Base lid: 1

                LMC: 0

                SM lid: 1

                Capability mask: 0x0651484a

                Port GUID: 0xf4521403001b4b40

                Link layer: InfiniBand

        Port 2:

                State: Initializing

                Physical state: LinkUp

                Rate: 56

                Base lid: 65535

                LMC: 0

                SM lid: 0

                Capability mask: 0x06514848

                Port GUID: 0xf4521403001b4b48

                Link layer: InfiniBand

CA 'qib0'

        CA type: InfiniPath_QLE7340

        Number of ports: 1

        Firmware version:

        Hardware version: 2

        Node GUID: 0x0011750000702966

        System image GUID: 0x0011750000702966

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 40

                Base lid: 4

                LMC: 0

                SM lid: 1

                Capability mask: 0x07610868

                Port GUID: 0x0011750000702966

                Link layer: InfiniBand

[root@hpc-200-06-05 ~]# ibnodes -C qib0

Ca      : 0x003048fffff49b1c ports 1 "hpc-200-06-13-b HCA-1"

Ca      : 0x003048fffff499c4 ports 1 "hpc-200-06-13-a HCA-1"

Ca      : 0x0011750000708278 ports 1 "hpc-200-06-09 HCA-1"

Ca      : 0x00117500007080ca ports 1 "hpc-200-06-11 HCA-1"

Ca      : 0x0011750000706e52 ports 1 "hpc-200-06-04 HCA-1"

Ca      : 0x0011750000702988 ports 1 "hpc-200-06-08 HCA-1"

Ca      : 0x00117500007028c4 ports 1 "hpc-200-06-07 HCA-1"

Ca      : 0x0011750000705360 ports 1 "hpc-200-06-06 HCA-2"

Ca      : 0x0011750000702dfc ports 1 "hpc-200-06-03 HCA-1"

Ca      : 0x00117500007041a6 ports 1 "hpc-200-06-02 HCA-1"

Ca      : 0x0011750000702966 ports 1 "hpc-200-06-05 HCA-2"

Switch  : 0x00066a00e3005938 ports 36 "QLogic 12300 GUID=0x00066a00e3005938" enhanced port 0 lid 1 lmc 0

 

Starting the server:

[root@hpc-200-06-05 ~]# ibping -v -d  -S -G 0x0011750000702966

ibdebug: [31145] ibping_serv: starting to serve...

 

 

Launching the ping:

[root@hpc-200-06-13-a ~]# ibping -v -d -G 0x0011750000702966

ibwarn: [12545] sa_rpc_call: attr 0x35 mod 0x0 route Lid 1

ibwarn: [12545] mad_rpc_rmpp: rmpp (nil) data 0x7ffff73b0aa0

ibwarn: [12545] mad_rpc_rmpp: data offs 56 sz 200

rmpp mad data

0000 0000 0000 0000 fe80 0000 0000 0000

0011 7500 0070 2966 fe80 0000 0000 0000

0030 48ff fff4 99c5 0004 000c 0000 0000

0080 ffff 0000 8487 8e00 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000

ibdebug: [12545] ibping: Ping..

ibwarn: [12545] ib_vendor_call_via: route Lid 4 data 0x7ffff73b0e30

ibwarn: [12545] ib_vendor_call_via: class 0x132 method 0x1 attr 0x0 mod 0x0 datasz 216 off 40 res_ex 1

ibwarn: [12545] mad_rpc_rmpp: rmpp (nil) data 0x7ffff73b0e30

ibwarn: [12545] mad_rpc_rmpp: MAD completed with error status 0xc; dport (Lid 4)

ibdebug: [12545] main: ibping to Lid 4 failed
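
For what it's worth, one hedged diagnostic that can be run from the Mellanox host is to walk the path between the two ports and check the subnet manager's view of the route (LIDs 12 and 4 are taken from the ibstat/ibping output above):

ibtracert 12 4   # trace the LID route from hpc-200-06-13-a (LID 12) to the qib0 port (LID 4)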

SL4540 GEN8 + MT27500 Family [ConnectX-3] ---> random packets "disappearing"


Hello,

 

We've just built 9 RHEL 6.4 servers (3x SL4540 blades) on which we are running some network-intensive software.

 

The thing is that, occasionally, connectivity between these servers fails (for example, when making an HTTP request or an SSH attempt).

 

More specific details:

If we SSH from server A to server B, the connection is usually established OK.

But in about 1 out of every 10 SSH attempts, the connection request never reaches its destination.

 

At first we thought we were having network issues, but the switch has a plain configuration with no firewalls or special commands (apart from STP, which shouldn't interfere here).

 

To try to recreate the scenario, we ran a simple test with a small SSH script:

 

for  (( ; ; )); do ssh  hostname.xxx "ls -l" ; done

 

This executes an ls in a loop.

Randomly, every 20-30 attempts, one of them fails (hanging the session).

 

We ran the same kind of test over HTTP/SCP/SMTP, and we get the same error (random timeouts caused by missing packets).

We tried a tcpdump, and the packet that fails to reach server B is actually disappearing (it is being sent but is never received by server B).

Kernel stacks and sockstat look fine (no orphaned sockets, no tables filling up).

 

We have 9 servers.

3 of the servers are showing this problem; the rest are working fine.

 

All of them have exactly the same configuration, the same kernel parameters, and the same OS + software.

We are using bonding in mode 1 (active-backup).

We are using these drivers:

mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)

mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.0 (Dec 2011)

 

We did not want to upgrade the driver/firmware, since the problem only shows up randomly on 3 servers out of 9.

 

Has this issue been reported before?

Can you recommend any extra/special parameters or configuration to test and rule out hardware/link problems?
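
For context, these are the kinds of per-interface checks that can be compared between a good and a bad server (a hedged sketch; interface and bond names are placeholders):

# check which bonding slave is active and whether failovers are being logged
cat /proc/net/bonding/bond0

# look for drops/discards/errors on the Mellanox slaves
ethtool -S eth2 | grep -iE 'drop|discard|err'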

 

Thanks,

Nicolas.-

Re: Virtual functions did not come up at boot time of virtual machines


Tested the latest driver, MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64, and this problem does not happen. It seems it has been fixed.

Re: OFED install firmware update fails on Ubuntu 14.04


Thanks. It would be nice if mlxfwmanager handled the firmware upgrade, but at least there is a straightforward workaround. -- Bud
