Channel: Mellanox Interconnect Community: Message List

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi!

You're right. After installing OFED 1.5.4, my system can detect ib0, and it works!

 

Thanks for your help.


Re: XenServer 6.2 SP1 can't find device ib0


Hi!

Installing OFED 1.5.4 solved this problem. Now my system can detect ib0, and IPoIB works!

Thanks!

Re: The MaxReadRequest size is set too low (512 bytes)


Hi Matt,

 

There is (or at least used to be) a variable in the firmware .ini configuration file where you could set this: 'default_max_read_request_size'. I'll have to check whether it is still supported. Thanks.
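
For illustration only, the entry would look something like the sketch below; the section name and value encoding are assumptions and should be verified against your board's actual .ini before burning anything.

; hypothetical excerpt from the adapter's firmware .ini -- section name and encoding are assumptions
[HCA]
default_max_read_request_size = 5   ; assumed to use the PCIe encoding, where 5 maps to 4096 bytes

The modified .ini would then typically be burned together with the firmware image (for example with mlxburn and its -conf option), but please treat this as a sketch rather than a verified procedure.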

create_ibv_flow() Create of QP flow ID failed


Dear Mellanox Support,

 

I'm having problems with OFED 2.2. Neither raw_ethernet_bw/raw_ethernet_lat nor libvma can create a QP flow via the verbs calls.

At some point this machine used to run fine, but then we added more network interfaces and that seems to have broken something.

 

I have all the latest Mellanox software and firmware. Flow steering is disabled, and there is nothing useful in dmesg. Could you please help me figure out what's wrong? See basic info below.

 

[root@aurarb01 log]# uname -a

Linux aurarb01 2.6.32-431.17.1.el6.x86_64 #1 SMP Fri Apr 11 17:27:00 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

[root@aurarb01 log]# ofed_info

MLNX_OFED_LINUX-2.2-1.0.1 (OFED-2.2-1.0.0):

 

[root@aurarb01 log]# raw_ethernet_bw --server

Max msg size in RawEth is MTU 1518

Changing msg size to this MTU

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_1

Number of qps   : 1 Transport type : IB

Connection type : RawEth Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 1518[B]

Link type       : Ethernet

Gid index       : 0

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

MAC attached  : 24:BE:05:81:95:90

error: Function not implemented

Couldn't attach QP

[root@aurarb01 log]# LD_PRELOAD=libvma.so sockperf sr

VMA INFO   : ---------------------------------------------------------------------------

VMA INFO   : VMA_VERSION: 6.6.4-0 Release built on 2014-04-23-16:54:38

VMA INFO   : Cmd Line: sockperf sr

VMA INFO   : OFED Version: MLNX_OFED_LINUX-2.2-1.0.1:

VMA INFO   : Log Level                      3                          [VMA_TRACELEVEL]

VMA INFO   : ---------------------------------------------------------------------------

sockperf: == version #2.5.233 ==

VMA ERROR  : rfs[0x23888f0]:188:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.37:11111, src:0.0.0.0:0, protocol:UDP

VMA ERROR  : rfs[0x241aa70]:188:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.38:11111, src:0.0.0.0:0, protocol:UDP

VMA ERROR  : rfs[0x243e240]:188:create_ibv_flow() Create of QP flow ID failed with flow dst:172.17.38.36:11111, src:0.0.0.0:0, protocol:UDP

sockperf: [SERVER] listen on:

[ 0] IP = 0.0.0.0         PORT = 11111 # UDP

sockperf: Warmup stage (sending a few dummy messages)...


[root@aurarb01 log]# ibv_devinfo

hca_id: mlx4_1

  transport: InfiniBand (0)

  fw_ver: 2.31.5050

  node_guid: 24be:05ff:ff81:9590

  sys_image_guid: 24be:05ff:ff81:9593

  vendor_id: 0x02c9

  vendor_part_id: 4099

  hw_ver: 0x1

  board_id: HP_0280210019

  phys_port_cnt: 2

  port: 1

  state: PORT_ACTIVE (4)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: Ethernet

 

  port: 2

  state: PORT_ACTIVE (4)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: Ethernet

 

hca_id: mlx4_0

  transport: InfiniBand (0)

  fw_ver: 2.31.5050

  node_guid: 24be:05ff:ff94:fae0

  sys_image_guid: 24be:05ff:ff94:fae3

  vendor_id: 0x02c9

  vendor_part_id: 4099

  hw_ver: 0x1

  board_id: HP_0230240009

  phys_port_cnt: 2

  port: 1

  state: PORT_ACTIVE (4)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

  port_lid: 0

  port_lmc: 0x00

  link_layer: Ethernet

 

  port: 2

  state: PORT_DOWN (1)

  max_mtu: 4096 (5)

  active_mtu: 1024 (3)

  sm_lid: 0

 

[root@aurarb01 log]# ifconfig

eth0      Link encap:Ethernet  HWaddr 24:BE:05:94:FA:E1 

          inet addr:172.XX.XX.XX  Bcast:172.17.38.39  Mask:255.255.255.248

          inet6 addr: fe80::26be:5ff:fe94:fae1/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3987 errors:0 dropped:0 overruns:0 frame:0

          TX packets:11 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:300512 (293.4 KiB)  TX bytes:812 (812.0 b)

 

 

eth2      Link encap:Ethernet  HWaddr 24:BE:05:81:95:90 

          inet addr:172.XX.XX.XX  Bcast:172.17.38.39  Mask:255.255.255.248

          inet6 addr: fe80::26be:5ff:fe81:9590/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3978 errors:0 dropped:0 overruns:0 frame:0

          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:299904 (292.8 KiB)  TX bytes:684 (684.0 b)

 

 

eth3      Link encap:Ethernet  HWaddr 24:BE:05:81:95:91 

          inet addr:172.XX.XX.XX  Bcast:172.17.38.39  Mask:255.255.255.248

          inet6 addr: fe80::24be:500:181:9591/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3972 errors:0 dropped:0 overruns:0 frame:0

          TX packets:9 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:299492 (292.4 KiB)  TX bytes:684 (684.0 b)

 

 

eth5      Link encap:Ethernet  HWaddr 38:EA:A7:36:99:6A 

          inet addr:10.128.XX.XX  Bcast:10.128.178.63  Mask:255.255.255.224

          inet6 addr: fe80::3aea:a7ff:fe36:996a/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:869904 errors:0 dropped:0 overruns:0 frame:0

          TX packets:1318759 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:447971625 (427.2 MiB)  TX bytes:1615917933 (1.5 GiB)

          Memory:f3f00000-f4000000

 

 

lo        Link encap:Local Loopback 

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:747 errors:0 dropped:0 overruns:0 frame:0

          TX packets:747 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:70889 (69.2 KiB)  TX bytes:70889 (69.2 KiB)
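
For reference, this is roughly how flow steering would be enabled on these mlx4 cards if it turns out to be required; "Function not implemented" on QP flow attach is the typical symptom when device-managed flow steering is off. A hedged sketch (the modprobe file name is a placeholder; verify against the MLNX_OFED/VMA documentation for your release):

# enable device-managed flow steering for mlx4_core
echo "options mlx4_core log_num_mgm_entry_size=-1" > /etc/modprobe.d/mlx4-flow-steering.conf

# reload the stack so the option takes effect
/etc/init.d/openibd restart

# confirm the parameter is active (-1 expected)
cat /sys/module/mlx4_core/parameters/log_num_mgm_entry_size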

Firmware for Mellanox cards in Intel S2600JFQ


Hi,

 

I have a set of machines with Intel S2600JFQ motherboards that have Mellanox InfiniBand adapters (PSID: INCX-3I358C10501). Since I'm running RHEL/CentOS 6.5, I had to install the latest MLNX_OFED stack, but it always complains about the firmware version of the cards (it is 2.11.1308); the OFED stack nominally requires something like 2.30.XXX. Despite that, things were working reasonably well. Recently I noticed that the same cards on slightly newer S2600JFQ revisions (PSID: INCX-3I358C10551) have received a firmware update: http://www.mellanox.com/page/firmware_table_Intel?mtag=oem_firmware_download. Should I expect a firmware update for my cards as well, or is all hope lost? I'm also curious what people think about running recent OFED versions on slightly outdated firmware, since that seems to be the only alternative to actually updating it.
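
For reference, the PSID and current firmware version can be read off the card with a query along these lines (a hedged sketch; the PCI address is a placeholder):

# query the adapter directly over PCI
mstflint -d 02:00.0 query
# the output includes, among others, the "FW Version" and "PSID" fields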

 

Thank you,
    Sergey

error 104 on socket


Hi,

 

I am trying to run a working socket-based program over MT4099 in EN mode.

 

The connection is established, but recv() reports errno 104 (connection reset by peer) on the first four-byte read attempt. No actual RST is seen on the wire. What am I doing wrong?

 

The app is a server listening on port 445 (SMB).
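
For completeness, a capture filter along these lines is one way to confirm that no RST actually hits the wire (a hedged sketch; the interface name eth2 is a placeholder):

# watch port 445 for segments with the RST flag set
tcpdump -ni eth2 'tcp port 445 and (tcp[tcpflags] & tcp-rst != 0)'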

 

Mark

Does MHQH29C-XTR support port type management via connectx_port_config?


Hi all,

 

I'm thinking of purchasing an MHQH29C-XTR to use with IB infrastructure and a Linux host. However, I would like to retain the ability to change the port type, using a QSR adapter to switch to 10G Ethernet for testing. Does this card support that? If so, can it run Ethernet on one port and IB on the other?

 

Unfortunately, I have not found an example of this exotic configuration in the docs. There is a hint in the following document that the chip supports it:

 

http://www.mellanox.com/related-docs/user_manuals/ConnectX_2_%20VPI_UserManual_MHZH29.pdf

 

However, that card actually has two different ports (one QSFP and one SFP+), unlike the configuration I'm looking for.
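
For reference, on Linux the per-port type of ConnectX VPI adapters can usually be switched either with the connectx_port_config script or directly via sysfs; whether the MHQH29C-XTR accepts an Ethernet setting is exactly what I'm unsure about. A hedged sketch (the PCI address is a placeholder):

# set port 1 to InfiniBand and port 2 to Ethernet on the mlx4 device
echo ib  > /sys/bus/pci/devices/0000:05:00.0/mlx4_port1
echo eth > /sys/bus/pci/devices/0000:05:00.0/mlx4_port2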

 

Thanks in advance! 

How do I update the firmware on MT23108 when the kernel is 3.7.10?


I have an MT23108 with firmware 3.3.5, and I want to update the firmware. The card is working under openSUSE 12.3, but mstflint does this:

 

mstflint -d 5:00.0 -i fw-23108-3_5_000-380299-B21_A1.bin b

-E- Can not open 5:00.0:  MFE_OLD_DEVICE_TYPE

 

The card itself works:

# ibstat

CA 'mthca0'

        CA type: MT23108

        Number of ports: 2

        Firmware version: 3.3.5

        Hardware version: a1

        Node GUID: 0x0019bbffff007880

        System image GUID: 0x0019bbffff007883

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 10

                Base lid: 1

       ....

 

Do I have to build an old Linux system just to update the cards?


Re: Mellanox (old Voltaire) ISR9024D-M recover flash area

Re: Hyper-V vSwitch speed problems


I think that the 1.4 GB/s figure comes from SMB Direct!

But Hyper-V vSwitch performance is a different factor.

 

I remember there is a tuning guide for that scenario.

I'll search for it and get back to you. :)

Re: Hyper-V vSwitch speed problems


Any progress on finding those tips on how to set things up?

Re: Mellanox 40 GbE and Dell Powerconnect 8164

Re: Hyper-V vSwitch speed problems


Hi!

Unfortunately, I don't have any information that will help you.

I think Microsoft focuses on SMB Direct only: since the introduction of Hyper-V shared-nothing migration, Microsoft has been adding product support around it (e.g., MSSQL supports SMB shared folders).

 

I have experience with Live Migration over SMB Direct, but the Hyper-V vSwitch itself was just slightly faster than a 10 Gb network.

 

Maybe wait until Microsoft (and VMware, too) are ready to support RDMA in their hypervisors. :)

Re: Trouble with ConnectX-3 VPI adapter card over XenServer 6.2 (Service Pack 1)


Hi all,

 

The problem has finally been fixed. After several tests, it seems it was due to a hardware incompatibility.

We repeated the same steps on another node (with a different integrated board), and everything seems to be OK.

ib_send_bw test with ConnectX-3 VPI adapter card over XenServer 6.2 fails with message: Failed to modify QP 100 to RTR


Hi all,

 

 

We are testing a ConnectX-3 card with XenServer 6.2. For that we have two nodes: the first runs XenServer 6.2 dom0 and the second runs CentOS 6.4. On both nodes we have installed MLNX_OFED version 2.2-1.0.1.

 

 

If we run an ibping between the two nodes, everything seems to be OK. But when we try to run an ib_send_bw test, we have problems depending on which node is the server. If the server is the CentOS 6.4 node, the test completes correctly.

 

Server -> CentOS 6.4 and Client -> XenServer 6.2 dom 0

 

(Server output)

-bash-4.1$ ib_send_bw -d mlx4_0

************************************

* Waiting for client to connect... *

************************************

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 2048[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x13 QPN 0x0063 PSN 0x27cecf

remote address: LID 0x1b QPN 0x0865 PSN 0x54507

---------------------------------------------------------------------------------------

#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]

65536      1000           0.00               5946.02   0.095136

---------------------------------------------------------------------------------------

 

(Client output)

[root@xenserver ~]# ib_send_bw 192.168.1.14 -d mlx4_0

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

TX depth        : 128

CQ Moderation   : 100

Mtu             : 2048[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x1b QPN 0x0865 PSN 0x54507

remote address: LID 0x13 QPN 0x0063 PSN 0x27cecf

---------------------------------------------------------------------------------------

#bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]

65536      1000           5906.23            5904.40   0.094470

---------------------------------------------------------------------------------------

 

However, if the server is the XenServer 6.2 dom0 node, the test fails:

 

Server -> XenServer 6.2 dom 0 and Client -> CentOS 6.4

 

(Server output)

[root@xenserver ~]# ib_send_bw -d mlx4_0

 

 

************************************

* Waiting for client to connect... *

************************************

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

RX depth        : 512

CQ Moderation   : 100

Mtu             : 2048[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x1b QPN 0x0866 PSN 0x301c00

remote address: LID 0x13 QPN 0x0064 PSN 0x86e97e

ethernet_read_keys: Couldn't read remote address

Unable to read to socket/rdam_cm

Failed to exchange data between server and clients

 

(Client output)

-bash-4.1$ ib_send_bw 192.168.1.17 -d mlx4_0

---------------------------------------------------------------------------------------

                    Send BW Test

Dual-port       : OFF Device         : mlx4_0

Number of qps   : 1 Transport type : IB

Connection type : RC Using SRQ      : OFF

TX depth        : 128

CQ Moderation   : 100

Mtu             : 128[B]

Link type       : IB

Max inline data : 0[B]

rdma_cm QPs : OFF

Data ex. method : Ethernet

---------------------------------------------------------------------------------------

local address: LID 0x13 QPN 0x0064 PSN 0x86e97e

remote address: LID 0x1b QPN 0x0866 PSN 0x301c00

Failed to modify QP 100 to RTR

Unable to Connect the HCA's through the link

 

 

Can anyone help me with this error?
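
One detail that stands out in the failing direction is that the client reports Mtu: 128[B] instead of the 2048[B] seen in the working run. A hedged thing to try (assuming this perftest build supports the -m option) is to force the MTU explicitly on both sides:

# server side (XenServer dom0)
ib_send_bw -d mlx4_0 -m 2048

# client side (CentOS 6.4)
ib_send_bw 192.168.1.17 -d mlx4_0 -m 2048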


Re: OFED install firmware update fails on Ubuntu 14.04


Hi Cduced,

 

This is a lower-cost 10G card whose flash is half the size of most of our other cards' flash. Because of this, there is no room in the EEPROM to hold two images (which is required when doing a failsafe burn).

 

 

The MLNX_OFED installer uses the mlxfwmanager tool located in the firmware directory. This tool doesn't support burning non-failsafe images, which explains why it couldn't burn the new firmware onto the card.
You can work around this as follows:
1. Download the relevant .bin file (according to the NIC's PSID) from: Firmware for ConnectX®-3 EN - Mellanox Technologies
2. Burn the new firmware (using the --nofs switch):
   mst start
   flint -d mlx4_0 -i fw-ConnectX3-rel-2_31_5050-MCX311A-XCA_Ax-FlexBoot-3.4.225_ETH.bin --nofs burn
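
After the burn completes, the new image can be checked with a query (a hedged sketch, mirroring the device name used above):

flint -d mlx4_0 query
# check the "FW Version" line, then restart the driver or reboot so the new image is loaded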

ibping fails between Mellanox MT26428 and qlogic QDR on Centos6.5


Hi All,

I have a mix of Mellanox MT26428 and QLogic IBA7322 adapters, all connected to a QLogic 12300 switch.

Everything used to work fine with CentOS 6.4; after the upgrade to CentOS 6.5, ibping between the Mellanox and QLogic adapters stopped working, while it is still fine for the QLogic-QLogic and Mellanox-Mellanox paths.

Below the details for two hosts. Any hint would be much appreciated.

Kind Regards,

  Daniele.

 

HOST1:

[root@hpc-200-06-13-a ~]# lspci | grep Mellanox

02:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

[root@hpc-200-06-13-a ~]# ibnodes

Ca      : 0x003048fffff49b1c ports 1 "hpc-200-06-13-b HCA-1"

Ca      : 0x0011750000708278 ports 1 "hpc-200-06-09 HCA-1"

Ca      : 0x00117500007080ca ports 1 "hpc-200-06-11 HCA-1"

Ca      : 0x0011750000702988 ports 1 "hpc-200-06-08 HCA-1"

Ca      : 0x00117500007028c4 ports 1 "hpc-200-06-07 HCA-1"

Ca      : 0x0011750000705360 ports 1 "hpc-200-06-06 HCA-2"

Ca      : 0x0011750000702966 ports 1 "hpc-200-06-05 HCA-2"

Ca      : 0x0011750000706e52 ports 1 "hpc-200-06-04 HCA-1"

Ca      : 0x0011750000702dfc ports 1 "hpc-200-06-03 HCA-1"

Ca      : 0x00117500007041a6 ports 1 "hpc-200-06-02 HCA-1"

Ca      : 0x003048fffff499c4 ports 1 "hpc-200-06-13-a HCA-1"

Switch  : 0x00066a00e3005938 ports 36 "QLogic 12300 GUID=0x00066a00e3005938" enhanced port 0 lid 1 lmc 0

[root@hpc-200-06-13-a ~]# ibstat

CA 'mlx4_0'

        CA type: MT26428

        Number of ports: 1

        Firmware version: 2.7.200

        Hardware version: b0

        Node GUID: 0x003048fffff499c4

        System image GUID: 0x003048fffff499c7

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 40

                Base lid: 12

                LMC: 0

                SM lid: 1

                Capability mask: 0x02510868

                Port GUID: 0x003048fffff499c5

                Link layer: InfiniBand

 

HOST2

(only qib0 is connected to the switch)

[root@hpc-200-06-05 ~]# lspci | grep -i infi

01:00.0 InfiniBand: Mellanox Technologies MT27600 [Connect-IB]

03:00.0 InfiniBand: QLogic Corp. IBA7322 QDR InfiniBand HCA (rev 02)

 

[root@hpc-200-06-05 ~]# ibstat

CA 'mlx5_0'

        CA type: MT4113

        Number of ports: 2

        Firmware version: 10.10.1000

        Hardware version: 0

        Node GUID: 0xf4521403001b4b40

        System image GUID: 0xf4521403001b4b40

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 56

                Base lid: 1

                LMC: 0

                SM lid: 1

                Capability mask: 0x0651484a

                Port GUID: 0xf4521403001b4b40

                Link layer: InfiniBand

        Port 2:

                State: Initializing

                Physical state: LinkUp

                Rate: 56

                Base lid: 65535

                LMC: 0

                SM lid: 0

                Capability mask: 0x06514848

                Port GUID: 0xf4521403001b4b48

                Link layer: InfiniBand

CA 'qib0'

        CA type: InfiniPath_QLE7340

        Number of ports: 1

        Firmware version:

        Hardware version: 2

        Node GUID: 0x0011750000702966

        System image GUID: 0x0011750000702966

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 40

                Base lid: 4

                LMC: 0

                SM lid: 1

                Capability mask: 0x07610868

                Port GUID: 0x0011750000702966

                Link layer: InfiniBand

[root@hpc-200-06-05 ~]# ibnodes -C qib0

Ca      : 0x003048fffff49b1c ports 1 "hpc-200-06-13-b HCA-1"

Ca      : 0x003048fffff499c4 ports 1 "hpc-200-06-13-a HCA-1"

Ca      : 0x0011750000708278 ports 1 "hpc-200-06-09 HCA-1"

Ca      : 0x00117500007080ca ports 1 "hpc-200-06-11 HCA-1"

Ca      : 0x0011750000706e52 ports 1 "hpc-200-06-04 HCA-1"

Ca      : 0x0011750000702988 ports 1 "hpc-200-06-08 HCA-1"

Ca      : 0x00117500007028c4 ports 1 "hpc-200-06-07 HCA-1"

Ca      : 0x0011750000705360 ports 1 "hpc-200-06-06 HCA-2"

Ca      : 0x0011750000702dfc ports 1 "hpc-200-06-03 HCA-1"

Ca      : 0x00117500007041a6 ports 1 "hpc-200-06-02 HCA-1"

Ca      : 0x0011750000702966 ports 1 "hpc-200-06-05 HCA-2"

Switch  : 0x00066a00e3005938 ports 36 "QLogic 12300 GUID=0x00066a00e3005938" enhanced port 0 lid 1 lmc 0

 

Starting the server:

[root@hpc-200-06-05 ~]# ibping -v -d  -S -G 0x0011750000702966

ibdebug: [31145] ibping_serv: starting to serve...

 

 

Launching the ping:

[root@hpc-200-06-13-a ~]# ibping -v -d -G 0x0011750000702966

ibwarn: [12545] sa_rpc_call: attr 0x35 mod 0x0 route Lid 1

ibwarn: [12545] mad_rpc_rmpp: rmpp (nil) data 0x7ffff73b0aa0

ibwarn: [12545] mad_rpc_rmpp: data offs 56 sz 200

rmpp mad data

0000 0000 0000 0000 fe80 0000 0000 0000

0011 7500 0070 2966 fe80 0000 0000 0000

0030 48ff fff4 99c5 0004 000c 0000 0000

0080 ffff 0000 8487 8e00 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000 0000 0000 0000 0000

0000 0000 0000 0000

ibdebug: [12545] ibping: Ping..

ibwarn: [12545] ib_vendor_call_via: route Lid 4 data 0x7ffff73b0e30

ibwarn: [12545] ib_vendor_call_via: class 0x132 method 0x1 attr 0x0 mod 0x0 datasz 216 off 40 res_ex 1

ibwarn: [12545] mad_rpc_rmpp: rmpp (nil) data 0x7ffff73b0e30

ibwarn: [12545] mad_rpc_rmpp: MAD completed with error status 0xc; dport (Lid 4)

ibdebug: [12545] main: ibping to Lid 4 failed
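
For what it's worth, one hedged diagnostic that can be run from the Mellanox host is to walk the path between the two ports and check the subnet manager's view of the route (LIDs 12 and 4 are taken from the ibstat/ibping output above):

ibtracert 12 4   # trace the LID route from hpc-200-06-13-a (LID 12) to the qib0 port (LID 4)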

SL4540 GEN8 + MT27500 Family [ConnectX-3] ---> random packets "disappearing"


Hello,

 

We've just built 9 RHEL 6.4 servers (3x SL4540 blades) on which we are running some network-intensive software.

 

The thing is that, occasionally, connectivity between these servers fails (for example, when making an HTTP request or an SSH attempt).

 

More specific details:

If we SSH from server A to server B, the connection is usually established OK.

But in about 1 out of every 10 SSH attempts, the connection request never reaches its destination.

 

At first we thought we were having network issues, but the switch has a plain configuration with no firewalls or special commands (apart from STP, which shouldn't interfere here).

 

To try to recreate the scenario, we ran a simple test with a small SSH script:

 

for  (( ; ; )); do ssh  hostname.xxx "ls -l" ; done

 

This executes an ls in a loop.

Randomly, every 20-30 attempts, one of them fails (hanging the session).

 

We ran the same kind of test over HTTP/SCP/SMTP, and we get the same error (random timeouts caused by missing packets).

We tried a tcpdump, and the packet that fails to reach server B is actually disappearing (it is being sent but is never received by server B).

Kernel stacks and sockstat look fine (no orphaned sockets, no tables filling up).

 

We have 9 servers.

3 of the servers are showing this problem; the rest are working fine.

 

All of them have exactly the same configuration, the same kernel parameters, and the same OS + software.

We are using bonding in mode 1 (active-backup).

We are using these drivers:

mlx4_core: Mellanox ConnectX core driver v1.1 (Dec, 2011)

mlx4_en: Mellanox ConnectX HCA Ethernet driver v2.0 (Dec 2011)

 

We did not want to upgrade the driver/firmware, since the problem only shows up randomly on 3 servers out of 9.

 

Has this issue been reported before?

Can you recommend any extra/special parameters or configuration to test and rule out hardware/link problems?
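
For context, these are the kinds of per-interface checks that can be compared between a good and a bad server (a hedged sketch; interface and bond names are placeholders):

# check which bonding slave is active and whether failovers are being logged
cat /proc/net/bonding/bond0

# look for drops/discards/errors on the Mellanox slaves
ethtool -S eth2 | grep -iE 'drop|discard|err'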

 

Thanks,

Nicolas.-

Re: Virtual functions did not come up at boot time of virtual machines


Tested the latest driver, MLNX_OFED_LINUX-2.2-1.0.1-rhel6.4-x86_64, and this problem does not happen. It seems it has been fixed.

Re: OFED install firmware update fails on Ubuntu 14.04


Thanks. It would be nice if mlxfwmanager handled the firmware upgrade, but at least there is a straightforward workaround. -- Bud
