Mixing OFED 1.5.3 and 2.2 in the same network?


Hi all,

 

We have two clusters using InfiniBand, as follows:

Computing cluster 1:

  • 27 nodes
  • IBM bladecenter
  • CentOS 6.2
  • Each node has 1x MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
  • Firmware version: 2.9.1100
  • OFED 1.5.3-3.0.0 (from the installer MLNX_OFED_LINUX-1.5.3-3.0.0-rhel6.2-x86_64, rebuilt with mlnx_add_kernel_support and so on)
  • ib_ipoib

 

GPFS server cluster:

  • 4 nodes
  • Dell PowerEdge R720
  • RHEL 6.3
  • Each node has 1x MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE]
  • Firmware version: 2.9.1000
  • OFED 1.5.3-3.1.0 (from the installer MLNX_OFED_LINUX-1.5.3-3.1.0-rhel6.3-x86_64)
  • ib_ipoib

 

Our network topology consists of:

  • 1x 36-port Mellanox FabricIT IS5030/U1, connected to:
    • 4x GPFS servers - 1x port each
    • 2x Voltaire 40Gb InfiniBand Switch Modules (the BladeCenter's switches) - 2x ports each
    • 4x other GPFS clients
  • 2x Voltaire 40Gb InfiniBand Switch Modules:
    • 13x nodes each, connected internally to the HCAs through the chassis backplane
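
In case it helps to cross-check that layout, the standard infiniband-diags tools (run from any node) should show the same picture; these are just the generic invocations, and the output obviously depends on the fabric:

ibswitches     # should list the IS5030 and the two Voltaire blade switches
ibhosts        # should list every HCA: GPFS servers, blades, other clients
iblinkinfo     # per-port link state, width and speed for every switch port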

 

Both clusters have been running for more than a year. The original GPFS + InfiniBand installation was done by IBM techs; we merely copied/adapted it when we moved the servers to the Dell machines. We never did much research on InfiniBand and mostly went with what we found already installed/configured.

We run GPFS over InfiniBand (that is why I think we have ib_ipoib there: to use IP addresses to name the nodes). The only InfiniBand parameter we ever modified was adding this line to /etc/modprobe.d/mlx4_en.conf on all nodes of both clusters:

 

options mlx4_core pfctx=0 pfcrx=0 log_num_mtt=20 log_mtts_per_seg=4
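
For reference, whether these options actually took effect after a driver reload can be checked through sysfs (assuming the usual module-parameter layout; newer OFED releases may drop some of these parameters, so not all of the files may exist):

cat /sys/module/mlx4_core/parameters/log_num_mtt       # should read 20
cat /sys/module/mlx4_core/parameters/log_mtts_per_seg  # should read 4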

 

Performance tests (ib_read/write_bw and ib_read/write_lat) report ~3250 MB/s and ~2.5 usec (I cannot show the exact numbers right now because heavy traffic is skewing them).
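
(These come from the usual perftest pairs, run roughly as below; the device name and the hostname are placeholders, not our real ones:

ib_read_bw -d mlx4_0 -a               # started on one node as the "server" side
ib_read_bw -d mlx4_0 -a gpfs-server1  # run on the other node against it
ib_read_lat -d mlx4_0 gpfs-server1    # same pattern for the latency test

and similarly for ib_write_bw / ib_write_lat.)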

 

GPFS performance showed single-stream read/write performance (dd) of 2.0-2.5 GB/s, and a global multi-node bandwidth of 6-10 GB/s.
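
(The single-stream numbers are from plain dd runs along these lines; the mount point and sizes are just an example:

dd if=/dev/zero of=/gpfs/ddtest.$(hostname) bs=1M count=16384 oflag=direct   # write ~16 GB
dd if=/gpfs/ddtest.$(hostname) of=/dev/null bs=1M iflag=direct               # read it back

The multi-node figure is the aggregate over several nodes running this at the same time.)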

 

With these numbers in mind, we consider that the system and the InfiniBand network are working fine. (Aren't they?)


Now we are planning to build a new computing cluster (and/or rebuild the current one), and we have started doing some tests with a couple of computing nodes.

We are moving to CentOS 6.5, and we are forced to move to OFED 2.2 (MLNX_OFED_LINUX-2.2-1.0.1-rhel6.5): we tried to install 1.5.3 (MLNX_OFED_LINUX-1.5.3-4.0.42-rhel6.3), but the mlnx_add_kernel_support script only supports up to rhel6.4, and even when we cheat past that check the compilation fails due to some missing includes. So we moved on and installed the test cluster with CentOS 6.5 and MLNX_OFED_LINUX-2.2-1.0.1.

 

We have had a couple of problems, which we more or less pinned down after asking on the OpenFabrics discussion list:

 

  • The perftest handshake mechanism changed between 1.5 and 2.2, so we cannot run tests between the new and the old cluster.

We can deal with this. The performance tests between the two OFED 2.2 nodes seemed OK.

 

  • Loading the ib_ipoib module under OFED 2.2 changes the MAC address of ib0.

This would not be a problem if the OFED installer did not delete CentOS's rdma-3.10-3.el6.noarch rpm. That rpm contains the ifup-ib and ifdown-ib scripts, which can initialize the InfiniBand interfaces while ignoring MAC address changes. We can work around this by copying the two "old" scripts back after OFED's installation in the kickstart, as sketched below.
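
Roughly, that part of the kickstart looks like this (where the saved copies of the scripts live is site-specific; /root/rdma-scripts is just a placeholder):

%post
# ... MLNX_OFED installation happens earlier in this %post ...
# The installer removes rdma-3.10-3.el6.noarch, so restore the two
# scripts that tolerate the IPoIB hardware-address change.
cp /root/rdma-scripts/ifup-ib   /etc/sysconfig/network-scripts/ifup-ib
cp /root/rdma-scripts/ifdown-ib /etc/sysconfig/network-scripts/ifdown-ib
chmod 755 /etc/sysconfig/network-scripts/ifup-ib /etc/sysconfig/network-scripts/ifdown-ib
%end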

 

Even after all these problems, we have been able to join these nodes as GPFS clients with "normal" performance. But having dealt with all this, we are wondering what to do with our InfiniBand/GPFS network.

 

Can we keep the GPFS servers on OFED 1.5.3 and move the (new) clients to 2.2? Should we try to update everything to 2.2? Will new problems appear when we upgrade the servers in production? Should we keep everything on 1.5.3? Should we use the community OFED? (The CentOS 6.5 default InfiniBand installation "works".)

 

Any other criticism of our setup is welcome.

 

Thanks in advance,

 

Txema

 

 

PS: All these doubts stem from one of our techs adding a client to GPFS with a "poorly installed" InfiniBand stack, which stalled all InfiniBand and GPFS traffic until we removed the node. So we are afraid of touching anything on that network.

