Mellanox Interconnect Community: Message List

infiniband: ofed installation in diskless environment


Good afternoon to everybody!

Can someone help me? There is no problem installing drivers in a diskful environment (especially with only a couple of nodes), but in my case xCAT is configured to provision the cluster in diskless mode.

And the first question is: how can I properly install the mlx ConnectX driver (as far as I understand, I have to install OFED) into my netboot image?

The second one: can I mass-install (or mass-update) MLNX OFED across a group of nodes, or is the only way one by one?

 

P.S. OFED updates the ConnectX firmware, so can that be done automatically when we are talking about mass diskless and mass diskful installs?
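For illustration, a hedged sketch of the flow in question with xCAT (the chroot path and osimage/node names below are hypothetical and depend on the local setup):

# Install MLNX OFED into the diskless image chroot; skip the firmware
# burn, since no HCA is reachable from inside a chroot anyway.
CHROOT=/install/netboot/rhels7.2/x86_64/compute/rootimg   # hypothetical path
mount -o loop MLNX_OFED_LINUX-3.2-2.0.0.0-rhel7.2-x86_64.iso /mnt
cp -a /mnt $CHROOT/tmp/ofed
chroot $CHROOT /tmp/ofed/mlnxofedinstall --without-fw-update
rm -rf $CHROOT/tmp/ofed
packimage rhels7.2-x86_64-netboot-compute   # repack the netboot image (hypothetical osimage name)
# Mass install/update on diskful nodes: run the installer in parallel.
psh compute /shared/ofed/mlnxofedinstall --without-fw-update

Firmware, by contrast, lives on the HCA itself, so it would typically be burned from running nodes (e.g. in parallel with psh), not inside the image.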

 

Sincerely yours,

Nikolay!


Re: Trunks, pvlans, infiniband world


Hi, sorry for the delay. Please see the physical layout below:

 

[Attachment: Infiniband_Layout.jpg]

I"m wondering if anything special needs to be setup like in the Ethernet world to prevent loops etc..?

 

Regards, Daniel

Re: MLX4 custom packet steering


Hello Gilad,

Thank you very much for your reply. As a matter of fact, for the moment I am resorting to a pure software solution to fulfill my needs. I will contact you once I have rigorously identified my use case (for now, this is a fuzzy research project) and can possibly outline a solution with a rough prototype. I am pretty sure this will take some time, but I will certainly take your invitation into consideration.

Thank you very much again for your support.

Regards,

 

  Harold

MLNX OFED 3.2 centos 7.2 with RT kernel error


Hello.

I've updated my CentOS 7.1 to 7.2 and got a new kernel.

Next I compiled and installed the RT kernel as described in this article: How to build the CentOS 7 RT kernel - Hardware - Wiki

I should say that I have always used the RT kernel with Mellanox, and this is the first time I've got an error.

Next I downloaded the latest MLNX OFED package, MLNX_OFED_LINUX-3.2-2.0.0.0-rhel7.2-x86_64.

Next I generated a new tgz with ./mlnx_add_kernel_support.sh -m PATHTOMLNX --make-tgz --skip-repo and unpacked it.

When I try to install it, the installer says:

Logs dir: /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0.7321.logs

Verifying KMP rpms compatibility with target kernel...

The kernel KMP rpms coming with MLNX_OFED are not compatible with kernel: 3.10.0-327.10.1.rt56.211.el7.centos.x86_64

See log at /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0.7321.logs/is_kmp_compat_check.log

If you believe that this is a false alarm you can force the installation using the '--skip-kmp-verify' flag.

The 3.10.0-327.10.1.rt56.211.el7.centos.x86_64 kernel is installed, MLNX_OFED does not have drivers available for this kernel.

You can run mlnx_add_kernel_support.sh in order to to generate an MLNX_OFED package with drivers for this kernel.

 

If I use the --skip-kmp-verify flag, it installs, but I cannot restart openibd because it prints the following:

 

Module mlx4_core belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module mlx4_ib belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module mlx4_core belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module mlx4_en belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module mlx5_core belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module mlx5_ib belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module ib_umad belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module ib_uverbs belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module ib_ipoib belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module rdma_cm belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module ib_ucm belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]
Module rdma_ucm belong to kernel-rt which is not a part of MLNX_OFED, skipping... [FAILED]

 

Earlier, when I got this kind of error, mlnx_add_kernel_support.sh helped me. Now it doesn't.

What am I doing wrong, and what can I do to fix it?

 

UPD: I tried to add kernel support to the tgz I had already built, but it says everything is fine:

./mlnx_add_kernel_support.sh -m /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-rhel7.2-x86_64-ext/ --make-tgz

Note: This program will create MLNX_OFED_LINUX TGZ for rhel7.2 under /tmp directory.

Do you want to continue?[y/N]:y

See log file /tmp/mlnx_ofed_iso.10452.log

Required kernel (3.10.0-327.10.1.rt56.211.el7.centos.x86_64) is already supported by MLNX_OFED_LINUX
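A hedged thing to try: point the script explicitly at the RT kernel and its source tree. The -k/--kernel and -s/--kernel-sources options are assumptions here, so check ./mlnx_add_kernel_support.sh --help first:

./mlnx_add_kernel_support.sh \
    -m /tmp/MLNX_OFED_LINUX-3.2-2.0.0.0-rhel7.2-x86_64/ \
    -k 3.10.0-327.10.1.rt56.211.el7.centos.x86_64 \
    -s /usr/src/kernels/3.10.0-327.10.1.rt56.211.el7.centos.x86_64 \
    --make-tgz --skip-repo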

Re: MLNX OFED 3.2 centos 7.2 with RT kernel error


UPD2: I installed MLNX_OFED_LINUX-3.1-1.1.2.0-rhel7.2-x86_64-ext (an old version, originally only for 7.0) and it works.

Re: Connectx-4 VPI driver for VMware


I have to agree with Jae-Hoon above: these cards are Ethernet-only on ESXi, even though the marketing material states otherwise. I also show them with a release date in 2014, so it's really a shame we still don't have that capability. I'd rather see you release SRP support than take it out of your marketing material, though =). Is it that difficult to port your existing code from Linux to ESXi? That is the worst part of this, in my opinion: you have already done it with ConnectX-4 cards under Linux, and you have already done it with ConnectX-3 cards under Linux/VMware. Mellanox seems OK with letting the supported feature set shrink with each product generation, and that is alarming.

Re: Connectx-4 VPI driver for VMware


I agree with you... :)

I think most of the problems are caused by the changed environment.

First, VMware moved to ESXi, which is a compact hypervisor kernel, not a general-purpose Linux kernel. Back in the vSphere ESX 4.x days, I saw some differences between ESX and ESXi: some IB commands weren't supported on ESXi hosts.

The vSphere VMCI feature was deprecated, and I expect that is a key point; I believe RDMA support in the VM environment went via VMCI on the ESXi hypervisor.

You can find more information on Google via the keyword vRDMA.

Sure, there is overhead, but this is not an HPC environment, and it is still much faster than a general-purpose Ethernet protocol.

The same goes for VPI+EN multi-function support.

SR-IOV will be the major interface for high-performance VM networking in virtualized environments.




Can VMA and DPDK be used together?


I want to know because I plan to use a Mellanox ConnectX-3 as the NIC in Intel ONP-like servers, so I can virtualize as many appliances as possible while keeping latency as low as possible.

I also want to use a Redis cluster, with RAM as L1 storage, NVMe as L2, and SSDs as L3. My question is mainly about the RAM, though.
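For reference, VMA attaches to an unmodified application such as redis through LD_PRELOAD; a minimal sketch (the config path is illustrative):

# VMA offloads the application's TCP/UDP sockets at runtime;
# no application code changes are needed.
LD_PRELOAD=libvma.so redis-server /etc/redis/redis.conf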

 

Hope someone can help.

 

 

Builder


MHQH19B-XTR and MSX6036F-1SFS


Hi guys, I have a very simple question...

Is the MHQH19B-XTR (40Gb/s) fully compatible with the MSX6036F-1SFS (56Gb/s) switch?

I understand that the maximum speed can only be 40Gb/s.
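For what it's worth, once cabled you can confirm the negotiated rate from the host, e.g.:

# Port 1 of the first ConnectX-2 HCA; a QDR HCA on an FDR switch
# should report "Rate: 40".
ibstat mlx4_0 1 | grep -i rate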

 

Thanks!

Re: MLNX OFED 3.2 centos 7.2 with RT kernel error


UPD3: I also installed MLNX_OFED_LINUX-3.2-1.0.1.1-rhel7.2-x86_64-ext and it works fine, so the problem is only in 3.2-2.0.0.0. Mellanox, please fix it.

 

Switch sx6015


Hello, I'm going to buy the SX6015 switch, but I couldn't understand what UNMANAGED means.
My doubt is: does the switch lack only the subnet manager, or the management web interface too?

Thanks

Re: MLNX OFED 3.2 centos 7.2 with RT kernel error


Hello,

As far as I am aware, Mellanox OFED is not, and previously was not, tested against or officially supported on RT kernels.

Re: HELP: DPDK with ConnectX-3 problem


DPDK 2.2.0, SLES12 SP1 (3.12.49-11-default), and OFED v3.2 work on my setup with a Mellanox ConnectX-3 Pro.
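A hedged sketch of how such a setup is typically verified (the mlx4 PMD must be enabled when building DPDK, and the testpmd arguments are only an example):

# In config/common_linuxapp before building DPDK 2.2:
#   CONFIG_RTE_LIBRTE_MLX4_PMD=y
# Then exercise the ports with testpmd in interactive mode:
./testpmd -c 0x3 -n 4 -- -i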

MHQH19B-XTR - MFE_NO_FLASH_DETECTED


Hello, I have a problem with my HCA MHQH19B-XTR: I'm not able to flash it because of the error MFE_NO_FLASH_DETECTED.

Here's the mst and flint output:

 

C:\Program Files\Mellanox\WinMFT>mst status

MST devices:
------------

mt401_pciconf0

 

 

C:\Program Files\Mellanox\WinMFT>flint -d mt401_pciconf0 -i MHQH19B-2.10.720.bin burn

-E- Cannot open Device: mt401_pciconf0. MFE_NO_FLASH_DETECTED
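A hedged first diagnostic step before burning is to enumerate the access paths MFT sees and query the device:

mst status -v                  # list all device access paths
flint -d mt401_pciconf0 query  # check whether the device answers at all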

 

 

Does somebody know how to solve it?
Thanks

not able to compile the kernel client module

Hi,

We have a Mellanox HCA connected to a server, with the OFED distribution installed on it. I am trying to write a kernel client module that accesses the InfiniBand device through the RDMA stack, but I get an error on insmod. Please help, anyone who has already run into this error.

 

Error:

=============

insmod: ERROR: could not insert module Client.ko: Invalid parameters


 

Dmesg :

===========

 

 

[16266.034500] Client: disagrees about version of symbol ib_unregister_client

 

[16266.034508] Client: Unknown symbol ib_unregister_client (err -22)

[16266.034520] Client: disagrees about version of symbol ib_register_client

[16266.034523] Client: Unknown symbol ib_register_client (err -22)

 

I have searched for this error on the internet; people mention it comes from compiling against improper kernel headers, but I don't fully understand what that means.

Please clarify this for me and provide a sample Makefile to compile the module.
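To illustrate, here is a minimal kbuild Makefile sketch. The "disagrees about version of symbol" messages typically mean the module was built against the stock kernel's symbol versions while the running ib_core comes from MLNX OFED, whose CRCs differ; pointing modpost at OFED's Module.symvers is one known fix. The /usr/src/ofa_kernel/default path is the usual MLNX OFED location, so verify it exists on your system, and note that recipe lines must start with a tab:

obj-m := Client.o

KDIR ?= /lib/modules/$(shell uname -r)/build
# Use MLNX OFED's symbol versions so the CRCs recorded for
# ib_register_client/ib_unregister_client match the loaded ib_core.
OFA_SYMVERS ?= /usr/src/ofa_kernel/default/Module.symvers

default:
	$(MAKE) -C $(KDIR) M=$(PWD) KBUILD_EXTRA_SYMBOLS=$(OFA_SYMVERS) modules

clean:
	$(MAKE) -C $(KDIR) M=$(PWD) clean

Then build with make and try insmod Client.ko again.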

 

Please see the attached file for program reference.

Thanks,

Shanmugam.S

 



Re: MLNX OFED 3.2 centos 7.2 with RT kernel error


The mistake is in the scripts. As I said, 3.2-1 works, but 3.2-2 fails.


Network topo for multi MPI clusters and one storage cluster?


We're in the process of integrating a parallel storage cluster (3 storage servers) with 3 independent HPC MPI clusters and I have a question about the best network topology for our situation.

 

Background:

Each MPI cluster uses a single 36-port IB switch and has 30 nodes. The clusters are not currently joined over IB; they work independently. MPI communication will not traverse from one MPI cluster to the next; this is controlled in user space. The storage cluster consists of 3 file servers and operates as a parallel file system using RDMA over IB (each compute node must "see" all 3 file servers, and each file server must see the other two). We already configured the file system and ran successful tests while connected to one MPI cluster through a single switch (we just connected the 3 file servers to the same switch as the nodes).

 

Requirements:

Need to join the 3 storage servers (i.e. storage cluster) to each of the 3 MPI clusters so that every node can "see" each file server over RDMA.

Each MPI cluster must continue to operate independently (we think we can control this in the user space).

We're not concerned about oversubscribing the links to the file servers; data-transfer throughput is not a huge concern.

Cost is a concern.


I attached an illustration with 2 design ideas. I think 3 ports at each MPI switch for storage traffic will be more than sufficient to handle the throughput. At a minimum, I think all we have to do is use a 4th switch to join the storage cluster (Design #1), but I'm not 100% certain this is the best approach. In Design #2, I assume RDMA traffic will traverse up/down through the L2 switch, and MPI traffic will be limited to the nodes within each MPI switch (again, controlled at the user level). Which is the ideal design, if any? Which routing algorithm(s) are best for the SM? What are the potential problems?
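For reference, if OpenSM is the SM, the routing engine is selected like this (up/down shown only as an example, since it is the engine intended to avoid credit loops in tree topologies like Design #2):

# One-off, on the node running the SM:
opensm -R updn
# Or persistently, in /etc/opensm/opensm.conf:
#   routing_engine updn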


Many many thanks in advance.


 

Re: Switch sx6015


Hi,

Unmanaged means that the switch doesn't have a management CPU; the only way to get status from the switch is via the SM, which must run on another server or on a managed switch in the network.

Other than that, it is equipped only with the switch silicon.

It doesn't have a management port, so you cannot connect it to your IT network.
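For example, on one of the attached hosts with OFED installed (a minimal sketch):

# Start OpenSM so the fabric behind the unmanaged switch has a subnet manager.
opensm -B    # -B runs opensm as a daemon
sminfo       # verify that an SM is now active on the subnet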

 

Ophir.
