Channel: Mellanox Interconnect Community: Message List

Re: infiniband SR-IOV with neutron error


Hi Bernie,

 

Can you update the /etc/modprobe.d/mlx4_core.conf file with the following options:

 

options mlx4_core num_vfs=8 port_type_array=1,1 probe_vf=8 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1

 

Please restart the openibd service after updating the file.
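
For reference, a minimal sketch of the whole sequence on the hypervisor (assuming a root shell, the MLNX_OFED openibd init script, and that the VF counts above fit your setup):

# cat /etc/modprobe.d/mlx4_core.conf
options mlx4_core num_vfs=8 port_type_array=1,1 probe_vf=8 enable_64b_cqe_eqe=0 log_num_mgm_entry_size=-1
# /etc/init.d/openibd restart
# lspci -d 15b3:

The last command lists all Mellanox PCI functions (vendor ID 15b3); after the restart you should see the physical function plus the eight virtual functions, whose device names normally contain "Virtual Function".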

 

Sophie.


Re: infiniband SR-IOV with neutron error


This option, "probe_vf=8", means the VFs can be used by this host, not by the guests.
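
A quick way to see the effect on the hypervisor (just a sketch; device names depend on your adapter):

# lspci -d 15b3:
# ls /sys/bus/pci/drivers/mlx4_core/

With probe_vf=8, all eight VFs are bound by mlx4_core on the host and show up in that driver directory; with probe_vf=0 only the physical function is bound there, leaving the VFs free to be passed through to guests.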

Re: Setting SR-IOV num_vfs for ConnectX-2 card


Hi Talat, thanks for your reply and the steps! I'll try them out.

Minimum OpenSM level to support SRIOV on IS5030 switch


We are trying to attach a DDN GS7k to an existing IS5030 switch and we cannot get an IB link up.  We suspect that either the switch does not support SR-IOV or the OpenSM running on the switch does not support it.

 

Can you tell me the following:

 

The minimum OpenSM version required on an IS5030 switch to support SR-IOV

 

How to tell which OpenSM version is currently running, via the CLI or GUI.  We cannot find it in the GUI currently and cannot find the CLI command.
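
For reference, from a Linux host on the fabric with infiniband-diags installed we can at least identify the master subnet manager, although this does not report its version:

# sminfo
# opensm --version

sminfo prints the LID/GUID, priority and state of the active SM, which tells you which node it is running on; opensm --version only applies where the opensm package itself is installed, not to the switch-embedded SM.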

 

 

Thanks

 

Gary Rees

hca_self_test.ofed found errors in the port_rcv_errors counters


I am trying to configure NFS for our InfiniBand network, following the instructions at HowTo Configure NFS over RDMA (RoCE)

I installed the MLNX_OFED drivers on CentOS 6.8.  (I had originally configured the network and IPoIB interface using the RHEL manual (Part II. InfiniBand and RDMA Networking) and was using NFS over the IPoIB but was receiving a bunch of page allocation failures)

I used the mlnxofedinstall script which completed successfully and updated the firmware, e.g.:

 

...

Device (84:00.0):

    84:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]

    Link Width: x8

    PCI Link Speed: 8GT/s

 

Installation finished successfully.

 

Preparing...                ########################################### [100%]

   1:mlnx-fw-updater        ########################################### [100%]

 

Added 'RUN_FW_UPDATER_ONBOOT=no to /etc/infiniband/openib.conf

 

Attempting to perform Firmware update...

Querying Mellanox devices firmware ...

 

Device #1:

----------

  Device Type:      ConnectX3

  Part Number:      MCX354A-FCB_A2-A5

  Description:      ConnectX-3 VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE; PCIe3.0 x8 8GT/s; RoHS R6

  PSID:             MT_1090120019

  PCI Device Name:  84:00.0

  Port1 GUID:       e41d2d03006f89f1

  Port2 GUID:       e41d2d03006f89f2

  Versions:         Current        Available    

     FW             2.32.5100      2.36.5150    

     PXE            3.4.0306       3.4.0740     

 

  Status:           Update required

---------

Found 1 device(s) requiring firmware update...

Device #1: Updating FW ... Done

Restart needed for updates to take effect.

Log File: /tmp/MLNX_OFED_LINUX-3.4-1.0.0.0.17971.logs/fw_update.log

Please reboot your system for the changes to take effect.

To load the new driver, run:

/etc/init.d/openibd restart

#

 

I rebooted the system and then ran the self test:

# hca_self_test.ofed

 

---- Performing Adapter Device Self Test ----

Number of CAs Detected ................. 1

PCI Device Check ....................... PASS

Kernel Arch ............................ x86_64

Host Driver Version .................... MLNX_OFED_LINUX-3.4-1.0.0.0 (OFED-3.4-1.0.0): 2.6.32-642.el6.x86_64

Host Driver RPM Check .................. PASS

Firmware on CA #0 VPI .................. v2.36.5150

Host Driver Initialization ............. PASS

Number of CA Ports Active .............. 0

Port State of Port #1 on CA #0 (VPI)..... INIT (InfiniBand)

Port State of Port #2 on CA #0 (VPI)..... DOWN (InfiniBand)

Error Counter Check on CA #0 (VPI)...... FAIL

    REASON: found errors in the following counters

      Errors in /sys/class/infiniband/mlx4_0/ports/1/counters

         port_rcv_errors: 93

Kernel Syslog Check .................... PASS

Node GUID on CA #0 (VPI) ............... e4:1d:2d:03:00:6f:89:f0

------------------ DONE ---------------------

#

 

As you can see, there is an error with the port_rcv_errors counter.  Also, the port state for Port #1 will remain at INIT until I start the subnet manager (/etc/init.d/opensmd start), since we have an unmanaged switch.  That used to start automatically, so maybe the OFED installation wasn't completely successful?
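
In case it helps, here is what I plan to check next (a sketch, assuming the stock CentOS 6 init tools and the opensmd script installed by MLNX_OFED): re-enable opensmd at boot and watch whether the counter keeps climbing once the SM is up.

# chkconfig opensmd on
# /etc/init.d/opensmd start
# cat /sys/class/infiniband/mlx4_0/ports/1/counters/port_rcv_errors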

 

Additionally, I am unable to configure NFS for RDMA, e.g.:

# echo rdma 20049 > /proc/fs/nfsd/portlist

-bash: echo: write error: Protocol not supported

#

Re: Patch needed to activate ROCEV2 for Connect 3X 10G card


Thanks Talat,

 

I am able to use the ConnectX-3 Pro in RoCE v2 mode with the inbox OFED.

One correction to your document:

echo RoCE V2 > default_roce_mode

 

We need to use v2 (a lowercase v).
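
For anyone else hitting this, the full sequence is roughly as follows (a sketch; the configfs path, the device name mlx4_0 and the port number are assumptions for a typical ConnectX-3 Pro setup, and configfs has to be mounted first if it is not already):

# mount -t configfs none /sys/kernel/config
# mkdir -p /sys/kernel/config/rdma_cm/mlx4_0
# echo "RoCE v2" > /sys/kernel/config/rdma_cm/mlx4_0/ports/1/default_roce_mode
# cat /sys/kernel/config/rdma_cm/mlx4_0/ports/1/default_roce_mode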

 

Thanks

Rama

ibdump not working for 100G Mellanox card, MLNX_OFED 3.4.1


Hi,

ibdump is not working with the 100G Mellanox card.  I am using RHEL 7.0 with kernel 4.7.

[root@localhost ~]# ibdump

Initiating resources ...

searching for IB devices in host

Port active_mtu=1024

MR was registered with addr=0x1219010, lkey=0x4ce0, rkey=0x4ce0, flags=0x1

------------------------------------------------

Device                         : "mlx5_0"

Physical port                  : 1

Link layer                     : Ethernet

Dump file                      : sniffer.pcap

Sniffer WQEs (max burst size)  : 4096

------------------------------------------------

Failed to set port sniffer1: command interface bad param

Re: hca_self_test.ofed found errors in the port_rcv_errors counters


It seems the port_rcv_errors error is due to the subnet manager not running, as the counter has not increased any more since OpenSM was started.  I ran several RDMA verification tests, which were all successful, so I think that just leaves the RDMA support in NFS.

The kernel is 2.6.32-642.11.1.el6.x86_64, and in the /boot/config-2.6.32-642.11.1.el6.x86_64 file it seems RDMA is enabled:

CONFIG_RDS_RDMA=m

CONFIG_NET_9P_RDMA=m

CONFIG_CARDMAN_4000=m

CONFIG_CARDMAN_4040=m

CONFIG_INFINIBAND_OCRDMA=m

CONFIG_SUNRPC_XPRT_RDMA_CLIENT=m

CONFIG_SUNRPC_XPRT_RDMA_SERVER=m

 

# modprobe svcrdma

# /etc/init.d/nfs restart

...      [  OK  ]

# echo rdma 20049 > /proc/fs/nfsd/portlist

-bash: echo: write error: Protocol not supported

#
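
If anyone is diagnosing the same failure, these are the additional checks I intend to run (a sketch; the module names assume the stock EL6 NFS/RDMA stack rather than anything MLNX_OFED may have replaced):

# lsmod | grep -E 'svcrdma|xprtrdma|rpcrdma'
# dmesg | grep -i rdma | tail
# cat /proc/fs/nfsd/portlist

Once the server accepts the rdma transport, the client side would be mounted with something like the following (server:/export being a placeholder for the actual export):

# modprobe xprtrdma
# mount -o rdma,port=20049 server:/export /mnt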


Re: 40Gb/s IPoIB only gives 5Gb/s real throughput?!


Your memory might actually be limiting your throughput. It appears you are transferring from RAMdisk to RAMdisk? What speed RAM are you using? Also, you would be better off using diskspd instead of an Explorer file transfer to measure performance.

 

Have you tried the nd_send_bw command line tool? I used that to verify connectivity and maximum throughput. I verified the numbers it was reporting by checking performance counters on my switch, which matched the reported throughput. I had a similar PCIe bandwidth limitation that I discovered by using nd_send_bw.

Re: Minimum OpenSM level to support SRIOV on IS5030 switch


Hi Gary,

 

I assume you also got an answer yesterday via a ticket opened.

 

The answer is that the IS5030 OpenSM doesn't support SR-IOV for ConnectX-4 and Connect-IB.

Re: How to configure cards to run at 25Gbs


Okay - I have to shamefully/humbly admit that the cables that I was provided were not 25GbE capable.  I've updated to Mellanox MCP2M00A001 cables and I now see 25GbE on each side.

 

Thanks for all the help/suggestions!

 

- Curt

Re: ibdump not working for 100G Mellanox card, MLNX_OFED 3.4.1


Hi Rama,

 

If the port protocol is configured as Ethernet, please follow instructions from our Mellanox OFED Driver UM:

Supported in ConnectX®-4 and ConnectX®-4 Lx adapter cards only.

 

3.1.17 Offloaded Traffic Sniffer

Offloaded Traffic Sniffer allows kernel-bypass traffic (such as RoCE, VMA, and DPDK) to be captured by an existing packet analyzer such as tcpdump.

To enable Offloaded Traffic Sniffer:

1. Turn on the new ethtool private flag "sniffer" (off by default).

$ ethtool --set-priv-flags enp130s0f0 sniffer on

 

2. Once the sniffer flag is set on the Ethernet interface of interest, run tcpdump to capture that interface's kernel-bypass traffic.

 

Note that enabling Offloaded Traffic Sniffer can degrade the speed of kernel-bypass traffic.
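
Putting the two steps together, a minimal sketch (reusing the interface name enp130s0f0 from the example above; substitute your own, and the capture filename is arbitrary):

# ethtool --set-priv-flags enp130s0f0 sniffer on
# tcpdump -i enp130s0f0 -w roce_capture.pcap
# ethtool --set-priv-flags enp130s0f0 sniffer off

The resulting pcap file can then be opened in Wireshark, and the sniffer flag should be turned off again afterwards to avoid the speed degradation mentioned above.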

 

ibdump - Dump "Infiniband" traffic from Mellanox Technologies ConnectX HCA

            The dump file can be loaded by Wireshark for graphical traffic analysis

 

Sophie.

How to couple Connectx with FPGA?


Hi all,

 

My apologies in case this discussion is misplaced. I'm a new member of this community, but I expect to be here more often in the future.

 

I'm currently working on a concept and realization for a high-data-rate acquisition system based on a cluster and InfiniBand as the interconnect. Nothing new up to this point. The big thing with our system will be the data rate for data acquisition. Here at the university, we're starting a large research project on signal processing topics in the THz spectrum. Our concept includes two clusters (InfiniBand, MPI, GPU) as well as input/output components for high-data-rate acquisition (FPGA based). Each compute node will be used as an input/output node for data, with data rates up to 5 GBytes/s per node. The whole system will scale up with parallel instances of cluster-node/FPGA entities, ending in system data rates of more than 80 GBytes/s. That is a lot of data to be transmitted and stored in a short time. The data will be processed offline.

 

I have some concerns regarding data input/output processing at the node level. How do we get the data into the cluster nodes? Big question. Up to now, the solution is to use the PCIe bus to connect each compute node to its FPGA board. Surely this will work, but it is still an 'old-fashioned' way that does not use RDMA, InfiniBand, etc. I've looked around for a suitable solution for integrating InfiniBand and/or Converged Ethernet as the technology to couple to the FPGA data stream. The result, in short: COTS devices are not available.

 

Currently, we're a little bit lost. On the FPGA (UltraScale+) we could make use of a Xilinx 100 GbE transceiver. But how do we get this transceiver working with a Mellanox ConnectX? I'm not really sure that the ConnectX and the Xilinx 100 GbE transceiver can communicate with each other, even though both claim conformity with 100 GbE. In addition, support by Xilinx for the upper transport layers is missing, not to mention RDMA or RoCE.

 

Does anybody have experience with such an interconnect based on 100 GbE?  The 100 GbE transceiver on the Xilinx FPGA (UltraScale+) comes out of the box with little or no support. What about hardware offloading? Xilinx seems to support this ...

 

The second concept is to use InfiniBand as the interconnect for compute-node/FPGA coupling. Then it would be very helpful to get an InfiniBand IP core for the FPGA supporting at least FDR (a 5 GBytes/s transfer rate!). Could someone provide me with some business contacts for this?

 

The third concept could be to develop an integrated solution: FPGA plus ConnectX silicon on a PCIe interface card. This seems to be the most expensive solution with respect to time and effort.

 

It would be very helpful to get some supporting answers from the community.

 

Best regards

 

Michael

Re: Correct Firmware for SX6710


I appreciate the stock answer but do not appreciate that you automatically decide that is the correct answer.

 

When I attempt to log in via the link you sent, I get a failed login, although I am clearly logged in here, which to me means they are two separate logins?

 

Also, from the little I have read so far on the site, what you may be referring to would allow me to update the software/firmware directly online. As I said, this is a lab switch and is not connected to the outside internet, so that would not be an option. If you have a link with instructions for downloading the firmware and installing it offline, that would be helpful.

 

I did also ask if there is something similar for OS upgrades as well. Again, I'd really appreciate any pointers/links/hints that anyone can provide. This is my first piece of Mellanox equipment and, as I'm sure you are all aware, it does have a learning curve associated with it.

 

Thanks!


Re: Correct Firmware for SX6710


Since providing the software requires an active support contract (which your account/company has), please send an email to support@mellanox.com asking for download links for the software and for upgrade instructions.

MSX6025F


What is the difference between PN MSX6025F-1SFS and MSX6025F-2SFS?

Thanks and regards

Re: Correct Firmware for SX6710


Really? Well, that's something I wasn't aware of. I guess I need to start beating some bushes to find out if we still have that in place and who has the information, as I was told it was pretty much just up to me to maintain.

 

Thanks!

Re: How to couple Connectx with FPGA?


Hi Michael,

Mellanox has the Innova Flex card, which fits most of your requirements.

It is an FPGA plus ConnectX silicon on a PCIe interface card, and it supports RDMA and RoCE.

The current generation of the card supports 40Gb/s per port.

Please check if it can work for you.

Thank you,

Vladimir

Not getting connection between 4036E switch and MT26428 adapter


Hi all,

 

Infiniband n00b here, trying to learn with some spare equipment I've inherited at work...

 

I have a Voltaire 4036E switch, which I have connected up to a Linux server that has a Mellanox MT26428 dual-port adapter via a 5m DAC cable. I factory-reset the switch, and have done very minimal config so far (pretty much just set the hostname.) I do not see any lights on the switchport or adapter, and the adapter status is DOWN. I have checked that the cable is seated properly on both ends.

 

I can see the adapter showing up in the server (in lspci output) :

0c:00.0 InfiniBand: Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] (rev b0)

 

The cable connection is into port 1 --

root@dockerhost01:~# ibstatus
Infiniband device 'mlx4_0' port 1 status:
  default gid: fe80:0000:0000:0000:0002:c903:0010:bf51
  base lid:    0x0
  sm lid:      0x0
  state:       1: DOWN
  phys state:  2: Polling
  rate:        10 Gb/sec (4X)
  link_layer:  InfiniBand

Infiniband device 'mlx4_0' port 2 status:
  default gid: fe80:0000:0000:0000:0002:c903:0010:bf52
  base lid:    0x0
  sm lid:      0x0
  state:       1: DOWN
  phys state:  2: Polling
  rate:        10 Gb/sec (4X)
  link_layer:  InfiniBand

 

On the switch side, I have the cable into port 18:

Test-IB-sw# cable-config show
Port 1: Not present
Port 2: Not present
Port 3: Not present
Port 4: Not present
Port 5: Not present
Port 6: Not present
Port 7: Not present
Port 8: Not present
Port 9: Not present
Port 10: Not present
Port 11: Not present
Port 12: Not present
Port 13: Not present
Port 14: Not present
Port 15: Not present
Port 16: Not present
Port 17: Not present
Port 18:Length 5m Vendor Name: Mellanox Code: QSFP+ Vendor PN: MCC4Q26C-005 Vendor Rev: B0 Vendor SN: AC501078867
Port 19: Not present
Port 20: Not present
Port 21: Not present
Port 22: Not present
Port 23: Not present
Port 24: Not present
Port 25: Not present
Port 26: Not present
Port 27: Not present
Port 28: Not present
Port 29: Not present
Port 30: Not present
Port 31: Not present
Port 32: Not present
Port 33: Not present
Port 34: Not present
Port eth1: Not present
Port eth2: Not present

 

And this is what I see when I get the port status:

Test-IB-sw(utilities)# ibportstate 1 18
PortInfo:
# Port info: Lid 1 port 18
LinkState:.......................Down
PhysLinkState:...................PortConfigurationTraining
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps or 10.0 Gbps
LinkSpeedActive:.................undefined (7)

 

Do I have incompatible equipment connected? Or if not, how best to continue troubleshooting?
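
In case it helps anyone suggest next steps, this is roughly what I am planning to try (a sketch; mlx4_0 and port 18 are from the output above, and I am assuming the switch's utilities ibportstate accepts the same enable/disable operations as the standard infiniband-diags tool):

# ibstat mlx4_0 1
# dmesg | grep -i mlx4 | tail

and on the switch, toggling the port:

Test-IB-sw(utilities)# ibportstate 1 18 disable
Test-IB-sw(utilities)# ibportstate 1 18 enable

plus trying a different cable and a different switch port to rule out a bad link.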

 

Thanks,

Will
