Channel: Mellanox Interconnect Community: Message List
Viewing all 6275 articles

ConnectX-2 10GbE Ethernet "Flash not found", FW update problem


Dear Community,

 

I have two Mellanox ConnectX-2 Ethernet cards (MNPA19-XTR).

In Linux, I see only:

01:00.0 Memory controller: Mellanox Technologies MT25408 [ConnectX IB SDR Flash Recovery] (rev b0)

 

I tried to update the FW with different versions of the MFT tools (2.7, 3.x, 4.x), but without success. I followed and tried many of the suggestions I found through Google.

I always get: "-E- Failed to query 0000:01:00.0 device, error : No such file or directory. MFE_NO_FLASH_DETECTED"

If I force the flash chip type, I get: "Flash write failed: Flash erase of address 0x80000 failed: MFE_WRITE_TIMEOUT"

 

("Flash is not present" jumper = opened)

 

I have attached a text file with the relevant flashing/query outputs.

 

Thank you for any answers.

PS: Mellanox support will not help because I have no support contract.


Is a p2p (dedicated link, without switch) fibre connection totally lossless?


Dear all,

 

I am doing RDMA transfers over ConnectX-5 100GbE fibre with RoCEv2 (UD, unreliable datagram, Send) between two servers (<10 meters apart).

 

The data size is around 8 GBytes, or 80 GB during tests.

Sometimes everything is fine and I have no packet drops,

but I also frequently see a small fraction (around 0.01%) of packets silently lost (nothing is visible through the verbs API, dmesg, or sysfs).

 

I am sure that some packets are dropped because I use the RDMA_SEND_WITH_IMM verb with a packet number that is checked while polling the RWQ on the destination host. The application is a loop that continuously posts 15360 work requests (3072 bytes each) at once.
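The sequence check described above can be sketched as follows. This is only an illustration under stated assumptions, not the poster's code: the helper name `count_missing` is hypothetical, the packet number is assumed to travel in the 32-bit immediate value and to wrap modulo 2**32, and packets are assumed to arrive in order (reordering and losses after the last received packet are not detected).

```python
IMM_BITS = 32            # assumption: packet number carried in the 32-bit immediate value
IMM_MOD = 1 << IMM_BITS  # sequence numbers wrap modulo 2**32


def count_missing(received, first_expected=0):
    """Count packets missing from an in-order stream of sequence numbers.

    `received` is an iterable of packet numbers as read on the receive side.
    Every gap between the expected and the observed number is counted as loss.
    """
    expected = first_expected
    missing = 0
    for seq in received:
        # Distance from the expected number to the observed one, with wrap-around.
        gap = (seq - expected) % IMM_MOD
        missing += gap
        expected = (seq + 1) % IMM_MOD
    return missing
```

At 3072 bytes per packet, 8 GBytes is on the order of 2.6 million packets, so a loss rate of 0.01% corresponds to a few hundred missing packet numbers per run.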

 

This is not related to completion queue overrun (the queues are polled).

 

I pay attention to CPU affinity, I tried adding some nanosleep on the source host between ibv_post_send calls, and I also set the ring parameters to the maximum (8192). I suspected a transceiver temperature issue and tried another 40GbE copper link, and I have the same issue.

 

My question: is some (very small but non-zero) packet loss unavoidable?

 

 

cheers

Re: What are some good budget options for NICs and Switch?


Hi Kirill,

 

Many thanks for posting this question in the Mellanox Community.

 

Please use the following link to inquire about your needs and what Mellanox products are suitable for your request. The link is https://mellanox.force.com/inquiry.

 

Many thanks.

 

Cheers,

~Mellanox Technical Support

Re: RoCE v2 configuration with Linux drivers and packages


Hi Vetri,

Thank you for posting your question on the Mellanox Community.

Instead of using MLNX_OFED, you can use the OS distributed drivers for our adapters. These are called INBOX drivers. 

The following link provides the User Manual and Release Notes for some of the distributions that include the INBOX driver for our adapters. The User Manual explains how to set the RoCE mode, if the INBOX driver supports it.

For any other inquiries regarding the OS distributed driver, you need to contact the OS vendor for the instructions.

The link is -> http://www.mellanox.com/page/inbox_drivers


Thanks and regards,

~Mellanox Technical Support

Windows 10 fails to PXE boot if the start type of mlx4_bus and ibbus is set to 0 (boot start)


We have a diskless (PXE) boot system. In our experience, to boot with a Mellanox ConnectX-2 NIC we always have to set the mlx4_bus and ibbus services to boot start, and this worked well on Windows 7. We recently moved to Windows 10, but with the same setting Windows 10 doesn't boot. We debugged parts of mlx4_bus and ibbus and found only that, when the start type of mlx4_bus and ibbus is set to 0 (boot start), one \device\000000XX device that exists on a normal system is never created. We don't know why the same NIC boots under Windows 7 but fails under Windows 10. Could anyone help me solve this?

Re: Mellanox Ethernet Adapters PRM is now available online!


Can we get an update?

The current Linux driver source uses opcodes that are not defined in version 0.40 of the manual.

The ones I am curious about are:

 

MLX5_CMD_OP_ALLOC_ENCAP_HEADER            = 0x93d,
MLX5_CMD_OP_DEALLOC_ENCAP_HEADER          = 0x93e,
MLX5_CMD_OP_ALLOC_MODIFY_HEADER_CONTEXT   = 0x940,
MLX5_CMD_OP_DEALLOC_MODIFY_HEADER_CONTEXT = 0x941,

Re: send_bw test between QSFP ports on Dual Port Adapter


Hi Dmitri,

 

For testing TCP performance on Windows we recommend the NTttcp tool, which Microsoft provides for testing network performance. Please run the NTttcp test from the command line and provide the output.

 

For example:

Server side: ntttcp.exe -s -m 8,*,<client ip> -l 128k -a 2 -t 30

Client side: ntttcp.exe -r -m 8,*,<client ip> -l 128k -a 2 -t 30

 

For your reference kindly see the download and explanation link:

https://gallery.technet.microsoft.com/NTttcp-Version-528-Now-f8b12769

 

Please let me know about the results.

Karen.

Re: How to configure host chaining for ConnectX-5 VPI


Hi Daniel,

 

I wanted to thank you for these directions; they were very helpful. I was successful in linking three nodes together, all running Ubuntu 18.04. I was able to get ~96 Gb/s between all the hosts using iperf2. I then took one of the boxes, loaded ESXi 6.7, and configured the same IP addresses on the two interfaces I had before. The VMware box cannot communicate with the others now. I can still communicate through the NIC between the other Ubuntu boxes. When I run tcpdump on the ESXi box I see the ARP requests being sent, but get no response. Do you have any idea why the chaining feature does not seem to work with ESXi?

 

Thanks

Shawn


Can't ibping by LID or GUID but can ping by IP


We are using an SB7790 unmanaged switch connected to:

  1. VMware (6.5) server with OpenSM on a guest CentOS (7.5) VM - Mellanox ConnectX-4
  2. Server with Ubuntu (16.04.5 LTS) - Mellanox ConnectX-4
  3. Everything is fully updated

 

Successful items:

  • OpenSM is running (active) on the CentOS VM
  • ibstat shows all interfaces as Active with LinkUp
  • ibnetdiscover finds all connected interfaces
  • We can ping by IP to and from each server

 

Unsuccessful item:

  • Not able to ibping across the switch

 

We're not sure what we might be missing.

 

We can't find many resources for further troubleshooting. Any help would be greatly appreciated!

 

Thanks

Brian

Re: SN2100B v3.6.8004


Hi Reginald,

 

This is caused by an enhanced security feature added in Mellanox Onyx/OS 3.6.8004 and later: HTTP is disabled by default. As a result, the GUI cannot be reached over HTTP after upgrading to 3.6.8004 or later.

There are 2 possible solutions:

1.  Use HTTPS instead of HTTP to log into the GUI

2.  You can enable http by using the following commands:

      switch(config)# no web https ssl secure-cookie enable

      switch(config)# web http enable

      switch(config)# write memory

Now you can use both HTTP and HTTPS connections to log into the GUI.
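To see which of the two options applies to a given switch, a quick reachability probe of the web ports can help. The following is a minimal sketch, not a Mellanox tool: the function names are hypothetical, and it assumes the GUI listens on the default ports 80 (HTTP) and 443 (HTTPS). It only checks whether a TCP connection can be opened, not whether login works.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def web_access(host, timeout=2.0):
    """Report reachability of the assumed default HTTP (80) and HTTPS (443) ports."""
    return {
        "http": port_open(host, 80, timeout),
        "https": port_open(host, 443, timeout),
    }
```

Calling `web_access()` with the switch management address returns a small dictionary; if only `"https"` is true after the upgrade, that is consistent with HTTP being disabled by default.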

 

Hope this helps

 

Thanks,

Pratik Pande

Re: Assign a MAC to a VLAN


Hi, that is not supported. The VLANs are separated only by port # and VLAN ID.

Soft-RoCE on mininet topology


Hi Team,

 

On two mininet VMs (on VirtualBox), I am able to run an RDMA client and server and can also send traffic using the rping tool (following the guide "HowTo Configure Soft-RoCE").

 

Issue -

 

I have created a 1-switch, 1-host topology on each VM and connected the two switches using a GRE tunnel. (Host1 can ping Host2, and Host2 can ping Host1.)

 

When I tried to couple a veth interface with an rxe device, I got the error "sh: echo: I/O error".

 

Can you please suggest how to get Soft-RoCE working on a mininet topology?

 

Thanks

Re: How to configure host chaining for ConnectX-5 VPI


You're welcome!

I'm glad I helped someone after all the headache I went through for it.

 

I have no real experience with VMware, so take all of this with a grain of salt.

 

My first thought is VLAN tags. I was told that VMware tags by default.

 

From my (limited) understanding and thoughts, host chaining inside VMware is not a good idea.

If you set up a virtual switch (on the VMware side), put both ports of the card on the switch, and give that switch an IP, that would allow vMotion and such over the link at close to line speed, letting the switch (analogous to openvswitch) do all of the routing and fast pathing.

 

Thoughts, if host chaining were in use:

VMware still sees both ports (we can't assign IPs to raw port interfaces to start with).

It doesn't really know which port to send out of, so traffic could take an extra hop before it gets to the destination.

With three nodes, traffic meant to go from A -> B might take the path A -> C -> B.

 

Where I can speak from experience is non-chaining speed.

We did try using openvswitch and the cards with chaining off. As long as STP is turned on, we got nearly line speed.

 

We opened a support ticket for our problems with MTU. It took a while, but we found the problem.

They have a nice little utility (sysinfo-snapshot) for seeing the card internals and OS config options, which helped us when we looked through its output.

mlx5: ethtool -m not working


I have a ConnectX-4 2x100G. I'm running Linux 4.16.16 (Fedora) with the mlx5_core kernel module installed. ethtool -m does not appear to work with this setup. Other ethtool commands, such as ethtool -S, ethtool -i, and plain ethtool, work fine. I have an official Mellanox active optical cable transceiver plugged into the port. What is required to get the transceiver module info from the card?

 

$ ethtool -m enp9s0f0

Cannot get module EEPROM information: Input/output error

 

$ ethtool -i enp9s0f0

driver: mlx5_core

version: 5.0-0

firmware-version: 12.12.1100 (MT_2150110033)

expansion-rom-version:

bus-info: 0000:09:00.0

supports-statistics: yes

supports-test: yes

supports-eeprom-access: no

supports-register-dump: no

supports-priv-flags: yes

 

$ lspci | grep Mel

09:00.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

09:00.1 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4]

 

$ ethtool enp9s0f0

Settings for enp9s0f0:

    Supported ports: [ FIBRE ]

    Supported link modes:   10000baseKR/Full

                            40000baseCR4/Full

                            40000baseSR4/Full

                            40000baseLR4/Full

                            25000baseCR/Full

                            25000baseSR/Full

                            50000baseCR2/Full

                            100000baseSR4/Full

                            100000baseCR4/Full

                            100000baseLR4_ER4/Full

    Supported pause frame use: Symmetric

    Supports auto-negotiation: Yes

    Supported FEC modes: Not reported

    Advertised link modes:  10000baseKR/Full

                            40000baseCR4/Full

                            40000baseSR4/Full

                            40000baseLR4/Full

                            25000baseCR/Full

                            25000baseSR/Full

                            50000baseCR2/Full

                            100000baseSR4/Full

                            100000baseCR4/Full

                            100000baseLR4_ER4/Full

    Advertised pause frame use: Symmetric

    Advertised auto-negotiation: Yes

    Advertised FEC modes: Not reported

    Speed: 100000Mb/s

    Duplex: Full

    Port: FIBRE

    PHYAD: 0

    Transceiver: internal

    Auto-negotiation: on

    Supports Wake-on: d

    Wake-on: d

    Current message level: 0x00000004 (4)

                   link

    Link detected: yes
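When comparing several hosts or driver versions, it can help to turn the `ethtool -i` output above into a dictionary and check the driver, firmware version, and capability flags programmatically. The following is a minimal sketch; `parse_ethtool_info` is a hypothetical helper and the sample text is copied from the output above. As far as I understand, `supports-eeprom-access` describes the adapter EEPROM dump (`ethtool -e`) rather than the module EEPROM read by `ethtool -m`, so it should be treated only as a hint.

```python
SAMPLE = """driver: mlx5_core
version: 5.0-0
firmware-version: 12.12.1100 (MT_2150110033)
expansion-rom-version:
bus-info: 0000:09:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: yes
"""


def parse_ethtool_info(text):
    """Parse `ethtool -i` style "key: value" lines into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI addresses stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info


info = parse_ethtool_info(SAMPLE)
```

With the sample above, `info["driver"]` is `mlx5_core` and `info["firmware-version"]` starts with `12.12.1100`, which are the two values usually worth quoting when asking about driver or firmware support for a feature.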

Can the cable of an AOC be replaced?


Hi all,

 

I've got some FDR AOCs with damaged cables. I'm hoping to reuse the transceivers instead of scrapping them. I opened up the top panel on one of the transceivers and saw that the cable does disconnect internally. Are there replacement cables that have those little ferrules on the end, or an adapter to convert the transceiver into a standalone?

 

Thank you



CX5 - bad system state


I'm working with Xilinx Petalinux on a Xilinx PG213 core as root complex, so in general I have no confidence in either the HW or the SW.

CX5 gets pretty far along before it fails with:

 

[    4.447417] pci 0000:01:00.0: calling mellanox_check_broken_intx_masking+0x0/0x168                                                                                

[    4.454965] mlx5_core 0000:01:00.0: runtime IRQ mapping not provided by arch                                                                                      

[    4.462017] mlx5_core 0000:01:00.0: enabling device (0000 -> 0002)                                                                                                

[    4.468151] mlx5_core 0000:01:00.0: enabling bus mastering                                                                                                        

[    4.473941] mlx5_core 0000:01:00.0: firmware version: 16.22.1002                                                                                                  

[    4.700002] mlx5_core 0000:01:00.0: mlx5_cmd_check:710:(pid 1710): MANAGE_PAGES(0x108) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x4e2106)      

[    4.713926] mlx5_core 0000:01:00.0: give_pages:311:(pid 1710): func_id 0x0, npages 14972, err -5                                                                  

[    4.742890] mlx5_core 0000:01:00.0: failed to allocate init pages                                

 

Any clues as to whether this points to a HW problem or a SW problem?
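To put the failing request in the log above into perspective, the `give_pages` line can be decoded into a page count and a byte size. This is a rough sketch: `parse_give_pages` is a hypothetical helper, and the 4 KiB page size is an assumption about how the driver hands pages to the firmware, not something the log states. The error value -5 corresponds to -EIO on Linux.

```python
import re

PAGE_SIZE = 4096  # assumption: pages handed to the firmware are 4 KiB each

LOG_LINE = (
    "[    4.713926] mlx5_core 0000:01:00.0: give_pages:311:(pid 1710): "
    "func_id 0x0, npages 14972, err -5"
)

GIVE_PAGES_RE = re.compile(
    r"give_pages:\d+:\(pid \d+\): func_id (0x[0-9a-fA-F]+), npages (\d+), err (-?\d+)"
)


def parse_give_pages(line):
    """Extract func_id, npages, err and the implied byte count from a give_pages log line."""
    match = GIVE_PAGES_RE.search(line)
    if match is None:
        return None
    npages = int(match.group(2))
    return {
        "func_id": int(match.group(1), 16),
        "npages": npages,
        "err": int(match.group(3)),
        "bytes": npages * PAGE_SIZE,
    }


result = parse_give_pages(LOG_LINE)
```

Under the 4 KiB assumption, 14972 pages is roughly 58 MiB of host memory that the firmware must be able to reach by DMA through the root complex.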


Keeping two driver versions for two kernels


Hi,

 

How can I tell the installation script not to remove the old driver? I have two kernels (both needed):
1. CentOS 7.5 (./install --eth-only);
2. CentOS 7.5 + RT patch (compiled, ./install --eth-only --add-kernel-support).

Unfortunately, installing one driver uninstalls the other. This effectively prevents using the latest drivers for both kernels.

 

Please help.

 

Best Regards,

Robert

Slow File Transfer On 20Gbps IB


Dear All,

 

I am new to InfiniBand devices. I bought two Mellanox ConnectX-2 (20Gbps) cards from eBay and installed them in two Debian servers (PCIe 3.0, 8 lanes) with no problem. I measured about 15Gbps with iperf3, as follows:

iperf3 -c 10.20.0.34

Connecting to host 10.20.0.34, port 5201

[  4] local 10.20.0.35 port 58208 connected to 10.20.0.34 port 5201

[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd

[  4]   0.00-1.00   sec  1.85 GBytes  15.9 Gbits/sec    0   11.9 MBytes      

[  4]   1.00-2.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   2.00-3.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   3.00-4.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   4.00-5.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   5.00-6.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   6.00-7.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   7.00-8.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   8.00-9.00   sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

[  4]   9.00-10.00  sec  1.82 GBytes  15.6 Gbits/sec    0   11.9 MBytes      

- - - - - - - - - - - - - - - - - - - - - - - - -

[ ID] Interval           Transfer     Bandwidth       Retr

[  4]   0.00-10.00  sec  18.2 GBytes  15.6 Gbits/sec    0             sender

[  4]   0.00-10.00  sec  18.2 GBytes  15.6 Gbits/sec                  receiver

 

 

But why do I get only 150MB/s (about 1.2Gbps) when transferring a large file (3.5GB) via SCP and rsync?

I don't think disk I/O is the problem, because I transfer from and to a ramdisk.
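The gap between the two measurements can be made concrete with a little unit arithmetic. This is a minimal sketch with hypothetical helper names, assuming decimal units (1 GB = 1000 MB, 1 Gbit = 1000 Mbit) and ignoring protocol overhead.

```python
def mbytes_per_s_to_gbits(mbytes_per_s):
    """Convert a rate in MB/s to Gbit/s (decimal units)."""
    return mbytes_per_s * 8 / 1000


def gbits_to_mbytes_per_s(gbits):
    """Convert a rate in Gbit/s to MB/s (decimal units)."""
    return gbits * 1000 / 8


def transfer_seconds(size_gbytes, rate_gbits):
    """Time in seconds to move `size_gbytes` GB at `rate_gbits` Gbit/s."""
    return size_gbytes * 8 / rate_gbits


scp_rate = mbytes_per_s_to_gbits(150)       # observed SCP/rsync rate in Gbit/s
iperf_rate = gbits_to_mbytes_per_s(15.6)    # observed iperf3 rate in MB/s
scp_time = transfer_seconds(3.5, scp_rate)  # seconds for the 3.5 GB file at the SCP rate
wire_time = transfer_seconds(3.5, 15.6)     # seconds for the same file at the iperf3 rate
```

150 MB/s is 1.2 Gbit/s, while the iperf3 result of 15.6 Gbit/s is 1950 MB/s, so the file transfer runs at about one thirteenth of what the link was measured to carry.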

 

I appreciate your help. Thank you very much.
