Channel: Mellanox Interconnect Community: Message List

Re: mst start fails with ConnectX-4 on ppc64le


Hi Karen,

 

Thanks for your response. I do have the Advanced Toolchain Runtime installed.

 

$ sudo apt list --installed | grep advance-toolchain

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

advance-toolchain-at10.0-devel/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-mcore-libs/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-perf/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at10.0-runtime/now 10.0-3 ppc64el [installed,local]

advance-toolchain-at7.1-devel/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-mcore-libs/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-perf/trusty,now 7.1-5 ppc64el [installed]

advance-toolchain-at7.1-runtime/trusty,now 7.1-5 ppc64el [installed]

 

I did the export as mentioned (libc.so.6 exists on my system):

 

$ echo $LD_PRELOAD

/lib/powerpc64le-linux-gnu/libc.so.6

 

However, I still see the error.

 

${mbindir}/minit, called from /usr/bin/mst, gives a segmentation fault (as seen in the logs from my previous message). I am not sure why that happens.


Re: mst start fails with ConnectX-4 on ppc64le


Thank you Sood,

Please open a support ticket with the details so we can further investigate.

You can open a ticket by sending an email to support@mellanox.com.

 

Regards,

Karen.

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

Can you give more details on what you tried and what you used?

 

Thanks

Marc

Web interface error on SX6036


I am trying to set up an SX6036 VPI switch that was previously used at another institute. I've configured the mgmt interface and can connect to the web UI, but it immediately gives the following error:

 

Internal Error

An internal error has occurred.

Your options from this point are:

See the logs for more details.

Return to the home page.

Retry the bad page which gave the error.

 

 

When I enable logging monitor and try to log in, I see the following on the terminal:

 

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_include_template(), web_template.c:364, build 1: can't use empty string as operand of "!"

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Error in template "status-logs" at line 545 of the generated TCL code

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: web_render_template(), web_template.c:226, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: main(), rh_main.c:337, build 1: Error code 14002 (assertion failed) returned

Jul 23 11:34:29 ib-switch rh[5127]: [web.ERR]: Request handler failed with error code 14002: assertion failed

Jul 23 11:34:29 ib-switch httpd[4535]: [Mon Jul 23 11:34:29 2018] [error] [client ipremvd] Exited with error code 14002: assertion failed, referer: http://ip.removed./admin/launch?script=rh&template=failure&badpage=%2Fadmin%2Flaunch%3Fscript%3Drh%26template%3Dstatus-logs

 

 

Any idea how to check what may have failed and how to fix it?

 

regards

Andrew

Re: rxe driver does not support kernel ABI


I traced this to the function match_device() in libibverbs/init.c

 

There is a check for ABI versions:

 

if (sysfs_dev->abi_ver < ops->match_min_abi_version ||

            sysfs_dev->abi_ver > ops->match_max_abi_version) {

                fprintf(stderr, PFX

                        "Warning: Driver %s does not support the kernel ABI of %u (supports %u to %u) for device %s\n",

 

The variable sysfs_dev is passed into this call by try_driver(), which is called by try_drivers(), which is called by try_all_drivers(), which in turn appears to be called by ibverbs_get_device_list().
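
For anyone following along, the range check quoted above can be modelled in user space roughly like this. It is only a sketch: abi_in_range() and its parameter names are made up for illustration and are not part of libibverbs. The real code reads the values from sysfs_dev and the driver ops as quoted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper (not a libibverbs function): returns true when the
 * ABI version reported by the kernel lies inside the inclusive range the
 * user-space driver supports, i.e. when the warning would NOT be printed. */
bool abi_in_range(unsigned int kernel_abi,
                  unsigned int min_abi,
                  unsigned int max_abi)
{
    /* A driver advertising min > max would be a programming error. */
    assert(min_abi <= max_abi);

    return kernel_abi >= min_abi && kernel_abi <= max_abi;
}
```

With illustrative values, abi_in_range(1, 2, 2) is false, which corresponds to the warning being printed, while abi_in_range(2, 2, 2) is true.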

 

Does this help?

Re: rxe driver does not support kernel ABI


It appears that the ABI version is stored here:

root@arria10:/sys/class/infiniband# cat rxe0/device/infiniband_verbs/uverbs0/abi_version

1

And this needs to be 2 according to the code...
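
A small helper along these lines could read that value programmatically instead of using cat. This is a sketch only: read_abi_version() is a hypothetical name, and the path is passed in by the caller rather than assumed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads a single unsigned integer from a text file
 * such as .../infiniband_verbs/uverbs0/abi_version.
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
int read_abi_version(const char *path, unsigned int *out)
{
    FILE *f;
    int parsed;

    assert(path != NULL && out != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;

    /* The sysfs attribute holds one decimal number followed by a newline. */
    parsed = (fscanf(f, "%u", out) == 1);
    fclose(f);

    return parsed ? 0 : -1;
}
```

The caller would pass the full sysfs path shown above and compare the result with the version range the driver expects.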

How do I configure teaming in Server 2008 R2?


Hi All,

 

I have a couple of older Server 2008 R2 boxes that have ConnectX-3 Pro dual-port cards in them. I need to build LACP teams for my new network, but it doesn't appear that teaming exists within the Mellanox WinOF driver, and Microsoft teaming didn't exist yet in Server 2008 R2.

 

How am I supposed to configure these cards in LACP Teams?

 

Thanks

 

C

Re: rxe driver does not support kernel ABI


I went to kernel 4.17 and this went away.


Various ping programs segfaulting


I have a build of rdma-core on kernel 4.17 using yocto for an Altera Arria10 with a dual-core A53 ARM processor. The system is built and rxe configures correctly, i.e. I can run rxe_cfg start and rxe_cfg add eth0, and the ibv_devices output looks good:

 

root@arria10:~# rxe_cfg status

  Name  Link  Driver   Speed  NMTU  IPv4_addr  RDEV  RMTU

  eth0  yes   st_gmac         1500  10.0.1.28  rxe0  1024  (3)

root@arria10:~# ibv_devices

    device                 node GUID

    ------              ----------------

    rxe0                085697fffec1059b

root@arria10:~# ibv_devinfo rxe0

hca_id: rxe0

        transport:                      InfiniBand (0)

        fw_ver:                         0.0.0

        node_guid:                      0856:97ff:fec1:059b

        sys_image_guid:                 0000:0000:0000:0000

        vendor_id:                      0x0000

        vendor_part_id:                 0

        hw_ver:                         0x0

        phys_port_cnt:                  1

                port:   1

                        state:                  PORT_ACTIVE (4)

                        max_mtu:                4096 (5)

                        active_mtu:             1024 (3)

                        sm_lid:                 0

                        port_lid:               0

                        port_lmc:               0x00

                        link_layer:             Ethernet

 

This all looks good. However, when I try to ping between this machine and a PC running rdma-core, I get some strange errors, including a segfault when the Arria10 acts as server for udaddy.

 

root@arria10:~# udaddy -s 10.0.1.16

udaddy: starting client

[ 1883.526301] rdma_rxe: null vaddr

udaddy: connecting

failed to reg MR

udaddy: failed to create messages: -1

test complete

Segmentation fault

 

I traced the first error, "rdma_rxe: null vaddr", to rxe_mem_init_user() in <kernel>/drivers/infiniband/sw/rxe/rxe_mr.c. It appears that a page address lookup, perhaps a virtual-to-physical translation, is failing. Any thoughts on how to solve this?

 

Thanks,

FM

When using a write op with more than 1024 B (MTU) in SoftRoCE mode, the operation fails


When the write message length is more than 1024 B (the MTU), the operation fails in SoftRoCE mode. Please help check why.

Using the standard tool ib_write_lat to test: ib_write_lat -s 1024 -n 5 works, but ib_write_lat -s 1025 -n 5 fails.

My SoftRoCE version is the one in "Red Hat Enterprise Linux Server release 7.4 (Maipo)".

Is it a bug in SoftRoCE? Thanks!
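
One observation that may help narrow this down: with a 1024-byte path MTU, a 1024-byte write fits in a single packet, while 1025 bytes is the first size that has to be split across two packets. So the failing case is also the first multi-packet case. The arithmetic, as a sketch (packets_for_message() is a hypothetical helper, not part of perftest or rdma-core):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of packets needed to carry a message of
 * msg_len payload bytes when each packet carries at most path_mtu bytes.
 * A zero-length message is counted as one (empty) packet. */
size_t packets_for_message(size_t msg_len, size_t path_mtu)
{
    assert(path_mtu > 0);

    if (msg_len == 0)
        return 1;

    /* Ceiling division without floating point. */
    return (msg_len + path_mtu - 1) / path_mtu;
}
```

This does not explain the failure by itself; it only shows that the two test sizes differ in how many packets they need.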

Re: "Priority trust-mode is not supported on your system"?


Hi, Marc.

 

I installed MLNX_OFED_LINUX-4.1-1.0.2.0 on my server and used the provided tool "mlnx_qos" to set the trust mode for a ConnectX-3 Pro.

The command is "mlnx_qos -i p4p1 --trust=dscp".

Then the result is "Priority trust mode is not supported on your system".

 

Thanks

error packets


Hello, everybody!

I have errors on physical interfaces between Mellanox switches that are connected by MLAGs.

The switches are connected by Mellanox active cables (XLPPI). The errors appear once every few days, with a count of about 1000.

You can see the interface statistics in the attached file.

What could be the reason for these errors?

Could it be a queueing problem?

 

Re: Assign a MAC to a VLAN


Hi,

What is the idea? Why do you need it that way?

Re: error packets


Hi,

I see there are RX FCS errors on those physical interfaces. FCS errors indicate CRC errors, which are generally a layer 1 issue caused by a faulty port on the device or a bad cable.

You could try the following and see if it helps:

1. Reseat the cables

2. Replace the cable with a known-good cable

If the problem still exists, please open a case with us and we will help you resolve the issue.
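
For background, the FCS is a CRC-32 computed over the frame. The receiver recomputes it and counts an FCS error when the result does not match. A minimal sketch of that checksum in C follows. It is bitwise and unoptimized, crc32_ieee() is a name chosen here for illustration, and in real hardware the MAC does this work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative bitwise CRC-32 using the reflected polynomial 0xEDB88320,
 * initial value 0xFFFFFFFF and a final bit inversion (the parameters
 * Ethernet uses for its frame check sequence). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    size_t i;
    int bit;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        crc ^= data[i];
        for (bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0xEDB88320u & mask);
        }
    }
    return ~crc;
}
```

Even a single flipped bit on the wire changes the computed value, which is why a marginal cable or connector shows up as FCS errors.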

 

Thanks,

Pratik

Re: InfiniBand amber port led flashing


Hi Ken,

 

A port LED flashing amber means that one or more ports have received symbol errors.

Possible causes are:

• Bad cable

• Bad connection

• Bad connector

Check symbol error counters on the system UI to identify the ports. Replace the cable on these ports.

Since you have already replaced the cable on this port, no more symbol errors are being received and the LED has turned solid green.

 

Thanks,

Pratik


Re: Various ping programs segfaulting


This turned out to be a nasty little bug. It turns out there is a place where the rxe driver registers memory using an area of memory that is not available on the ARM processor we are using. Here's the patch that made it work...

 

2 files changed, 15 insertions(+), 2 deletions(-)

 

diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c

index 5c2684b..f2dc5a7 100644

--- a/drivers/infiniband/sw/rxe/rxe_mr.c

+++ b/drivers/infiniband/sw/rxe/rxe_mr.c

@@ -31,6 +31,7 @@

  * SOFTWARE.

  */

 

+#include <linux/highmem.h>

#include "rxe.h"

#include "rxe_loc.h"

 

@@ -94,7 +95,15 @@ static void rxe_mem_init(int access, struct rxe_mem *mem)

void rxe_mem_cleanup(struct rxe_pool_entry *arg)

{

        struct rxe_mem *mem = container_of(arg, typeof(*mem), pelem);

-       int i;

+       int i, entry;

+       struct scatterlist *sg;

+

+       if (mem->kmap_occurred) {

+               for_each_sg(mem->umem->sg_head.sgl, sg,

+                           mem->umem->nmap, entry) {

+                       kunmap(sg_page(sg));

+               }

+       }

 

        if (mem->umem)

                ib_umem_release(mem->umem);

@@ -200,12 +209,14 @@ int rxe_mem_init_user(struct rxe_dev *rxe, struct rxe_pd *pd, u64 start,

                buf = map[0]->buf;

 

                for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) {

-                       vaddr = page_address(sg_page(sg));

+                       // vaddr = page_address(sg_page(sg));

+                       vaddr = kmap(sg_page(sg));

                        if (!vaddr) {

                                pr_warn("null vaddr\n");

                                err = -ENOMEM;

                                goto err1;

                        }

+                       mem->kmap_occurred = 1;

 

                        buf->addr = (uintptr_t)vaddr;

                        buf->size = BIT(umem->page_shift);

diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.h b/drivers/infiniband/sw/rxe/rxe_verbs.h

index af1470d..9bd7eac 100644

--- a/drivers/infiniband/sw/rxe/rxe_verbs.h

+++ b/drivers/infiniband/sw/rxe/rxe_verbs.h

@@ -343,6 +343,8 @@ struct rxe_mem {

        u32                     num_map;

 

        struct rxe_map          **map;

+

+       int                     kmap_occurred;

};

 

struct rxe_mc_grp {

--

2.7.4

 

The idea is that you need to use kmap()/kunmap() rather than page_address() to handle these memory regions, which are used by both the kernel and user space, to make this work on the ARM...

 

Thanks,

FM

Building kernel module with ib client (un)register functions


I have written a minimal kernel module that registers an RDMA client using the two functions ib_register_client() and ib_unregister_client(). The compiled code and the source can be downloaded from the repository: https://github.com/sSadin/rdma_core_init.git

Compilation succeeds. However, the module does not load, and it produces the following errors in the system log:

... rdma_init: disagrees about version of symbol ib_unregister_client

... rdma_init: Unknown symbol ib_unregister_client (err -22)

... rdma_init: disagrees about version of symbol ib_register_client

... rdma_init: Unknown symbol ib_register_client (err -22)

----------------------------------

 

Installed OS: Ubuntu 16.04

@uname -r

4.4.114

 

Installed Mellanox software: MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu16.04-x86_64.tgz

with command:

@./mlnxofedinstall --add-kernel-support

 

After the install, there are new directories:

/usr/src/mlnx-ofed-kernel-4.4/include

/usr/src/ofa_kernel/default/include

containing the include files. But /usr/src/linux-headers-4.4.0-116/include has the "old" versions of the files.

----------------------------------

@modinfo rdma_core_init.ko

srcversion:     21C176F120C52D1ED6D19F1

depends:        ib_core

vermagic:       4.4.114

----------------------------------

@modinfo ib_core

filename:       /lib/modules/4.4.114/updates/dkms/ib_core.ko

description:    core kernel InfiniBand API

srcversion:     A1112DAE0CC4C253540C773

depends:        mlx_compat

vermagic:       4.4.114

 

Note: the generated file rdma_init.mod.ko contains:

  { 0x51b43427, __VMLINUX_SYMBOL_STR(ib_register_client) },

If I open ib_core.ko from the path /lib/modules/4.4.114/build/drivers/infiniband/core, the CRC for this function is the same:

0000000051b43427 A __crc_ib_register_client

But the command [modinfo ib_core] points to the path /lib/modules/4.4.114/updates/dkms, and there the CRC for this function is:

00000000b184c3d5 A __crc_ib_register_client
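
The comparison above can be automated. Below is a sketch that parses symbol lines in the format shown and reports whether two modules agree on a symbol's CRC. The function names parse_crc_line() and crc_matches() are hypothetical, and the sketch assumes the lines have already been extracted from each ib_core.ko.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parses one line of the form
 *   "0000000051b43427 A __crc_ib_register_client"
 * and stores the CRC if the line refers to `symbol`.
 * Returns 1 on a match, 0 otherwise. */
int parse_crc_line(const char *line, const char *symbol, unsigned long *crc)
{
    char name[128];
    char type;
    unsigned long value;

    assert(line != NULL && symbol != NULL && crc != NULL);

    /* Fields: hex value, one-character symbol type, symbol name. */
    if (sscanf(line, "%lx %c %127s", &value, &type, name) != 3)
        return 0;

    /* Only accept "__crc_<symbol>" entries for the requested symbol. */
    if (strncmp(name, "__crc_", 6) != 0 || strcmp(name + 6, symbol) != 0)
        return 0;

    *crc = value;
    return 1;
}

/* Returns 1 when both lines carry the same CRC for `symbol`. */
int crc_matches(const char *line_a, const char *line_b, const char *symbol)
{
    unsigned long a = 0, b = 0;

    return parse_crc_line(line_a, symbol, &a) &&
           parse_crc_line(line_b, symbol, &b) &&
           a == b;
}
```

Fed the two lines above, crc_matches() returns 0. That matches the "disagrees about version of symbol" message: the module carries the CRC of the ib_core.ko under build/, not the one that modinfo reports as installed.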

 

Q: What should I do to compile and load the module correctly?

 

Re: How to enable VF multi-queue for SR-IOV on KVM?


Where do I open a technical support ticket?

Re: "Priority trust-mode is not supported on your system"?


Hi,

 

What is your current system? Distribution / kernel?

ConnectX-3 Pro

FW version?

PSID?

Can you try with the latest Mellanox OFED 4.4?

 

Maybe p4p1 is not the Mellanox interface?

Maybe it is not configured as an Ethernet interface?

 

Please check

 

Marc

why not just BUG_ON(!pci_channel_offline(dev->persist->pdev))


diff --git a/drivers/net/ethernet/mellanox/mlx4/catas.c b/drivers/net/ethernet/mellanox/mlx4/catas.c

index 715de8a..e866082 100644

--- a/drivers/net/ethernet/mellanox/mlx4/catas.c

+++ b/drivers/net/ethernet/mellanox/mlx4/catas.c

@@ -182,10 +182,17 @@ void mlx4_enter_error_state(struct mlx4_dev_persistent *persist)

       err = mlx4_reset_slave(dev);

  else

       err = mlx4_reset_master(dev);

- BUG_ON(err != 0);

+

+ if (!err)

+      mlx4_err(dev, "device was reset successfully\n");

+ else

+      /* EEH could have disabled the PCI channel during reset. That's

+      * recoverable and the PCI error flow will handle it.

+      */

+      if (!pci_channel_offline(dev->persist->pdev))

+           BUG_ON(1);

 

  dev->persist->state |= MLX4_DEVICE_STATE_INTERNAL_ERROR;

- mlx4_err(dev, "device was reset successfully\n");

  mutex_unlock(&persist->device_state_mutex);
