Re: "Priority trust-mode is not supported on your system"?
Re: "Priority trust-mode is not supported on your system"?
Hi
Can you show me the ibdev2netdev output?
Can you also try:
mlnx_qos -i <interface>
Thanks
Marc
Re: "Priority trust-mode is not supported on your system"?
Re: "Priority trust-mode is not supported on your system"?
Hi,
Can you try to modify the buffer size and send me the output?
Please send the ibdev2netdev output as well.
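Something along these lines (just a sketch; the --buffer_size values are placeholders and depend on whether your mlnx_qos build supports buffer commands at all):
ibdev2netdev
mlnx_qos -i <interface>
mlnx_qos -i <interface> --buffer_size=32768,32768,0,0,0,0,0,0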
Marc
Re: "Priority trust-mode is not supported on your system"?
Re: "Priority trust-mode is not supported on your system"?
Hi,
After a first check on my ConnectX-3 card, I get the same behavior.
It seems to be supported only on ConnectX-4 and above.
If you want me to investigate it more, please open a ticket.
# mlnx_qos -i ens6
Buffers commands are not supported on your system
Marc
Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%
If you are seeing the same behaviour without VMA, why complicate the problem? Start by tuning the system and see if it helps; adding more components will not help with troubleshooting. After tuning, I would suggest checking netstat -s/nstat and 'netstat -unp' to see the receive queue sizes.
The tuning guides are available from Mellanox site - Performance Tuning for Mellanox Adapters
You might also check the current number of send/receive queues configured on the interface and try to limit it to 16:
ethtool -L <IFS> rx 16 tx 16
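To see what is currently configured before changing anything (lowercase -l prints the channel counts, uppercase -L sets them; <IFS> is a placeholder for your interface name):
ethtool -l <IFS>
ethtool -g <IFS>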
Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%
Hi Alkx,
Thanks for your reply. I've done all the performance tuning steps from the site you recommend. I tried VMA because I was expecting someone to say "Have you tried VMA?"; also, vma_stats seems to give more visibility into the various buffer sizes (and errors) than is available via the kernel.
I monitor /proc/net/udp. With VMA off, it shows no drops and rarely more than a few MB in the UDP buffer (I think this is equivalent to netstat -unp).
Thanks for the tip on ethtool -L. Below are my current settings. I'll have a play with it and see if things improve. I hadn't seen that before. I wonder why it isn't in the tuning guides?
Also:
- What's the difference between the 'rings' (ethtool -g) and 'channels' (ethtool -L)?
- Why does making the channels smaller help?
ban115@tethys:~$ /sbin/ethtool -g enp132s0
Ring parameters for enp132s0:
Pre-set maximums:
RX: 8192
RX Mini: 0
RX Jumbo: 0
TX: 8192
Current hardware settings:
RX: 8192
RX Mini: 0
RX Jumbo: 0
TX: 512
ban115@tethys:~$ /sbin/ethtool -L enp132s0
no channel parameters changed, aborting
current values: tx 8 rx 32 other 0 combined 0
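For completeness, based on your suggestion and the current values above, I assume the command to try would be something like this (not yet tested on my side):
sudo /sbin/ethtool -L enp132s0 rx 16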
Re: MLNX+NVIDIA ASYNC GPUDirect - Segmentation fault: invalid permissions for mapped object running mpi with CUDA
Hi Jainkun yang,
Sorry for very late reply.
I am getting 7 microseconds of latency for the smallest message sizes.
When I run the osu_bw test, I see that system memory is also being used along with GPU memory. This seems strange, right? With GPUDirect RDMA, we should not see any system memory usage, right? Am I missing something?
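For reference, I invoke the benchmark roughly like this (treat it as an example only; it assumes an Open MPI build of the OSU benchmarks with CUDA support, where 'D D' selects device buffers on both sides):
mpirun -np 2 -host host1,host2 ./osu_bw D D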
lspci -tv output for both systems:
+-[0000:80]-+-00.0-[81]--
| +-01.0-[82]--
| +-01.1-[83]--
| +-02.0-[84]--
| +-02.2-[85]----00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
| +-03.0-[86]----00.0 NVIDIA Corporation Device 15f8
On Host Systems:
80:02.2 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 2 (rev 02) (prog-if 00 [Normal decode])
80:03.0 PCI bridge: Intel Corporation Xeon E7 v3/Xeon E5 v3/Core i7 PCI Express Root Port 3 (rev 02) (prog-if 00 [Normal decode])
On Peer System:
80:02.2 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 2 (rev 01) (prog-if 00 [Normal decode])
80:03.0 PCI bridge: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D PCI Express Root Port 3 (rev 01) (prog-if 00 [Normal decode])
Host CPU:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 72
On-line CPU(s) list: 0-71
Thread(s) per core: 2
Core(s) per socket: 18
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz
Stepping: 2
CPU MHz: 1202.199
CPU max MHz: 3600.0000
CPU min MHz: 1200.0000
BogoMIPS: 4590.86
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-71
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm epb invpcid_single retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
Peer CPU:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping: 1
CPU MHz: 1201.019
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4191.23
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb invpcid_single intel_pt retpoline kaiser tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdseed adx smap xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts
RoCEv2 PFC/ECN Issues
We have two servers with ConnectX-4 100GbE cards and two Cisco C3232C switches with routing between them, and we are trying to get RoCEv2 routed across them with PFC/ECN so that performance holds up during periods of congestion.
The funny thing is that with the base configuration and no other servers on the switches, we get terrible performance (1.6 Gbps) across the routed link using iSER, even though we are only pushing about 20 Gbps (one iSER connection and our test workload configuration). By using multiple iSER connections and PFC, we can get about 95 Gbps, so we know the hardware is capable of this performance in routing mode. We can't understand why the performance is so bad in the default case. The fio test shows that a lot of IO happens, then there is none, and it just cycles back and forth.
We would like to use both PFC and ECN in our configuration, but we are trying to validate that ECN will work without PFC; when we disable PFC, we can't test ECN, most likely because of the issue above.
On the Cisco switches, we have policy maps that place our traffic, based on its DSCP markings, into a group that has ECN enabled (I'm not a Cisco person, so I may not be getting the terminology quite right), and we can see the group counters on the Cisco incrementing. We never see any packets marked with congestion, probably because the switch never sees any congestion due to the above problem.
When we have the client set to 40 Gbps and do a read test with PFC, we get pause frames and great performance. We have the Cisco switches match the DSCP value and remark the CoS for packets that traverse the router. Interestingly, Cisco sends PFC pause frames on the routed link even though there are no VLANs configured; we captured this in Wireshark. With the adapters set to --trust=pcp the performance is terrible, but --trust=dscp works well. The Cisco switches also show pause frame counters incrementing when we are 100G end to end, and I'm not sure why they would be incrementing when there is no congestion.
We have done so many permutations of tests that I may be getting fuzzy on some details. Here is a matrix of the tests I can be sure of; this is all 100G end to end.
switch PFC mode (ports) | trust mode | pfc prio 3 enabled | skprio -> cos mapping | Result |
static on/off | mlnx_qos --trust=X | mlnx_qos --pfc=0,0,0,X,0,0,0,0 | ip link set rsY.Z type vlan egress 2:3 | |
on | pcp | yes | yes | Good |
on | pcp | yes | no | Good |
on | pcp | no | yes | Bad |
on | pcp | no | no | Bad |
on | dscp | yes | yes | Good |
on | dscp | yes | no | Good |
on | dscp | no | yes | Bad |
on | dscp | no | no | Bad |
off | pcp | yes | yes | Bad |
off | pcp | yes | no | Bad |
off | pcp | no | yes | Bad |
off | pcp | no | no | Bad |
off | dscp | yes | yes | Bad |
off | dscp | yes | no | Bad |
off | dscp | no | yes | Bad |
off | dscp | no | no | Bad |
We are using OFED 4.4-1.0.0.0 on both nodes (one CentOS 7.3, the other CentOS 7.4), both running kernel 4.9.116; the firmware is 12.23.1000 on one card and 12.23.1020 on the other. In addition to the settings in the above matrix, we have only changed:
echo 26 > /sys/class/net/rs8bp2/ecn/roce_np/cnp_dscp
echo 106 > /sys/kernel/config/rdma_cm/mlx5_3/ports/1/default_roce_tos
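Putting it together, the per-host configuration referenced above is essentially the following (interface/device names are from our setup; the VLAN mapping line is only applied in the matrix rows where the skprio -> cos mapping is enabled):
mlnx_qos -i <interface> --trust=dscp
mlnx_qos -i <interface> --pfc=0,0,0,1,0,0,0,0
echo 26 > /sys/class/net/rs8bp2/ecn/roce_np/cnp_dscp
echo 106 > /sys/kernel/config/rdma_cm/mlx5_3/ports/1/default_roce_tos
ip link set rsY.Z type vlan egress 2:3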
If you have any ideas that we can try, we would appreciate it.
Thank you.
Re: "Priority trust-mode is not supported on your system"?
Thanks for your help!
Problem installing MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-x86_64
Hi,
I am running Linux Mint 19, which is basically Ubuntu 18.04. I recently bought a ConnectX-3 CX311A and am trying to get it running.
I downloaded the MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-x86_64 and tried to run it:
sudo ./mlnxofedinstall --add-kernel-support --distro ubuntu18.04
Result:
Note: This program will create MLNX_OFED_LINUX TGZ for ubuntu18.04 under /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic directory.
See log file /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496_logs/mlnx_ofed_iso.5496.log
Checking if all needed packages are installed...
Building MLNX_OFED_LINUX RPMS . Please wait...
find: 'MLNX_OFED_SRC-4.4-1.0.0.0/RPMS': No such file or directory
Creating metadata-rpms for 4.15.0-29-generic ...
ERROR: Failed executing "/usr/bin/perl /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496/MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-ext/create_mlnx_ofed_installers.pl --with-hpc --tmpdir /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496_logs --mofed /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496/MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-ext --rpms-tdir /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496/MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-ext/RPMS --output /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496/MLNX_OFED_LINUX-4.4-1.0.0.0-ubuntu18.04-ext --kernel 4.15.0-29-generic --ignore-groups eth-only"
ERROR: See /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496_logs/mlnx_ofed_iso.5496.log
Failed to build MLNX_OFED_LINUX for 4.15.0-29-generic
Once I check this log it says:
Unsupported package: kmp
Logs dir: /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496_logs/OFED.5926.logs
General log file: /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0-4.15.0-29-generic/mlnx_iso.5496_logs/OFED.5926.logs/general.log
Below is the list of OFED packages that you have chosen
(some may have been added by the installer due to package dependencies):
ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
isert-dkms
srp-dkms
mlnx-nfsrdma-dkms
mlnx-nvme-dkms
mlnx-rdma-rxe-dkms
kernel-mft-dkms
knem-dkms
knem
Checking SW Requirements...
This program will install the OFED package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with OFED, do not reinstall them.
Installing new packages
Building DEB for ofed-scripts-4.4 (ofed-scripts)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for mlnx-ofed-kernel-utils-4.4 (mlnx-ofed-kernel)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for iser-dkms-4.0 (iser)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for isert-dkms-4.0 (isert)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for srp-dkms-4.0 (srp)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for mlnx-nfsrdma-dkms-3.4 (mlnx-nfsrdma)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for mlnx-nvme-dkms-4.0 (mlnx-nvme)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for mlnx-rdma-rxe-dkms-4.0 (mlnx-rdma-rxe)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for kernel-mft-dkms-4.10.0 (kernel-mft)...
Running /usr/bin/dpkg-buildpackage -us -uc
Building DEB for knem-dkms-1.1.3.90mlnx1 (knem)...
Running /usr/bin/dpkg-buildpackage -us -uc
Build passed successfully
-E- '' dir does not exist!
Strange!
Then I tried
sudo ./mlnxofedinstall --distro ubuntu18.04
which gives:
Logs dir: /tmp/MLNX_OFED_LINUX.12121.logs
General log file: /tmp/MLNX_OFED_LINUX.12121.logs/general.log
Below is the list of MLNX_OFED_LINUX packages that you have chosen
(some may have been added by the installer due to package dependencies):
ofed-scripts
mlnx-ofed-kernel-utils
mlnx-ofed-kernel-dkms
iser-dkms
isert-dkms
srp-dkms
mlnx-nfsrdma-dkms
mlnx-rdma-rxe-dkms
libibverbs1
ibverbs-utils
libibverbs-dev
libibverbs1-dbg
libmlx4-1
libmlx4-dev
libmlx4-1-dbg
libmlx5-1
libmlx5-dev
libmlx5-1-dbg
librxe-1
librxe-dev
librxe-1-dbg
libibumad
libibumad-static
libibumad-devel
ibacm
ibacm-dev
librdmacm1
librdmacm-utils
librdmacm-dev
mstflint
ibdump
libibmad
libibmad-static
libibmad-devel
libopensm
opensm
opensm-doc
libopensm-devel
infiniband-diags
infiniband-diags-compat
mft
kernel-mft-dkms
libibcm1
libibcm-dev
perftest
ibutils2
libibdm1
cc-mgr
ar-mgr
dump-pr
ibsim
ibsim-doc
knem-dkms
mxm
ucx
sharp
hcoll
openmpi
mpitests
knem
libdapl2
dapl2-utils
libdapl-dev
srptools
mlnx-ethtool
mlnx-iproute2
This program will install the MLNX_OFED_LINUX package on your machine.
Note that all other Mellanox, OEM, OFED, RDMA or Distribution IB packages will be removed.
Those packages are removed due to conflicts with MLNX_OFED_LINUX, do not reinstall them.
Do you want to continue?[y/N]:y
Checking SW Requirements...
Removing old packages...
Installing new packages
Installing ofed-scripts-4.4...
Installing mlnx-ofed-kernel-utils-4.4...
Installing mlnx-ofed-kernel-dkms-4.4...
Error: mlnx-ofed-kernel-dkms installation failed!
Collecting debug info...
See:
/tmp/MLNX_OFED_LINUX.12121.logs/mlnx-ofed-kernel-dkms.debinstall.log
Removing newly installed packages...
How can I install the drivers? Thank you for your help.
Need help updating firmware/speed for MNPA19-XTR adapters
I need faster peer-to-peer access between a server and a desktop computer, so I installed an MNPA19-XTR 10Gb adapter in each machine in a peer-to-peer configuration with an SFP+ copper cable. The problem is that they are not performing as they should. When a large transfer starts, the speed begins at just under 700Mb/s (which is as expected, or even better than expected, with SATA HDs in use). But after 5-6 seconds, the speed drops to 100-150Mb/s. Intermittently, the speed jumps up to 400-500Mb/s for a second or two, then drops down again. Both systems have an SSD, a single SATA disk, and a SATA disk array, so I have tried the tests from SSD to SSD, SATA to SATA, etc., and the results are pretty much the same. All offloading is enabled, Jumbo Packet is at 9000, the send/receive buffers are at maximum, and I have tuned as many settings as I can find, including specifying both 10Gb adapters in the Hosts file. It almost seems like a heat issue, even though there is plenty of air movement.
I don't know if the speed will improve with a firmware update, but MLXUP.exe does not recognize my adapter (I'm not sure I am using the correct switches on the command line; these are Windows machines). Any help with the speed and/or the firmware update would be highly appreciated. The cards currently have firmware rev 2.9.1000 and I have 2.9.1200 on hand to update to. I will be extremely happy if I can get a reliable 400-500Mb/s out of this setup, which is what I believe it should be capable of.
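For context, a typical query with the Mellanox tools looks something like the following (assuming the MFT/mlxup tools are installed; I am not certain these are the right switches on Windows, which is partly what I'm asking):
mlxup --query
flint -d <mst device> query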
System 1:
ASUS X370-A MB, Ryzen 5 1600, 16Mb RAM, 500Gb SSD, 6-4Tb NAS SATA disk array, 1-3Tb NAS SATA disk.
System 2:
ASUS X370-Pro MB, Ryzen 7 1800X, 32Mb RAM, 240Gb SSD, 6-4Tb NAS SATA disk array, 1-6Tb NAS SATA disk.
Also it looks like I have to choose a group for this discussion, so I am just choosing the closest fit.
Thank you in advance for any help.
Re: mlx5_core enable hca failed, mlx5_load_one failed with error code -22
Hi Pharthiphan,
I am not sure if your issue is still relevant, as it was posted on 6/11; however, which Mellanox OFED driver did you install, and have you validated the FW version/compatibility?
You can download the MFT package from the following link:
http://www.mellanox.com/page/management_tools
To query the FW:
#mst start
#mst status -v
#flint -d <mst device> q
Note: Check, based on the Release Notes of the drivers, that the FW is supported/compatible. If not, I would suggest aligning the FW to a supported version.
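As a rough sketch only, once you have confirmed the correct image for your card's PSID, the update itself would look something like:
#flint -d <mst device> -i <fw image file> burn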
Sophie.
Is Mellanox ConnectX-4 compatible with VPP 18.07?
Hello,
I have built VPP 18.07 according to the instructions from this page: How to Build VPP FD.IO Development Environment with Mellanox DPDK PMD for ConnectX-4 and ConnectX-5
VPP recognizes the Mellanox ports, but it hangs after the "up" command is applied to one of those ports.
VPP log (/var/log/vpp/vpp.log) shows no errors.
Is VPP 18.07 compatible with ConnectX-4?
Thank you.
Re: Ceph with OVS Offload
Hi Lazuardi,
I am not sure if you found a solution for your deployment; however, have you read about the ASAP2 solution?
ASAP2 is GA as part of MLNX_OFED 4.4 and has a separate page with more details:
https://www.mellanox.com/page/asap2?mtag=asap2&ssn=s2vtqtqjl6k87i8k5niimk2gl5
http://www.mellanox.com/page/asap2?mtag=asap2
Getting started with Mellanox ASAP^2
Sophie.
Re: Ceph with OVS Offload
Hi Sophie,
I have read all about ASAP2 on the Mellanox website. My question is about the performance of running Ceph with ASAP2 OVS offload and VXLAN offload.
Best regards,
Re: Ceph with OVS Offload
Hi Lazuardi,
Ceph has not been tested against the ASAP2 OVS offload solution.
Sophie.
Re: Ceph with OVS Offload
Hi Sophie,
How can I request that Mellanox run that test as a reference? I'm looking for a reference design for Ceph link redundancy without MLAG on the switches, while maximizing the offload features of the ConnectX-5 EN.
Best regards,
Re: rx_fifo_errors and rx_dropped errors using VMA where CPU user less than 40%
Number of channels - how many queues should be created.
Ring size - the size of each queue.
Generally, you shouldn't change the defaults, as they are based on the vendor's experience (any vendor); however, sometimes it pays to play with these settings. For example, setting the number of receive queues to the number of CPUs on the host might not be a bad idea, as a larger number of queues causes more context switches, which can lead to degradation.
The same goes for queue size - setting it to the maximum increases the amount of memory used by the queue, which can cause page swapping and, again, degradation.
Bottom line: there is no single recipe, only sensible defaults. Every change needs to be validated by running benchmarks that closely mimic the behaviour of the real-time application, or by the application itself.
Do you still have dropped packets after changing these parameters?
I would also recommend checking the Red Hat network performance tuning guide if you work with TCP/UDP. For VMA it is not really applicable, as VMA bypasses the kernel.
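A quick way to re-check the counters after each change (the interface name is just an example):
ethtool -S <interface> | grep -E 'drop|discard|fifo'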