Channel: Mellanox Interconnect Community: Message List
Viewing all 6275 articles

Re: Infiniband on Linux... defaults to datagram mode somehow?


It does not have that directory.


Help for hardware decisions


Dear Community!

 

At the moment we are expanding our cluster. We started with ~22 nodes and an SX6036, even though the nodes are connected only via QDR. Now new nodes should be integrated and, following the fat tree topology, I plan to buy new switches. But now some questions have come up.

 

1. For a non-blocking configuration with 36 or fewer nodes, is it better to stay with one switch, or are there any technical reasons to have one level 2 switch and two level 1 switches?

 

2. I plan to use the SX6036 as a level 2 switch and to add an SX6025 as a second level 2 switch. As leaf switches I want to use four IS5025, because the nodes are limited to QDR. In the future I may want to add FDR-capable switches/nodes. Does this setup sound reasonable to you?

 

3. Are there any special cables/tricks to connect the switches, or do I have to run 18 single cables from each leaf switch to the level 2 switches?

 

best regards!

 

Sven
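
The port arithmetic behind questions 2 and 3 can be sketched as follows. This is only a rough sketch, assuming every switch involved has 36 ports and that "non-blocking" means each leaf switch uses half of its ports for nodes and half for uplinks, spread evenly over the level 2 switches. The function name fat_tree_budget is made up for this illustration.

```shell
# Port budget for a non-blocking two-level fat tree.
# Assumption: all switches have the same number of ports.
# Arguments: ports per switch, number of leaf switches, number of level 2 switches.
fat_tree_budget() {
    local ports leaves spines down up per_pair spine_used
    ports=$1; leaves=$2; spines=$3
    down=$(( ports / 2 ))        # leaf ports facing the nodes
    up=$(( ports - down ))       # leaf ports facing level 2
    if [ $(( up % spines )) -ne 0 ]; then
        echo "uplinks ($up) do not divide evenly over $spines level 2 switches" >&2
        return 1
    fi
    per_pair=$(( up / spines ))            # cables between one leaf and one level 2 switch
    spine_used=$(( per_pair * leaves ))    # ports consumed on each level 2 switch
    if [ "$spine_used" -gt "$ports" ]; then
        echo "each level 2 switch would need $spine_used ports but has $ports" >&2
        return 1
    fi
    echo "nodes=$(( down * leaves )) uplinks_per_leaf=$up cables_per_leaf_spine_pair=$per_pair spine_ports_used=$spine_used"
}

# Four leaf switches and two level 2 switches, 36 ports each.
fat_tree_budget 36 4 2
```

Under these assumptions the proposed layout has room for 72 nodes, with 18 uplink cables per leaf switch (9 to each level 2 switch), which fills both level 2 switches completely.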

Re: ConnectX-2: Enabling RSS (Receive Side Scaling) in IPoIB mode


Thanks for the link, I didn't notice MLNX_OFED 2.0 was already out. I went through the tuning steps from the MLNX_OFED-1.5.3 release, and got the numbers above (11.4Gbps for plain IP forwarding, 6Gbps for forwarding + a netfilter kernel hook that ACCEPTs every packet).

 

I noticed that MLNX_OFED 2 introduces a new mlnx_affinity script. If it does something different from what I can do via /proc/irq/x/smp_affinity, I will give it a try.

 

Note that I specified that I can't use mlnx_en since I have an IB-only switch. My question remains: are hardware queues available when running in IPoIB mode? The 2.0 release notes mention that "Flow Steering for Ethernet and InfiniBand" was introduced. Should I take this to mean 'yes'?

 

Thanks

Bogdan
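
For the manual route mentioned above, /proc/irq/x/smp_affinity takes a hexadecimal CPU bitmask. A minimal sketch of building that mask from a list of CPU numbers follows. The helper name cpu_mask is hypothetical, and the sketch only handles CPUs 0 to 31, since larger systems need the comma-separated multi-word form.

```shell
# Build the hex bitmask that /proc/irq/<N>/smp_affinity expects.
# Arguments: one or more CPU numbers in the range 0..31.
cpu_mask() {
    local mask cpu
    mask=0
    for cpu in "$@"; do
        case $cpu in
            ''|*[!0-9]*) echo "not a CPU number: $cpu" >&2; return 1 ;;
        esac
        if [ "$cpu" -gt 31 ]; then
            echo "CPU $cpu needs the multi-word mask format" >&2
            return 1
        fi
        mask=$(( mask | (1 << cpu) ))   # set the bit for this CPU
    done
    printf '%x\n' "$mask"
}

cpu_mask 0      # CPU 0 only
cpu_mask 2 3    # CPUs 2 and 3
```

As root, the result would then be written with something like "cpu_mask 2 3 > /proc/irq/x/smp_affinity", where x is the IRQ number of the queue in question as listed in /proc/interrupts.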

Re: Can ping but not connect to ports using infiniband


In case this information is useful: I've also run tcpdump on the ib0 interface and things seem to be working fine. I have the ib0 interfaces set up with:

auto ib0

iface ib0 inet static

address 192.168.10.1

netmask 255.255.255.0

up echo connected >`find /sys -name mode | grep ib0`

up echo 65520 >`find /sys -name mtu | grep ib0`

 

and 192.168.10.11 for ib0 on the other machine.

 

The results of ping from 192.168.10.1 to 192.168.10.11 with tcpdump running on 192.168.10.11 are:

 

14:44:20.275887 IP 192.168.10.1 > 192.168.10.11: ICMP echo request, id 5352, seq 1, length 64

14:44:20.275925 IP 192.168.10.11 > 192.168.10.1: ICMP echo reply, id 5352, seq 1, length 64

14:44:20.278045 IP 192.168.10.11 > 192.168.10.1: ICMP echo reply, id 5352, seq 1, length 64

14:44:21.275273 IP 192.168.10.1 > 192.168.10.11: ICMP echo request, id 5352, seq 2, length 64

 

However, if I run netcat on a TCP or UDP port (on 192.168.10.11) and then try to connect to it across the InfiniBand link (from 192.168.10.1), I only get a bunch of initial packets one way, and no replies from the netcat server.

 

14:29:40.572374 IP 192.168.10.1.46100 > 192.168.10.11.1235: Flags [S], seq 574883956, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

14:29:41.570871 IP 192.168.10.1.46100 > 192.168.10.11.1235: Flags [S], seq 574883956, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

14:29:43.574893 IP 192.168.10.1.46100 > 192.168.10.11.1235: Flags [S], seq 574883956, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

14:29:45.586912 ARP, Request who-has 192.168.10.11 tell 192.168.10.1, length 56

14:29:45.586929 ARP, Reply 192.168.10.11 is-at a0:00:01:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:9f:57:41, length 56

14:29:47.586948 IP 192.168.10.1.46100 > 192.168.10.11.1235: Flags [S], seq 574883956, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

14:29:55.603038 IP 192.168.10.1.46100 > 192.168.10.11.1235: Flags [S], seq 574883956, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

14:30:11.651239 IP 192.168.10.1.46100 > 192.168.10.11.1235: Flags [S], seq 574883956, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

 

Thanks.
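
The two "up" lines in the interface configuration above locate the IPoIB attributes with find. A minimal sketch for reading them back afterwards is shown here, assuming the interface exposes "mode" and "mtu" under /sys/class/net/<iface>/. The function name ipoib_settings and its optional second argument (an alternative root directory, handy for dry runs) are hypothetical.

```shell
# Print the IPoIB mode and MTU of an interface as seen in sysfs.
# Arguments: interface name, optional sysfs root (defaults to /sys/class/net).
ipoib_settings() {
    local iface root dir
    iface=$1
    root=${2:-/sys/class/net}
    dir="$root/$iface"
    if [ ! -r "$dir/mode" ] || [ ! -r "$dir/mtu" ]; then
        echo "no IPoIB attributes found under $dir" >&2
        return 1
    fi
    echo "$iface mode=$(cat "$dir/mode") mtu=$(cat "$dir/mtu")"
}

# Usage on a host with an IPoIB interface:
#   ipoib_settings ib0
```

Running it on both machines is a quick way to confirm that the two ends agree on mode and MTU before looking further up the stack.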

Re: Can ping but not connect to ports using infiniband


Hi Jon,

Having the same subnet on two interfaces (ib0/ib1) may cause problems in the networking layer.

Try configuring ib0 on each of the machines with 192.168.10.X, and bring down the other interfaces whose addresses start with 192.168.10.X (for example, in your output I see that ib1 is 192.168.10.2; bring this interface down by running ifconfig ib1 down). Then confirm that there is only one entry in 'route -n' that shows 192.168.10.0/255.255.255.0.

 

Also, make sure that no firewall/iptables rule is blocking your TCP connection. Try:

/etc/init.d/iptables stop

rmmod ip_conntrack &> /dev/null

rmmod ip_tables &> /dev/null

rmmod iptable_filter &> /dev/null

 

If your TCP connection still doesn't work, please copy/paste the full output of ifconfig (for all interfaces), route -n, arp -na, and tcpdump output as you've done before.
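
The "only one entry in route -n" check above can be scripted. The following is a small sketch, assuming the classic net-tools column layout where the destination is column 1 and the genmask is column 3. The function name count_routes is made up for this example.

```shell
# Count routing table entries for a given destination/netmask.
# Reads "route -n" output on stdin.
# Arguments: destination network, genmask.
count_routes() {
    awk -v d="$1" -v m="$2" '$1 == d && $3 == m { n++ } END { print n + 0 }'
}

# Usage:
#   route -n | count_routes 192.168.10.0 255.255.255.0
```

A result of 1 means the subnet is reachable through exactly one interface. Anything higher suggests that a second interface (such as ib1) still holds an address in the same subnet.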

Re: Getting eIPoIB to work ?


Thanks nldesai,

We've captured your feedback, and will fix ipoibd in the next release.

Re: Can ping but not connect to ports using infiniband


So I've taken down ib1 on both machines and tried to reconnect. Currently OpenSM is only running on 192.168.10.1. "ping 192.168.10.1" works from 192.168.10.11 and vice versa. Telnet to the netcat server on port 1234 via the ib0 IP still does not work. The server does not reply over InfiniBand.

 

#ibstat

CA 'mlx4_0'

        CA type: MT4099

        Number of ports: 2

        Firmware version: 2.11.500

        Hardware version: 0

        Node GUID: 0x0002c903009f5740

        System image GUID: 0x0002c903009f5743

        Port 1:

                State: Active

                Physical state: LinkUp

                Rate: 40 (FDR10)

                Base lid: 1

                LMC: 0

                SM lid: 3

                Capability mask: 0x02514868

                Port GUID: 0x0002c903009f5741

                Link layer: InfiniBand

        Port 2:

                State: Initializing

                Physical state: LinkUp

                Rate: 40 (FDR10)

                Base lid: 0

                LMC: 0

                SM lid: 0

                Capability mask: 0x02514868

                Port GUID: 0x0002c903009f5742

                Link layer: InfiniBand

 

Here is ifconfig on the 192.168.10.11 machine:

ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00

          inet addr:192.168.10.11  Bcast:192.168.10.255  Mask:255.255.255.0

          inet6 addr: fe80::202:c903:9f:5741/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1

          RX packets:98 errors:0 dropped:0 overruns:0 frame:0

          TX packets:96 errors:0 dropped:7 overruns:0 carrier:0

          collisions:0 txqueuelen:1024

          RX bytes:10388 (10.3 KB)  TX bytes:11510 (11.5 KB)

 

 

lo        Link encap:Local Loopback

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:543 errors:0 dropped:0 overruns:0 frame:0

          TX packets:543 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:43156 (43.1 KB)  TX bytes:43156 (43.1 KB)

 

 

wlan0     Link encap:Ethernet  HWaddr 00:21:79:c3:18:bb

          inet addr:10.0.0.245  Bcast:10.0.0.255  Mask:255.255.255.0

          inet6 addr: fe80::221:79ff:fec3:18bb/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3117 errors:0 dropped:0 overruns:0 frame:0

          TX packets:1438 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:450682 (450.6 KB)  TX bytes:218279 (218.2 KB)

 

# route -n

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface

0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 wlan0

10.0.0.0        0.0.0.0         255.255.255.0   U     2      0        0 wlan0

169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 ib0

192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 ib0

 

#arp -na

? (10.0.0.246) at 00:21:79:c3:18:c0 [ether] on wlan0

? (10.0.0.1) at 78:cd:8e:0c:91:12 [ether] on wlan0

 

and ifconfig for 192.168.10.1:

eth0      Link encap:Ethernet  HWaddr 10:60:4b:7c:88:d6

          UP BROADCAST MULTICAST  MTU:1500  Metric:1

          RX packets:0 errors:0 dropped:0 overruns:0 frame:0

          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

          Interrupt:20 Memory:eff00000-eff20000

 

 

ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00

          inet addr:192.168.10.1  Bcast:192.168.10.255  Mask:255.255.255.0

          inet6 addr: fe80::202:c903:17:c081/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:65520  Metric:1

          RX packets:93 errors:0 dropped:0 overruns:0 frame:0

          TX packets:99 errors:0 dropped:6 overruns:0 carrier:0

          collisions:0 txqueuelen:1024

          RX bytes:10385 (10.3 KB)  TX bytes:10848 (10.8 KB)

 

 

lo        Link encap:Local Loopback

          inet addr:127.0.0.1  Mask:255.0.0.0

          inet6 addr: ::1/128 Scope:Host

          UP LOOPBACK RUNNING  MTU:16436  Metric:1

          RX packets:136 errors:0 dropped:0 overruns:0 frame:0

          TX packets:136 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:0

          RX bytes:10990 (10.9 KB)  TX bytes:10990 (10.9 KB)

 

 

wlan1     Link encap:Ethernet  HWaddr 00:21:79:c3:18:c0

          inet addr:10.0.0.246  Bcast:10.0.0.255  Mask:255.255.255.0

          inet6 addr: fe80::221:79ff:fec3:18c0/64 Scope:Link

          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:4854 errors:0 dropped:0 overruns:0 frame:0

          TX packets:4276 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:743194 (743.1 KB)  TX bytes:661900 (661.9 KB)

 

#route -n

Destination     Gateway         Genmask         Flags Metric Ref    Use Iface

0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 wlan1

10.0.0.0        0.0.0.0         255.255.255.0   U     2      0        0 wlan1

169.254.0.0     0.0.0.0         255.255.0.0     U     1000   0        0 ib0

192.168.10.0    0.0.0.0         255.255.255.0   U     0      0        0 ib0

 

#arp -na

? (192.168.10.11) at a0:00:01:00:fe:80:00:00:00 [ether] on ib0

? (10.0.0.245) at 00:21:79:c3:18:bb [ether] on wlan1

? (10.0.0.1) at 78:cd:8e:0c:91:12 [ether] on wlan1

 

So I run "nc -l 1234 &" on 192.168.10.11. And then try to telnet as before from 192.168.10.1 with "telnet 192.168.10.11 1234".

#tcpdump -n -i ib0 port not ssh

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on ib0, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes

16:34:17.155170 IP 192.168.10.1.42476 > 192.168.10.11.1234: Flags [S], seq 2967937540, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

16:34:18.152858 IP 192.168.10.1.42476 > 192.168.10.11.1234: Flags [S], seq 2967937540, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

16:34:20.156877 IP 192.168.10.1.42476 > 192.168.10.11.1234: Flags [S], seq 2967937540, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

16:34:22.164910 ARP, Request who-has 192.168.10.11 tell 192.168.10.1, length 56

16:34:22.164936 ARP, Reply 192.168.10.11 is-at a0:00:01:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:9f:57:41, length 56

16:34:24.164930 IP 192.168.10.1.42476 > 192.168.10.11.1234: Flags [S], seq 2967937540, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

16:34:32.181038 IP 192.168.10.1.42476 > 192.168.10.11.1234: Flags [S], seq 2967937540, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

16:34:48.197234 IP 192.168.10.1.42476 > 192.168.10.11.1234: Flags [S], seq 2967937540, win 65480, options [mss 65480,nop,nop,sackOK,nop,wscale 7], length 0

 

Thanks for your help so far.

Re: Can ping but not connect to ports using infiniband


Hi Jon,

Why does "route -n" associate 169.254.x.x with ib0? Please try to make the necessary changes so that "route -n" shows ib0 only once (for the 192.168.10.X network).


Also, I find it odd that the RX/TX counters for 192.168.10.1 show 93/99 and the RX/TX counters for 192.168.10.11 show 98/96, while the tcpdump shows that 192.168.10.1 is initiating IP packets that are not being answered by 192.168.10.11 (I would expect the 192.168.10.11 RX counter to be much lower than the 192.168.10.1 TX counter). Can you watch the counters on 192.168.10.11 when you try the TCP connection and see whether it reports any RX/TX packets?


Do you see any ipoib errors (on any of the machines) under /var/log/messages?

What application are you trying to run (the one captured by tcpdump)? Is it netperf? Can you double-check the parameters or try a different application?
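
Watching the counters during a connection attempt is easier with a one-line readout than with repeated ifconfig runs. A minimal sketch follows, assuming the standard per-interface counters under /sys/class/net/<iface>/statistics/. The function name packet_counters and the optional root argument are hypothetical.

```shell
# Print RX/TX packet counters (and TX drops) for one interface.
# Arguments: interface name, optional sysfs root (defaults to /sys/class/net).
packet_counters() {
    local iface root stats
    iface=$1
    root=${2:-/sys/class/net}
    stats="$root/$iface/statistics"
    if [ ! -r "$stats/rx_packets" ]; then
        echo "no counters found under $stats" >&2
        return 1
    fi
    echo "rx=$(cat "$stats/rx_packets") tx=$(cat "$stats/tx_packets") tx_dropped=$(cat "$stats/tx_dropped")"
}

# Usage while the telnet attempt is running on the other machine:
#   while sleep 1; do packet_counters ib0; done
```

If rx on the server side keeps rising while tx stays flat, the SYN packets are arriving but nothing is being sent back.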




Re: Can ping but not connect to ports using infiniband


There's no DHCP. I just manually assigned static private IPs to the two InfiniBand interfaces, so that is probably why 169.254.x.x is assigned.

 

"netcat" is running on the server and I am using telnet to test the port 1234.

 

/var/log/opensm.log says:

May 23 18:31:39 206699 [9D5D6700] 0x02 -> SUBNET UP

May 23 18:31:49 205683 [9D5D6700] 0x01 -> osm_prtn_make_partitions: Partition configuration /etc/opensm/partitions.conf is not accessible (No such file or directory)

 

These are the only messages that constantly repeat. Regarding syslog, I've pasted the relevant ib messages. Overall there are only a few messages, but several of them are about ib_ipath and ib_qib, and I'm not sure whether those are fine or not. I've also brought ib1 down on both machines after booting.

 

/var/log/syslog on 192.168.10.11: "#cat /var/log/syslog | grep ib0 | less"

 

May 23 16:11:17 192.168.10.11 kernel: [   22.512158] ADDRCONF(NETDEV_UP): ib0: link is not ready

May 23 16:11:17 192.168.10.11 kernel: [   22.550891] ib0: enabling connected mode will cause multicast packet drops

May 23 16:11:17 192.168.10.11 kernel: [   22.586082] ib0: mtu > 4092 will cause multicast packet drops.

May 23 16:11:17 192.168.10.11 NetworkManager[1250]:    SCPluginIfupdown: guessed connection type (ib0) = 802-3-ethernet

May 23 16:11:17 192.168.10.11 NetworkManager[1250]:    SCPlugin-Ifupdown: update_connection_setting_from_if_block: name:ib0, type:802-3-ethernet, id:Ifupdown (ib0), uuid: 353e15fa-f276-690c-8c3a-e7609d1d3651

May 23 16:11:17 192.168.10.11 NetworkManager[1250]:    SCPlugin-Ifupdown: adding ib0 to iface_connections

May 23 16:11:17 192.168.10.11 NetworkManager[1250]:    SCPlugin-Ifupdown: adding iface ib0 to well_known_interfaces

May 23 16:11:17 192.168.10.11 NetworkManager[1250]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/net/ib0, iface: ib0)

May 23 16:11:17 192.168.10.11 NetworkManager[1250]:    SCPluginIfupdown: failed to parse MAC address 'a0:00:01:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:9f:57:41' for ib0

May 23 16:11:18 192.168.10.11 NetworkManager[1250]: <info> (ib0): carrier is OFF

May 23 16:11:18 192.168.10.11 NetworkManager[1250]: <info> (ib0): new InfiniBand device (driver: 'mlx4_core' ifindex: 3)

May 23 16:11:18 192.168.10.11 NetworkManager[1250]: <info> (ib0): exported as /org/freedesktop/NetworkManager/Devices/1

May 23 16:11:19 192.168.10.11 NetworkManager[1250]: <info> (ib0): carrier now ON (device state 10)

May 23 16:11:19 192.168.10.11 kernel: [   25.454698] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

May 23 16:11:21 192.168.10.11 avahi-daemon[1050]: Joining mDNS multicast group on interface ib0.IPv6 with address fe80::202:c903:9f:5741.

May 23 16:11:21 192.168.10.11 avahi-daemon[1050]: New relevant interface ib0.IPv6 for mDNS.

May 23 16:11:21 192.168.10.11 avahi-daemon[1050]: Registering new address record for fe80::202:c903:9f:5741 on ib0.*.

May 23 16:11:30 192.168.10.11 kernel: [   35.906622] ib0: no IPv6 routers present

May 23 16:18:28 192.168.10.11 kernel: [  453.895683] device ib0 entered promiscuous mode

May 23 16:19:13 192.168.10.11 kernel: [  499.399522] device ib0 left promiscuous mode

May 23 16:19:18 192.168.10.11 kernel: [  503.710638] device ib0 entered promiscuous mode

May 23 16:19:46 192.168.10.11 kernel: [  531.578413] device ib0 left promiscuous mode

 

/var/log/syslog on 192.168.10.1 when grepping ib is...

 

May 23 16:08:42 192.168.10.1 kernel: [   14.363096] ib_ipath: disagrees about version of symbol ib_umem_release

May 23 16:08:42 192.168.10.1 kernel: [   14.363099] ib_ipath: Unknown symbol ib_umem_release (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363152] ib_ipath: disagrees about version of symbol ib_modify_qp_is_ok

May 23 16:08:42 192.168.10.1 kernel: [   14.363154] ib_ipath: Unknown symbol ib_modify_qp_is_ok (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363200] ib_ipath: disagrees about version of symbol ib_unregister_device

May 23 16:08:42 192.168.10.1 kernel: [   14.363202] ib_ipath: Unknown symbol ib_unregister_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363212] ib_ipath: disagrees about version of symbol ib_register_device

May 23 16:08:42 192.168.10.1 kernel: [   14.363213] ib_ipath: Unknown symbol ib_register_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363241] ib_ipath: disagrees about version of symbol ib_dispatch_event

May 23 16:08:42 192.168.10.1 kernel: [   14.363243] ib_ipath: Unknown symbol ib_dispatch_event (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363258] ib_ipath: disagrees about version of symbol ib_umem_get

May 23 16:08:42 192.168.10.1 kernel: [   14.363260] ib_ipath: Unknown symbol ib_umem_get (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363283] ib_ipath: disagrees about version of symbol ib_dealloc_device

May 23 16:08:42 192.168.10.1 kernel: [   14.363284] ib_ipath: Unknown symbol ib_dealloc_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.363288] ib_ipath: disagrees about version of symbol ib_alloc_device

May 23 16:08:42 192.168.10.1 kernel: [   14.363290] ib_ipath: Unknown symbol ib_alloc_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364674] ib_qib: disagrees about version of symbol ib_umem_release

May 23 16:08:42 192.168.10.1 kernel: [   14.364677] ib_qib: Unknown symbol ib_umem_release (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364737] ib_qib: disagrees about version of symbol ib_modify_qp_is_ok

May 23 16:08:42 192.168.10.1 kernel: [   14.364739] ib_qib: Unknown symbol ib_modify_qp_is_ok (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364778] ib_qib: disagrees about version of symbol ib_unregister_device

May 23 16:08:42 192.168.10.1 kernel: [   14.364780] ib_qib: Unknown symbol ib_unregister_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364790] ib_qib: disagrees about version of symbol ib_register_device

May 23 16:08:42 192.168.10.1 kernel: [   14.364791] ib_qib: Unknown symbol ib_register_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364819] ib_qib: disagrees about version of symbol ib_create_ah

May 23 16:08:42 192.168.10.1 kernel: [   14.364821] ib_qib: Unknown symbol ib_create_ah (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364826] ib_qib: disagrees about version of symbol ib_unregister_mad_agent

May 23 16:08:42 192.168.10.1 kernel: [   14.364828] ib_qib: Unknown symbol ib_unregister_mad_agent (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364838] ib_qib: disagrees about version of symbol ib_post_send_mad

May 23 16:08:42 192.168.10.1 kernel: [   14.364839] ib_qib: Unknown symbol ib_post_send_mad (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364845] ib_qib: disagrees about version of symbol ib_create_send_mad

May 23 16:08:42 192.168.10.1 kernel: [   14.364846] ib_qib: Unknown symbol ib_create_send_mad (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364851] ib_qib: disagrees about version of symbol ib_dispatch_event

May 23 16:08:42 192.168.10.1 kernel: [   14.364852] ib_qib: Unknown symbol ib_dispatch_event (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364873] ib_qib: disagrees about version of symbol ib_umem_get

May 23 16:08:42 192.168.10.1 kernel: [   14.364874] ib_qib: Unknown symbol ib_umem_get (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364912] ib_qib: disagrees about version of symbol ib_dealloc_device

May 23 16:08:42 192.168.10.1 kernel: [   14.364914] ib_qib: Unknown symbol ib_dealloc_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364918] ib_qib: disagrees about version of symbol ib_free_send_mad

May 23 16:08:42 192.168.10.1 kernel: [   14.364920] ib_qib: Unknown symbol ib_free_send_mad (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364924] ib_qib: disagrees about version of symbol ib_alloc_device

May 23 16:08:42 192.168.10.1 kernel: [   14.364926] ib_qib: Unknown symbol ib_alloc_device (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364931] ib_qib: disagrees about version of symbol ib_destroy_ah

May 23 16:08:42 192.168.10.1 kernel: [   14.364932] ib_qib: Unknown symbol ib_destroy_ah (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.364955] ib_qib: disagrees about version of symbol ib_register_mad_agent

May 23 16:08:42 192.168.10.1 kernel: [   14.364957] ib_qib: Unknown symbol ib_register_mad_agent (err -22)

May 23 16:08:42 192.168.10.1 kernel: [   14.448044] [drm] radeon: ib pool ready.

May 23 16:08:42 192.168.10.1 kernel: [   14.448090] [drm] ib test succeeded in 0 usecs

May 23 16:08:42 192.168.10.1 kernel: [   15.178090] type=1400 audit(1369350522.119:5): apparmor="STATUS" operation="profile_load" name="/usr/lib/cups/backend/cups-pdf" pid=1068 comm="apparmor_parser"

May 23 16:08:48 192.168.10.1 kernel: [   21.597470] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v1.0 (May 20 2013)

May 23 16:08:48 192.168.10.1 kernel: [   21.701871] ADDRCONF(NETDEV_UP): ib0: link is not ready

May 23 16:08:48 192.168.10.1 kernel: [   21.732160] ib0: enabling connected mode will cause multicast packet drops

May 23 16:08:48 192.168.10.1 kernel: [   21.760167] ib0: mtu > 4092 will cause multicast packet drops.

May 23 16:08:49 192.168.10.1 NetworkManager[1040]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/net/ib0, iface: ib0)

May 23 16:08:49 192.168.10.1 NetworkManager[1040]:    SCPluginIfupdown: failed to parse MAC address 'a0:00:01:00:fe:80:00:00:00:00:00:00:00:02:c9:03:00:17:c0:81' for ib0

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib0): carrier is OFF

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib0): new InfiniBand device (driver: 'mlx4_core' ifindex: 4)

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib0): exported as /org/freedesktop/NetworkManager/Devices/2

May 23 16:08:49 192.168.10.1 NetworkManager[1040]:    SCPlugin-Ifupdown: devices added (path: /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/net/ib1, iface: ib1)

May 23 16:08:49 192.168.10.1 NetworkManager[1040]:    SCPlugin-Ifupdown: device added (path: /sys/devices/pci0000:00/0000:00:01.0/0000:03:00.0/net/ib1, iface: ib1): no ifupdown configuration found.

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): carrier is OFF

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): new InfiniBand device (driver: 'mlx4_core' ifindex: 5)

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): exported as /org/freedesktop/NetworkManager/Devices/3

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): now managed

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): device state change: unmanaged -> unavailable (reason 'managed') [10 20 2]

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): bringing up device.

May 23 16:08:49 192.168.10.1 NetworkManager[1040]: <info> (ib1): deactivating device (reason 'managed') [2]

May 23 16:08:49 192.168.10.1 kernel: [   22.680448] ADDRCONF(NETDEV_UP): ib1: link is not ready

May 23 16:08:49 192.168.10.1 kernel: [   22.680717] ADDRCONF(NETDEV_UP): ib1: link is not ready

May 23 16:11:19 192.168.10.1 kernel: [  172.100233] mlx4_core 0000:03:00.0: mlx4_ib: Port 1 logical link is up

May 23 16:11:19 192.168.10.1 NetworkManager[1040]: <info> (ib0): carrier now ON (device state 10)

May 23 16:11:19 192.168.10.1 kernel: [  172.131133] ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready

May 23 16:11:29 192.168.10.1 kernel: [  182.729686] ib0: no IPv6 routers present
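
The long run of "disagrees about version of symbol" lines above is easier to judge when condensed per module. The sketch below does that. The function name symbol_mismatch_summary is made up. As far as I can tell, these messages mean that a module was built against a different version of the InfiniBand core than the one currently loaded, so the module fails to load. ib_ipath and ib_qib are drivers for QLogic adapters, not for the mlx4-based card shown in the ibstat output.

```shell
# Summarise "disagrees about version of symbol" kernel messages per module.
# Reads log lines on stdin and prints "<module> <count>", sorted by module name.
symbol_mismatch_summary() {
    awk '/disagrees about version of symbol/ {
             for (i = 2; i <= NF; i++) {
                 if ($i == "disagrees") {
                     mod = $(i - 1)          # field before "disagrees" is "<module>:"
                     sub(/:$/, "", mod)
                     count[mod]++
                 }
             }
         }
         END { for (m in count) print m, count[m] }' | sort
}

# Usage:
#   symbol_mismatch_summary < /var/log/syslog
```

Only modules that actually report a mismatch appear in the output, so a driver that loaded cleanly (mlx4_ib in the log above) will not be listed.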

Re: Mellanox (old Voltaire) ISR9024D-M recover flash area


Hi!

I also tried with WinMFT via spark.exe.

 

Below is the result:

 

Spark-01.JPG.jpg

This picture only tells me that all boot sections were normal.

 

Another command "spark -d lid-7 swreset" was executed perfectly!

 

Below is the result of another simulation command:

 

Spark-02.JPG.jpg

The burning simulation test to the Primary image was successful!

But the I2C read error occurred again...

 

Another option is -pe_i2c (also -se_i2c).

 

How can I find the Primary and Secondary I2C addresses?

 

Do I have to connect using your I2C cable and module?

Re: Can ping but not connect to ports using infiniband


One more thing to pay attention to, Jon:

If you take two machines and connect them to each other using both ports of each HCA AND have a switch in between, this is all considered "one IB fabric/network".

 

If you take two servers and connect them to each other using both ports of each HCA (port 0 to port 0 and port 1 to port 1) *without* a switch in between, you get TWO separate networks. Those two networks will live in parallel but will not act as one like in the previous example.

 

I think the source of your problems is right there. To prove my point, disconnect one of the two cables or put a switch in between.

 

Obviously, make sure each interface (ib0, ib1) has a different IP and subnet (this is IP and routing; it has nothing to do with IB). Alternatively, if you need both ports, you can use bonding with one address.

 

Does that work?
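
The "different IP and subnet" advice above can be checked mechanically. Here is a minimal sketch that compares two IPv4 addresses under a netmask. The helper names ip_to_int and same_subnet are hypothetical, and the code assumes plain dotted-quad input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    local old_ifs
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if both addresses fall into the same subnet.
# Arguments: first address, second address, netmask.
same_subnet() {
    local a b m
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# ib0 and ib1 as configured earlier in the thread share 192.168.10.0/24.
if same_subnet 192.168.10.1 192.168.10.2 255.255.255.0; then
    echo "same subnet: give one port a different subnet or use bonding"
fi
```

With the addresses from this thread the check reports a shared subnet. Moving ib1 to, say, 192.168.11.2 would make it return false.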

Re: ConnectX-2: Enabling RSS (Receive Side Scaling) in IPoIB mode


UDP hardware queues work in ConnectX-2, but there are more flow steering / RSS features in ConnectX-3. 

Re: Mellanox positioning in the IB market?


Copying my LinkedIn response

 

 

Henry, let me address your questions by adding comments below your questions:
 
Would you please share your secret of success in the InfiniBand market?
 
>>> We are the only company in the world that owns our entire intellectual property (IP) from silicon, to systems, cables to adapters, software to firmware and so on. This allows us to be extremely flexible and price conscious towards our market. Additionally, many vertical markets outside of traditional HPC have noticed that VPI (our product name for flexible InfiniBand or Ethernet on our hardware) is the only way to scale clouds, big data, databases, and other enterprise applications. We see this from all our software partners outside of super computers, as well as the growing solid state disk providers.
 
Is Intel or LSI going to catch up?
>>> I'm not good at picking stocks, so I am not good at telling the future.
All joking aside, we have an extremely robust roadmap and very loyal customer base. That being said, the market demands performance that only we can provide, so other vendors will have to find a way to break out of traditional paradigms.
 
What will be the demand/pricing trends?
>>> The price for VPI is competitive, if not better than traditional fabric and network providers. Here's an exercise: price out our gear per Gbps provided by shopping around. For example, what does 10Gbps vs. 56Gbps cost these days? You might be surprised.
 
What are the pros/cons of Fibre Channel vs. Ethernet?
>>> We support Ethernet, so we see it as a very good product to sell. Fibre Channel has been limited by a 1x connection with relatively higher latency. InfiniBand typically uses 4x connectors with lower latency. So we can scale to 56Gbps of bandwidth with latency under a microsecond, whereas FC is limited to 16Gbps at its very best, with many microseconds of latency.
 

Re: help for ibping and ibv_rc_pingpong over switchless configuration


Perhaps remove the second port altogether, and/or restart one of the SMs so that more than one instance runs:

 

OpenSM Multiple instances: opensm(8) - Linux man page

-g, --guid <GUID in hex>: This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be bound to 1 port at a time. If GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input. Without -g, OpenSM tries to use the default port.


Also - are any of the ib tools able to run on ib0? 
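
To use the -g option quoted above, the port GUIDs have to come from somewhere. A minimal sketch that pulls them from ibstat output and prints one opensm command line per port is shown below. The function name opensm_commands is made up, and the sketch only prints the commands rather than running them.

```shell
# Turn the "Port GUID:" lines of ibstat output into opensm command lines.
# Reads ibstat output on stdin and prints one "opensm -g <GUID>" per port.
opensm_commands() {
    awk -F': *' '/Port GUID:/ { print "opensm -g " $2 }'
}

# Usage:
#   ibstat | opensm_commands
```

In a switchless setup with both ports cabled back to back, each printed line corresponds to one of the two separate subnets, so each would need its own OpenSM instance on one side of the link.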

Re: Mellanox positioning in the IB market?


Saluting J. Margolis,

 

When do you think Mellanox stock price will be back to its peak last year?

 

Thanks & Happy Memorial Day weekend

 

Henry


Re: mlx4_vnic


Hi torkel,

What other package did you previously use for mlx4_vnic?

Re: ConnectX-2: Enabling RSS (Receive Side Scaling) in IPoIB mode


MOFED 2.0 includes IPoIB RSS support for datagram mode.

 

Try to upgrade your software to MOFED 2.0, then check your CPU utilization when running IPoIB datagram (echo datagram > /sys/class/net/ibX/mode).
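
The mode switch above can be wrapped so that a typo does not end up in sysfs. This is a minimal sketch, assuming the /sys/class/net/ibX/mode path from the command above. The function name set_ipoib_mode and the optional third argument (an alternative root directory for dry runs) are hypothetical.

```shell
# Switch an IPoIB interface between datagram and connected mode.
# Arguments: interface name, mode, optional sysfs root (defaults to /sys/class/net).
set_ipoib_mode() {
    local iface mode root file
    iface=$1
    mode=$2
    root=${3:-/sys/class/net}
    case $mode in
        datagram|connected) ;;
        *) echo "mode must be 'datagram' or 'connected'" >&2; return 1 ;;
    esac
    file="$root/$iface/mode"
    if [ ! -w "$file" ]; then
        echo "cannot write $file (missing interface or not root?)" >&2
        return 1
    fi
    echo "$mode" > "$file"
    cat "$file"        # read back what is now in effect
}

# Usage (as root):
#   set_ipoib_mode ib0 datagram
```

Reading the file back after the write shows the value that is now stored, which is worth comparing against the mode that was requested.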

Re: MHGH28-XTC not working


Like I said, I'm a newbie when it comes to routing / IP forwarding. How would I get that to work?

Very stupid MHGH28-XTC question



Hi

 

Can you load an Ethernet driver for these cards?  I've bought a couple of Sun badged ones and flashed them back to Mellanox firmware.  As an IB card it looks fine but I really wanted Ethernet support so I could attach it to an Ethernet switch.  There are some drivers in the package but loading them results in a "code 10" within Windows (2012).  I've tried the 2008r2 drivers with similar results. So just asking if I am being particularly thick here before I troubleshoot more!

 

Thanks
Drew

Re: Very stupid MHGH28-XTC question


You need ConnectX-3 cards for the right firmware to work on Server 2012. The ConnectX-2 cards are not supported.

 

Mike
