I have an SR-IOV enabled installation of OpenStack Kilo with RHEL 7.1 compute nodes.
I followed this document (Mellanox-Neutron-Kilo-Redhat-InfiniBand - OpenStack) and it only partially works.
I can see ib0 attached to the VM (I logged in from the console and ran "lspci" and "ip link"),
but ib0 never links up.
I can make the interface work by attaching the VF with libvirt directly (not through Nova), so I suspect the problem lies in Nova, eswitchd, or neutron-mlnx-agent.
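For reference, this is roughly what the working direct libvirt attach looks like; the instance domain name, MAC address, and PCI address of the VF are placeholders from my setup:

# VF passthrough definition (all values are placeholders)
cat > vf-ib0.xml <<'EOF'
<interface type='hostdev' managed='yes'>
  <mac address='fa:16:3e:00:00:01'/>
  <source>
    <address type='pci' domain='0x0000' bus='0x05' slot='0x00' function='0x1'/>
  </source>
</interface>
EOF
virsh attach-device <instance-domain-name> vf-ib0.xml --config --live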
The error logs are below.
・On the compute node
1. dmesg of the VM
the message "ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22" appears many times after this.
I find it strange that "Bringing up interface ib0: [ OK ]" displays without " ADDRCONF(NETDEV_CHANGE): ib0: link becomes ready"
mlx4_core: Mellanox ConnectX core driver v3.1-1.0.3 (29 Sep 2015)
mlx4_core: Initializing 0000:00:05.0
mlx4_core 0000:00:05.0: Detected virtual function - running in slave mode
mlx4_core 0000:00:05.0: Sending reset
mlx4_core 0000:00:05.0: Sending vhcr0
mlx4_core 0000:00:05.0: Requested number of MACs is too much for port 1, reducing to 64
mlx4_core 0000:00:05.0: HCA minimum page size:512
mlx4_core 0000:00:05.0: Timestamping is not supported in slave mode
mlx4_core: device is working in RoCE mode: Roce V1
mlx4_core: gid_type 1 for UD QPs is not supported by the devicegid_type 0 was chosen instead
mlx4_core: UD QP Gid type is: V1
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
<mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v3.1-1.0.3 (29 Sep 2015)
<mlx4_ib> check_flow_steering_support: Device managed flow steering is unavailable for IB port in multifunction env.
mlx4_core 0000:00:05.0: mlx4_ib_add: allocated counter index 6 for port 1
mlx4_core 0000:00:05.0: mlx4_ib_add: allocated counter index 7 for port 2
microcode: CPU0 sig=0x206a1, pf=0x1, revision=0x1
platform microcode: firmware: requesting intel-ucode/06-2a-01
microcode: CPU1 sig=0x206a1, pf=0x1, revision=0x1
platform microcode: firmware: requesting intel-ucode/06-2a-01
Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[ OK ]
mlx4_core 0000:00:05.0: mlx4_ib: multi-function enabled
mlx4_core 0000:00:05.0: mlx4_ib: operating in qp1 tunnel mode
knem 1.1.2.90mlnx: initialized
Setting hostname cbv-lsf4.novalocal: [ OK ]
Setting up Logical Volume Management: 7 logical volume(s) in volume group "rootvg" now active
[ OK ]
Checking filesystems
Checking all file systems.
[/sbin/fsck.ext4 (1) -- /] fsck.ext4 -a /dev/mapper/rootvg-lv_root
/dev/mapper/rootvg-lv_root: clean, 134671/4915200 files, 2194674/19660800 blocks
Entering non-interactive startup
Calling the system activity data collector (sadc)...
Starting monitoring for VG rootvg: 7 logical volume(s) in volume group "rootvg" monitored
[ OK ]
pps_core: LinuxPPS API ver. 1 registered
pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
PTP clock support registered
mlx4_en: Mellanox ConnectX HCA Ethernet driver v3.1-1.0.3 (29 Sep 2015)
card: mlx4_0, QP: 0xa78, inline size: 120
Default coalesing params for mtu:4092 - rx_frames:88 rx_usecs:16
card: mlx4_0, QP: 0xa80, inline size: 120
Default coalesing params for mtu:4092 - rx_frames:88 rx_usecs:16
Loading HCA driver and Access Layer:[ OK ]
NOHZ: local_softirq_pending 08
ADDRCONF(NETDEV_UP): ib0: link is not ready
ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22
ip6tables: No config file.[WARNING]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0:
Determining IP information for eth0...ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22
done.
[ OK ]
Bringing up interface ib0: [ OK ]
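For completeness, these are the checks I run on the compute node while the VM is in this state. mlx4_0 and ib0 are the PF names on my nodes, and the sysfs path is the one from the MLNX_OFED SR-IOV documentation, so it may differ on other driver versions:

# PF port state (should be Active / LinkUp)
ibstat mlx4_0
# VFs as seen from the PF
ip link show ib0
# Per-VF administrative GUIDs for port 1 (path may vary by OFED version)
cat /sys/class/infiniband/mlx4_0/iov/ports/1/admin_guids/*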
2. /var/log/neutron/eswitchd
2016-08-25 13:21:35,989 DEBUG eswitchd [-] Handling message - {u'action': u'get_vnics', u'fabric': u'*'}
2016-08-25 13:21:35,989 DEBUG eswitchd [-] fabrics =['default']
2016-08-25 13:21:35,989 DEBUG eswitchd [-] vnics are {u'fa:16:3e:7d:d7:87': {'mac': u'fa:16:3e:7d:d7:87', 'device_id': u'afab526e-da36-44ee-8f5e-8743451bc8a4'}, 'fa:16:3e:d8:dd:a3': {'mac': 'fa:16:3e:d8:dd:a3', 'device_id': '7c5f4c1a-1492-4087-8eee-c54b91cc733b'}, '1a:5c:90:77:4f:88': {'mac': '1a:5c:90:77:4f:88', 'device_id': '0e3e7d62-b88f-4e9b-b685-116280c87f5a'}, u'fa:16:3e:4f:46:de': {'mac': u'fa:16:3e:4f:46:de', 'device_id': u'7b7e8f69-438c-4ec7-95fe-0d59f939fd19'}, 'fe:66:d7:3e:cb:ca': {'mac': 'fe:66:d7:3e:cb:ca', 'device_id': '7d4a002a-cfab-4189-9bff-b656c863592a'}}
2016-08-25 13:21:37,989 DEBUG eswitchd [-] Handling message - {u'action': u'get_vnics', u'fabric': u'*'}
2016-08-25 13:21:37,989 DEBUG eswitchd [-] fabrics =['default']
2016-08-25 13:21:37,990 DEBUG eswitchd [-] vnics are {u'fa:16:3e:7d:d7:87': {'mac': u'fa:16:3e:7d:d7:87', 'device_id': u'afab526e-da36-44ee-8f5e-8743451bc8a4'}, 'fa:16:3e:d8:dd:a3': {'mac': 'fa:16:3e:d8:dd:a3', 'device_id': '7c5f4c1a-1492-4087-8eee-c54b91cc733b'}, '1a:5c:90:77:4f:88': {'mac': '1a:5c:90:77:4f:88', 'device_id': '0e3e7d62-b88f-4e9b-b685-116280c87f5a'}, u'fa:16:3e:4f:46:de': {'mac': u'fa:16:3e:4f:46:de', 'device_id': u'7b7e8f69-438c-4ec7-95fe-0d59f939fd19'}, 'fe:66:d7:3e:cb:ca': {'mac': 'fe:66:d7:3e:cb:ca', 'device_id': '7d4a002a-cfab-4189-9bff-b656c863592a'}}
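(In case the fabric mapping matters: eswitchd only reports the 'default' fabric because of the mapping below. The config file path, section, and PF name are from my setup and may differ in other packagings.)

# Fabric-to-PF mapping configured for eswitchd / the mlnx agent
crudini --set /etc/neutron/plugins/ml2/ml2_conf.ini eswitch physical_interface_mappings default:ib0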
3. /var/log/neutron/mlnx-agent.log
2016-08-25 13:20:16.230 8881 DEBUG oslo_messaging._drivers.amqp [-] UNIQUE_ID is b3adf08a1ac24b8d83eee8f48f0e47aa. _add_unique_id /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqp.py:264
2016-08-25 13:20:17.973 8881 DEBUG networking_mlnx.plugins.ml2.drivers.mlnx.agent.utils [req-e22856b5-392a-4794-ac1e-b34fdf0eb9e1 ] get_attached_vnics get_attached_vnics /usr/lib/python2.7/site-packages/networking_mlnx/plugins/ml2/drivers/mlnx/agent/utils.py:82
2016-08-25 13:20:19.974 8881 DEBUG networking_mlnx.plugins.ml2.drivers.mlnx.agent.utils [req-e22856b5-392a-4794-ac1e-b34fdf0eb9e1 ] get_attached_vnics get_attached_vnics /usr/lib/python2.7/site-packages/networking_mlnx/plugins/ml2/drivers/mlnx/agent/utils.py:82
2016-08-25 13:20:21.974 8881 DEBUG networking_mlnx.plugins.ml2.drivers.mlnx.agent.utils [req-e22856b5-392a-4794-ac1e-b34fdf0eb9e1 ] get_attached_vnics get_attached_vnics /usr/lib/python2.7/site-packages/networking_mlnx/plugins/ml2/drivers/mlnx/agent/utils.py:82
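This is how I verify the Neutron port binding for the instance's IB port; the port UUID is a placeholder, and for the SR-IOV case I expect binding:vnic_type to be 'direct':

# Binding details of the instance's IB port (UUID is a placeholder)
neutron port-show <port-uuid> | grep binding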
I think this is a similar issue to the one described in this post, but I can't figure out what to do next:
Mellanox eSwitchd issue on Openstack Havana nova-compute
Could you give me any ideas?