David
I assume you have dual port IB cards.
You should:
- connect each IB card to both switches (one port per switch)
- define one pkey per subnet (we have the default partition and 4 pkeys, 60 to 63)
Example for partition.conf:
# Default partition
Default=0x7fff, ipoib, mtu=4 : ALL=full, SELF=full ;
# Naming convention: key<VLAN tag>, e.g. key60=0x803C, where 60 (decimal) = 3C (hexadecimal)
key60=0x803C, ipoib, mtu=4 : ALL=full;
key61=0x803D, ipoib, mtu=4 : ALL=full;
key62=0x803E, ipoib, mtu=4 : ALL=full;
key63=0x803F, ipoib, mtu=4 : ALL=full;
- define a virtual switch in ESX, standard or distributed does not matter
- define in ESX the first port as active and the second port as standby (this is important)
- define one portgroup per subnet and use the pkey number as the VLAN ID (e.g. 60 for key60)
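The naming convention in the partition example can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the pkey is the VLAN tag with the full-membership bit (0x8000) set, which matches the values above.

```python
# Sketch only: helper names are hypothetical, not part of OpenSM or ESX.
FULL_MEMBER_BIT = 0x8000  # top bit of the 16-bit pkey marks full membership


def pkey_for_vlan(vlan_tag: int) -> int:
    """Return the full-membership pkey for a VLAN tag, e.g. 60 -> 0x803C."""
    if not 1 <= vlan_tag <= 4094:  # usable 802.1Q VLAN ID range
        raise ValueError(f"VLAN tag out of range: {vlan_tag}")
    return FULL_MEMBER_BIT | vlan_tag


def partition_line(vlan_tag: int) -> str:
    """Format one partition entry in the same style as the example above."""
    return f"key{vlan_tag}=0x{pkey_for_vlan(vlan_tag):04X}, ipoib, mtu=4 : ALL=full;"


if __name__ == "__main__":
    for tag in range(60, 64):
        print(partition_line(tag))
```

Run as a script, it prints the four key60 to key63 lines from the partition example.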
This works for us in our clustered storage setup with Solaris 11.2 and corosync/pacemaker.
We use 3 subnets: storage, vmotion, backup.
Each of the 4 IB switches is connected to 2 other switches, forming a mesh of 4 switches.
If you use just one port per subnet, you disable failover between ports.
And I guess you want redundancy and automatic failover if you use 2 switches.
If you want traffic over all ports, you can try changing the active/standby order per portgroup (= subnet = VLAN).
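To illustrate what changing the order per portgroup could look like, here is a small planning sketch in Python. The uplink names vmnic_ib0 and vmnic_ib1 are placeholders I made up, and the script only prints a plan; it does not configure anything in ESX.

```python
# Sketch only: uplink names are hypothetical placeholders, nothing is applied to ESX.
def failover_plan(portgroups, uplinks):
    """Alternate which of the two uplinks is active for each portgroup."""
    first, second = uplinks
    plan = {}
    for index, portgroup in enumerate(portgroups):
        # Even-numbered portgroups use the first port, odd-numbered the second.
        if index % 2 == 0:
            active, standby = first, second
        else:
            active, standby = second, first
        plan[portgroup] = {"active": active, "standby": standby}
    return plan


plan = failover_plan(["storage", "vmotion", "backup"], ("vmnic_ib0", "vmnic_ib1"))

if __name__ == "__main__":
    for portgroup, roles in plan.items():
        print(f"{portgroup}: active={roles['active']} standby={roles['standby']}")
```

Each portgroup still has one active and one standby port, so failover keeps working, but the idle port of one subnet carries the traffic of another.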
We use NFS over IPoIB, not iSCSI or SRP, with datastore sizes between 5 and 170 TB.
NFS is simpler and fast enough with IPoIB.
Andreas