Thanks, Bmac!
My Mellanox card isn't IPoIB though. It's an actual 40Gbps Ethernet HCA. I'm sure the silicon is the same as the IB cards, but it's hard-wired to function as a pure Ethernet controller, with the proper header and MAC address of an Ethernet controller.
I appreciate the test suggestion, but I'm really only interested in getting my 40GbE cards working as efficiently as they can in Windows using TCP/IP. 1.3GB/s is pretty crap considering 40GbE is theoretically 5GB/s and PCIe3 has a max throughput of 6.5GB/s.
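For anyone checking my numbers, here's a rough sketch of the arithmetic. It only divides the line rate by 8 and ignores Ethernet/TCP framing overhead, so the real ceiling is somewhat lower; the function names are just made up for illustration.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# Assumption: raw line rate only, no Ethernet/IP/TCP framing overhead.

LINK_GBIT_PER_S = 40.0        # 40GbE line rate
MEASURED_GBYTE_PER_S = 1.3    # throughput I'm actually seeing


def line_rate_gbyte_per_s(gbit_per_s: float) -> float:
    """Convert a line rate in Gbit/s to GB/s (8 bits per byte)."""
    return gbit_per_s / 8.0


def utilization(measured_gbyte_per_s: float, gbit_per_s: float) -> float:
    """Fraction of the theoretical line rate that is actually achieved."""
    return measured_gbyte_per_s / line_rate_gbyte_per_s(gbit_per_s)


if __name__ == "__main__":
    ceiling = line_rate_gbyte_per_s(LINK_GBIT_PER_S)
    used = utilization(MEASURED_GBYTE_PER_S, LINK_GBIT_PER_S)
    print(f"theoretical ceiling: {ceiling:.1f} GB/s")
    print(f"link utilization:    {used:.0%}")
```

So I'm getting roughly a quarter of what the link should be able to carry.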
My rendering software uses mapped network drives so that's what I'm testing against.
I do plan on migrating to CentOS in the future but for now Windows will have to do.
I'll definitely try datagram mode, though I already tried larger MTUs and they didn't change anything.
I updated the firmware on both cards and installed Windows Server 2012 on my file server, as the 4.3 drivers have a lot more options than the 3.2 ones do in Win2008R2.
There are three performance tuning options in the 4.2 drivers: Single Port, Multicast, and Single Stream. Which would be best suited for my type of application? I'm basically accessing large (500MB+) binary scene files and lots of textures (5-200MB images) off of a RAID array.