Hmm, strange.
Have you tried turning TCP offload OFF in the driver? That used to make some difference for me with 20Gb cards using Ethernet. They used to achieve 1800MB/s, so you should get up to 3600MB/s (3.6GB/s) on 40Gb.
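To show where the 3600MB/s figure comes from, here is a rough sketch of the arithmetic in Python. It assumes throughput scales linearly with link speed and uses the nominal line rate only (no encoding or protocol overhead), so treat the result as a ceiling to aim for, not a promise. The function names are just made up for illustration.

```python
def line_rate_mb_per_s(link_gbit_per_s):
    """Nominal line rate in MB/s (decimal units, ignores all overhead)."""
    return link_gbit_per_s * 1000 / 8


def scaled_throughput(measured_mb_per_s, measured_link_gbit, target_link_gbit):
    """Scale a measured throughput linearly to a different link speed (assumption)."""
    return measured_mb_per_s * target_link_gbit / measured_link_gbit


# Figures from the post: 1800MB/s seen on 20Gb cards.
efficiency = 1800 / line_rate_mb_per_s(20)   # fraction of nominal line rate used
expected_40gb = scaled_throughput(1800, 20, 40)

print(f"20Gb nominal line rate: {line_rate_mb_per_s(20):.0f} MB/s")
print(f"Efficiency at 1800MB/s: {efficiency:.0%}")
print(f"Expected on 40Gb:       {expected_40gb:.0f} MB/s")
```

If you are seeing far less than that on 40Gb, the bottleneck is most likely somewhere other than the link itself.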
I'm guessing you're not using a 40Gb Mellanox managed switch either, as I believe there are many benefits to doing so, like collision management etc.
Well, not multicast; either single port or single stream. But considering you're connecting via a Windows mapped network drive, which uses CIFS, that's your problem right there. CIFS just can't carry that much data.
You need a SCSI transport like iSCSI, and then connect Windows using the iSCSI initiator over your IB IP network. 20Gb InfiniBand cards on firmware 2.7 and later get 1800MB/s read/write over iSCSI on an SCST target.
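If you want to compare the CIFS mapped drive against an iSCSI volume yourself, a rough sequential write test like the one below is enough to see the difference. This is only a sketch: the function name and the example drive letters are hypothetical, it uses decimal megabytes, and it measures a single stream with 1MB blocks, so it is not a replacement for a proper benchmark tool.

```python
import os
import time


def write_throughput_mb_per_s(path, total_mb=1024, block_mb=1):
    """Write total_mb of random data to path and return MB/s (single stream)."""
    block = os.urandom(block_mb * 1_000_000)
    blocks = total_mb // block_mb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the data has actually left the OS cache
    elapsed = time.perf_counter() - start
    return blocks * block_mb / elapsed


if __name__ == "__main__":
    # Hypothetical paths: Z: is the CIFS mapped drive, E: is the iSCSI volume.
    for target in (r"Z:\bench.bin", r"E:\bench.bin"):
        try:
            print(target, f"{write_throughput_mb_per_s(target):.0f} MB/s")
            os.remove(target)
        except OSError as err:
            print(target, "skipped:", err)
```

Run it against a file larger than the RAM cache on the target, otherwise the numbers will look better than the disks really are.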
Again, I'd seriously suggest that if you want real performance, forget Windows of any flavour as a SAN target. It is very bad at target mode.
Use Ubuntu to set up SCST. It takes 30 mins. No need to recompile the kernel; it works just as well without that.
Install Ubuntu 12.10.
Then follow this doc: http://www.zimbio.com/Ubuntu+Linux/articles/5vq_mlaTjIT/How+To+Install+SCST+on+Ubuntu
It's a bit fiddly understanding how to set up SCST, but I can send you the commands if you want. That will get you much better speed.
Also make sure you use an LSI RAID card or something better than mobo RAID.
Cheers