I'm looking for a little advice on "hacking" Mono (and in fact, .NET too).
Context: As part of the Isis2 library (Isis2.codeplex.com) I want to support very fast "zero copy" replication of memory-mapped files on machines that have the right sort of hardware (InfiniBand NICs), and minimal copying for more standard Ethernet with UDP. So the setup is this: We have a set of processes {A, B, ...} all linked to Isis2, and some member, maybe A, has a big memory-mapped file, call it F, and asks Isis2 to please replicate F onto B, D, G, and X. The library will do this very efficiently and very rapidly, even with heavy use by many concurrent initiators. The idea would be to offer this to HPC and cloud developers who are running big-data applications.
Now, Isis2 is coded in C# on .NET and cross-compiles to Linux via Mono. Both .NET and Mono are managed, so neither wants to let me do zero-copy network I/O -- the normal model would be "copy your data into a managed byte[] object, then use SendTo or SendAsync to send. To receive, same deal: Receive or ReceiveAsync into a byte[] object, then copy to the target location in the file." This will be slower than what the hardware can sustain.
Turns out that on .NET I can hack around the normal memory protections. I built my own mapped file wrapper (in fact based on one posted years ago by a researcher at Columbia). I pull in the Win32 kernel DLL (kernel32.dll) via P/Invoke, and then use Win32 methods to map my file, initiate the socket Send and Receive calls, etc. With a bit of hacking I can mimic .NET asynchronous I/O this way, and I end up with something fairly clean and coded entirely in C# with nothing .NET even recognizes as unsafe code. I get to treat my mapped file as a big unmanaged byte array, avoiding all that unneeded copying. Obviously I'll hide all of this from my Isis2 users; they won't know.
Now we get to the crux of my question: on Linux, I obviously can't load the Win32 kernel DLL since it doesn't exist. So I need to implement some basic functionality using core Linux O/S calls: the mmap() call will map my file. Linux has its own form of asynchronous I/O too: for InfiniBand, I'll use the Verbs library from Mellanox, and for UDP, I'll work with raw IP sends and signals ("interrupts") on completion. Ugly, but I can get this to work, I think. Again, I'll then try to wrap all this to look as much like standard Windows asynchronous I/O as possible, for code cleanliness in Isis2 itself, and I'll hide the whole unmanaged, unsafe mess from end users.
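To make the Linux side concrete, here is a minimal sketch of the mapping step using the POSIX mmap() call. It is written in C because these are the calls a Mono P/Invoke wrapper would ultimately bind to. The helper names map_file and unmap_file are hypothetical, and creating and sizing the file inside the helper is an assumption made only to keep the example self-contained.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read/write, creating it if needed and
 * sizing it to `len` bytes. Returns the base address, or NULL on failure. */
unsigned char *map_file(const char *path, size_t len)
{
    assert(path != NULL && len > 0);   /* mmap() rejects zero-length maps */

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED so that stores through the pointer reach the file itself. */
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the descriptor is closed */
    return base == MAP_FAILED ? NULL : (unsigned char *)base;
}

/* Flush dirty pages to the file, then drop the mapping. 0 on success. */
int unmap_file(unsigned char *base, size_t len)
{
    if (msync(base, len, MS_SYNC) != 0)
        return -1;
    return munmap(base, len);
}
```

From C#, the returned pointer would come back as an IntPtr, which is what makes it possible to hand the region to send and receive calls without first copying into a managed byte[].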
Since I'll be sending a gigabyte or so at a time, in chunks, one key goal is that data sent in order would ideally be received in the order I post my async receives. Obviously I do have to worry about unreliable communication (packets can be dropped, and when that happens I'll have to copy). But if nothing is dropped I want the n'th chunk I send to end up in the n'th receive region...
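For the UDP case, one way to sketch this "n'th chunk lands in the n'th region" idea is with scatter/gather I/O: recvmsg() puts a small sequence header into a local variable and the payload straight into the expected slot of the mapped region. The wire format (a 4-byte sequence number followed by the payload) and the names send_chunk and recv_chunk are assumptions for illustration only. The sketch is synchronous and leaves out retransmission entirely.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send chunk number `seq` of the region at `base` as a single datagram:
 * 4-byte big-endian sequence header, then `chunk` payload bytes. */
int send_chunk(int sock, const unsigned char *base, size_t chunk, uint32_t seq)
{
    uint32_t wire = htonl(seq);
    struct iovec iov[2] = {
        { .iov_base = &wire, .iov_len = sizeof wire },
        { .iov_base = (void *)(base + (size_t)seq * chunk), .iov_len = chunk },
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };
    ssize_t n = sendmsg(sock, &msg, 0);
    return n == (ssize_t)(sizeof wire + chunk) ? 0 : -1;
}

/* Receive one datagram with the payload aimed at slot `expected`.
 * Fast path: the header matches and no copy is made. If an earlier chunk
 * was lost, the payload is moved to its real slot (the fallback copy).
 * Returns the sequence number received, or -1 on error. */
long recv_chunk(int sock, unsigned char *base, size_t chunk,
                uint32_t nchunks, uint32_t expected)
{
    if (expected >= nchunks)
        return -1;

    uint32_t wire = 0;
    unsigned char *slot = base + (size_t)expected * chunk;
    struct iovec iov[2] = {
        { .iov_base = &wire, .iov_len = sizeof wire },
        { .iov_base = slot, .iov_len = chunk },
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };

    ssize_t n = recvmsg(sock, &msg, 0);
    if (n != (ssize_t)(sizeof wire + chunk))
        return -1;

    uint32_t seq = ntohl(wire);
    if (seq >= nchunks)
        return -1;                       /* never write outside the region */
    if (seq != expected)
        memmove(base + (size_t)seq * chunk, slot, chunk);
    return (long)seq;
}
```

When a chunk arrives out of place, the expected slot briefly holds the wrong data, so the caller would have to keep that slot marked as missing until the real chunk is retransmitted.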
So here's my question: Has anyone already done this? Does anyone have any tips on how Mono implements the asynchronous I/O calls that .NET uses so heavily? I should presumably do it the same way. And does anyone have any advice on how to do this with minimal pain?
One more question: Win32 is limited to about 2 GB of mapped files, because that is all the address space a 32-bit process gets. Cloud systems would often run Win64. Any suggestions on how to maximize interoperability while allowing full use of Win64 for those who are running that? (A kind of O/S reflection issue...)
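One approach I am considering, sketched here under assumptions, is to keep a single code path on both word sizes by mapping bounded windows of the file rather than the whole thing, and simply choosing a much larger window on 64-bit. The sketch below shows the idea with POSIX mmap(), whose file offset must be page-aligned; on Windows, MapViewOfFile likewise takes a file offset. The struct window type and the names map_window, unmap_window and window_demo are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* A mapped view of part of a file. `data` points at the requested offset;
 * map_base/map_len describe the page-aligned mapping that contains it.
 * Note: a 32-bit Linux build would need -D_FILE_OFFSET_BITS=64 so that
 * off_t can express offsets beyond 2 GB. */
struct window {
    void *map_base;
    size_t map_len;
    unsigned char *data;
};

/* Map `len` bytes of `fd` starting at an arbitrary byte `offset`. */
int map_window(int fd, uint64_t offset, size_t len, struct window *w)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    /* mmap() needs a page-aligned file offset, so round down and
     * remember how many extra leading bytes that adds. */
    uint64_t aligned = offset - (offset % (uint64_t)page);
    size_t slack = (size_t)(offset - aligned);

    void *p = mmap(NULL, len + slack, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)aligned);
    if (p == MAP_FAILED)
        return -1;

    w->map_base = p;
    w->map_len = len + slack;
    w->data = (unsigned char *)p + slack;
    return 0;
}

int unmap_window(struct window *w)
{
    return munmap(w->map_base, w->map_len);
}

/* Demo: write one byte through a window, read it back through another. */
int window_demo(const char *path)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (page <= 0 || ftruncate(fd, (off_t)(4 * page)) != 0) {
        close(fd);
        return -1;
    }

    struct window w;
    uint64_t off = (uint64_t)page + 100;   /* deliberately unaligned */
    int ok = 0;
    if (map_window(fd, off, 16, &w) == 0) {
        w.data[0] = 0x5A;
        if (unmap_window(&w) == 0 && map_window(fd, off, 16, &w) == 0) {
            ok = (w.data[0] == 0x5A);
            unmap_window(&w);
        }
    }
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

On the C# side the window size could be picked at run time from IntPtr.Size (4 on a 32-bit process, 8 on a 64-bit one), which is the "reflection" part of the question.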