I'm working on a failover mechanism for TCP connections: if a host breaks down (hardware failure), I'd like another machine to be able to take over its connections. The idea is to periodically stream the state of the "live" socket to a "backup" host and have the backup take over (TCP_REPAIR and all) when the "live" host breaks.
I have a prototype built on libsoccr and it works OK, except that I have to pause the socket for every checkpoint. Depending on the buffer sizes, the pause lasts from hundreds of microseconds up to 1-2 ms. That is a problem for my application, because I dump the socket state quite often (roughly every 10 ms), so in the worst case the socket spends 10-20% of the time paused.
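For reference, a minimal sketch of the same kind of checkpoint done with raw syscalls instead of libsoccr (not the prototype itself) shows where the pause comes from. It assumes Linux 3.5 or later, CAP_NET_ADMIN and an established socket; `tcp_checkpoint`, `dump_queue` and `struct tcp_ckpt` are hypothetical names, not part of libsoccr. It saves only the two queue sequence numbers and the peeked queue contents; a full dump such as libsoccr's also covers window parameters, MSS, window scaling and timestamps, which are left out here. Everything between switching TCP_REPAIR on and off is time the socket spends paused, and copying the queues out is presumably why that time grows with the buffer sizes.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

#ifndef TCP_REPAIR
#define TCP_REPAIR 19
#endif
#ifndef TCP_REPAIR_QUEUE
#define TCP_REPAIR_QUEUE 20
#endif
#ifndef TCP_QUEUE_SEQ
#define TCP_QUEUE_SEQ 21
#endif
/* Same values as the kernel's TCP_RECV_QUEUE / TCP_SEND_QUEUE. */
#define CKPT_RECV_QUEUE 1
#define CKPT_SEND_QUEUE 2

/* Hypothetical checkpoint record (sketch only, not libsoccr's layout). */
struct tcp_ckpt {
    uint32_t inq_seq;   /* rcv_nxt: seq just past the last in-order byte received */
    uint32_t outq_seq;  /* write_seq: seq just past the last byte queued by the app */
    int inq_len, outq_len;
    char *inq, *outq;   /* peeked queue contents (malloc'd, may be NULL) */
    long long pause_ns; /* time spent with TCP_REPAIR switched on */
};

static int set_repair(int fd, int on) {
    return setsockopt(fd, IPPROTO_TCP, TCP_REPAIR, &on, sizeof(on));
}

/* Select a queue, read its sequence number and peek `len` bytes from it. */
static int dump_queue(int fd, int queue, uint32_t *seq, int len, char **buf) {
    socklen_t sl = sizeof(*seq);
    *buf = NULL;
    if (setsockopt(fd, IPPROTO_TCP, TCP_REPAIR_QUEUE, &queue, sizeof(queue)) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_QUEUE_SEQ, seq, &sl) < 0)
        return -1;
    if (len == 0)
        return 0;
    if (!(*buf = malloc((size_t)len)))
        return -1;
    /* In repair mode, MSG_PEEK reads the selected queue without consuming it. */
    ssize_t n = recv(fd, *buf, (size_t)len, MSG_PEEK | MSG_DONTWAIT);
    if (n != len) {
        if (n >= 0)
            errno = EIO; /* unexpected short read */
        return -1;
    }
    return 0;
}

static long long now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

void tcp_ckpt_free(struct tcp_ckpt *ck) {
    free(ck->inq);
    free(ck->outq);
    ck->inq = ck->outq = NULL;
}

/* Returns 0 on success, -1 with errno set (EPERM without CAP_NET_ADMIN). */
int tcp_checkpoint(int fd, struct tcp_ckpt *ck) {
    int err = 0;
    memset(ck, 0, sizeof(*ck));
    long long t0 = now_ns();
    if (set_repair(fd, 1) < 0)
        return -1;
    /* From here until repair mode is switched off, the socket is paused. */
    if (ioctl(fd, FIONREAD, &ck->inq_len) < 0 ||   /* same as SIOCINQ */
        ioctl(fd, TIOCOUTQ, &ck->outq_len) < 0 ||  /* same as SIOCOUTQ */
        dump_queue(fd, CKPT_RECV_QUEUE, &ck->inq_seq, ck->inq_len, &ck->inq) < 0 ||
        dump_queue(fd, CKPT_SEND_QUEUE, &ck->outq_seq, ck->outq_len, &ck->outq) < 0)
        err = errno;
    if (set_repair(fd, 0) < 0 && !err)
        err = errno;
    ck->pause_ns = now_ns() - t0;
    if (err) {
        tcp_ckpt_free(ck);
        errno = err;
        return -1;
    }
    return 0;
}
```

The resulting record (plus the two buffers) is what would be shipped to the backup host on every checkpoint; restoring it there on a fresh socket is not shown.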
I'd like to checkpoint a TCP socket (via libsoccr if that's the way to go; I'm also OK with raw syscalls if necessary) without pausing it. Is it possible to "fork" or duplicate a TCP socket together with its complete state, using some kind of copy-on-write (CoW), so that the live socket isn't paused?
Would fork() help here? Any ideas?