Checkpointing (snapshot/resume) library on Windows?


Quoting Wikipedia, checkpointing "basically consists of storing a snapshot of the current application state, and later on, use it for restarting the execution in case of failure."

I need to checkpoint and resume a C++ scientific application (that we wrote). The program is single-threaded and has no dependence on other running applications: no GUI, no networking, no pipes, no fork, etc. All it does is perform calculations and file I/O.

On Linux, DMTCP works perfectly well for me. It does not even require modifications to the source code or re-linking. BLCR and Condor support checkpointing on Linux as well.

In the near future, I'll have to run the application on Windows. I searched around and couldn't find any checkpointing library for Windows. I could, in principle, modify the application so that it dumps its state onto the disk upon request, and reloads the data the next time it runs. However, due to the complexity of the application, this requires a lot of effort even with the help of serialization libraries.
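To make the manual approach concrete, here is a minimal sketch of what "dump the state and reload it on the next run" could look like. `SimState`, `save_checkpoint`, and `load_checkpoint` are hypothetical names invented for this illustration, and the raw binary layout assumes the file is read back on the same machine and compiler. The real application's state is far more complex than this, which is exactly the problem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical application state; a real program would have to list every
// value needed to continue the computation.
struct SimState {
    std::uint64_t step = 0;
    std::vector<double> values;
};

// Write the state to a temporary file first, then move it into place, so a
// crash while writing cannot leave a half-written checkpoint under `path`.
bool save_checkpoint(const SimState& s, const std::string& path) {
    assert(!path.empty());
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (!out) return false;
        const std::uint64_t n = s.values.size();
        out.write(reinterpret_cast<const char*>(&s.step), sizeof s.step);
        out.write(reinterpret_cast<const char*>(&n), sizeof n);
        out.write(reinterpret_cast<const char*>(s.values.data()),
                  static_cast<std::streamsize>(n * sizeof(double)));
        if (!out) return false;
    }  // file is closed here, before it is renamed
    // The Windows C runtime's rename() fails if the target exists, so remove
    // it first. This pair of calls is not atomic: a crash in between leaves
    // the data only in the ".tmp" file.
    std::remove(path.c_str());
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Returns false and leaves `s` untouched if the file is missing or truncated.
bool load_checkpoint(SimState& s, const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return false;
    SimState loaded;
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&loaded.step), sizeof loaded.step);
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    if (!in) return false;
    loaded.values.resize(static_cast<std::size_t>(n));
    in.read(reinterpret_cast<char*>(loaded.values.data()),
            static_cast<std::streamsize>(n * sizeof(double)));
    if (!in) return false;
    s = std::move(loaded);
    return true;
}
```

At startup the program would call `load_checkpoint` and fall back to a fresh `SimState` if it returns false.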

So, is there any C/C++ checkpointing library for Windows? It's perfectly fine if the library requires modifications to my code. Ideally the library would allow me to checkpoint upon request (e.g. by sending a signal/message), instead of only being able to save the state at particular points in the code.
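For comparison, the closest I can get by hand to "checkpoint upon request" is a flag that a signal handler sets and the main loop polls at safe points, roughly as sketched below. All names here (`g_checkpoint_requested`, `on_checkpoint_signal`, `run_with_checkpoints`) are hypothetical, and the choice of `SIGINT` is only an assumption for illustration. The Windows C runtime supports only a handful of signals and, as far as I know, delivers Ctrl+C on a separate thread, which is why the flag is a lock-free atomic. Note that the state is still only saved at particular points in the code, just not at every one of them.

```cpp
#include <atomic>
#include <cassert>
#include <csignal>
#include <cstdint>
#include <vector>

static_assert(ATOMIC_BOOL_LOCK_FREE == 2,
              "the flag must be lock-free to be set from a signal handler");

// Set by the handler, consumed by the compute loop.
std::atomic<bool> g_checkpoint_requested{false};

void on_checkpoint_signal(int sig) {
    // Re-install the handler: some C runtimes (MSVC among them) reset the
    // disposition to SIG_DFL once a signal has been delivered.
    std::signal(sig, on_checkpoint_signal);
    g_checkpoint_requested.store(true);
}

// Hypothetical compute loop. `do_step(i)` performs one unit of work and
// `save(i)` writes a checkpoint; both are supplied by the caller.
// Returns the step indices at which a checkpoint was taken.
template <class StepFn, class SaveFn>
std::vector<std::uint64_t> run_with_checkpoints(std::uint64_t steps,
                                                StepFn do_step, SaveFn save) {
    const auto previous = std::signal(SIGINT, on_checkpoint_signal);
    assert(previous != SIG_ERR);
    (void)previous;

    std::vector<std::uint64_t> checkpointed_at;
    for (std::uint64_t i = 0; i < steps; ++i) {
        do_step(i);
        // Safe point: the state is consistent between two steps.
        if (g_checkpoint_requested.exchange(false)) {
            save(i);
            checkpointed_at.push_back(i);
        }
    }
    return checkpointed_at;
}
```

The handler only records the request; the actual write happens in the loop, because file I/O is not safe inside a signal handler.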

(I'm aware of similar questions that say checkpointing is not generally possible. However, it is possible in my case, and I've been doing it all the time on Linux.)

There is 1 answer below

Try building your program as a shared library and calling it from inside Factor or SBCL. Then use the built-in checkpointing capability of either.