I've got a process that needs to run 24/7 on multiple servers. On some of the servers it runs fine, but on others it stops after a few hours without reporting anything. Is there anything I can do to monitor the PIDs so that I can find out the exact time the process stops and get some information about why?
Thanks.
What you might be looking for is supervisord. It's generally used to restart processes automatically when they crash, but with a little configuration or extension it can also do a lot more, such as controlling the logging of stdout, stderr, and the return value.
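If you just want to see when and how the process dies before setting up supervisord, a small wrapper can record that. This is only a rough sketch, assuming a POSIX system and that you can start the process from the wrapper rather than attaching to an already running PID. The names `describe_exit` and `run_and_report` are made up for this example; they are not part of supervisord.

```python
import datetime
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable reason."""
    if returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited with status {returncode}"


def run_and_report(cmd, log=sys.stderr):
    """Run cmd, wait for it to finish, and log when and how it stopped."""
    proc = subprocess.Popen(cmd)
    started = datetime.datetime.now().isoformat()
    print(f"{started} started pid {proc.pid}: {cmd}", file=log)

    returncode = proc.wait()  # blocks until the child exits

    stopped = datetime.datetime.now().isoformat()
    reason = describe_exit(returncode)
    print(f"{stopped} pid {proc.pid} {reason}", file=log)
    return returncode, reason
```

You would call it with the command as a list, e.g. `run_and_report(["/path/to/your/process", "--some-arg"])`, where the path and argument are placeholders. A process that vanishes without any output has often been killed by a signal (for example SIGKILL from the kernel's out-of-memory killer), and in that case the logged reason shows the signal instead of a normal exit status.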