Creating a Java program that runs a script without fail


I am new to ZooKeeper and Apache Curator and need your help to design a program:

I need to create a Java program that will run a script every hour (based on a cron expression provided by the end user).
Consider that I have 3 servers. I need to make sure the script runs every hour without failure, even if a server is down (in that case the script must run on another server). Each hour, the script must run on only one server.
I have to create an interface to provide input to this Java program.
The input will be (i) the script to be run and (ii) the cron expression to schedule the script.

1) Please suggest how I can design my program to achieve this. How can ZooKeeper and Apache Curator be used for it?
2) Is there any way to cache the script that the end user provides on these 3 servers?

Can Apache Curator's NodeCache be used to cache the script on these 3 servers? Your response will be highly appreciated.


There are 2 best solutions below

Best answer

With three servers, where exactly one must run the script no matter what, you need a distributed approach. The problem is that in the event of failures, a server might not be able to work out whether it should run the script or not.

As a starting point, you could have one computer connect to the others and tell them not to run. This is called a "hold down" approach, but it breaks down when that computer cannot reach the others. Most programmers who are new to this underestimate how much a networked environment changes the way programs must be designed. Please take a little time to read about the typical fallacies of distributed computing.

Cron avoids the problem by not caring what happens on other computers, so cron has the wrong design goals for this requirement.

With three computers, you will also have three different clocks, with their own speeds and times. A good distributed solution will have some concept of time that doesn't directly rely on each machine's clock.
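One well-known example of a notion of time that does not depend on any machine's wall clock is a Lamport logical clock. The sketch below is only an illustration of that idea, not something the original answer prescribes; the class name `LamportClock` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Minimal Lamport logical clock: orders events without reading wall-clock time. */
final class LamportClock {
    private final AtomicLong counter = new AtomicLong(0);

    /** Call for a local event or just before sending a message; returns the new timestamp. */
    long tick() {
        return counter.incrementAndGet();
    }

    /** Call when a message arrives carrying the sender's timestamp; returns the new local timestamp. */
    long onReceive(long remoteTimestamp) {
        // Jump ahead of whatever the sender had seen, then count the receive event itself.
        return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    /** Current timestamp, without advancing the clock. */
    long current() {
        return counter.get();
    }
}
```

Each server would keep its own instance and attach `tick()` values to the messages it sends. The resulting timestamps give a consistent ordering of causally related events, but they say nothing about real elapsed time, so they complement an hourly schedule rather than replace it.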

Distributed solutions (if they are to tolerate faults or failures) must be able to run without reliable communication with the other machines. Sometimes the group gets split in half, where one group of machines cannot communicate with the other group. In many cases, both groups will perform the "critical" action for fear that the other group didn't. In other cases, neither group performs the "critical" action, each assuming that the other did. A good solution will ensure that the "critical" action is performed once, even when the computers cannot communicate. Often this is done by "majority": your group (quorum) cannot perform a critical action unless it has access to at least a majority of the involved machines.
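The majority rule can be written down in a few lines. This is a minimal sketch; the class `Quorum` and the method `hasMajority` are hypothetical names, and how a server counts the members it can reach is left out.

```java
/** Majority (quorum) check: a partition may act only if it holds more than half of the cluster. */
final class Quorum {
    private Quorum() {}

    /**
     * @param reachableMembers machines this server can currently reach, counting itself
     * @param clusterSize      total number of machines configured in the cluster
     * @return true when reachableMembers is a strict majority of clusterSize
     */
    static boolean hasMajority(int reachableMembers, int clusterSize) {
        if (clusterSize <= 0 || reachableMembers < 0 || reachableMembers > clusterSize) {
            throw new IllegalArgumentException("invalid membership counts");
        }
        // Strict majority: two halves of an even split can never both pass this test.
        return 2 * reachableMembers > clusterSize;
    }
}
```

With three servers, a partition of two may run the script and a partition of one may not, so at most one side of any split ever acts.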

Look at the Paxos algorithm to get an idea of the issues. Once you are more aware of the problems, look back at your chosen technologies to determine which parts of the problem they attempt to solve, keeping the "fallacies of distributed computing" in mind. Also realize that a perfect, 100% correct solution might not be possible, because the machine selected to run the script might suffer a network failure followed by a power failure, in such a sequence that the machines still up assume there is only a network outage.

Another answer

This is an interview question, right? If yes, be aware that this answer only gets you partway.

The simplest solution is to have all three servers running, each attempting to acquire a lock before performing the processing. See http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Locks

To ensure that only one server runs the job, you will need to record the last execution time. This is simply "store a value with known key," and you'll find it in one of the intro tutorials.
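The "record the last execution" idea can be sketched as an atomic claim on a time slot. The version below is a single-process, in-memory stand-in, assuming the slot is something like an hour number derived from the cron schedule. In a real deployment the map would be replaced by a value stored in ZooKeeper under a known key and updated conditionally; `LastRunRegistry` and `tryClaim` are hypothetical names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a shared "last executed slot" record, keyed by job. */
final class LastRunRegistry {
    private final ConcurrentMap<String, Long> lastRunSlot = new ConcurrentHashMap<>();

    /**
     * Atomically claims a slot for a job.
     *
     * @param jobKey identifies the scheduled script
     * @param slot   monotonically increasing slot number (assumed here: hours since the epoch)
     * @return true only for the first caller that claims a slot newer than the recorded one
     */
    boolean tryClaim(String jobKey, long slot) {
        boolean[] claimed = {false};
        // compute() runs atomically per key, so two callers cannot both win the same slot.
        lastRunSlot.compute(jobKey, (key, previous) -> {
            if (previous == null || previous < slot) {
                claimed[0] = true;
                return slot;
            }
            return previous;
        });
        return claimed[0];
    }
}
```

Each server would call `tryClaim` when its schedule fires and run the script only if the call returns true. This only decides who starts the job; it does not address a script that fails halfway through.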

Of course, if this is an interview question, the interviewer will ask follow-on questions such as "what happens if the script fails halfway through?" or "what if the computers don't have the same time?" You won't (easily) solve either of those problems with ZooKeeper.