Usage of Falcon for Big data processing


I want to process data in HDFS using Falcon (for example, validate a CSV column). I have successfully installed Falcon (Hortonworks Sandbox 2.1, Falcon 0.5.0.2.1.1.0) and am able to submit a job. However, the job is not running, and the UI offers nothing to start or stop the job. I would also like to know how to validate the output of a job and proceed to another job depending on the result of that validation, that is, as a workflow.


There are 2 best solutions below


If you need custom logic, you can create an Oozie workflow and have that workflow submit a Falcon job as its last task.

<process name="sample-process">
...
   <workflow engine="oozie" path="/projects/bootcamp/workflow"/>
...
</process>

https://falcon.apache.org/EntitySpecification.html#Process_Specification
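For the "validate, then proceed" part of the question, the branching would live in the Oozie workflow that the process points to. The following is only a minimal sketch of such a workflow.xml, not a tested configuration. The node names, the scripts validate_csv.sh and process_csv.sh, and the output key `valid` are all hypothetical, and it assumes the validation script prints a line such as `valid=true` to stdout and that `jobTracker` and `nameNode` are made available to the workflow as properties.

```xml
<!-- workflow.xml: illustrative sketch; node names and scripts are hypothetical -->
<workflow-app xmlns="uri:oozie:workflow:0.4" name="validate-then-process">
    <start to="validate-csv"/>

    <!-- Step 1: run the validation script; capture-output exposes the
         key=value lines it prints on stdout to later nodes -->
    <action name="validate-csv">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>validate_csv.sh</exec>
            <file>validate_csv.sh#validate_csv.sh</file>
            <capture-output/>
        </shell>
        <ok to="check-validation"/>
        <error to="fail"/>
    </action>

    <!-- Step 2: continue only if the script reported valid=true -->
    <decision name="check-validation">
        <switch>
            <case to="process-csv">${wf:actionData('validate-csv')['valid'] eq 'true'}</case>
            <default to="fail"/>
        </switch>
    </decision>

    <!-- Step 3: the follow-up job that depends on successful validation -->
    <action name="process-csv">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>process_csv.sh</exec>
            <file>process_csv.sh#process_csv.sh</file>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Validation or processing failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The decision node is what turns the validation result into a branch: a failed check routes to the kill node instead of starting the second job.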

Hope it helps.


You mentioned that the job was submitted. If you are using the Apache Falcon command line, "submit" alone is not enough; the "schedule" command must also be run. In Falcon, submitting a job does not put it into the running state; scheduling it is necessary.

You can refer to http://falcon.apache.org/0.6.1/FalconCLI.html for all the commands.