Using Apache Falcon to set up data replication across clusters

We have been PoC-ing Falcon for our data ingestion workflow. We have a requirement to use Falcon to set up replication between two clusters (feed replication, not mirroring). The problem I have is that the user ID on cluster A is different from the user ID on cluster B. Has anyone used Falcon with this setup? I can't seem to find a way to get this to work.

1) I am setting up replication from Cluster A => Cluster B
2) I am defining the Falcon job on cluster A

At job setup time it looks like I can only define one user ID that owns the job. How do I set up a job where the ID on cluster A is different from the ID on cluster B? Any help would be awesome!!

There are 1 best solutions below

Apache Falcon uses the feed's 'ACL owner', and that user should have write access on the target cluster where the data is to be copied.
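For reference, here is a rough sketch of where the ACL owner sits in a feed entity. The feed name, cluster names (clusterA, clusterB), path, dates, retention limits, and the user falconuser are all made-up placeholders, so adjust them to your environment. Note that the feed carries a single ACL element, which matches what you are seeing at job setup.

```xml
<!-- Illustrative feed entity; every name, path, and date here is a placeholder -->
<feed name="sampleReplicatedFeed" description="replicated from clusterA to clusterB"
      xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <timezone>UTC</timezone>
    <clusters>
        <!-- Source: where the data originates -->
        <cluster name="clusterA" type="source">
            <validity start="2015-01-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="days(7)" action="delete"/>
        </cluster>
        <!-- Target: where the data is copied to -->
        <cluster name="clusterB" type="target">
            <validity start="2015-01-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="days(30)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/data/sample/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
    </locations>
    <!-- One owner per feed: this user needs write access on the target cluster -->
    <ACL owner="falconuser" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
</feed>
```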

The source cluster should have WebHDFS enabled, since that is how the data will be read.
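On Hadoop 2.x, WebHDFS is normally controlled by the dfs.webhdfs.enabled property in hdfs-site.xml on the source cluster. A minimal fragment, assuming you manage that file by hand rather than through a management tool such as Ambari:

```xml
<!-- hdfs-site.xml on the source cluster -->
<property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
</property>
```

The readonly interface in the source cluster's Falcon cluster entity would then typically point at the NameNode's WebHDFS endpoint, but check that against your own cluster definition.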

So if the user does not have write access on the source cluster (which retention requires), don't schedule the feed there.

Hope this helps.