Using Apache Falcon to set up data replication across clusters

164 Views Asked by Shay At 28 June 2025 at 05:27

We have been PoC-ing Falcon for our data ingestion workflow. We have a requirement to use Falcon to set up replication between two clusters (feed replication, not mirroring). The problem I have is that the user ID on cluster A is different from the ID on cluster B. Has anyone used Falcon with this setup? I can't seem to find a way to get this to work.

1) I am setting up replication from Cluster A => Cluster B
2) I am defining the Falcon job on cluster A

At the time of the job setup it looks like I can only define one user ID that owns the job. How do I set up a job where the ID on cluster A is different from the ID on cluster B? Any help would be awesome!!


There is 1 best solution below

Sanjeev On 20 May 2016 at 14:12

Apache Falcon uses the 'ACL owner', which should have write access on the target cluster where the data is to be copied.

The source cluster should have WebHDFS enabled, since that is how the data will be accessed.
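As a rough sketch of what enabling WebHDFS looks like, assuming a Hadoop 2.x source cluster (newer Hadoop releases may not need or honor this switch), the property normally lives in hdfs-site.xml on the source cluster:

```xml
<!-- hdfs-site.xml on the source cluster (assumes Hadoop 2.x) -->
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
```

Check your distribution's documentation before relying on this, since managed distributions often set it through their admin UI.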

So don't schedule the feed on the source cluster if the user does not have the write access there that retention requires.
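To make the 'ACL owner' point concrete, here is a minimal sketch of a replicated feed entity. All names, paths, dates and limits (clusterA, clusterB, etl-user, /data/ingest/...) are hypothetical placeholders, not taken from the question. Note that the feed carries a single ACL element, so the owner named there is the one that would need write access on the target path:

```xml
<!-- Hypothetical feed entity: every name, path and date here is a placeholder -->
<feed name="ingestFeed" description="replicate A to B" xmlns="uri:falcon:feed:0.1">
    <frequency>hours(1)</frequency>
    <timezone>UTC</timezone>
    <clusters>
        <!-- source: where the data is read from -->
        <cluster name="clusterA" type="source">
            <validity start="2016-05-01T00:00Z" end="2099-12-31T00:00Z"/>
            <!-- retention deletes old instances, which needs write access -->
            <retention limit="days(90)" action="delete"/>
        </cluster>
        <!-- target: where the data is copied to -->
        <cluster name="clusterB" type="target">
            <validity start="2016-05-01T00:00Z" end="2099-12-31T00:00Z"/>
            <retention limit="months(12)" action="delete"/>
        </cluster>
    </clusters>
    <locations>
        <location type="data" path="/data/ingest/${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
    </locations>
    <!-- single owner for the whole feed -->
    <ACL owner="etl-user" group="users" permission="0755"/>
    <schema location="/none" provider="none"/>
</feed>
```

This is only an illustration of the entity layout, not a tested configuration for the two-user setup described in the question.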

Hope this helps.
