How to verify replication in Big Data


Problem statement -

Replication is a common task in industry, and it is equally important to verify it, i.e., to check whether the replicated database has the same data as the original database.

Example -

I have database D1, and for testing purposes I am replicating it to database D2.

After the replication completes, I want to validate whether both databases are identical. This can be done with a row-level comparison; however, that is the worst solution for big databases, where the data can be terabytes in size.

Could experts here provide the solution or any hint for such realtime challenges?


Could experts here provide the solution?

Each database solves the problem in a different way; the method used depends on the database's architecture. Examples:

  • Cassandra: its architecture includes a process resembling replication verification (anti-entropy repair, which compares replicas using Merkle trees),
  • some systems use Merkle trees. For example, the "git clone" command can be considered a form of replication: a new replica is created. Git links its internal objects by their content hashes, forming a Merkle-tree-like structure, so the copy is self-verifying. The same goes for the Bitcoin blockchain,
  • when there is a need for "live replication", or more broadly distributed computing, more advanced solutions such as Paxos can be used.
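To make the Merkle-tree idea concrete for two database replicas, here is a minimal Python sketch. It assumes each replica can be read as a mapping of string keys to string values; the function names (`bucket_of`, `build_tree`, `diff_buckets`) and the fixed power-of-two bucket count are hypothetical choices for illustration, not any database's API. Each side hashes its rows into buckets, builds a hash tree over the bucket digests, and the comparison descends only into subtrees whose hashes differ, so matching data is never compared row by row.

```python
import hashlib
from typing import Dict, List, Tuple


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def bucket_of(key: str, num_buckets: int) -> int:
    # Deterministic bucket assignment; Python's built-in hash() is salted
    # per process, so both replicas could disagree if we used it.
    return int.from_bytes(_h(key.encode())[:8], "big") % num_buckets


def build_tree(rows: Dict[str, str], num_buckets: int) -> List[List[bytes]]:
    """Return tree levels: levels[0] holds bucket digests, levels[-1] == [root]."""
    # Hypothetical simplification: a power-of-two bucket count keeps pairing trivial.
    assert num_buckets > 0 and num_buckets & (num_buckets - 1) == 0
    buckets: List[List[Tuple[str, str]]] = [[] for _ in range(num_buckets)]
    for key, value in rows.items():
        buckets[bucket_of(key, num_buckets)].append((key, value))
    # Sort inside each bucket so the digest does not depend on row order.
    levels = [[_h(repr(sorted(b)).encode()) for b in buckets]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        # Each parent hashes the concatenation of its two children.
        levels.append([_h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def diff_buckets(a: List[List[bytes]], b: List[List[bytes]]) -> List[int]:
    """Indices of leaf buckets whose digests differ between two trees."""
    assert len(a[0]) == len(b[0]), "both replicas must use the same bucket count"
    result = []
    stack = [(len(a) - 1, 0)]  # start at the root
    while stack:
        level, idx = stack.pop()
        if a[level][idx] == b[level][idx]:
            continue  # identical subtree: skip everything below it
        if level == 0:
            result.append(idx)
        else:
            stack.extend([(level - 1, 2 * idx), (level - 1, 2 * idx + 1)])
    return sorted(result)


if __name__ == "__main__":
    d1 = {f"user{i}": f"balance={i * 10}" for i in range(1000)}
    d2 = dict(d1)
    d2["user42"] = "balance=999"  # simulate one diverged row in the replica

    t1, t2 = build_tree(d1, 16), build_tree(d2, 16)
    print(t1[-1][0] == t2[-1][0])  # False: the roots differ
    print(diff_buckets(t1, t2) == [bucket_of("user42", 16)])  # True
```

In a real deployment each replica would build its tree locally and only the hashes would be exchanged; afterwards only the mismatching buckets need a row-level comparison. This sketch treats equal SHA-256 digests as equal data and is not how Cassandra or Git actually implement their trees.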

(...) any hint for such realtime challenges?

I'm not sure whether you wanted to ask what the challenges are, so just in case: comparing a database D1 with its replica D2 is hard because of the volume of data, but most importantly because, in a real-world scenario, D1 is a "living" database that is constantly changing.