Confused about Apache HBase basics?


I'm currently reading through Seven Databases in Seven Weeks and I've come across this statement.

HBase also makes strong consistency guarantees, making it easier to transition from relational databases for some use cases. Finally, HBase guarantees atomicity at the row level, which means that you can have strong consistency guarantees at a crucial level of HBase’s data model.

I'm having some trouble understanding it.

My shallow understanding is that Apache HBase is a distributed database, so it's like a Master-Slave sort of thing?

So, when you do a write, you first do it on the Master, and then the Master copies the write over to the slaves. The consistency guarantee is that all the slaves have the same values for their records? So, a high consistency guarantee means that they will all have the same values, whereas a low consistency guarantee means that the master may have written changes to some of the slaves, but not all (so if you're reading values from one of the slaves, you might get different results depending on which slave you read from)?

Is this correct so far?

So, with HBase... "guaranteeing atomicity at the row level" means a transaction will only be completed when the master has written to all the slaves? And that provides the high consistency?

Am I headed on the right track? If not, I'd really appreciate some clarification on what that paragraph means.

Thank you very much!


If by 'master' you mean the master of a region (shard/partition), then you are on the right track. Every row key is associated with exactly one Region (HBase terminology for a shard), and every region's data is replicated across multiple servers/disks/racks/whatever. For any given row key, there is only one primary Region server (or 'master') that the client talks to.
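To make the "one row key, one region" idea concrete, here is a minimal sketch in plain Python. It is not the HBase client API. It assumes regions are contiguous, sorted row-key ranges identified by their start key; the boundaries and the server names (rs1, rs2, rs3) are made up for illustration.

```python
from bisect import bisect_right

# Hypothetical region layout: region i covers [START_KEYS[i], START_KEYS[i + 1]).
# The empty string marks the start of the first region.
REGION_START_KEYS = ["", "g", "n", "t"]
# Hypothetical assignment of each region to the single server that serves it.
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def locate_region(row_key: str) -> int:
    """Return the index of the one region whose key range contains row_key."""
    # bisect_right counts the start keys that are <= row_key;
    # the last of those start keys belongs to the owning region.
    return bisect_right(REGION_START_KEYS, row_key) - 1


def serving_server(row_key: str) -> str:
    """Return the single server a client would talk to for this row key."""
    return REGION_SERVERS[locate_region(row_key)]


if __name__ == "__main__":
    for key in ("apple", "g", "zebra"):
        print(key, "-> region", locate_region(key), "on", serving_server(key))
```

Because every read and write for a given row key goes through the same server in this model, there is never a second copy that a client could read a stale value from.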

So, with HBase... "guaranteeing atomicity at the row level" means a transaction will only be completed when the master has written to all the slaves? And that provides the high consistency?

No, consistency and atomicity are two different things. HBase provides atomicity at the row level, which means that when you write to a row, the write either completes entirely or changes nothing; there is no in-between (partial update). This is not the case when you write to multiple rows in one command: some rows might change and some might not, but no individual row will be partially updated. Consistency (in this context) means that updates must first be acknowledged by the remote replicas before the client gets an OK. This is done primarily via an HDFS-based transaction log file. You can read up on the HBase WAL (write-ahead log) for more details.
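The difference between a single-row write and a multi-row write can be sketched with a toy in-memory table. This is only an illustration, not how HBase is implemented, and not its client API: the ToyTable class, its method names, and the "None is an invalid value" rule are all assumptions invented for the example.

```python
class ToyTable:
    """Toy model: each row is updated all-or-nothing, batches are not."""

    def __init__(self):
        self.rows = {}  # row_key -> {column: value}

    def put_row(self, row_key, cells):
        """Apply every cell of one row, or none of them."""
        # Build the new version of the row on the side first.
        staged = dict(self.rows.get(row_key, {}))
        for column, value in cells.items():
            if value is None:  # hypothetical failure condition
                raise ValueError(f"invalid value for {row_key}/{column}")
            staged[column] = value
        # Only swap the row in once every cell has been accepted.
        self.rows[row_key] = staged

    def put_batch(self, puts):
        """Apply several row writes; each row succeeds or fails on its own."""
        results = {}
        for row_key, cells in puts:
            try:
                self.put_row(row_key, cells)
                results[row_key] = True
            except ValueError:
                results[row_key] = False
        return results


if __name__ == "__main__":
    table = ToyTable()
    table.put_row("r1", {"cf:a": "1"})
    outcome = table.put_batch([
        ("r1", {"cf:a": "2", "cf:b": None}),  # fails as a whole
        ("r2", {"cf:a": "x"}),                # succeeds
    ])
    print(outcome)
    print(table.rows)
```

In this sketch the failed write to r1 leaves the row exactly as it was (cf:a is still "1", even though that cell on its own was valid), while r2 in the same batch is written. That is the "some rows might change, but no row is partially updated" behaviour described above.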