Filters in HBase: Designed to filter data row-wise, or column-wise, or both?

Question

Asked by eriophora on 16 June 2025 at 02:24 · 722 views

I've been confounded by how filters work in HBase (or, largely equivalently, in HappyBase--which I use to interact with HBase). The source of my confusion is that I can't seem to get a handle on what filters do.

Some filters, like SingleColumnValueFilter, cause rows not to be emitted based on the value of one of their columns. This makes sense--in my mind, this is what filters should be for. However, other filters, like FirstKeyOnlyFilter, appear not to filter in the row-wise sense. Instead they filter the data that is surfaced to the requester--i.e., they filter column-wise, like the columns argument to a scan. They also appear to affect which data other filters get to see.

Perhaps I'm just using them wrong. But to me, a "filter" should remove items based on a test of their properties, like "Find me all people over 7 feet tall!" The behavior of FirstKeyOnlyFilter, at least in HBase, seems more akin to "Bring me everyone's left ear and nothing else!" Further, if I have a filter like:

SingleColumnValueFilter('body', 'height', =, 'regexstring:^over7ft$') AND FirstKeyOnlyFilter(), then FirstKeyOnlyFilter appears to prevent the first filter from accessing the column body:height (family "body", qualifier "height").

What is with this design choice? The filter above looks like it's saying "Bring me the name of everyone exactly 7 feet tall!", but instead it's saying something more like "Bring me every name if the name is 7 feet tall!" The first key of a row doesn't have columns any more than a name can be said to have a height.

What am I doing wrong? Is this a peculiarity of HappyBase or is it the same in HBase proper?

**Martin Serrano** · Answer 1

Filters work on both: depending on the filter, they decide which rows are returned, which columns within each row are returned, or both.

As you have noticed, some HBase filters restrict the columns that are returned to the client. This is an intentional design choice to reduce the memory and network resources used by the client call.

Recall that an HBase row is really a rowkey mapping to a series of key-value pairs (the key in each pair is the column, i.e. the column family plus the column qualifier). A row is not stored as a single record: the underlying data abstraction is really rowkey + column (+ timestamp) to value, and each such entry is a Cell. Filters work at the Cell level. This is also why column qualifiers are recommended to be short: they are stored again with every cell.
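To make the Cell-level view concrete, here is a toy Python model. It is not HBase code: `Cell`, `group_rows` and `filter_cells` are hypothetical names, and timestamps and versions are left out. A table is just a sorted list of (rowkey, family, qualifier, value) entries, a "row" is whatever cells share a rowkey, and a filter is asked about one cell at a time.

```python
from itertools import groupby
from typing import Callable, Dict, List, NamedTuple


class Cell(NamedTuple):
    """Toy stand-in for an HBase Cell (timestamps and versions left out)."""
    row: bytes
    family: bytes
    qualifier: bytes
    value: bytes


# Sorting the tuples orders cells by row, then family, then qualifier,
# which mirrors how HBase orders the cells of a table (ignoring timestamps).
CELLS: List[Cell] = sorted([
    Cell(b"alice", b"bio", b"name", b"Alice"),
    Cell(b"alice", b"body", b"height", b"under7ft"),
    Cell(b"bob", b"bio", b"name", b"Bob"),
    Cell(b"bob", b"body", b"height", b"over7ft"),
])


def group_rows(cells: List[Cell]) -> Dict[bytes, List[Cell]]:
    """A 'row' is just the run of consecutive cells that share a rowkey."""
    return {row: list(group) for row, group in groupby(cells, key=lambda c: c.row)}


def filter_cells(cells: List[Cell], keep: Callable[[Cell], bool]) -> List[Cell]:
    """Ask a predicate about one cell at a time, the granularity filters work at."""
    return [c for c in cells if keep(c)]


rows = group_rows(CELLS)
# A cell-level test can drop columns from a row without dropping the row itself.
body_only = filter_cells(CELLS, lambda c: c.family == b"body")
```

Both rows survive in `body_only`; only their `bio:name` cells are gone, which is the column-wise effect the question describes.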

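FirstKeyOnlyFilter is designed to return as little data as possible while still telling you that a rowkey exists with at least one key-value mapping. It returns only the first cell of each row in sort order, so the column you get back could be any column. Because the scan skips to the next row once that first cell has passed, combining it with another filter via AND can also keep that filter from seeing the rest of the row, which would explain why SingleColumnValueFilter does not seem to see body:height in your example.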
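The same kind of toy model (plain Python, no HBase client; `first_key_only` is a hypothetical helper that only imitates FirstKeyOnlyFilter's behaviour of keeping the first cell of each row) shows why the "left ear" you get back is simply whichever column sorts first. With the hypothetical families `bio` and `body`, every row comes back as its `bio:name` cell, and only because `bio` sorts before `body`.

```python
from itertools import groupby
from typing import List, NamedTuple


class Cell(NamedTuple):
    """Toy stand-in for an HBase Cell: rowkey, family, qualifier, value."""
    row: bytes
    family: bytes
    qualifier: bytes
    value: bytes


# Sorted by row, then family, then qualifier ("bio" sorts before "body").
CELLS: List[Cell] = sorted([
    Cell(b"alice", b"body", b"height", b"under7ft"),
    Cell(b"alice", b"bio", b"name", b"Alice"),
    Cell(b"bob", b"body", b"height", b"over7ft"),
    Cell(b"bob", b"bio", b"name", b"Bob"),
])


def first_key_only(cells: List[Cell]) -> List[Cell]:
    """Mimic FirstKeyOnlyFilter: keep only the first cell of each row."""
    return [next(group) for _row, group in groupby(cells, key=lambda c: c.row)]


first_cells = first_key_only(CELLS)
row_count = len(first_cells)  # one cell per row is enough to count rows
returned_columns = {(c.family, c.qualifier) for c in first_cells}
```

Renaming the name family to, say, `info` would make `body:height` the first cell instead; nothing about the name column is special to the filter.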

Alternatively, you can use KeyOnlyFilter instead of FirstKeyOnlyFilter. It returns the key of every column but replaces each value with an empty one. This should let you match as needed while minimizing the data returned.
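With HappyBase, the KeyOnlyFilter suggestion might look like the sketch below. It assumes the filter-language form of SingleColumnValueFilter that takes two extra booleans (filterIfColumnMissing and latestVersionOnly), set here so that rows without a `body:height` cell are dropped rather than passed through. `TALL_KEYS_FILTER` and `find_tall_rowkeys` are hypothetical names, and the table can be anything with HappyBase's `Table.scan(filter=...)` interface.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

# Row test on the value, plus KeyOnlyFilter to strip values from returned cells.
# The trailing "true, true" are filterIfColumnMissing and latestVersionOnly.
TALL_KEYS_FILTER = (
    "SingleColumnValueFilter('body', 'height', =, 'regexstring:^over7ft$', true, true)"
    " AND KeyOnlyFilter()"
)


class ScannableTable(Protocol):
    """The part of happybase.Table this sketch relies on."""

    def scan(self, filter: str = ...) -> Iterable[Tuple[bytes, Dict[bytes, bytes]]]: ...


def find_tall_rowkeys(table: ScannableTable) -> List[bytes]:
    """Return rowkeys whose body:height matches; cell values come back empty."""
    return [key for key, _data in table.scan(filter=TALL_KEYS_FILTER)]
```

Against a real server, the table would come from something like `happybase.Connection('localhost').table('people')`, where the host and table name are placeholders.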
