HBase slower in RAMdisk


I have a general question about using Apache HBase with a RAMdisk. There is a big collection of data in a single table, about 25GB in total. With this data I am doing some basic aggregations, using a Java program.

As I have enough RAM available, I tried to put this data set into a RAMdisk using tmpfs:

mount -t tmpfs -o size=40G none /home/user/ramdisk

Then I stopped HBase and copied the contents of the data folder into the RAMdisk. Finally, I created a symbolic link from the old data directory to the new one and started HBase again.

It works, but the aggregations now run slightly slower than before.

I could imagine a RAMdisk not having much impact if HBase compresses the data (Snappy compression is activated) and so on, but I can't see why a faster medium would lead to slower access to the data. There is enough RAM left available, so that should not be the bottleneck.

Maybe someone has a general idea or insight about this?

There is 1 best solution below

I think it's going to be one of two things:

A: Did you really have more than 40G of RAM free before allocating the disk? I'd be impressed if you actually had that much free, but seeing free RAM afterwards doesn't prove that you didn't just push a big chunk of memory into swap.

B: Compression (even something fast like Snappy) is going to hurt performance, particularly for something like a database engine that has a lot of wacky optimization in it. You're right that a RAMdisk should be ludicrously faster, but having to jump all over the data for your queries, and then having to jump all over the compressed image to decompress chunks, has to carry a pretty big overhead.
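To check point A, it may help to look at swap directly rather than at free RAM, since tmpfs pages can themselves be swapped out under memory pressure. Here is a rough sketch, assuming a Linux box with the usual `/proc/meminfo` fields. The function names `swap_used_kb` and `mem_available_kb` are just made up for this example:

```shell
#!/bin/sh
# Hypothetical helpers (names invented for this sketch), Linux only.

# Print swap currently in use, in kB, from a meminfo-style file.
# The file argument defaults to /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

# Print MemAvailable in kB: the kernel's estimate of how much memory
# can be used without pushing anything into swap.
mem_available_kb() {
    awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Report both values if /proc/meminfo is readable on this machine.
if [ -r /proc/meminfo ]; then
    echo "MemAvailable: $(mem_available_kb) kB"
    echo "Swap in use:  $(swap_used_kb) kB"
fi
```

Run it once before mounting the tmpfs and again while an aggregation is running. If the swap figure climbs after the 25GB copy, the RAMdisk is probably competing with HBase for memory, which could explain the slowdown. Watching the `si`/`so` columns of `vmstat 1` during a run should show the same thing live.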