How to take a backup of Druid segment data?


I am new to Druid. In our application we use Druid for time-series data, and this can grow quite large (10-20 TB). Druid provides a deep storage facility, but if the deep storage crashes or becomes unreachable, the result is data loss, which in turn breaks the analytics the application runs. I am thinking of taking incremental backups of the Druid segment data to a secure location such as an FTP server, so that if deep storage is unavailable, the data can be restored from there.

Is there any tool or utility available in Druid to incrementally back up and restore Druid segments?


In general it's important to take regular snapshots of the metadata store, as this is the "index" of what's in deep storage. One snapshot per day is a reasonable cadence, and you can keep them for however long you like. It's good to keep at least a couple of weeks' worth, in case you need to roll back for some reason.
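As a minimal sketch of that daily-snapshot-with-retention idea (assuming MySQL as the metadata store; the database name `druid`, the backup path, and the 14-day window are all placeholders you'd adjust, and you'd use `pg_dump` instead for PostgreSQL):

```python
# Sketch: dump the Druid metadata database to a dated file, then prune
# snapshots older than the retention window. Hypothetical paths/DB name.
import subprocess
from datetime import datetime, timedelta
from pathlib import Path

RETENTION_DAYS = 14  # keep at least a couple of weeks of snapshots


def snapshot_metadata(backup_dir: Path, db_name: str = "druid") -> Path:
    """Write a dated mysqldump of the metadata store into backup_dir."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / f"druid-metadata-{datetime.now():%Y%m%d}.sql"
    with dest.open("wb") as out:
        # --single-transaction gives a consistent dump without locking tables.
        subprocess.run(
            ["mysqldump", "--single-transaction", db_name],
            stdout=out, check=True,
        )
    return dest


def prune_old_snapshots(backup_dir: Path, retention_days: int = RETENTION_DAYS) -> list[Path]:
    """Delete snapshots whose date stamp is past the retention window."""
    cutoff = datetime.now() - timedelta(days=retention_days)
    removed = []
    for snap in backup_dir.glob("druid-metadata-*.sql"):
        stamp = snap.stem.rsplit("-", 1)[-1]  # e.g. "20240101"
        if datetime.strptime(stamp, "%Y%m%d") < cutoff:
            snap.unlink()
            removed.append(snap)
    return removed
```

Run `snapshot_metadata` from cron once a day, then `prune_old_snapshots` to enforce the retention window.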

You also need to back up new segments in deep storage as they appear. It isn't important to take consistent snapshots, because published segments are immutable; you just need to pick up every new file eventually.
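Since segments are immutable, "incremental" reduces to copying any file the backup location doesn't have yet. Here is a minimal sketch assuming local-disk deep storage (for S3 or HDFS you would use the corresponding client instead; the paths are placeholders):

```python
# Sketch: incremental segment backup for local-disk deep storage.
# Copies only files missing from the backup tree; because Druid segments
# are immutable once published, "missing" is the only case to handle.
import shutil
from pathlib import Path


def backup_new_segments(deep_storage: Path, backup: Path) -> list[Path]:
    """Copy files from deep_storage that are not yet present under backup."""
    copied = []
    for src in deep_storage.rglob("*"):
        if not src.is_file():
            continue
        dest = backup / src.relative_to(deep_storage)
        if not dest.exists():
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)  # preserves timestamps as well
            copied.append(dest)
    return copied
```

Running this on a schedule gives you the "get every file eventually" property; a second run over unchanged deep storage copies nothing.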

Also see https://groups.google.com/g/druid-user/c/itfKT5vaDl8

One other note, since you mentioned data loss: deep storage is not queried directly. Queries execute against the local segment cache on, for example, the Historical processes. Deep storage is written to at ingestion time, so while it's down you might "lose" data that can't be ingested until it's available again, but you will keep your analytics capability, as the already-loaded data sits on the Historicals. Just a thought!

I hope that helps!