I'm using odo from the blaze project to merge multiple pandas hdfstore tables following the suggestion in this question: Concatenate two big pandas.HDFStore HDF5 files
The stores have identical columns and non-overlapping indices by design, and a few million rows each. The individual files may fit into memory, but the total combined file probably will not.
Is there a way I can preserve the settings the HDFStore was created with? I lose the data columns and compression settings.
I tried odo(part, whole, datacolumns=['col1', 'col2']) without luck.
Any suggestions for alternative methods would also be appreciated. I could of course do this manually, but then I would have to manage the chunk sizing myself to avoid running out of memory.
odo doesn't support propagation of compression and/or data_columns at the moment. Both are pretty easy to add; I created an issue here.

You can do this in pandas this way: iterate over the input files and chunk the read/write to the final store. Note that you have to specify the data_columns here as well.
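A minimal sketch of that loop, assuming each part file stores its table under the key 'df' and that the file names and chunk size are placeholders for your own:

```python
import pandas as pd

# Build two small demo part files (stand-ins for your real stores;
# the names and the 'df' key are assumptions for this example).
pd.DataFrame({'col1': range(5), 'col2': range(5)}).to_hdf(
    'part1.h5', key='df', format='table')
pd.DataFrame({'col1': range(5, 10), 'col2': range(5, 10)},
             index=range(5, 10)).to_hdf('part2.h5', key='df', format='table')

# Merge: iterate over the input files, chunk read/write to the final store.
# Compression is set once when the output store is opened.
with pd.HDFStore('whole.h5', mode='w', complib='blosc', complevel=9) as out:
    for path in ['part1.h5', 'part2.h5']:
        with pd.HDFStore(path, mode='r') as store:
            # chunksize keeps memory bounded (use something like 500_000
            # for real data); data_columns must be specified on append
            # so the merged table stays queryable on those columns.
            for chunk in store.select('df', chunksize=2):
                out.append('df', chunk, data_columns=['col1', 'col2'])

merged = pd.read_hdf('whole.h5', 'df')
print(len(merged))  # → 10
```

Since the indices are non-overlapping by design, plain append is safe; no deduplication step is needed.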