Guide for efficient usage of HDFStore

452 Views Asked by At

In my data processing application, I have around 80% of the processing time just spend in the function pandas.HDFStore.put. Although there are various SO questions around similar problems, I haven't found any definite guide on how to use HDFStore in the most efficient way.

What options do I have to reduce the writing-time?

My data consists only out of float64 columns and some spare int-columns, it may contain duplicated indices and/or column names and it is a-priori not sorted. It will be data collected over decades (sec-min resolution), so the solution should be scalable.

My Basic usage cases are the following:

# 1. Store creation
store = pd.HDFStore(pro['hdf_path'], complevel=7,
                    complib='blosc', fletcher32=True)

# 2. Iterative addition of new data
store.put('/table/T1', data, format='table', data_columns=True,
          append=True, index=False)

# 3. Basic queries of certain columns (I only need 'index' in 'where')
store.select('/table/T1', columns=['A', 'B', ...],
             where='index>="{}" & index<{sign}"{}"'.format(_t1, _t2))

# 4. Retrieving a tree with all tables and all column
#    names in that table (without loading it)
for path, groups, leaves in store.walk():
    ...
    for lv in sorted(leaves):
       _item_path = '/'.join([path, lv])
       columns = store.get_node('{}/table'.format(_item_path)).description._v_names

Concretely I would be interested in how to change the following parameters to optimize the writing time:

  • 'complib', 'complevel' in store creation
  • making the indexing more efficient (maybe calling create_table_index only at the end?)
  • parameters of store.put/store.append
  • I read something about index levels, like ('medium', 6), would that have an influence?
  • Could I reduce the index from datetime (which is stored as Int64 then) to something more efficient, by dropping e.g. miliseconds?

(Reading is not so much a problem as store.select with where=... is quite efficient.)

Thank you for any help, it is very much appreciated!

0

There are 0 best solutions below