How to use HDFStore.select to filter data


Question:
How do I select the rows that match both of these conditions (pseudocode): columns['Name'] == 'Name_A' (Name_A is just an example) and columns['Time'] between 2021-11-21 00:00:00 and 2021-11-22 00:00:00?

I have stored about 4 billion rows of data in an HDF5 file.
Now I want to select a subset of that data.
My code looks like this:

import pandas as pd
ss = pd.HDFStore("xh_data_L9.hdf5")   #<class 'pandas.io.pytables.HDFStore'> 
print(type(ss))
print(ss.keys())
s_1 = ss.select('alldata',start=0,stop=500) # data example 
print(s_1)
ss.close()

I found that the signature of HDFStore.select is:

HDFStore.select(key, where=None, start=None, stop=None, columns=None, iterator=False, chunksize=None, auto_close=False)
# Neither of the following calls runs successfully.
s_3 = ss.select('alldata',where="Time>2021-11-21 00:00:00 & Time<2021-11-22 00:00:00)")  
s_3 = ss.select('alldata',['Name'] == 'Name_A')  

I have googled for solutions, but I still don't know how to use "where".


Answer (by Jeffrey):

I found that the cause was whether data_columns had been set when the file was created.

# An HDF5 table created this way has no data_columns
ss.append('store', df_temp, index=True)

# An HDF5 table created this way has data_columns
ss.append('store', df_temp, format='table', data_columns=True)

# Check whether the store includes data_columns
import pandas as pd
ss = pd.HDFStore("store.hdf5")
print(ss.info())

If the output includes "dc->[Time,Name,Value]", those columns can be used in a where clause:

ss.select("store", where="Name='Name_A'")
# String values must be enclosed in quotation marks.
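
To tie this back to the original question, here is a minimal sketch that combines the Name filter with a time range. The sample DataFrame, the temporary file path, and the key "store" are made up for illustration; only the column names come from the question. Note that the timestamps are quoted inside the where string, just like the name value.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data using the column names from the question.
df = pd.DataFrame(
    {
        "Name": ["Name_A", "Name_B", "Name_A", "Name_A"],
        "Time": pd.to_datetime(
            [
                "2021-11-20 12:00:00",
                "2021-11-21 06:00:00",
                "2021-11-21 18:00:00",
                "2021-11-22 03:00:00",
            ]
        ),
        "Value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Throwaway file so the example does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "example.hdf5")

with pd.HDFStore(path, mode="w") as store:
    # data_columns=True makes every column usable in a where clause.
    store.append("store", df, format="table", data_columns=True)

    # Both the string value and the timestamps are quoted.
    result = store.select(
        "store",
        where=(
            "(Name == 'Name_A') "
            "& (Time >= '2021-11-21 00:00:00') "
            "& (Time < '2021-11-22 00:00:00')"
        ),
    )

print(result)
```

The same where string should work against the real file, provided its table was written with Name and Time as data columns.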

The official documentation describes data_columns as follows:

data_columns :
list of columns, or True, default None
List of columns to create as indexed data columns for on-disk
queries, or True to use all columns. By default only the axes of the object are indexed.
See https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#query-via-data-columns.
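
Since data_columns also accepts a list, one option for a very large file is to make only the queried columns into data columns and read the matches in chunks with the chunksize parameter from the select signature. The following is only a sketch under those assumptions; the sample data, file path, and chunk size are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data: ten hourly rows alternating between two names.
df = pd.DataFrame(
    {
        "Name": ["Name_A", "Name_B"] * 5,
        "Time": pd.date_range("2021-11-21", periods=10, freq="60min"),
        "Value": range(10),
    }
)

path = os.path.join(tempfile.mkdtemp(), "example_chunks.hdf5")

with pd.HDFStore(path, mode="w") as store:
    # Only Name and Time become data columns; Value is stored
    # but cannot be used in a where clause.
    store.append("store", df, format="table", data_columns=["Name", "Time"])

    # With chunksize set, select returns an iterator of DataFrames,
    # so the full result never has to fit in memory at once.
    total_rows = 0
    for chunk in store.select("store", where="Name == 'Name_A'", chunksize=2):
        total_rows += len(chunk)

print(total_rows)
```

The iteration has to happen while the store is still open, which is why the loop sits inside the with block.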