Why aren't all files included in the resampled DataFrame?


I want to resample 1-minute stock data into 1-hour bars.

Here is my code (my original code uses Chinese column names):

    import pandas as pd
    import glob

    path = 'C:/Users/Desktop/fut_data_1min_2023'
    all_files = glob.glob(path + "/*.csv")

    all_files
Here is the result:

    ['C:/Users/Desktop/fut_data_1min_2023\\a2301.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2303.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2305.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2307.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2309.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2311.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2401.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\a2403.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\b2312.csv',
    ...
     'C:/Users/Desktop/fut_data_1min_2023\\zn2403.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\zn2404.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\zn2405.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\zn2406.csv',
     'C:/Users/Desktop/fut_data_1min_2023\\zn2407.csv']
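Since the file list itself looks complete, one way to confirm that every file is actually being read is to count the rows each file contributes before concatenating. This is a minimal sketch: the temporary folder and the two tiny CSVs are hypothetical stand-ins for the real `fut_data_1min_2023` folder, and `row_counts` is an illustrative name.

```python
import glob
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real data folder: two tiny GBK-encoded CSVs.
tmp_dir = tempfile.mkdtemp()
for code in ["a2301", "a2309"]:
    pd.DataFrame(
        {"时间": ["2023-01-03 09:00:00", "2023-01-03 09:01:00"], "合约代码": [code, code]}
    ).to_csv(os.path.join(tmp_dir, f"{code}.csv"), index=False, encoding="gbk")

all_files = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))

# Read each file on its own and record how many rows it contributes.
row_counts = {}
for filename in all_files:
    frame = pd.read_csv(filename, encoding="gbk")
    row_counts[os.path.basename(filename)] = len(frame)

print(len(all_files), "files found")
print(row_counts)
```

If every file reports a non-zero row count, the files are being read and the problem lies further down the pipeline.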

Then I read all the CSV files into a single DataFrame and resampled it:

    li = [pd.read_csv(filename, index_col=None, header=0, encoding='gbk') for filename in all_files]
    df = pd.concat(li, axis=0, ignore_index=True)
    df['时间'] = pd.to_datetime(df['时间'])  # timestamp

    df.set_index('时间', inplace=True)
    aggregation_rules = {
        '市场代码': 'first',  # market code
        '合约代码': 'first',  # contract code
        '开': 'first',  # open
        '高': 'max',  # high
        '低': 'min',  # low
        '收': 'last',  # close
        '成交量': 'sum',  # volume
        '成交额': 'sum',  # turnover (amount)
        '持仓量': 'sum'  # open interest
    }
    # 1-hour resample ('h' is the current alias; 'H' is deprecated in pandas 2.2+)
    df_resampled_1hr = df.resample('h').agg(aggregation_rules)
    print(df_resampled_1hr)
**Here is my result:**

                               市场代码   合约代码       开         高       低        收  \
    时间                                                                    
    2023-01-03 09:00:00    DC  a2301  5258.0  235100.0  409.08  23375.0
    2023-01-03 10:00:00    DC  a2301  5240.0  233490.0  409.52  22880.0
    2023-01-03 11:00:00    DC  a2301  5258.0  232800.0  409.68  22900.0
    2023-01-03 12:00:00  None   None     NaN       NaN     NaN      NaN
    2023-01-03 13:00:00    DC  a2301  5258.0  232730.0  409.94  22950.0
    ...                   ...    ...     ...       ...     ...      ...
    2023-07-31 11:00:00    DC  a2309  4931.0  233900.0  455.04  20340.0
    2023-07-31 12:00:00  None   None     NaN       NaN     NaN      NaN
    2023-07-31 13:00:00    DC  a2309  4928.0  234120.0  454.78  20405.0
    2023-07-31 14:00:00    DC  a2309  4951.0  234010.0  454.90  20160.0
    2023-07-31 15:00:00    DC  a2309  4967.0  233540.0  455.00  20160.0

                               成交量           成交额         持仓量  
    时间                                                        
    2023-01-03 09:00:00  9624618.0  4.146605e+11  1848710138
    2023-01-03 10:00:00  4184041.0  1.710870e+11  1413715236
    2023-01-03 11:00:00  1750278.0  8.170628e+10   976859153
    2023-01-03 12:00:00        0.0  0.000000e+00           0
    2023-01-03 13:00:00  2119662.0  8.518207e+10   916939846
    ...                        ...           ...         ...
    2023-07-31 11:00:00  2645954.0  9.241121e+10  1340286918
    2023-07-31 12:00:00        0.0  0.000000e+00           0
    2023-07-31 13:00:00  2960147.0  1.042305e+11  1253078394
    2023-07-31 14:00:00  4385137.0  1.701434e+11  2590938012
    2023-07-31 15:00:00   170406.0  6.481716e+09    42982844

    [5023 rows x 9 columns]
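A likely explanation, offered as an assumption rather than a confirmed answer: `df.resample` bins rows by the timestamp index alone, so rows from every contract that traded in the same hour land in the same bin. `'first'` then reports the contract code of whichever row comes first in that bin, while `'max'`, `'min'` and `'sum'` combine values across contracts (which could explain a 高 of 235100.0 next to a 开 of 5258.0 in the output above). This sketch reproduces the effect on two hypothetical contracts:

```python
import pandas as pd

# Hypothetical 1-minute bars for two contracts trading in the same hour.
df = pd.DataFrame(
    {
        "时间": pd.to_datetime(
            ["2023-01-03 09:00", "2023-01-03 09:00", "2023-01-03 09:01", "2023-01-03 09:01"]
        ),
        "合约代码": ["a2301", "b2312", "a2301", "b2312"],
        "高": [5260.0, 4100.0, 5262.0, 4105.0],
        "成交量": [10, 1, 20, 2],
    }
).set_index("时间")

# Resampling the combined frame puts BOTH contracts' rows into the same 09:00 bin.
mixed = df.resample("h").agg({"合约代码": "first", "高": "max", "成交量": "sum"})
print(mixed)
```

Both contracts contribute to the single 09:00 row (the volume is 33, the sum over both), but only `a2301` is shown as the contract code.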

The 1-hour resample does not include all the CSV files: the contract codes stop at a2309. I tried different aggregation rules, but that did not help, so I believe something is wrong with my resample. I cannot figure it out. Please help!
