I want to save all the tick-by-tick data from a broker locally, without overloading the RAM. The streaming starts on Sunday and ends in the first hours of Saturday. At any time during the week I want to be able to load the file from the hard disk and analyze the updates received so far.
Sometimes there can be more than 1,000 or even 10,000 ticks per second. I want to subscribe to 40 instruments (for example, EURUSD is one).
So what is the best way in python to do so?
This is what I have tried:
Either pandas dataframe:
df = pd.DataFrame(columns=['Time_sent', 'Time_received', 'Instrument', 'Bid', 'Ofr'])
And then on every update, I concatenate the new row:
df = pd.concat([df, pd.DataFrame([[datetime.fromtimestamp(int(update.get_time()) / 1000),
                                   datetime.now(),
                                   update.get_instrument(),
                                   update.get_bid_value(),
                                   update.get_ofr_value()]],
                                 columns=df.columns)],
               ignore_index=True)
And finally, I pickle the ticks:
df.to_pickle('ticks.pkl')
But it has two disadvantages: it overloads the RAM unnecessarily, and every update overwrites the ticks.pkl file, so saving takes a long time once the file grows large and I cannot easily access it whenever I want.
Either json:
I define a function:
def append_record(record):
    with open('ticks.json', 'a') as f:
        json.dump(record, f)
        f.write(os.linesep)
And then, on every update, I append a record locally:
append_record({'Time_sent': update.get_time(),
               'Time_received': datetime.now().timestamp() * 1000,
               'Instrument': update.get_instrument(),
               'Bid': update.get_bid_value(),
               'Ofr': update.get_ofr_value()})
But it has two disadvantages: the file (ticks.json) is much larger than ticks.pkl, and I don't think this is the best or most up-to-date Python approach.
Any advice? Maybe I should use an SQL database (SQLite) or HDF5? What is the best way to do what I want? In any case, I should not reload/reread the saved file, append to it, and then save it again. It should be something smarter.
A database setup would be ideal for this scenario: SQL, SQLite, etc. However, if database storage is not viable, a different file format like CSV may be better: it can be opened in append-only mode, which saves the entire file from being rewritten on every tick. CSV supports appending data efficiently and can be imported into popular spreadsheet software for viewing.
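As a minimal sketch of the append-only CSV idea (the `update` accessors are taken from the question; the file name `ticks.csv` is an arbitrary choice):

```python
import csv
from datetime import datetime

def append_tick(update, path='ticks.csv'):
    """Append one tick to a CSV file without rewriting the rest of it."""
    # Opening in 'a' mode only writes the new row; the existing
    # contents stay on disk untouched, so memory use stays constant.
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([
            datetime.fromtimestamp(int(update.get_time()) / 1000),
            datetime.now(),
            update.get_instrument(),
            update.get_bid_value(),
            update.get_ofr_value(),
        ])
```

At any point during the week the file can be read back for analysis with `pd.read_csv('ticks.csv')` while the writer keeps appending.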
For simplicity, CSV is the easiest to implement and use. For scalability, however, SQLite is a far better choice, allowing more complex querying and optimised data storage and retrieval for large data sets.
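A sketch of the SQLite variant (the table name `ticks` and the batch size are my assumptions; the `update` accessors come from the question). At thousands of ticks per second, buffering rows and flushing them in batches with `executemany` keeps both RAM and disk I/O low:

```python
import sqlite3
from datetime import datetime

conn = sqlite3.connect('ticks.db')
conn.execute("""
    CREATE TABLE IF NOT EXISTS ticks (
        time_sent     REAL,
        time_received REAL,
        instrument    TEXT,
        bid           REAL,
        ofr           REAL
    )
""")

buffer = []        # ticks waiting to be written
BATCH_SIZE = 1000  # assumed threshold; tune for your tick rate

def on_update(update):
    """Buffer one tick; flush to disk once the batch is full."""
    buffer.append((update.get_time(),
                   datetime.now().timestamp() * 1000,
                   update.get_instrument(),
                   update.get_bid_value(),
                   update.get_ofr_value()))
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    """Insert all buffered ticks in one transaction and clear the buffer."""
    conn.executemany('INSERT INTO ticks VALUES (?, ?, ?, ?, ?)', buffer)
    conn.commit()
    buffer.clear()
```

Analysis during the week can then be done from another process with, for example, `pd.read_sql('SELECT * FROM ticks WHERE instrument = "EURUSD"', conn)`, without ever reloading and rewriting the whole file.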