I am facing memory issues while trying to unpack a large Msgpack dataset in Google Colab for training an ML model. Even though I attempted to load it in chunks, the notebook consumes all available RAM. I have included the code I used below; it should be reproducible by anyone on Colab.

There is limited documentation on unpacking large datasets with Msgpack, and I am not sure whether it works better on a local machine. Any help getting this to work on Colab, as well as suggestions for saving the data to an SQLite3 database, would be greatly appreciated. Thank you.
!gdown "https://drive.google.com/uc?id=1uF5ohoVHWRWprfq7zrtk0rg7-747zEUH"
!gzip -d course_42.msgpack.gz
import gc
import msgpack

gc.disable()  # I also tried disabling automatic garbage collection, with no noticeable effect

# Stream the file through msgpack.Unpacker instead of loading everything at once
with open('course_42.msgpack', 'rb') as file_obj:
    unpacker = msgpack.Unpacker(file_obj, raw=False)
    for unpacked in unpacker:
        print(unpacked)  # inspect only the first item
        break
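For the SQLite3 part, this is the direction I was thinking of: iterate over the Unpacker and insert items in batches so that only one batch is ever held in memory. This is just a rough sketch of the idea, not something I have verified against my dataset; the table layout, the batch size, and the assumption that each unpacked item can simply be re-packed into a BLOB are all guesses on my part.

import sqlite3
import msgpack

BATCH_SIZE = 1000  # hypothetical batch size; tune for the RAM available on Colab

conn = sqlite3.connect('course_42.db')
conn.execute('CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, payload BLOB)')

with open('course_42.msgpack', 'rb') as f:
    unpacker = msgpack.Unpacker(f, raw=False)
    batch = []
    for item in unpacker:
        # re-pack each item so it can be stored as a single BLOB column
        batch.append((msgpack.packb(item, use_bin_type=True),))
        if len(batch) >= BATCH_SIZE:
            conn.executemany('INSERT INTO items (payload) VALUES (?)', batch)
            conn.commit()
            batch.clear()
    if batch:  # flush the final partial batch
        conn.executemany('INSERT INTO items (payload) VALUES (?)', batch)
        conn.commit()

conn.close()

The intent is that only BATCH_SIZE items live in Python at any moment and everything else sits on disk in the database. Does an approach like this make sense, or is there a better way to keep peak memory bounded on Colab?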