I'm attempting to achieve zero-copy sharing of a Pandas DataFrame between processes launched from separate console sessions. Please consider the following two Python files:
producer.py:
import pandas as pd
import numpy as np
import pickle
df = pd.DataFrame({'text': ['a','b','c'], 'ints':[1,2,3], 'floats': [1.0,2,3]})
print(df)
#   text  ints  floats
# 0    a     1     1.0
# 1    b     2     2.0
# 2    c     3     3.0
# prints as expected!
buffers = []
with open("my_df.pickle", "wb") as f:
    pickle.dump(df, protocol=5, buffer_callback=buffers.append, file=f)
for b in buffers:
    print(len(b.raw()))
# 24
# 24
# only prints 2 buffers! Expected 3 buffers (1 for each column)
Subsequently, I run consumer.py from another console:
import pandas as pd
import numpy as np
import pickle
buffers = [pickle.PickleBuffer(bytes(24)), pickle.PickleBuffer(bytes(24))]
with open("my_df.pickle", "rb") as f:
    df = pickle.load(f, buffers=buffers)
print(df)
#   text  ints  floats
# 0    a     0     0.0
# 1    b     0     0.0
# 2    c     0     0.0
# Unexpected output: the numerical values are zeroed, and only 1 of the 3 columns ('text') is correctly populated.
It seems that the 2 PickleBuffers correspond to the numerical columns only, yet their contents are not brought across correctly, whereas the text column is!
(Obviously the intention is to bring across the full DataFrame correctly.)
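To narrow this down, here is a minimal sketch that uses a single NumPy array as a stand-in for one numeric column. The names `arr`, `payload`, `raw_buffers`, `restored` and `zeroed` are illustrative and not part of producer.py or consumer.py. It assumes that data handed to `buffer_callback` is left out of the pickle stream, so whatever is passed as `buffers` on load becomes the array's contents:

```python
import pickle

import numpy as np

# Stand-in for one numeric column of the DataFrame (3 x int64 = 24 bytes).
arr = np.array([1, 2, 3], dtype=np.int64)

# With buffer_callback set, the array's data is handed to the callback
# instead of being written into the pickle stream.
buffers = []
payload = pickle.dumps(arr, protocol=5, buffer_callback=buffers.append)

# Copy the raw bytes out; these would have to be stored or shared separately.
raw_buffers = [b.raw().tobytes() for b in buffers]

# Loading with the original bytes restores the values.
restored = pickle.loads(payload, buffers=raw_buffers)
print(restored)
# [1 2 3]

# Loading with zero-filled placeholders of the same size yields zeros,
# which mirrors what consumer.py does with bytes(24).
zeroed = pickle.loads(payload, buffers=[bytes(len(r)) for r in raw_buffers])
print(zeroed)
# [0 0 0]
```

If that assumption holds, the zero-filled `bytes(24)` objects in consumer.py would explain the zeros, and the real buffer contents would need to reach the consumer by some other route (a side file or shared memory, for example). Note that `tobytes()` makes a copy, so this sketch only illustrates the mechanism and is not itself zero-copy.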
Any advice is most welcome!