After installing jsonpickle on my machine (pip install jsonpickle==1.4.1 --no-compile), I noticed that the compilation of the pandas.py file in the ext subfolder is not always reproducible.
In the ext subfolder I executed the following bash code to compile all .py files to .pyc files:
python -m compileall -d somereldir --invalidation-mode checked-hash
This created a pandas.cpython-37.pyc file in the __pycache__ subdirectory.
In the __pycache__ subdirectory, I then executed:
xxd pandas.cpython-37.pyc > hex1.hex
If I repeat the steps above and write the hexdump to hex2.hex, two lines do not match:
diff hex1.hex hex2.hex
288,289c288,289
< 000011f0: 0029 013e 0200 0000 723f 0000 00da 056e .).>....r?.....n
< 00001200: 616d 6573 7213 0000 0029 0372 3300 0000 amesr....).r3...
---
> 000011f0: 0029 013e 0200 0000 da05 6e61 6d65 7372 .).>......namesr
> 00001200: 3f00 0000 7213 0000 0029 0372 3300 0000 ?...r....).r3...
I repeated this several times, and there appear to be two "versions" of the .pyc file: sometimes they match, sometimes they don't.
Because of this, I have several questions:
- Why is there a difference in the .pyc files?
- How can I make sure that the compiled .pyc file is always the same?
- I checked some other Python libraries and all of them produced reproducible .pyc files, so what is different for this pandas.py file?
After splitting the pandas.py file into smaller parts and compiling these, I was able to determine the location of the problem on line 135:
{'name','names'}).
This answers the questions: the order of elements in a set is not necessarily preserved after compilation. Although dictionaries preserve insertion order as of Python 3.7, I could not find anything about order preservation of elements in sets for Python 3.7.
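A likely explanation for the two "versions" is string hash randomization: the iteration order of a set of strings depends on the per-process hash seed, and a constant set in a membership test is stored in the compiled code in that order. The sketch below is only an illustration under that assumption. It compiles a stand-in expression (not the actual line from pandas.py) in fresh interpreters with a chosen PYTHONHASHSEED and compares the marshalled output. The helper name compiled_hex and the file name demo.py are made up for this example.

```python
import os
import subprocess
import sys

# Compile an expression containing the same set literal and print the
# marshalled code object (the kind of payload a .pyc file stores) as hex.
SNIPPET = (
    "import marshal; "
    "code = compile(\"x in {'name', 'names'}\", 'demo.py', 'eval'); "
    "print(marshal.dumps(code).hex())"
)


def compiled_hex(seed):
    """Return the marshalled bytes (as hex) produced by a fresh interpreter
    started with the given PYTHONHASHSEED value."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Same seed in both runs: the output should be identical.
    print(compiled_hex(0) == compiled_hex(0))
    # Different seeds: the output may or may not differ, depending on the
    # seeds chosen and on the Python version in use.
    print(compiled_hex(1) == compiled_hex(2))
```

If this is indeed the cause, setting the PYTHONHASHSEED environment variable to a fixed value (for example 0) before running compileall should give the same .pyc file on every run. Newer CPython releases may already serialize such sets in a stable order, in which case every seed gives the same result.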