Difference when importing CSV data on Linux and macOS
Hello everyone,
When I import a CSV file encoded as UTF-8 with BOM using pandas.read_csv under Linux, the first column name contains the BOM as cleartext, e.g. \\xEF\\xBB\\xBFColumnName.
When I do the same on macOS, everything works fine. Why does this happen?
I am using Python 3.10.12 and pandas 2.1.2.
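A genuine BOM consists of the three bytes EF BB BF. Decoded as plain UTF-8 it becomes the single character U+FEFF (shown as '\ufeff'), and decoding with the utf-8-sig codec removes it; neither case produces the twelve characters \xEF\xBB\xBF. A column name like the one above therefore suggests the file contains the escape sequence as literal text rather than a real BOM. The sketch below uses the standard-library csv module instead of pandas to keep it self-contained (pandas.read_csv accepts the same encoding="utf-8-sig" argument); the sample data and the helper first_header are hypothetical.

```python
import csv
import io


def first_header(raw: bytes, encoding: str) -> str:
    """Decode raw CSV bytes with the given codec and return the first column name."""
    text = io.StringIO(raw.decode(encoding), newline="")
    return next(csv.reader(text))[0]


real_bom = b"\xef\xbb\xbfColumnName,other\n1,2\n"    # actual BOM bytes EF BB BF
literal = b"\\xEF\\xBB\\xBFColumnName,other\n1,2\n"  # BOM spelled out as 12 ASCII characters

print(repr(first_header(real_bom, "utf-8")))      # '\ufeffColumnName'
print(repr(first_header(real_bom, "utf-8-sig")))  # 'ColumnName'
print(repr(first_header(literal, "utf-8-sig")))   # '\\xEF\\xBB\\xBFColumnName'
```

Only the last case matches the column name reported in the question, and no choice of encoding fixes it, because the bytes in the file really are backslashes and letters.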
It seems that the problem is not with pandas but with the behaviour of printf when it is used in a Makefile:
Given a file target.csv encoded as UTF-8 (without BOM) with the contents:
and a Makefile containing a target like:
a
running
make target.csv
produces a target.csv that contains the BOM escape sequence as plain text, and editors (e.g. VS Code or Vim) show its encoding as plain UTF-8.
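Since editors report both variants as UTF-8, inspecting the first bytes of the file is a more reliable check. A minimal sketch, assuming the literal text uses the same uppercase spelling as the printf call in this post; bom_status and LITERAL_ESCAPE are hypothetical names:

```python
import codecs

# The 12 ASCII characters "\xEF\xBB\xBF", i.e. what ends up in the file when the
# escape sequence is copied through literally instead of being turned into bytes.
# Assumption: same uppercase spelling as in the printf call (the match is case-sensitive).
LITERAL_ESCAPE = b"\\xEF\\xBB\\xBF"


def bom_status(data: bytes) -> str:
    """Classify how (or whether) a UTF-8 BOM appears at the start of data."""
    if data.startswith(codecs.BOM_UTF8):
        return "real BOM"              # the three bytes EF BB BF
    if data.startswith(LITERAL_ESCAPE):
        return "literal escape text"   # backslash, 'x', 'E', 'F', ...
    return "no BOM"


# Usage with the file name from the post:
# from pathlib import Path
# print(bom_status(Path("target.csv").read_bytes()))
```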
But when I run
printf '\xEF\xBB\xBF' | cat - target.csv~ > target.csv
directly at a bash prompt, the file is correctly encoded as UTF-8 (with BOM). This works correctly on macOS.
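One plausible explanation, not confirmed in this thread: make runs recipes like this one through /bin/sh, which on Debian and Ubuntu is dash. POSIX printf only specifies octal escapes such as \357, and dash's built-in printf does not expand \xHH, so the escape text is written to the file as-is. An interactive bash prompt (and macOS, where /bin/sh is normally bash) uses bash's printf, which does expand \xHH. If that is the cause, writing the BOM as printf '\357\273\277' in the recipe, or setting SHELL := /bin/bash in the Makefile, should behave the same on both systems. Another option is to avoid the shell and prepend the BOM from Python; add_bom below is a hypothetical helper that mirrors the printf | cat pipeline and, like that pipeline, does not check whether src already starts with a BOM.

```python
import codecs
import shutil


def add_bom(src: str, dst: str) -> None:
    r"""Write dst as a UTF-8 BOM followed by the unchanged bytes of src.

    Roughly equivalent to: printf '\357\273\277' | cat - src > dst
    """
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(codecs.BOM_UTF8)      # the three bytes EF BB BF
        shutil.copyfileobj(fin, fout)    # copy the original content unchanged


# Usage with the file names from the post:
# add_bom("target.csv~", "target.csv")
```

Reading the result with encoding="utf-8-sig" should then give a clean first column name.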
I will post this question in a new thread, because it is not an issue with pandas or Python but with make/printf.