I am creating a Python package that needs certain data files in order to work. I've been looking for a way to include these data files with the package installation. I found a way using importlib.resources.files(). However, I'm receiving an error when I try to decode the objects it returns.
I've created a barebones example package. The package tree is as follows.
.
├── package
│   ├── __init__.py
│   ├── one.ppn
│   └── two.rhn
├── pyproject.toml
└── setup.py
1 directory, 5 files
The entire point of this example package is to be able to access one.ppn and two.rhn. This is done by identifying their absolute file paths and then saving them as constants to be imported. The code is located in __init__.py.
# package.__init__.py
from importlib.resources import files
PACKAGE_DATA = files('package')
KEYWORD_PATH = PACKAGE_DATA.joinpath('one.ppn')
print(PACKAGE_DATA)
print(KEYWORD_PATH)
CONTEXT_PATH = PACKAGE_DATA.joinpath('two.rhn').read_text()
I have created an editable install (pip3 install -e ../Package) in a separate directory. If I then import package, I receive the following output.
/home/millertime/Desktop/Package/package
/home/millertime/Desktop/Package/package/one.ppn
Traceback (most recent call last):
  File "/home/millertime/Desktop/Test/test.py", line 1, in <module>
    import package
  File "/home/millertime/Desktop/Package/package/__init__.py", line 11, in <module>
    CONTEXT_PATH = PACKAGE_DATA.joinpath('two.rhn').read_text()
  File "/usr/lib/python3.9/pathlib.py", line 1256, in read_text
    return f.read()
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: invalid continuation byte
You can see that importlib is functioning perfectly at first, and has correctly identified the absolute file paths to my data files. However, when I try to decode to a str that I can actually use, I receive a UnicodeDecodeError.
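To make the failure concrete, here is a minimal sketch that reproduces the same kind of error without importlib. It assumes a hypothetical file, sample.bin, whose first byte is 0xe0; this is only a stand-in, since the real contents of two.rhn are not shown here. The point it illustrates is that str() on a path object only yields the location, while read_text() opens the file and decodes its contents.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in bytes; NOT the real contents of two.rhn.
SAMPLE_BYTES = b"\xe0\x01\x02\x03"


def try_read_text(path):
    """Return the decoded text, or the UnicodeDecodeError raised while decoding."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        return exc


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "sample.bin"  # hypothetical file name
    data_file.write_bytes(SAMPLE_BYTES)

    # Converting the path to a string never opens the file.
    path_string = str(data_file)

    # read_text() opens the file and tries to decode every byte as UTF-8.
    result = try_read_text(data_file)

print(path_string)
print(type(result).__name__, result)
```

If the path is all that is needed, the str() conversion alone is enough; read_text() only makes sense for files that really contain text in the given encoding.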
I'm not sure if my pyproject.toml file is relevant, so I'm going to include it here. The only part I could see contributing to the problem is [tool.setuptools.package-data].
# pyproject.toml
[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"
[project]
name = "package"
version = "1.0.0"
[tool.setuptools]
packages = [
"package"
]
[tool.setuptools.package-data]
package = [
"one.ppn",
"two.rhn"
]
I researched other instances of this error and tried a couple of things to solve it.
- I attempted to create my own decoding method using a with statement and the read_bytes() method of the object, but received the same error.
- I saw that many of the errors were related to the encoder, and thought that maybe I was using the wrong one (utf-8). I installed chardet to tell me what kind I should use, and received another error relating to being unable to decode due to an "invalid continuation byte".
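For illustration, a read_bytes()-based check could look roughly like the sketch below. It is not my original attempt: it builds a throwaway package in a temporary directory, and the package name demo_data_pkg, the file name blob.bin, and the byte values are all hypothetical. It shows that read_bytes() returns the raw bytes without decoding, so any UnicodeDecodeError can only come from a later decode step.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

PKG_NAME = "demo_data_pkg"  # hypothetical package name, created only for this sketch

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny package containing one binary data file.
    pkg_dir = Path(tmp) / PKG_NAME
    pkg_dir.mkdir()
    (pkg_dir / "__init__.py").write_text("")
    (pkg_dir / "blob.bin").write_bytes(b"\xe0\x01\x02\x03")  # stand-in bytes

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        resource = files(PKG_NAME).joinpath("blob.bin")
        raw = resource.read_bytes()  # raw bytes, no decoding involved
    finally:
        # Undo the temporary import-path changes.
        sys.path.remove(tmp)
        sys.modules.pop(PKG_NAME, None)

# Inspect the first few bytes instead of decoding them.
header_hex = raw[:4].hex()

# Decoding is a separate step, and it is the one that can fail.
try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(header_hex, decoded_ok)
```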
It seems to me that this is an internal problem with importlib. I don't see how it could be related to my data file types, given it's just a string representing a file path, not the actual data of the file.
I am currently using Python 3.9.2 on a Raspberry Pi 4. Thanks in advance.