Python importlib.resources.files() throws UnicodeDecodeError: invalid continuation byte

I am creating a Python package that needs certain data files in order to work. I've been looking for a way to include these data files with the package installation. I found a way using importlib.resources.files(). However, I'm receiving an error when I try to decode the objects it returns.

I've created a barebones example package. The package tree is as follows.

.
├── package
│   ├── __init__.py
│   ├── one.ppn
│   └── two.rhn
├── pyproject.toml
└── setup.py

1 directory, 5 files

The entire point of this example package is to be able to access one.ppn and two.rhn. This is done by identifying their absolute file paths and then saving them as constants to be imported. The code is located in __init__.py.

# package.__init__.py

from importlib.resources import files


PACKAGE_DATA = files('package')

KEYWORD_PATH = PACKAGE_DATA.joinpath('one.ppn')

print(PACKAGE_DATA)
print(KEYWORD_PATH)

CONTEXT_PATH = PACKAGE_DATA.joinpath('two.rhn').read_text()

I have created an editable install (pip3 install -e ../Package) in a separate directory. If I then import package, I receive the following output.

/home/millertime/Desktop/Package/package
/home/millertime/Desktop/Package/package/one.ppn
Traceback (most recent call last):
  File "/home/millertime/Desktop/Test/test.py", line 1, in <module>
    import package
  File "/home/millertime/Desktop/Package/package/__init__.py", line 11, in <module>
    CONTEXT_PATH = PACKAGE_DATA.joinpath('two.rhn').read_text()
  File "/usr/lib/python3.9/pathlib.py", line 1256, in read_text
    return f.read()
  File "/usr/lib/python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 0: invalid continuation byte

You can see that importlib is functioning perfectly at first, and has correctly identified the absolute file paths to my data files. However, when I try to decode to a str that I can actually use, I receive a UnicodeDecodeError.
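For reference, here is a minimal sketch of the difference between the location of a resource and its content. It assumes a throwaway package named demo_pkg (a hypothetical name, built in a temporary directory) and a made-up binary payload standing in for two.rhn; neither comes from my real project. As far as I understand the documented behaviour, read_text() opens the file and decodes what is inside it, whereas str() on the object only yields the path.

```python
import importlib
import sys
import tempfile
from importlib.resources import files
from pathlib import Path

# Build a throwaway package (hypothetical name) holding one binary data file.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "two.rhn").write_bytes(b"\xe0\x01\x02\x03")  # stand-in binary content

# Make the temporary package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

resource = files("demo_pkg").joinpath("two.rhn")

# The location of the file: nothing is read from disk here.
context_path = str(resource)

# The raw content of the file: bytes are returned, no decoding is attempted.
raw = resource.read_bytes()

# read_text() reads the file and decodes its content, which fails for binary data.
try:
    resource.read_text(encoding="utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(context_path)
print(raw)
print(decode_failed)
```

If only the path is needed, str() on the object may be enough for a package installed as plain files on disk. For packages that are not unpacked on the filesystem, importlib.resources.as_file() is the documented way to obtain a real path.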

I'm not sure if my pyproject.toml file is relevant, so I'm going to include it here. The only part I could see contributing to the problem is [tool.setuptools.package-data].

# pyproject.toml

[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "package"
version = "1.0.0"

[tool.setuptools]
packages = [
    "package"
]

[tool.setuptools.package-data]
package = [
    "one.ppn",
    "two.rhn"
]

I researched other instances of this error and tried a couple of things to solve it.

  1. I attempted to create my own decoding method using a with statement and the read_bytes() method of the object, but received the same error.
  2. I saw that many reports of this error were related to the encoding, and thought that maybe I was using the wrong one (utf-8). I installed chardet to tell me which encoding I should use, and received another error about being unable to decode due to an "invalid continuation byte".
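The error message itself can be reproduced without importlib at all, which may help narrow things down. The sketch below uses a made-up two-byte payload (an assumption, not the real file content): 0xe0 opens a multi-byte UTF-8 sequence, and the byte after it is not a valid continuation byte.

```python
# Hypothetical payload: 0xe0 starts a three-byte UTF-8 sequence,
# but the following byte (ASCII "A") is not a continuation byte.
data = b"\xe0A"

try:
    data.decode("utf-8")
    reason = None
    position = None
except UnicodeDecodeError as exc:
    reason = exc.reason    # human-readable cause of the failure
    position = exc.start   # index of the offending byte

print(reason, position)
```

Any binary file whose first byte happens to be 0xe0 would likely fail in the same way when decoded as UTF-8, regardless of how its path was obtained.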

It seems to me that this is an internal problem with importlib. I don't see how it could be related to my data file types, given it's just a string representing a file path, not the actual data of the file.

I am currently using Python 3.9.2 on a Raspberry Pi 4. Thanks in advance.
