UnicodeDecodeError when pushing to CouchDB: 'utf8' codec can't decode byte 0xe9


I am using CouchDB and couchapp on Windows. I'm working on a professor's ongoing project, https://github.com/Hypertopic/Tire-a-part, and I'm currently trying to set up the app on my computer.

When I do:

couchapp push http://127.0.0.1:5984/tire-a-part

I get an error:

Traceback (most recent call last):
File "couchapp\dispatch.pyc", line 48, in dispatch
File "couchapp\dispatch.pyc", line 92, in _dispatch
File "couchapp\commands.pyc", line 79, in push
File "couchapp\localdoc.pyc", line 123, in push
File "couchapp\client.pyc", line 294, in save_doc
File "json\__init__.pyc", line 231, in dumps
File "json\encoder.pyc", line 201, in encode
File "json\encoder.pyc", line 264, in iterencode
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

My professor and my friends all use Macs and don't have this problem. After a few hours searching the net for similar problems, I understand that it is an encoding error, but I don't understand what is incorrectly encoded or what I should do about it. Thanks.

Edit: I have discovered couchapp's debug option. It gives much more detail, but I still don't really understand it, as this is my first time with couchapp and CouchDB. This is the last part of the debug output, as I don't think the beginning is important:

2018-04-14 12:42:16 [DEBUG] push spec/samples/scopus.bib
2018-04-14 12:42:16 [DEBUG] push spec/spec_helper.rb
2018-04-14 12:42:16 [DEBUG] Resource uri: http://127.0.0.1:5984/tire-a-part
2018-04-14 12:42:16 [DEBUG] Request: GET _design/Tire-a-part
2018-04-14 12:42:16 [DEBUG] Headers: {'Accept': 'application/json', 'User-Agent': 'couchapp/0.7.5'}
2018-04-14 12:42:16 [DEBUG] Params: {}
2018-04-14 12:42:16 [DEBUG] Start to perform request: GET 127.0.0.1:5984 /tire-a-part/_design/Tire-a-part
2018-04-14 12:42:16 [DEBUG] Send headers: ['GET /tire-a-part/_design/Tire-a-part HTTP/1.1\r\n', 'Host: 127.0.0.1:5984\r\n', 'User-Agent: restkit/3.0.4\r\n', 'Accept-Encoding: identity\r\n', 'Accept: application/json\r\n']
2018-04-14 12:42:16 [DEBUG] Start to parse response
2018-04-14 12:42:16 [DEBUG] Got response: 404 Object Not Found
2018-04-14 12:42:16 [DEBUG] headers: [MultiDict([('X-CouchDB-Body-Time','0'),('X-Couch-Request-ID', '5ab9eee6cb'), ('Server', 'CouchDB/2.1.1 (Erlang OTP/18)'), ('Date', 'Sat, 14 Apr 2018 10:42:16 GMT'), ('Content-Type','application/json'), ('Content-Length', '41'), ('Cache-Control', 'must-revalidate')])]
2018-04-14 12:42:16 [DEBUG] return response class
2018-04-14 12:42:16 [DEBUG] release connection
2018-04-14 12:42:16 [DEBUG] C:\Users\jules\Desktop\LO10 projet\Tire-a-part\vendor don't exist
2018-04-14 12:42:16 [CRITICAL] 'utf8' codec can't decode byte 0xe9 in position 1: invalid continuation byte

I compared this with what my friend got on a Mac, and it is exactly the same except for the [CRITICAL] line: on his machine, after the "vendor don't exist" line, couchapp goes on to PUT _design/Tire-a-part.


1
Megidd On

I don't have the answer, but I tried something: I started a Python 3.5 command line, declared a variable byte='\xe9', and printed it with print(byte). (In Python 3 this literal is a one-character string holding the code point U+00E9, not a raw byte.) As can be seen below, 0xe9 appears to correspond to the é character:

$ python3.5
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> byte='\xe9'
>>> print(byte)
é
>>> 

I'm not sure why Windows has a problem with the é character while macOS works fine.


On the Linux command line, when I put the é character in a file and take a hex dump of the file, I see that é is actually stored as the bytes c3 a9 (the 0a is the newline, or line feed):

$ echo 'é' > file
$ cat file 
é
$ hd file 
00000000  c3 a9 0a                                          |...|
00000003

Therefore, I think the problem is that the é character is encoded as the single byte 0xe9 rather than as the two bytes 0xc3 0xa9.
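The same comparison can be made from Python 3 without going through a file. This is only a small sketch; the choice of cp1252 as the single-byte encoding is my assumption about a typical Western European Windows setup, not something confirmed by the question:

```python
# Compare the Unicode code point of 'é' with two byte encodings of it.
char = "\u00e9"  # the character é

code_point = ord(char)                # the Unicode code point, 0xE9
utf8_bytes = char.encode("utf-8")     # UTF-8 uses two bytes: c3 a9
cp1252_bytes = char.encode("cp1252")  # Windows-1252 uses one byte: e9 (assumed code page)

print(hex(code_point))     # 0xe9
print(utf8_bytes.hex())    # c3a9
print(cp1252_bytes.hex())  # e9
```

So a lone 0xe9 byte is what a single-byte encoding produces for é, whereas a UTF-8 decoder expects c3 a9.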


I played around with Go to see where the e9 comes from, and I noticed that the Unicode code point for é is U+00E9 (\u00e9), which UTF-8 encodes as the two bytes \xc3\xa9, i.e. 0xc3 and 0xa9. Therefore, on your Windows machine, it looks as if the code point value ends up as a single raw byte, which is what a single-byte encoding such as Latin-1 or Windows-1252 produces, instead of the two-byte UTF-8 form.
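To check that a stray 0xe9 byte really produces the message from the question, here is a rough Python 3 reproduction. The byte string `raw_name` is a made-up file name, not one taken from the project, and cp1252 is again only my guess at the encoding in use:

```python
# Hypothetical file name bytes with a single-byte (cp1252 / Latin-1) é at index 1.
raw_name = b"t\xe9l.txt"

try:
    raw_name.decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    # 0xe9 announces a multi-byte UTF-8 sequence, but the next byte
    # is a plain ASCII letter, so the decoder gives up here.
    error_message = str(exc)

# Decoding with the encoding that actually produced the bytes works,
# and the result can then be re-encoded as UTF-8.
decoded_name = raw_name.decode("cp1252")
utf8_name = decoded_name.encode("utf-8")

print(error_message)
print(utf8_name)
```

The error names byte 0xe9 at position 1 with the reason "invalid continuation byte", which matches the traceback above apart from Python 3 spelling the codec 'utf-8'.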

0
user9642562 On

It really does seem to have been an encoding problem with 'é'. There were 2 files with 'é' in their names. After changing it to 'e', the push command works. The app doesn't work yet, but that's a story for another day...
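For anyone hitting the same thing, a small Python 3 script along these lines can list the offending names instead of hunting for them by hand. The function name `find_non_ascii_names` is just something I made up for this sketch, and it is not part of couchapp:

```python
import os


def find_non_ascii_names(root):
    """Return paths under root whose file or directory name contains non-ASCII characters."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # Any character above code point 127 is outside ASCII.
            if any(ord(ch) > 127 for ch in name):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Example use from the project directory:
#     for path in find_non_ascii_names("."):
#         print(ascii(path))
```

Printing with ascii() escapes the accented characters, so the listing itself cannot fail on a console that uses a different encoding.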