SyntaxError: Non-UTF-8 code starting with \xe0 in file "...." but no encoding declared

# -*- coding: <utf-8> -*-
import re
conversiontable = { 'ॐ' : 'oṁ', 'ऀ' : 'ṁ', 'ँ' : 'ṃ', 'ं' : 'ṃ', 'ः' : 'ḥ', 'अ' : 'a', 'आ' : 'ā', 'इ' : 'i', 'ई' : 'ī', 'उ' : 'u', 'ऊ' : 'ū', 'ऋ' : 'r̥', 'ॠ' : ' r̥̄', 'ऌ' : 'l̥', 'ॡ' : ' l̥̄', 'ऍ' : 'ê', 'ऎ' : 'e', 'ए' : 'e', 'ऐ' : 'ai', 'ऑ' : 'ô', 'ऒ' : 'o', 'ओ' : 'o', 'औ' : 'au', 'ा' : 'ā', 'ि' : 'i', 'ी' : 'ī', 'ु' : 'u', 'ू' : 'ū', 'ृ' : 'r̥', 'ॄ' : ' r̥̄', 'ॢ' : 'l̥', 'ॣ' : ' l̥̄', 'ॅ' : 'ê', 'े' : 'e', 'ै' : 'ai', 'ॉ' : 'ô', 'ो' : 'o', 'ौ' : 'au', 'क़' : 'q', 'क' : 'k', 'ख़' : 'x', 'ख' : 'kh', 'ग़' : 'ġ', 'ग' : 'g', 'ॻ' : 'g', 'घ' : 'gh', 'ङ' : 'ṅ', 'च' : 'c', 'छ' : 'ch', 'ज़' : 'z', 'ज' : 'j', 'ॼ' : 'j', 'झ' : 'jh', 'ञ' : 'ñ', 'ट' : 'ṭ', 'ठ' : 'ṭh', 'ड़' : 'ṛ', 'ड' : 'ḍ', 'ॸ' : 'ḍ', 'ॾ' : 'd', 'ढ़' : 'ṛh', 'ढ' : 'ḍh', 'ण' : 'ṇ', 'त' : 't', 'थ' : 'th', 'द' : 'd', 'ध' : 'dh', 'न' : 'n', 'प' : 'p', 'फ़' : 'f', 'फ' : 'ph', 'ब' : 'b', 'ॿ' : 'b', 'भ' : 'bh', 'म' : 'm', 'य' : 'y', 'र' : 'r', 'ल' : 'l', 'ळ' : 'ḷ', 'व' : 'v', 'श' : 'ś', 'ष' : 'ṣ', 'स' : 's', 'ह' : 'h', 'ऽ' : '\'', '्' : '', '़' : '', '०' : '0', '१' : '1', '२' : '2', '३' : '3', '४' : '4', '५' : '5', '६' : '6', '७' : '7', '८' : '8', '९' : '9', 'ꣳ' : 'ṁ', '।' : '.', '॥' : '..', ' ' : ' ', }
consonants = '\u0915-\u0939\u0958-\u095F\u0978-\u097C\u097E-\u097F' 
vowelsigns = '\u093E-\u094C\u093A-\u093B\u094E-\u094F\u0955-\u0957' 
nukta = '\u093C' 
virama = '\u094D' 
devanagarichars = '\u0900-\u097F\u1CD0-\u1CFF\uA8E0-\uA8FF'

I have been trying to use the above mapping to transliterate text from Devanagari to Latin script. I am using VS Code, and it throws an encoding error: "Non-UTF-8 code starting with \xe0 in file "...." but no encoding declared". I tried adding a UTF-8 encoding declaration as the first line of the file, but that didn't work.

Can anyone explain why this is happening and what I should do to fix it?


Remove the first line.

It may just confuse you and Python: declaring that a file is UTF-8 does not make it UTF-8.

Instead, check the actual encoding of the file. In VS Code, the status bar at the bottom right shows the current encoding of the file; click it, choose "Save with Encoding", and pick UTF-8. Once the file is really saved as UTF-8, Python will read it correctly.
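To find where a file stops being valid UTF-8, a short check like the following can help. This is a sketch: find_non_utf8 is a hypothetical helper, and the sample bytes are made up (the second line is encoded in Latin-1, where "à" is the single byte 0xE0, so it is not valid UTF-8).

```python
def find_non_utf8(data: bytes):
    """Locate the first byte sequence that is not valid UTF-8.

    Returns (byte offset, 1-based line number, offending bytes),
    or None if the whole input decodes cleanly as UTF-8.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # Count newlines before the bad byte to get its line number.
        line_no = data.count(b"\n", 0, err.start) + 1
        return err.start, line_no, data[err.start:err.end]
    return None


# Made-up sample: line 2 was saved as Latin-1, not UTF-8.
sample = b"import re\ns = 'voil\xe0'\n"
print(find_non_utf8(sample))  # (19, 2, b'\xe0')

# For a real script, pass its raw bytes, e.g. (placeholder file name):
#   find_non_utf8(pathlib.Path("your_script.py").read_bytes())
```

If it returns None, the file on disk is valid UTF-8 and the problem lies elsewhere (for example, a different file being run).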

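The coding line only tells Python how to decode the bytes on disk; it does not change them. If you declare UTF-8 but the file is saved in another encoding, Python still decodes it as UTF-8 and fails with this error. Note also that the line in the question, "# -*- coding: <utf-8> -*-", is not even recognized: the angle brackets are placeholder notation from PEP 263, and the expected form is "# -*- coding: utf-8 -*-". That is why the message still says "no encoding declared".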

If you remove the first line, you will most likely get the same error: in Python 3, UTF-8 is the default source encoding, so without a recognized declaration Python reads the file as UTF-8 anyway. It does not try to guess the encoding.
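This lookup can be checked with the standard library: tokenize.detect_encoding applies the PEP 263 rules for source files (a coding declaration in the first two lines, otherwise UTF-8). In this sketch, detected_encoding is a hypothetical wrapper and the sample sources are made up; Latin-1 is used only so that a recognized declaration visibly differs from the default. A declaration written with angle brackets, like the question's first line, falls back to UTF-8 exactly as if there were no declaration at all.

```python
import io
import tokenize


def detected_encoding(source: bytes) -> str:
    """Return the encoding tokenize would use to read `source` as a script.

    tokenize.detect_encoding looks for a PEP 263 coding declaration in the
    first two lines and otherwise returns the default, 'utf-8'.
    """
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


# Made-up sample sources.
samples = {
    "no declaration": b"import re\n",
    "valid declaration": b"# -*- coding: latin-1 -*-\nimport re\n",
    "angle-bracket placeholder": b"# -*- coding: <latin-1> -*-\nimport re\n",
}

for label, src in samples.items():
    print(f"{label}: {detected_encoding(src)}")
# no declaration: utf-8
# valid declaration: iso-8859-1
# angle-bracket placeholder: utf-8
```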

But if you set UTF-8 as the file's encoding in your editor (VS Code, or in general any code editor) and save, the file will actually be stored as UTF-8, and Python will accept it.
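Re-saving from the editor is the simplest fix. If you would rather do it from Python, here is a sketch that assumes you already know which encoding the file is currently saved in; the resave_as_utf8 helper, the demo file name and the choice of UTF-16 for the demo are illustrative assumptions. It reads the text with the old encoding and writes it back as UTF-8, and newline="" keeps the line endings unchanged.

```python
import tempfile
from pathlib import Path


def resave_as_utf8(path, current_encoding: str) -> None:
    """Rewrite the file at `path` as UTF-8.

    Assumes the file is currently encoded in `current_encoding`; if that
    guess is wrong, decoding fails or produces garbage. newline="" leaves
    line endings exactly as they are on both read and write.
    """
    with open(path, encoding=current_encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)


# Demo: a throwaway file saved as UTF-16 (chosen only because it can hold Devanagari).
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo.py"  # hypothetical file name
    script.write_bytes("s = '\u0950'\n".encode("utf-16"))
    resave_as_utf8(script, "utf-16")
    result = script.read_bytes()

print(result)  # b"s = '\xe0\xa5\x90'\n"
```

U+0950 (ॐ) comes out as the bytes E0 A5 90: in UTF-8, every Devanagari character (U+0900 to U+097F) is three bytes starting with 0xE0.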

PS: If this is not homework, you may want to read the Devanagari chapter of the Unicode Standard: there are many more special cases to take care of. Also consider a transliteration standard such as ISO 15919, or just look for a library.
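As one example of those special cases: Devanagari consonants carry an inherent "a" that is not written, so mapping characters one by one with the question's table (क → k, म → m, ल → l) turns कमल into "kml" instead of "kamala". The question's consonants, vowelsigns and virama ranges can be used to insert that vowel before mapping. This is a minimal sketch with a tiny hypothetical subset of the table, not a complete transliterator; it ignores nukta forms, Hindi schwa deletion and much else. Writing the Devanagari characters as \u escapes also keeps the source file pure ASCII, so it cannot trigger the encoding error from the question however the editor saves it.

```python
import re

# Character ranges copied from the question, written as \u escapes.
CONSONANTS = "\u0915-\u0939\u0958-\u095F\u0978-\u097C\u097E-\u097F"
VOWEL_SIGNS = "\u093E-\u094C\u093A-\u093B\u094E-\u094F\u0955-\u0957"
VIRAMA = "\u094D"

# Tiny illustrative subset of the question's conversiontable.
TABLE = {
    "\u0915": "k",       # DEVANAGARI LETTER KA
    "\u0928": "n",       # DEVANAGARI LETTER NA
    "\u092E": "m",       # DEVANAGARI LETTER MA
    "\u0932": "l",       # DEVANAGARI LETTER LA
    "\u093E": "\u0101",  # VOWEL SIGN AA -> a with macron
    "\u093F": "i",       # VOWEL SIGN I
    "\u094D": "",        # VIRAMA: suppresses the inherent vowel
}

# A consonant NOT followed by a vowel sign or virama keeps its inherent "a".
INHERENT_A = re.compile(f"([{CONSONANTS}])(?![{VOWEL_SIGNS}{VIRAMA}])")


def transliterate(text: str) -> str:
    # Step 1: make the inherent vowel explicit by inserting a Latin "a".
    with_vowels = INHERENT_A.sub(r"\1a", text)
    # Step 2: map character by character; anything not in TABLE
    # (including the inserted "a") passes through unchanged.
    return "".join(TABLE.get(ch, ch) for ch in with_vowels)


print(transliterate("\u0915\u092E\u0932"))  # KA MA LA -> kamala
print(transliterate("\u0915\u093F\u0928"))  # KA, sign I, NA -> kina
print(transliterate("\u0915\u094D\u0932"))  # KA, virama, LA -> kla
```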