Why doesn't unicodedata recognise certain characters?

Asked by Hammerite on 03 July 2014 at 11:42

In Python 2.7 at least, unicodedata.name() doesn't recognise certain characters.

>>> from unicodedata import name
>>> name(u'\n')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: no such name
>>> name(u'a')
'LATIN SMALL LETTER A'

Certainly Unicode contains the character \n, and it has a name, specifically "LINE FEED".

NB. unicodedata.lookup('LINE FEED') and unicodedata.lookup(u'LINE FEED') both give a KeyError: undefined character name.


Answered by Martijn Pieters on 03 July 2014 at 12:06 (best answer)

The unicodedata.name() lookup relies on the name field of the UnicodeData.txt database in the standard, that is, the second column, or field 1 if the code point is counted as field 0 (Python 2.7 uses Unicode 5.2.0).

If the name in that field starts with <, it is ignored. All control codes, including the newline, fall into that category; their name field contains nothing but the label <control>:

000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;

Field 10 (counting from 0, where the code point is field 0) is the old Unicode 1.0 name, which the standard says should not be used. In other words, \n has no name other than the generic <control> label, which the Python database ignores because it is not unique.
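As a rough illustration of how that record breaks down, the sketch below splits the line quoted above on semicolons. The zero-based field indices and the variable names are assumptions made for this example; this is not how unicodedata itself is implemented.

```python
# The record for U+000A, copied from the UnicodeData.txt excerpt above.
record = "000A;<control>;Cc;0;B;;;;;N;LINE FEED (LF);;;;"

fields = record.split(";")

code_point = int(fields[0], 16)   # field 0: code point, in hexadecimal
name_field = fields[1]            # field 1: the character name
category = fields[2]              # field 2: general category (Cc = control)
old_name = fields[10]             # field 10: obsolete Unicode 1.0 name

# A label in angle brackets such as <control> is shared by many code points,
# so it cannot serve as a unique character name.
has_real_name = not name_field.startswith("<")

print(code_point, name_field, category, old_name, has_real_name)
```

Here has_real_name comes out false for the newline, which matches the ValueError that name(u'\n') raises in the question.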

Python 3.3 added support for NameAliases.txt, which lets you look up characters by alias, so lookup('LINE FEED'), lookup('new line'), lookup('eol') and so on all return \n. However, the unicodedata.name() function does not support aliases, nor could it (which one would it pick?). The Python 3.3 documentation describes the change:

Added support for Unicode name aliases and named sequences. Both unicodedata.lookup() and '\N{...}' now resolve name aliases, and unicodedata.lookup() resolves named sequences too.
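A minimal sketch of that behaviour, assuming Python 3.3 or later. The describe() helper and its fallback label format are hypothetical, added only to show one way of coping with characters for which name() has nothing to return.

```python
import unicodedata

# Aliases resolve through lookup() and through \N{...} escapes.
newline = unicodedata.lookup("LINE FEED")
same = "\N{LINE FEED}"

# name() still has no name for a control code; passing a default
# avoids the ValueError.
fallback = unicodedata.name("\n", "<no name>")


def describe(ch):
    """Hypothetical helper: the official name, or a label built from
    the general category and the code point when there is no name."""
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    return "<%s U+%04X>" % (unicodedata.category(ch), ord(ch))


print(repr(newline), repr(same), fallback)
print(describe("a"))
print(describe("\n"))
```

The optional second argument to unicodedata.name() is also available in Python 2.7, so the default-value approach works there even though alias lookup does not.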

TL;DR: LINE FEED is not the official name for \n; it is only an alias. Python 3.3 and up let you look up characters by alias.
