Why does NFKC normalization lose superscript & subscript info?

I notice that when normalizing a Unicode string to NFKC form, superscript characters such as ¹ (U+00B9), ² (U+00B2), and ³ (U+00B3) are converted to the corresponding ASCII digits (e.g., 1, 2, 3).
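
A minimal reproduction, using Python's standard `unicodedata.normalize` (the helper name `to_nfkc` and the sample strings are just for illustration). It shows superscripts, and subscripts such as ₂ (U+2082), being folded to plain digits:

```python
import unicodedata

def to_nfkc(text: str) -> str:
    """Apply NFKC: compatibility decomposition followed by canonical composition."""
    return unicodedata.normalize("NFKC", text)

# Superscript one/two/three, plus a subscript two inside "H2O"
for sample in ["x\u00b9", "x\u00b2", "x\u00b3", "H\u2082O"]:
    print(f"{sample!r} -> {to_nfkc(sample)!r}")
```

Every sample comes out with an ordinary ASCII digit in place of the raised or lowered one.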

Does anyone know the rationale for this behavior? It seems to lose information in the process, since a superscript number usually carries contextual meaning (e.g., an exponent, as in x²).
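
To see exactly what gets dropped, the sketch below (the helper `describe` is hypothetical) compares NFC with NFKC and prints each character's raw decomposition mapping from the Unicode Character Database via `unicodedata.decomposition`. NFC uses only canonical mappings and leaves these characters alone. Their mappings are compatibility mappings tagged `<super>` or `<sub>`, and NFKC applies the mapping and discards the tag. This shows where the "superscript-ness" lives; it does not by itself explain the design rationale.

```python
import unicodedata

def describe(ch: str) -> tuple:
    """Return (NFC form, NFKC form, raw UCD decomposition mapping) for one character."""
    return (
        unicodedata.normalize("NFC", ch),   # canonical only: unchanged here
        unicodedata.normalize("NFKC", ch),  # compatibility: tag dropped, digit kept
        unicodedata.decomposition(ch),      # e.g. "<super> 0031"
    )

for ch in ["\u00b9", "\u00b2", "\u2081"]:
    nfc, nfkc, mapping = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: NFC={nfc!r} NFKC={nfkc!r} mapping={mapping!r}")
```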
