I notice that when normalizing a Unicode string to NFKC form, superscript characters like ¹ (U+00B9), ² (U+00B2), ³ (U+00B3), etc. are converted to the corresponding ASCII digits (e.g. 1, 2, 3, etc.).
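Here is a minimal Python sketch of what I'm seeing, using the standard library's `unicodedata` module (the strings are just illustrative):

```python
import unicodedata

s = "x\u00b2 + y\u00b3"  # "x² + y³"

# NFKC applies compatibility decomposition, so superscripts become ASCII digits
print(unicodedata.normalize("NFKC", s))  # "x2 + y3"

# NFC (canonical-only) leaves the superscript characters untouched
print(unicodedata.normalize("NFC", s))   # "x² + y³"
```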
Does anyone know the rationale for this behavior? It seems to lose information in the process: a superscript number usually carries some contextual meaning (an exponent or a footnote marker, for example).