Why is the hexdump of my Unicode text file different from the byte sequence I manually entered?

1.7k Views Asked by jollyroger At 27 July 2025 at 18:03

Why does the following lead to such a different byte sequence in the hexdump?

$ echo -e "\u0f67\u0fb9\u0fa8\u0fb3\u0fba\u0fbc\u0fbb\u0f83\u0f0b" > uni
$ hexdump uni
0000000 bde0 e0a7 b9be bee0 e0a8 b3be bee0 e0ba
0000010 bcbe bee0 e0bb 83be bce0 0a8b
000001c

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE=C
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

The locale is correctly set to en_US.UTF-8, and the actual Unicode output is correct: ཧྐྵྨླྺྼྻྃ་


jollyroger On 13 December 2013 at 17:23

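My misconception was about how hexdump displays bytes, not about the encoding of the file. The \u escapes name Unicode code points, and in a UTF-8 locale echo -e writes them out as UTF-8, not UTF-16. Looking up the first character (U+0F67), its UTF-8 encoding is displayed as

 e0 bd a7

which is exactly how the file begins. By default, hexdump prints its input as 16-bit words in the host's byte order, so on a little-endian machine the first two bytes (e0 bd) are shown as bde0, the next two (a7 e0) as e0a7, and so on. The file itself is not byte-swapped. To see the bytes one at a time in file order, hexdump can be run with the -C parameter.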
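To make the difference concrete, here is a minimal sketch, assuming bash and a little-endian machine. It encodes U+0F67 by hand and then dumps the same four bytes twice: once byte by byte and once as 16-bit words. It uses od rather than hexdump, since od is more widely installed; the variable names and the scratch file are only for illustration.

```shell
#!/usr/bin/env bash
# Encode U+0F67 as UTF-8 by hand.
# Code points U+0800..U+FFFF use the 3-byte form: 1110xxxx 10xxxxxx 10xxxxxx
cp=$((0x0F67))
b1=$(( 0xE0 | (cp >> 12) ))           # top 4 bits of the code point
b2=$(( 0x80 | ((cp >> 6) & 0x3F) ))   # middle 6 bits
b3=$(( 0x80 | (cp & 0x3F) ))          # low 6 bits
utf8=$(printf '%02x %02x %02x' "$b1" "$b2" "$b3")
echo "UTF-8 bytes  : $utf8"

# Write those three bytes plus a newline to a scratch file.
# The inner printf builds octal escapes, the outer one turns them into bytes.
tmp=$(mktemp)
printf "$(printf '\\%03o\\%03o\\%03o\\n' "$b1" "$b2" "$b3")" > "$tmp"

# Unquoted $(...) is intentional here: it collapses od's padding to single spaces.
bytes=$(echo $(od -An -tx1 "$tmp"))   # one byte at a time, in file order
words=$(echo $(od -An -tx2 "$tmp"))   # 2-byte words in host byte order
echo "byte by byte : $bytes"
echo "16-bit words : $words"          # each byte pair appears swapped on little-endian hosts

rm -f "$tmp"
```

The byte-by-byte line corresponds to what hexdump -C shows, and the 16-bit line corresponds to the default hexdump output in the question. On a big-endian host the two views would agree, which is why the word-wise output is a property of the tool and the machine, not of the file.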
