If I have a lexicographically sorted list of Java Strings [s1, s2, s3, s4, ..., sn], and then convert each String into a byte array using UTF-8 encoding, bx = sx.getBytes("UTF-8"), is the list of byte arrays [b1, b2, b3, ..., bn] also lexicographically sorted?
Does Java String.getBytes("UTF-8") preserve lexicographical order?
Asked by Carsten
There are 2 best solutions below

You get a list/array of objects X, in a given order.
You create a new list/array Y of such objects by applying a method to each element.
Y will have the ordering that you created it with (normally you will just have kept X's order). No reordering happens.
Also, lexicographical ordering for a byte[] is meaningless.
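
Yes. According to RFC 3629:
"The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers."
As Ian Roberts pointed out, this applies for "true UTF-8 (such as String.getBytes will give you)", but beware of DataInputStream's fake UTF-8 (the "modified UTF-8" used by readUTF and writeUTF), which will sort [U+000000] after [U+000001] and [U+00F000] after [U+10FFFF].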
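
If you want to check this yourself, here is a minimal sketch (the class and method names are mine, not from any library). It assumes "lexicographically sorted" for byte arrays means unsigned byte-by-byte comparison, which is what the RFC statement refers to; Java bytes are signed, so it uses Arrays.compareUnsigned (Java 9+). One caveat worth knowing: String.compareTo compares UTF-16 code units rather than code points, so a list sorted with compareTo can come out in a different order once it contains characters outside the BMP (surrogate pairs). The last part reproduces the modified UTF-8 behaviour with DataOutputStream.writeUTF.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, for illustration only.
final class Utf8OrderDemo {

    // True if each UTF-8 encoded string is <= its successor under
    // unsigned, byte-by-byte lexicographic comparison.
    static boolean utf8BytesSorted(List<String> strings) {
        byte[] previous = null;
        for (String s : strings) {
            byte[] current = s.getBytes(StandardCharsets.UTF_8);
            // Java bytes are signed, so compare them as unsigned values.
            if (previous != null && Arrays.compareUnsigned(previous, current) > 0) {
                return false;
            }
            previous = current;
        }
        return true;
    }

    // Encodes a string with DataOutputStream.writeUTF (modified UTF-8)
    // and drops the 2-byte length prefix that writeUTF emits.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buffer.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    public static void main(String[] args) {
        // BMP-only strings, already sorted by String.compareTo.
        List<String> bmp = Arrays.asList("a", "b", "\u00e9", "\u65e5\u672c");
        System.out.println(utf8BytesSorted(bmp)); // prints true

        // U+1F600 (a surrogate pair) sorts before U+FF21 under compareTo,
        // but its UTF-8 bytes (F0 ...) sort after those of U+FF21 (EF ...).
        List<String> withSurrogates = Arrays.asList("\uD83D\uDE00", "\uFF21");
        System.out.println(utf8BytesSorted(withSurrogates)); // prints false

        // Modified UTF-8 encodes U+0000 as C0 80, so it sorts after U+0001.
        byte[] nul = modifiedUtf8("\u0000");
        byte[] one = modifiedUtf8("\u0001");
        System.out.println(Arrays.compareUnsigned(nul, one) > 0); // prints true
    }
}
```

So for strings ordered by code point the answer holds; if your list was sorted with compareTo and may contain supplementary characters, sort by code point (or re-sort the byte arrays) before relying on the byte order.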