It looks like we are limited to four data types when it comes to sorting the columns in a row of a Cassandra table. The four types I can see are:
BytesType, AsciiType, UTF8Type, IntegerType
However, to sort properly in a given language, one uses strcoll(), which makes use of the locale and ends up sorting certain characters before or after others depending on the language.
For example, in the French language you have accents on the e character that are sorted as follows:
... d e é ê è ë f ...
I would imagine that the UTF8Type is not going to make that function work as expected for a French speaker.
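To make the concern concrete, here is a minimal Java sketch. It assumes that a UTF8Type-style comparator orders strings by their encoded value and not by locale rules; String.compareTo stands in for that ordering here (for these characters it agrees with code point order). The class and method names are made up for illustration and are not part of Cassandra.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class, not part of Cassandra.
final class CodePointOrderDemo {

    // Sorts with String.compareTo, i.e. by UTF-16 code unit value.
    // For characters in the Basic Multilingual Plane this matches code point order.
    static List<String> sortByCodeUnit(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        // \u00eb = e with diaeresis, \u00e8 = e with grave,
        // \u00ea = e with circumflex, \u00e9 = e with acute
        List<String> letters =
                List.of("f", "\u00eb", "\u00e8", "\u00ea", "\u00e9", "e", "d");
        // The accented letters have values above U+007F, so they all land after "f".
        System.out.println(sortByCodeUnit(letters));
    }
}
```

With this ordering the accented letters end up after f instead of next to e, which is the mismatch a French speaker would notice.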
Is implementing our own sort in Cassandra the only way to get that to work? (Argh, I don't like Java...)
You can always set the locale to a constant one so you always get the same results. Alternatively, you could sort by Unicode code point instead of using Java's locale-aware algorithm.
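Both options can be sketched in plain Java. This is only an illustration under assumptions: the class and method names are hypothetical, java.text.Collator is used as the locale-aware algorithm, and how such a comparison would be plugged into Cassandra as a custom comparator is not shown. The exact order of the accented variants depends on the collation rules shipped with the JDK, so no output is claimed for that part.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of Cassandra.
final class FixedLocaleSortDemo {

    // Option 1: locale-aware sort pinned to one explicit locale,
    // so the result does not depend on the machine's default locale.
    static List<String> sortWithLocale(List<String> input, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    // Option 2: locale-independent comparison by Unicode code point.
    static int compareCodePoints(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            int ca = a.codePointAt(i);
            int cb = b.codePointAt(j);
            if (ca != cb) {
                return Integer.compare(ca, cb);
            }
            i += Character.charCount(ca);
            j += Character.charCount(cb);
        }
        // The string that still has characters left sorts last.
        return Integer.compare(a.length() - i, b.length() - j);
    }

    static List<String> sortByCodePoint(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(FixedLocaleSortDemo::compareCodePoints);
        return copy;
    }

    public static void main(String[] args) {
        // \u00eb, \u00e8, \u00ea, \u00e9 are the accented forms of e.
        List<String> letters =
                List.of("f", "\u00eb", "\u00e8", "\u00ea", "\u00e9", "e", "d");
        System.out.println(sortWithLocale(letters, Locale.FRENCH));
        System.out.println(sortByCodePoint(letters));
    }
}
```

The fixed-locale variant keeps all forms of e between d and f, at the cost of tying the stored order to one language. The code point variant is stable everywhere but places the accented letters after f.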