Can Cassandra sort columns for a specific human language?

It looks like we are limited to four data types when it comes to sorting the columns in a row of a Cassandra table. The four types I can see are:

BytesType, AsciiType, UTF8Type, IntegerType

However, to sort properly in a given language, one uses strcoll(), which makes use of the locale and ends up sorting certain characters before or after others depending on the language.

For example, in French the accented forms of the letter e are sorted as follows:

... d e é ê è ë f ...

I would imagine that UTF8Type will not sort these characters the way a French speaker expects.
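To make the concern concrete, here is a minimal Java sketch. It assumes that UTF8Type orders values by their raw UTF-8 bytes, which is an assumption about Cassandra's comparator and not something confirmed above. The class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Cassandra: mimics a plain byte-wise sort.
class Utf8ByteOrder {
    // Compare two strings by their UTF-8 encoding, treating each byte as unsigned.
    static int compareUtf8Bytes(String a, String b) {
        return Arrays.compareUnsigned(
                a.getBytes(StandardCharsets.UTF_8),
                b.getBytes(StandardCharsets.UTF_8));
    }

    // Return a sorted copy of the input, ordered by UTF-8 bytes.
    static List<String> sortByUtf8Bytes(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(Utf8ByteOrder::compareUtf8Bytes);
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e9" is e with an acute accent; it encodes as bytes C3 A9,
        // which compare greater than 'f' (66), so it lands after f.
        System.out.println(sortByUtf8Bytes(List.of("f", "\u00e9", "d", "e")));
    }
}
```

Under that assumption, the accented e ends up after f instead of between e and f.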

Is the only way to get that to work to implement our own sort in Cassandra? (Argh, I don't like Java...)

There is 1 best solution below

You can always pin the locale to a fixed one, so the sort order is the same everywhere. Alternatively, you could sort by Unicode code point rather than with Java's locale-aware collation.
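As a rough sketch of both options in plain Java (outside Cassandra): the first method uses `java.text.Collator` with an explicitly chosen locale, and the second compares strings by Unicode code point. The class and method names here are hypothetical, and wiring either comparison into a Cassandra comparator is not shown.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration only.
class FixedLocaleSort {
    // Option 1: collate with a locale passed in explicitly, so the result
    // does not depend on the JVM's default locale.
    static List<String> sortWithFixedLocale(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator); // Collator is a Comparator<Object>
        return sorted;
    }

    // Option 2: ignore the locale and compare by Unicode code point.
    static int compareCodePoints(String a, String b) {
        return Arrays.compare(a.codePoints().toArray(), b.codePoints().toArray());
    }

    static List<String> sortByCodePoint(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(FixedLocaleSort::compareCodePoints);
        return sorted;
    }

    public static void main(String[] args) {
        List<String> names = List.of("f", "\u00e9", "d", "e"); // \u00e9 = e acute
        // French collation keeps the accented e next to the plain e, before f.
        System.out.println(sortWithFixedLocale(names, Locale.FRENCH));
        // Code point order puts the accented e (U+00E9) after f (U+0066).
        System.out.println(sortByCodePoint(names));
    }
}
```

The trade-off is that the code point order is stable and locale-independent but not what a French reader expects, while the collator gives the linguistic order at the cost of depending on the collation rules shipped with the JVM.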