Hive case-insensitive alphabetical sorting

1.7k Views Asked by At

When I have an "order by" clause inside of a hive query, for example:

SELECT *
FROM categories
ORDER BY category_name

The results will be sorted as all the capital letters first and then all the lower letters. I need some table constraint or configuration to enforce the below behavior. A session sorting with UPPER/LOWER won't help.

Current results:

AAA
KKK
ZZZ
aaa
bbb
yyy

Expected results:

aaa
AAA
bbb
KKK
yyy
ZZZ

Is there any configuration which enforces hive to sort the data Alphabetical sorting first?

Within sql it's a collation. Within oracle it's LTS.

What is the right configuration for this kind of expected sorting results, and where to set it?

2

There are 2 best solutions below

1
On

How about just using lower()?

SELECT *
FROM categories
ORDER BY LOWER(category_name);

Note: this will be arbitrary about the case of the result. Because lower-case letters come after upper case in all modern collations, you could do:

SELECT c.*
FROM categories c
ORDER BY LOWER(c.category_name), c.category_name DESC;
0
On

In order to implement the alphabetical sorting or any kind of sorting you can use cluster by in your query.

SELECT *
FROM categories
cluster BY LOWER(category_name);

You can alternatively use the distribute by with sort by option for more customized solution. SELECT * FROM categories DISTRIBUTE BY LOWER(category_name) SORT BY LOWER(category_name) DESC