How to use GNU sort with numeric data in binary format?


Is there any way to use GNU Coreutils sort with 64-bit numbers stored in a binary file? If the file weren't binary, sort -n would be the solution, but I couldn't find any option for using it with binary data.

The file is quite large (~100 GB), and if possible I don't want to make a text (non-binary) copy of it.

Sample of data:

$ xxd file
00292e0: 4036 1eb7 6888 d319 de6b 7402 9ca9 f116  @6..h....kt.....
00292f0: db68 7f05 199f 9d36 cf01 cb28 e49f 1116  .h.....6...(....
0029300: 0c7c 8b55 2963 ef0c 277a f2b0 38d7 2b19  .|.U)c..'z..8.+.
0029310: c83b 2614 4327 d838 820c 1bb8 444f 1731  .;&.C'.8....DO.1
0029320: 1695 cab3 cd12 092a 0691 d7e4 5fcc b01d  .......*...._...
0029330: b12b 7c1b a209 7c1c 568a 125c 541c d334  .+|...|.V..\T..4
0029340: 09a3 ecbc 8370 e205 9265 7759 a378 4e2f  .....p...ewY.xN/


There are 2 best solutions below

0
On

sort(1) will not help you here. For a small file you could convert the binary records into text lines and feed them to sort(1), but that is of course not practical for a 100 GB file.
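For completeness, the small-file route could look roughly like the sketch below. It assumes the records are unsigned 64-bit little-endian integers (the question does not state signedness or byte order), and the function names are made up for illustration. The whole input is held in memory as text, so this is only sensible for small files.

```python
import struct
import subprocess

# Assumption: each record is an unsigned 64-bit little-endian integer.
RECORD = struct.Struct("<Q")


def binary_to_lines(data: bytes) -> str:
    """Render each 8-byte record as one decimal number per line."""
    if len(data) % RECORD.size:
        raise ValueError("input length is not a multiple of 8 bytes")
    return "".join(f"{value}\n" for (value,) in RECORD.iter_unpack(data))


def lines_to_binary(text: str) -> bytes:
    """Pack decimal lines back into 8-byte records."""
    return b"".join(RECORD.pack(int(line)) for line in text.split())


def sort_with_gnu_sort(data: bytes) -> bytes:
    """Round-trip the records through `sort -n` (small inputs only)."""
    result = subprocess.run(
        ["sort", "-n"],
        input=binary_to_lines(data),
        capture_output=True,
        text=True,
        check=True,
    )
    return lines_to_binary(result.stdout)
```

The text form is usually larger than the 8-byte binary form (up to 20 digits plus a newline per record), which is one more reason this does not scale to the file in the question.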

The answer to this question on Server Fault links to a tool written to solve exactly this task. You can check the GitHub project there (it seems to be written in Go, so you will need to install a Go compiler if you decide to use it).

A quick search does not turn up any other popular tool for this task written in a more widely used language. That surprises me a bit, since the task itself is just a merge sort, which thousands of students implement each year in their CS courses, but that's off-topic.
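To illustrate what such a merge sort involves, here is a minimal external merge sort sketch for fixed-width records. It is not the tool linked above. It assumes unsigned 64-bit little-endian records, and `external_sort`, `_read_values`, and `chunk_records` are hypothetical names chosen for this example.

```python
import heapq
import os
import struct
import tempfile

# Assumption: each record is an unsigned 64-bit little-endian integer.
RECORD = struct.Struct("<Q")


def _read_values(handle, block_records=65536):
    """Yield integers from an open binary file, one block at a time."""
    while True:
        block = handle.read(RECORD.size * block_records)
        if not block:
            return
        for (value,) in RECORD.iter_unpack(block):
            yield value


def external_sort(src_path, dst_path, chunk_records=1_000_000):
    """Sort 8-byte records from src_path into dst_path with bounded memory."""
    run_paths = []
    try:
        # Phase 1: cut the input into sorted runs that each fit in memory.
        with open(src_path, "rb") as src:
            while True:
                chunk = src.read(RECORD.size * chunk_records)
                if not chunk:
                    break
                values = sorted(v for (v,) in RECORD.iter_unpack(chunk))
                fd, path = tempfile.mkstemp(suffix=".run")
                with os.fdopen(fd, "wb") as run:
                    run.write(struct.pack(f"<{len(values)}Q", *values))
                run_paths.append(path)

        # Phase 2: k-way merge of the runs; heapq.merge consumes them lazily.
        runs = [open(path, "rb") for path in run_paths]
        try:
            with open(dst_path, "wb") as dst:
                for value in heapq.merge(*(_read_values(r) for r in runs)):
                    dst.write(RECORD.pack(value))
        finally:
            for run in runs:
                run.close()
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for path in run_paths:
            os.remove(path)
```

This sketch needs temporary disk space about equal to the input size and opens every run file at once, so for a very large file `chunk_records` would have to be large enough to keep the number of runs below the system's open-file limit. Pure Python will also be far slower than a compiled tool.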

2
On

The bsort utility does this.

It is a lightning-fast in-place radix sort written in C. One of the test cases during its development was a 100 GB file on a machine with 16 GB of RAM, which took about 22 seconds to sort.
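For readers unfamiliar with the technique, the sketch below shows the general idea of an in-place most-significant-byte-first radix sort (often called American flag sort) on unsigned 64-bit keys. It is not bsort's code and makes no claim about how bsort is implemented. It works on a Python list in memory, and `radix_sort_in_place` is a hypothetical name.

```python
def radix_sort_in_place(values, lo=0, hi=None, shift=56):
    """In-place MSD radix sort of unsigned 64-bit integers, one byte per pass."""
    if hi is None:
        hi = len(values)
    if hi - lo < 2 or shift < 0:
        return

    # Count how many keys fall into each of the 256 buckets for this byte.
    counts = [0] * 256
    for i in range(lo, hi):
        counts[(values[i] >> shift) & 0xFF] += 1

    # Turn the counts into the start offset of every bucket.
    starts = [0] * 256
    pos = lo
    for b in range(256):
        starts[b] = pos
        pos += counts[b]

    # Permute: swap each misplaced key into the next free slot of its bucket.
    next_free = starts[:]
    for b in range(256):
        end = starts[b] + counts[b]
        while next_free[b] < end:
            i = next_free[b]
            target = (values[i] >> shift) & 0xFF
            if target == b:
                next_free[b] += 1
            else:
                j = next_free[target]
                values[i], values[j] = values[j], values[i]
                next_free[target] += 1

    # Recurse into each bucket using the next less significant byte.
    for b in range(256):
        radix_sort_in_place(values, starts[b], starts[b] + counts[b], shift - 8)
```

No second buffer is allocated: keys are only swapped within the list, and the recursion depth is at most eight because a 64-bit key has eight bytes.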