I'm implementing a port, located at mariotoffia/goannoy, of Spotify/Annoy library to go. I've got problems in regards to sorting, especially partial sort (partial as in c++ std::partial_sort<>) performance.
Is there anyone that can point me to a direction (or even better a library), that implements partial sort in golang that do have good performance?
This is what, I've come up with:
The struct containing the pair to use when sorting (it is used in my custom priority queue)
type Pair[T constraints.Ordered, S constraints.Ordered] struct {
First T
Second S
}
func (p *Pair[T, S]) Less(other *Pair[T, S]) bool {
return p.First < other.First ||
(p.First == other.First && p.Second < other.Second)
}
type Pairs[T constraints.Ordered, S constraints.Ordered] []*Pair[T, S]
func (pq Pairs[_, _]) Len() int {
return len(pq)
}
func (pq Pairs[_, _]) Less(i, j int) bool {
return pq[i].First < pq[j].First ||
(pq[i].First == pq[j].First && pq[i].Second < pq[j].Second)
}
func (pq Pairs[_, _]) Swap(i, j int) {
pq[i], pq[j] = pq[j], pq[i]
}
func (pq *Pairs[TP, TV]) Push(x interface{}) {
*pq = append(*pq, x.(*Pair[TP, TV]))
}
func (pq *Pairs[_, _]) Pop() interface{} {
old := *pq
n := len(old)
item := old[n-1]
*pq = old[:n-1]
return item
}
func (pq *Pairs[TP, TV]) Top() *Pair[TP, TV] {
if len(*pq) == 0 {
return nil
}
return (*pq)[0]
}
This is the implementation of the actual partition sort:
func PartialSortSlice[TV interfaces.VectorType, TIX interfaces.IndexTypes](
s []*Pair[TV, TIX],
begin, middle, end int,
) {
if begin >= end || middle <= begin || middle > end {
return
}
// Find the N smallest elements
N := middle - begin
if end-begin > 20 && end-begin < 5000000 {
SortPairs(s)
return
}
for i := 0; i < N; i++ {
minIndex := begin + i
// Find the index of the smallest element in the unsorted part
for j := begin + i + 1; j < end; j++ {
if s[j].Less(s[minIndex]) {
minIndex = j
}
}
// Swap elements
if minIndex != begin+i {
s[begin+i], s[minIndex] = s[minIndex], s[begin+i]
}
}
// Sort sub-range [begin, middle)
if N > 15 {
SortPairs(s[begin:middle])
} else {
for i := begin + 1; i < middle; i++ {
for j := i; j > begin && s[j].Less(s[j-1]); j-- {
s[j], s[j-1] = s[j-1], s[j]
}
}
}
}
The limits for using the utils.SortPairs(...)
is done by performance benchmarks, iteratively and manually use the limits...
The utils.SortPairs(...)
uses the standar sort:
func SortPairs[TV interfaces.VectorType, TIX interfaces.IndexTypes](
pairs []*Pair[TV, TIX],
) {
sort.Slice(pairs, func(i, j int) bool {
return pairs[i].Less(pairs[j])
})
}
The other, I've came up is faster in certain situations, but it too slow when having lots of large arrays e.g. millions of elements (I'd guess hammering the go gc too hard).
func PartialSortSlice2[TV interfaces.VectorType, TIX interfaces.IndexTypes](
s []*Pair[TV, TIX],
begin, middle, end int,
) {
if begin >= end || middle <= begin || middle > end {
return
}
// Find the N smallest elements using a binary heap
N := middle - begin
h := pmnh[TV, TIX]{indices: make([]int, N), data: s}
for i := 0; i < N; i++ {
h.indices[i] = i + begin
}
heap.Init(&h)
for i := N; i < end-begin; i++ {
if s[begin+i].Less(s[h.indices[0]]) {
h.indices[0] = i + begin
heap.Fix(&h, 0)
}
}
// Swap elements
for i := 0; i < N; i++ {
s[begin+i], s[h.indices[i]-begin] = s[h.indices[i]-begin], s[begin+i]
}
// Sort sub-range [begin, middle) in place
for i := begin + 1; i < middle; i++ {
for j := i; j > begin && s[j].Less(s[j-1]); j-- {
s[j], s[j-1] = s[j-1], s[j]
}
}
}
(I'll probably incorporate the second one in the first one, when e.g. < 10000, it seems to perform better in those lower ranges...)
Is there any other solution, that does not allocate any memory (or very little/low frequent) and is really, really fast?
To see the poor performance in proof graph:
git clone https://github.com/mariotoffia/goannoy.git
cd goannoy
go run cmd/precision/main.go -file -items 1000000 -prec 1000 -cpu-profile
cat results.txt
( use the -mem-profile for memory)*
You need to clear about what you mean by a "partial" sort. For example, if
First
is afloat64
, does "partial" mean that you don't care about the precise order ifFirst
differs by less than some tolerance?If so, and the max/min values of
First
are reasonably bounded, you could simply create bins of similarFirst
values using a slice of slices of*Pair
. The first slice would have length:where
delta
is the tolerance you want. Then you simply append every*Pair
to the relevant slice using an index like:(you might need to make the slice longer by 1, you could
round
the result, and/or add some bounds checking).This groups the
Pair
into bins that haveFirst
values withindelta
. Each slice of similarPair
entries is unsorted. It makes the partial sortO(n)
instead ofO(n) * log(n)
.tl;dr: this is a bucket sort without the 2nd phase of sorting the subarrays.