Keep top N elements of an array by group

269 Views Asked by At

I am using proprietary language in Blaze Advisor (rule enginge). I am looking for an algorithm how to keep only top N items in array by groups formed by specific attribute. As an example there are two arrays:

parrent[0].id = 1
parrent[1].id = 2

And second array:

child[0].parrentId = 1
child[0].result = 3.0
child[1].parrentId = 1
child[1].result = 2.0
child[2].parrentId = 1
child[2].result = 4.0
child[3].parrentId = 1
child[3].result = 6.0
child[4].parrentId = 1
child[4].result = -1.0
child[5].parrentId = 2
child[5].result = 1.0
child[6].parrentId = 2
child[6].result = 16.0
child[7].parrentId = 2
child[7].result = 2.0
child[8].parrentId = 2
child[8].result = -10.0
child[9].parrentId = 2
child[9].result = 5.0

I would like to keep only top three elements for each parrentId in child array as indicated by result attribute. In my language I can do all the basic operations - I can use if/else, while, for, for each constructs, and create new variables. I can sort array asc/desc and get indices of sorted elements. I can remove elements of an array.

For my data I need the following result:

child[0].parrentId = 1
child[0].result = 3.0
child[1].parrentId = 1
child[2].result = 4.0
child[3].parrentId = 1
child[3].result = 6.0
child[6].parrentId = 2
child[6].result = 16.0
child[7].parrentId = 2
child[7].result = 2.0
child[9].parrentId = 2
child[9].result = 5.0
2

There are 2 best solutions below

1
On BEST ANSWER

With the auxiliary class: enter image description here

And the function: enter image description here

That has the code:

len is an integer initially top.children.count - 1;
idx is an integer initially len;
while idx > atIdx do {
    top.children[idx] = top.children[idx-1];
    decrement idx;
}
top.children[atIdx] = child;

This code can do what you asked for:

child is an fixed array of 10 Child;

counter is an integer initially 0;
while counter < 10 do { child[counter] = a Child; increment counter }

child[0].parrentId = 1;
child[0].result = 3.0;
child[1].parrentId = 1;
child[1].result = 2.0;
child[2].parrentId = 1;
child[2].result = 4.0;
child[3].parrentId = 1;
child[3].result = 6.0;
child[4].parrentId = 1;
child[4].result = -1.0;
child[5].parrentId = 2;
child[5].result = 1.0;
child[6].parrentId = 2;
child[6].result = 16.0;
child[7].parrentId = 2;
child[7].result = 2.0;
child[8].parrentId = 2;
child[8].result = -10.0;
child[9].parrentId = 2;
child[9].result = 5.0;

groups is an array of real;

topN is an integer initially 4;

//Init the hashmap of [group] -> [array of 'N' top Child]
top3fromGroup is an association from real to TopChildren;
for each Child in child do if not groups.contains(it.parrentId) then { 
    top3fromGroup[it.parrentId] = a TopChildren;
    initCounter is an integer initially 0;
    while initCounter < topN do {
        top3fromGroup[it.parrentId].children[initCounter] = a Child initially { it.result = Double.MIN_VALUE;} 
        increment initCounter;
    }
    groups.append(it.parrentId);
}

//Filling the groups at the hashmap with the Child elements ordered inside its groups
for each real in groups do { 
    group is a real initially it;
    for each Child in child do {
        localChild is some Child initially it;
        if it.parrentId = group then {
            top is some TopChildren initially top3fromGroup[group]; 
            topValuesIdx is an integer initially 0;
            while topValuesIdx < top.children.count do {
                topChild is some Child initially top.children[topValuesIdx];
                if localChild.result > topChild.result then { 
                    insertAt(topValuesIdx, localChild, top);
                    topValuesIdx = top.children.count;
                } 
                increment topValuesIdx;
            }
        }
    }
}

//Printing the hashmap
for each real in groups do {
    group is a real initially it;
    print("Group: " group);
    childIdx is an integer initially 0;
    for each Child in top3fromGroup[it].children do {
        print("\tchild["childIdx"].parrentId = " it.parrentId); 
        print("\tchild["childIdx"].result = " it.result);
        increment childIdx;
    }
}

The output on the Eclipse/Blaze console would be:

Group: 1.0
    child[0].parrentId = 1.0
    child[0].result = 6.0
    child[1].parrentId = 1.0
    child[1].result = 4.0
    child[2].parrentId = 1.0
    child[2].result = 3.0
    child[3].parrentId = 1.0
    child[3].result = 2.0
Group: 2.0
    child[0].parrentId = 2.0
    child[0].result = 16.0
    child[1].parrentId = 2.0
    child[1].result = 5.0
    child[2].parrentId = 2.0
    child[2].result = 2.0
    child[3].parrentId = 2.0
    child[3].result = 1.0

Execution complete.

I know this is a very simple solution and not the optimal one.

0
On

Keeping the top N elements of an array can be done using a selection algorithm. The fact that you have the data grouped should not add much complexity to the problem, just ignore the elements that are not in the group.

For example, if your use a partial sorting you can just partially sort the elements in your group. You won't know the index of the N'th entry ahead of time like in the standard partial sort but you can modify the outer loop to iterate through all the entries (and not just the first N) but keep track of how many you have partially sorted already and break when done.

By the way, since you state in your question that you can sort the array in your proprietary language then if it is not too inefficient you cold just sort the entire array and then loop over the entire array until you have found the top N items for all the groups of interest. It's brute force but if you don't have performance issues then it's probably the easiest answer.