How to group two element tuples for a continuous range of first element with same second element?

93 Views Asked by At

I have a list of two element tuples, the first element of each tuple is an integer, these tuples are equivalent to key-value pairs, in fact these tuples are flattened dict_items, generated using list(d.items()).

The element in the first position is guaranteed to be unique (they are keys), and there are lots of key-value pairs where the value is the same and the keys are in a continuous range, meaning there are lots of consecutive key-value pairs where one key is equal to the previous key plus one.

I would like to group the pairs into triplets, where the first two elements are integers and are start and end of such consecutive pairs, and the third element is the value.

The logic is simple, if the input is [(0, 0), (1, 0), (2, 0)], the output should be [(0, 2, 0)], the first number is the start of the key range, and the second number is the end of the key range, the third is the value. 0, 1, 2 are consecutive integers.

Given [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1)], the output should be [(0, 2, 0), (3, 4, 1)], consecutive keys with same values are grouped.

Given [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 2), (7, 2), (9, 2)], the output should be [(0, 2, 0), (3, 4, 1), (5, 5, 2), (7, 7, 2), (9, 9, 2)], because 5, 7, 9 are not consecutive integers, 5 + 1 != 7 and 7 + 1 != 9.

Input:

[(3, 0),
 (4, 0),
 (5, 0),
 (6, 2),
 (7, 2),
 (8, 2),
 (9, 2),
 (10, 2),
 (11, 2),
 (12, 2),
 (13, 1),
 (14, 1),
 (15, 3),
 (16, 3),
 (17, 3),
 (18, 3),
 (19, 3),
 (20, 3),
 (21, 3),
 (22, 3),
 (23, 3),
 (24, 3),
 (25, 3),
 (26, 3),
 (27, 1),
 (28, 1)]

Output:

[(3, 5, 0), (6, 12, 2), (13, 14, 1), (15, 26, 3), (27, 28, 1)]

My code gives the correct output but isn't efficient:

def group_numbers(numbers):
    l = len(numbers)
    i = 0
    output = []
    while i < l:
        di = 0
        curn, curv = numbers[i]
        while i != l and curn + di == numbers[i][0] and curv == numbers[i][1]:
            i += 1
            di += 1
        output.append((curn, numbers[i - 1][0], curv))
    return output

Code to generate test cases:

def make_test_case(num, lim, dat):
    numbers = {}

    for _ in range(num):
        start = random.randrange(lim)
        end = random.randrange(lim)
        if start > end:
            start, end = end, start
        x = random.randrange(dat)
        numbers |= {n: x for n in range(start, end + 1)}

    return sorted(numbers.items())

How to do this more efficiently, such as using itertools.groupby?

Answers are required to be verified against my correct approach.


Note that there can be gaps where there are no values in the input, I wanted to make the question short so I didn't include such a test case, but do know that such gaps should not be filled. The more efficient approach should produce the same output as mine.

A manual test case to demonstrate this:

In [35]: group_numbers([(0, 0), (1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1), (13, 1)])
Out[35]: [(0, 3, 0), (10, 13, 1)]

Clarification for a test case suggested in the comments, the expected output is:

In [61]: group_numbers([(3, 0),  (5, 0),  (7, 0)])
Out[61]: [(3, 3, 0), (5, 5, 0), (7, 7, 0)]

The output for [(1, 0), (1, 1), (2, 0)] should be undefined, it should raise an exception if encountered. Inputs like this aren't valid inputs. As you can see from my code to generate the sample, all numbers can only have one value.

And output for [(1, 0), (3, 0), (5, 0)] is [(1, 1, 0), (3, 3, 0), (5, 5, 0)].


Edit

I am not a native English speaker, in fact I am bad with languages in general (though hopefully not programming languages), and I have no people skills (I really have no one to talk to), so my question may be confusing originally, I feared that if I made it long it would surely contain grammatical mistakes.

I have edited my question to include more details and explain things more thoroughly, to hopefully make the question less confusing.

2

There are 2 best solutions below

9
On BEST ANSWER

Considering all modifications, in order to guarantee a correct output by taking all conditions into considerations, you can use this solution:

def group_numbers(numbers):
    output = []

    curr_i, curr_j = numbers[0]
    
    start = curr_i
    
    last_ij = numbers[-1]
    last_ij = (last_ij[0], last_ij[1] + 1)
    
    for i, j in numbers[1:] + [last_ij]:
        if i - curr_i == 1 and j == curr_j:
            curr_i = i
        else:
            output.append((start, curr_i, curr_j))
            
            curr_j = j
            curr_i = i
            start = curr_i
    
    return output

OP test cases:

In [1]: group_numbers([(3, 0), (4, 0), (5, 0), (6, 2), (7, 2), (8, 2), (9, 2), (10, 2), (11, 2), (12, 2), (13, 1), (14, 1), (15, 3), (16, 3), (17, 3), (18, 3), (19, 3), (20, 3), (21, 3), (22, 3), (23, 3), (24, 3), (25, 3), (26, 3), (27, 1), (28, 1)])
Out[1]: [(3, 5, 0), (6, 12, 2), (13, 14, 1), (15, 26, 3), (27, 28, 1)]

In [2]: group_numbers([(0, 0), (1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1), (13, 1)])
Out[2]: [(0, 3, 0), (10, 13, 1)]

In [3]: group_numbers([(3, 0), (5, 0), (7, 0)])
Out[3]: [(3, 3, 0), (5, 5, 0), (7, 7, 0)]

OLD ANSWER (using itertools and not considering the last case)

Using just itertools.groupby, you can do this:

from itertools import groupby
from operator import itemgetter

output = [((v := list(g))[0][0], v[-1][0], i) for i, g in groupby(numbers, itemgetter(1))]

Without operator.itemgetter it would be:

output = [((v := list(g))[0][0], v[-1][0], i) for i, g in groupby(numbers, lambda x: x[1])]

OP test cases (assuming the solution is defined as a function group_numbers):

In [1]: group_numbers([(3, 0), (4, 0), (5, 0), (6, 2), (7, 2), (8, 2), (9, 2), (10, 2), (11, 2), (12, 2), (13, 1), (14, 1), (15, 3), (16, 3), (17, 3), (18, 3), (19, 3), (20, 3), (21, 3), (22, 3), (23, 3), (24, 3), (25, 3), (26, 3), (27, 1), (28, 1)])
Out[1]: [(3, 5, 0), (6, 12, 2), (13, 14, 1), (15, 26, 3), (27, 28, 1)]

In [2]: group_numbers([(0, 0), (1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1), (13, 1)])
Out[2]: [(0, 3, 0), (10, 13, 1)]

In [3]: group_numbers([(3, 0), (5, 0), (7, 0)])
Out[3]: [(3, 7, 0)]
1
On

With generator function which checks for a consecutive (ascending) range of first elements of tuples basing on arithmetic sum:

from itertools import groupby

def group_numbers(nums):
    for k, g in groupby(nums, key=lambda x: x[1]):
        g = list(i[0] for i in g)
        # check for consecutive ascending of numbers in group 
        if sum(g) == (len(g) * (g[0] + g[-1])) / 2:
            yield (g[0], g[-1], k)

print(list(group_numbers(nums)))

[(3, 5, 0), (6, 12, 2), (13, 14, 1), (15, 26, 3), (27, 28, 1)]