Create or use a function to find the repeated sequence of items in a list


I need a function that takes a list/array and finds the repeated sequences of numbers in it.

Example

[111, 0, 3, 1, 111, 0, 3, 1, 111, 0, 3, 1]

[111, 0, 3, 1] is the block that is being repeated and is what I'm looking for.

[11, 34, 132, 54, 90, 430, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34]

[657, 689, 34, 90, 90, 90, 46, 34] is the repeated sub list.

[569, 374, 879, 374, 879, 460, 568, 488, 460, 568, 488, 460, 568, 488, 750, 750]

This has two blocks: [374, 879] and [460, 568, 488].

[45, 98, 45, 98, 45]

[45, 98] is the block, not [98, 45], as the latter is not at the start. Blocks are identified from the start, and [98, 45] is excluded because its beginning overlaps with [45, 98].

Some constraints of the dataset.

  • The entire list is not a repeating sequence
  • The repeating sequence occurs in full at least twice
  • It can appear in any position of the list (beginning/middle/end)
  • It may not overlap with itself or with other sequences; if it does, the largest one should be picked
  • More than one might exist
  • A block has at least two items
  • Smaller blocks may make up a larger block, so the largest one should be prioritized. For example, in [230, 205, 900, 617, 821, 188, 617, 821, 205, 900], [617, 821] is a block but not a valid one, since it's inside [205, 900, 617, 821, 188, 617, 821, 205, 900]

Expected Result

The function should return the result in a convenient data structure, such as a list of lists, key-value pairs, or a list of lists indicating the start and end of each unique repetition, from its initial position to its first end. Initial ordering should be maintained.

Attempt

import itertools


def get_seq_group(seq):
    # groupby only merges runs of adjacent equal items
    return [(key, list(group)) for key, group in itertools.groupby(seq)]

list_a = [111, 0, 3, 1, 111, 0, 3, 1, 111, 0, 3, 1]
list_b = [67, 4, 67, 4, 67, 4, 67, 4, 2, 9, 0]
list_c = [11, 34, 132, 54, 90, 430, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34,
          90, 90, 90, 46, 34]
list_d = [569, 374, 879, 374, 879, 460, 568, 488, 460, 568, 488, 460, 568, 488, 750, 750]

print(get_seq_group(list_a))
print(get_seq_group(list_b))
print(get_seq_group(list_c))
print(get_seq_group(list_d))

I attempted to group them, but to no avail. It just returns a key-value pair for each individual item, since adjacent numbers are not the same. Output for list_a:

[(111, [111]), (0, [0]), (3, [3]), (1, [1]), (111, [111]), (0, [0]), (3, [3]), (1, [1]), (111, [111]), (0, [0]), (3, [3]), (1, [1])]
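The reason is that itertools.groupby only merges runs of adjacent equal items, so a block such as [111, 0, 3, 1] never forms a group; only consecutive duplicates like 90, 90, 90 do. A minimal illustration (run_lengths is a name made up for this example):

```python
import itertools


def run_lengths(seq):
    # Each group is one run of adjacent equal values, reported as (value, run length).
    return [(key, len(list(group))) for key, group in itertools.groupby(seq)]


print(run_lengths([34, 90, 90, 90, 46, 34]))
# [(34, 1), (90, 3), (46, 1), (34, 1)]
```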

I'm not aware of any algorithm addressing this case.

I tried converting the items to strings and joining them, but going back to a list is impossible because all the digits merge. The solutions here also didn't work for me.
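The digits merge only when the items are joined without a separator. Joining with a delimiter such as a space keeps the item boundaries, so the string can be split back into the original list. A small sketch of that round trip, with variable names chosen here purely for illustration:

```python
nums = [111, 0, 3, 1]

# Without a separator the item boundaries are lost.
merged = "".join(map(str, nums))       # "111031"

# With a separator every item stays recoverable.
delimited = " ".join(map(str, nums))   # "111 0 3 1"
restored = [int(tok) for tok in delimited.split()]

print(merged)
print(delimited)
print(restored == nums)  # True
```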

An explanation of any provided algorithm would also be nice. Both functional and imperative solutions would be appreciated, for contrast.

Answer

You can try using the re module (link to Regex101):

import re


testcases = [
    [111, 0, 3, 1, 111, 0, 3, 1, 111, 0, 3, 1],
    [11, 34, 132, 54, 90, 430, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34],
    [569, 374, 879, 374, 879, 460, 568, 488, 460, 568, 488, 460, 568, 488, 750, 750],
    [45, 98, 45, 98, 45]
]


for t in testcases:
    s = " ".join(map(str, t)) + " "
    out = [
        [int(v) for v in r.split()] for r in re.findall(r"(\b(?:\d+\s+){2,}).*\1", s)
    ]
    print(t)
    print(out)
    print()

Prints:

[111, 0, 3, 1, 111, 0, 3, 1, 111, 0, 3, 1]
[[111, 0, 3, 1]]

[11, 34, 132, 54, 90, 430, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34, 657, 689, 34, 90, 90, 90, 46, 34]
[[657, 689, 34, 90, 90, 90, 46, 34]]

[569, 374, 879, 374, 879, 460, 568, 488, 460, 568, 488, 460, 568, 488, 750, 750]
[[374, 879], [460, 568, 488]]

[45, 98, 45, 98, 45]
[[45, 98]]
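Since the question asks for an explanation and for an imperative version to contrast with, here is a sketch of both. The first function uses the same pattern as above, spelled out with re.VERBOSE so each part can carry a comment. The second is a plain-loop reading of the same idea: scan from the left, take the longest block of at least two items that occurs again later without overlapping, record it, and continue after its last occurrence. The names BLOCK, blocks_regex, blocks_loop and min_len are made up for this example, and the loop is one possible interpretation rather than a definitive implementation.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE for readability.
BLOCK = re.compile(r"""
    (                    # group 1: the candidate block
        \b               # start on a number boundary, never mid-number
        (?:\d+\s+){2,}   # two or more numbers, each followed by whitespace
    )
    .*                   # anything in between; greedy, so it runs to the last repeat
    \1                   # the exact text of group 1 again
""", re.VERBOSE)


def blocks_regex(seq):
    # The trailing space lets the last number match "\d+\s+" as well.
    text = " ".join(map(str, seq)) + " "
    return [[int(tok) for tok in m.split()] for m in BLOCK.findall(text)]


def blocks_loop(seq, min_len=2):
    blocks, i, n = [], 0, len(seq)
    while i < n:
        found = False
        # Longest candidate first; it must fit twice in what is left.
        for length in range((n - i) // 2, min_len - 1, -1):
            block = seq[i:i + length]
            # Look for the last later occurrence that does not overlap the block.
            for j in range(n - length, i + length - 1, -1):
                if seq[j:j + length] == block:
                    blocks.append(block)
                    i = j + length  # resume after the last repeat
                    found = True
                    break
            if found:
                break
        if not found:
            i += 1
    return blocks


print(blocks_regex([45, 98, 45, 98, 45]))  # [[45, 98]]
print(blocks_loop([45, 98, 45, 98, 45]))   # [[45, 98]]
```

One difference worth knowing: the backreference \1 is not preceded by \b, so on the text form it can match inside a longer number. For [1, 2, 11, 2] the regex version reports [[1, 2]] (it finds "1 2 " inside "11 2 "), while the loop version, which compares integers, returns []. Writing \b\1 instead of \1 is one way to tighten the pattern.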