How to dedupe numbers that are in sequence using Python


I am looking for code to accomplish the following in Python (a Snowflake solution would also work).

column A (before transformation)
8->8->8->8->5->7
8->5->5->5->7->8->7->7
25->15->15->13->18
25->15->15->13->18->15

I need to dedupe the numbers while keeping the sequence intact. Numbers are deduped only when they are adjacent.

column A (after transformation)
8->5->7
8->5->7->8->7
25->15->13->18
25->15->13->18->15

Thank you!

I have no idea how to do that.

There is 1 best solution below

BitsAreNumbersToo On

You can remove consecutive duplicates by adding each item to a new list only if it doesn't match the prior item. Alternatively, as suggested by @juanpa.arrivillag in the comments, you can use itertools.groupby to accomplish this in one line.

Here is an example:

# Copied from the examples in the question
a = [
    [8, 8, 8, 8, 5, 7],
    [8, 5, 5, 5, 7, 8, 7, 7],
    [25, 15, 15, 13, 18],
    [25, 15, 15, 13, 18, 15]
]
# Create a new output list
b = []
# Process each list in a
for sublist in a:
    # Create an output to hold the new, reduced list
    b.append([])
    for item in sublist:
        # If the output list is empty (first item) or the item differs from the previous one, add it
        if not b[-1] or b[-1][-1] != item:
            b[-1].append(item)
# Print the results to the terminal
for sublistb in b:
    print(sublistb)
print()

# As suggested by @juanpa.arrivillag
import itertools
c = [[item for item, _ in itertools.groupby(sublist)] for sublist in a]
for sublistc in c:
    print(sublistc)
print(f'Same: {c == b}')

When I run this I get the following output:

[8, 5, 7]
[8, 5, 7, 8, 7]
[25, 15, 13, 18]
[25, 15, 13, 18, 15]

[8, 5, 7]
[8, 5, 7, 8, 7]
[25, 15, 13, 18]
[25, 15, 13, 18, 15]
Same: True
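
The question shows each row as a string such as 8->8->8->8->5->7, while the example above works on lists of integers. Here is a minimal sketch for the string form. It assumes the separator is always "->" with no surrounding whitespace, and the helper name dedupe_consecutive is made up for illustration:

```python
from itertools import groupby


def dedupe_consecutive(row, sep="->"):
    """Collapse runs of equal adjacent tokens in a delimited string.

    Hypothetical helper; assumes `sep` is the only delimiter used in `row`.
    """
    tokens = row.split(sep)
    # groupby yields one key per run of equal adjacent tokens
    return sep.join(key for key, _ in groupby(tokens))


rows = [
    "8->8->8->8->5->7",
    "8->5->5->5->7->8->7->7",
    "25->15->15->13->18",
    "25->15->15->13->18->15",
]
for row in rows:
    print(dedupe_consecutive(row))
```

Note that the tokens are compared as strings here, so "08" and "8" would count as different values; convert them with int first if that matters for your data. If the rows live in a pandas DataFrame (an assumption, the question does not say), something like df["A"].map(dedupe_consecutive) should apply it to the whole column.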

Let me know if you have any questions.

EDIT: Added the itertools one-liner. Shoutout to @juanpa.arrivillag for suggesting it.