Python - Group by any matching value

646 Views Asked by At

I have a data structure like this

[
  (123, 321),
  (123, 456),
  (999, 654),
  (111, 456),
  (999, 898),
  (111, 654),
  (481, 739),
]

How can I group the tuples together by any matching element? i.e. to get this result (order doesn't matter)

[
  [(123, 321), (123, 456), (111, 456), (111, 654), (999, 654), (999, 898)],
  [(481, 739)],
]

Here's another example:

input = 
  [
    (123, 321),
    (123, 456),
    (111, 456),
    (999, 898),
    (481, 898),
    (481, 549),
  ]

output = 
  [
    [(123, 321), (123, 456), (111, 456)],
    [(999, 898), (481, 898), (481, 549)],
  ]
2

There are 2 best solutions below

1
Prins On

If I have understood the question correctly then you are trying to generate a list that contains two lists as follows:

  • The first list contains all those tuples where a value in the tuple occurs in any other tuple.
  • The second list contains all those tuples where the values of the tuple do not occur in any other tuple.

If the above is the case then effectively which are looking to do is determine which values occur more than once and use this information to generate your lists.

One approach is to use Python's Counter class to count up the occurrences of each value. You can do this as follows:

from collections import Counter
occurrences = Counter(e[0] for e in data) + Counter(e[1] for e in data)

This will give you the following:

Counter({123: 2, 999: 2, 111: 2, 456: 2, 654: 2, 481: 1, 321: 1, 898: 1, 739: 1})

Next, you can iterate through your list of tuples and separate them based on whether the values in the tuple occur more than once or not:

grouped = [[], []]

for entry in data:
  if occurrences[entry[0]] > 1 or occurrences[entry[1]] > 1:
    grouped[0].append(entry)
  else:
    grouped[1].append(entry)

This will then give you the desired result:

[
  [(123, 321), (123, 456), (999, 654), (111, 456), (999, 898), (111, 654)],
  [(481, 739)]
]
0
Prins On

Following further clarification, the problem can be solved as follows:

  • Step 1: Start by preparing your data by storing it in a list containing lists of tuples. Each inner list only contains a single tuple. This is the initial case i.e. no groupings.
  • Step 2: Make a first pass and group the data. This will result in your initial groupings.
  • Step 3: Keep passing through the data and combining groups until these can no longer be combined.

We can implement these steps as follows:

Step 1: We will use a list comprehension to prepare the data.

# prepare the data by turning it into a list of lists containing tuples
prepared = [[tup] for tup in data]

Step 2: We will create a group function to generate the groups.

def group(entries):
  groups = []
  
  for entry in entries:
    i = existing_group(entry, groups)
    
    # add to existing group or create a new group
    if i == -1:
        groups.append(entry)
    else:
      groups[i] = groups[i] + entry
      
  return groups

The group function simply iterates each entry in our data and checks if it belongs in an existing or new group. We can write an existing_group function to help determine this.

def existing_group(entry, groups):
  for i, group in enumerate(groups):
    
    # flatten
    group_values = sum(group, ())
    entry_values = sum(entry, ())
    
    # find one or more matching values
    if any(value in group_values for value in entry_values):
      return i
      
  return -1

Thus, we can call our group function as follows:

# intiai grouping of data
groups = group(prepared)

Step 3: We can use a loop to keep grouping the data until the groups cannot be changed any further.

# keep grouping the data until the groups are unchanged
num_groups = 0
  
while num_groups != len(groups):
  num_groups = len(groups)
  groups = group(groups)

This will then give the desired result.

Final Solution: The resulting solution is as follows.

def existing_group(entry, groups):
  for i, group in enumerate(groups):
    group_values = sum(group, ())
    entry_values = sum(entry, ())
    if any(value in group_values for value in entry_values):
      return i  
  return -1


def group(entries):
  groups = []
  for entry in entries:
    i = existing_group(entry, groups)
    if i == -1:
        groups.append(entry)
    else:
      groups[i] = groups[i] + entry
  return groups
  


prepared = [[tup] for tup in data]
groups = group(prepared)
num_groups = 0

while num_groups != len(groups):
  num_groups = len(groups)
  groups = group(groups)

print(groups)