Facing trouble in creating sublists of a list


My task is to create combinations, similar to a Cartesian product, from the attribute lines of a library file. My current problem is grouping the lines that share the same attribute (the parameters that follow the attribute differ, of course) into sublists of a list. Note that my input may contain a thousand attribute lines, which I need to extract from a library file.

######################

Example input:

attr1 apple 1                                                          
attr1 banana 2

attr2 grapes 1                                   
attr2 oranges 2

attr3 watermelon 0

######################

Example output:

[['attr1 apple 1','attr1 banana 2'], ['attr2 grapes 1','attr2 oranges 2'], ['attr3 watermelon 0']]

The result I am getting:

['attr1 apple 1','attr1 banana 2', 'attr2 grapes 1','attr2 oranges 2', 'attr3 watermelon 0']

Below is the code:

import re

# regex pattern definition
pattern = re.compile(r'attr\d+')

# Open the file for reading
with open(r"file path") as file:
    # Initialize an empty list to store matching lines
    matching_lines = []

    # reading each line 
    for line in file:
        # regex pattern match
        if pattern.search(line):
            # matching line append to the list
            matching_lines.append(line.strip())

# Group the elements based on the regex pattern

# The required list
grouped_elements = []

# Temporary list for sublist grouping
current_group = []

for sentence in matching_lines:
    if pattern.search(sentence):
        current_group.append(sentence)
    else:
        if current_group:
            grouped_elements.append(current_group)
        current_group = [sentence]

if current_group:
    grouped_elements.append(current_group)

# Print the grouped elements
for group in grouped_elements:
    print(group)
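
A note on the loop above: every entry in matching_lines has already matched the pattern, so the else branch is never reached and everything lands in one group. A minimal sketch of one possible fix, assuming lines with the same attribute are adjacent, is to compare the matched attribute name with that of the previous line. The variable current_key and the inline sample list are illustrative and not part of the original code.

```python
import re

pattern = re.compile(r'attr\d+')

# Stand-in for the lines read from the library file
matching_lines = [
    'attr1 apple 1', 'attr1 banana 2',
    'attr2 grapes 1', 'attr2 oranges 2',
    'attr3 watermelon 0',
]

grouped_elements = []
current_group = []
current_key = None  # attribute name of the group being built (illustrative)

for sentence in matching_lines:
    key = pattern.search(sentence).group()  # e.g. 'attr1'
    if key != current_key and current_group:
        # The attribute changed: close the previous group
        grouped_elements.append(current_group)
        current_group = []
    current_key = key
    current_group.append(sentence)

# Do not forget the last group
if current_group:
    grouped_elements.append(current_group)

print(grouped_elements)
```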

There are 2 answers below.

JLDiaz (best answer)

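When the file is already sorted, so that lines with the same attribute are adjacent, there is a simple solution using itertools.groupby, which groups consecutive items that share the same key: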

from itertools import groupby

def read_data(filename):
    """Yields one line at a time, skipping empty lines"""
    with open(filename) as file:
        for line in file:
            line = line.strip()
            if not line:
                continue
            yield line      

def grouping_key(x):
    "Selects the part of the line to use as key for grouping"
    return x.split()[0]   # The first word

groups = []
for k, g in groupby(read_data("sample.txt"), grouping_key):
    groups.append(list(g))

print(groups)
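
If the file is not sorted, groupby would start a new group every time the attribute changes. A minimal sketch for that case, assuming the attribute name is whatever the question's attr\d+ pattern matches, collects the lines in a dictionary instead. The function name group_lines and the sample list are hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as in the question; assumed to identify the attribute name
pattern = re.compile(r'attr\d+')

def group_lines(lines):
    """Group lines by attribute name, keeping first-seen order of attributes."""
    groups = defaultdict(list)  # dicts preserve insertion order (Python 3.7+)
    for line in lines:
        line = line.strip()
        match = pattern.search(line)
        if match:  # skips empty and non-attribute lines
            groups[match.group()].append(line)
    return list(groups.values())

# Unsorted sample input (illustrative)
sample = [
    "attr1 apple 1",
    "attr2 grapes 1",
    "attr1 banana 2",
    "",
    "attr3 watermelon 0",
    "attr2 oranges 2",
]
grouped = group_lines(sample)
print(grouped)
```

Because it only needs an iterable of strings, the same function should also accept an open file object directly.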

YouHoGeon

Once the sublists are built, their Cartesian product can be generated with index arithmetic:

grouped_elements = [
    ['A', 'B', 'C'],
    ['D', 'E'],
    ['F', 'G', 'H', 'I']
]

count = [len(row) for row in grouped_elements]
count.append(1)
# count is [3, 2, 4, 1]

for i in range(len(count) - 2, -1, -1):
    count[i] = count[i] * count[i + 1]

# count is [3 * 2 * 4 * 1, 2 * 4 * 1, 4 * 1, 1]

for i in range(count[0]):
    row = []
    for j in range(len(grouped_elements)):
        idx = (i // count[j + 1]) % len(grouped_elements[j])
        
        row.append(grouped_elements[j][idx])
    
    print(row)

Result:

['A', 'D', 'F']
['A', 'D', 'G']
['A', 'D', 'H']
['A', 'D', 'I']
['A', 'E', 'F']
['A', 'E', 'G']
['A', 'E', 'H']
['A', 'E', 'I']
['B', 'D', 'F']
['B', 'D', 'G']
['B', 'D', 'H']
['B', 'D', 'I']
['B', 'E', 'F']
['B', 'E', 'G']
['B', 'E', 'H']
['B', 'E', 'I']
['C', 'D', 'F']
['C', 'D', 'G']
['C', 'D', 'H']
['C', 'D', 'I']
['C', 'E', 'F']
['C', 'E', 'G']
['C', 'E', 'H']
['C', 'E', 'I']
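
For comparison, the standard library's itertools.product should give the same rows in the same order without manual index arithmetic. This is a sketch of an alternative, not the approach used in the answer above; the name rows is illustrative.

```python
from itertools import product

grouped_elements = [
    ['A', 'B', 'C'],
    ['D', 'E'],
    ['F', 'G', 'H', 'I'],
]

# product(*groups) takes one element from each sublist,
# with the last sublist varying fastest
rows = [list(combo) for combo in product(*grouped_elements)]

for row in rows:
    print(row)
```

Since product returns a lazy iterator, looping over it directly instead of building rows avoids holding every combination in memory, which may matter when the input has many groups.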