How do I combine similar elements in a list in python?

292 Views Asked by At

As a practice exercise, I'm trying to generate a color pallette from an image, and am using Python to turn the raw extracted RGB channels into a pallette. This is fairly simple using a combination of lists and dictionaries, but I'd like to be able to put a constraint the the number of colors in the pallette by combining similar color channels, unless they both have a large presence in the image.

Say I have a dictionary with my RGB values and their counts:

countsR = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}

This pallette requires 3 bits to index into. Ideally I would run some function, combine_channels(dd, max_distance), which does some terrible O(N^2) things, and outputs something like this:

print(combine_channels(countsR, 10))
>>> {"255": 317, "10": 845}

Which now may be indexed by only one bit!

It would also be good if it could keep track of which things it replaced with what, maybe in another dictionary:

countsR, replacements = combine_channels(countsR, 10)
print(replacements)
>>> {"255": ["250"], "10": ["8", "14"]}

Any ideas on what this may look like?

2

There are 2 best solutions below

5
On BEST ANSWER

In many cases there's multiple different grouping options as the comments show. One easy option is to iterate through channels in numerical order forming groups and if a channel can't be fitted to existing group create a new one. It won't result to maximal distance but it will guarantee minimum number of groups to be generated:

def combine_channels(channels, dist):
    result = {}
    replacements = {}
    groups = []
    group = []
    key = None

    # Iterate through channels in ascending numerical order
    for channel, count in sorted((int(k), v) for k, v in channels.items()):
        # Add new group in case that channel doesn't fit to current group
        if group and channel - key > dist:
            groups.append((key, group))
            group = []
            key = None

        # Add channel to group
        group.append((channel, count))

        # Pick a new key in case there's none or current channel is within
        # dist from first channel in the group
        if key is None or channel - group[0][0] <= dist:
            key = channel

    # Add last group in case it exists
    if group:
        groups.append((key, group))

    for key, group in groups:
        result[key] = sum(x[1] for x in group)
        replacements[key] = [x[0] for x in group if x[0] != key]

    return result, replacements

countsR1 = {"255": 301, "250": 16, "10": 589, "14": 124, "8": 132}
countsR2 = {"0": 10, "11": 20, "7": 30, "19": 40, "25": 50}
countsR3 = {"0": 5, "11": 10}
print(combine_channels(countsR1, 10))
print(combine_channels(countsR2, 10))
print(combine_channels(countsR3, 10))

Output:

({14: 845, 255: 317}, {14: [8, 10], 255: [250]})
({25: 90, 7: 60}, {25: [19], 7: [0, 11]})
({0: 5, 11: 10}, {0: [], 11: []})

Time complexity of above is O(n log n) since sorting is used.

1
On

Here's what I came up with:

def combine_channels(lst, max_dist):
    colors = list(set(lst)) # unique colors
    counts = dict()
    replacements = dict()
    all_repl = []

    for el in lst:
        counts[el] = counts.get(el, 0) + 1

    N = len(colors)
    dists = np.zeros((N, N))
    for i in range(N - 1):
        for j in range(i + 1, N):
            dist = abs(colors[i] - colors[j])
            dists[i, j] = dist
            if(dist < max_dist):
                if(colors[i] in all_repl or colors[j] in all_repl):
                    continue
                else:
                    if(counts.get(colors[i], 0) > counts.get(colors[j], 0)):
                        winner = colors[i]
                        loser = colors[j]
                    else:
                        winner = colors[j]
                        loser = colors[i]
                counts[winner] += counts.get(loser, 0)
                counts[loser] = 0
                if winner not in replacements:
                    replacements[winner] = list()
                replacements[winner].append(loser)
                all_repl.append(loser)
                if loser not in replacements:
                    continue
                else:
                    replacements[winner] = replacements[winner].extend(replacements[loser])
                    replacements.pop(loser, None)              
    print(replacements)

There's probably many more efficient ways to do it, and it doesn't satisfy the maximum distance requirement, but it works!