I would like to split a list of numbers and create two subsets based on two conditions:
- The sum of the numbers of the two subsets should have a particular ratio (e.g. 45%-55%)
- The count of the numbers of the two subsets, should be nearly the same (nearly means to diverge at most 5%)
What I have done by now is merely a probabilistic rather than a dev approach which could be used for small lists after adding a while loop that will keep the count condition. However I would like to have a more robust approach, without running the code 100 iterations in order to find (probably find) the best split.
import numpy as np
from scipy.stats import norm
def split_into_normal_distributions(numbers, target_percentage):
# Calculate the means and standard deviation of the two normal distributions
mean_1 = np.mean(numbers) * target_percentage
mean_2 = np.mean(numbers) *(1-target_percentage)
std = np.std(numbers)
# Initialize subsets
subset_1 = []
subset_2 = []
for num in numbers:
# Calculate probability densities for each number in both distributions
pdf_1 = norm.pdf(num, loc=mean_1, scale=std)
pdf_2 = norm.pdf(num, loc=mean_2, scale=std)
# Calculate the ratio of probabilities for assignment
ratio = pdf_1 / (pdf_1 + pdf_2)
pdf_sum = pdf_1 + pdf_2
# Assign numbers to subsets based on the calculated ratio
if np.random.rand() < ratio:
subset_1.append(num)
else:
subset_2.append(num)
return subset_1, subset_2
# Sample list of numbers
numbers = [10, 20, 30, 40, 50, 60, 70, 80,10,20,25,20,21,26,65,95,84,65,2,3,6,198,16,651,984,651,35,61,651,16,56,651,651,651,2,32,615,651,984,615,351,651,651,3,5]
# Split into two normal distributions with specified means and standard deviation
subset_1, subset_2 = split_into_normal_distributions(numbers, 0.4)
print("Subset 1 (40% mean):", subset_1, sum(subset_1)/sum(numbers), len(subset_1))
print("Subset 2 (60% mean):", subset_2, sum(subset_2)/sum(numbers), len(subset_2))
len(numbers)
Thank you
Here's what I came up with. This is a simple greedy approach to produce a split with even totals. Not quite what you asked for but hopefully it helps.