I have a dataset below that shows whether each customer is a returning customer. The end goal is that, among all returning customers, about 25% are mapped to 'yes 1 purchase' and 75% to 'yes >1 purchase'. I also need to set a seed so that the result does not change each time I re-run the process.
I researched NumPy's random functions and the random seed function, but they seem to generate random numbers rather than randomly assign a proportion of the data values to a specific category. Can anyone advise on how to do this?
```python
import pandas as pd
import numpy as np

list_customer_name = ['customer1','customer2','customer3','customer4','customer5',
                      'customer6','customer7','customer8','customer9','customer10','customer11','customer12',
                      'customer13','customer14','customer15','customer16','customer17','customer18']
list_return_customer = ['yes','yes','yes','yes','yes','yes',
                        'yes','yes','yes','yes','yes','yes','yes','yes',
                        'yes','yes','no','no']
df_test = pd.DataFrame({'customer_name': list_customer_name,
                        'return_customer?': list_return_customer})
```
The data looks like this:
The desired output looks like this: 25% of the customers flagged "yes" in the "return_customer?" column (4 customers, highlighted in yellow) are mapped to "yes 1 purchase", and the remaining 75% (12 customers, highlighted in green) are mapped to "yes >1 purchase".
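A sketch of one way to get this with the NumPy seeding mentioned above: rather than generating raw random numbers, a seeded `np.random.default_rng` can pick row labels of the "yes" customers without replacement, and only those rows are relabelled. The new column name `purchase_group` is hypothetical; you could write back into `"return_customer?"` instead.

```python
import numpy as np
import pandas as pd

# Same data as in the question (16 "yes", 2 "no").
list_customer_name = [f"customer{i}" for i in range(1, 19)]
list_return_customer = ["yes"] * 16 + ["no"] * 2
df_test = pd.DataFrame({"customer_name": list_customer_name,
                        "return_customer?": list_return_customer})

# A seeded Generator makes the selection identical on every run.
rng = np.random.default_rng(1234)  # the seed value itself is arbitrary

# Row labels of the returning customers only.
yes_idx = df_test.index[df_test["return_customer?"] == "yes"]

# 25% of the returning customers get "yes 1 purchase" (0.25 * 16 = 4).
n_one = round(0.25 * len(yes_idx))
one_purchase_idx = rng.choice(yes_idx, size=n_one, replace=False)

# Every "yes" starts as "yes >1 purchase"; "no" stays "no".
df_test["purchase_group"] = df_test["return_customer?"].replace("yes", "yes >1 purchase")
# Overwrite the sampled rows.
df_test.loc[one_purchase_idx, "purchase_group"] = "yes 1 purchase"

# Counts: 12 "yes >1 purchase", 4 "yes 1 purchase", 2 "no".
print(df_test["purchase_group"].value_counts())
```

Sampling with `replace=False` guarantees exactly `n_one` distinct customers, so the split is always 4/12 rather than only approximately 25/75.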


The following code seems to match your specifications:
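A minimal sketch consistent with the explanation below, using Python's built-in `random` module. How `list_return_customer_final` is first built (a copy of the input with every "yes" turned into "yes >1 purchase") is an assumption, and the count of 4 is hardcoded for this 16-customer example.

```python
import random

# Same data as in the question (16 "yes", 2 "no").
list_customer_name = [f"customer{i}" for i in range(1, 19)]
list_return_customer = ["yes"] * 16 + ["no"] * 2

random.seed(1234)  # arbitrary seed: the same choices on every run

# Assumption: start from a copy where every "yes" is "yes >1 purchase".
list_return_customer_final = [
    "yes >1 purchase" if answer == "yes" else "no"
    for answer in list_return_customer
]

# Number of "yes 1 purchase" to allocate (25% of the 16 returning customers).
number_of_yes_1_purchase = 4

while number_of_yes_1_purchase > 0:
    # randint is inclusive on both ends, hence the "- 1".
    rand_index = random.randint(0, len(list_return_customer_final) - 1)
    # Skip indices that are already "yes 1 purchase" or belong to a "no".
    if list_return_customer_final[rand_index] in ("yes 1 purchase", "no"):
        continue
    list_return_customer_final[rand_index] = "yes 1 purchase"
    number_of_yes_1_purchase -= 1

for name, status in zip(list_customer_name, list_return_customer_final):
    print(name, status)
```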
Explanations:
- I used the `random` module and set the seed to an arbitrary value with `random.seed(1234)`. Setting the seed makes the random functions produce the same sequence every time the program runs.
- I defined the number of "yes 1 purchase" to allocate with the variable `number_of_yes_1_purchase`. You can hardcode it or compute it as 25% of the number of "yes" entries in `list_return_customer` (remember to round the result to get an `int`).
- With the `while` loop, I loop until all of the "yes 1 purchase" have been allocated; each time I allocate one, I decrease the remaining number by one with `number_of_yes_1_purchase -= 1`.
- I used `rand_index = random.randint(0, len(list_return_customer_final) - 1)` to get a random index of the list to set to "yes 1 purchase". If the element at this index is already "yes 1 purchase" or "no", I skip the current iteration with `continue`.
- The loop ends when `number_of_yes_1_purchase` reaches 0.

If you have any questions, don't hesitate to ask.
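A variation on the loop above (not the answer's original code): sample the positions of the "yes" entries directly with `random.sample`, which returns distinct items, and compute the count from the data instead of hardcoding it. A sketch, assuming the same list as in the question:

```python
import random

list_return_customer = ["yes"] * 16 + ["no"] * 2  # same data as in the question

random.seed(1234)  # fixed seed so the selection is reproducible

# Positions of the returning customers only, so "no" rows are never picked.
yes_positions = [i for i, answer in enumerate(list_return_customer) if answer == "yes"]

# 25% of the returning customers, rounded to a whole number of customers.
number_of_yes_1_purchase = round(0.25 * len(yes_positions))

# random.sample draws distinct positions in one call: no retry loop needed.
one_purchase = set(random.sample(yes_positions, number_of_yes_1_purchase))

list_return_customer_final = [
    "yes 1 purchase" if i in one_purchase
    else "yes >1 purchase" if answer == "yes"
    else "no"
    for i, answer in enumerate(list_return_customer)
]
print(list_return_customer_final)
```

Because the population contains only "yes" positions and sampling is without replacement, no iterations need to be skipped, and the count adapts automatically if the data changes.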