I have the dataset below, which shows whether a customer is a returning customer or not. The end goal: for all returning customers, I need to map about 25% of them to 'yes 1 purchase' and the other 75% to 'yes >1 purchase'. I also need to set a seed so the result does not change each time I rerun the process.
I looked into the numpy random functions and the random seed function, but they seem to generate random numbers rather than randomly assign a proportion of the data values to a specific category. Can anyone advise on how to do this?
import pandas as pd
import numpy as np

list_customer_name = ['customer1', 'customer2', 'customer3', 'customer4', 'customer5',
                      'customer6', 'customer7', 'customer8', 'customer9', 'customer10',
                      'customer11', 'customer12', 'customer13', 'customer14', 'customer15',
                      'customer16', 'customer17', 'customer18']
list_return_customer = ['yes', 'yes', 'yes', 'yes', 'yes', 'yes',
                        'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes', 'yes',
                        'yes', 'yes', 'no', 'no']
df_test = pd.DataFrame({'customer_name': list_customer_name,
                        'return_customer?': list_return_customer})
The data looks like this: [screenshot of df_test]
The desired output looks like this: 25% of the customers flagged "yes" in the "return_customer?" column (4 customers, highlighted in yellow) are mapped to "yes 1 purchase", and the remaining 75% (12 customers, highlighted in green) are mapped to "yes >1 purchase".
The following code seems to match your specifications:
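A minimal sketch of that approach, reconstructed from the explanations that follow. The variable names (`number_of_yes_1_purchase`, `rand_index`, `list_return_customer_final`) come from them; the first step, which turns every "yes" into "yes >1 purchase" before the 25% are picked, is an assumption about how `list_return_customer_final` was built.

```python
import random

import pandas as pd

# Same data as in the question: 16 "yes" followed by 2 "no"
list_customer_name = [f"customer{i}" for i in range(1, 19)]
list_return_customer = ["yes"] * 16 + ["no"] * 2

# Fix the seed so the same customers are picked on every run
random.seed(1234)

# Assumption: start by mapping every "yes" to "yes >1 purchase"
list_return_customer_final = [
    "yes >1 purchase" if value == "yes" else value
    for value in list_return_customer
]

# 25% of the returning customers get "yes 1 purchase" (rounded to an int)
number_of_yes_1_purchase = round(0.25 * list_return_customer.count("yes"))

while number_of_yes_1_purchase > 0:
    rand_index = random.randint(0, len(list_return_customer_final) - 1)
    # Skip indices that are already "yes 1 purchase" or are "no"
    if list_return_customer_final[rand_index] != "yes >1 purchase":
        continue
    list_return_customer_final[rand_index] = "yes 1 purchase"
    number_of_yes_1_purchase -= 1

df_test = pd.DataFrame({
    "customer_name": list_customer_name,
    "return_customer?": list_return_customer_final,
})
print(df_test)
```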
Explanations:

- I used the `random` module and set the seed to an arbitrary value with `random.seed(1234)`. Setting the seed makes the random functions behave the same way every time the program runs.
- I defined the number of "yes 1 purchase" to allocate with the variable `number_of_yes_1_purchase`. You can hardcode it or compute it from the number of "yes" entries in `list_return_customer` (but remember to round the result to get an `int`).
- With the `while` loop, I loop until I have allocated all of the "yes 1 purchase" values; each time I allocate one, I decrease the remaining number by one with `number_of_yes_1_purchase -= 1`.
- I used `rand_index = random.randint(0, len(list_return_customer_final) - 1)` to get a random index of the list to set to "yes 1 purchase". If this index is already a "yes 1 purchase" or a "no", I skip the current iteration with `continue`.
- The loop ends when `number_of_yes_1_purchase` reaches 0.

If you have any questions, don't hesitate to ask.
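Since the question mentions numpy: a seeded NumPy generator can also do the assignment directly by sampling exactly 25% of the "yes" rows without replacement, which avoids the retry loop. This is a minimal sketch, not the answer's method; overwriting the `return_customer?` column with the new labels is an illustrative choice, and `yes_index`, `n_one_purchase` and `one_purchase_index` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Same data as in the question: 16 "yes" followed by 2 "no"
df_test = pd.DataFrame({
    "customer_name": [f"customer{i}" for i in range(1, 19)],
    "return_customer?": ["yes"] * 16 + ["no"] * 2,
})

# Seeded generator: the same rows are chosen on every run
rng = np.random.default_rng(1234)

# Row labels of the returning customers, and 25% of their count
yes_index = df_test.index[df_test["return_customer?"] == "yes"]
n_one_purchase = round(0.25 * len(yes_index))

# Pick exactly n_one_purchase distinct "yes" rows
one_purchase_index = rng.choice(yes_index.to_numpy(), size=n_one_purchase, replace=False)

# Every "yes" becomes "yes >1 purchase", then the sampled rows become "yes 1 purchase"
df_test.loc[yes_index, "return_customer?"] = "yes >1 purchase"
df_test.loc[one_purchase_index, "return_customer?"] = "yes 1 purchase"
print(df_test)
```

Because the sample is drawn with `replace=False`, the selected rows are distinct and exactly `n_one_purchase` of them get "yes 1 purchase", while the "no" rows are never touched. Calling `DataFrame.sample(frac=0.25, random_state=1234)` on the "yes" rows is a pure-pandas way to make the same kind of selection, though it will not necessarily pick the same rows.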