Generate a random number in a CSV file with Python


I am struggling with an issue regarding CSV files and Python. How would I generate a random number in a CSV file row, based on a condition in that row?

Essentially, if the value in the first column is 'A', I want a random number between 1 and 5. If the value is 'B', I want a random number between 2 and 6, and if the value is 'C', a random number between 3 and 7.

CSV File

Letter Color Random Number
A Green
C Red
B Red
B Green
C Blue

Thanks in advance

The only thing I have found was creating a new random number dataframe. But I need to create a random number for an existing df.


There are 3 best solutions below

0
On BEST ANSWER

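Here is a simple way of doing it without pandas. This program reads the CSV file, replaces the third column of each row with a random number chosen according to the letter in the first column, and writes the result back to the same file. It implements the rule from the question: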

if the value in the first column is 'A' I want a random number between (1 and 5). If the value is B I want a random number between (2 and 6) and if the value is C, a random number between (3 and 7).

import csv
import random

# Inclusive [low, high] range for each letter
letters_randoms = {
    'A': [1, 5],
    'B': [2, 6],
    'C': [3, 7],
}
rows = []  # result
with open('file.csv', 'r', newline='', encoding='utf-8') as file:
    reader = csv.reader(file)
    rows.append(next(reader))  # keep the header row unchanged
    for row in reader:
        letter = row[0].upper()
        row += [''] * (3 - len(row))  # make sure the third column exists
        # random.randint includes both endpoints
        row[2] = random.randint(*letters_randoms[letter])
        rows.append(row)
# overwrite the csv file with the updated rows
with open('file.csv', 'w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerows(rows)

Result (file.csv):

LETTER,COLOR,Random Number
A,Green,3
c,Red,5
B,Red,2
B,Green,2
c,Blue,5
A,Purple,5
B,Green,3
A,Orange,3
c,Black,4
c,Red,5
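The per-row logic above can also be pulled out into a function that works on rows already in memory, which makes it easier to try without touching a file. This is only a sketch: the names RANGES and fill_random_numbers are made up for illustration, and it assumes that rows may be shorter than three fields and that an unknown letter should raise an error.

```python
import random

# Inclusive (low, high) bounds per letter, taken from the question.
RANGES = {"A": (1, 5), "B": (2, 6), "C": (3, 7)}


def fill_random_numbers(rows, rng=random):
    """Return copies of the data rows with the third field set to a random integer."""
    filled = []
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[0].strip().upper()
        if key not in RANGES:
            raise ValueError(f"no range defined for letter {row[0]!r}")
        low, high = RANGES[key]
        # Pad short rows so that index 2 exists, then overwrite it.
        padded = list(row) + [""] * max(0, 3 - len(row))
        padded[2] = rng.randint(low, high)  # both endpoints are included
        filled.append(padded)
    return filled


# Header row is left out here; only data rows are passed in.
sample = [["A", "Green"], ["C", "Red", ""], ["b", "Red"]]
result = fill_random_numbers(sample, random.Random(0))
for row in result:
    print(row)
```

Passing a seeded random.Random instance makes the output repeatable, which helps when checking the ranges by hand.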
0
On

You can use numpy's random module, which works on pandas Series. First, create a Series that maps each letter to the starting value of its random range. Map that onto the "Letter" column and you'll have a Series of range start values, one per row. Pass that to numpy.random.randint to generate the new column; its upper bound is exclusive, so l_code+5 gives five possible values starting at l_code.

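>>> import numpy as np
>>> import pandas as pd
>>> # df is the DataFrame already loaded from the CSV file
>>> l_map = pd.Series([1,2,3], index=['A', 'B', 'C'], name="mapped")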
>>> l_map
A    1
B    2
C    3
Name: mapped, dtype: int64
>>> l_code = df["Letter"].map(l_map)
>>> l_code
0    1
1    3
2    2
3    2
4    3
Name: Letter, dtype: int64
>>> df["Random Number"] = np.random.randint(l_code, l_code+5)
>>> df
  Letter  Color  Random Number
0      A  Green              5
1      C    Red              6
2      B    Red              2
3      B  Green              3
4      C   Blue              3
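The approach above relies on every range having the same width. If the ranges differed in width, one possible variant maps both a lower and an upper bound per letter. The sketch below uses numpy's Generator.integers with endpoint=True so the upper bound is inclusive; the bounds table and the inline DataFrame are assumptions made for illustration, and every letter in the column is assumed to have an entry.

```python
import numpy as np
import pandas as pd

# Sample data from the question, built inline so the sketch is self-contained.
df = pd.DataFrame({
    "Letter": ["A", "C", "B", "B", "C"],
    "Color": ["Green", "Red", "Red", "Green", "Blue"],
})

# Hypothetical lookup table: inclusive low/high bound for each letter.
bounds = pd.DataFrame({"low": [1, 2, 3], "high": [5, 6, 7]},
                      index=["A", "B", "C"])

# One bound per row; a letter missing from `bounds` would become NaN here.
low = df["Letter"].map(bounds["low"]).to_numpy()
high = df["Letter"].map(bounds["high"]).to_numpy()

rng = np.random.default_rng(0)  # seeded only to make the run repeatable
# Array arguments give one independent draw per row.
df["Random Number"] = rng.integers(low, high, endpoint=True)
print(df)
```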
1
On

One way is to use numpy.random.randint with numpy.select:

import pandas as pd
import numpy as np

df = pd.read_csv("inputfile.csv", sep=",")
#change the separator according to the actual format of your csv

categories = [df["Letter"].eq("A"),
              df["Letter"].eq("B"),
              df["Letter"].eq("C")]

#numpy.random.randint(low, high=None, size=None, dtype=int)
#size=len(df) draws one number per row; without it each call returns a
#single value that every row with that letter would share
n = len(df)
choices = [np.random.randint(1, 5+1, size=n),  #high is exclusive
           np.random.randint(2, 6+1, size=n),  #high is exclusive
           np.random.randint(3, 7+1, size=n)]  #high is exclusive

#numpy.select(condlist, choicelist, default=0)
df["Random Number"] = np.select(categories, choices)

# Output :

print(df)

  Letter  Color  Random Number
0      A  Green              5
1      C    Red              6
2      B    Red              5
3      B  Green              5
4      C   Blue              6

If needed, you can use pandas.DataFrame.to_csv to write the result to a new .csv file:

df.to_csv("output_file.csv", sep=",", index=False)
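After writing the file, it can be worth reading it back and checking that every number falls inside the range for its letter. The following is a rough sketch of such a round trip; it uses io.StringIO in place of real files, and the RANGES dictionary and helper name bounds_for are invented for the example.

```python
import io

import numpy as np
import pandas as pd

# Inclusive bounds per letter, as stated in the question.
RANGES = {"A": (1, 5), "B": (2, 6), "C": (3, 7)}


def bounds_for(letters):
    """Return two Series with the inclusive low and high bound for each row."""
    lows = letters.map(lambda k: RANGES[k][0])
    highs = letters.map(lambda k: RANGES[k][1])
    return lows, highs


# Stand-in for the input file.
csv_text = "Letter,Color\nA,Green\nC,Red\nB,Red\nB,Green\nC,Blue\n"
df = pd.read_csv(io.StringIO(csv_text))

lows, highs = bounds_for(df["Letter"])
rng = np.random.default_rng()
df["Random Number"] = rng.integers(lows.to_numpy(), highs.to_numpy(),
                                   endpoint=True)

# Write to an in-memory buffer instead of output_file.csv, then read it back.
buffer = io.StringIO()
df.to_csv(buffer, sep=",", index=False)
buffer.seek(0)
check = pd.read_csv(buffer)

lows, highs = bounds_for(check["Letter"])
in_range = check["Random Number"].between(lows, highs)  # inclusive on both ends
print(in_range.all())
```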