I'm studying data mining, and my homework requires me to calculate the ratio of each value in a dataset. The homework is a .ipynb notebook, and there is a process I must follow.
Here are the requirements in order: [image: requirements, part 1] [image: requirements, part 2]
The dataset looks like this: [image: dataset preview]
Here is my code for the 1st requirement:

    from typing import Dict, List

    def str2dict(s: str) -> List[Dict]:
        """Convert given string to dict"""
        return list(eval(s))

    for i in range(0, shape[0]):
        cate_col_df["payloads"][i] = str2dict(cate_col_df["payloads"][i])
    cate_col_df = cate_col_df.explode("payloads")

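For reference, here is a minimal sketch of what this first step does, run on a made-up two-row frame. The cell contents are hypothetical and not taken from the real dataset, and `ast.literal_eval` is swapped in for `eval` because it only parses Python literals. The point of the sketch is that `explode` repeats the original index labels, so a label lookup such as `[0]` can match several rows.

```python
import ast
import pandas as pd

# Hypothetical toy data: each cell is a string that encodes a list of dicts.
toy_df = pd.DataFrame({
    "payloads": [
        "[{'type': 'Satellite'}, {'type': 'Dragon'}]",
        "[{'type': 'Satellite'}]",
    ]
})

# Parse each string into a real list of dicts (literal_eval used instead of eval).
toy_df["payloads"] = toy_df["payloads"].apply(ast.literal_eval)

# explode gives every list element its own row but keeps the original index label.
exploded = toy_df.explode("payloads")
index_labels = exploded.index.tolist()

# Label-based lookup: label 0 now matches two rows, so this is a Series.
label_lookup = exploded["payloads"][0]

# Positional lookup: always a single element.
first_element = exploded["payloads"].iloc[0]
```

If unique labels are needed after exploding, `exploded.reset_index(drop=True)` renumbers the rows from 0.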
Here is my code for the 2nd requirement

    import copy

    def missing_ratio(s):
        # TODO:
        # YOUR CODE HERE
        tmp = s.isna().sum()
        return (tmp / s.shape[0]).round(3)*100
        raise NotImplementedError()

    def num_values(s):
        # TODO:
        # YOUR CODE HERE
        tmp = copy.deepcopy(s)
        if isinstance(s[0], dict):
            for i in range(0, shape[0]):
                tmp[i] = str(tmp[i])
        return len(tmp.dropna().unique())
        raise NotImplementedError()

    def value_ratios(s):
        # TODO:
        # YOUR CODE HERE
        tmp = copy.deepcopy(s)
        if isinstance(s[0], dict):
            for i in range(0, shape[0]):
                tmp[i] = str(tmp[i])
        tmp1 = tmp.value_counts().to_dict()
        u = tmp.dropna().unique()
        for i in u:
            tmp1[i] = round(tmp1[i]/shape[0], 3)*100
        return tmp1
        raise NotImplementedError()

    cat_col_info_df = cate_col_df.agg([missing_ratio, num_values, value_ratios])
    cat_col_info_df

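As a point of comparison, the same three statistics can be computed without element-by-element loops or a global `shape`. This is only a sketch on hypothetical toy data. The helper name `summarize` is made up, and it assumes the value ratios are taken over the non-missing values, which may differ from what the assignment expects.

```python
import pandas as pd

# Hypothetical toy column: dict payloads plus one missing value.
toy = pd.Series([{"a": 1}, {"a": 1}, {"b": 2}, None])

def summarize(s: pd.Series) -> dict:
    """Missing ratio, number of distinct values, and value ratios (all in %)."""
    # dicts are unhashable, so compare them through their string form
    non_missing = s.dropna().astype(str)
    # normalize=True divides each count by the number of non-missing values
    ratios = non_missing.value_counts(normalize=True) * 100
    return {
        "missing_ratio": round(s.isna().mean() * 100, 1),
        "num_values": non_missing.nunique(),
        "value_ratios": ratios.round(1).to_dict(),
    }

summary = summarize(toy)
```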
My code runs fine when I don't run the code for the 1st requirement (although it gives the wrong answer). However, if I run both, I get the error 'KeyError: 0'. I tried printing the s variable and I can access s[0], so I don't understand why the error is still raised.
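A possible lead, not confirmed against the notebook since the traceback is not shown: 'KeyError: 0' is also what plain Python raises when a dict, rather than a Series, is indexed with `[0]`. Some pandas versions may call an aggregation function on individual cell values before passing it the whole column (treat that as an assumption to verify for the installed version), and after the 1st requirement those cell values are dicts instead of strings. The sketch below uses hypothetical payload values to show the difference.

```python
# Hypothetical cell values before and after the 1st requirement.
payload_as_str = "{'type': 'Satellite'}"
payload_as_dict = {"type": "Satellite"}

# Indexing a string with 0 simply returns its first character.
first_char = payload_as_str[0]

# Indexing a dict with 0 looks for the key 0, which does not exist here.
try:
    payload_as_dict[0]
    error_message = None
except KeyError as exc:
    error_message = f"KeyError: {exc}"
```

Adding `print(type(s))` as the first line of `num_values` would show whether `s` is a whole column or a single cell at the moment the error is raised.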
The link to my dataset: https://github.com/MageOfFlowers/data/blob/master/rawdata-confidential/spaceX_laucnh.csv
Please let me know if you need more information.