I have the code below, which uses a pandas DataFrame. However, when I convert the pandas DataFrame to Koalas and run the same code, I get the error "Function sample currently does not support specifying exact number of items to return. Use frac instead".
df.loc[df.sample(int(len(df) * .05)).index, 'distance'] = None
I tried the code below, which gives me a random sample of records. But how do I keep all records in the DataFrame and replace distance with a null value for 5% of them?
df.sample(frac=0.05, random_state=1)
If you just want to keep 5% of the records in the distance column, you can use PySpark's when() together with a random number from rand().

If you want to stick with Koalas and not Spark, you can do this:
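A minimal sketch of the sample(frac=...) idea, written against plain pandas so that it runs without a Spark session. The sample data and the `id` column are hypothetical; only the `distance` column name comes from the question. With Koalas you would swap the import for `import databricks.koalas as ks` and build the frame with `ks.DataFrame`. Assigning through `.loc` with an index taken from a separately sampled frame may additionally require `ks.set_option('compute.ops_on_diff_frames', True)`; that is an assumption and has not been verified here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows with a float 'distance' column.
df = pd.DataFrame({"id": range(100), "distance": np.arange(100, dtype="float64")})

# Use frac instead of an exact row count, as the error message suggests.
sampled_index = df.sample(frac=0.05, random_state=1).index

# Null out 'distance' only for the sampled rows; all rows stay in the frame.
df.loc[sampled_index, "distance"] = None

print(len(df), int(df["distance"].isna().sum()))
```

In pandas this nulls exactly 5 of the 100 rows. Under Koalas the sampling is done by Spark, where the fraction is only approximate, so expect roughly (not exactly) 5% of the rows to be affected.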