Dask: masking a dataframe based on multiple conditions to perform selective calculations


I'm looking to replace values in rows where multiple conditions are met when using dask. The replacement value is stored in another column; if the conditions are met, I'll replace the target value with it.

I'd like to stay in dask rather than performing this action with another library if possible because of memory constraints when shifting dataframes around.

At the moment, I'm attempting to use the .mask method.

Where GrassDeadFMC >= 12 and WindSpeed <= 10, make GrassFMCoefficient equal to the value in GFMG12L10:

ddf['GrassFMCoefficient'] = ddf['GFMG12L10'].mask(ddf['GrassDeadFMC'] >= 12 & ddf['WindSpeed'] <= 10)

The error I'm receiving is:

ValueError: Metadata inference failed in `and_`.

Original error is below:
------------------------
TypeError('cannot compare a dtyped [float32] array with a scalar of type [bool]')

Below is a minimal executable script. It gives a slightly different error, but I guess it suffers from the same issue.

import dask.dataframe as dd
import pandas as pd
from random import randint
df = pd.DataFrame({'GrassFMCoefficient': [0 for x in range(10)],
                   'GFMG12L10': [randint(1, 50) for x in range(10)],
                   'GrassDeadFMC': [randint(1, 50) for x in range(10)],
                   'WindSpeed': [randint(1, 30) for x in range(10)]})
ddf = dd.from_pandas(df,npartitions=1)
ddf['GrassFMCoefficient'] = ddf['GFMG12L10'].mask(ddf['GrassDeadFMC'] >= 12 & ddf['WindSpeed'] <= 10)
print(ddf.head(10))

Any help on this would be appreciated.

Best answer

Do you want a result like this?

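You have to wrap each condition in parentheses '()', e.g. (condition1) & (condition2). In Python, & binds more tightly than comparison operators such as >= and <=, so without the parentheses 12 & ddf['WindSpeed'] is evaluated first and the comparisons are applied to that result. With the parentheses, each comparison produces a Boolean series first, and & then combines Boolean with Boolean.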
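To see the precedence issue without touching dask at all, here is a small sketch that asks Python's own parser how it reads the two forms. The names a and b are just placeholders standing in for the two columns; they are not from the question.

```python
import ast

# Placeholder names `a` and `b` stand in for the two dataframe columns.
unparenthesized = ast.parse("a >= 12 & b <= 10", mode="eval").body
parenthesized = ast.parse("(a >= 12) & (b <= 10)", mode="eval").body

# Without parentheses the whole expression is one chained comparison,
# with the bitwise AND buried inside it as the middle operand.
print(type(unparenthesized).__name__)                    # Compare
print([type(op).__name__ for op in unparenthesized.ops]) # ['GtE', 'LtE']
print(ast.unparse(unparenthesized.comparators[0]))       # 12 & b

# With parentheses the top-level operation is the bitwise AND,
# applied to two separate comparisons.
print(type(parenthesized).__name__)                      # BinOp
print(type(parenthesized.op).__name__)                   # BitAnd
```

So the unparenthesized form is read as a >= (12 & b) <= 10, which is why the error comes from and_ being applied to the number 12 and a column rather than to two Boolean series.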

ddf['GrassFMCoefficient'] = ddf['GFMG12L10'].mask((ddf['GrassDeadFMC'] >= 12) & (ddf['WindSpeed'] <= 10))
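One more thing that may be worth checking: mask(cond) replaces the values where cond is True (with NaN unless a second argument is given) and keeps the rest. If the goal is what the question describes, i.e. GrassFMCoefficient takes the value of GFMG12L10 only on rows where both conditions hold, then a possible variant is to mask the target column and pass the replacement column as the second argument. The sketch below is an assumption about the intent, not something confirmed by the asker. It uses plain pandas with small hand-picked values so the result is easy to verify; the helper name apply_coefficient is made up for illustration, and since it only uses column selection, comparisons, &, mask and assign, the same function should also accept a dask dataframe.

```python
import pandas as pd


def apply_coefficient(frame):
    """Hypothetical helper: copy GFMG12L10 into GrassFMCoefficient where both conditions hold."""
    # Each comparison is parenthesized so that & combines two Boolean series.
    cond = (frame['GrassDeadFMC'] >= 12) & (frame['WindSpeed'] <= 10)
    # mask() swaps in the second argument where cond is True and keeps
    # the existing GrassFMCoefficient value everywhere else.
    return frame.assign(
        GrassFMCoefficient=frame['GrassFMCoefficient'].mask(cond, frame['GFMG12L10'])
    )


# Hand-picked values (not from the question) covering each combination.
df = pd.DataFrame({'GrassFMCoefficient': [0, 0, 0, 0],
                   'GFMG12L10': [7, 8, 9, 10],
                   'GrassDeadFMC': [20, 5, 15, 12],
                   'WindSpeed': [5, 5, 25, 10]})

result = apply_coefficient(df)
print(result['GrassFMCoefficient'].tolist())  # [7, 0, 0, 10]
```

Only the first and last rows satisfy both conditions, so only those rows pick up the value from GFMG12L10.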