How to treat integer attributes in WEKA, e.g. number of bedrooms (cannot be float values)


I am using WEKA for a classification problem on a dataset in ARFF file format.

I want to use SMOTE on my dataset since I have a class imbalance; however, whenever I do this, it generates 'impossible' attribute values for some of these new synthetic instances. For example, an attribute 'number_of_bedrooms' cannot be a float value, yet after applying SMOTE, some of the values will be 3.5 etc.
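For context, the fractional values come from how SMOTE builds a synthetic instance: it interpolates between a minority-class instance and one of its nearest neighbours. Below is a minimal sketch of that step for a single attribute. It is plain Java, not WEKA's actual code, and the class and method names are hypothetical.

```java
class SmoteInterpolationSketch {
    // Standard SMOTE interpolation for one numeric attribute:
    // synthetic = base + gap * (neighbour - base), with gap drawn from [0, 1).
    static double interpolate(double base, double neighbour, double gap) {
        return base + gap * (neighbour - base);
    }

    public static void main(String[] args) {
        // A 3-bedroom instance with a 4-bedroom neighbour and gap = 0.5
        double synthetic = interpolate(3.0, 4.0, 0.5);
        System.out.println(synthetic);             // 3.5
        // Rounding afterwards would restore a whole number
        System.out.println(Math.round(synthetic)); // 4
    }
}
```

So any pair of neighbours with different bedroom counts can yield a non-integer value while the attribute is treated as numeric.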

I want to apply some sort of filter in WEKA so that this specific attribute can only take whole integer values. Do I need to discretize this attribute? Would that be appropriate for an attribute like number of rooms?

If I do discretize, should there be one bin per number of rooms present in the data, i.e. one bin for each of 1, 2, 3, 4, or 5 bedrooms? Or should the binning take the target class into account, which would give something more like 1, 2-3, and 4+ bedrooms, to aid classification?

I have tried applying the following filters: (NOTE: all settings are default unless specified below. I am using the GUI and not coding in the terminal, formatting here wants the lines as code/blockquote)

weka.filters.unsupervised.attribute.Discretize
binRangePrecision = 0
bins = 10 (this was the default but I don't know whether to change it)
findNumBins = True

weka.filters.unsupervised.attribute.NumericToNominal

weka.filters.supervised.attribute.Discretize
binRangePrecision = 0

fracpete On

You could use the weka.filters.unsupervised.attribute.NumericToNominal filter to convert your numeric number_of_bedrooms attribute into a nominal one. This filter simply turns each distinct number into its string representation, which is then used as a label of the nominal attribute.
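To illustrate what "numbers into their string representation" means, here is a minimal sketch in plain Java. It does not call WEKA and only mimics the idea; the class and method names are hypothetical, and the exact label formatting WEKA produces may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class NumericToNominalSketch {
    // Collects the distinct values of a numeric column in ascending order
    // and renders each one as a label, printing whole numbers without ".0".
    static List<String> toLabels(double[] values) {
        TreeSet<Double> distinct = new TreeSet<>();
        for (double v : values) {
            distinct.add(v);
        }
        List<String> labels = new ArrayList<>();
        for (double v : distinct) {
            boolean whole = (v == Math.rint(v));
            labels.add(whole ? String.valueOf((long) v) : String.valueOf(v));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Original data: one label per bedroom count that occurs
        System.out.println(toLabels(new double[] {3, 1, 2, 3, 5, 4, 2})); // [1, 2, 3, 4, 5]
        // Data that already contains a synthetic 3.5: it becomes its own label
        System.out.println(toLabels(new double[] {3, 3.5, 4}));           // [3, 3.5, 4]
    }
}
```

The second call suggests that the order of filters matters: if the conversion runs after SMOTE has already produced fractional values, each of those values would end up as a separate label, so it is probably better to convert the attribute first.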