How to load the WEKA preprocessing steps into R?


I have used the WEKA GUI (Java) to preprocess the data. I would now like to apply the same preprocessing steps in R.

For example, I want to reproduce the MultiFilter preprocessing from the WEKA GUI in R, but I cannot find it in RWeka.

How can I load the WEKA preprocessing steps into R?

You can reproduce the WEKA GUI steps only partially with RWeka. The Weka command-line tools are far more extensive than the functions available in RWeka, so you can extend RWeka by calling the command-line tools through R's system() function. Luckily, the parameters in the WEKA GUI and on the Weka command line are the same. I recommend extracting weka-src.jar with jar xf weka-src.jar to read the source.

There are two classes for combining filters, MultiFilter and PartitionedMultiFilter. Their options are listed with

java weka.filters.MultiFilter --help
java weka.filters.unsupervised.attribute.PartitionedMultiFilter --help

where the second allows you to specify an attribute range for each filter. Otherwise, they seem to be identical.

You can also run the filters one at a time. For example, run your first Discretize filter with

java weka.filters.unsupervised.attribute.Discretize -F -B 20 -M -1.0 -R 27 -i yourFile.arff

and then direct its output to the next Discretize filter, and eventually to NumericTransform and Resample. The command line prints detailed help for each filter, for example:

java weka.filters.unsupervised.attribute.NumericTransform --help
java weka.filters.unsupervised.attribute.Remove --help
java weka.filters.unsupervised.instance.Resample --help
java weka.filters.supervised.instance.Resample --help

You can also look up the available filters in the directory structure of the extracted source or in the index.

RWeka

The RWeka package provides the functions

  • Discretize()
  • Normalize()
  • make_Weka_filter() to create R interfaces to Weka filters

and there are no built-in NumericTransform or Remove functions. The RWeka functions also take their options as R arguments, so you cannot simply copy and paste the Java command string from the WEKA GUI. One solution could be to use the system command and execute the Java command with it, without having to learn RWeka itself. There seems to be a gap between the WEKA GUI and the R package.

Running Weka on the command line

Even though some filters are missing from the RWeka interface, you can still run them with R's system() function. For example, you can run the Remove command

java weka.filters.unsupervised.attribute.Remove -i yourfile.arff

from R with

system("java weka.filters.unsupervised.attribute.Remove -i yourfile.arff")
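The following is a minimal sketch of the full round trip from R. It runs the filter with system2(), writes the result to a file with -o, and loads it back with RWeka's read.arff(). It is wrapped in Rscript so that it can be run from the shell. It assumes weka.jar is on the CLASSPATH, WEKAINSTALL is exported, and RWeka is installed. The option -R 1 (remove the first attribute) and the file name iris-removed.arff are illustrative.

```shell
#!/bin/sh
# Sketch: call a Weka filter from R via system2() and read the result back.
# Assumes weka.jar is on CLASSPATH, WEKAINSTALL is exported, RWeka is installed.
# system2() returns the exit status when output is not captured.
# shQuote() protects paths containing spaces.
Rscript \
  -e 'infile <- file.path(Sys.getenv("WEKAINSTALL"), "data", "iris.arff")' \
  -e 'status <- system2("java", c("weka.filters.unsupervised.attribute.Remove", "-R", "1", "-i", shQuote(infile), "-o", "iris-removed.arff"))' \
  -e 'stopifnot(status == 0)' \
  -e 'd <- RWeka::read.arff("iris-removed.arff")' \
  -e 'print(dim(d))'
```

Writing to a file with -o avoids parsing ARFF text inside R. Alternatively, system(..., intern = TRUE) returns the printed ARFF as a character vector.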

For example, with $WEKAINSTALL set to the Weka installation directory, you can run Discretize on the bundled iris data as follows.

$ cat $WEKAINSTALL/data/iris.arff |tail
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
%
%
$ java weka.filters.unsupervised.attribute.Discretize -i $WEKAINSTALL/data/iris.arff |tail
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.82-7.18]\'','\'(2.96-3.2]\'','\'(4.54-5.13]\'','\'(2.26-inf)\'',Iris-virginica
'\'(5.74-6.1]\'','\'(2.48-2.72]\'','\'(4.54-5.13]\'','\'(1.78-2.02]\'',Iris-virginica
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.72-6.31]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.46-6.82]\'','\'(3.2-3.44]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(6.1-6.46]\'','\'(2.48-2.72]\'','\'(4.54-5.13]\'','\'(1.78-2.02]\'',Iris-virginica
'\'(6.46-6.82]\'','\'(2.96-3.2]\'','\'(5.13-5.72]\'','\'(1.78-2.02]\'',Iris-virginica
'\'(6.1-6.46]\'','\'(3.2-3.44]\'','\'(5.13-5.72]\'','\'(2.26-inf)\'',Iris-virginica
'\'(5.74-6.1]\'','\'(2.96-3.2]\'','\'(4.54-5.13]\'','\'(1.78-2.02]\'',Iris-virginica
$ 

Some useful information

  1. Use Weka in your Java code

  2. Download the Linux developer version, unzip it, and read the README, which has many examples of using WEKA, particularly on the command line.

  3. The Weka wiki

  4. Maybe irrelevant: Generating source code from WEKA classes