Getting Weka to cross-validate a classifier from Ruby


Taking a hint from here, I'm using Weka's library of classifiers from Ruby via RJB.

I want to create a classifier from an .arff file and run 10-fold cross-validation with it to produce a confusion matrix, as explained in the Weka wiki.
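For reference, a confusion matrix is a table of counts: the cell at row i, column j holds the number of instances whose actual class is i and whose predicted class is j. As far as I know, this is the orientation Weka's toMatrixString output uses (rows are actual classes, columns are "classified as"). The sketch below is plain Ruby, independent of Weka and Rjb. The function name confusion_matrix and the sample labels are hypothetical, for illustration only.

```ruby
# Hypothetical helper: counts (actual, predicted) pairs into a square table.
# Rows are actual classes, columns are predicted classes.
def confusion_matrix(actual, predicted, classes)
  raise ArgumentError, "actual and predicted differ in length" unless actual.size == predicted.size

  index = classes.each_with_index.to_h # class label => row/column position
  matrix = Array.new(classes.size) { Array.new(classes.size, 0) }
  actual.zip(predicted).each do |act, pred|
    # fetch raises KeyError for a label that is missing from `classes`
    matrix[index.fetch(act)][index.fetch(pred)] += 1
  end
  matrix
end

# Made-up labels: five instances, two classes
p confusion_matrix(%w[yes yes no no yes], %w[yes no no yes yes], %w[yes no])
# => [[2, 1], [1, 1]]
```

The diagonal holds the correctly classified instances, and all the cells together add up to the number of instances evaluated.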

Below is the essential code involved.

# load Rjb and start the JVM with weka.jar on the classpath
require 'rjb'
Rjb::load("./weka.jar", ["-Xmx2000M"])

# creating the classifier
classifier = Rjb::import("weka.classifiers.bayes.NaiveBayes").new

# importing the data
data_src = Rjb::import("java.io.FileReader").new("./the_data.arff")
data = Rjb::import("weka.core.Instances").new(data_src)
# Evaluation needs a class attribute; this assumes the class is the last attribute
data.setClassIndex(data.numAttributes - 1)

evaluation = Rjb::import("weka.classifiers.Evaluation").new(data)

folds = Rjb::import('java.lang.Integer').new(10)
rand = Rjb::import("java.util.Random").new(1)

evaluation.crossValidateModel(classifier, data, folds, rand)

print evaluation.toMatrixString()

From what I can tell from the Weka wiki page linked above, this should work. But I get:

Fail: unknown method name `crossValidateModel' (RuntimeError)

From what I understand, this usually means the method hasn't been supplied with arguments of the right types, but I can't see how that would be the case here.

The output of evaluation.java_methods includes crossValidateModel([Ljava.lang.String;Lweka.core.Instances;I[Ljava.lang.String;Ljava.util.Random;, Lweka.classifiers.Classifier;Lweka.core.Instances;ILjava.util.Random;[Ljava.lang.Object;])

which I'm not sure how to interpret.
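The strings in that output appear to be JVM type descriptors (written with dots where the class-file format uses slashes), concatenated one per parameter: I is int, Lsome.ClassName; is a class, and each leading [ marks an array dimension. Reading the outer [ and ] as a list wrapper and ", " as a separator, the output lists two overloads. Here is a small decoder sketch in plain Ruby. decode_params is a hypothetical helper, not part of Rjb.

```ruby
# JVM descriptor letters for primitive types
PRIMITIVES = {
  "Z" => "boolean", "B" => "byte", "C" => "char", "S" => "short",
  "I" => "int", "J" => "long", "F" => "float", "D" => "double"
}.freeze

# Hypothetical helper: decodes concatenated parameter descriptors, e.g.
# "Lweka.core.Instances;I" => ["weka.core.Instances", "int"]
def decode_params(descriptor)
  types = []
  i = 0
  while i < descriptor.length
    dims = 0
    while descriptor[i] == "[" # one "[" per array dimension
      dims += 1
      i += 1
    end
    if descriptor[i] == "L" # class type: runs up to the next ";"
      semi = descriptor.index(";", i)
      raise ArgumentError, "unterminated class name at #{i}" if semi.nil?
      base = descriptor[(i + 1)...semi]
      i = semi + 1
    else # primitive type: a single letter
      base = PRIMITIVES.fetch(descriptor[i]) { |c| raise ArgumentError, "bad descriptor char #{c.inspect}" }
      i += 1
    end
    types << base + "[]" * dims
  end
  types
end

# The two overloads from the java_methods output above
[
  "Ljava.lang.String;Lweka.core.Instances;I[Ljava.lang.String;Ljava.util.Random;",
  "Lweka.classifiers.Classifier;Lweka.core.Instances;ILjava.util.Random;[Ljava.lang.Object;"
].each { |sig| puts "crossValidateModel(#{decode_params(sig).join(', ')})" }
```

Decoded this way, the third parameter of both overloads is the primitive int. The Classifier overload also takes five parameters, the last being an Object[] (presumably a Java varargs Object... parameter), while the call in the question passes four arguments.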

Does anyone out there know what I need to do?


EDIT: although I wasn't able to solve the problem as posed here, it turns out that I was able to achieve what I wanted by starting over with JRuby as described here. Thanks to michaeltwofish for the tip :)

Answer

Instead of Rjb::import('java.lang.Integer').new(10), try passing a plain 10.

You are calling the crossValidateModel method of the Evaluation class, which is overloaded (see the signatures below). Notice that the third parameter is an int, but you are passing a java.lang.Integer. In Java, int and Integer are not the same thing: int is a primitive type and Integer is its wrapper class (look up primitive and wrapper types in Java if you are interested). Since Java 5, Java code can convert between int and Integer automatically (autoboxing), but you are calling from Rjb. I think the Java Integer is wrapped in some object for Ruby's purposes, and that is confusing the method lookup.

From the Weka javadocs:

crossValidateModel(Classifier, Instances, int)

Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.

crossValidateModel(String, Instances, int, String[])

Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
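"Stratified" means each fold keeps roughly the same class proportions as the full data set. The plain-Ruby sketch below shows the idea under stated assumptions. It shuffles each class's instance indices with a seeded generator and deals them round-robin into k folds, then uses each fold once as the test set and the other k - 1 folds as the training set. This is not Weka's actual implementation. stratified_folds and each_cv_split are hypothetical names, and Ruby's Random will not reproduce the folds that java.util.Random(1) gives Weka.

```ruby
# Hypothetical helper: assigns instance indices to k folds so that each
# class is spread as evenly as possible across the folds (stratification).
def stratified_folds(labels, k, seed: 1)
  raise ArgumentError, "k must be between 2 and #{labels.size}" unless k.between?(2, labels.size)

  rng = Random.new(seed)
  folds = Array.new(k) { [] }
  slot = 0 # carried across classes so overall fold sizes differ by at most 1
  labels.each_index.group_by { |i| labels[i] }.each_value do |indices|
    # shuffle this class's indices, then deal them round-robin
    indices.shuffle(random: rng).each do |i|
      folds[slot] << i
      slot = (slot + 1) % k
    end
  end
  folds
end

# Hypothetical helper: yields (train, test) index arrays, one pair per fold
def each_cv_split(folds)
  return enum_for(:each_cv_split, folds) unless block_given?

  folds.each_with_index do |test, f|
    train = (folds[0...f] + folds[(f + 1)..-1]).flatten
    yield train, test
  end
end

labels = %w[a a a a a b b b b b]
each_cv_split(stratified_folds(labels, 5)) do |train, test|
  puts "train=#{train.size} test=#{test.size}" # prints train=8 test=2 five times
end
```

Carrying the round-robin position across classes is a design choice: it keeps each class balanced across folds and also keeps the total fold sizes within one instance of each other.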