I am trying to understand and implement one-class classifiers in R, using several UCI datasets; one of them is Chronic Kidney Disease (http://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease).
When I try to print a confusion matrix, I get the error "all arguments must have the same length".
What am I doing wrong?
library(caret)
library(dplyr)
library(e1071)
library(NLP)
library(tm)
ds = read.csv('kidney_disease.csv',
header = TRUE)
# Remove unusable columns
ds <- subset(ds, select = -c(age), classification =='ckd' )
x <- subset(ds, select = -classification) #make x variables
y <- ds$classification #make y variable(dependent)
# test on the whole set
#pred <- predict(model, subset(ds, select=-classification))
trainPositive<-x
testnegative<-y
inTrain<-createDataPartition(1:nrow(trainPositive),p=0.6,list=FALSE)
trainpredictors<-trainPositive[inTrain,1:4]
trainLabels<-trainPositive[inTrain,6]
testPositive<-trainPositive[-inTrain,]
testPosNeg<-rbind(testPositive,testnegative)
testpredictors<-testPosNeg[,1:4]
testLabels<-testPosNeg[,6]
svm.model<-svm(trainpredictors,y=NULL,
type='one-classification',
nu=0.10,
scale=TRUE,
kernel="radial")
svm.predtrain<-predict(svm.model,trainpredictors)
svm.predtest<-predict(svm.model,testpredictors)
# confusionMatrixTable<-table(Predicted=svm.pred,Reference=testLabels)
# confusionMatrix(confusionMatrixTable,positive='TRUE')
confTrain <- table(Predicted=svm.predtrain,Reference=trainLabels)
confTest <- table(Predicted=svm.predtest,Reference=testLabels)
confusionMatrix(confTest,positive='TRUE')
print(confTrain)
print(confTest)
#grid
Here are some of the first lines of the dataset I'm using:
id bp sg al su rbc pc pcc ba bgr bu sc sod pot hemo pcv wc
1 0 80 1.020 1 0 normal notpresent notpresent 121 36 1.2 NA NA 15.4 44 7800
2 1 50 1.020 4 0 normal notpresent notpresent NA 18 0.8 NA NA 11.3 38 6000
3 2 80 1.010 2 3 normal normal notpresent notpresent 423 53 1.8 NA NA 9.6 31 7500
4 3 70 1.005 4 0 normal abnormal present notpresent 117 56 3.8 111 2.5 11.2 32 6700
5 4 80 1.010 2 0 normal normal notpresent notpresent 106 26 1.4 NA NA 11.6 35 7300
6 5 90 1.015 3 0 notpresent notpresent 74 25 1.1 142 3.2 12.2 39 7800
rc htn dm cad appet pe ane classification
1 5.2 yes yes no good no no ckd
2 no no no good no no ckd
3 no yes no poor no yes ckd
4 3.9 yes no no poor yes yes ckd
5 4.6 no no no good no no ckd
6 4.4 yes yes no good yes no ckd
The error log:
> confTrain <- table(Predicted = svm.predtrain, Reference = trainLabels)
Error in table(Predicted = svm.predtrain, Reference = trainLabels) :
  all arguments must have the same length
> confTest <- table(Predicted = svm.predtest, Reference = testLabels)
Error in table(Predicted = svm.predtest, Reference = testLabels) :
  all arguments must have the same length
>
> confusionMatrix(confTest, positive = 'TRUE')
Error in confusionMatrix(confTest, positive = "TRUE") :
  object 'confTest' not found
>
>
> print(confTrain)
Error in print(confTrain) : object 'confTrain' not found
> print(confTest)
Error in print(confTest) : object 'confTest' not found
I see a number of issues. First, it seems that a lot of your data is of class character rather than numeric, which the classifier requires. Let's pick some columns and convert them to numeric. I will use data.table because fread is very convenient.
Once the data is cleaned up, you can sample a training and test set with createDataPartition, as in your original code.
Then we can create the model and make the predictions.
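The cleaning, partitioning and modelling steps described above might look roughly like the sketch below. It is not the exact code for your file: the toy CSV, its invented values, the "?" placeholder for a bad entry, and the names toy, csv_path, num_cols, pos, neg and in_train are all assumptions made so the example runs on its own. With the real data you would point fread at kidney_disease.csv instead.

```r
library(data.table)  # fread
library(caret)       # createDataPartition
library(e1071)       # svm

set.seed(1)

# Toy stand-in for kidney_disease.csv: all values are invented for this sketch.
n <- 60
toy <- data.frame(
  bp = round(rnorm(n, mean = 80, sd = 10)),
  sg = round(rnorm(n, mean = 1.015, sd = 0.005), 3),
  al = sample(0:4, n, replace = TRUE),
  su = sample(0:3, n, replace = TRUE),
  classification = rep(c("ckd", "notckd"), times = c(40, 20))
)
toy$sg[c(3, 50)] <- "?"  # one stray entry is enough to make the column character

csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE, quote = FALSE)

# Read the file and force the chosen predictor columns to numeric.
ds <- fread(csv_path)
num_cols <- c("bp", "sg", "al", "su")
# as.numeric() turns unparseable strings such as "?" into NA (warning silenced here)
ds[, (num_cols) := lapply(.SD, function(v) suppressWarnings(as.numeric(v))),
   .SDcols = num_cols]
ds <- as.data.frame(ds)

# One-class setup: train on ckd rows only, test on held-out ckd plus all notckd rows.
pos <- ds[ds$classification == "ckd", ]
neg <- ds[ds$classification == "notckd", ]
in_train <- createDataPartition(seq_len(nrow(pos)), p = 0.6, list = FALSE)[, 1]
train_x <- pos[in_train, num_cols]
test    <- rbind(pos[-in_train, ], neg)
test_x  <- test[, num_cols]

model <- svm(as.matrix(train_x), y = NULL,
             type = "one-classification",
             nu = 0.10, scale = TRUE, kernel = "radial")

# Rows containing NA are dropped by predict(), so the result can be shorter than the input.
pred_train <- predict(model, as.matrix(train_x))
pred_test  <- predict(model, as.matrix(test_x))
```

Comparing length(pred_test) with nrow(test_x) at this point is a quick way to see whether any rows were dropped.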
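Your main issue with the cross table was that the model can only predict for cases that don't have any NAs, so the vector of predictions ends up shorter than the vector of labels, and table() then complains about the lengths. You have to subset the classification labels to the cases that actually have predictions. Then you can evaluate the result with confusionMatrix.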
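Here is a minimal, self-contained sketch of that subsetting step. The data are invented, and using complete.cases() to find the rows that received a prediction is my assumption about one convenient way to do it. The names x, y, train_idx, test_idx, keep and ref are illustrative only.

```r
library(caret)   # confusionMatrix
library(e1071)   # svm

set.seed(2)

# Toy data, invented for this sketch: two numeric predictors with a couple of NAs.
n_pos <- 40
n_neg <- 20
x <- data.frame(
  bp = c(rnorm(n_pos, 80, 5), rnorm(n_neg, 60, 5)),
  sg = c(rnorm(n_pos, 1.010, 0.002), rnorm(n_neg, 1.020, 0.002))
)
y <- rep(c("ckd", "notckd"), times = c(n_pos, n_neg))
x$bp[c(35, 45)] <- NA

train_idx <- 1:30    # ckd rows only
test_idx  <- 31:60   # remaining ckd rows plus all notckd rows

model <- svm(as.matrix(x[train_idx, ]), y = NULL,
             type = "one-classification",
             nu = 0.10, scale = TRUE, kernel = "radial")
pred <- predict(model, as.matrix(x[test_idx, ]))

# predict() drops rows that contain NA, so keep only the labels of complete rows.
keep <- complete.cases(x[test_idx, ])
ref  <- y[test_idx][keep] == "ckd"   # TRUE means the positive (ckd) class

# Fix the factor levels so the table is always 2 x 2, even if one outcome never occurs.
conf <- table(
  Predicted = factor(pred, levels = c(FALSE, TRUE)),
  Reference = factor(ref,  levels = c(FALSE, TRUE))
)
print(conf)
print(confusionMatrix(conf, positive = "TRUE"))
```

Without the keep step, pred would have 28 elements against 30 labels here, which reproduces the "all arguments must have the same length" error from the question.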