Getting "invalid type character" error with daisy


I have a data frame with mixed data types (integer, character, and logical) which I'm trying to cluster with daisy.

I'm using:

gower_dist <- daisy(relchoice, metric = "gower")

and getting:

Error in daisy(relchoice, metric = "gower") : 
  invalid type character for column numbers 3, 4, 5, 7, 8, 10, 13, 14, 15, 16, 21, 29, 31, 32

Would love some help with this.


There are 2 best solutions below


A quick way to fix several problematic columns at once is to create the data frame with stringsAsFactors = TRUE, so that character columns are stored as factors:

relchoice <- data.frame(..., stringsAsFactors = TRUE)
gower_dist <- daisy(relchoice, metric = "gower")

The default of data.frame()'s stringsAsFactors argument changed from TRUE to FALSE in R 4.0.0, so it now has to be set explicitly.
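
To see what the argument changes, it can help to compare column classes with and without it. This is only a small sketch; the data frame and its columns (job, age) are made up for illustration and are not the asker's data:

```r
# Hypothetical two-row data frame; 'job' and 'age' are made-up columns
with_default <- data.frame(job = c("admin", "tech"), age = c(25L, 40L))
with_factors <- data.frame(job = c("admin", "tech"), age = c(25L, 40L),
                           stringsAsFactors = TRUE)

# Inspect the class of each column
sapply(with_default, class)  # on R >= 4.0.0, 'job' is "character"
sapply(with_factors, class)  # 'job' is "factor", 'age' stays "integer"
```

Running sapply(relchoice, class) on the real data should show the same pattern for the columns listed in the error message.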


I was able to fix this problem by converting categorical fields to a factor datatype, for example:

df$job <- as.factor(df$job)
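
If the data frame already exists (for example, it was read from a file) and has many character columns, the same conversion can probably be applied to all of them in one step. A minimal sketch, assuming a made-up data frame df with hypothetical columns age, income, and job:

```r
library(cluster)  # provides daisy()

# Hypothetical stand-in data with integer, numeric, and character columns
df <- data.frame(
  age    = c(25L, 40L, 31L, 58L),
  income = c(32.5, 51.0, 44.2, 28.9),
  job    = c("admin", "tech", "admin", "retired"),
  stringsAsFactors = FALSE
)

# Find the character columns and convert only those to factors
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], as.factor)

# With no character columns left, daisy() accepts the data frame
gower_dist <- daisy(df, metric = "gower")
```

Columns that are not character are left untouched, so numeric variables keep being treated as numeric in the Gower distance.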