R treats the subset as a factor variable instead of a numeric variable


In a complex data frame I have a column with a recalled net salary, including NAs that I want to exclude, plus a column with the year the study was conducted, ranging from 1992 to 2010, roughly like this:

q32   pgssyear
2000  1992
1000  1992
NA    1992
3000  1994
etc.

If I try to draw a boxplot like:

boxplot(dataset$q32~pgssyear,data=dataset, main="Recalled Net Salary per Month (PLN)",
    xlab="Year", ylab="Net Salary") 

It seems to work; however, the NAs might distort the calculations, so I wanted to get rid of them:

boxplot(na.omit(dataset$q32)~pgssyear,data=dataset, main="Recalled Net Salary per Month (PLN)",
    xlab="Year", ylab="Net Salary") 

Then I get a message that the lengths of pgssyear and q32 do not match, most likely because I removed the NAs from q32. So I tried to shorten pgssyear so that it excludes the rows corresponding to NAs in the q32 column:

   pgssyearprim <- subset(dataset$pgssyear, dataset$q32!= NA )

However, pgssyearprim is then treated as a factor variable, and an empty one at that:

pgssyearprim
factor(0)
Levels: 1992 1993 1994 1995 1997 1999 2002 2005 2008 2010

and I get the same message if I put it into the boxplot formula...
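The empty factor(0) comes from the comparison itself: in R any comparison with NA, such as dataset$q32 != NA, evaluates to NA, and subset() keeps only the elements where the condition is TRUE, so nothing survives. The result is a factor only because pgssyear is apparently already stored as a factor (hence the Levels: line), and subsetting a factor keeps its class and levels. A minimal sketch, assuming a hypothetical data frame d built from the four sample rows above in place of the real dataset:

```r
# Hypothetical stand-in for `dataset`, built from the sample rows above;
# pgssyear is stored as a factor, as the Levels: output suggests
d <- data.frame(
  q32      = c(2000, 1000, NA, 3000),
  pgssyear = factor(c(1992, 1992, 1992, 1994))
)

d$q32 != NA        # every element is NA, never TRUE
!is.na(d$q32)      # TRUE TRUE FALSE TRUE

# Comparing with NA selects nothing: a zero-length factor, as in the question
empty <- subset(d$pgssyear, d$q32 != NA)

# is.na() gives a proper logical condition: the three non-missing rows
years <- subset(d$pgssyear, !is.na(d$q32))

# If numeric years are needed, convert via the level labels
# (as.numeric() directly on a factor would return the internal codes 1, 2, ...)
years_num <- as.numeric(as.character(years))
```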
BEST ANSWER

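Of course the lengths don't match: with na.omit(dataset$q32)~pgssyear you removed some of the data only from the left-hand side (LHS) of the formula, while pgssyear kept all of its rows. Instead, use !is.na(dataset$q32) as the subset argument of boxplot(), so that both sides of the formula are filtered together.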
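A minimal sketch of that fix, again assuming a hypothetical data frame d built from the question's sample rows in place of the real dataset. Because the condition is passed as subset = and the formula refers to columns of data, the response and the grouping variable are filtered together, so their lengths always match. plot = FALSE is used here only so the computed statistics can be inspected; drop it to draw the plot. The second call also suggests that boxplot() already ignores NA values when computing each group's statistics, so the unfiltered version should produce the same boxes.

```r
# Hypothetical stand-in for `dataset`, built from the question's sample rows
d <- data.frame(
  q32      = c(2000, 1000, NA, 3000),
  pgssyear = factor(c(1992, 1992, 1992, 1994))
)

# Filter rows inside boxplot(): q32 and pgssyear are subset together
b_sub <- boxplot(q32 ~ pgssyear, data = d, subset = !is.na(q32),
                 main = "Recalled Net Salary per Month (PLN)",
                 xlab = "Year", ylab = "Net Salary", plot = FALSE)

# Same formula without the subset, for comparison
b_all <- boxplot(q32 ~ pgssyear, data = d, plot = FALSE)

b_sub$n       # non-missing observations per year
b_sub$stats   # five-number summaries, one column per year
```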