I am trying to impute missing values using the mi package in R and ran into a problem.
When I load the data into R, it recognizes the column with missing values as a factor variable. If I convert it into a numeric variable with the command
dataset$Income <- as.numeric(dataset$Income)
it converts the column to ordinal values (the smallest value becomes 1, the second smallest 2, etc.).
I want to convert this column to numeric values, while retaining the original values of the variable. How can I do this?
EDIT: Since people have asked, here is my code and an example of what the data looks like.
DATA:
96 GERMANY 6 1960 72480 73 50.24712 NA 0.83034767 0
97 GERMANY 6 1961 73123 85 48.68375 NA 0.79377610 0
98 GERMANY 6 1962 73739 98 48.01359 NA 0.70904115 0
99 GERMANY 6 1963 74340 132 46.93588 NA 0.68753213 0
100 GERMANY 6 1964 74954 146 47.89413 NA 0.67055298 0
101 GERMANY 6 1965 75638 160 47.51518 NA 0.64411484 0
102 GERMANY 6 1966 76206 172 48.46009 NA 0.58274711 0
103 GERMANY 6 1967 76368 183 48.18423 NA 0.57696055 0
104 GERMANY 6 1968 76584 194 48.87967 NA 0.64516949 0
105 GERMANY 6 1969 77143 210 49.36219 NA 0.55475352 0
106 GERMANY 6 1970 77783 227 49.52712 3,951.00 0.53083969 0
107 GERMANY 6 1971 78354 242 51.01421 4,282.00 0.51080717 0
108 GERMANY 6 1972 78717 254 51.02941 4,655.00 0.48773913 0
109 GERMANY 6 1973 78950 264 50.61033 5,110.00 0.48390087 0
110 GERMANY 6 1974 78966 270 48.82353 5,561.00 0.56562229 0
111 GERMANY 6 1975 78682 284 50.50279 6,092.00 0.56846030 0
112 GERMANY 6 1976 78298 301 49.22833 6,771.00 0.53536154 0
113 GERMANY 6 1977 78160 321 49.18999 7,479.00 0.55012371 0
Code:
Income <- dataset$Income
gives me a factor variable, as there are NAs in the data. If I try to turn it into numeric with
as.numeric(Income)
it throws away the original values and replaces them with the rank of the column. I would like to keep the original values, while still recognizing missing values.
A problem every data manager from Germany knows: the column with the NAs contains numbers with commas. But R only knows the English style of decimal points without digit grouping, so this column is read as text and treated as an ordinally scaled character variable (a factor). Try removing the commas and you'll get the numeric values.
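Removing the commas could look roughly like the sketch below. The vector `income` is a made-up stand-in for `dataset$Income`, built from a few values in the question, and it assumes the commas are thousands separators. The factor is converted to character first because `as.numeric()` on a factor returns the internal level codes rather than the displayed values.

```r
# Hypothetical stand-in for dataset$Income: a factor with thousands separators
income <- factor(c(NA, "3,951.00", "4,282.00", "4,655.00"))

# as.numeric() on a factor returns the level codes, not the original values
codes <- as.numeric(income)

# Convert to character, strip the commas, then convert to numeric;
# missing values stay NA throughout
income_num <- as.numeric(gsub(",", "", as.character(income), fixed = TRUE))

print(codes)       # level codes
print(income_num)  # original values with NA preserved
```

If this behaves as expected on the real data, the same expression can be assigned back with `dataset$Income <- ...` before passing the data to mi.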
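By the way, even though we write decimal commas in Germany, numbers like 3,951.00 don't make sense in German number syntax, where the same value would be written 3.951,00. Here the comma is an English-style thousands separator and the point is the decimal mark. See these examples of international number syntax.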