Loading data with missing values as numeric data

I am trying to impute missing values using the mi package in R and ran into a problem.

When I load the data into R, it recognizes the column with missing values as a factor variable. If I convert it into a numeric variable with the command

dataset$Income <- as.numeric(dataset$Income)

it converts the column to ordinal values (with the smallest value being 1, the second smallest 2, etc.).

I want to convert this column to numeric values, while retaining the original values of the variable. How can I do this?

EDIT: Since people have asked, here is my code and an example of what the data looks like.

DATA:

96  GERMANY 6   1960    72480   73  50.24712    NA  0.83034767  0
97  GERMANY 6   1961    73123   85  48.68375    NA  0.79377610  0
98  GERMANY 6   1962    73739   98  48.01359    NA  0.70904115  0
99  GERMANY 6   1963    74340   132 46.93588    NA  0.68753213  0
100 GERMANY 6   1964    74954   146 47.89413    NA  0.67055298  0
101 GERMANY 6   1965    75638   160 47.51518    NA  0.64411484  0
102 GERMANY 6   1966    76206   172 48.46009    NA  0.58274711  0
103 GERMANY 6   1967    76368   183 48.18423    NA  0.57696055  0
104 GERMANY 6   1968    76584   194 48.87967    NA  0.64516949  0
105 GERMANY 6   1969    77143   210 49.36219    NA  0.55475352  0
106 GERMANY 6   1970    77783   227 49.52712    3,951.00    0.53083969  0
107 GERMANY 6   1971    78354   242 51.01421    4,282.00    0.51080717  0
108 GERMANY 6   1972    78717   254 51.02941    4,655.00    0.48773913  0
109 GERMANY 6   1973    78950   264 50.61033    5,110.00    0.48390087  0
110 GERMANY 6   1974    78966   270 48.82353    5,561.00    0.56562229  0
111 GERMANY 6   1975    78682   284 50.50279    6,092.00    0.56846030  0
112 GERMANY 6   1976    78298   301 49.22833    6,771.00    0.53536154  0
113 GERMANY 6   1977    78160   321 49.18999    7,479.00    0.55012371  0

Code:

Income <- dataset$Income

gives me a factor variable, as there are NAs in the data. If I try to turn it into numeric with

as.numeric(Income)

it throws away the original values and replaces them with their rank within the column. I would like to keep the original values, while still recognizing missing values.

There is 1 best solution below

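A problem every data manager from Germany knows: the column with the NAs contains numbers with commas. R only understands the English style of decimal points without digit grouping, so it cannot parse these values as numbers. The column is read as text and stored as a factor, and calling as.numeric() on a factor returns its internal level codes rather than the original values.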
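The behaviour described in the question can be reproduced with a few comma-grouped values taken from the sample data. This is only a sketch: `income_raw` and `income_factor` are made-up names standing in for `dataset$Income`, since the call used to read the file is not shown.

```r
# Hypothetical stand-in for the Income column: text values with
# comma digit grouping, plus a missing value.
income_raw <- c(NA, "3,951.00", "4,282.00", "4,655.00")

# Stored as a factor, as described in the question.
income_factor <- factor(income_raw)

# as.numeric() on a factor returns the integer level codes,
# not the numbers that the labels spell out.
codes <- as.numeric(income_factor)
print(codes)

# Converting the text directly does not work either: R cannot parse
# "3,951.00", so every non-missing entry becomes NA (with a warning).
direct <- suppressWarnings(as.numeric(income_raw))
print(direct)
```

In both cases the original values are lost, which matches the symptom in the question.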

Try removing the commas and you'll get the numeric values.
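Removing the separators could look roughly like this. It is a minimal sketch, assuming the column arrived as a factor as in the question. The vector `income` is a made-up stand-in for `dataset$Income`; `as.character()`, `gsub()` and `as.numeric()` are base R functions.

```r
# Hypothetical stand-in for dataset$Income as R loaded it.
income <- factor(c(NA, "3,951.00", "4,282.00", "7,479.00"))

# 1. Go back to the text labels instead of the level codes.
income_chr <- as.character(income)

# 2. Strip the digit-grouping commas; fixed = TRUE matches a literal comma.
income_clean <- gsub(",", "", income_chr, fixed = TRUE)

# 3. Convert to numeric. Missing entries stay NA.
income_num <- as.numeric(income_clean)
print(income_num)
```

Applied to the real data, the same three steps would be run on `dataset$Income` and the result assigned back to that column. The missing values remain `NA`, so they are still available for imputation.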

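By the way, even though we write decimal commas in Germany, a number like 3,951.00 does not make sense in German notation, where it would be written 3.951,00. It follows the English convention instead: the comma groups the thousands and the point marks the decimals. See these examples of international number syntax.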