Aggregate function only occasionally working in R


I am analysing two datasets that are split by country (the same countries in both, just different numbers), but three of the countries have missing data. I am using the aggregate() function to compute dummy values so that I can do my analysis without NAs popping up. However, for some reason it does not work when I merge the new values back into the original data.

If I clear my workspace and run it again it sometimes works, but only for one or two of the countries, or for one of the two datasets. I can't understand why it works one time but not another when I'm not changing the code at all. Any help would be greatly appreciated.

mil<-read.csv("C:/Data_millions.csv",header=TRUE)
per<-read.csv("C:/Data_percent.csv",header=TRUE)

##Fill in blanks for ZA
#Show the mean for each category of age/age-gender (printed only, not stored)
aggregate(data=mil,ZA~TypeOfPerson,mean,na.rm=TRUE)
#Compute the group mean for every row and fill it into the original data where ZA is NA
ave_ZA<-ave(mil$ZA,mil$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
mil$ZA<-ifelse(is.na(mil$ZA),ave_ZA,mil$ZA)

aggregate(data=per,ZA~TypeOfPerson,mean,na.rm=TRUE)
ave_ZA_per<-ave(per$ZA,per$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
per$ZA<-ifelse(is.na(per$ZA),ave_ZA_per,per$ZA)

##Fill in blanks for BEWA
aggregate(data=mil,BEWA~TypeOfPerson,mean,na.rm=TRUE)
ave_BEWA<-ave(mil$BEWA,mil$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
mil$BEWA<-ifelse(is.na(mil$BEWA),ave_BEWA,mil$BEWA)

aggregate(data=per,BEWA~TypeOfPerson,mean,na.rm=TRUE)
ave_BEWA_per<-ave(per$BEWA,per$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
per$BEWA<-ifelse(is.na(per$BEWA),ave_BEWA_per,per$BEWA)

##Fill in blanks for GR
aggregate(data=mil,GR~TypeOfPerson,mean,na.rm=TRUE)
ave_GR<-ave(mil$GR,mil$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
mil$GR<-ifelse(is.na(mil$GR),ave_GR,mil$GR)

aggregate(data=per,GR~TypeOfPerson,mean,na.rm=TRUE)
ave_GR_per<-ave(per$GR,per$TypeOfPerson,FUN=function(x)mean(x,na.rm=TRUE))
per$GR<-ifelse(is.na(per$GR),ave_GR_per,per$GR)
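
The same fill-in step is repeated above for each country and each dataset, which makes it easy to mix up variable names. A minimal sketch of one way to write it once and loop over the columns follows. The helper name fill_group_mean() and the toy data are assumptions for illustration only; they are not part of the original script or the real CSV files.

```r
# Hypothetical helper: replace NAs in each column of `cols` with the mean
# of the non-missing values in the same group.
fill_group_mean <- function(df, cols, group) {
  for (col in cols) {
    # ave() returns one group mean per row, aligned with the original rows
    grp_mean <- ave(df[[col]], df[[group]],
                    FUN = function(x) mean(x, na.rm = TRUE))
    missing <- is.na(df[[col]])
    df[[col]][missing] <- grp_mean[missing]
  }
  df
}

# Toy data standing in for Data_millions.csv (values are made up)
toy <- data.frame(
  TypeOfPerson = c("A", "A", "B", "B"),
  ZA   = c(1, NA, 3, 5),
  BEWA = c(NA, 2, 4, NA),
  GR   = c(10, 20, NA, NA)
)

filled <- fill_group_mean(toy, c("ZA", "BEWA", "GR"), "TypeOfPerson")
print(filled)
```

In this toy example GR has no observed values at all in group "B", so its group mean is NaN and those two rows are still reported as missing by is.na(). With the real data the call would presumably be something like mil <- fill_group_mean(mil, c("ZA","BEWA","GR"), "TypeOfPerson"), and the same for per.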

Update: some example data and where it has not worked

Here is where there are still NAs: https://www.dropbox.com/s/bd9c9mjttdehbrt/missing.jpg?dl=0

Here is a link to my data: https://www.dropbox.com/s/vsiq9nr6ic3odmv/Data_millions.csv?dl=0
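
One thing that might be worth checking, as a guess rather than a confirmed cause: if a TypeOfPerson category has no observed values at all for a country, mean(x, na.rm=TRUE) returns NaN, and the filled-in cells still count as missing. A small sketch of how to list such categories, using made-up toy data in place of the real file:

```r
# Toy data (made up): category "B" has no observed GR values
toy <- data.frame(
  TypeOfPerson = c("A", "A", "B", "B"),
  GR = c(10, 20, NA, NA)
)

# The mean of an all-NA group is NaN, which is.na() still treats as missing
empty_mean <- mean(c(NA, NA), na.rm = TRUE)

# For each category, check whether every GR value is missing
all_missing <- tapply(toy$GR, toy$TypeOfPerson, function(x) all(is.na(x)))
empty_groups <- names(all_missing)[as.vector(all_missing)]
print(empty_groups)
```

Any category listed in empty_groups cannot be filled from its own group mean and would need a different dummy value.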
