My dataset looks somewhat like this:
ID job01 age_started_job01 job02 age_started_job02 job03 age_started_job03
1 "waiter" 18 "lawyer" 25 NA NA
2 "plumber" 18 "builder" 20 "foreman" 25
I'm trying to get to a format where I have this (note that I assume that people stayed in the same job as the previous year if they didn't start a new one that year):
ID job_age18 job19 job20 job21 job22 job23 job24 job25
1 "waiter" "waiter" "waiter" "waiter" "waiter" "waiter" "waiter" "lawyer"
2 "plumber" "plumber" "builder" "builder" "builder" "builder" "builder" "foreman"
And then convert it to long like this:
ID job age
1 "waiter" 18
1 "waiter" 19
1 "waiter" 20
1 "waiter" 21
1 "waiter" 22
1 "waiter" 23
1 "waiter" 24
1 "lawyer" 25
2 "plumber" 18
2 "plumber" 19
2 "builder" 20
2 "builder" 21
2 "builder" 22
2 "builder" 23
2 "builder" 24
2 "foreman" 25
The step from wide to long can be done with reshape2
and I know how to do it, but I cannot get from the first dataset to the intermediate format. I have tried something like this (ugly and with loops):
#create job variables
for (age in 18:25 ){
data[,paste("job_age",age, sep="")] <- NA
}
#populate each variable looping through job spells and ages
for (spell in c("01","02","03")){
for (age in 18:25){
#Place of employment
data[,paste("job_age",age, sep="")] <- ifelse(
data[,paste("age_started_job,spell,sep="")]==age &
!is.na(data[,paste("age_started_job",spell,sep="")]),
data[,paste("job",spell,sep="")],
data[,paste("job_age",age, sep="")])
}
}
Besides looking really ugly with loops and the like, it gives me an error saying that I provided 2 variables to replace 1 variables
.
Is there a good, elegant way to do it, perhaps with reshape2
and plyr
?
You may try
You could also try
data