I am using population-level American Community Survey data to look at factors that affect income from self-employment, with a primary interest in the female population. I want to create a variable to measure "husband's income." There is a variable pincp that measures a person's total income, and dummy variables I created for married and female. All households are linked by a unique identifier serialno. I am using Stata.
Universe: population age 18 and older whose primary job is self-employment. Respondents must have earned at least $1,000 from self-employment in the past year and fall under the 95th percentile for self-employed earnings.
Assuming that a married male in a household represents a husband**,
gen husb_income = pincp if female==0 & married==1
How do I copy the value of husb_income for other observations with the same serialno? If there is an (employed) married man in a household, I want husb_income to reflect his income for all observations pertaining to that household.
** I know that this is a gratuitous assumption; I'm not concerned with that right now.
Keep only the cases for married males and drop all variables except serialno and pincp. Rename pincp to husb_income and save the result as a separate data set.
Now open the original data set and use the merge command to merge the husband data back in on serialno. If you have already created husb_income in the original data, drop it first, because merge keeps the master data's values for any variable that exists in both data sets.
Also, a household may contain more than one married male. If that happens, a merge on serialno alone will not work, because serialno no longer uniquely identifies a husband and the merge would become many-to-many. In that case, you would have to generate an extra couple indicator and include it in the merge statement as an identifier right next to serialno.
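The steps above could look roughly like the following. This is a minimal sketch rather than a tested solution for the real ACS file: the toy data entered with input, its income values, and the tempfile names full and husbands are made up for illustration, and it assumes at most one married male per household.

```stata
* Toy data standing in for the ACS extract (all values are invented)
clear
input long serialno byte female byte married long pincp
1 0 1 50000
1 1 1 30000
2 1 0 20000
3 0 1 70000
3 1 1 15000
3 0 0 5000
end
tempfile full
save `full'

* Step 1: keep married males and only the household id and income
keep if female == 0 & married == 1
keep serialno pincp
rename pincp husb_income

* merge m:1 needs serialno to be unique in this file;
* isid stops with an error if any household has two or more married males
isid serialno
tempfile husbands
save `husbands'

* Step 2: reopen the full data and attach the husband's income to every
* person in the same household (households without one get missing)
use `full', clear
merge m:1 serialno using `husbands', nogenerate

list serialno female married pincp husb_income, sepby(serialno)
```

If isid fails because some households have several married males, the same pattern should carry over once a couple indicator exists in both files. With a hypothetical variable named couple_id, the key lists would become isid serialno couple_id and merge m:1 serialno couple_id using `husbands'.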