I am using population-level American Community Survey data to look at factors that affect income from self-employment, with a primary interest in the female population. I want to create a variable to measure "husband's income." There is a variable pincp that measures a person's total income, and dummy variables I created for married and female. All households are linked by a unique identifier serialno. I am using Stata.
Universe: population age 18 and older whose primary job is self-employment. They must have earned at least $1,000 from self-employment in the past year and be under the 95th percentile for self-employed earnings.
Assuming that a married male in a household represents a husband**,
gen husb_income = pincp if female==0 & married==1
How do I copy the value of husb_income for other observations with the same serialno? If there is an (employed) married man in a household, I want husb_income to reflect his income for all observations pertaining to that household.
** I know that this is a gratuitous assumption; I'm not concerned with that right now.
Keep only the married-male cases and drop all variables except serialno and pincp. Rename pincp to husb_income, then save the result as a separate data set.
Now open the original data set and use the merge command to merge the husband data back, matching on serialno. If you have already created husb_income in the original data, drop it first so the merged values are not blocked by the existing variable.
Also, you may have more than one married male in the same household. If that happens, a merge on serialno alone will not work, because serialno no longer identifies a single husband record and the merge becomes many-to-many. In that case, you'd have to generate an extra couple indicator and put it in the merge statement as an identifier right next to serialno.
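The steps described above can be sketched roughly as follows. This is only a sketch: the file name acs_selfemp.dta is a placeholder I made up for the original data, and I use a tempfile instead of a permanent separate data set. The isid line is an added safeguard, not part of the original suggestion; it stops with an error if any household has more than one married man.

```stata
* Hypothetical file name; replace with your own data set
use "acs_selfemp.dta", clear

* Drop husb_income if it was already generated, so the merged values are used
capture drop husb_income

* Step 1: build a husbands-only data set with one row per household
preserve
keep if female == 0 & married == 1
keep serialno pincp
rename pincp husb_income

* Errors out here if serialno does not uniquely identify a married man
isid serialno

tempfile husbands
save `husbands'
restore

* Step 2: copy the husband's income to every person in the household
merge m:1 serialno using `husbands', keep(master match)

* _merge == 1 marks people in households with no married man
tab _merge
```

If isid fails, that is the multiple-husband situation: you would need a couple indicator present in both data sets (say, a variable you construct and call couple_id, which is hypothetical here) and then merge on serialno couple_id instead of serialno alone.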