I have a survey dataset that contains household ids and, within each household, individual ids: individual 1 is the interviewee him/herself. One variable records each individual's relationship to the interviewee (for example, 2 for spouse, 3 for parents and so on). The data structure is like the following:
???
Now what I want to do is detect the occurrence of certain values in `var1` and, if they occur, whether the values of `var1` and `var2` satisfy a certain condition.
For example, if `var1` and `var2` satisfy `(var1 == 3 & var2 == 1) | (var1 == 4 & var2 == 1)`, then I want to attach the value 1 to a newly generated variable, say `var3`, for each individual in the same group (household in this case, to represent family structure), and 0 otherwise.
This doesn't seem like a big problem, and I suppose I should use some `by group: egen` or `by group: gen` command, but I'm not sure. I have tried commands like
gen l_w_p = 0
by hhid: replace l_w_p = 1 if (var1 == 3 & a2004 == 1) | (var2 == 4 & a2004 == 1)
by hhid: replace l_w_p = 2 if (var1 == 3 & a2004 == 2) & (var2 == 4 & a2004 == 2)
but it doesn't seem to work. Do I need some kind of loop?
@Dimitriy V. Masterov provided a good specific answer, but there is scope to address the question more generally.
As his answer shows, whether any member of a group satisfies some condition can be determined by applying `egen`'s `max()` function over groups to a true-or-false expression yielding 0 or 1, namely an indicator (or, in a poor terminology popular in some fields, a dummy). A little thought shows that whether all members of a group satisfy some condition can be determined by applying `egen`'s `min()` function over groups to a true-or-false expression yielding 0 or 1, etc. The whole story is fleshed out in an FAQ, "How do I create a variable recording whether any members of a group (or all members of a group) possess some characteristic?" (so a meta-lesson is to make use of the resources available to you).
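A minimal sketch of both ideas on a small made-up dataset. The names `hhid`, `pid` and `rel` are hypothetical: `rel` stands in for the relationship variable (`var1` in the question) and `a2004` for the second variable. The "any" condition is the question's `(var1 == 3 & var2 == 1) | (var1 == 4 & var2 == 1)`, rewritten equivalently.

```stata
* Hypothetical toy data: hhid = household, pid = person within household,
* rel = relationship to interviewee (1 = self, 2 = spouse, 3 = parent, ...)
clear
input hhid pid rel a2004
1 1 1 2
1 2 2 2
1 3 3 1
2 1 1 2
2 2 4 2
3 1 1 1
3 2 3 1
3 3 4 1
end

* ANY member satisfies the condition: the max of a 0/1 expression is 1
* if at least one member has 1, and it is copied to every member
bysort hhid: egen any_parent_a2004_1 = max((rel == 3 | rel == 4) & a2004 == 1)

* ALL members satisfy the condition: the min of a 0/1 expression is 1
* only if every member has 1
bysort hhid: egen all_a2004_1 = min(a2004 == 1)

list, sepby(hhid)
```

Because the result is constant within each household, every member gets the household-level flag, which is what `var3` was meant to be. Note that in Stata a missing value makes `a2004 == 1` false (0), not missing, so check how missings should be treated before relying on the indicator.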
One step away are problems about the other members of a group, also discussed in an FAQ, "How do I create variables summarizing for each individual properties of the other members of a group?"
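One common approach to the "other members" problem, sketched here with a hypothetical 0/1 variable `employed` on made-up data, is to compute the group total and subtract each individual's own value.

```stata
* Hypothetical toy data: employed is a 0/1 indicator per person
clear
input hhid pid employed
1 1 1
1 2 0
1 3 0
2 1 0
2 2 0
3 1 1
3 2 1
end

* Number of employed members in the household, including oneself
bysort hhid: egen n_employed = total(employed)

* Remove one's own contribution to get the count among the OTHER members
gen n_other_employed = n_employed - employed

* Indicator: is at least one other member employed?
gen any_other_employed = n_other_employed > 0

list, sepby(hhid)
```

One caveat of this sketch: `total()` treats missing values as 0, but `n_employed - employed` is missing for anyone whose own `employed` is missing, and `missing > 0` is true in Stata, so such people would need an explicit `if !missing(employed)` guard.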
For fuller discussions that may be useful, see this article and this article.
Two further comments:
a. In code like yours (the `by hhid: replace ...` lines in the question), the `by:` prefix makes no difference to what is done. The code still works at the individual level, and the prefix doesn't spread the operation to the group. That is why it "doesn't work", normally a fairly useless error report.

b. Mild abstraction is useful in explaining problems, but abstraction in naming variables just makes your code more difficult to read. I wouldn't use variable names such as `var1`, `var2`, etc., which just impose a burden of remembering what is what. Use evocative names such as `any_unemployed` or `any_married` or whatever. This is more than personal style: when you are asking others to think about your code (as here), being able to read it easily is a great help.
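To see comment a. in action, here is a sketch on made-up data contrasting the two approaches; the variable names are hypothetical, chosen to be evocative in the spirit of comment b.

```stata
* Hypothetical toy data: rel = relationship to interviewee
clear
input hhid pid rel a2004
1 1 1 2
1 2 3 1
2 1 1 2
2 2 2 2
end

* by: does not spread the result to the household: replace still
* evaluates the if condition observation by observation
gen flag_rowwise = 0
bysort hhid: replace flag_rowwise = 1 if inlist(rel, 3, 4) & a2004 == 1

* A group-level summary such as max() gives every member the flag
bysort hhid: egen any_parent_a2004_1 = max(inlist(rel, 3, 4) & a2004 == 1)

list, sepby(hhid)
```

In household 1 only the parent's row has `flag_rowwise == 1`, while `any_parent_a2004_1` is 1 for both members; no loop is needed.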