I'm struggling to find out how to create a flag. My data set looks like this:
CPT PRODUCT DATE A B C D etc.
1 A date1 . . . .
1 A date2 . . . .
1 C date2 . . . .
1 B date3 . . . .
1 B date3 . . . .
2 A date3 . . . .
2 B date1 . . . .
2 B date1 . . . .
2 B date2 . . . .
2 C date2 . . . .
etc.
where cpt(i) represents each counterparty, product(i) represents the product bought by each cpt(i), date(i) represents the purchase date and A, B, C, D, etc. are other categorical/numerical variables of the data set. I'd like to create a flag every time if and only if there are two or more rows having the same cpt(i), the same product(i) and the same date(i). Thus, the other columns should not be considered. What I'd like to get is shown below:
CPT PRODUCT DATE A B C D flag
1 A date1 . . . . 0
1 A date2 . . . . 0
1 C date2 . . . . 0
1 B date3 . . . . 1
1 B date3 . . . . 1
2 A date3 . . . . 0
2 B date1 . . . . 1
2 B date1 . . . . 1
2 B date2 . . . . 0
2 C date2 . . . . 0
etc.
Any tips on how to get what I want? Cheers
Use a
BYstatement. The statement will implicitly create automatic flag variablesfirst.<byvar>andlast.<byvar>. When both variables are=1the implied condition is that the by group contains a single row. You are looking to flag the inverse condition (when a by group contains more than one row).