foreach and local commands


I am trying to run a foreach loop over a list stored in a local macro. However, I get a "local not found" error. I want to keep only the observations whose postal code is one of the following.

I am following the local example in the Stata help file and my variable postal code is in string format. Where am I going wrong?

Here is the code:

local postal_cleveland "28086 28150 28152 28021 28090 28114 28073 28020 28017 28151 28169 28089 28038 28042 28136" 

foreach x in local postal_cleveland {

keep if strpos(postal_code, `x')

} 
local not found
r(111);


Answer by Nick Cox (best answer)

I don't know which help file you're alluding to, but there are several errors here, and you have been thrown out by the first. The loop should start

foreach x of local postal_cleveland { 

The keywords of and in are not interchangeable.

What you typed with in is legal, as any list can follow in. But that list is then the two words local and postal_cleveland, so on the first iteration x is local. Inside the loop, Stata can only interpret local as the name of a string variable or scalar, and you have none such: hence the error message.
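To see the difference between the two keywords, here is a minimal sketch that only displays what each loop iterates over. The shortened three-code list is just for illustration, and the lines should be run together (for example from a do-file), since a local macro is only visible within the run that defines it.

```stata
* Shortened list, for illustration only
local postal_cleveland "28086 28150 28152"

* "in" takes the following words literally:
* the list is the two words local and postal_cleveland
foreach x in local postal_cleveland {
    display "`x'"
}

* "of local" expands the macro:
* the list is 28086 28150 28152
foreach x of local postal_cleveland {
    display "`x'"
}
```

The first loop displays local and then postal_cleveland; the second displays the three postal codes.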

Another error has not yet bitten you, but should be fixed now. When you are looking for a literal string inside a string variable, you must use " " to delimit the string; otherwise, as just seen, Stata can only guess that you are giving it the name of a string variable or scalar. So, you need

keep if strpos(postal_code, "`x'")

Yet another error has not yet bitten you, but should be fixed now. As soon as you go

keep if strpos(postal_code, "28086")

Stata will follow your instructions and so observations with all the other postal codes will disappear. Then Stata will keep on going and on next seeing

keep if strpos(postal_code, "28150")

it again will follow your instructions. But there are no such observations: you dropped them by implication on the previous iteration. So, now you have no data whatsoever and cannot then do anything useful.
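A small sketch of that effect on a made-up three-observation dataset (the data are hypothetical and exist only to show the counts):

```stata
* Toy data: two codes from the list plus one unrelated code
clear
input str5 postal_code
"28086"
"28150"
"99999"
end

* First keep: only the 28086 observation survives
keep if strpos(postal_code, "28086")
count

* Second keep: no remaining observation contains 28150
keep if strpos(postal_code, "28150")
count
```

The first count reports 1 observation and the second reports 0, which is the empty dataset described above.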

You could avoid the loop altogether and just go

keep if strpos(postal_code, "28086") | strpos(postal_code, "28150") | ... 

where the ... imply other calls with different codes. That is just my shorthand because I won't type them out here; it is not Stata syntax!

Or you could use a loop and build up to one (and only one) keep statement.

That could be

gen tokeep = 0 

foreach x of local postal_cleveland { 
    replace tokeep = 1 if strpos(postal_code, "`x'")
} 

keep if tokeep 

which I recommend as better style.

Otherwise put, keep acts instantly: it keeps observations satisfying the criteria specified and thus drops all others.

Detail: string here means string storage or variable type; string as a (display) format by coincidence only implies the same variables.

Answer by SultanOrazbayev

The comment below will apply only in a very special case when the data can contain zip codes with the additional 4 digits (zip+4) and these additional digits are not separated from the main zip code. This is unlikely to be observed, but possible if for example the data was parsed from a website and the dash symbol was ignored. As such, the comment below should be safe to ignore, but just posting this to elaborate on the thought process.

The idea is that the strpos(s1,s2) function returns a non-zero value if s2 appears anywhere within s1. If the variable containing zip codes has zip+4 without dashes (again, this is a very specific scenario, unlikely to be of general interest), then the function could return false matches.

For example:

Zip code 34265 is a location in Florida. Zip code 60103 is a location in Illinois. A more precise zip inside Illinois is 60103-4265 (it's a post office there). If somehow this was recorded as 601034265, then using strpos would indicate a match.

In general, to avoid similar issues with partial matches, for fixed-width data it makes sense to do a literal comparison using == rather than strpos.
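As a rough sketch of that suggestion, the first two lines below reproduce the false match from the example, and the loop then reuses the flag-then-keep pattern from the accepted answer with == in place of strpos. It assumes a dataset is in memory with a string variable postal_code holding plain five-character codes; tokeep is just an illustrative variable name.

```stata
* Substring search: 34265 is found inside the undashed zip+4 value
* (displays 5, the position where the match starts)
display strpos("601034265", "34265")

* Exact comparison: the two strings are not equal (displays 0)
display "601034265" == "34265"

* Flag exact matches only, then keep once
local postal_cleveland "28086 28150 28152 28021 28090 28114 28073 28020 28017 28151 28169 28089 28038 28042 28136"

gen byte tokeep = 0

foreach x of local postal_cleveland {
    replace tokeep = 1 if postal_code == "`x'"
}

keep if tokeep
drop tokeep
```

If the variable really did hold undashed zip+4 values, one option would be to compare substr(postal_code, 1, 5) with each code instead, though whether that is appropriate depends on how the data were recorded.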