Discretization of zip codes to US regions in R

Question

Discretization of zip codes to US regions in R

232 Views Asked by Michal At 17 August 2025 at 19:51

I would like to discretize data with zip codes into regions

I have character data

sample:

zip_code
'45654'
'12321'
'99453'

etc

I have 6 categories with rules:

region 1 - NE: 01000-19999

region 2 - SE: 20000-39999

region 3 - MW: 40000-58999,60000-69999

region 4 - SW: 70000-79999,85000-88499

region 5 - MT: 59000-59999,80000-84999,88900-89999

region 6 - PC: 90000-99999

I would like my output to be factor data:

region
'MW'
'NE'
'PC'

etc

Obviously, I know many ways to discretize the data, but none are clean and elegant (like loops, ifelse, etc)

Is there an elegant way to apply a case with 6 categories to discretize this data?

Original Q&A

There are 3 best solutions below

akrun On 12 November 2014 at 03:27

You could also try (Using @Scott Chamberlain's data)

  with(stack(regions), unique(ind[ave(values %in% zip_code, ind, FUN=I)]))
 #[1] NE SE MT
 #Levels: MT MW NE PC SE SW

Or

 library(dplyr)
 library(tidyr)
 unnest(regions, region) %>%
                     group_by(region) %>%
                     filter(x %in% zip_code)

 # region x
 #1   NE  1
 #2   SE  6
 #3   MT 15

Or

 r1 <- vapply(regions, function(x) any(x %in% zip_code), logical(1))
 names(r1)[r1]
 #[1] "NE" "SE" "MT"

jlhoward On 12 November 2014 at 06:14

Here is a data.table solution using foverlaps(...) and the full US zip code database in package zipcode for the example. Note that your definitions of the ranges are deficient: for instance there are zip codes in NH that are outside the NE range, and PR is completely missing.

library(data.table)  # 1.9.4+
library(zipcode)
data(zipcode)        # database of US zip codes (a data frame)

zips    <- data.table(zip_code=zipcode$zip)
regions <- data.table(region=c("NE" , "SE", "MW", "MW", "SW", "SW", "MT", "MT", "MT", "PC"),
                      start =c(01000,20000,40000,60000,70000,85000,59000,80000,88900,90000),
                      end   =c(19999,39999,58999,69999,79999,88400,59999,84999,89999,99999))
setkey(regions,start,end)
zips[,c("start","end"):=list(as.integer(zip_code),as.integer(zip_code))]
result <- foverlaps(zips,regions)[,list(zip_code,region)]
result[sample(1:nrow(result),10)]   # random sample of the result
#    zip_code region
#  1:    27113     SE
#  2:    36101     SE
#  3:    55554     MW
#  4:    91801     PC
#  5:    20599     SE
#  6:    90250     PC
#  7:    95329     PC
#  8:    63435     MW
#  9:    60803     MW
# 10:    07040     NE

foverlaps(...) works this way: suppose a data.table x has columns a and b that represent a range (e.g., a <= b for all rows), and a data.table y has columns c and d that similarly represent a range. Then foverlaps(x,y) finds, for each row in x, all the rows in y which have overlapping ranges.

In your case we set up the y argument as the regions, where the ranges are the beginning and ending zipcodes for each (sub) region. Then we set up x as the original zip code database using the actual zip codes (converted to integer) for both the beginning and end of the range.

foverlaps(...) is extremely fast. In this case the full US zip code database (>44,000 zipcodes) was processed in about 23 milliseconds.

**sckott** · Accepted Answer

Okay, messy but this can work. I assume you'll have to use character objects since some zip codes start with 0. Obs. replace these numbers with your zip codes.

zip_code <- c('1','6','15')
regions <- list(NE = as.character(1:3), 
        SE = as.character(4:6), 
        MW = as.character(7:9), 
        SW = as.character(10:12), 
        MT = as.character(13:15), 
        PC = as.character(16:19))
sapply(zip_code, function(x) names(regions[sapply(regions, function(y) x %in% y)]))

 1    6   15 
"NE" "SE" "MT"

Discretization of zip codes to US regions in R

There are 3 best solutions below

Related Questions in R

Related Questions in CASE

Related Questions in GEOCODING

Related Questions in DISCRETIZATION

Trending Questions

Popular # Hahtags

Popular Questions