Extract centroid latitude and longitude for each county in the U.S. using R

712 Views Asked by At

How can I get a data frame listing each state and county in the U.S. along with the latitude and longitude of the center of each county?

A version of this question has been asked many times over the past 10-15 years and many of the answers use packages or functions that are no longer available (see example here using the defunct R package GISTools).

I have tried using a more recent answer by @sebdalgarno from the link above that used the state of North Carolina as an example.

library(sf)
library(ggplot2)
library(regos) # not in the original answer but needed for this example

# I transform to utm because st_centroid is not recommended for use on long/lat 
nc <- st_read(system.file('shape/nc.shp', package = "sf")) %>% 
  st_transform(32617)

# using rgeos
sp_cent <- gCentroid(as(nc, "Spatial"), byid = TRUE)

# using sf
sf_cent <- st_centroid(nc)

# plot both together to confirm that they are equivalent
ggplot() + 
  geom_sf(data = nc, fill = 'white') +
  geom_sf(data = sp_cent %>% st_as_sf, color = 'blue') + 
  geom_sf(data = sf_cent, color = 'red') 

However, the problem with the answer above is two-fold: 1) it does not work with other states (I tried Virginia by replacing nc with va, see below) and 2) the example is only for one state and I want all 50.

# I transform to utm because st_centroid is not recommended for use on long/lat 

va <- st_read(system.file('shape/va.shp', package = "sf")) %>% 
  st_transform(32617)

# Error: `dsn` must point to a source, not an empty string.

I would like a data frame that has four columns which include state name, county name, latitude, and longitude of the center of each county.

2

There are 2 best solutions below

0
tassones On BEST ANSWER

In case anyone else comes across this question @JindraLacko answer was very helpful. Below is some code to answer the specific question regarding the requested data frame.

library(tidyverse) # to extract lat and long from a geometry object
library(dplyr) # data wrangling
library(sf) # for getting centroids
library(tigris) # for getting county shapefile
library(cdlTools) # converting state fips to state name

# includes all US territories

centroids <- counties(resolution = "20m") %>% 
  st_centroid()

# convert state fips to state name

centroids$STATE_NAME <- fips(centroids$STATEFP, to = 'Name')

# isolating the lower 48 states and Washington DC

centroids$STATEFP <- as.numeric(centroids$STATEFP)

lower48_cent <- centroids[centroids$STATEFP %in% c(1,3:14,16:56),]

# Extracting centroid latitude and longitude from the geometry column

lower48_cent <- lower48_cent %>%
  mutate(lat = unlist(map(lower48_cent$geometry,2)),
         lon = unlist(map(lower48_cent$geometry,1)))

# subsetting down to the final data frame of state name, county, lat and long centroids

lower48_cent <- as.data.frame(lower48_cent[,c(19,6,20,21)])
lower48_cent <- lower48_cent[,1:4]

View(lower48_cent)
0
Jindra Lacko On

The "gold standard" package of working with US admin are data is {tigris}, although you may also consider the closely related package {tidycensus}.

To get the results you asked for consider this piece of code:

library(dplyr)
library(sf)

centroids <- tigris::counties(resolution = "20m") %>% 
  st_centroid()

plot(st_geometry(centroids), pch = 4)

us counties centroids

And for further reading on the matter consider this excellent resource by the package author: https://walker-data.com/census-r/