Long/Lat points keep ending up in Kansas - sf, tidycensus

51 Views Asked by At

I am trying to make a point file using longitude and latitudes, and then use st_join with st_within to match them to census tracts. But the points keep ending up in Kansas. If you are using the tidycensus library w/ the API then there is reproducible code:

Code for dummy data points mostly in Colorado, and tract boundaries in Colorado and Kansas:

library(tidycensus)
library(sf)
library(dplyr)
library(tidyverse)

# Set seed for reproducibility
set.seed(42)

# Generate dummy data for points in New York
points <- data.frame(
  longitude = runif(300, min = -109, max = -102),  # Approximate longitude boundaries of Colorado
  latitude = runif(300, min = 36.993076, max = 41)  # Approximate latitude boundaries of Colorado
)

# Print the first few rows of the dummy data
points <- st_as_sf(points, coords = c("longitude", "latitude"), crs = "ESRI:102003")

tract2010 <- get_decennial(geography = "tract", variables = "P001001", year = 2010,
                          state = as.list(c("Colorado", "Kansas")), geometry = TRUE)

tract2010$state_code <- substr(tract2010$GEOID, 1, 2)
table(tract2010$state_code)

# make same CRS
tract2010 <- st_transform(tract2010, st_crs(points))`

Mapped it in leaflet to make sure the points were in the right place:

# test where it is
library(leaflet)
leaflet() %>%
  addTiles() %>%
  addMarkers(data = points)

enter image description here

Ran the join and checked the matches. From the table, all points are in state code 20 (Kansas)

#spatial join
points <- st_join(points, tract2010, join = st_within)
table(points$state_code, useNA = "always")
1

There are 1 best solutions below

4
L Tyrone On BEST ANSWER

You are defining the coordinate system of your points data as ESRI:102003 but the longitude and latitude of your points data are in either WGS84 or NAD83. I have reproduced your entire code for the sake of clarity and annotated the extra step you needed. The following assumes your original points data are NAD83 (EPSG:4269), add the correct EPSG code if this is not correct:

library(tidycensus)
library(sf)
library(dplyr)
library(tidyverse)
library(ggplot2)

# Set seed for reproducibility
set.seed(42)

# Generate dummy data for points 
points <- data.frame(
  longitude = runif(300, min = -109, max = -102),  # Approximate longitude boundaries of Colorado
  latitude = runif(300, min = 36.993076, max = 41)  # Approximate latitude boundaries of Colorado
)

# NAD83 points to ESRI:102003
points <- st_as_sf(points, coords = c("longitude", "latitude")) %>%
  st_set_crs(4269) %>% # This is the bit you missed
  st_transform("ESRI:102003")

# Get census tracts
tract2010 <- get_decennial(geography = "tract", variables = "P001001", year = 2010,
                           state = as.list(c("Colorado", "Kansas")), geometry = TRUE)

# Create new state_code variable
tract2010$state_code <- substr(tract2010$GEOID, 1, 2)

# Transform
tract2010 <- st_transform(tract2010, st_crs(points))

# Spatial join
points <- st_join(points, tract2010_1, join = st_within)

ggplot() +
  geom_sf(data = tract2010) +
  geom_sf(data = points,
          aes(colour = state_code))

result