Spliting the location column into Zipcode, Latitude and Longitude

549 Views Asked by At

I have a column in my data frame which has the zipcode, latitude and Longitude

Location

"10007 (40.71363051943297, -74.00913138370635)"
"10002 (40.71612146793143, -73.98583147024613)"
"10012 (40.72553802086304, -73.99789641059084)"
"10009 (40.72664935898081, -73.97911148500697)"

I need to separate them into three different columns like Zipcode, Latitude and Longitude.

I tried to doing this

extract(Location, c("Zip-Code","Latitude", "Longitude"), "\\(([^,]+), ([^)]+)\\)")

I want to use the latitude and longitude to plot the map using ggmap

Thanks

2

There are 2 best solutions below

0
On BEST ANSWER

Basic regex extraction:

library(purrr)

c("10007 (40.71363051943297, -74.00913138370635)", "10002 (40.71612146793143, -73.98583147024613)",
  "10012 (40.72553802086304, -73.99789641059084)", "10009 (40.72664935898081, -73.97911148500697)") %>%
  stringi::stri_match_all_regex("([[:digit:]]+)[[:space:]]+\\(([[:digit:]\\.\\-]+),[[:space:]]+([[:digit:]\\.\\-]+)\\)") %>%
  map_df(dplyr::as_data_frame) %>%
  dplyr::select(zip=V2, latitude=V3, longitude=V4)
## # A tibble: 4 × 3
##     zip          latitude          longitude
##   <chr>             <chr>              <chr>
## 1 10007 40.71363051943297 -74.00913138370635
## 2 10002 40.71612146793143 -73.98583147024613
## 3 10012 40.72553802086304 -73.99789641059084
## 4 10009 40.72664935898081 -73.97911148500697

More readable:

library(purrr)
library(stringi)
library(dplyr)
library(purrr)

dat <- c("10007 (40.71363051943297, -74.00913138370635)",
         "10002 (40.71612146793143, -73.98583147024613)",
         "10012 (40.72553802086304, -73.99789641059084)", 
         "10009 (40.72664935898081, -73.97911148500697)")

zip <- "([[:digit:]]+)"
num <- "([[:digit:]\\.\\-]+)"
space <- "[[:space:]]+"
lp <- "\\("
rp <- "\\)"
comma <- ","

match_str <- zip %s+% space %s+% lp %s+% num %s+% comma %s+% space %s+% num %s+% rp

dat %>%
  stri_match_all_regex(match_str) %>%
  map_df(as_data_frame) %>%
  select(zip=V2, latitude=V3, longitude=V4)
1
On
s.tmp = "10007 (40.71363051943297, -74.00913138370635)"

For ZIP:

gsub('([0-9]+) .*', '\\1', s.tmp)

For latitude:

gsub('.*\\((.*),.*', '\\1', s.tmp)

For longitude:

gsub('.*, (.*)\\).*', '\\1', s.tmp)