Issues geocoding with R and Google Maps


I've been running the excellent geocoding code from this post:

https://www.shanelynn.ie/massive-geocoding-with-r-and-google-maps/

It works like a dream, but it randomly stops mid-process and throws an error. This happens at different points with the same data set. When I take one of the addresses that threw an error and run it through the code manually, it works fine. I suspect a server or timeout issue is causing this. Has anyone else used this code and had similar issues? Did you find a solution?

The error always looks something like this:

```
contacting http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false...Information from URL : http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false
Error in geo_reply$status : $ operator is invalid for atomic vectors
In addition: Warning messages:
1: In readLines(connect, warn = FALSE) :
  cannot open URL 'http://maps.googleapis.com/maps/api/geocode/json?address=NICHOLS,%20ACT,%202613,%20AUSTRALIA&sensor=false': HTTP status was '500 Internal Server Error'
2: In geocode(address, output = "all", messaging = TRUE, override_limit = TRUE) :
geocoding failed for "NICHOLS, ACT, 2613, AUSTRALIA".
if accompanied by 500 Internal Server Error with using dsk, try google.
```
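The log itself hints at the mechanism: the request fails with a 500, `geocode()` warns "geocoding failed", and only then does `geo_reply$status` blow up. That pattern suggests `geocode(..., output = "all")` handed back an atomic value (most likely `NA`) instead of a list, and `$` on an atomic vector throws exactly this error. A minimal sketch under that assumption; `safe_status()` is a hypothetical helper, not part of ggmap:

```r
# Simulate the reply seen when the request fails: an atomic NA, not a list.
failed_reply <- NA

# Accessing $status on an atomic vector reproduces the error from the log.
msg <- tryCatch(failed_reply$status, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical helper: return the status if the reply is a list with a
# status field, otherwise a sentinel string the caller can test for.
safe_status <- function(reply) {
  if (is.list(reply) && !is.null(reply$status)) reply$status else "REQUEST_FAILED"
}

safe_status(failed_reply)         # "REQUEST_FAILED"
safe_status(list(status = "OK"))  # "OK"
```

Checking the reply type before reading `$status` turns the crash into an ordinary status value that the rest of `getGeoDetails()` can handle.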

My addresses are in a data table (around 2,000 records) that looks like this:

| MAIL_STATE | MAIL_SUBURB | MAIL_POSTCODE |
| ---------- | ----------- | ------------- |
| ACT        | NICHOLLS    | 2613          |

`addresses` is created with the following code:

```r
addresses = paste0(data$MAIL_SUBURB, ", ", data$MAIL_STATE, ", ", data$MAIL_POSTCODE, ", AUSTRALIA")
```
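To check the address strings before spending queries on them, the construction can be run on a single row. The one-row `data` frame below is made up to mirror the table above; `paste0()` has no `sep` argument (it always joins with `""`), so none is passed here.

```r
# One made-up row shaped like the table above (column names as used in the code).
data <- data.frame(
  MAIL_STATE = "ACT",
  MAIL_SUBURB = "NICHOLLS",
  MAIL_POSTCODE = 2613,
  stringsAsFactors = FALSE
)

# paste0() joins its arguments with no separator; numbers are coerced to text.
addresses <- paste0(data$MAIL_SUBURB, ", ", data$MAIL_STATE, ", ",
                    data$MAIL_POSTCODE, ", AUSTRALIA")
addresses
# "NICHOLLS, ACT, 2613, AUSTRALIA"
```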

The full code that uses `addresses` is below:

```r
# Define a function that will process Google's server responses for us.
getGeoDetails <- function(address){
  # Use the geocode function to query Google's servers
  geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
  # Now extract the bits that we need from the returned list
  answer <- data.frame(lat=NA, long=NA, accuracy=NA, formatted_address=NA, address_type=NA, status=NA)
  answer$status <- geo_reply$status

  # If we are over the query limit, pause for 24 hours and try again
  while(geo_reply$status == "OVER_QUERY_LIMIT"){
    print("OVER QUERY LIMIT - Pausing for 24 hours at:")
    time <- Sys.time()
    print(as.character(time))
    Sys.sleep(60*60*24)
    geo_reply = geocode(address, output='all', messaging=TRUE, override_limit=TRUE)
    answer$status <- geo_reply$status
  }

  # Return NAs if we didn't get a match
  if (geo_reply$status != "OK"){
    return(answer)
  }
  # Otherwise, extract what we need from the Google server reply into a data frame
  answer$lat <- geo_reply$results[[1]]$geometry$location$lat
  answer$long <- geo_reply$results[[1]]$geometry$location$lng
  if (length(geo_reply$results[[1]]$types) > 0){
    answer$accuracy <- geo_reply$results[[1]]$types[[1]]
  }
  answer$address_type <- paste(geo_reply$results[[1]]$types, collapse=',')
  answer$formatted_address <- geo_reply$results[[1]]$formatted_address

  return(answer)
}

# Initialise a data frame to hold the results
geocoded <- data.frame()
# Find out where to start in the address list (if the script was interrupted before)
startindex <- 1
# If a temp file exists, load it up and count the rows
tempfilename <- paste0(infile, '_temp_geocoded.rds')
if (file.exists(tempfilename)){
  print("Found temp file - resuming from index:")
  geocoded <- readRDS(tempfilename)
  startindex <- nrow(geocoded)
  print(startindex)
}

# Start the geocoding process, address by address. geocode() takes care of the query speed limit.
for (ii in seq(startindex, length(addresses))){
  print(paste("Working on index", ii, "of", length(addresses)))
  # Query the Google geocoder; this will pause here if we are over the limit.
  result = getGeoDetails(addresses[ii])
  print(result$status)
  result$index <- ii
  # Append the answer to the results data frame
  geocoded <- rbind(geocoded, result)
  # Save temporary results as we go along
  saveRDS(geocoded, tempfilename)
}
```
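Because an address that fails inside the loop geocodes fine on its own, the failures look transient, so one option is to retry a failed request a few times with a growing pause before giving up. A minimal sketch, assuming a transient failure shows up as a non-list reply or an error (as the log above suggests); `geocode_with_retry()`, `max_tries` and `base_wait` are hypothetical names, and the geocoder is passed in as `fetch` so the logic can be tried without touching the network:

```r
# Hypothetical retry wrapper: calls fetch(address) up to max_tries times,
# doubling the pause after each failed attempt.
geocode_with_retry <- function(address, fetch, max_tries = 5, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Warnings from failed requests are muffled; errors are turned into NA.
    reply <- tryCatch(
      suppressWarnings(fetch(address)),
      error = function(e) NA
    )
    # A list with a status field means the server actually answered.
    if (is.list(reply) && !is.null(reply$status)) {
      return(reply)
    }
    if (attempt < max_tries) {
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  NA  # every attempt failed; the caller should record NAs for this address
}

# Stub that fails twice (returning NA, like a failed request), then succeeds.
calls <- 0
flaky_fetch <- function(address) {
  calls <<- calls + 1
  if (calls < 3) NA else list(status = "OK", results = list())
}
reply <- geocode_with_retry("NICHOLLS, ACT, 2613, AUSTRALIA", flaky_fetch,
                            base_wait = 0)
reply$status  # "OK", after 3 calls
```

In `getGeoDetails()` this would replace the direct `geocode()` calls, for example `geo_reply <- geocode_with_retry(address, function(a) geocode(a, output = "all", messaging = TRUE, override_limit = TRUE))`, followed by an `is.list(geo_reply)` check before `geo_reply$status` is read.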

Personally, I like this version.

```r
# Geocoding a CSV column of "addresses" in R

# Load ggmap
library(ggmap)
# Recent ggmap versions need a Google API key registered first (placeholder key):
# register_google(key = "YOUR_API_KEY")

# Select the file from the file chooser
fileToLoad <- file.choose()

# Read in the CSV data and store it in a variable
origAddress <- read.csv(fileToLoad, stringsAsFactors = FALSE)

# Loop through the addresses to get the longitude and latitude of each address and add them to the
# origAddress data frame in new columns lon, lat and geoAddress
for (i in seq_len(nrow(origAddress))) {
  result <- geocode(origAddress$addresses[i], output = "latlona", source = "google")
  origAddress$lon[i] <- as.numeric(result[[1]])
  origAddress$lat[i] <- as.numeric(result[[2]])
  origAddress$geoAddress[i] <- as.character(result[[3]])
}

# Write a CSV file containing origAddress to the working directory
write.csv(origAddress, "geocoded.csv", row.names = FALSE)
```
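This loop has the same weakness as the question's code: one failed request stops it. A minimal sketch of a per-row guard, assuming a failure surfaces as an error or as a reply without the three `latlona` columns; `geocode_row()` and `fake_fetch()` are hypothetical, the coordinates in the stub are made up, and in real use `fetch` would be something like `function(a) geocode(a, output = "latlona", source = "google")`:

```r
# Hypothetical guard: returns lon/lat/address for one address, or NAs on failure.
geocode_row <- function(address, fetch) {
  empty <- list(lon = NA_real_, lat = NA_real_, geoAddress = NA_character_)
  result <- tryCatch(fetch(address), error = function(e) NULL)
  if (is.null(result) || length(result) < 3) return(empty)
  list(
    lon = as.numeric(result[[1]]),
    lat = as.numeric(result[[2]]),
    geoAddress = as.character(result[[3]])
  )
}

# Stub geocoder with made-up values: errors for "BAD", otherwise returns a
# latlona-shaped data frame (lon, lat, address).
fake_fetch <- function(address) {
  if (address == "BAD") stop("HTTP status was '500 Internal Server Error'")
  data.frame(lon = 149.1, lat = -35.2, address = tolower(address),
             stringsAsFactors = FALSE)
}

origAddress <- data.frame(addresses = c("NICHOLLS, ACT", "BAD"),
                          stringsAsFactors = FALSE)
origAddress$lon <- NA_real_
origAddress$lat <- NA_real_
origAddress$geoAddress <- NA_character_

# A failing address leaves NAs in its row and the loop carries on.
for (i in seq_len(nrow(origAddress))) {
  coords <- geocode_row(origAddress$addresses[i], fake_fetch)
  origAddress$lon[i] <- coords$lon
  origAddress$lat[i] <- coords$lat
  origAddress$geoAddress[i] <- coords$geoAddress
}
```

Rows left as `NA` can then be picked out with `is.na(origAddress$lon)` and re-run in a second pass.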
