Web Crawler using R


I want to build a web crawler in R for the website "https://www.latlong.net/convert-address-to-lat-long.html". It should visit the site with an address as a parameter, fetch the generated latitude and longitude, and repeat this for every row of the dataset I have.

Since I am new to web crawling, I would appreciate some guidance.

Thanks in advance.

1 Answer

Answered by KamRa:

In the past I have used an API called ipstack (ipstack.com).

Example: suppose a data frame 'd' contains a column of IP addresses called 'ipAddress':

# Create the destination column up front so row-wise assignment is safe
d$ipCountry <- NA

for(i in 1:nrow(d)){
  # Build the request URL and read the API's response into 'str'
  lookupPath <- paste("http://api.ipstack.com/", d$ipAddress[i], "?access_key=INSERT YOUR API KEY HERE&format=1", sep = "")
  str <- readLines(lookupPath)

  # Save the raw response to a numbered text file
  f <- file(paste(i, ".txt", sep = ""))
  writeLines(str, f)
  close(f)

  # Save the country to the main data frame 'd' as well
  d$ipCountry[i] <- str[7]
  print(paste("Successfully saved ip #:", i))
}

In this example I was specifically after the country of each IP, which appears on line 7 of the text returned by the API (hence the str[7]).
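Relying on a fixed line number is fragile: if the API ever adds or reorders fields, str[7] silently picks up the wrong value. A sturdier variant is to parse the JSON and pick the field by name. Here is a minimal sketch, assuming the jsonlite package is installed and that the response contains a country_name field (check the ipstack documentation for your plan):

library(jsonlite)

d$ipCountry <- NA_character_
for(i in 1:nrow(d)){
  lookupPath <- paste("http://api.ipstack.com/", d$ipAddress[i], "?access_key=INSERT YOUR API KEY HERE", sep = "")
  resp <- fromJSON(lookupPath)        # fromJSON fetches the URL and parses the JSON
  d$ipCountry[i] <- resp$country_name # select the field by name, not by line number
}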

This API lets you look up 10,000 addresses per month for free, which was enough for my purposes.
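Coming back to the original question, which is about street addresses rather than IPs: instead of scraping latlong.net's HTML form, the same look-up-and-store pattern works against a geocoding API. Below is a minimal sketch against OpenStreetMap's public Nominatim service, assuming the httr and jsonlite packages; the column name 'address' and the email placeholder are illustrative, not a drop-in solution:

library(httr)
library(jsonlite)

# Query Nominatim for one free-form address; returns c(lat, lon) or NAs
geocode_address <- function(addr) {
  resp <- GET("https://nominatim.openstreetmap.org/search",
              query = list(q = addr, format = "json", limit = 1),
              user_agent("r-geocoding-example (you@example.com)"))  # Nominatim asks clients to identify themselves
  hits <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
  if (length(hits) == 0) return(c(lat = NA_real_, lon = NA_real_))
  c(lat = as.numeric(hits$lat[1]), lon = as.numeric(hits$lon[1]))
}

# Loop over the data frame of addresses, same pattern as the ipstack loop
d$lat <- NA_real_
d$lon <- NA_real_
for(i in 1:nrow(d)){
  coords <- geocode_address(d$address[i])
  d$lat[i] <- coords["lat"]
  d$lon[i] <- coords["lon"]
  Sys.sleep(1)  # Nominatim's usage policy allows at most one request per second
}

Storing the results directly in the data frame, as in the ipstack loop above, keeps each address paired with its coordinates for the whole dataset.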