Submit To Google Or Wikipedia Search Form Using R

I am trying to use R to navigate to a particular Wikipedia page based on a string value. I don't have the exact Wikipedia URLs for the list of keywords I am looking up (e.g., "Prog rock" as a search term goes to a URL ending in Progressive_rock), so my thought was to pass the keywords to a Google "I'm Feeling Lucky" search and then scrape the HTML of the resulting Wikipedia page.

In the process of trying this, I found that I was having trouble submitting any form with R. Can anyone post a reproducible example for running a Google query with an R session and returning the HTML of the top page, or a Wikipedia search based on search terms?

I have been using Hadley's excellent rvest package for most of my web scraping, but haven't been able to get this aspect to work even using the example adapted from the manual for rvest:

goog <- html_session("https://www.google.com")

search <- html_form(html("https://www.google.com"))[[1]]

search.mod <- set_values(search, q = "My little pony")

submit_form(goog, search.mod, submit = 'btnI')

Which returns:

 Error: length(url) == 1 is not TRUE

I also tried the Wikipedia search form directly, with the same result:

url <- "http://en.wikipedia.org/wiki/Main_Page"

wiki <- html_session(url)

search.form <- html_form(wiki)[[1]]

form.mod <- set_values(search.form, search = "Frank Zappa")

submit_form(wiki, form.mod, submit = 'go')

Which returns the same error. I suspect I am making some sort of incredibly simple mistake, but I can't figure out what it is.

Many of the examples online to submit search forms appear to use the httr, RCurl, and RSelenium packages, but I haven't found a specific example on Google or Wikipedia that works, and many of the examples appear to be outdated since Google changed the format of their "I'm Feeling Lucky" search. I also looked at the WikipediR package as suggested in a similar question (Sumbit queries on wikipedia through R) but it doesn't appear to have a search function.

There is 1 best solution below

To submit a search on Wikipedia or Google, you don't need html_form, as both provide a way to pass the query in the URL. For instance, if you are looking for "apple" on Wikipedia, just request

http://en.wikipedia.org/wiki/Special:Search/apple

This redirects to the Apple article, because a page with that name exists.

http://en.wikipedia.org/wiki/Special:Search/Prog_rock

will also find the right page, because Wikipedia has an automatic redirect in place for that title.
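As a minimal sketch of the URL approach above in R: the helper names wiki_search_url and fetch_wiki_page are hypothetical, and the second one assumes the httr package is installed. It builds the Special:Search URL from a keyword and returns the page HTML as text.

```r
# Build a Special:Search URL for a keyword (hypothetical helper name).
wiki_search_url <- function(term) {
  # Wikipedia URLs use underscores in place of spaces
  slug <- gsub(" ", "_", trimws(term), fixed = TRUE)
  # Percent-encode any characters that are not URL-safe
  paste0("http://en.wikipedia.org/wiki/Special:Search/",
         utils::URLencode(slug, reserved = TRUE))
}

# Request the page and return its HTML as text (assumes httr is installed).
fetch_wiki_page <- function(term) {
  resp <- httr::GET(wiki_search_url(term))
  httr::stop_for_status(resp)  # fail loudly on HTTP errors
  list(
    url  = resp$url,  # URL after any HTTP redirects were followed
    html = httr::content(resp, as = "text", encoding = "UTF-8")
  )
}
```

Calling fetch_wiki_page("Prog rock") should then return the HTML of whatever page Wikipedia serves for that term, which can be handed to rvest for scraping.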

If you search for a misspelled term such as "Progressiv Rock", Wikipedia will not find a matching page, but it will display some suggestions that you can try to parse:

http://en.wikipedia.org/wiki/Special:Search/Progressiv_rock
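
Parsing those suggestions could look roughly like the sketch below. It assumes the rvest package, and it assumes that each hit on the results page is a link inside an element with the class mw-search-result-heading; that selector is an assumption about Wikipedia's markup and may need adjusting. The function name parse_wiki_suggestions is hypothetical.

```r
library(rvest)

# Extract result titles and links from the HTML of a search-results page.
# Assumption: each hit is an <a> inside an element with class
# "mw-search-result-heading"; change the selector if the markup differs.
parse_wiki_suggestions <- function(page_html) {
  doc   <- read_html(page_html)
  links <- html_nodes(doc, ".mw-search-result-heading a")
  data.frame(
    title = html_text(links),
    href  = html_attr(links, "href"),
    stringsAsFactors = FALSE
  )
}

# Possible usage (needs network access and the httr package):
# resp <- httr::GET("http://en.wikipedia.org/wiki/Special:Search/Progressiv_rock")
# parse_wiki_suggestions(httr::content(resp, as = "text", encoding = "UTF-8"))
```

The result is a data frame with one row per suggestion, so taking the first row's href is one simple way to pick the most likely page.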