What is the correct syntax for a "term1 OR term2" query in a Google Custom Search Engine?

The situation:

I am using a Google Custom Search Engine, Google Cloud Platform, and R to count the number of web pages in which a given term appears. Because a term can have synonyms, I'm trying to obtain the number of web pages in which a given term or any of its synonyms appears ("or" is used in its inclusive sense here).

The problem:

I tried to formulate the query in several different ways, and all of them produced incoherent results (i.e. the number of web pages containing "term1" was greater than the number of web pages containing "term1" OR "term2").

Here are the formulations I tried (with term1 = Alsophis antiguae and term2 = Alsophis leucomelas):

URL_1 <- paste0(URL, key, "&cx=", cx, "&q=\'", 
                URLencode("Alsophis antiguae | Alsophis leucomelas") , "\'")

URL_2 <- paste0(URL, key, "&cx=", cx, "&q=\'", 
                URLencode("Alsophis antiguae OR Alsophis leucomelas") , "\'")

URL_3 <- paste0(URL, key, "&cx=", cx, "&q=\'", 
                "Alsophis%20antiguae", "OR", 
                "Alsophis%20leucomelas", "\'")

URL_4 <- paste0(URL, key, "&cx=", cx, "&q=\'", 
                "Alsophis%20antiguae", "%20OR%20", 
                "Alsophis%20leucomelas", "\'")

URL_5 <- paste0(URL, key, "&cx=", cx, "&q=\'", 
                "Alsophis%20antiguae", "\'", "OR", 
                "\'", "Alsophis%20leucomelas", "\'")
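To see why these attempts misbehave, it helps to look only at the q= value each one sends, before and after decoding. The sketch below rebuilds those values with the same paste0() and URLencode() calls as above, leaving out URL, key and cx (q_values is just an illustrative name). It makes a few things visible: every attempt wraps the whole expression in single quotes, none of them puts double quotes around each two-word name, URL_3 glues OR directly onto the neighbouring words, and URL_2 and URL_4 send exactly the same string.

```r
# Recreate only the q= value of each attempt (same paste0()/URLencode()
# calls as in the question, minus the URL, key and cx parts).
q_values <- c(
  URL_1 = paste0("'", utils::URLencode("Alsophis antiguae | Alsophis leucomelas"), "'"),
  URL_2 = paste0("'", utils::URLencode("Alsophis antiguae OR Alsophis leucomelas"), "'"),
  URL_3 = paste0("'", "Alsophis%20antiguae", "OR", "Alsophis%20leucomelas", "'"),
  URL_4 = paste0("'", "Alsophis%20antiguae", "%20OR%20", "Alsophis%20leucomelas", "'"),
  URL_5 = paste0("'", "Alsophis%20antiguae", "'", "OR", "'", "Alsophis%20leucomelas", "'")
)

# Print each encoded value next to its decoded form, i.e. the text the
# search engine actually has to interpret.
for (nm in names(q_values)) {
  cat(nm, ": ", q_values[[nm]], "  ->  ", utils::URLdecode(q_values[[nm]]), "\n", sep = "")
}
```

For example, URL_3 decodes to 'Alsophis antiguaeORAlsophis leucomelas', which no longer contains a separate OR at all.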

After generating a URL, I run the following line:

js <- fromJSON(base::url(URL_1))
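After fromJSON(), the count the question is after should be in the searchInformation element of the response, whose totalResults field the API returns as a string. A minimal sketch, assuming fromJSON() is the one from the jsonlite package (which turns JSON objects into named lists); total_results() is a hypothetical helper name, and the mock response only stands in for a real one:

```r
# Hypothetical helper: read the hit count from a parsed Custom Search JSON
# API response. totalResults arrives as a string, so it is converted here.
total_results <- function(js) {
  n <- js$searchInformation$totalResults
  if (is.null(n)) {
    return(NA_real_)  # e.g. an error response that has no searchInformation
  }
  as.numeric(n)
}

# Example with a hand-written stand-in for a real response:
mock_js <- list(searchInformation = list(totalResults = "1230"))
total_results(mock_js)  # returns 1230
```

Google's hit counts are generally treated as estimates, so even with correct syntax it may be worth double-checking small inconsistencies between counts.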

The question:

What is the correct syntax to search for "term1" OR "term2"? Could you please provide the query part of a URL as an example (e.g. "&q='Alsophis%20antiguae'")?

Many thanks in advance

Answer:

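In case anybody is interested, the syntax that works is "term1"+OR"term2" (written inside an R string as "\"term1\"+OR\"term2\""), i.e. each term in double quotation marks, joined by +OR, where the + stands for a space in the URL. The same syntax also seems to work with AND (+AND).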

Double quotation marks inside the query work fine, whereas single quotation marks seem to produce erroneous results. Single quotation marks should therefore only be used to delimit the R character string (e.g. '"term1"+OR"term2"'), not inside the query itself.

The following code produces a working query:

term1 <- "aaaa"
term2 <- "bbbb"
dummy.query <- paste0("\"", term1, "\"", 
                      "+OR", # works with "+AND"
                      "\"", term2, "\"") 
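The paste0() call above puts the term text into the URL as is, which is fine for "aaaa" but leaves a raw space inside multi-word terms such as "Alsophis antiguae". A sketch that generalises it to any number of synonyms and percent-encodes the result with utils::URLencode(). The name build_query() is hypothetical, and it encodes the separator as %20OR%20 (the standard encoding of " OR ") rather than +OR; the counts returned by the two forms have not been compared here.

```r
# Hypothetical helper: wrap every term in double quotes, join the terms
# with OR (or AND) and percent-encode the whole expression, so multi-word
# terms such as "Alsophis antiguae" become URL-safe.
build_query <- function(terms, operator = c("OR", "AND")) {
  operator <- match.arg(operator)
  quoted <- paste0("\"", terms, "\"")                       # "term1", "term2", ...
  expr <- paste(quoted, collapse = paste0(" ", operator, " "))
  utils::URLencode(expr, reserved = TRUE)                   # quotes -> %22, spaces -> %20
}

build_query(c("Alsophis antiguae", "Alsophis leucomelas"))
# "%22Alsophis%20antiguae%22%20OR%20%22Alsophis%20leucomelas%22"
```

The result can be used in place of dummy.query, and passing operator = "AND" gives the AND variant.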

Then, if key is your API key and cx is the ID of your Google Custom Search Engine, the complete working URL is:

URL <- paste0("https://www.googleapis.com/customsearch/v1?key=", key,
              "&cx=", cx,
              "&q=", dummy.query)
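To confirm that a query no longer shows the problem from the question, one can compare the three counts directly: the count for "term1" OR "term2" should not be smaller than either single-term count. A sketch under stated assumptions: check_or_query() is a hypothetical helper, it follows the answer's "term1"+OR"term2" pattern with the quotes written as %22, and fetch_count is a function you supply (for instance one that pastes the query into the URL above, reads it with jsonlite::fromJSON() and converts searchInformation$totalResults to a number). The example runs offline with made-up counts.

```r
# Hypothetical helper: fetch the counts for term1, term2 and
# "term1"+OR"term2" and report whether they are mutually consistent.
# Only spaces are encoded by quote_term(); terms with other special
# characters would need utils::URLencode() instead.
check_or_query <- function(term1, term2, fetch_count) {
  quote_term <- function(x) paste0("%22", gsub(" ", "%20", x, fixed = TRUE), "%22")
  n1 <- fetch_count(quote_term(term1))
  n2 <- fetch_count(quote_term(term2))
  n_or <- fetch_count(paste0(quote_term(term1), "+OR", quote_term(term2)))
  list(n1 = n1, n2 = n2, n_or = n_or, coherent = n_or >= max(n1, n2))
}

# Offline example: made-up counts instead of real API calls.
fake_counts <- c("%22aaaa%22" = 10, "%22bbbb%22" = 5, "%22aaaa%22+OR%22bbbb%22" = 12)
check_or_query("aaaa", "bbbb", function(q) fake_counts[[q]])$coherent  # TRUE
```

Injecting fetch_count keeps the consistency check separate from the network call, so it can be tried without an API key.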