How to get proper encoding using browseURL()?


I'm trying to open a URL that contains Japanese characters. This question builds on the first question I asked yesterday. My code now generates the right URL, and if I paste that URL into my browser I get the right result, but if I try to automate the process with browseURL() I get a wrong result.

For example, I am trying to open the following URL:

http://www.google.com/trends/trendsReport?hl=en-US&q=VWゴルフ %2B VWポロ %2B VWパサート %2B VWティグアン&date=1%2F2010 68m&cmpt=q&content=1&export=1

If I now use

browseURL("http://www.google.com/trends/trendsReport?hl=en-US&q=VWゴルフ %2B VWポロ %2B VWパサート %2B VWティグアン&date=1%2F2010 68m&cmpt=q&content=1&export=1")

I can see in the browser that it actually opened

www.google.com/trends/trendsReport?hl=en-US&q=VW%E3%83%BB%EF%BD%BDS%E3%83%BB%EF%BD%BD%E3%83%BB%EF%BD%BD%E3%83%BB%EF%BD%BDt%20%2B%20VW%E3%83%BB%EF%BD%BD%7C%E3%83%BB%EF%BD%BD%E3%83%BB%EF%BD%BD%20%2B%20VW%E3%83%BB%EF%BD%BDp%E3%83%BB%EF%BD%BDT%E3%83%BB%EF%BD%BD[%E3%83%BB%EF%BD%BDg%20%2B%20VW%E3%83%BB%EF%BD%BDe%E3%83%BB%EF%BD%BDB%E3%83%BB%EF%BD%BDO%E3%83%BB%EF%BD%BDA%E3%83%BB%EF%BD%BD%E3%83%BB%EF%BD%BD&date=1%2F2010%2068m&cmpt=q&content=1&export=1

which looks like an encoding error. I already tried

browseURL(URL, encodeIfNeeded=TRUE)

but that doesn't seem to change anything, and as far as I understand the function it shouldn't: that argument exists to generate those percent-escapes (the "%2B"-style sequences), so it is even more surprising that I get them with encodeIfNeeded = FALSE.
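One way to read the mangled URL: the ASCII characters sitting between the repeated %E3%83%BB%EF%BD%BD runs (S, t, |, p, T, g, ...) line up with the second bytes of the CP932 (Windows Shift-JIS) encodings of the katakana. For example, ゴ is the byte pair 0x83 0x53 in CP932, and 0x53 is `S`. That suggests the string left R as raw CP932 bytes rather than as UTF-8 percent-escapes, which fits the LC_CTYPE=Japanese_Japan.932 locale in the session info below. A small check, assuming your R build's iconv() recognises the encoding name "CP932":

```r
# Katakana GO (U+30B4) written as a Unicode escape, so the example does not
# depend on the encoding this script is saved or read in
go <- "\u30b4"

# Bytes a CP932 (Japanese Windows) session uses for this character
cp932_bytes <- iconv(go, from = "UTF-8", to = "CP932", toRaw = TRUE)[[1]]

# Bytes that should appear percent-escaped in the URL (UTF-8)
utf8_bytes <- charToRaw(enc2utf8(go))

print(cp932_bytes)                # [1] 83 53
print(utf8_bytes)                 # [1] e3 82 b4
print(rawToChar(cp932_bytes[2]))  # [1] "S", the stray letter in the mangled URL
```

So a correctly encoded URL should carry %E3%82%B4 for ゴ; the `S` showing up instead points at the CP932 bytes.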

Any help is highly appreciated!

> sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 8 (build 9200)

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=Japanese_Japan.932           LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] tools_3.2.1

Best answer:

I think this will get around the issue:

library(httr)
library(curl)

gt_url <- "http://www.google.com/trends/trendsReport?hl=en-US&q=VWゴルフ %2B VWポロ %2B VWパサート %2B VWティグアン&date=1%2F2010 68m&cmpt=q&content=1&export=1"

# decode the existing escapes (%2B, %2F) first so they don't get in the way,
# then ask httr to carve up the URL and put it back together
# (build_url() re-escapes the query values)
parts <- parse_url(URLdecode(gt_url))
browseURL(build_url(parts))

That gives the full result (too long to paste here, but I want to make sure the OP gets to see the whole content).

I also now see why you have to do it this way: neither download.file() nor httr's GET() with write_disk() works, because of the JavaScript redirect.
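If you'd rather not depend on httr, the same decode-and-re-encode idea can be sketched in base R. This is a minimal sketch under a few assumptions: utf8_percent_encode() and build_query() are hypothetical helpers (not part of any package), the parameter values are simply the decoded pieces of the question's URL, and it has not been checked against the live Trends page. The important step is enc2utf8() before escaping, so the katakana become UTF-8 escapes regardless of the CP932 locale.

```r
# Hypothetical helper: percent-encode the UTF-8 bytes of x, leaving only the
# RFC 3986 unreserved characters (A-Z a-z 0-9 - . _ ~) unescaped
utf8_percent_encode <- function(x) {
  bytes <- charToRaw(enc2utf8(x))
  unreserved <- charToRaw(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
  )
  escaped <- vapply(bytes, function(b) {
    if (b %in% unreserved) rawToChar(b) else paste0("%", toupper(as.character(b)))
  }, character(1))
  paste(escaped, collapse = "")
}

# Hypothetical helper: turn a named list of *decoded* values into a query string
build_query <- function(params) {
  pairs <- vapply(names(params), function(key) {
    paste0(utf8_percent_encode(key), "=", utf8_percent_encode(params[[key]]))
  }, character(1))
  paste(pairs, collapse = "&")
}

# Decoded parameters from the question's URL; the katakana are written as
# \u escapes so the values don't depend on how this file is read
params <- list(
  hl = "en-US",
  q = paste(
    "VW\u30b4\u30eb\u30d5",              # Golf
    "VW\u30dd\u30ed",                    # Polo
    "VW\u30d1\u30b5\u30fc\u30c8",        # Passat
    "VW\u30c6\u30a3\u30b0\u30a2\u30f3",  # Tiguan
    sep = " + "
  ),
  date = "1/2010 68m",
  cmpt = "q",
  content = "1",
  export = "1"
)

gt_url <- paste0("http://www.google.com/trends/trendsReport?", build_query(params))
print(gt_url)
# browseURL(gt_url)  # uncomment to open the result in the default browser
```

In the printed URL the katakana come out as UTF-8 escapes (ゴ becomes %E3%82%B4) and the spaces and plus signs as %20 and %2B, so the string handed to browseURL() is plain ASCII and the locale no longer matters. Escaping the keys too is harmless here because they are all ASCII.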