How to web scrape a site with a request payload?

If you open up this webpage, there's a green "export" button: http://mics.unicef.org/surveys

If you click it in a web browser, the file surveys_catalogue.csv begins downloading. My goal is to replicate this download (of the full, unfiltered csv file) within R.

When I inspect the element in Chrome, it looks like the request includes a request payload, and I can't figure out how to reproduce that within R.

There is 1 best solution below

You might be better off with:

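library(httr)     # GET() and content()
library(jsonlite) # fromJSON()
library(tibble)   # as_tibble()
library(dplyr)    # %>% and glimpse()

# Request the JSON endpoint that the page itself calls
res <- GET("http://mics.unicef.org/api/survey")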

content(res, as="text") %>%
  fromJSON(flatten=TRUE) %>%
  as_tibble() %>%
  glimpse()
## Observations: 312
## Variables: 11
## $ round             <chr> "MICS1", "MICS1", "MICS1", "MICS1", "MICS1",...
## $ region            <chr> "Central and Eastern Europe and the Commonwe...
## $ country           <chr> "Croatia", "Kyrgyzstan", "Turkey", "Turkmeni...
## $ country_in_filter <chr> "Croatia", "Kyrgyzstan", "Turkey", "Turkmeni...
## $ year              <chr> "1996", "1995", "1995", "1995", "1996", "199...
## $ status            <chr> "Completed", "Completed", "Completed", "Comp...
## $ reports           <list> [<Final, https://mics-surveys-prod.s3.amazo...
## $ archive           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ extra_info        <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ...
## $ dataset.status    <chr> "Not available", "Not available", "Not avail...
## $ dataset.url       <chr> "", "", "", "", "", "", "", "", "", "", "", ...

This gives you the same data as the CSV export, plus more.

That URL is what's used to build the top filter row (the site makes a few XHR requests to build the table and the filter row). The CSV "export" is an extra step you really don't need since you can grab the XHR URL directly (as I did here).
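
If you still want a CSV file on disk at the end, one wrinkle is that reports comes back as a list column, and base write.csv() does not handle list columns directly. Here is a minimal sketch of one way around that, assuming it is acceptable to collapse each list entry into a single delimited string. The helper flatten_list_cols() and the small demo data frame are hypothetical names made up for illustration, not part of the site or of any package:

```r
# Hypothetical helper: collapse every list column into a character column
# so the data frame can be written with write.csv().
flatten_list_cols <- function(df, sep = "; ") {
  is_list <- vapply(df, is.list, logical(1))
  df[is_list] <- lapply(df[is_list], function(col) {
    # unlist() flattens nested values; empty entries become ""
    vapply(col, function(x) paste(unlist(x), collapse = sep),
           character(1), USE.NAMES = FALSE)
  })
  df
}

# Small stand-in for the API result (made-up values, for illustration only)
demo <- data.frame(country = c("Croatia", "Turkey"), stringsAsFactors = FALSE)
demo$reports <- list(c("Final", "a.pdf"), character(0))

flat <- flatten_list_cols(demo)

# Write to a temporary file; swap in your own path as needed
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

With the real data, you would pass the tibble produced by the pipeline above to flatten_list_cols() instead of demo. Collapsing to a string loses the nested structure of reports, so if you need that structure, keeping the JSON around may be the better choice.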