Background
The UN Secretary-General and other organs issue hundreds of reports to the General Assembly each year, but there is no unified list of these reports, as there is for other documents. There is, however, a simplified URL for reading these reports by document code: http://undocs.org/[document code], where the document code has the format A/[Session]/[Document Number]. For example, the document code "A/71/1" is accessed at "https://undocs.org/A/71/1".
I'm trying to download all of these documents for the past 15 years, but instead of manually typing in each of these, I'd like to set up a Google Apps Script to do it for me.
Problem
When I try the simple call UrlFetchApp.fetch("http://undocs.org/A/71/1"), for example, it fetches an error page saying that I am using an unauthorized method of accessing the page. This is the same page that shows up if you block cookies, and sometimes when you open the page in an incognito window.
Now, I'm not looking to hack into the UN, but simply to download some PDFs that are up for public access. I need to figure out what parameters to pass to the .fetch() method for the request to be authorized by the page.
Note: I scoured the undocs.org site looking for any guidance, and I found none.
tl;dr
Trying to access United Nations Official Document System using the UrlFetchApp from Google Apps Script, but I can't figure out how to get the request to be authorized.
Short answer: I don't think you'll be able to get it with a one-line fetch.
If you look at the HTML returned when you fetch https://undocs.org/A/71/1, you'll see that it embeds a frame that gets its content from https://daccess-ods.un.org/access.nsf/Get?OpenAgent&DS=A/71/1&Lang=E. Then, if you look at the HTML returned by that frame, you'll see two things:
https://documents-dds-ny.un.org/prod/ods_mother.nsf?Login&Username=freeods2&Password=1234
https://documents-dds-ny.un.org/doc/UNDOC/GEN/N16/206/02/PDF/N1620602.pdf?OpenElement
I presume that the first link sets a cookie indicating that the login has occurred, which the second link then verifies before returning the content.
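As a rough illustration of pulling those two links out of the frame's HTML, here is a minimal sketch. The function name extractOdsLinks is made up for this example, and it assumes both links appear as absolute https://documents-dds-ny.un.org/... URLs somewhere in the markup; I haven't checked how the page actually embeds them, so the pattern may need adjusting.

```javascript
// Sketch: find the login URL and the PDF URL in the frame's HTML.
// Assumption: both are absolute URLs on documents-dds-ny.un.org;
// the real markup may differ, so treat the regex as a starting point.
function extractOdsLinks(html) {
  const urlPattern = /https:\/\/documents-dds-ny\.un\.org\/[^\s"'<>]+/g;
  // Undo HTML escaping of ampersands inside attribute values.
  const urls = (html.match(urlPattern) || []).map(u => u.replace(/&amp;/g, '&'));
  return {
    // The login link is recognized by its "?Login" query.
    loginUrl: urls.find(u => u.includes('?Login')) || null,
    // The document link is recognized by its ".pdf" path.
    pdfUrl: urls.find(u => /\.pdf(\?|$)/i.test(u)) || null,
  };
}
```

Either field comes back as null when no matching URL is found, which is a cheap way to notice that the page layout is not what this sketch expects.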
Things you could try:
A multi-step fetch, where you first get the content from undocs.org, parse it to get the link to the actual PDF, then log in and fetch the PDF. Google Apps Script would have to persist cookies between fetches, though.
Write your script in a different tool (such as Python).
Use a spider/crawler tool to navigate the UN site as if it was a real human.
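For the first option, the cookie handling in Apps Script might look roughly like the sketch below. UrlFetchApp does not keep cookies between calls, so the idea is to read Set-Cookie from the login response and send it back by hand. The helper names (mergeCookies, cookieHeader, fetchPdfAfterLogin) are invented for this example, and I have not tested whether the UN server accepts a request built this way; that it relies on a login cookie at all is only my presumption.

```javascript
// Merge the Set-Cookie header(s) of a response into a name -> value jar.
// getAllHeaders() gives a string for one value and an array for several.
function mergeCookies(jar, setCookie) {
  const lines = setCookie ? [].concat(setCookie) : [];
  lines.forEach(line => {
    const pair = line.split(';')[0]; // drop attributes such as Path or Expires
    const eq = pair.indexOf('=');
    if (eq > 0) jar[pair.slice(0, eq).trim()] = pair.slice(eq + 1).trim();
  });
  return jar;
}

// Serialize the jar into the value of a Cookie request header.
function cookieHeader(jar) {
  return Object.keys(jar).map(name => `${name}=${jar[name]}`).join('; ');
}

// Hit the login URL first, then request the PDF with the cookies it set.
// Only runs inside Apps Script, where UrlFetchApp is defined.
function fetchPdfAfterLogin(loginUrl, pdfUrl) {
  const jar = {};
  // Redirects are not followed so that cookies set on the first response
  // are visible in its headers.
  const login = UrlFetchApp.fetch(loginUrl, {
    followRedirects: false,
    muteHttpExceptions: true,
  });
  mergeCookies(jar, login.getAllHeaders()['Set-Cookie']);

  const pdf = UrlFetchApp.fetch(pdfUrl, {
    headers: { Cookie: cookieHeader(jar) },
    muteHttpExceptions: true,
  });
  if (pdf.getResponseCode() !== 200) {
    throw new Error('Unexpected status ' + pdf.getResponseCode());
  }
  return pdf.getBlob(); // e.g. pass to DriveApp.createFile() to save it
}
```

You would call fetchPdfAfterLogin with the two links found in the frame's HTML. If the server also sets cookies on the undocs.org or daccess-ods.un.org responses, the same jar could be carried through those fetches as well.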