I want to download a webpage, say http://www.stackoverflow.com, with Node.js, so that I have an offline copy of the static page. The tool has to download the page's resources (styles, JavaScript files, images, etc.) and rewrite the references so they point to the local copies.
In any case, I want an offline page that, once opened, looks exactly like the real page, just like what happens when I choose File -> Save in a web browser.
Basically I want to replicate the function of
wget --page-requisites
(Although this does not download CSS and images properly.)
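The fullest invocation I'm aware of adds a few more flags; --span-hosts matters because assets often live on other domains, and --convert-links is what is supposed to rewrite the references to local paths:

wget --page-requisites --convert-links --adjust-extension --span-hosts http://www.stackoverflow.com

Even then, the result is not always a faithful offline copy.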
The background is that I want to execute JavaScript on an external website. This is (rightly) not possible due to cross-domain policies. To work around this, I just want to download the website, statically host it myself, run my JavaScript analysis code against it, and then delete it.
I'm sort of spitballing a solution that could work for this:
A package like jsdom could be used to grab the source URLs of all the page's script, link, img, etc. elements. You could then GET and save each of those resources to your local environment and replace their src/href attributes with new URLs that point to your local copies. Then you could stringify the resulting HTML, save that as well, and serve the containing directory statically in Node.
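A minimal sketch of that idea, assuming Node 18+ (for the global fetch) and the jsdom package; the helper name saveAsset and the output directory are my own placeholders, and it deliberately ignores filename collisions and failed requests:

```js
const fs = require('fs/promises');
const path = require('path');
const { JSDOM } = require('jsdom');

const PAGE_URL = 'http://www.stackoverflow.com';
const OUT_DIR = './mirror';

// Fetch one resource and return the local filename it was saved under.
async function saveAsset(assetUrl) {
  const res = await fetch(assetUrl);
  const buf = Buffer.from(await res.arrayBuffer());
  // Naive naming: last path segment; real code should handle collisions.
  const name = path.basename(new URL(assetUrl).pathname) || 'asset';
  await fs.writeFile(path.join(OUT_DIR, name), buf);
  return name;
}

async function mirror() {
  await fs.mkdir(OUT_DIR, { recursive: true });

  // JSDOM.fromURL fetches the page and sets its base URL, so the src/href
  // properties below come back already resolved to absolute URLs.
  const dom = await JSDOM.fromURL(PAGE_URL);
  const doc = dom.window.document;

  // Rewrite script/img src and stylesheet href to the local copies.
  for (const el of doc.querySelectorAll('script[src], img[src]')) {
    el.src = await saveAsset(el.src);
  }
  for (const el of doc.querySelectorAll('link[rel="stylesheet"][href]')) {
    el.href = await saveAsset(el.href);
  }

  // Stringify the rewritten document and save it next to the assets.
  await fs.writeFile(path.join(OUT_DIR, 'index.html'), dom.serialize());
}

mirror().catch(console.error);
```

After running it, ./mirror should hold an index.html plus its assets, ready to be served by any static file server.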
Maybe just running
wget --page-requisites
from within Node is the easiest solution? Something like the sketch below, for example.
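A minimal way to shell out, assuming wget is installed and on the PATH, using only the built-in child_process module:

```js
const { execFile } = require('child_process');

// Run wget with the flags discussed above; files land in the working directory.
execFile(
  'wget',
  ['--page-requisites', '--convert-links', '--span-hosts', 'http://www.stackoverflow.com'],
  (err) => {
    if (err) return console.error('wget failed:', err);
    console.log('page saved');
  }
);
```

I'll be interested to know what the final solution to this is. Hopefully something I said helps.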