I'm trying to copy a website for offline viewing without any dependencies.
I want to copy the HTML without <script> tags (inline JavaScript specifically) and without external scripts (.js files).
I have tried wget with --ignore-tags and HTTrack, but neither worked as expected: the scripts are still copied in full.
Calling Chrome in headless mode, e.g.
chrome --headless --disable-gpu --dump-dom https://www.chromestatus.com/
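will dump the HTML of the rendered DOM, i.e. the markup as it stands after the page's JavaScript has run, so the saved content no longer depends on executing it. Any <script> elements that remain in that output still have to be stripped separately.
This post describes how a crawler was built using Headless Chrome and Puppeteer.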
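
If <script> elements are still present in the dumped HTML, they can be removed in a post-processing step. Below is a minimal sketch in Python using only the standard library's html.parser; the names ScriptStripper and strip_scripts are made up for this illustration and are not part of Chrome, wget or HTTrack. It only drops <script> elements (inline and external) and does not touch inline event handlers such as onclick attributes.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script> elements and their contents."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as written.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_script = False

    def _emit(self, text):
        # Append text to the output unless we are inside a <script> element.
        if not self.in_script:
            self.out.append(text)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        else:
            # get_starttag_text() returns the start tag as it appeared in the input.
            self._emit(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>; a self-closing <script/> is dropped.
        if tag != "script":
            self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def strip_scripts(html: str) -> str:
    """Return the given HTML with all <script> elements removed."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

To use it, save the output of the chrome command above to a file, read it into a string, pass it to strip_scripts, and write the result back out. For messy real-world markup a full HTML parsing library may be more robust than this sketch.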