Chrome extension - get Html DOM before load js on browser


I'm developing a Chrome extension that needs to block the loading of an HTML page, run some validations in my content script on the JavaScript that comes with the page, and then proceed (or not) with loading the page.

With "run_at": "document_start" in my manifest, the content script gets an empty HTML document and can't do the validation. With "run_at": "document_end", the page has already executed the JavaScript it came with, and my extension only validates it after the fact.

Is there a way to hook something like a DOMContentBeforeLoad event in my content script? I'm really out of options.

Thanks

2

There are 2 best solutions below

8
Brian On

I think that to do this you will have to keep "run_at" set to document_start, as you already tried, then load the HTML page yourself via an Ajax call and parse it yourself.
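A minimal sketch of that idea follows. It is not a tested extension: `extractScripts`, `isScriptAllowed`, `validatePage` and `checkCurrentPage` are hypothetical names, the policy (HTTPS-only external scripts, no `eval(` in inline code) is only a placeholder, and the regex-based extraction is deliberately naive. Fetching the page a second time also assumes the server returns the same HTML for both requests.

```javascript
// Naive extraction of <script> elements from raw HTML text.
// A regex is used only to keep the sketch short; a real HTML parser
// handles edge cases (comments, odd attribute quoting) that this does not.
function extractScripts(html) {
  const scripts = [];
  const re = /<script\b([^>]*)>([\s\S]*?)<\/script\s*>/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    const srcMatch = /\bsrc\s*=\s*["']([^"']*)["']/i.exec(m[1]);
    scripts.push({ src: srcMatch ? srcMatch[1] : null, code: m[2] });
  }
  return scripts;
}

// Hypothetical policy: replace with whatever validation the extension needs.
function isScriptAllowed(script) {
  if (script.src !== null) {
    return script.src.startsWith("https://");
  }
  return !/\beval\s*\(/.test(script.code);
}

// True only if every script found in the HTML text passes the policy.
function validatePage(html) {
  return extractScripts(html).every(isScriptAllowed);
}

// Browser-only part, meant to be called from a content script that runs
// at document_start: re-fetch the current URL as text and validate it.
function checkCurrentPage() {
  return fetch(location.href)
    .then((response) => response.text())
    .then((html) => validatePage(html));
}
```

The content script would call `checkCurrentPage()` and decide what to do with the resulting boolean. How the original page load is held back in the meantime is left open here.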

Browsers typically don't load all the scripts first and then execute them. Loading and execution are interleaved, in the order the scripts appear in the page, so there is no point you can hook at which all the JavaScript has loaded but none of it has executed (unless you control the content of the page as well).

0
Tim Perry On

Take a look at how TopLevel.js works: https://github.com/kristopolous/TopLevel (interesting source at https://github.com/kristopolous/TopLevel/blob/master/toplevel.js)

It's a library you explicitly include in your page. When it is reached in the page and run, it immediately document.write()'s a <plaintext> element with style='display: none'. That stops the browser from parsing the rest of the page at all and hides the resulting plain text. (plaintext is a deprecated element that stops the browser interpreting page content and treats all the following HTML as unparsed plain text: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/plaintext)

TopLevel then parses the text content of the <plaintext> element itself (and does some templating, which is the point of the library), and document.write()'s the resulting new content to the page by hand.

You should be able to do something similar: inject a <plaintext> element to stop the page being parsed by the browser, parse it yourself (or do whatever you want with it), and then potentially write out whatever you like (including the original content) to the page once you're happy.
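A rough sketch of that approach follows. It is not TopLevel's actual code: `PLAINTEXT_GUARD`, `filterScripts` and `installGuard` are hypothetical names, and the `isAllowed` predicate stands in for your own validation. It assumes `installGuard` is called from a script that runs while the page is still being parsed, the way TopLevel is included. Whether a content script can call document.write() at the right moment is an assumption you would need to verify.

```javascript
// Markup that, once written into the page during parsing, makes the browser
// treat everything after it as hidden plain text instead of HTML.
const PLAINTEXT_GUARD = '<plaintext style="display: none">';

// Drop every <script> element that the caller-supplied predicate rejects.
// The regex is naive and only meant to illustrate the filtering step.
function filterScripts(rawHtml, isAllowed) {
  return rawHtml.replace(
    /<script\b[^>]*>[\s\S]*?<\/script\s*>/gi,
    (tag) => (isAllowed(tag) ? tag : "")
  );
}

// Browser-only part: stop parsing, capture the raw source that follows,
// filter it, and write the result back as the new document.
function installGuard(isAllowed) {
  document.write(PLAINTEXT_GUARD);
  document.addEventListener("DOMContentLoaded", () => {
    // Holds only the source that came after the guard was written.
    const captured = document.querySelector("plaintext");
    const cleaned = filterScripts(captured.textContent, isAllowed);
    document.open(); // discards the current document
    document.write(cleaned);
    document.close();
  });
}
```

For example, `installGuard((tag) => !tag.includes("eval("))` would rewrite the page without any script element whose text contains `eval(`. Anything the browser parsed before the guard was written is not part of the captured text.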