Extract HTML content from GWT page


I want to parse the content of an HTML page built with GWT. I tried parsing it with the Jericho HTML parser, but the page source does not contain the content. After some research on GWT, I learned that GWT applications are written in Java, and that the GWT compiler turns the Java code into a complex set of JavaScript files that build the HTML content in the browser.

Is there a way I can parse this type of page?


There are 2 best solutions below

0
On

If the code was compiled in OBF (obfuscated) mode, which is the usual choice for production, reading it will be VERY difficult, because the generated JS files are not human-readable.

This link might help you understand the GWT compiler better.

EDIT:

Here you go. This might also be helpful: it describes how to de-obfuscate the JavaScript.

EDIT2:

GWT-Penetration-Testing-Toolset: check out this tool.

7
On

Just like with (m)any "single-page web app" (including e.g. Twitter, which is not built with GWT), you have to run the JavaScript code and then scrape the DOM.

This can be easily (everything's relative) done using HtmlUnit, PhantomJS or similar tools.
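As a rough illustration of the "run the JavaScript, then scrape the DOM" approach, here is a minimal Java sketch that drives PhantomJS as an external process and reads back the rendered markup. It assumes a `phantomjs` binary is on the PATH and Java 11 or later; the class name `RenderedDomFetcher`, its helper methods, and the fixed wait delay are made up for this example, and a real page may need a smarter wait condition than a timer.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class RenderedDomFetcher {

    // Builds a PhantomJS script that opens the URL passed as its first argument,
    // waits a fixed delay so the page's JavaScript can build the DOM, and then
    // prints the rendered markup. The delay is an assumption; tune it per page.
    public static String phantomScript(int waitMillis) {
        return String.join("\n",
            "var system = require('system');",
            "var page = require('webpage').create();",
            "page.open(system.args[1], function (status) {",
            "  if (status !== 'success') {",
            "    phantom.exit(1);",
            "  } else {",
            "    setTimeout(function () {",
            "      console.log(page.content);",
            "      phantom.exit(0);",
            "    }, " + waitMillis + ");",
            "  }",
            "});");
    }

    // Command line: phantomjs <script file> <url>
    public static List<String> command(String phantomBinary, Path script, String url) {
        return List.of(phantomBinary, script.toString(), url);
    }

    // Writes the script to a temp file, runs PhantomJS, and returns its stdout,
    // which is the DOM serialized after the page's scripts have run.
    public static String fetchRenderedHtml(String phantomBinary, String url, int waitMillis)
            throws IOException, InterruptedException {
        Path script = Files.createTempFile("render", ".js");
        try {
            Files.writeString(script, phantomScript(waitMillis));
            Process process = new ProcessBuilder(command(phantomBinary, script, url))
                    .redirectError(ProcessBuilder.Redirect.INHERIT)
                    .start();
            String html = new String(process.getInputStream().readAllBytes(),
                    StandardCharsets.UTF_8);
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException("phantomjs exited with code " + exitCode);
            }
            return html;
        } finally {
            Files.deleteIfExists(script);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java RenderedDomFetcher <url>");
            return;
        }
        // 3000 ms is an arbitrary wait chosen for illustration.
        System.out.println(fetchRenderedHtml("phantomjs", args[0], 3000));
    }
}
```

The returned string is ordinary HTML, so it can then be handed to a static parser such as Jericho. HtmlUnit would avoid the external process by running the page's scripts inside the JVM.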