Is it easier to scrape the AMP versions of webpages?

391 Views Asked by Guy4444 At 08 April 2019 at 00:47

I'm working on a web scraper that aggregates newspaper articles. I know the AMP protocol mandates a stripped-down version of JavaScript, and I also know that JavaScript (in part) enables website administrators to detect and prevent scraping. So, logically, I figured it would be easier to scrape AMP websites. On the other hand, if this were true, I presume StackOverflow would be on top of it, but I haven't found a single thread confirming my inference. Am I correct, or am I overlooking something?


There is 1 best solution below

Haddock-san On 08 April 2019 at 20:16

I would say that AMP pages are definitely easier to scrape, because there is virtually no custom JS code. Many regular sites insert content with JS or AJAX, so that content is not in the HTML the server first sends. AMP restricts which libraries a page can use, so an AMP page loads far fewer of them than a regular site.
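To get at the AMP version in the first place: a regular article page that has an AMP counterpart normally advertises it with a `<link rel="amphtml" href="...">` tag in its head. Here is a rough sketch of picking that up with only the Python standard library. The names `AmpLinkFinder`, `find_amp_url` and `SAMPLE_HTML`, and the `news.example.com` URLs, are made up for illustration, and the sample markup stands in for a page you would have downloaded yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AmpLinkFinder(HTMLParser):
    """Records the href of the first <link rel="amphtml"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.amp_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.amp_href is not None:
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens; attribute values can be None
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "amphtml" in rel_values:
            self.amp_href = attr_map.get("href")


def find_amp_url(page_html, page_url):
    """Return the absolute AMP URL advertised by the page, or None if there is none."""
    finder = AmpLinkFinder()
    finder.feed(page_html)
    if finder.amp_href is None:
        return None
    # The href may be relative, so resolve it against the page's own URL
    return urljoin(page_url, finder.amp_href)


# Hypothetical markup standing in for a downloaded article page
SAMPLE_HTML = """
<html><head>
<link rel="canonical" href="https://news.example.com/story">
<link rel="amphtml" href="/amp/story">
</head><body></body></html>
"""

if __name__ == "__main__":
    print(find_amp_url(SAMPLE_HTML, "https://news.example.com/story"))
```

If the function returns `None`, the page simply does not advertise an AMP version and you would fall back to scraping the regular page.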

Furthermore, if you want to scrape content that is rendered by JavaScript, you can use Selenium. If not, PHP is the way to go (IMHO), or BeautifulSoup in Python.
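For the no-JavaScript case, a minimal sketch of what the parsing step might look like. I've used the standard library's `html.parser` here so it runs without installing anything; BeautifulSoup would do the same job with less code. `ParagraphExtractor`, `extract_paragraphs` and `SAMPLE_AMP_HTML` are hypothetical names, and the sample markup is invented, not taken from a real site.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the text inside each <p> element, ignoring <script>/<style> content."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None   # None means we are not inside a <p>
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "p" and self._buffer is not None:
            # Join the text fragments and collapse runs of whitespace
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None and self._skip_depth == 0:
            self._buffer.append(data)


def extract_paragraphs(page_html):
    """Return the paragraph texts found in already-downloaded HTML."""
    extractor = ParagraphExtractor()
    extractor.feed(page_html)
    return extractor.paragraphs


# Invented AMP-style markup; the article text is already in the static HTML
SAMPLE_AMP_HTML = """
<!doctype html>
<html amp><head><title>Story</title></head>
<body>
<h1>Headline</h1>
<p>First paragraph with <a href="#">a link</a>.</p>
<amp-img src="photo.jpg" width="600" height="400"></amp-img>
<p>Second   paragraph.</p>
</body></html>
"""

if __name__ == "__main__":
    for paragraph in extract_paragraphs(SAMPLE_AMP_HTML):
        print(paragraph)
```

The point is that no browser is involved: because the article text is in the markup itself, a plain HTTP fetch plus an HTML parser is enough.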

Happy scraping!
