For web scraping and XML parsing, which is the best library to learn?

1k Views Asked by Harry Brar At 03 February 2020 at 10:25

I am getting confused by the multiple libraries that do the same job. I want to learn one library that handles both XML and HTML parsing. Is ElementTree suitable for HTML parsing? I have heard about lxml, xml.etree.ElementTree, Beautiful Soup, minidom, and Scrapy. Can anybody help me?


There is 1 best solution below

Milos K On 03 February 2020 at 10:44 BEST ANSWER

Scrapy is a framework for scraping web pages (crawling sites and extracting data from them), hence the name.

Beautiful Soup is a library for parsing and pulling data out of HTML and XML files.

xml.etree.ElementTree provides an object representation of an XML file; it is the XML processing module in Python's standard xml package. It is neat to use for parsing and manipulating data in XML format.
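A minimal sketch of what that looks like with the standard library, assuming a made-up sample document (the `XML_DOC` string and the helper names `book_titles` and `parses_as_xml` are hypothetical, not part of any library). It also touches on the question about HTML: ElementTree expects well-formed XML, so typical HTML with unclosed tags is rejected.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for illustration.
XML_DOC = """
<library>
  <book id="1"><title>Dive Into Python</title></book>
  <book id="2"><title>Fluent Python</title></book>
</library>
""".strip()


def book_titles(xml_text):
    """Return a dict mapping each book's id attribute to its title text."""
    root = ET.fromstring(xml_text)
    return {book.get("id"): book.findtext("title") for book in root.iter("book")}


def parses_as_xml(text):
    """Return True if ElementTree accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


titles = book_titles(XML_DOC)

# Real-world HTML often leaves tags such as <br> unclosed,
# which is not well-formed XML, so the strict parser refuses it.
html_ok = parses_as_xml("<html><body><p>Hello<br>world</p></body></html>")

print(titles)
print(html_ok)
```

Here `html_ok` ends up `False`, which is why a more forgiving HTML parser is usually the better fit for web pages.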

lxml is, as its authors claim, compatible with yet superior to the ElementTree module of the Python xml package, but it essentially does the same job. However, I have never used it for parsing HTML files.
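For comparison, a rough sketch of lxml handling both formats, assuming lxml is installed (`pip install lxml`). The helper names `xml_titles` and `html_hrefs` and the sample strings are hypothetical. `lxml.etree` follows the ElementTree style of API, and `lxml.html` adds a parser meant for HTML.

```python
from lxml import etree, html  # third-party package: pip install lxml


def xml_titles(xml_text):
    """Collect the text of every <title> element in an XML string."""
    root = etree.fromstring(xml_text)
    return [el.text for el in root.iter("title")]


def html_hrefs(html_text):
    """Collect the href attribute of every <a> element in an HTML string."""
    doc = html.fromstring(html_text)
    # XPath relative to the parsed root; the unclosed <br> is tolerated by the HTML parser.
    return list(doc.xpath(".//a/@href"))


# Hypothetical sample inputs, made up for illustration.
titles = xml_titles("<library><book><title>Fluent Python</title></book></library>")
hrefs = html_hrefs(
    "<html><body><p>See <a href='/a'>Alpha</a><br>"
    "and <a href='/b'>Beta</a></p></body></html>"
)

print(titles)
print(hrefs)
```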

In my experience, I have used Scrapy for fetching data from various user panels that did not have any kind of API for pulling the data. For parsing HTML files, however, I mostly used Beautiful Soup, as it is really neat and easy to use. For XML parsing I mostly used the Python xml package; I never had any complicated XML parsing to perform, so it covered everything I needed.

The right tool really depends on your requirements. If you need one library to parse both XML and HTML files, I would go with Beautiful Soup, as it is really easy to use and there is extensive documentation online.
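To give a feel for it, here is a small sketch of pulling links out of slightly messy HTML with Beautiful Soup, assuming the `beautifulsoup4` package is installed. The `HTML_DOC` sample and the `extract_links` helper are hypothetical. It uses Python's built-in `"html.parser"` backend so nothing else is required.

```python
from bs4 import BeautifulSoup  # third-party package: pip install beautifulsoup4

# Hypothetical sample page, made up for illustration. The <li> tags are
# deliberately left unclosed, as often happens in real pages.
HTML_DOC = """
<html><body>
  <h1>Products</h1>
  <ul>
    <li class="item"><a href="/a">Alpha</a>
    <li class="item"><a href="/b">Beta</a>
  </ul>
</body></html>
"""


def extract_links(html_text):
    """Return (link text, href) pairs for every <a> tag in the document."""
    soup = BeautifulSoup(html_text, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a")]


links = extract_links(HTML_DOC)
print(links)
```

For XML input, Beautiful Soup can be asked for its `"xml"` parser instead of `"html.parser"`; that option relies on lxml being installed.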
