Python: Can't get urllib2 to correctly read a webpage


I am trying to get the full webpage of

'http://www.bloomberg.com/markets/economic-calendar/'

but for some reason I cannot find the function which will return the links as strings. I would like to transform all announcements on that Bloomberg page into a CSV file but I am not sure how. The CSV file would contain things like:

Mon 12.2 Gallup US Consumer Spending Measure [Report][Bullet]8:30 AM ET

Ben Bernanke Speaks 8:30 AM ET

PMI Manufacturing Index [Report][djStar]8:58 AM ET

ISM Mfg Index [Report][Star]10:00 AM ET

Construction Spending [Report][djStar]10:00 AM ET

Construction Spending [Report][djStar]10:00 AM ET

4-Week Bill Announcement [Report][Bullet]11:00 AM ET

(which was just a copy and paste from the website).

What is the best way or best library to use?


There is 1 solution below


Since you are asking how to approach web scraping, you should look into the following, in the given order:

  1. URL retrieval (i.e. reading a web page given its URL) [see the urllib library]
  2. HTML parsing (making sense of the HTML and quickly accessing the required content) [see Beautiful Soup 4]
  3. Processing the acquired data and, in your case, dumping it to a CSV file [see the csv library]
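
As a rough illustration of those three steps, here is a minimal sketch in Python 3, where urllib2 has become urllib.request. It makes some assumptions you will need to check: I have not inspected the Bloomberg page's markup, so the sketch simply assumes the announcements sit in ordinary table rows (`<tr>`/`<td>`). The names `fetch_html`, `TableRowParser`, `extract_rows` and `write_csv` are made up for this example. To keep it dependency-free, it uses the standard library's `html.parser` in place of Beautiful Soup; with Beautiful Soup installed, the parser class could be replaced by a few `find_all` calls.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url):
    """Step 1: download the page and return it as text."""
    # Some sites reject Python's default user agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TableRowParser(HTMLParser):
    """Step 2: collect the text of every <td>/<th> cell, grouped by <tr>.

    Assumption: the data of interest lives in HTML table rows.
    """

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def write_csv(rows, out_file):
    """Step 3: write the rows to an already-open text file."""
    csv.writer(out_file).writerows(rows)


# Offline demonstration on a made-up HTML snippet.
sample_html = """
<table>
  <tr><td>ISM Mfg Index</td><td>10:00 AM ET</td></tr>
  <tr><td>Construction Spending</td><td>10:00 AM ET</td></tr>
</table>
"""
rows = extract_rows(sample_html)
buffer = io.StringIO()
write_csv(rows, buffer)
csv_text = buffer.getvalue()

# For the real page you would do something like:
# html = fetch_html("http://www.bloomberg.com/markets/economic-calendar/")
# with open("calendar.csv", "w", newline="") as f:
#     write_csv(extract_rows(html), f)
```

If the page builds its calendar with JavaScript or uses `<div>` elements instead of a table, the parsing step will need to be adapted to whatever markup you actually find in the downloaded HTML.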