How do I scrape the data from the Google Docs table on this web page?


I'm trying to use Python to scrape the data from the table on this web page.

http://www.dividendyieldhunter.com/exchanged-traded-debt-issues-sorted-alphabetically/

I tried using requests and bs4. I get the raw HTML, but the table data does not appear in it. What should I try?

Best answer

That page loads the table data from a URL in an iframe, declared in this markup:

<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0" style="display: block; width: 100%; height: 100%;"></iframe>

You need to make a second request for the HTML at the URL given in the src attribute:

https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0
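If you would rather not copy the URL by hand, here is a minimal sketch of pulling it out with bs4. The helper name iframe_sources and the EXAMPLE_ID placeholder are made up for illustration, and it assumes the iframe tag is present in the HTML you downloaded, which may not hold if the page builds it with JavaScript.

```python
from bs4 import BeautifulSoup


def iframe_sources(page_html):
    """Return the src attribute of every iframe in the given HTML."""
    soup = BeautifulSoup(page_html, "html.parser")
    # BeautifulSoup decodes entities in attribute values,
    # so "&amp;" in the markup comes back as a plain "&".
    return [frame["src"] for frame in soup.find_all("iframe", src=True)]


# Sample markup with a placeholder spreadsheet ID (not a real document).
sample = (
    '<iframe id="pageswitcher-content" '
    'src="https://docs.google.com/spreadsheets/d/EXAMPLE_ID/pubhtml/sheet'
    '?headers=false&amp;gid=0"></iframe>'
)
print(iframe_sources(sample))
```

In practice you would pass in the text of the first response, e.g. iframe_sources(requests.get(page_url).text), where page_url is the address of the page from the question.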

Then you could scrape the table with the class="waffle".
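A rough sketch of that step with bs4 might look like the following. The function name waffle_rows and the sample markup are invented for illustration. It assumes the data sits in td cells and that any th cells are only row or column labels, so check that against the real sheet.

```python
from bs4 import BeautifulSoup


def waffle_rows(sheet_html):
    """Return the cell text of each row in the table with class "waffle"."""
    soup = BeautifulSoup(sheet_html, "html.parser")
    table = soup.find("table", class_="waffle")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Assumption: data lives in <td>; <th> cells are skipped as labels.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # ignore rows that contain no data cells
            rows.append(cells)
    return rows


# Made-up sample markup, used only to show the shape of the result.
sample = """
<table class="waffle">
  <tr><th>1</th><td>Name</td><td>Ticker</td></tr>
  <tr><th>2</th><td>Example Notes</td><td>EXMP</td></tr>
</table>
"""
print(waffle_rows(sample))
```

You would call it on the text of the second response, e.g. waffle_rows(res.text), and each inner list is one spreadsheet row.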

NOTE: Take care with the query parameters if you copy the URL straight from the raw HTML.

The &amp; near the end is an HTML entity. It must be converted to a single & character, otherwise requests will not fetch the right URL. For example:

import requests

# Note the plain "&" before gid, not "&amp;".
url = "https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0"
res = requests.get(url)
print(res.text)
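If you end up with the escaped form of the URL in a string, the standard library can do the conversion for you instead of editing it by hand. A small sketch, where the variable names raw_src and url are just for illustration:

```python
from html import unescape

# The src value exactly as it appears in the raw HTML, with "&amp;".
raw_src = (
    "https://docs.google.com/spreadsheets/d/"
    "1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw"
    "/pubhtml/sheet?headers=false&amp;gid=0"
)

# unescape() turns HTML entities such as "&amp;" back into plain characters.
url = unescape(raw_src)
print(url)
```

The resulting url can be passed straight to requests.get().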