How do I get complete web page code at once?


I am trying to crawl a GitHub commit page to do some analysis. The page is here:

YARN-8569

However, the page contains two elements with the class "js-diff-progressive-container", and each one has many child elements. See below:

html page snapshot

When I use urllib2.Request() and urllib2.urlopen() to fetch the HTML and BeautifulSoup to parse it, I only get the first "js-diff-progressive-container" element and its children. In place of the second one I get an element whose class is "js-diff-progressive-retry". The parsing code is here:

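changed_class = []  # data-path values of the changed files
# soup is the BeautifulSoup object built from the fetched page
for tag in soup.find_all('div', class_='js-diff-progressive-container'):
    print(1)  # marks each container that was found
    for div in tag.find_all('div'):
        div_id = div.get('id')
        if div_id:
            parts = div_id.split('-')
            print(parts)
            if parts[0] == 'diff':
                # the first nested div carries the file path
                div2 = div.find_all('div')
                if div2 and div2[0].get('data-path'):
                    changed_class.append(div2[0].get('data-path'))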

Someone told me that I cannot get all of the HTML at once because the second element is loaded dynamically. How can I get the complete HTML of the page?
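One possible direction, sketched under assumptions rather than verified against GitHub: if the part that is loaded later is referenced from the first response by an `include-fragment` element with a `src` attribute, that URL could be requested separately and parsed the same way. The element name, the `src` attribute, and the helper names `DiffPageParser`, `parse_chunk`, and `collect_paths` are all assumptions or hypothetical names, not taken from the page above. The sketch uses only the Python 3 standard library and, as a simplification of the parsing code above, collects every `div` that has a `data-path` attribute.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DiffPageParser(HTMLParser):
    """Collects changed file paths and URLs of not-yet-loaded fragments."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.paths = []          # data-path values found in this chunk
        self.fragment_urls = []  # absolute URLs of deferred fragments

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and attrs.get("data-path"):
            self.paths.append(attrs["data-path"])
        # ASSUMPTION: deferred content appears as <include-fragment src="...">
        if tag == "include-fragment" and attrs.get("src"):
            self.fragment_urls.append(urljoin(self.base_url, attrs["src"]))


def parse_chunk(html, base_url):
    """Parse one piece of HTML and return (paths, fragment_urls)."""
    parser = DiffPageParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.paths, parser.fragment_urls


def collect_paths(start_url, fetch, max_requests=20):
    """Follow deferred fragments until none are left.

    fetch is any callable that takes a URL and returns the HTML as text.
    max_requests caps the number of pages fetched.
    """
    pending, seen, paths = [start_url], set(), []
    while pending and len(seen) < max_requests:
        url = pending.pop(0)
        if url in seen:
            continue
        seen.add(url)
        chunk_paths, fragment_urls = parse_chunk(fetch(url), url)
        paths.extend(chunk_paths)
        pending.extend(fragment_urls)
    return paths
```

Here `fetch` could be a thin wrapper such as `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`. If the page does not expose the remaining content through such a URL, rendering the page in a real browser engine before parsing would be the usual fallback.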
