I want to crawl this page: http://data.eastmoney.com/hsgt/index.html
But when I look at the XHR requests in the browser's developer tools, none of them contain the data; there is only an EventStream. How can I crawl the complete information on the page?
For example, I want to extract -94.67亿元 (-9.467 billion yuan) from the page.
My code is below:
import requests
import pandas as pd
from pyquery import PyQuery
from lxml import etree
import time

# Fetch the page with a browser-like User-Agent
response = requests.get(
    url='http://data.eastmoney.com/hsgt/index.html',
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'},
)
response.encoding = 'GB2312'  # decode the page as GB2312

# This prints False: the figure is not in the downloaded HTML
print('-94.67' in response.text)
I then tried to install dryscrape, but the installation failed with an error saying I have no web server file.
Many thanks for the help.
As you mention, the XHR requests, which are issued by the JavaScript running in the client, aren't being executed. This is because the requests package doesn't execute JavaScript and doesn't try to mimic a web browser. You will need an alternative approach, and there are quite a few options. I'd suggest reading pages like the following for more context on the problem.
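One alternative, assuming the figure really is delivered through the EventStream you saw in the developer tools, is to request that stream directly and parse it yourself. The sketch below uses only the standard library: `parse_sse` follows the Server-Sent Events (`text/event-stream`) format, joining consecutive `data:` lines and emitting an event at each blank line, and `stream_events` opens the stream with `urllib`. Both function names are hypothetical, and the stream URL is not known here; you would have to copy it from the EventStream request in your browser's Network tab.

```python
import urllib.request
from typing import Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each Server-Sent Event.

    Consecutive "data:" lines are joined with newlines and a blank line
    ends the event. Comments (lines starting with ":") and other fields
    such as "event:" or "retry:" are ignored. An event that is cut off
    before its terminating blank line is discarded, as the format requires.
    """
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: dispatch the buffered event, if any
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if field == "data":
            data_lines.append(value)


def stream_events(url: str, timeout: float = 30.0) -> Iterator[str]:
    """Open a hypothetical event-stream URL and yield each event's data."""
    req = urllib.request.Request(
        url,
        headers={"Accept": "text/event-stream", "User-Agent": "Mozilla/5.0"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Iterating the response yields raw byte lines as they arrive
        yield from parse_sse(line.decode("utf-8") for line in resp)
```

Usage would be along the lines of `for payload in stream_events(STREAM_URL): print(payload)`, where `STREAM_URL` is the placeholder for the URL you copy from the browser. If the payloads turn out to be JSON, `json.loads(payload)` would turn each one into a dict you can search for the value.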
Additionally, you could look at something like dryscrape. I haven't used it myself, but it seems like something akin to this is what you are after. Have fun.
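If dryscrape won't install, a headless browser driven by Selenium is a commonly used substitute (swapped in here; it is not what the answer above names). The sketch below assumes Selenium 4 and a local Chrome installation, and assumes the figure appears in the rendered page as `<number>亿元`. `render_page` waits until the page's JavaScript has written such a figure into the DOM; `extract_yi_yuan` then pulls every amount out as a number in units of 亿元 (1亿 = 10^8 yuan). Both function names are hypothetical.

```python
import re
from typing import List

# Matches amounts such as "-94.67亿元"; the captured number is in units of 亿元 (10**8 yuan)
YI_YUAN = re.compile(r"(-?\d+(?:\.\d+)?)亿元")


def extract_yi_yuan(text: str) -> List[float]:
    """Return every "<number>亿元" amount found in text, in units of 亿元."""
    return [float(m) for m in YI_YUAN.findall(text)]


def render_page(url: str, wait_seconds: float = 10.0) -> str:
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported lazily so extract_yi_yuan works even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Block until at least one 亿元 figure is present (raises TimeoutException otherwise)
        WebDriverWait(driver, wait_seconds).until(
            lambda d: YI_YUAN.search(d.page_source) is not None
        )
        return driver.page_source
    finally:
        driver.quit()
```

A possible usage: `html = render_page('http://data.eastmoney.com/hsgt/index.html')` followed by `print(extract_yi_yuan(html))`. Because the browser decodes the page itself, the manual `GB2312` encoding step from the question is not needed on this path.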