I have been writing a piece of code that retrieves a list of items and their corresponding prices from the Steam Marketplace (for the game Unturned). I am using BeautifulSoup (bs4) and the requests library. This is my code so far:
import requests
from bs4 import BeautifulSoup

items = []
prices = []

for page_num in range(1, 10):
    website = 'http://steamcommunity.com/market/search?appid=304930#p' + str(page_num) + '_popular_desc'
    r = requests.get(website)
    doc = r.text.split('\n')
    soup = BeautifulSoup(''.join(doc), "html.parser")
    # Collect the item names found on this page
    names = soup.findAll("span", {"class": "market_listing_item_name"})
    for item in range(len(names)):
        items.append(names[item].contents[0])
    # Collect the matching prices
    costs = soup.findAll("span", {"class": "normal_price"})
    for cost in range(len(costs)):
        prices.append(costs[cost].contents[0])
Expected Output:
Festive Gift Present : $0.32 USD
Halloween Gift Present : $0.26 USD
Carbon Fiber Mystery Box : $0.47 USD
Festive Hat : $1.67 USD
Nuclear Matamorez : $0.39 USD
... and so on
The problem with this code is that it only gets the names from the first page. If I type the URL manually with different numbers in place of page_num, the page changes, and so does the HTML document. However, the code doesn't seem to get the results from the second page onward. requests is fetching the correct URL each time, but the HTML it returns is the same?
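Pages 2, 3, etc. are requested via AJAX (or similar), so their source isn't present when you first load the page. The part of the URL after # is a fragment that only the page's JavaScript reads; it is never sent to the server, so requests receives the first page every time. To get around this, we can sniff the AJAX URL and parse its response directly, which in this case is JSON-encoded.
PS: Steam will temporarily ban your IP after ~50 requests.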
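The AJAX approach could look roughly like the sketch below. Treat it as an illustration rather than a tested scraper: the endpoint path (market/search/render/), its query parameters, and the "results_html" key are assumptions based on what the browser's network tab typically shows for this page, and the helper names (page_params, parse_listings, fetch_page, scrape_pages) are made up for this example. The class names are the ones from the question.

```python
import time

import requests
from bs4 import BeautifulSoup

# Assumed endpoint: the URL the market page appears to call for each page.
RENDER_URL = "https://steamcommunity.com/market/search/render/"
PAGE_SIZE = 10  # listings per request (assumed to match the site's page size)


def page_params(page_num, appid=304930):
    """Build the query string parameters for a 1-based page number."""
    return {
        "query": "",
        "start": (page_num - 1) * PAGE_SIZE,  # offset of the first listing
        "count": PAGE_SIZE,
        "search_descriptions": 0,
        "sort_column": "popular",
        "sort_dir": "desc",
        "appid": appid,
    }


def parse_listings(results_html):
    """Extract (name, price) pairs from an HTML fragment of listings."""
    soup = BeautifulSoup(results_html, "html.parser")
    names = soup.find_all("span", class_="market_listing_item_name")
    # Price spans may be nested inside each other, so keep only the innermost.
    costs = [
        span
        for span in soup.find_all("span", class_="normal_price")
        if span.find("span", class_="normal_price") is None
    ]
    return [
        (name.get_text(strip=True), cost.get_text(strip=True))
        for name, cost in zip(names, costs)
    ]


def fetch_page(page_num):
    """Request one page of results and parse it."""
    r = requests.get(RENDER_URL, params=page_params(page_num), timeout=10)
    r.raise_for_status()
    # Assumption: the listing markup is stored under the "results_html" key.
    return parse_listings(r.json()["results_html"])


def scrape_pages(first, last, delay=2.0):
    """Yield (name, price) pairs for pages first..last, pausing between requests."""
    for page_num in range(first, last + 1):
        yield from fetch_page(page_num)
        time.sleep(delay)  # slow down to reduce the risk of a temporary ban
```

To print the pairs in the format from the question, something like `for name, price in scrape_pages(1, 3): print(name, ":", price)` should do. Keeping the parsing separate from the request makes it easy to check the selectors against a saved response without hitting Steam again.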