I built this web scraper to collect item names and their lowest prices from the Steam Community Market. It works, but it only scrapes the first page (which is fine; I'm working on that). What is interesting is that when I change the URL from http://steamcommunity.com/market/search?q= to http://steamcommunity.com/market/search?q=#p2 (the URL for the second page of items), I get exactly the same output: the items from the first page. Any help would be appreciated.
Here is the full code:
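import urllib2
from bs4 import BeautifulSoup

page_num = 1  # page counter, not used yet (pagination is still to do)
url = 'http://steamcommunity.com/market/search?q='

# Download the results page and parse it with the stdlib HTML parser
open_url = urllib2.urlopen(url).read()
market_page = BeautifulSoup(open_url, 'html.parser')

# Each search result is a div with exactly these classes
for i in market_page('div', {'class' : 'market_listing_row market_recent_listing_row market_listing_searchresult'}):
    item_name = i.find_all('span', {'class' : 'market_listing_item_name'})[0].get_text()
    price = i.find_all('span')[1].get_text()  # the second <span> in the row holds the price
    print item_name + ' costs ' + price

page_num += 1  # advance once per page, not once per item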
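The `#p2` part of the URL is a fragment. Fragments are never sent to the server; only JavaScript running in the page reads it, so `urlopen` downloads the first page both times. To see how the page actually loads later results, inspect the requests it makes in the network tab of Chrome's or Firefox's developer tools. It looks like the correct endpoint and parameters are something like this: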
http://steamcommunity.com/market/search?query=&start=10&count=10&search_descriptions=0&sort_column=quantity&sort_dir=desc
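A minimal Python 3 sketch of both points (in Python 3, `urllib2` was split into `urllib.request` and `urllib.parse`). `request_target` uses `urlsplit` to show that `#p2` ends up in the fragment, which is not part of the path and query an HTTP client sends, so both URLs from the question request the same thing. `page_url` rebuilds the URL above from `start`/`count` offsets. The helper names, `PAGE_SIZE`, and the 1-based page numbering are my own assumptions, and the parameters are an undocumented interface observed in the browser that Steam may change at any time.

```python
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

# Endpoint and parameters copied from the URL observed in the browser
# (assumption: undocumented, may change).
BASE_URL = "http://steamcommunity.com/market/search"
PAGE_SIZE = 10  # assumed to match the count=10 seen in the browser


def request_target(url):
    """Return the path and query an HTTP client sends; the #fragment is dropped."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return target


def page_url(page_num, page_size=PAGE_SIZE):
    """Build the URL for a 1-based page number using start/count offsets."""
    params = {
        "query": "",
        "start": (page_num - 1) * page_size,  # offset of the first item on this page
        "count": page_size,
        "search_descriptions": 0,
        "sort_column": "quantity",
        "sort_dir": "desc",
    }
    return BASE_URL + "?" + urlencode(params)


def fetch_page(page_num):
    """Download one page of results as text (requires network access)."""
    with urlopen(page_url(page_num)) as response:
        return response.read().decode("utf-8")
```

To cover several pages, call `fetch_page(n)` for `n = 1, 2, 3, ...` and run each result through the same BeautifulSoup loop as in the question.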