BeautifulSoup and MechanicalSoup won't read website

I am working with BeautifulSoup and have also tried MechanicalSoup. Both load other websites fine, but when I request this particular website the request takes a long time and never actually returns the page. Any ideas would be super helpful.

Here is the BeautifulSoup code that I am writing:

import urllib3
from bs4 import BeautifulSoup as soup

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/?bb=hy89sjv-mN24znkgE'

http = urllib3.PoolManager()

# This request hangs for a long time and never returns for this site
r = http.request('GET', url)
page_soup = soup(r.data, 'html.parser')

Here is the MechanicalSoup code:

import mechanicalsoup

browser = mechanicalsoup.Browser()

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/'
# Same behavior here: the request stalls and never completes
page = browser.get(url)
print(page)

What I am trying to do is gather data on apartments in different cities, so the URL will change to 2-bedrooms and then 3-bedrooms, and then it will move on to a different city and do the same thing there, so I really need this part to work.
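For reference, here is a rough sketch of the loop I have in mind (the city list and URL pattern below are just placeholders; I have not confirmed the pattern holds for every city):

import requests
from bs4 import BeautifulSoup as soup

# Placeholder city slugs and bedroom counts -- not a confirmed URL scheme
cities = ['saratoga-springs-ut', 'lehi-ut']
bedrooms = ['1-bedrooms', '2-bedrooms', '3-bedrooms']

for city in cities:
    for beds in bedrooms:
        url = f'https://www.apartments.com/apartments/{city}/{beds}/'
        r = requests.get(url)
        page = soup(r.content, 'lxml')
        # ... extract listing data from `page` here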

Any help would be appreciated.

2 Answers

Accepted answer:
import requests
from bs4 import BeautifulSoup as soup

# Copy the default headers and replace the User-Agent with a real browser string
headers = requests.utils.default_headers()
headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
})

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/'

r = requests.get(url, headers=headers)

rContent = soup(r.content, 'lxml')

print(rContent)

Just as Tim said, I needed to add headers to my request so that it was not read as coming from a bot.
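The same idea should carry over to the MechanicalSoup code, since its Browser wraps a requests Session whose headers you can update. A minimal sketch (I have only checked this shape against the 1-bedroom URL above):

import mechanicalsoup

browser = mechanicalsoup.Browser()
# MechanicalSoup wraps a requests Session, so the same User-Agent fix applies
browser.session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
})

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/'
page = browser.get(url)
print(page.soup.title)  # browser.get returns a response with a parsed .soup attribute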

Other answer:

You see the same thing if you use curl or wget to fetch the page. My guess is that they are using browser detection to try to prevent people from scraping their copyrighted information, which is what you are attempting to do. You can look up the User-Agent header to see how to make your request pretend to come from a regular browser.
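You can see why the site treats these requests differently by checking what a plain requests call identifies itself as by default (a quick generic check, not specific to this site):

import requests

# The default User-Agent identifies the client as a script, e.g. 'python-requests/2.x'
print(requests.utils.default_headers()['User-Agent'])

# Overriding it makes the request look like it comes from a regular browser
r = requests.get('https://www.apartments.com/', headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
})
print(r.status_code)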