BeautifulSoup and MechanicalSoup won't read website

I am working with BeautifulSoup and have also tried MechanicalSoup. Both load other websites fine, but when I request this particular website the request takes a long time and never actually returns the page. Any ideas would be super helpful.

Here is the BeautifulSoup code that I am writing:

import urllib3
from bs4 import BeautifulSoup as soup

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/?bb=hy89sjv-mN24znkgE'

http = urllib3.PoolManager()

# This request hangs for a long time and never returns for this site
r = http.request('GET', url)
page_soup = soup(r.data, 'html.parser')

Here is the MechanicalSoup code:

import mechanicalsoup

browser = mechanicalsoup.Browser()

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/'
# Same behavior here: the request stalls and never completes
page = browser.get(url)
print(page)

What I am trying to do is gather data on apartments in different cities, so the URL will change to 2-bedrooms and then 3-bedrooms, and then it will move on to a different city and do the same thing there, so I really need this part to work.
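For reference, here is a rough sketch of the loop I have in mind (the city list and URL pattern below are just placeholders; I have not confirmed the pattern holds for every city):

import requests
from bs4 import BeautifulSoup as soup

# Placeholder city slugs and bedroom counts -- not a confirmed URL scheme
cities = ['saratoga-springs-ut', 'lehi-ut']
bedrooms = ['1-bedrooms', '2-bedrooms', '3-bedrooms']

for city in cities:
    for beds in bedrooms:
        url = f'https://www.apartments.com/apartments/{city}/{beds}/'
        r = requests.get(url)
        page = soup(r.content, 'lxml')
        # ... extract listing data from `page` here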

Any help would be appreciated.

2 Answers

Accepted answer:
import requests
from bs4 import BeautifulSoup as soup

# Copy the default headers and replace the User-Agent with a real browser string
headers = requests.utils.default_headers()
headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
})

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/'

r = requests.get(url, headers=headers)

rContent = soup(r.content, 'lxml')

print(rContent)

Just as Tim said, I needed to add headers to my request so that it was not read as coming from a bot.
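The same idea should carry over to the MechanicalSoup code, since its Browser wraps a requests Session whose headers you can update. A minimal sketch (I have only checked this shape against the 1-bedroom URL above):

import mechanicalsoup

browser = mechanicalsoup.Browser()
# MechanicalSoup wraps a requests Session, so the same User-Agent fix applies
browser.session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
})

url = 'https://www.apartments.com/apartments/saratoga-springs-ut/1-bedrooms/'
page = browser.get(url)
print(page.soup.title)  # browser.get returns a response with a parsed .soup attribute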

Other answer:

You see the same thing if you use curl or wget to fetch the page. My guess is that they are using browser detection to try to prevent people from scraping their copyrighted information, which is what you are attempting to do. You can look up the User-Agent header to see how to make your request pretend to come from a regular browser.
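You can see why the site treats these requests differently by checking what a plain requests call identifies itself as by default (a quick generic check, not specific to this site):

import requests

# The default User-Agent identifies the client as a script, e.g. 'python-requests/2.x'
print(requests.utils.default_headers()['User-Agent'])

# Overriding it makes the request look like it comes from a regular browser
r = requests.get('https://www.apartments.com/', headers={
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36'
})
print(r.status_code)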