I am new to scraping and still learning. Yesterday I was able to scrape Craigslist with Beautiful Soup; today I am unable to.
Here is my code to scrape the first page of rental housing search results on CL.
from requests import get
from bs4 import BeautifulSoup
# Get the first page of the San Diego housing listings
url = 'https://sandiego.craigslist.org/search/apa?hasPic=1&availabilityMode=0&sale_date=all+dates'
response = get(url)  # the URL excludes posts with no pictures
html_soup = BeautifulSoup(response.text, 'html.parser')
# Get the macro-container for the housing posts
posts = html_soup.find_all('li', class_="result-row")
print(type(posts))  # to double-check that I got a ResultSet
print(len(posts))  # to double-check that I got 120 (elements per page)
The contents of html_soup do not match what I see when I open the URL in a browser. Instead, the returned HTML contains the following:
<script>
window.cl.specialCurtainMessages = {
unsupportedBrowser: [
"We've detected you are using a browser that is missing critical features.",
"Please visit craigslist from a modern browser."
],
unrecoverableError: [
"There was an error loading the page."
]
};
</script>
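To narrow down what is coming back, here is a minimal diagnostic sketch. It only checks whether the returned HTML contains any 'result-row' list items or the specialCurtainMessages script quoted above. The names RowCounter and diagnose are hypothetical helpers made up for this post, and the labels it returns are guesses at what the page is, not a confirmed explanation. It uses only the standard library, so it can also be run on HTML saved to a file.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <li> tags whose class list includes 'result-row'."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        # The class attribute may be missing or empty, so fall back to "".
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "result-row" in classes:
            self.rows += 1


def diagnose(html):
    """Return a short label describing what the HTML appears to contain."""
    counter = RowCounter()
    counter.feed(html)
    if counter.rows:
        return f"listings: {counter.rows} result rows"
    if "specialCurtainMessages" in html:
        # Assumption: the page expects a browser to run JavaScript.
        return "javascript shell: no result rows in the static HTML"
    return "unknown: no result rows and no curtain script"


# Usage with the response from the code above (needs network access):
# print(response.status_code)
# print(diagnose(response.text))
```

If this reports the "javascript shell" case, the listings are probably not in the raw HTML that requests receives, which would be a different situation from an outright block. That is an inference from the script text, not something I have confirmed.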
Any help would be much appreciated.
I am not sure whether I have been 'blocked' from scraping somehow. I read an article about proxies and rotating IP addresses, but I do not want to break any rules if I have been blocked, and I also do not want to spend money on this. Is scraping Craigslist not allowed? I have seen so many educational tutorials on it that I thought it was okay.