I am trying to scrape data (price and brand) from this website. The code runs, but I can only see the output in my Sublime Text editor and cannot save it to a CSV file. Additionally, I get this error message:
AttributeError: 'NoneType' object has no attribute 'div'
Here is my code:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

PATH = "/Users/Ziye/Desktop/Python/chromedriver"


def get_html(url):
    driver = webdriver.Chrome(PATH)
    driver.get(url)
    return driver.page_source


def main():
    rows = []
    url = "https://www.yoox.com/de/damen/kleidung/shoponline/michael%20kors_md#/Md=403&d=10321&dept=clothingwomen&gender=D&page=2&season=X"
    html = get_html(url)
    soup = BeautifulSoup(html, "lxml")
    cards = soup.find_all("div", {"class": "col-8-24"})
    print(len(cards))
    for card in cards:
        print(card.find(class_="itemData text-center").div.get_text())
        print(card.find(class_="price").get_text())
        row = {'Brand': card.find(class_="brand font-bold text-uppercase").get_text(),
               'Price': card.find(class_="price").get_text()}
        rows.append(row)
    df = pd.DataFrame(rows)
    df.to_csv('file.csv', index=False)


if __name__ == "__main__":
    main()
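On the AttributeError: find() returns None when nothing matches, so card.find(class_="itemData text-center").div fails on any card that lacks that block, and the script stops before it reaches to_csv. Below is a minimal sketch of guarding against that. The HTML string is made up for illustration (the real page markup may differ), and text_or_none and parse_cards are hypothetical helper names, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second card has no "itemData" block at all.
HTML = """
<div class="col-8-24">
  <div class="itemData text-center"><div>MICHAEL KORS</div></div>
  <span class="price">EUR 120,00</span>
</div>
<div class="col-8-24">
  <span class="price">EUR 95,00</span>
</div>
"""


def text_or_none(node):
    """Return the stripped text of a tag, or None when nothing was found."""
    return node.get_text(strip=True) if node is not None else None


def parse_cards(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="col-8-24"):
        item = card.find(class_="itemData")
        # Only touch .div when the parent element was actually found.
        brand = text_or_none(item.div) if item is not None else None
        price = text_or_none(card.find(class_="price"))
        rows.append({"Brand": brand, "Price": price})
    return rows


rows = parse_cards(HTML)
print(rows)
```

Cards with a missing element then produce a None value (an empty cell in the CSV) instead of aborting the whole run.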
To get the items from the second page, replace "#" in the URL with "?":
Prints:
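A small sketch of that URL change (replacing "#" with "?"), using only the standard library. The likely reason it matters: everything after "#" is a URL fragment, which the client does not send to the server, so a plain HTTP request never carries page=2. Whether the site honors the same parameters as a query string is taken from the advice above, not verified here. The variable names are illustrative.

```python
from urllib.parse import urlsplit

url = (
    "https://www.yoox.com/de/damen/kleidung/shoponline/michael%20kors_md"
    "#/Md=403&d=10321&dept=clothingwomen&gender=D&page=2&season=X"
)

# Swap only the first "#" so the parameters become a query string.
page_url = url.replace("#", "?", 1)

before = urlsplit(url)
after = urlsplit(page_url)

# Before: the parameters sit in the fragment and stay on the client side.
print("query:", repr(before.query), "fragment:", repr(before.fragment))
# After: the same parameters are part of the request sent to the server.
print("query:", repr(after.query), "fragment:", repr(after.fragment))
```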
And creates data.csv (screenshot from LibreOffice):
EDIT: To get oldprice, newprice and fullprice into separate columns:
Prints:
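As a rough sketch of the separate-columns idea: look up each price kind on its own and leave the cell empty when a card does not have it. The markup below is invented for illustration, and it is an assumption that oldprice, newprice and fullprice are class names on the page. The csv module is used here only to keep the example self-contained; pd.DataFrame(rows).to_csv(...) as in the question should work the same way.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: one discounted item and one full-price item.
HTML = """
<div class="col-8-24">
  <div class="brand">MICHAEL KORS</div>
  <div class="price">
    <span class="oldprice">EUR 200,00</span>
    <span class="newprice">EUR 120,00</span>
  </div>
</div>
<div class="col-8-24">
  <div class="brand">MICHAEL KORS</div>
  <div class="price">
    <span class="fullprice">EUR 95,00</span>
  </div>
</div>
"""

PRICE_CLASSES = ("oldprice", "newprice", "fullprice")  # assumed class names


def text_or_none(node):
    """Return the stripped text of a tag, or None when nothing was found."""
    return node.get_text(strip=True) if node is not None else None


def price_rows(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="col-8-24"):
        row = {"brand": text_or_none(card.find(class_="brand"))}
        for name in PRICE_CLASSES:
            # A missing price kind becomes None, i.e. an empty CSV cell.
            row[name] = text_or_none(card.find(class_=name))
        rows.append(row)
    return rows


rows = price_rows(HTML)

# Write to an in-memory buffer; open("data.csv", "w", newline="") works too.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["brand", *PRICE_CLASSES], lineterminator="\n"
)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```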