I am trying to get the origin of this product with BeautifulSoup. I am trying to select the div that contains the product data, but I can't.
Later I tried to select another div, one of the first in the page, and had the same problem. Then I ran prettify and the div I am looking for didn't even appear in the output. How can I get this data?
Here is the code I tried:
import urllib.request
from bs4 import BeautifulSoup
urlpage = 'https://www.esselungaacasa.it/ecommerce/nav/auth/supermercato/home.html?freevisit=true#!/negozio/prodotto/5397031?productCode=417932&productType=GROCERY&menuItemId=300000000002399'
# Download the page and parse the HTML that the server returns
page = urllib.request.urlopen(urlpage)
soup = BeautifulSoup(page, 'html.parser')
# Look for the div that should hold the product details
results = soup.find_all('div', attrs={'class': 'dettaglio'})
I would like to get all the content of that div so that I can later scrape the paragraphs inside it (the 'Origine' paragraph, specifically). Thank you!
The page requests that content from this URL:
https://www.esselungaacasa.it/ecommerce/resources/auth/displayable/breadcrumbs/300000000002399
This requires authentication headers, which seem to consist of the values shown below (tested multiple times). The values are only valid for less than a couple of minutes, so you need to see whether you can obtain them from a prior request and update them dynamically.
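As a rough sketch of sending such a request with the standard library: the header names and values are not reproduced here, so the function simply takes whatever headers you captured as a dict. `build_request` and `fetch_json` are hypothetical helper names, and the 10-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# URL taken from the answer above.
BREADCRUMBS_URL = (
    "https://www.esselungaacasa.it/ecommerce/resources/auth/"
    "displayable/breadcrumbs/300000000002399"
)


def build_request(url, auth_headers):
    """Return a GET request carrying the given headers.

    auth_headers is a dict of header name -> value. The real names and
    values must come from a prior request; none are assumed here.
    """
    return urllib.request.Request(url, headers=dict(auth_headers))


def fetch_json(request, timeout=10):
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (not run here, since the header values expire quickly):
#   request = build_request(BREADCRUMBS_URL, captured_headers)
#   payload = fetch_json(request)
```

Because the values expire within a couple of minutes, the dict passed to `build_request` would need to be refreshed shortly before each call.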
The JSON contains HTML, which you can extract and parse with BeautifulSoup.
You can see an example of the JSON response here
The content HTML is inside the list called informations. Sample shown:
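A minimal sketch of the extraction step, under some assumptions: that `informations` is a top-level key of the decoded JSON, and that each entry is either an HTML string or a dict with HTML among its string values. The exact layout should be checked against the real response. The function names and the sample payload are made up for illustration.

```python
from bs4 import BeautifulSoup


def iter_html_fragments(payload):
    """Yield HTML strings found in payload['informations'].

    Assumption: each entry is either an HTML string or a dict whose
    string values may contain HTML.
    """
    for entry in payload.get("informations", []):
        if isinstance(entry, str):
            yield entry
        elif isinstance(entry, dict):
            for value in entry.values():
                if isinstance(value, str) and "<" in value:
                    yield value


def find_paragraphs(payload, keyword):
    """Return the text of every <p> whose text mentions keyword."""
    matches = []
    for fragment in iter_html_fragments(payload):
        soup = BeautifulSoup(fragment, "html.parser")
        for paragraph in soup.find_all("p"):
            text = paragraph.get_text(" ", strip=True)
            if keyword.lower() in text.lower():
                matches.append(text)
    return matches


# Made-up payload, only to show the call; the real response may differ.
sample = {
    "informations": [
        {"value": "<div><p>Origine: Italia</p><p>Peso: 500 g</p></div>"}
    ]
}
print(find_paragraphs(sample, "Origine"))
```

If the HTML turns out not to use `<p>` tags for these fields, the `find_all("p")` call is the part to adjust.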