Extracting a data layer value with Python


Is it possible to extract a value from a page's data layer? This is the URL: https://www.repubblica.it/cronaca/2021/04/09/news/vaccini_ecco_chi_sono_gli_oltre_due_milioni_di_italiani_che_hanno_ricevuto_una_dose_fuori_dalle_liste_delle_priorita_-295650286/?ref=RHTP-BH-I0-P1-S1-T1

I need to extract "dateModified" from the <script type="application/ld+json"> tag.

Thank you!

import requests
from bs4 import BeautifulSoup
import json

url='https://www.repubblica.it/cronaca/2021/04/09/news/vaccini_ecco_chi_sono_gli_oltre_due_milioni_di_italiani_che_hanno_ricevuto_una_dose_fuori_dalle_liste_delle_priorita_-295650286/?ref=RHTP-BH-I0-P1-S1-T1'


r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

script = soup.find_all('script')[0]

print(script.find('dateModified'))

Best answer

Yes. Read the script tag's .string attribute and pass it to json.loads.

Here's how:

import json

import requests
from bs4 import BeautifulSoup

url='https://www.repubblica.it/cronaca/2021/04/09/news/vaccini_ecco_chi_sono_gli_oltre_due_milioni_di_italiani_che_hanno_ricevuto_una_dose_fuori_dalle_liste_delle_priorita_-295650286/?ref=RHTP-BH-I0-P1-S1-T1'

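# Download the page and parse its HTML.
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
# This assumes the JSON-LD block is the first <script> tag; parse its text as JSON.
print(json.loads(soup.find_all('script')[0].string)["dateModified"])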

Output:

2021-04-09T10:26:13Z
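
Indexing the first script tag only works while the JSON-LD block happens to come first on the page. A slightly more defensive sketch is to select script tags by their type attribute and search each parsed block for the key. The version below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so it runs without third-party packages, the names JsonLdCollector and find_date_modified are made up for this example, and sample_html is a hand-written stand-in, not the real page markup. With BeautifulSoup, the equivalent selection would be soup.find_all('script', type='application/ld+json').

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []          # raw JSON text, one entry per JSON-LD script
        self._in_json_ld = False  # True while inside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_json_ld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the current block.
        if self._in_json_ld:
            self.blocks[-1] += data


def find_date_modified(html):
    """Return the first "dateModified" found in any JSON-LD block, else None."""
    collector = JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        # A JSON-LD block may hold a single object or a list of objects.
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and "dateModified" in item:
                return item["dateModified"]
    return None


# Hand-written sample (an assumption, not the actual page source).
sample_html = """
<html><head>
<script>var tracking = {};</script>
<script type="application/ld+json">
{"@type": "NewsArticle", "dateModified": "2021-04-09T10:26:13Z"}
</script>
</head><body></body></html>
"""
print(find_date_modified(sample_html))
```

To use it on the live page, pass requests.get(url).text as the html argument. The function returns None when no JSON-LD block contains the key, so the caller can tell a missing value from a parsing error.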