I am trying to scrap news article listed in this url, all article are in span.Card-title. But this gives blank output. Is there any to resolve this?
from bs4 import BeautifulSoup as soup
import requests
cnbc_url = "https://www.cnbc.com/search/?query=green%20hydrogen&qsearchterm=green%20hydrogen"
html = requests.get(cnbc_url)
bsobj = soup(html.content,'html.parser')
day = bsobj.find(id="root")
print(day.find_all('span',class_='Card-title'))
for link in bsobj.find_all('span',class_='Card-title'):
print('Headlines : {}'.format(link.text))
The problem is that content is not present on page when it loads initially, only afterwards is it fetched from server using url like this
https://api.queryly.com/cnbc/json.aspx?queryly_key=31a35d40a9a64ab3&query=green%20hydrogen&endindex=0&batchsize=10&callback=&showfaceted=false&timezoneoffset=-240&facetedfields=formats&facetedkey=formats%7C&facetedvalue=!Press%20Release%7C&needtoptickers=1&additionalindexes=4cd6f71fbf22424d,937d600b0d0d4e23,3bfbe40caee7443e,626fdfcd96444f28
and added to page.
Take a look at /json.aspx endpoint in devtools, data seems to be there.