I have a strange problem with my script which extracts some dates from a webpage.
Here is the script:
# import library
import json
import re
import requests
from bs4 import BeautifulSoup
import datetime
# Request the website and download the HTML contents
url = 'https://www.coteur.com/cotes-basket.php'
#page = requests.get(url)
response = requests.get(url)
#soup = BeautifulSoup(page.text, 'html.parser')
soup = BeautifulSoup(response.text, 'html.parser')
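# Collect the JSON-LD script tags inside the odds table
s = soup.find("table", id="mediaTable").find_all('script', type='application/ld+json')
# Parse the JSON payload of each tag and keep its startDate
date_hour = [json.loads(re.search(r'>(.+)<', str(j), re.S).group(1))["startDate"] for j in s]
#print(date_hour)
# Replace the "T" date/time separator with a space
date_hour = [sub.replace("T", " ") for sub in date_hour]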
print(len(date_hour))
print(date_hour)
This code works. It returns the startDate value from each script tag.
But one date shows up one time too many: on the webpage I count 24 basketball events, but my list has length 25. The webpage shows 3 events starting at 00:00, but my script extracts 4 dates with 00:00.
Do you have any idea why the site does not display these extra entries?
The site does not display rows for which there are no odds. This is due to a script which runs and removes the rows with no odds from view. I think it is currently the script identified by script:nth-child(25), which starts $(document).on('click' .... It tests odds.length and, if that is 0, removes the row. You can verify this by disabling JavaScript and reloading the page: you will get the same result as with your Python request (where JS doesn't run), and the row is present. Re-enable JS and the row will disappear.
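To see which start time carries the extra entry in the requests output, the scraped list can be grouped by time of day and compared with what the browser shows. The following is only a minimal sketch: starts_by_time is a hypothetical helper name, and the sample list is made up to mirror the shape of date_hour, not real data from the site.

```python
from collections import Counter


def starts_by_time(date_hour):
    """Count scraped events per time of day (the part after the space)."""
    return Counter(d.split(" ", 1)[1] for d in date_hour if " " in d)


# Made-up sample in the "date time" shape the script prints; not real site data.
sample = [
    "2021-01-10 00:00",
    "2021-01-10 00:00",
    "2021-01-10 00:00",
    "2021-01-10 00:00",
    "2021-01-10 18:30",
]

counts = starts_by_time(sample)
print(counts["00:00"])  # 4 for this sample
```

Any time whose count is higher than the number of rows visible in the browser points at a row that the page's JavaScript has hidden.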
You can check whether there are odds by going from Recontres (main table) for a given match to Cotes (also see Prognostics). If you do this with JS disabled, you can follow the Recontres links for all matches and see whether there are odds. In Prognostics there should be odds-based calculations that aren't both 0.
There is no way, from the response you get with requests, to distinguish the row(s) that will be missing on the page. I am not sure you can even make additional requests to check the odds info, as it is missing for all matches without JS. You would likely need to switch to selenium/browser automation, in which case you wouldn't really need BeautifulSoup at all. There is a small outside chance you might find an API or another site that pulls the same odds and that you could cross-reference.
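A rough sketch of the browser-automation route is below. It rests on assumptions I have not verified against the site: that the JSON-LD script tags sit inside the rows that get removed, that Chrome with a matching driver is installed, and that the removal has already run by the time the page is read (an explicit wait may be needed). The function names start_dates and rendered_start_dates are made up for illustration.

```python
import json


def start_dates(json_ld_texts):
    """Parse JSON-LD strings and return their startDate values, 'T' replaced by a space."""
    dates = []
    for text in json_ld_texts:
        data = json.loads(text)
        if "startDate" in data:
            dates.append(data["startDate"].replace("T", " "))
    return dates


def rendered_start_dates(url):
    """Load the page in a real browser so its JavaScript runs, then read the JSON-LD."""
    # Imported here so the parsing helper above works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumption: Chrome and its driver are available
    try:
        driver.get(url)
        scripts = driver.find_elements(
            By.CSS_SELECTOR, '#mediaTable script[type="application/ld+json"]'
        )
        # Script tags are not rendered, so read their raw content via innerHTML.
        return start_dates(s.get_attribute("innerHTML") for s in scripts)
    finally:
        driver.quit()
```

Calling rendered_start_dates('https://www.coteur.com/cotes-basket.php') would then return only the dates still present in the DOM after the scripts have run, if the assumptions above hold.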