Here is my BeautifulSoup code:
from bs4 import BeautifulSoup
import requests
html = requests.get("https://vt.tiktok.com/ZSLvos3x2/").text
soup = BeautifulSoup(html, 'html.parser')
image = soup.find("meta", {"property":"og:image"})
print(image)
The result's content is empty:
<meta content="" data-rh="true" property="og:image"/>
However, Facebook's Sharing Debugger can read it:
<meta property="og:image" content="https://p16-sign-va.tiktokcdn.com/tos-maliva-p-0068/e025f28037a84ad4b86d9437ba70ad2d_1683178221~tplv-photomode-video-share-card:1200:630:20.jpeg?x-expires=1695362400&x-signature=MNuRNoO2lAxX61zDfuqG5mKnI74%3D">
Some suggest this is a JavaScript problem, but that doesn't explain why retrying several times eventually succeeds:
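url = "https://vt.tiktok.com/ZSLvos3x2/"
attempt = 1  # `try` is a reserved keyword in Python, so it cannot be a variable name
not_get_data = True
while attempt <= 5 and not_get_data:
    print('Try:', attempt)
    html = requests.get(url=url).text
    soup = BeautifulSoup(html, 'lxml')
    image = soup.find("meta", {"property": "og:image"})
    # Stop as soon as the tag exists and its content attribute is non-empty
    not_get_data = not (image and image.get("content"))
    attempt += 1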
Why is that?
The empty content suggests that the value is filled in by JavaScript, so you have to use Selenium or a similar browser automation library for Python to render the page. After getting the JS-rendered content, you can parse the HTML with BeautifulSoup and then use a regex to get the og:image URL.
Using regex:
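A minimal sketch of what a regex-based extraction could look like, not necessarily the exact pattern meant here. `OG_IMAGE_RE` and `extract_og_image` are illustrative names, and the pattern assumes the attribute values are quoted and contain no `>` or quote characters.

```python
import html
import re

# Illustrative pattern (an assumption): match a <meta> tag that carries
# property="og:image" anywhere inside it, then capture its content attribute.
OG_IMAGE_RE = re.compile(
    r'<meta\b(?=[^>]*\bproperty=["\']og:image["\'])[^>]*\bcontent=["\']([^"\']*)["\']',
    re.IGNORECASE,
)


def extract_og_image(page_source):
    """Return the og:image URL found in an HTML string, or None if missing or empty."""
    match = OG_IMAGE_RE.search(page_source)
    if match is None or not match.group(1):
        return None
    # Attribute values may contain entities such as &amp;
    return html.unescape(match.group(1))
```

For the empty tag shown in the question (`content=""`), this returns `None`, so the caller can tell "tag present but empty" apart from a usable URL.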
Full code:
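One way the whole flow might look, as a rough sketch rather than a definitive implementation. It assumes Selenium 4 and a local Chrome install. `render_page` and `run_example` are hypothetical names, and the timeout and polling interval are arbitrary. Whether TikTok actually fills in the tag for a headless browser is not guaranteed.

```python
import time


def render_page(driver, url, timeout=10.0, poll_interval=0.5):
    """Load url in a Selenium driver and return the rendered HTML.

    Polls until some og:image meta tag has a non-empty content attribute,
    or until the timeout passes, then returns whatever was rendered.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "css selector" is the string value of selenium's By.CSS_SELECTOR
        tags = driver.find_elements("css selector", 'meta[property="og:image"]')
        if any(tag.get_attribute("content") for tag in tags):
            break
        time.sleep(poll_interval)
    return driver.page_source


def run_example():
    # Imported here so render_page itself has no third-party dependency
    from bs4 import BeautifulSoup
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")  # assumes a recent Chrome version
    driver = webdriver.Chrome(options=options)
    try:
        page_html = render_page(driver, "https://vt.tiktok.com/ZSLvos3x2/")
    finally:
        driver.quit()  # always close the browser, even on errors

    soup = BeautifulSoup(page_html, "html.parser")
    print(soup.find("meta", {"property": "og:image"}))
```

Call `run_example()` to try it. Passing the driver into `render_page` keeps the browser setup separate from the waiting logic, so a different browser or a remote driver can be swapped in.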
You can also use
image = soup.find("meta", {"property": "og:image"})
instead of regex.