I am trying to extract some links and their text from a web page and save them to a .json file.
I have parsed the HTML `tbody > tr > td`, and each `td` contains `<a href="TextWithUrlBehind">Something</a>`.
However, in the browser's Inspect Element view, `TextWithUrlBehind` is clickable: it has a link attached to it.
It is not a familiar absolute link like `<a href="https://...">`.
So in my .json file, the extracted href is just the string `TextWithUrlBehind`, and the text is the string `Something`.
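A likely explanation, assuming `TextWithUrlBehind` is a relative URL: the browser resolves a relative `href` against the URL of the page it is showing, which is why Inspect Element treats it as a working link. BeautifulSoup, by contrast, returns the attribute value as it appears in the HTML without resolving it. Python's standard `urllib.parse.urljoin` performs the same kind of resolution. In the sketch below, `PAGE_URL` and `resolve_href` are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the table.
PAGE_URL = "https://example.com/reports/index.html"


def resolve_href(page_url: str, href: str) -> str:
    """Turn an href as found in the HTML into the absolute URL a browser would follow."""
    return urljoin(page_url, href)


# A bare name is resolved relative to the page's directory.
print(resolve_href(PAGE_URL, "TextWithUrlBehind"))
# -> https://example.com/reports/TextWithUrlBehind

# A leading slash is resolved from the site root.
print(resolve_href(PAGE_URL, "/results/42"))
# -> https://example.com/results/42

# An href that is already absolute is returned unchanged.
print(resolve_href(PAGE_URL, "https://other.example.org/x"))
# -> https://other.example.org/x
```
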
The code looks like this:
```python
links = []  # collected {"href": ..., "text": ...} pairs
rows = test_results_table.find_all("tr")
# Iterate over each table row
for row in rows:
    # Only the first cell of each row is inspected
    first_cell = row.find("td")
    if first_cell:
        # First <a> tag in the cell that has an href attribute
        anchor_tag = first_cell.find("a", href=True)
        self._debug_print("Anchor tag content:", anchor_tag)
        if anchor_tag:
            # Raw attribute value; it is NOT resolved against the page URL
            href = anchor_tag["href"]
            text = anchor_tag.get_text(strip=True)
            links.append({"href": href, "text": text})
            self._debug_print("Content extracted:", {"href": href, "text": text})
        else:
            self._debug_print("No anchor tag found in cell:", first_cell)
    else:
        self._debug_print("No table cell found in row:", row)
```
I do not understand how that link is attached in the HTML, and I don't know which BeautifulSoup built-in functions can help me get the full link.
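On how the link is "attached": in HTML, a relative `href` is resolved against the document's base URL. That is normally the page's own URL, but if the page has a `<base href="...">` element in its `<head>`, that element's URL (itself resolved against the page URL) is used instead. A minimal sketch, where `document_base_url` and its `base_href` argument are hypothetical: `base_href` holds the `<base>` element's `href` if the page has one, which with BeautifulSoup could be read from `soup.find("base", href=True)`.

```python
from typing import Optional
from urllib.parse import urljoin


def document_base_url(page_url: str, base_href: Optional[str] = None) -> str:
    """Return the URL that relative links on a page are resolved against.

    base_href is the href of the page's <base> element, or None if the
    page has no such element (hypothetical input for this sketch).
    """
    if base_href:
        # The <base> href may itself be relative, so resolve it against the page URL.
        return urljoin(page_url, base_href)
    return page_url


page = "https://example.com/reports/index.html"

# With a <base href="/static/results/"> element on the page:
print(urljoin(document_base_url(page, "/static/results/"), "TextWithUrlBehind"))
# -> https://example.com/static/results/TextWithUrlBehind

# Without a <base> element, the page URL itself is the base:
print(urljoin(document_base_url(page), "TextWithUrlBehind"))
# -> https://example.com/reports/TextWithUrlBehind
```
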
The desired output is a dictionary whose keys are the clickable texts and whose values are the links those texts lead to when clicked.
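One way to build that dictionary, sketched under the assumption that `links` is the list of `{"href": ..., "text": ...}` dicts produced by the loop in the question and that `page_url` is the URL the HTML was downloaded from: resolve each `href` with `urljoin` and key the result by its text. The helper names `links_to_dict` and `save_json`, the sample data, and the page URL are hypothetical. Because dictionary keys are unique, if two cells share the same text, the later link overwrites the earlier one.

```python
import json
from typing import Dict, List
from urllib.parse import urljoin


def links_to_dict(links: List[Dict[str, str]], page_url: str) -> Dict[str, str]:
    """Map each link's visible text to the absolute URL it points to.

    If several links share the same text, the last one wins.
    """
    return {link["text"]: urljoin(page_url, link["href"]) for link in links}


def save_json(mapping: Dict[str, str], path: str) -> None:
    """Write the text -> URL mapping to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(mapping, f, indent=2, ensure_ascii=False)


# Hypothetical data in the shape produced by the question's loop.
links = [
    {"href": "TextWithUrlBehind", "text": "Something"},
    {"href": "other/page.html", "text": "Something else"},
]
page_url = "https://example.com/reports/index.html"

result = links_to_dict(links, page_url)
print(json.dumps(result, indent=2))
```

Calling `save_json(result, "links.json")` would then write the mapping to disk.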