How to extract href link from a <li class='item'> tag?

66 Views Asked by marlon At 24 February 2021 at 21:41

<li class="item">
  "*"
  <a title="test" href="/item/a test/948507/#viewPageContent">a test</a>
   ...

I have the following code, but it doesn't do the job.

entryLi = soup.findAll('li', attrs={'class': 'item'})
for entry in entryLi:
    text = entry.text     
    href = entry.find('a')['href']

I don't want every href on the page, only the href inside the <li class="item"> tag.

There is 1 best solution below

Juan Medina On 24 February 2021 at 22:04

Python Implementation

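from bs4 import BeautifulSoup

# html_doc is assumed to hold the page's HTML as a string
soup = BeautifulSoup(html_doc, 'html.parser')
linkList = []  # href strings only
aList = []     # full <a> Tag objects
# Restrict the search to <li class="item"> so other links are skipped
for liNode in soup.find_all('li', class_='item'):
    # href=True skips anchors that have no href attribute
    for aNode in liNode.find_all('a', href=True):
        aList.append(aNode)
        linkList.append(aNode['href'])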

So aList holds the full <a> tag objects, and linkList holds only the href strings.
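
The same filtering can also be written with a CSS selector. This is only a sketch: the sample markup is modeled on the question, and the second <li> and the trailing link are hypothetical additions to show that they are ignored.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the question; the "other" <li> and the
# trailing <a> are hypothetical, added only to show they are skipped.
html_doc = """
<ul>
  <li class="item">"*" <a title="test" href="/item/a test/948507/#viewPageContent">a test</a></li>
  <li class="other"><a href="/skip/me">skip</a></li>
</ul>
<a href="/outside">outside</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "li.item a[href]" matches <a> tags that carry an href attribute
# and sit anywhere inside an <li> whose class list contains "item".
links = [a["href"] for a in soup.select("li.item a[href]")]
print(links)
```

If only direct children of the <li> should count, "li.item > a[href]" would narrow the match further.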

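@marlon to parse the HTML document that each link refers to, you could do the following. Note that open() only works if each link is a path to a local file; a URL has to be downloaded first: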

for link in linkList:
    with open(link) as fp:
        soup1 = BeautifulSoup(fp, 'html.parser')
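
The href in the question is a site-relative URL rather than a file path, so it probably needs to be resolved against the page's address before it can be fetched. A minimal sketch with the standard library, assuming a hypothetical base URL (BASE_URL is not from the question, and absolute_url is an illustrative helper name):

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Hypothetical address of the page the <li> items were scraped from.
BASE_URL = "https://example.com/list/"

def absolute_url(base, href):
    """Resolve href against base and percent-encode the path (e.g. spaces)."""
    parts = urlsplit(urljoin(base, href))
    return urlunsplit(parts._replace(path=quote(parts.path)))

url = absolute_url(BASE_URL, "/item/a test/948507/#viewPageContent")
print(url)

# The resolved URL could then be downloaded and parsed, for example:
#   from urllib.request import urlopen
#   with urlopen(url) as response:
#       soup1 = BeautifulSoup(response.read(), 'html.parser')
```

The download step is left as a comment because it depends on the real site being reachable.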
