"*" a test ... I have the following code, but it didn't do the job. " /> "*" a test ... I have the following code, but it didn't do the job. " /> "*" a test ... I have the following code, but it didn't do the job. "/>

How to extract href link from a <li class='item'> tag?


https://i.stack.imgur.com/VC02I.png

<li class="item">
  "*"
  <a title="test" href="/item/a test/948507/#viewPageContent">a test</a>
   ...

I have the following code, but it didn't do the job.

entryLi = soup.findAll('li', attrs={'class': 'item'})
for entry in entryLi:
    text = entry.text     
    href = entry.find('a')['href']

I don't want every href on the page, only the href inside the <li> tag.


There is 1 best solution below

Juan Medina

Python Implementation

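from bs4 import BeautifulSoup

# html_doc holds the page's HTML as a string
soup = BeautifulSoup(html_doc, 'html.parser')
linkList = []  # href strings only
aList = []     # full <a> Tag objects
# only <li class="item"> elements, as in the question
for liNode in soup.find_all('li', class_='item'):
    for aNode in liNode.find_all('a'):
        aList.append(aNode)
        linkList.append(aNode.get('href'))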

After the loop, aList holds the full <a> Tag objects and linkList holds only the href strings.
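
The same lookup can also be written as a single CSS selector. This is only a sketch: the html_doc below is a made-up page built around the snippet from the question, and the "other" list item and the outside link are assumptions added to show what gets skipped.

```python
from bs4 import BeautifulSoup

# Hypothetical page: only the first <li> comes from the question.
html_doc = """
<ul>
  <li class="item">
    "*"
    <a title="test" href="/item/a test/948507/#viewPageContent">a test</a>
  </li>
  <li class="other"><a href="/not/wanted">skip me</a></li>
</ul>
<a href="/outside">outside any li</a>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# "li.item a[href]" matches <a> tags that have an href attribute
# and sit somewhere inside an <li> whose class list contains "item".
item_links = [a["href"] for a in soup.select("li.item a[href]")]
print(item_links)
```

Selecting a[href] rather than a also skips anchors without an href, so a["href"] can't raise a KeyError.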

@marlon, to get the parsed HTML object for each link, you could do:

for link in linkList:
    # open() reads local files only, so this assumes each link is a local file path
    with open(link) as fp:
        soup1 = BeautifulSoup(fp, 'html.parser')
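
The href in the question is a site-relative URL, not a file path, so it would probably need to be resolved against the page's address and fetched over HTTP before parsing. A rough sketch using only the standard library follows. Here base_url is a placeholder (the real site isn't given in the question), and resolve_link and fetch_html are hypothetical helper names.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit
from urllib.request import urlopen


def resolve_link(base_url, href):
    """Turn a relative href into an absolute URL with an encoded path."""
    parts = urlsplit(urljoin(base_url, href))
    # Percent-encode spaces and similar characters in the path;
    # "%" is kept safe so already-encoded paths are not encoded twice.
    return urlunsplit(parts._replace(path=quote(parts.path, safe="/%")))


def fetch_html(url):
    """Download a page and return its raw bytes (not called in this sketch)."""
    with urlopen(url) as response:
        return response.read()


base_url = "https://example.com/list"  # assumption: URL of the page that was scraped
link_list = ["/item/a test/948507/#viewPageContent"]

absolute_links = [resolve_link(base_url, href) for href in link_list]
print(absolute_links)
```

Each result of fetch_html(url) could then be passed to BeautifulSoup(html, 'html.parser'), just as the file object is above.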