BeautifulSoup find_all method returns the same elements


Hi, here is my soup object:

<td class="kategorie">
 <div data-navi-cat="c5ff5b1d0dc93c">
  Herren
 </div>
 <div data-navi-cat="c5ff5b1d0dc95f">
  Frauen
 </div>
 <div data-navi-cat="c5ff5b1d0dc978">
  A-Jugend (U19)
 </div>
 <div data-navi-cat="c5ff5b1d0dc98c">
  B-Jugend (U17)
 </div>
 <div data-navi-cat="c5ff5b1d0dc9a2">
  C-Jugend (U15)
 </div>
 <div data-navi-cat="c5ff5b1d0dc9b1">
  U17-Juniorinnen
 </div>
 <div data-navi-cat="c5ff5b1d0dc9b6">
  Futsal
 </div>
 <div data-navi-cat="c5ff5b1d0dc9bd">
  eSport
 </div>
</td>

How can I get all the c-codes and their corresponding text from the object? For example, the first row has the c-code "c5ff5b1d0dc93c" and the corresponding text "Herren".

My code looks like this (categories is the soup object):

for category in categories.find_all('div'):
    category = categories.find('div')
    print(category)

I only receive the first row, repeated once per iteration:

<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
<div data-navi-cat="c5ff5b1d0dc93c">Herren</div>


There is 1 best solution below


What happens?

  • categories holds your parsed HTML.
  • Inside your loop you overwrite the loop variable with category = categories.find('div'). find('div') always returns the first matching element in categories, no matter which iteration you are in, so category is always <div data-navi-cat="c5ff5b1d0dc93c">Herren</div>.
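
The difference can be checked directly. This is a minimal sketch, assuming a shortened two-row version of the markup and Python's built-in "html.parser" (both chosen here only for illustration; the names repeated and distinct are hypothetical):

```python
from bs4 import BeautifulSoup

# Shortened sample markup (assumption: two rows are enough to show the effect)
html = """<td class="kategorie">
 <div data-navi-cat="c5ff5b1d0dc93c">Herren</div>
 <div data-navi-cat="c5ff5b1d0dc95f">Frauen</div>
</td>"""

categories = BeautifulSoup(html, "html.parser")

# Calling find() inside the loop ignores the loop variable
# and returns the first <div> on every iteration.
repeated = [categories.find("div")["data-navi-cat"]
            for _ in categories.find_all("div")]

# Using the loop variable itself yields each <div> in turn.
distinct = [div["data-navi-cat"] for div in categories.find_all("div")]

print(repeated)
print(distinct)
```

The first list contains the same code twice, while the second contains one code per row.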

Instead, work with the loop variable directly: element.get_text() returns the text and element.get('data-navi-cat') returns the code.

Example

from bs4 import BeautifulSoup
html = '''<td class="kategorie">
 <div data-navi-cat="c5ff5b1d0dc93c">
  Herren
 </div>
 <div data-navi-cat="c5ff5b1d0dc95f">
  Frauen
 </div>
 <div data-navi-cat="c5ff5b1d0dc978">
  A-Jugend (U19)
 </div>
 <div data-navi-cat="c5ff5b1d0dc98c">
  B-Jugend (U17)
 </div>
 <div data-navi-cat="c5ff5b1d0dc9a2">
  C-Jugend (U15)
 </div>
 <div data-navi-cat="c5ff5b1d0dc9b1">
  U17-Juniorinnen
 </div>
 <div data-navi-cat="c5ff5b1d0dc9b6">
  Futsal
 </div>
 <div data-navi-cat="c5ff5b1d0dc9bd">
  eSport
 </div>
</td>'''

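soup = BeautifulSoup(html, "lxml")
# Iterate over every <div> and read from the loop variable itself
for element in soup.find_all('div'):
    category = element.get_text()        # text inside the <div>, whitespace included
    code = element.get('data-navi-cat')  # value of the data-navi-cat attribute
    print(category, code)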

Output

  Herren
  c5ff5b1d0dc93c

  Frauen
  c5ff5b1d0dc95f

  A-Jugend (U19)
  c5ff5b1d0dc978

  B-Jugend (U17)
  c5ff5b1d0dc98c
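
The blank lines and indentation in this output come from the whitespace inside each <div>. If that is unwanted, one option is get_text(strip=True). The following is a minimal sketch, assuming a two-row excerpt of the markup, the built-in "html.parser", and a hypothetical dictionary named codes that maps each code to its text:

```python
from bs4 import BeautifulSoup

# Two-row excerpt of the markup (assumption: shortened for illustration)
html = """<td class="kategorie">
 <div data-navi-cat="c5ff5b1d0dc93c">
  Herren
 </div>
 <div data-navi-cat="c5ff5b1d0dc978">
  A-Jugend (U19)
 </div>
</td>"""

soup = BeautifulSoup(html, "html.parser")

# Match only <div> tags that actually carry the data-navi-cat attribute,
# and strip the surrounding whitespace from each text.
codes = {
    div["data-navi-cat"]: div.get_text(strip=True)
    for div in soup.find_all("div", attrs={"data-navi-cat": True})
}

for code, text in codes.items():
    print(code, text)
```

Filtering on the attribute means a <div> without data-navi-cat is skipped rather than producing a None code.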