Why is the output repeated when I parse a string using PyQuery?

Why is the output repeated when I parse a string using PyQuery in Spyder?

Here is my code:

from pyquery import PyQuery as pq
html = """

    <ul>
        <li>first-item</li>
        <li><a href="link2.html">second item</a></li>
        <li><a href="link3.html">third item</a></li>
        <li><a href="link4.html">fourth item</a></li>
        <li><a href="link5.html">fifth item</a></li>        
    </ul>

"""
doc = pq(html)
print(type(doc))
print(doc('li'))

Here is the output:

<class 'pyquery.pyquery.PyQuery'>
<a href="link2.html">second item</a></li>
        <li class="item=-0 active"><a href="link3.html"><span class="" bold="">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link3.html"><span class="" bold="">third item</span></a></li>
        <li class="item-1 active"><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link4.html">fourth item</a></li>
        <li class="item-0"><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html><a href="link5.html">fifth item</a></li>        
    </ul>
</div>
</body></html>

However, according to my textbook, the output should be:

<li class="item-0">first item</li>
<li class="item-1"><a href="link2.html">second item</a></li>
<li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
<li class="item-1 active"><a href="link4.html">fourth item</a></li>
<li class="item-0"><a href="link5.html">fifth item</a></li>

I have tried very hard to find an answer to this problem on the Internet, but I could not find a similar problem on any forum or on GitHub. I would be very grateful for your help.

There is 1 best solution below

You are not searching for the right tag. You want all the <li> elements, so you should search for li, not for a.

So the code would be:

from pyquery import PyQuery as pq
html = """
    <ul>
        <li>first-item</li>
        <li><a href="link2.html">second item</a></li>
        <li><a href="link3.html">third item</a></li>
        <li><a href="link4.html">fourth item</a></li>
        <li><a href="link5.html">fifth item</a></li>        
    </ul>
"""
doc = pq(html)
print(type(doc))
print(doc('li'))

This gives me:

<class 'pyquery.pyquery.PyQuery'>
<li>first-item</li>
<li><a href="link2.html">second item</a></li>
<li><a href="link3.html">third item</a></li>
<li><a href="link4.html">fourth item</a></li>
<li><a href="link5.html">fifth item</a></li> 

I tested this independently of any context, using only the snippet you gave. If something still goes wrong when you apply this, the error must come from elsewhere in your code.
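
One way to follow up on that last point, as a minimal sketch: the output in the question contains fragments such as class="item-1 active", </div> and <span, none of which appear in the posted html string. That suggests the string actually being parsed is not the one shown. The helper below, find_markers, is a hypothetical name used for illustration and is not part of PyQuery; it only reports which of those fragments occur in a given string.

```python
# Hypothetical helper (not part of PyQuery): return the marker
# substrings that occur in the text about to be parsed.
def find_markers(source, markers):
    return [m for m in markers if m in source]


# The string exactly as posted in the question.
html = """
    <ul>
        <li>first-item</li>
        <li><a href="link2.html">second item</a></li>
        <li><a href="link3.html">third item</a></li>
        <li><a href="link4.html">fourth item</a></li>
        <li><a href="link5.html">fifth item</a></li>
    </ul>
"""

# Fragments copied from the output shown in the question.
markers = ['class="item-1 active"', "</div>", "<span"]

found = find_markers(html, markers)
print(found)  # an empty list means none of the fragments is in this string
```

If the same check returns a non-empty list for the string in your real script, then that string is not the one posted here, and that is the first place to look.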