
    Why is the output repeated when I parse a string using PyQuery?


    Why is the output repeated when I parse a string using PyQuery in Spyder?

    Here is my code:

    from pyquery import PyQuery as pq
    html = """
    
        <ul>
            <li>first-item</li>
            <li><a href="link2.html">second item</a></li>
            <li><a href="link3.html">third item</a></li>
            <li><a href="link4.html">fourth item</a></li>
            <li><a href="link5.html">fifth item</a></li>        
        </ul>
    
    """
    doc = pq(html)
    print(type(doc))
    print(doc('li'))
    

    Here is the output:

    <class 'pyquery.pyquery.PyQuery'>
    <a href="link2.html">second item</a></li>
            <li class="item=-0 active"><a href="link3.html"><span class="" bold="">third item</span></a></li>
            <li class="item-1 active"><a href="link4.html">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a></li>        
        </ul>
    </div>
    </body></html><a href="link3.html"><span class="" bold="">third item</span></a></li>
            <li class="item-1 active"><a href="link4.html">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a></li>        
        </ul>
    </div>
    </body></html><a href="link4.html">fourth item</a></li>
            <li class="item-0"><a href="link5.html">fifth item</a></li>        
        </ul>
    </div>
    </body></html><a href="link5.html">fifth item</a></li>        
        </ul>
    </div>
    </body></html>
    

    However, according to my textbook the output should be

    <li class="item-0">first item</li>
    <li class="item-1"><a href="link2.html">second item</a></li>
    <li class="item-0 active"><a href="link3.html"><span class="bold">third item</span></a></li>
    <li class="item-1 active"><a href="link4.html">fourth item</a></li>
    <li class="item-0"><a href="link5.html">fifth item</a></li>
    

    I have searched hard for an answer online, but I could not find a similar problem on any forum or on GitHub. I hope you can help me; I would be very grateful.

    Answer by imperosol:

    You are not searching for the right tag. You want all the <li> elements, so you should search for li, not for a.

    Thus, you would have:

    from pyquery import PyQuery as pq
    html = """
        <ul>
            <li>first-item</li>
            <li><a href="link2.html">second item</a></li>
            <li><a href="link3.html">third item</a></li>
            <li><a href="link4.html">fourth item</a></li>
            <li><a href="link5.html">fifth item</a></li>        
        </ul>
    """
    doc = pq(html)
    print(type(doc))
    print(doc('li'))
    

    This gives me:

    <class 'pyquery.pyquery.PyQuery'>
    <li>first-item</li>
    <li><a href="link2.html">second item</a></li>
    <li><a href="link3.html">third item</a></li>
    <li><a href="link4.html">fourth item</a></li>
    <li><a href="link5.html">fifth item</a></li> 
    

    I tested this independently of any context, using only the snippet you gave. If something still goes wrong when you apply it, the error must come from elsewhere in your code.