Pandas read_html issue with &nbsp

335 Views Asked by ejyoung At 28 July 2025 at 22:49

I'm using pandas read_html to read an HTML file and I'm running into an issue with non-breaking spaces. A column of the resulting data frame should contain a string like "ABCDEF   G" (three spaces between F and G). Instead I'm getting "ABCDEF G" (one space between F and G). When I inspect the HTML file it shows "ABCDEF&nbsp;&nbsp;&nbsp;G", so for some reason the three non-breaking spaces are being collapsed into a single space. All single non-breaking spaces in the HTML come through fine. Is there a way around this so that the three spaces between F and G are retained?
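For context, here is a sketch of one likely cause. It assumes that read_html cleans cell text with a regular expression that replaces any run of two or more whitespace characters with a single space; the pattern below is an assumption about pandas internals, not a documented API. In Python 3, `\s` matches the non-breaking space (U+00A0), so a run of three is collapsed while a single one is left alone, which matches the behaviour described above.

```python
import re

# Assumed to mirror the whitespace cleanup applied to cell text;
# this pattern is an illustration, not a documented pandas interface.
COLLAPSE = re.compile(r"[\r\n]+|\s{2,}")


def collapse_whitespace(text):
    """Replace newlines and runs of 2+ whitespace characters with one space."""
    return COLLAPSE.sub(" ", text.strip())


# "&nbsp;" is decoded to "\xa0" by the HTML parser before cleanup.
three = collapse_whitespace("ABCDEF\xa0\xa0\xa0G")  # run of three is collapsed
one = collapse_whitespace("ABCDEF\xa0G")            # single nbsp is untouched

print(repr(three))
print(repr(one))
```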


There are 1 best solutions below

ejyoung On 18 March 2021 at 23:08

It's not elegant but for now I'm doing

    with open(htmllink, 'r') as r:
        data = r.read().replace('&nbsp;&nbsp;&nbsp;', '___')

After parsing, I go back and replace the underscores with three spaces. I'm still looking for a better way to do this, but it works for now.
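The whole round trip can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper names (`protect_nbsp_runs`, `restore_spaces`, `read_html_keep_spaces`) are hypothetical, the placeholder `___` is assumed never to occur in the real data, and only runs of exactly three `&nbsp;` entities are handled, as in the snippet above.

```python
import io

NBSP_RUN = "&nbsp;&nbsp;&nbsp;"
PLACEHOLDER = "___"  # assumption: this string never appears in the real data


def protect_nbsp_runs(html):
    """Swap each run of three &nbsp; entities for a placeholder before parsing."""
    return html.replace(NBSP_RUN, PLACEHOLDER)


def restore_spaces(value):
    """Turn the placeholder back into three spaces; leave non-strings alone."""
    if isinstance(value, str):
        return value.replace(PLACEHOLDER, "   ")
    return value


def read_html_keep_spaces(path):
    """Hypothetical wrapper: read the file, protect the runs, parse, restore."""
    # Imported here so the two string helpers work without pandas installed.
    import pandas as pd

    with open(path, "r") as f:
        html = protect_nbsp_runs(f.read())
    tables = pd.read_html(io.StringIO(html))
    # Restore the spaces cell by cell in every parsed table.
    return [t.apply(lambda col: col.map(restore_spaces)) for t in tables]
```

Calling `read_html_keep_spaces(htmllink)` would return a list of data frames, as read_html does. Column headers are not restored in this sketch, only cell values.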
