Converting HTML to plain text that looks like it was copied from a browser using Python

179 Views Asked by Luyu Huang At 29 July 2025 at 12:01

I want to convert HTML to plain text in Python, and I would like the result to look like it was copied from the browser. I tried several libraries, including html2text, html-text and BeautifulSoup, but none of them gives the result I want. For example, the following HTML:

<div>aaa</div> <div>AAA</div>
<div><br></div>
<div>bbb</div> <div>BBB</div>
<div><br></div>
<div>ccc</div> <div>CCC</div>

looks like this in the browser:

aaa
AAA

bbb
BBB

ccc
CCC

But when I use html2text, the result is:

aaa

AAA



bbb

BBB



ccc

CCC

The result of html-text is:

aaa
AAA
bbb
BBB
ccc
CCC

And BeautifulSoup just removes the tags:


aaa AAA

bbb BBB

ccc CCC

I also tried soup.get_text('\n') and soup.get_text('\n', strip=True), but neither gave the correct result.

Does anyone have a good way to solve the problem? Thank you very much.

There are 2 best solutions below

FaranAiki On 28 December 2019 at 11:45

As @dabingsou suggested, the same approach can be wrapped in a function to make it reusable:

from simplified_scrapy.simplified_doc import SimplifiedDoc

def print_html(html):
    # Build the document once and reuse it for both replacements.
    doc = SimplifiedDoc(html)
    # Turn every closing </div> into a line break.
    text = doc.replaceReg(doc.html, "</div>", "\n")
    # Strip the remaining tags, then print the result.
    text = doc.replaceReg(text, "<.*>", "")
    print(text)

# let's say the html is
html = """
<div> Hello, World! </div>
<div> By Faran </div>
"""

print_html(html)

The result will be:

Hello, World!
By Faran

dabingsou On 28 December 2019 at 10:40

What about this?

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html = '''<div>aaa</div> <div>AAA</div>
<div><br></div>
<div>bbb</div> <div>BBB</div>
<div><br></div>
<div>ccc</div> <div>CCC</div>'''
doc = SimplifiedDoc(html)
html = doc.replaceReg(doc.html,"</div>","\n")
html = doc.replaceReg(html,"<.*>","")
print(html)

Result:

aaa
AAA

bbb
BBB

ccc
CCC
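
For comparison, here is a minimal sketch that uses only the standard library's html.parser instead of regex replacement. It is not taken from either answer and is not how a browser really computes copied text; it just assumes that each block-level element starts and ends a line, that <br> always ends the current line, and that whitespace-only text between blocks is dropped. The names BLOCK_TAGS, BlockTextExtractor and html_to_text are made up for this example, and the tag set is deliberately incomplete.

```python
from html.parser import HTMLParser

# Assumption: a small, incomplete set of block-level tags, for illustration only.
BLOCK_TAGS = {"div", "p", "li", "ul", "ol", "tr", "table",
              "h1", "h2", "h3", "h4", "h5", "h6", "blockquote"}


class BlockTextExtractor(HTMLParser):  # hypothetical helper, not a library class
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []    # finished output lines
        self.pending = []  # text collected for the line being built

    def _flush(self, keep_empty=False):
        # Collapse runs of whitespace into single spaces.
        text = " ".join("".join(self.pending).split())
        if text or keep_empty:
            self.lines.append(text)
        self.pending = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            self._flush(keep_empty=True)  # <br> always ends the current line
        elif tag in BLOCK_TAGS:
            self._flush()                 # a block starts on a new line

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()                 # a block ends its line

    def handle_data(self, data):
        self.pending.append(data)

    def text(self):
        self._flush()                     # emit any trailing text
        return "\n".join(self.lines)


def html_to_text(html):
    parser = BlockTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


html = '''<div>aaa</div> <div>AAA</div>
<div><br></div>
<div>bbb</div> <div>BBB</div>
<div><br></div>
<div>ccc</div> <div>CCC</div>'''
print(html_to_text(html))
```

For the HTML in the question this prints aaa, AAA, an empty line, bbb, BBB, an empty line, ccc, CCC, one per line. It does not skip <script> or <style> content and knows nothing about CSS, so treat it as a starting point rather than a full solution.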
