How to deal with "=2E" in get_body()?

import email
import email.policy
import sys
msg = email.message_from_string(sys.stdin.read(), policy=email.policy.default)
print(msg.get_body('plain').get_payload())

input.eml

MIME-Version: 1.0
From: my from
To: [email protected]
Subject: my subject
Content-Type: multipart/mixed;
 boundary="----=_Part_2296279_969698842.1679155313994"
Date: Sat, 18 Mar 2023 16:01:53 +0000 (UTC)

------=_Part_2296279_969698842.1679155313994
Content-Type: multipart/alternative;
 boundary="----=_Part_2296278_601255348.1679155313994"

------=_Part_2296278_601255348.1679155313994
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

email app=2E this is a test =2E
------=_Part_2296278_601255348.1679155313994
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

<!DOCTYPE html>
<html>
<body>
this is html
</body>
</html>
------=_Part_2296278_601255348.1679155313994--

------=_Part_2296279_969698842.1679155313994--

The payload returned by get_body('plain').get_payload() contains sequences like =2E. Is that an encoded dot? How can I automatically convert such escaped sequences into the actual characters?

$ ./main.py < input.eml
email app=2E this is a test =2E

user1424739 On

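Yes, =2E is a dot: it is the quoted-printable encoding of the byte 0x2E, which matches the part's "Content-Transfer-Encoding: quoted-printable" header. By default, get_payload() returns the payload without undoing that transfer encoding. The solution is to pass decode=True to get_payload(), which returns the decoded bytes, and then decode those bytes using the part's charset: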

msg.get_body('plain').get_payload(decode=True).decode('utf-8')
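
The transfer-decoding step can also be checked in isolation. The following is a minimal sketch, assuming the text/plain body from input.eml is copied into a bytes literal (the variable names are only illustrative), using the standard-library quopri module:

```python
import quopri

# Quoted-printable body copied from the text/plain part of input.eml.
raw = b"email app=2E this is a test =2E"

# Undo the quoted-printable transfer encoding, then decode the
# resulting bytes using the charset declared for the part (UTF-8).
text = quopri.decodestring(raw).decode("utf-8")

print(text)  # email app. this is a test .
```

Each =XX sequence stands for the byte with hexadecimal value XX, so =2E becomes ".".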

I have to say that the default chosen by the author of get_payload() is a poor one. This is not the first time I have found Python's API to be poorly designed.
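
As a possible alternative: when the message is parsed with policy=email.policy.default, as in the question, the parts are EmailMessage objects, which also provide get_content(). For a text part it undoes the transfer encoding and decodes the bytes with the declared charset in one call. A minimal sketch, assuming a shortened, hypothetical single-part message instead of the full input.eml:

```python
import email
import email.policy

# Shortened, hypothetical message that reuses the quoted-printable
# body from input.eml (single part instead of multipart).
raw = (
    "MIME-Version: 1.0\n"
    "Subject: my subject\n"
    "Content-Type: text/plain; charset=UTF-8\n"
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    "email app=2E this is a test =2E\n"
)

msg = email.message_from_string(raw, policy=email.policy.default)
part = msg.get_body(preferencelist=("plain",))

# Legacy accessor without decode=True: the payload is still encoded.
encoded = part.get_payload()

# Legacy accessor with decode=True: bytes, decoded manually.
via_payload = part.get_payload(decode=True).decode("utf-8")

# get_content(): transfer encoding and charset handled in one call.
via_content = part.get_content()

print(repr(encoded))
print(repr(via_payload))
print(repr(via_content))
```

Both decoded variants should give the same text here. get_content() avoids hard-coding 'utf-8', since it reads the charset from the Content-Type header.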