I have this code but I don't actually get the email text.
Have I got to decode the email text?
import sys
import imaplib
import getpass
import email
import email.header
from email.header import decode_header
import base64

def read(username, password, sender_of_interest):
    # Login to INBOX
    imap = imaplib.IMAP4_SSL("imap.mail.com", 993)
    imap.login(username, password)
    imap.select('INBOX')
    # Use search(), not status()
    # Print all unread messages from a certain sender of interest
    if sender_of_interest:
        status, response = imap.uid('search', None, 'UNSEEN', 'FROM {0}'.format(sender_of_interest))
    else:
        status, response = imap.uid('search', None, 'UNSEEN')
    if status == 'OK':
        unread_msg_nums = response[0].split()
    else:
        unread_msg_nums = []
    data_list = []
    for e_id in unread_msg_nums:
        data_dict = {}
        e_id = e_id.decode('utf-8')
        _, response = imap.uid('fetch', e_id, '(RFC822)')
        html = response[0][1].decode('utf-8')
        email_message = email.message_from_string(html)
        data_dict['mail_to'] = email_message['To']
        data_dict['mail_subject'] = email_message['Subject']
        data_dict['mail_from'] = email.utils.parseaddr(email_message['From'])
        #data_dict['body'] = email_message.get_payload()[0].get_payload()
        data_dict['body'] = email_message.get_payload()
        data_list.append(data_dict)
    print(data_list)
    # Mark them as seen
    #for e_id in unread_msg_nums:
        #imap.store(e_id, '+FLAGS', '\Seen')
    imap.logout()
    return data_dict
So I do this:
print('Getting the email text bodies ... ')
emailData = read(usermail, pw, sender_of_interest)
print('Got the data!')
for key in emailData.keys():
    print(key, emailData[key])
The output is:
mail_to [email protected]
mail_subject Get json file
mail_from ('Pedro Rodriguez', '[email protected]')
body [<email.message.Message object at 0x7f7d9f928df0>, <email.message.Message object at 0x7f7d9f928f70>]
How do I actually get the email text?
Depending on what exactly you mean by "the text", you probably want the get_body method. But you are thoroughly mangling the email before you get to that point. What you receive from the server isn't "HTML", and decoding it to a string only to call message_from_string on it is roundabout and error-prone. What you get are bytes; pass them to message_from_bytes directly. (This avoids all kinds of problems when the bytes are not UTF-8; message_from_string only really made sense back in Python 2, which didn't have an explicit bytes type.)

Passing a policy (for example policy=email.policy.default) to message_from_bytes selects the (no longer very) new EmailMessage class; policies need Python 3.3+, and EmailMessage with its get_body method arrived in 3.4. The older legacy email.message.Message class does not have this method, and should be avoided in new code for many other reasons as well.

Even this can fail for multipart messages with nontrivial nested structures; get_body without arguments can return a multipart/related message part, and then you have to take it from there. You haven't specified what your messages are expected to look like, so I won't delve further into that.

More fundamentally, you probably need a more nuanced picture of how modern email messages are structured. See What are the "parts" in a multipart email?
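As a minimal sketch of the approach described above (not a drop-in replacement for the whole function), the following parses raw message bytes with message_from_bytes and a policy, then asks get_body for a text part. The helper name extract_text, the preference order ("plain", "html"), and the sample message with its example.com addresses are assumptions made for illustration; they are not taken from the question.

```python
import email
from email.message import EmailMessage
from email.policy import default


def extract_text(raw_bytes):
    """Parse raw message bytes and return the preferred text body, or None.

    Hypothetical helper: the name and the preference order are illustrative.
    """
    # The policy argument makes the parser build an EmailMessage,
    # which is the class that provides get_body().
    message = email.message_from_bytes(raw_bytes, policy=default)
    # Prefer the plain-text part and fall back to HTML.
    part = message.get_body(preferencelist=("plain", "html"))
    if part is None:
        # No suitable text part was found (for example, attachments only).
        return None
    # get_content() undoes the transfer encoding and decodes the charset.
    return part.get_content()


# Hypothetical sample: a multipart/alternative message with two bodies,
# serialized to bytes to stand in for what the IMAP fetch returns.
sample = EmailMessage()
sample["From"] = "Sender <sender@example.com>"
sample["To"] = "recipient@example.com"
sample["Subject"] = "Get json file"
sample.set_content("Hello, plain text.")
sample.add_alternative("<p>Hello, HTML.</p>", subtype="html")

body = extract_text(sample.as_bytes())
print(body)
```

In the function from the question, response[0][1] already holds bytes, so it could be passed to such a helper as-is, without the intermediate .decode('utf-8') step.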