Getting rid of certain text from the body of an email using Python


I'm trying to parse the body of a forwarded email using the following Python code:

import imapclient
import os
import pprint
import pyzmail
import email

#my email info
EMAIL_ADRESS = os.environ.get('DB_USER')
EMAIL_PASSWORD = os.environ.get('PYTHON_PASS')

#login to my email
imap0bj =  imapclient.IMAPClient('imap.gmail.com', ssl = True)
imap0bj.login(EMAIL_ADRESS, EMAIL_PASSWORD )
print("ok")


pprint.pprint(imap0bj.list_folders())
#Selecting my Inbox
imap0bj.select_folder('INBOX', readonly = True)

#Getting UIDs from Inbox
UIDs = imap0bj.search(['SUBJECT', 'Contact FB Applicant', 'ON', '16-Oct-2020'])
print(UIDs)


rawMessages = imap0bj.fetch(UIDs, ['BODY[]'])
message = pyzmail.PyzMessage.factory(rawMessages[9999][b'BODY[]'])

# Only decode when the message actually has a plain-text part
if message.text_part is not None:
    # Body of the email returned as a string
    msg = message.text_part.get_payload().decode(message.text_part.charset)
    print(msg)

imap0bj.logout()

This code outputs a string similar to this:

   ---------- Forwarded message ---------
    From: Someone <[email protected]>
    Date: Wed, Oct 14, 2020 at 1:23 PM
    Subject: Fwd:  Contact FB Applicant
    To: <[email protected]>
    
    
    
    
   ---------- Forwarded message ---------
    From: Someone <[email protected]>
    Date: Wed, Oct 14, 2020 at 1:23 PM
    Subject: Fwd:  Contact FB Applicant
    To: <[email protected]>
    
    
    The following applicant filled out the form via Facebook.  Contact
    immediately.
    
    Some Guy
    999999999999
    [email protected]

But I don't want the "Forwarded message" parts. I just want it from "The following applicant..." and onwards which is the info I care about. How do I get rid of the other stuff? I'd really appreciate the help. Thank you!


There are 2 best solutions below

Best answer

You can use io.StringIO to start reading at the text you want.

Here's how you would use it.

from io import StringIO

# your code goes here
...
...

msg = message.text_part.get_payload().decode(message.text_part.charset)

sio = StringIO(msg)

sio.seek(msg.index('The following applicant'))

for line in sio:
  # each line keeps its trailing newline, so stop print() from adding another
  print(line, end='')

How it works:

StringIO lets you treat a string as a stream (a file-like object). StringIO.seek moves the stream's position to a particular place, where 0 is the beginning of the stream. str.index returns the position of the first occurrence of a substring within a string, and raises ValueError if the substring is not found. Putting it all together: you move the stream's position to the first occurrence of the text you want, and then read from the stream from there.
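
The seek-and-read idea above can be wrapped in a small helper. This is only a sketch: the name text_from_marker and the sample text are made up for illustration, and returning the text unchanged when the marker is missing is an assumption about what you would want in that case.

```python
from io import StringIO


def text_from_marker(msg, marker="The following applicant"):
    """Return msg from the first occurrence of marker onward."""
    start = msg.find(marker)  # find() returns -1 instead of raising when absent
    if start == -1:
        return msg  # assumption: leave the text unchanged if nothing matches
    stream = StringIO(msg)
    stream.seek(start)    # move the read position to where the marker starts
    return stream.read()  # everything from the marker to the end


# Made-up sample shaped like the forwarded message in the question
sample = (
    "---------- Forwarded message ---------\n"
    "From: Someone\n"
    "\n"
    "The following applicant filled out the form via Facebook.  Contact\n"
    "immediately.\n"
    "\n"
    "Some Guy\n"
)
print(text_from_marker(sample))
```

If you only need the remaining text as one string, slicing with msg[start:] gives the same result without creating a stream. The stream version is mainly useful when you want to keep iterating line by line.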

Second answer

Judging from this format, you need to read the message line by line. If you encounter a line that starts with '---' (for example, line[:3] == '---'), ignore it and the lines after it until you read an empty line. If the next non-empty line starts with '---' again, repeat the process. The first non-empty line after that should be "The following applicant...".

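You can bury this logic in a loop and break out once the headers are behind you. Here is a sketch in Python, where msg is the decoded body from the question:

lines = iter(msg.splitlines())
body = []
for line in lines:
  if not line.strip():
    continue  # skip blank lines before and between forwarded headers
  if line.strip().startswith('---'):
    # forwarded header: skip it up to the blank line that ends it
    for line in lines:
      if not line.strip():
        break
    continue
  body.append(line)  # first line of the real content
  break
# read and print everything from here
body.extend(lines)
print('\n'.join(body))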

This assumes the line reader keeps track of its position, so that each read returns the next unread line.
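
In Python, one way to get a reader that behaves this way is an iterator over the lines. A minimal sketch, using a made-up sample string rather than a real email body:

```python
sample = "first\nsecond\n\nfourth"

# iter() gives a reader that remembers its position between reads
lines = iter(sample.splitlines())

first = next(lines)               # consumes "first"
rest = [line for line in lines]   # continues from "second", not from the top

# next() with a default reports the end of input instead of raising StopIteration
after_end = next(lines, None)

print(first)
print(rest)
print(after_end)
```

Because the position lives in the iterator, an inner loop and an outer loop can share the same reader and each will pick up where the other stopped.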