I'm having trouble extracting metadata from a PST file.
As you can see in the code below, I'm using pypff to read the PST file. I need to extract the following data from each email: sender, recipient, subject, date and time, and of course the email content (body).
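The fields listed above can be gathered per message by a small helper. The sketch below assumes, as the code in this question already does, that pypff messages expose `subject`, `sender_name` and `client_submit_time`. It also assumes a `plain_text_body` attribute that may return `bytes` or `None`, so the body is decoded defensively. The helper names `to_text` and `message_to_record` are hypothetical.

```python
from typing import Any, Dict, Optional


def to_text(value: Any, encoding: str = "utf-8") -> Optional[str]:
    """Return value as str, decoding bytes; None stays None."""
    if value is None:
        return None
    if isinstance(value, bytes):
        # errors="replace" keeps going when a message uses an unexpected charset
        return value.decode(encoding, errors="replace")
    return str(value)


def message_to_record(message: Any) -> Dict[str, Any]:
    """Collect subject, sender, date and body from a pypff-style message.

    getattr with a default keeps the helper usable on objects that
    lack one of these attributes (assumption: pypff itself usually
    returns None rather than omitting the attribute).
    """
    return {
        "subject": to_text(getattr(message, "subject", None)),
        "sender": to_text(getattr(message, "sender_name", None)),
        # Left as whatever object the library returns (e.g. a datetime)
        "datetime": getattr(message, "client_submit_time", None),
        # Assumed attribute name for the plain-text body
        "body": to_text(getattr(message, "plain_text_body", None)),
    }
```

Inside the inner loop of `parse_folder`, `messages.append(message_to_record(message))` could replace the hand-written dict.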
But apparently I'm missing something, because I just can't find where to get the recipient from.
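One possible route to the recipient is the raw header text that the code below already prints via `message.transport_headers`. For messages that arrived over SMTP, it normally contains `To:` and `Cc:` lines, which Python's standard `email` package can parse. This is a sketch under that assumption. Drafts and some sent items may have no transport headers at all, in which case the function returns an empty list. The helper name `recipients_from_headers` is hypothetical.

```python
from email.parser import HeaderParser
from email.utils import getaddresses
from typing import List, Optional, Tuple, Union


def recipients_from_headers(headers: Optional[Union[str, bytes]]) -> List[Tuple[str, str]]:
    """Return (display name, address) pairs from the To and Cc headers.

    `headers` is assumed to be the raw text of message.transport_headers.
    """
    if not headers:
        # No transport headers stored for this item
        return []
    if isinstance(headers, bytes):
        headers = headers.decode("utf-8", errors="replace")
    parsed = HeaderParser().parsestr(headers)
    # get_all returns every occurrence of a header, or the fallback list
    raw_values = parsed.get_all("To", []) + parsed.get_all("Cc", [])
    # getaddresses splits comma-separated lists into (name, address) pairs
    return getaddresses(raw_values)
```

In `parse_folder`, the result of `recipients_from_headers(message.transport_headers)` could be stored under a `"recipients"` key next to `"sender"`.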
I'm asking you professionals for help; maybe you know a better way to do this. I have already thought about "unpacking" all the .msg files from the PST into a folder and then iterating over them, but I wouldn't know how to do that either.
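Instead of unpacking .msg files to disk, one alternative is to walk the folder tree directly with a recursive generator. The sketch below assumes only what the code in this question already uses: folders expose `name`, `sub_messages` and `sub_folders`. Unlike `parse_folder`, it also yields messages stored directly in the folder it starts from. The name `iter_messages` is hypothetical.

```python
from typing import Any, Iterator, Tuple


def iter_messages(folder: Any, path: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (folder path, message) for every message at or below `folder`."""
    # The root folder of a PST often has no name, so fall back to ""
    name = getattr(folder, "name", None) or ""
    current = f"{path}/{name}" if path else name
    # Messages stored directly in this folder
    for message in folder.sub_messages:
        yield current, message
    # Recurse into every sub-folder; leaves simply have no sub-folders
    for sub_folder in folder.sub_folders:
        yield from iter_messages(sub_folder, current)
```

Usage would look like `for path, message in iter_messages(pst.get_root_folder()): ...`. Because it is a generator, messages are processed one at a time rather than collected into one big list first.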
Thanks in advance for your answers and help.
```python
# Retrieving e-mails from a PST file

# File opening
# First we load the libraries
import pypff
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Then we open the file: opening can nevertheless take quite long,
# depending on the size of the archive.
pst = pypff.file()
pst.open("PathTo.pst")

# Metadata extraction
# It is possible to navigate through the structure using the functions
# offered by the library, starting from the root:
root = pst.get_root_folder()

# To extract the data, a recursive function is necessary:
def parse_folder(base):
    messages = []
    for folder in base.sub_folders:
        # Descend into nested folders first
        if folder.number_of_sub_folders:
            messages += parse_folder(folder)
        print(folder.name)
        for message in folder.sub_messages:
            # Raw internet headers of the message (may be None)
            print(message.transport_headers)
            messages.append({
                "subject": message.subject,
                "sender": message.sender_name,
                "datetime": message.client_submit_time,
            })
    return messages

messages = parse_folder(root)
```
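Once `parse_folder` has returned its list of dicts, the result can be saved for later analysis. A minimal sketch using only the standard `csv` module is shown below. The function name `write_messages_csv` and its default column list are hypothetical. Since pandas is already imported above, `pd.DataFrame(messages)` would be an equally valid route.

```python
import csv
from typing import Any, Dict, Iterable, Sequence


def write_messages_csv(
    records: Iterable[Dict[str, Any]],
    path: str,
    fields: Sequence[str] = ("subject", "sender", "datetime"),
) -> int:
    """Write message dicts to a CSV file and return the row count.

    Missing keys are written as empty cells (restval=""), and keys not
    listed in `fields` are skipped (extrasaction="ignore").
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=list(fields), restval="", extrasaction="ignore"
        )
        writer.writeheader()
        for record in records:
            # Non-string values such as datetimes are written via str()
            writer.writerow(record)
            count += 1
    return count
```

For example, `write_messages_csv(messages, "messages.csv")` writes one row per message collected by `parse_folder`.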