I wrote a Python script to fetch all of my Gmail. I have hundreds of thousands of old emails, of which about 10,000 were unread.
After successfully fetching all of my email, I find that Gmail has marked all the fetched emails as "read". This is disastrous for me, since I only need to check the unread emails.
How can I recover the information about which emails were unread? I dumped each mail object into files; the core of my code is shown below:
    import email
    import imaplib

    m = imaplib.IMAP4_SSL("imap.gmail.com")
    m.login(user, pwd)
    m.select("[Gmail]/All Mail")
    resp, items = m.uid('search', None, 'ALL')
    uids = items[0].split()
    for uid in uids:
        # fetching (RFC822) is the step after which Gmail showed the message as read
        resp, data = m.uid('fetch', uid, "(RFC822)")
        email_body = data[0][1]
        mail = email.message_from_string(email_body)
        dumpobj(uid, mail)  # writes the parsed mail object to a file
I am hoping there is either an option to undo this in Gmail, or a member inside the stored mail objects reflecting the seen-state information.
For anyone looking to prevent this headache, consider this answer here. This does not work for me, however, since the damage has already been done.
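A minimal sketch of that kind of prevention, assuming Python 3's imaplib: open the mailbox read-only and fetch with BODY.PEEK[] instead of RFC822, so the server has no reason to set \Seen. The name fetch_without_marking is hypothetical, and conn is assumed to be an already logged-in imaplib.IMAP4_SSL object.

```python
import imaplib  # conn below is expected to be an imaplib.IMAP4 / IMAP4_SSL instance


def fetch_without_marking(conn, mailbox):
    """Yield (uid, raw_message_bytes) for every message, leaving flags alone.

    fetch_without_marking is a hypothetical helper, not part of imaplib.
    """
    # readonly=True makes imaplib send EXAMINE instead of SELECT,
    # so the session is not allowed to change any flags.
    conn.select(mailbox, readonly=True)
    typ, items = conn.uid('search', None, 'ALL')
    for uid in items[0].split():
        # BODY.PEEK[] returns the whole message like RFC822 does,
        # but without implicitly setting the \Seen flag.
        typ, data = conn.uid('fetch', uid.decode(), '(BODY.PEEK[])')
        yield uid.decode(), data[0][1]
```

Either of the two measures should be enough on its own; using both is just belt and braces. In Python 3, a mailbox name containing spaces may need to be passed with its own double quotes, e.g. '"[Gmail]/All Mail"'.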
Edit: I have written the following function to recursively "grep" all strings in an object, and applied it to a dumped email object using the following keywords:
    regex = "(?i)((marked)|(seen)|(unread)|(read)|(flag)|(delivered)|(status)|(state))"
So far, no results (only an unrelated "Delivered-To"). Which other keywords could I try?
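    import re

    def grep_object(obj, regex, cycle=None, matched=None):
        # Recursively collect every string reachable from obj that matches regex (Python 2).
        # Fresh sets per top-level call: mutable default arguments would leak state between calls.
        if cycle is None:
            cycle = set()
        if matched is None:
            matched = set()
        if id(obj) in cycle:
            return matched
        cycle.add(id(obj))
        if isinstance(obj, basestring):
            if re.search(regex, obj):
                matched.add(obj)

        def grep_dict(adict):
            try:
                for pair in adict.iteritems():
                    for item in pair:
                        grep_object(item, regex, cycle, matched)
            except Exception:
                pass  # not a dict-like object

        grep_dict(obj)
        try:
            grep_dict(obj.__dict__)
        except Exception:
            pass  # object has no __dict__
        try:
            for elm in obj:
                grep_object(elm, regex, cycle, matched)
        except Exception:
            pass  # object is not iterable
        return matched

    grep_object(mail_object, regex)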
I'm having a similar problem (not with Gmail). The hardest part for me was building a reproducible test case, and I finally managed to produce one (see below).
In terms of the \Seen flag, I now gather it goes like this:
- For a new (unread) message, a fetch of the \Seen flag will return empty (i.e. the flag will not be present on that email message).
- A search for UNSEEN returns a list of ids (or uids) of emails in that folder that are new (do not have the \Seen flag).
- If you fetch messages with BODY.PEEK, then \Seen on a message is not set; if you fetch them with BODY, then \Seen is set.
- On my test server, fetching with (RFC822) doesn't set \Seen (unlike your case with Gmail).

In the test case, I try to do pprint.pprint(inspect.getmembers(mail)) (in lieu of your dumpobj(uid, mail)), but only after I'm certain \Seen has been set. The output I get is posted in mail_object_inspect.txt, and as far as I can see, there is no mention of 'new/read/seen' etc. in any of the readable fields; the same goes for the output of mail.as_string().
Even worse, there is no mention of "fields" anywhere in the imaplib code (I printed the names of the source files that do not contain case-insensitive "field" anywhere), so I guess that information was not saved with your dumps.
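To illustrate why the dumps hold nothing, here is a minimal sketch (Python 3, names hypothetical) of where the flag actually lives: the server sends it in the FLAGS item of the FETCH response, next to the message literal, and not inside the message that email parses. split_flags_and_message is a made-up helper; imaplib.ParseFlags is the standard-library function that pulls the flag list out of a response line.

```python
import email
import imaplib


def split_flags_and_message(data):
    """Return (flags, message) from the data list of a '(FLAGS BODY.PEEK[])' fetch.

    Hypothetical helper: flags is a tuple of bytes such as (b'\\Seen',),
    message is the parsed email.message.Message.
    """
    flags = ()
    raw = b""
    for part in data:
        if isinstance(part, tuple):
            # (response line, literal): the literal is the message itself
            flags = flags or imaplib.ParseFlags(part[0])
            raw = part[1]
        elif isinstance(part, bytes):
            # FLAGS may also arrive after the literal, in the trailing line
            flags = flags or imaplib.ParseFlags(part)
    return flags, email.message_from_bytes(raw)


def is_seen(flags):
    """True if the \\Seen flag is among the parsed flags."""
    return b'\\Seen' in flags
```

With a live connection, data would come from something like conn.uid('fetch', uid, '(FLAGS BODY.PEEK[])'). Only the second element of the result is what ends up in a dumped mail object, which is consistent with the flag being absent from the dumps.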
Here is a bit on reconstructing the test case. The hardest part was finding a small IMAP server that can be quickly run with some arbitrary users and emails, without having to install a ton of stuff on your system. Finally I found one: trivial-server.pl, the example file of Perl's Net::IMAP::Server; tested on Ubuntu 11.04.
The test case is pasted in this gist, with two files (with many comments) that I'll try to post abridged:
- trivial-serverB.pl: a Net::IMAP::Server server (has a terminal output paste at the end of the file, with a telnet client session)
- testimap.py: an imaplib client (has a terminal output paste at the end of the file, of itself operating with the server)
trivial-serverB.pl
First, make sure you have Net::IMAP::Server installed; note that it has many dependencies, so the installation may take a while.
Then, in the directory where you got trivial-serverB.pl, create a subdirectory with SSL certificates.
Finally, run the server with administrative privileges.
Note that trivial-serverB.pl has a hack which lets a client connect without SSL. The full trivial-serverB.pl is in the gist.
testimap.py
With the server above running in one terminal, in another terminal you can just run testimap.py.
The code will simply read fields and content from the one (and only) message the server above presents, and will eventually restore (remove) the \Seen flag.
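For reference, the restore step can be sketched as follows (Python 3, assuming an already logged-in imaplib connection with the mailbox selected read-write; mark_unseen is a hypothetical name). Removing a flag is done with a STORE command and -FLAGS.

```python
def mark_unseen(conn, uids):
    """Remove the Seen flag from each UID; return the UIDs the server refused.

    mark_unseen is a hypothetical helper; conn is assumed to be an
    imaplib.IMAP4 / IMAP4_SSL object with a mailbox selected read-write.
    """
    failed = []
    for uid in uids:
        # -FLAGS removes the listed flags and leaves all others untouched
        typ, data = conn.uid('store', uid, '-FLAGS', '(\\Seen)')
        if typ != 'OK':
            failed.append(uid)
    return failed
```

This only helps if the list of unread UIDs was saved before the bulk fetch, for example from conn.uid('search', None, 'UNSEEN').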