I am working on a project where I need to distinguish emails sent by real humans from bulk mail, notifications and newsletters. Is there any definitive way of doing that? Is there any information in the email headers that can help? I am working on top of Gmail IMAP, so the emails I have are already non-spam.
Any help in this regard is appreciated. Thanks!
There isn't a clear way to distinguish bulk mail from personalised mailings. Unlike with spam, most bulk mail is requested/expected, so the sender doesn't do odd things to get round spam filters, which means these emails often blend in fairly well.
However, there are some trends that you can look for. If you want to do it reliably, you will probably need to apply a scoring system, as spam filters do.
You will also need to accept that you are bound to get a substantial proportion of false positives and false negatives.
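To make the scoring idea concrete, here is a minimal sketch in Python using only the standard library. The signals, weights and threshold are illustrative assumptions of mine, not a tested rule set: `List-Unsubscribe` and `Precedence` are real headers that mailing software often sets, but how much each should count is something you would have to tune against your own mail.

```python
from email import message_from_string

# Each signal is (label, weight, predicate). All three signals and their
# weights are assumptions chosen for illustration; tune them on real mail.
SIGNALS = [
    ("has List-Unsubscribe header", 2.0,
     lambda m: m.get("List-Unsubscribe") is not None),
    ("Precedence is bulk/list/junk", 2.0,
     lambda m: str(m.get("Precedence") or "").strip().lower()
     in {"bulk", "list", "junk"}),
    ("sender looks like no-reply", 1.0,
     lambda m: "noreply" in str(m.get("From") or "").lower().replace("-", "")),
]

# Hypothetical cut-off: at or above this score the mail is treated as bulk.
THRESHOLD = 2.0


def bulk_score(msg):
    """Return (total score, labels of the signals that fired)."""
    hits = [(label, weight) for label, weight, pred in SIGNALS if pred(msg)]
    return sum(weight for _, weight in hits), [label for label, _ in hits]


def looks_bulk(msg):
    """True if the accumulated score reaches the threshold."""
    score, _ = bulk_score(msg)
    return score >= THRESHOLD


newsletter = message_from_string(
    "From: News <no-reply@example.com>\n"
    "List-Unsubscribe: <mailto:unsub@example.com>\n"
    "Subject: Weekly digest\n\nHello"
)
personal = message_from_string(
    "From: Alice <alice@example.com>\nSubject: Lunch?\n\nFree at noon?"
)
```

With messages fetched over IMAP you would parse the raw bytes with `email.message_from_bytes` instead. Keeping the labels of the signals that fired makes it easier to see why a given mail was misclassified when you adjust the weights.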
Some things that are common to bulk mail that appear less often in personalised correspondence:
HTML with <table></table> or <ul><li></li></ul> structure, i.e. the kind of markup that something like Dreamweaver would put in, rather than a mail client.
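The HTML-structure signal could be measured roughly like this. It is a sketch under my own assumptions: the set of tags counted is a guess at what "layout markup" means, and the function name `layout_tag_count` is made up for the example. Some mail clients also emit lists or tables, so the count is better used as one input to a score than as a verdict on its own.

```python
from email import message_from_string
from html.parser import HTMLParser

# Tags treated as "page layout" markup. This set is an assumption.
LAYOUT_TAGS = {"table", "tr", "td", "ul", "li"}


class TagCounter(HTMLParser):
    """Counts opening tags that belong to LAYOUT_TAGS."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in LAYOUT_TAGS:
            self.count += 1


def layout_tag_count(msg):
    """Count layout tags across every text/html part of the message."""
    total = 0
    for part in msg.walk():
        if part.get_content_type() != "text/html":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            text = payload.decode("utf-8", errors="replace")
        counter = TagCounter()
        counter.feed(text)
        total += counter.count
    return total


templated = message_from_string(
    "Content-Type: text/html; charset=utf-8\n\n"
    "<table><tr><td><ul><li>a</li><li>b</li></ul></td></tr></table>"
)
handwritten = message_from_string(
    "Content-Type: text/html; charset=utf-8\n\n<p>See you at noon.</p>"
)
```

Here `layout_tag_count(templated)` counts six tags and `layout_tag_count(handwritten)` counts none; where to draw the line between the two is a tuning decision.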