I need to move away mails older than a given age, say 24 h = 86400 s. I already use good old procmail for several other purposes on that machine, so I wanted to use it for this as well. It also behaves well under load (~1 000 000 small automated messages per day).
It took me a while to arrive at this ugly solution (an excerpt from a bigger procmailrc file):
- Grab the Date: field using formail
- Grab the current date in UNIX format (seconds)
- Convert the mail date to UNIX format using bash
- Compare the values using bash
- Return the result to procmail using the exit code
Together:
MAILDATE_RFC=`formail -zxDate:`
DATE_UNIX=`date "+%s"`
:0
* ? MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"` ; if ( (( ($DATE_UNIX-$MAILDATE_UNIX) > 86400)) ) then exit 0; else exit 1; fi
! account_for_outdated_mails
In this case I need to use the "Date:" field, as it contains the local time at which the mail was generated (a mail can take multiple days to reach my machine). We are 100% sure that the "Date:" field exists and contains an RFC-style date (these are automated messages in a separate mail network).
My solution looks pretty ugly:
- Getting the comparison result from bash using exit codes looks pretty bad. Might be inefficient as well.
- I would like to convert MAILDATE_RFC while still in procmail, but it seems I cannot use a variable as an argument when generating another variable:
MAILDATE_UNIX=`date -d "$MAILDATE_RFC" "+%s"`
does not work.
The only optimization I am aware of would be to move the whole process of computing MAILDATE_RFC, MAILDATE_UNIX and DATE_UNIX into a bash script, so that it runs in one bash session instead of three.
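A minimal sketch of that single-session idea, assuming GNU `date` (for `-d`). The function name `is_older_than` is hypothetical, and the optional third argument exists only so the check can be exercised with a fixed "now":

```shell
# Hypothetical helper: succeed (exit 0) when the RFC-style date in $1
# is more than $2 seconds older than "now" ($3, default: current time).
# Assumes GNU date, which understands -d.
is_older_than() {
    mail_epoch=$(date -d "$1" +%s) || return 2   # date could not be parsed
    now_epoch=${3:-$(date +%s)}
    [ $((now_epoch - mail_epoch)) -gt "$2" ]
}
```

A script built around this could end with something like `is_older_than "$(formail -zxDate:)" 86400`, so that extraction, conversion and comparison all happen in one shell and only the exit code goes back to procmail.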
My question: Is there a better way to do it? Maybe more efficient?
What you say doesn't work actually does. Here's a quick demo.
testing.rc:
Test run, in a fresh Ubuntu 20.04 Docker image:
This also demonstrates how to use scoring to do the calculation. It's perhaps somewhat intimidating, but saves an external process, and so should be more efficient than doing the calculation in Bash.
In some more detail, `123^0 regex` says to add `123` to the score just once if the message matches the regex `regex` (in the recipe above, we use the regex `^` which of course always matches; every message contains a beginning). You could change the `0` to e.g. `1` to say to add for every match, and so on; see the `procmailsc` man page for proper documentation. The `$` modifier says to expand any variables in the recipe itself.
If you are not using GNU `date`, you don't have `date -d`; in that case, refer to your platform's man page for how to calculate a date stamp for an arbitrary date. "How to convert date string to epoch timestamp with the OS X BSD `date` command?" has a discussion for MacOS, which should also work for any other *BSD platform.
If you really wanted to make this more efficient, and can be sure that the `Date:` header really always uses the RFC-mandated format, you could even parse the date in Procmail. Something like
The `\/` token in a regex says to save the matched text after it into the special variable `MATCH`. We then copy that variable to `date` and perform additional matching to extract its parts.
Performing the necessary arithmetic to convert this into seconds since January 1, 1970 should be doable at this point, I hope. If you need complete per-day accuracy, you would also need to extract the time and the time zone and adjust to the correct day if it's not in your preferred time zone, or perhaps UTC (that would be `+0000` at the very end); but this is just a sketch, anyway, because I think I have a better idea altogether.
Namely, save the messages to the correct folder as they arrive, then just forward or discard or archive older folders when you no longer need them.
This will save to an mbox file named like `inbox-2022-06-10` based on the extracted `Date:` header. (Again, you could avoid the external processes if you really wanted to squeeze out the last bit of performance, using the date parsing sketch above. And again, if you can't have a message from a different time zone land in the previous or next day's folder, you need to recalculate the date for your time zone.)
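As a rough illustration of the folder naming and the time zone caveat, here is a small shell sketch that derives the name from a `Date:` value. It assumes GNU `date`; the function name `folder_for_date` is hypothetical, and normalizing to UTC with `-u` is just one possible choice of reference time zone:

```shell
# Hypothetical helper: print the mbox name for the RFC-style date in $1.
# Assumes GNU date; -u converts to UTC first, so the resulting day does
# not depend on the sender's time zone.
folder_for_date() {
    date -u -d "$1" "+inbox-%Y-%m-%d"
}

# Example: a message stamped late in the evening at -0500 already
# belongs to the next day in UTC.
# folder_for_date "Fri, 10 Jun 2022 23:30:00 -0500"   # → inbox-2022-06-11
```

Dropping `-u` would instead use the local time zone of the machine running the command, which may be what you want if folders should follow local days.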