I have a bunch of old, inherited mbox files which I want to convert to maildir. Problem: the mboxes are not fully RFC compliant. In several mailboxes the empty line before the "^From " line is missing in some (but not all) mails, which causes mb2md not to separate these mails from each other.
Example:
...
Text of mail 1
... bla....
To unsubscribe, visit https:...
From fetchmail Fri Nov 8 18:35:54 CET 2002 ## ^missing empty line above
...
Text of mail 2
...
Now I'm looking for an easy way to insert an empty line before any line matching "^From ", but only when it is not already preceded by an empty line. Some kind of stream editing is a must, because the mailboxes can be really huge.
I use sed regularly, but I'm not familiar with multiline matching. I tried several things today (cut and paste with modifications) without success :(
Last try was
sed -E ':a;N;$!ba;s/\n(..*)\nFrom /\n\1\n\nFrom /g' /tmp/testfile
that only matched the last occurrence of the pattern!?
sed/awk-experts - do you have any hint for me?
Yes. Regex is greedy. The ..* matches as much as it can, newlines included, so it swallows nearly the whole file, and only after that is the last single "\nFrom " matched. To match one line, match everything except a newline instead (in GNU sed, for example, [^\n]* rather than .*).
If you do not want to read the whole file into memory, you still have to keep at least two lines in memory. One way is to keep the previous line in the hold space: on each line read, the current line is appended to the previous line to check the condition, and after the check the previous line is printed.
And as a one-liner:
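A compact variant of the same idea, again only a sketch under assumptions: the hold space remembers the previous line, and an empty line is printed before a "From " line whenever that remembered line is not empty. It assumes a sed that accepts ";" between commands inside braces, as GNU sed does. The awk version is an equivalent alternative. The function names and the variable prev are arbitrary choices for illustration.

```shell
# sed: hold space = previous line; on a "From " line, swap it in,
# and if it is non-empty, blank it and print it as the missing empty line.
fix_mbox_sed() { sed '/^From /{x;/./{s/.*//;p;};x;};h'; }

# awk: same logic, prev holds the previous line (empty before the first line).
fix_mbox_awk() { awk '/^From / && prev != "" { print "" } { print; prev = $0 }'; }
```

Both read the input line by line, so memory use stays constant regardless of mailbox size. For example: fix_mbox_sed < /tmp/testfile.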