sed to process an OFX, extracting payee from <MEMO> and printing on <NAME>

196 Views Asked by At

I'm processing a OFX (bank transactions) file. My bank doesn't use the <NAME> tag to specify the payee, but this information is a substring of <MEMO> tag.

So, my file is something like:

...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
</STMTTRN>
...continues other transactions and end of file

I would like to match every <MEMO> tag, extract the payee name (Walmart 2th street in this example) and write a new line with a <NAME>. My output would be like:

...ofx headers and other stuff
...line below is a transaction
<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
    <NAME>Walmart 2th street</NAME>
</STMTTRN>
...continues other transactions and end of file

Another tool as awk can be a solution.

2

There are 2 best solutions below

2
Cyrus On BEST ANSWER

With GNU sed:

sed -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n    <NAME>\1<\/NAME>/' file

Output:

<STMTTRN>
    <TRNTYPE>OTHER</TRNTYPE>
    <DTPOSTED>20160609120000</DTPOSTED>
    <TRNAMT>-4.00</TRNAMT>
    <FITID>2016060914000</FITID>
    <CHECKNUM>000000700132</CHECKNUM>
    <REFNUM>700.132</REFNUM>
    <MEMO>Credit Card Payment - 09/06 18:37 Walmart 2th street</MEMO>
    <NAME>Walmart 2th street</NAME>
</STMTTRN>

If you want to edit your file "in place" use sed's option -i.

1
Filipe On

Complementing @Cyrus answer to deal with no ascii chars:

I gave up on non ascii chars and now it's working:

iconv -f "windows-1252" -t "UTF-8" file-ansi.ofx -o file-utf8.ofx
rm file-ansi.ofx
sed 'y/áÁàÀãÃâÂéÉêÊíÍóÓõÕôÔúÚüÜçÇ/aAaAaAaAeEeEiIoOoOoOuUuUcC/' -i file-utf8.ofx
sed -i -r 's/.*<MEMO>.* [0-9]{2}:[0-9]{2} (.*)<.*/&\n                  <NAME>\1<\/NAME>/' file-utf8.ofx 

My output:

<MEMO>Cartao de Credito - 09/06 18:37 Walmart 2th</MEMO>
<NAME>Walmart 2th street</NAME>