At the moment I'm working on an import script that imports bank account data into a MySQL database using PHP. I found some MT940 parsers, but they either didn't do what I wanted or didn't follow the current MT940 standard. So I wrote my own simple class that parses the data I need.
The problem now is, and this might not even be an MT940 issue, that I must filter out duplicate transactions. In principle this is very simple: if exactly the same transaction already exists in the database, don't import it again. That's what I have done.
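Roughly, the check I use looks like the sketch below (the table and column names are just placeholders, not my real schema):

```php
<?php
// Sketch of the "skip exact duplicates" check described above.
// The table name (transactions), column names and the PDO connection
// are assumptions for illustration only.

function isDuplicate(PDO $pdo, array $trx): bool
{
    $sql = 'SELECT COUNT(*) FROM transactions
            WHERE date = :date
              AND amount = :amount
              AND account_number = :account_number
              AND description = :description';

    $stmt = $pdo->prepare($sql);
    $stmt->execute([
        ':date'           => $trx['date'],
        ':amount'         => $trx['amount'],
        ':account_number' => $trx['account_number'],
        ':description'    => $trx['description'],
    ]);

    return (int) $stmt->fetchColumn() > 0;
}

function importTransaction(PDO $pdo, array $trx): void
{
    if (isDuplicate($pdo, $trx)) {
        return; // an identical row already exists -> skip it
    }

    $stmt = $pdo->prepare(
        'INSERT INTO transactions (date, amount, account_number, description)
         VALUES (:date, :amount, :account_number, :description)'
    );
    $stmt->execute([
        ':date'           => $trx['date'],
        ':amount'         => $trx['amount'],
        ':account_number' => $trx['account_number'],
        ':description'    => $trx['description'],
    ]);
}
```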
But now the fun part: some transactions may legitimately occur twice on the same day. For example, someone makes the same payment twice on the same day (someone might like me and just send me the money twice ;)). When importing this for the first time there is no problem: within one file, each transaction counts as its own transaction.
But here is the problem: because transactions are not unique (MT940 doesn't include any unique identifier per transaction), it is hard to tell a duplicate transaction apart from a genuine second transaction. Say I have downloaded two MT940 files for my bank account, and one of the two identical transactions is in the first file while the other is in the second file. When importing the second file, the import will tell me the transaction is a duplicate, even though it isn't.
So I'm struggling with this, and I'm probably not the only one. Big accountancy programs also consume MT940 files and must deal with the same issue.
Does anyone know a way to handle this?
Additional information
In the transaction itself (:61:), the following information is available:
- Date
- Amount
- Bank account number
- Bank account name
- Bank account address
- Transaction description
If someone makes the same transaction twice, all of this information can be identical.
Example of two duplicate transactions (of course I modified the name and bank account number): this one appeared twice in a single MT940 file, but really represents two separate transactions.
:61:130311C000000000029,95N122NONREF
0123456789
:86:/ORDP//NAME/JANSEN W H/ADDR/SOMEROAD 1 9569 GS THECITY/REMI/N
OKIA 3310/ISDT/2013-03-11
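For completeness, here is a minimal sketch of how such a :61: line can be split into subfields. This is only an illustration based on my reading of the MT940 spec (not my actual parser class), and subfield usage varies per bank:

```php
<?php
// Very rough sketch of splitting a :61: statement line into its subfields:
// value date, optional entry date, debit/credit mark, amount,
// transaction type code and reference. Treat as an illustration only.

function parse61(string $line): ?array
{
    $pattern = '/^:61:(?<value_date>\d{6})(?<entry_date>\d{4})?'
             . '(?<dc_mark>R?[CD])(?<funds_code>[A-Z])?'
             . '(?<amount>[\d,]{1,15})'
             . '(?<type>[A-Z][A-Z0-9]{3})'
             . '(?<reference>.{0,16})/';

    if (!preg_match($pattern, $line, $m)) {
        return null; // line does not look like a :61: record
    }

    return [
        'value_date' => $m['value_date'],                        // e.g. 130311 (YYMMDD)
        'dc_mark'    => $m['dc_mark'],                           // C = credit, D = debit
        'amount'     => (float) str_replace(',', '.', $m['amount']),
        'type'       => $m['type'],                              // e.g. N122
        'reference'  => trim($m['reference']),                   // e.g. NONREF
    ];
}

// parse61(':61:130311C000000000029,95N122NONREF');
// -> ['value_date' => '130311', 'dc_mark' => 'C', 'amount' => 29.95, 'type' => 'N122', 'reference' => 'NONREF']
```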
Let's say that you have 3 transactions: A, B and C.
A and B are duplicates, therefore B must be discarded
C is a different transaction, but it's identical to A in every aspect.
From the data you provided in the question, A = B = C: they are absolutely identical. There is no way to determine, by reading their content, which is genuine and which is a duplicate. Every one of the following scenarios may be valid:
- all three are genuine transactions;
- two are genuine and one is a duplicate (the actual situation here);
- only one is genuine and the other two are duplicates.
As you can see, there is no algorithm that can decide with 100% certainty which case is the correct one. And since we're talking about money, anything less than 100% certainty can't be accepted.
What to do then?
If a computer can't determine it, let the humans kick in. When you detect a conflict, add it to a list. Process everything that has no conflict, and when you've finished the batch, send the conflict list to a human operator and let them sort the duplicates from the genuine transactions (even though I still wonder HOW they can sort them if they're identical in every aspect).
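Just to make the idea concrete, here is a rough sketch in PHP. The fingerprint, the transactions_review table and all column names are invented for the example:

```php
<?php
// Sketch of the "let the humans kick in" idea: anything that collides with an
// existing row is not imported automatically but queued for manual review.

function fingerprint(array $trx): string
{
    // Hash of every field we have; identical transactions collide by design.
    return sha1(implode('|', [
        $trx['date'],
        $trx['amount'],
        $trx['account_number'],
        $trx['description'],
    ]));
}

function importBatch(PDO $pdo, array $batch): array
{
    $conflicts = [];

    foreach ($batch as $trx) {
        $hash = fingerprint($trx);

        $stmt = $pdo->prepare('SELECT COUNT(*) FROM transactions WHERE fingerprint = :fp');
        $stmt->execute([':fp' => $hash]);

        if ((int) $stmt->fetchColumn() > 0) {
            // Possible duplicate: park it for a human operator instead of guessing.
            $pdo->prepare('INSERT INTO transactions_review (fingerprint, raw_data)
                           VALUES (:fp, :raw)')
                ->execute([':fp' => $hash, ':raw' => json_encode($trx)]);
            $conflicts[] = $trx;
            continue;
        }

        // No conflict: import as usual.
        $pdo->prepare('INSERT INTO transactions (fingerprint, date, amount, account_number, description)
                       VALUES (:fp, :date, :amount, :acc, :descr)')
            ->execute([
                ':fp'     => $hash,
                ':date'   => $trx['date'],
                ':amount' => $trx['amount'],
                ':acc'    => $trx['account_number'],
                ':descr'  => $trx['description'],
            ]);
    }

    return $conflicts; // e-mail or display this list to the operator
}
```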
If you find out that the operators know HOW to sort them, try to deduce their human algorithm, if possible.