Python - aggregating financial transactions with closest time in window

231 Views Asked by At

Suppose I collected (in a list) all trades that occurred within a certain period of time (say first 5mins after 11AM) for n stocks (I'll make n=2 for simplicity and adapt later). Say we have firm AAA and firm BBB (if it helps, liststocks=['AAA', 'BBB']). The list will look somehow like:

    trades=[['AAA', '2011-01-03', '11:03:51', 21.5],['BBB', '2011-01-03','11:03:57', 31.5],
['AAA', '2011-01-03', '11:04:20', 21.55],
['BBB', '2011-01-03','11:04:19', 32.01], ['BBB', '2011-01-03','11:04:52', 31.7]]

i.e., 2 trades for stock AAA and 3 trades for stock BBB. Picking the last trade of each stock causes a lack of synchronicity problem. The idea is to pick the last trade of each stock and find the earliest (['AAA', '2011-01-03', '11:04:20', 21.55]). Then pick transactions of all other stocks with time as close as possible to '11:04:20', which would cause us to choose ['BBB', '2011-01-03','11:04:19', 32.01]. The output should be a list like:

    C=[['AAA', '2011-01-03', '11:04:20', 21.55],['BBB', '2011-01-03','11:04:19', 32.01]]

Thanks a lot!

2

There are 2 best solutions below

3
On

Its not that hard if you use sorted with the key parameter.

Here is the code if you don't want to read, I'll explain after:

from datetime import datetime

trades=[['AAA', '2011-01-03', '11:03:51', 21.5],['BBB', '2011-01-03','11:03:57', 31.5],
['AAA', '2011-01-03', '11:04:20', 21.55],
['BBB', '2011-01-03','11:04:19', 32.01], ['BBB', '2011-01-03','11:04:52', 31.7]]

trades=[[i[0], datetime.strptime(i[1]+" "+i[2], "%Y-%m-%d %H:%M:%S"), i[3]] for i in trades]

most_liquid, *others, least_liquid = sorted(set(i[0] for i in trades), key=trades.count)

A=sorted((i for i in trades if i[0]==least_liquid), key=lambda n: n[1])[-1]
B=sorted((i for i in trades if i[0]==most_liquid), key=lambda n: abs(n[1]-A[1]))[0]

What this does is it first converts each trade from using a string representation of the time to a datetime object. It does this with the datetime.strptime class method. Then it calculates the liquidity of the stocks by sorting trades. The *others generalizes to n stocks. Then it just filters the less_liquid trades and then sorts them by the time parameter. Then it filters by name being more_liquid and sorts by the absolute difference between it and the A trade.

So the object you want are A and B. They won't be exactly what you specified, since they will have datetime instead of strings, but that should be easy to fix with the datetime.strftime function.

0
On

A solution for 2 stocks is

    from datetime import *
    trades=[['AAA', '2011-01-03', '11:03:51', 21.5],['BBB', '2011-01-03','11:03:57', 31.5],
['AAA', '2011-01-03', '11:04:20', 21.55],
['BBB', '2011-01-03','11:04:19', 32.01], ['BBB', '2011-01-03','11:04:52', 31.7]]

stocknames = ['AAA','BBB']
A=[]
lastofeach=[]
for stock in stocknames:
    for t in trades:
        if t[0]==stock:
            A.append(t)
    A.sort(key=lambda e:(e[1], e[2]))
    lastofeach.append(A[-1])
    A[:]=[]
lastofeach.sort(key=lambda e:e[2])  

lastofeach=[[i[0], datetime.strptime(i[1]+" "+i[2], "%Y-%m-%d %H:%M:%S"), i[3]] for i in lastofeach]
trades=[[i[0], datetime.strptime(i[1]+" "+i[2], "%Y-%m-%d %H:%M:%S"), i[3]] for i in trades]


A=lastofeach[0]
B=(sorted((i for i in trades if i[0]!=A[0]), key=lambda n: abs(n[1]-A[1]))[0])
C=[A,B]
print (C)

(for the exact same answer just apply datetime.strftime(A[1],"%Y-%m-%d %H:%M:%S"), split it and save.

A solution for n>2 would be very much appreciated. Any big O optimizations are welcome too.

Thanks