DataFrame Row Sequences Using Sports Geospatial Tracking Data


I have some X Y geospatial coordinate data for a rugby match, but there is no data on where the ball is located. This is my problem! Therefore I have to determine, from the X Y player movements and some event data, where the ball is at each 0.1-second time interval. I have mapped the event data (columns 'Team', 'Player', 'Event', 'Outcome') to the XY player tracking data (columns such as 'X_Plr1', 'Y_Plr1'). I'm looking to fill in the Ball Position column depending on the event information. We can assume that the ball travels in a straight line.

For example, Player1 passes the ball from (14.11, 25.55) at index 1, and it travels in a straight line to Player2 at index 5. That covers a PASS. From index 5 to index 9, however, the ball is being dribbled by Player2, so the ball's position simply mirrors where Player2 moved from index 5 to index 9. In essence, I'm trying to work out how to fill the Ball Position column depending on what the event data says. Any help would be great!

   Team    Player    Event    Outcome    Time    X_Plr1    X_Plr2  Y_Plr1  Y_Plr2   Ball Position 
1  Team1   Player1   Pass     Success    0.0     14.11     4.99    25.55   30.01
2  NaN     NaN       NaN      NaN        0.1     14.21     5.03    25.34   30.06
3  NaN     NaN       NaN      NaN        0.2     14.31     5.07    26.11   30.10
4  NaN     NaN       NaN      NaN        0.3     14.33     5.11    26.01   30.12
5  Team1   Player2   Received Success    0.4     14.35     5.13    26.24   30.44
6  NaN     NaN       NaN      NaN        0.1     14.21     5.10    26.13   30.97
7  NaN     NaN       NaN      NaN        0.2     14.31     5.07    25.99   31.11
8  NaN     NaN       NaN      NaN        0.3     14.33     5.11    25.97   31.23
9  Team1   Player2   Pass     Success    0.4     14.21     5.16    25.92   31.89
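Under the straight-line assumption, a pass reduces to evenly spaced points on the line from the passer's position at the Pass row to the receiver's position at the Received row, while a dribble just copies the carrier's coordinates. A minimal sketch of both rules using the values from the table above; the function names linear_pass_path and dribble_path are hypothetical, and equally spaced 0.1 s frames are assumed:

```python
import numpy as np

def linear_pass_path(start_xy, end_xy, n_frames):
    """Ball (x, y) at n_frames equally spaced frames, moving in a straight
    line from start_xy (first frame) to end_xy (last frame)."""
    xs = np.linspace(start_xy[0], end_xy[0], n_frames)
    ys = np.linspace(start_xy[1], end_xy[1], n_frames)
    return [(float(x), float(y)) for x, y in zip(xs, ys)]

def dribble_path(player_xs, player_ys):
    """While dribbling, the ball just mirrors the carrier's coordinates."""
    return list(zip(player_xs, player_ys))

# Pass: Player1 at index 1 (14.11, 25.55) -> Player2 at index 5 (5.13, 30.44)
pass_path = linear_pass_path((14.11, 25.55), (5.13, 30.44), n_frames=5)

# Dribble: Player2's tracked positions from index 5 to index 9
dribble = dribble_path([5.13, 5.10, 5.07, 5.11, 5.16],
                       [30.44, 30.97, 31.11, 31.23, 31.89])
```

Index 5 belongs to both blocks: it is the last frame of the pass and the first frame of the dribble, and both rules give the same point there.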

Here is a potential solution. I tried to make the function names descriptive so the approach is easy to follow. Essentially, loop through only the rows of df that contain an event, and calculate the ball position for each block of rows that starts at one event and ends at the next. If a block starts with a pass, linearly interpolate between the passer's and the receiver's positions, as you described. If a block starts with a player receiving the ball and then dribbling, fill in the ball position with that player's position. The output is a new column, BallPosition, containing an (x, y) tuple for each row. Here is the code:
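The block-splitting step can be seen on its own: collect the indices of rows that carry an event, add the last row if it is not already an event, and pair each event with the next one. A minimal sketch; event_blocks is a hypothetical helper name, and the toy frame only mimics the question's 'Event' column:

```python
import pandas as pd

def event_blocks(df):
    """Return (start_index, end_index, event) for each block of rows that
    starts at one event row and ends at the next event row (inclusive)."""
    event_inds = df[df['Event'].notna()].index.to_list()
    last_ind = df.index[-1]
    if last_ind not in event_inds:
        # Treat the final row as a boundary so trailing rows are still covered
        event_inds.append(last_ind)
    return [(start, end, df.loc[start, 'Event'])
            for start, end in zip(event_inds[:-1], event_inds[1:])]

events = ['Pass', None, None, None, 'Received', None, None, None, 'Pass']
toy = pd.DataFrame({'Event': events}, index=range(1, 10))
blocks = event_blocks(toy)
# blocks -> [(1, 5, 'Pass'), (5, 9, 'Received')]
```

Consecutive blocks share their boundary row, so the receiving row is written by both the pass block and the dribble block; with the rules described above both give the receiver's position there.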

import numpy as np
import pandas as pd

def event_players(df, ev_ind1, ev_ind2):
    # Player names at the start and end event rows of a block
    return (df.loc[ev_ind1, 'Player'], df.loc[ev_ind2, 'Player'])

def interp_ball_position(df, tvec, ev_ind1, pname1, ev_ind2, pname2, ndec=2):
    return np.around(
        np.interp(
            tvec, [tvec[0], tvec[-1]], [df.loc[ev_ind1, pname1], df.loc[ev_ind2, pname2]]
        ), ndec
    )

def pass_ball_position(df, ev_ind1, ev_ind2, pname_colmap):
    # Player names for start and end of pass event, respectively
    pname1, pname2 = event_players(df, ev_ind1, ev_ind2)
    # Use Time column to interpolate ball position
    # this is not needed if Time is always equally spaced with no gaps
    tvec = df.loc[ev_ind1:ev_ind2, 'Time'].values
    b_pos_x = interp_ball_position(df, tvec,
                                   ev_ind1, 'X_' + pname_colmap[pname1],
                                   ev_ind2, 'X_' + pname_colmap[pname2])
    b_pos_y = interp_ball_position(df, tvec,
                                   ev_ind1, 'Y_' + pname_colmap[pname1],
                                   ev_ind2, 'Y_' + pname_colmap[pname2])
    # Store one (x, y) tuple per row of the pass block
    df.loc[ev_ind1:ev_ind2, 'BallPosition'] = pd.Series(
        list(map(tuple, np.stack((b_pos_x, b_pos_y), axis=-1))),
        index=df.loc[ev_ind1:ev_ind2].index
    )

def dribble_ball_position(df, ev_ind1, ev_ind2, pname_colmap):
    # Player names for start and end of dribble event, respectively
    pname1, pname2 = event_players(df, ev_ind1, ev_ind2)
    if pname1 == pname2:
        # Next event after Received assumed to be same player
        # otherwise modify accordingly
        # The ball simply follows the dribbling player's (x, y) position
        df.loc[ev_ind1:ev_ind2, 'BallPosition'] = pd.Series(
            list(zip(df.loc[ev_ind1:ev_ind2, 'X_' + pname_colmap[pname1]],
                     df.loc[ev_ind1:ev_ind2, 'Y_' + pname_colmap[pname1]])),
            index=df.loc[ev_ind1:ev_ind2].index
        )

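# Example data from the question (index 1-9, as in the table above)
n = np.nan
df = pd.DataFrame({
    'Team':    ['Team1', n, n, n, 'Team1', n, n, n, 'Team1'],
    'Player':  ['Player1', n, n, n, 'Player2', n, n, n, 'Player2'],
    'Event':   ['Pass', n, n, n, 'Received', n, n, n, 'Pass'],
    'Outcome': ['Success', n, n, n, 'Success', n, n, n, 'Success'],
    'Time':    [0.0, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4],
    'X_Plr1':  [14.11, 14.21, 14.31, 14.33, 14.35, 14.21, 14.31, 14.33, 14.21],
    'X_Plr2':  [4.99, 5.03, 5.07, 5.11, 5.13, 5.10, 5.07, 5.11, 5.16],
    'Y_Plr1':  [25.55, 25.34, 26.11, 26.01, 26.24, 26.13, 25.99, 25.97, 25.92],
    'Y_Plr2':  [30.01, 30.06, 30.10, 30.12, 30.44, 30.97, 31.11, 31.23, 31.89],
}, index=range(1, 10))

# Assume dataframe always starts with an event, either 'Pass' or 'Received', but
# it doesn't have to end with one
# Map player names to location column suffix
pname_colmap = {'Player1': 'Plr1', 'Player2': 'Plr2'}

# Object column so each cell can hold an (x, y) tuple
df['BallPosition'] = pd.Series(np.nan, index=df.index, dtype=object)

# List of row indices for all events in df
event_inds = df[df['Event'].notna()].index.to_list()
last_ind = df.index[-1]
# Treat the last row as an event boundary so Ball Position is filled in completely
if last_ind not in event_inds:
    event_inds.append(last_ind)

# Loop through consecutive event pairs
for ev_num, ev_ind1 in enumerate(event_inds[:-1]):
    if df.loc[ev_ind1, 'Event'] == 'Pass':
        pass_ball_position(df, ev_ind1, event_inds[ev_num+1], pname_colmap)
    elif df.loc[ev_ind1, 'Event'] == 'Received':
        dribble_ball_position(df, ev_ind1, event_inds[ev_num+1], pname_colmap)
print(df)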

The comments in the code note the assumptions made, for example that the event after a Received row belongs to the same player.

Output df: [screenshot of the printed DataFrame]
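Storing (x, y) tuples in one object column keeps the answer compact, but tuples are awkward for plotting or arithmetic. If separate numeric columns are preferred, one option is to expand 'BallPosition' after the loop has run. A minimal sketch; the helper split_ball_position and the column names 'Ball_X' and 'Ball_Y' are hypothetical, and rows that no event block covered (left as NaN) become NaN in both columns:

```python
import numpy as np
import pandas as pd

def split_ball_position(df, col='BallPosition'):
    """Return a copy of df with numeric Ball_X / Ball_Y columns built from
    (x, y) tuples; cells that are not tuples (e.g. NaN) give NaN in both."""
    pairs = [p if isinstance(p, tuple) else (np.nan, np.nan) for p in df[col]]
    xy = pd.DataFrame(pairs, index=df.index, columns=['Ball_X', 'Ball_Y'])
    return df.join(xy)

demo = pd.DataFrame({'BallPosition': [(14.11, 25.55), (11.87, 26.77), np.nan]})
print(split_ball_position(demo)[['Ball_X', 'Ball_Y']])
```

Using join on the shared index keeps each ball coordinate aligned with its original row, so the new columns line up with the tracking data even if the index does not start at 0.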