Create a dictionary from Pandas column conditioned on values from another column

108 Views Asked by At

I have a tennis dataset that looks like the following:

tourney_id = ['French Open 2018','French Open 2018','Wimbledon 2018','Wimbledon 2018','Australian Open 2019','Australian Open 2019','US Open 2019','US Open 2019']
player_name = ['Novak Djokovic','Roger Federer','Andy Murray','Rafael Nadal','John Isner','Novak Djokovic','Andy Murray','Roger Federer']
match_num = [103, 103, 217, 217, 104, 104, 243, 243]

df = pd.DataFrame(list(zip(tourney_id, player_name, match_num)),
            columns =['TournamentID','Name','MatchID'])

I want to create a dictionary where the keys are the players and the items are also the players (opponents). So it would look like the following based on my dataset:

{'Novak Djokovic': ['Roger Federer','John Isner'],
 'Roger Federer': ['Novak Djokovic','Andy Murray'],
 'Andy Murray': ['Rafael Nadal','Roger Federer'],
 'Rafael Nadal': ['Andy Murray'],
 'John Isner': ['Novak Djokovic']}

I want to identify players who have played each other when they have the same values for both TournamentID and MatchID.

The last thing I tried was: df.set_index(['TournamentID','MatchID'])['Name'].to_dict() but that's not quite what I'm looking for.

What can I try next?

5

There are 5 best solutions below

1
On BEST ANSWER

Here's how you can do it:

import pandas as pd

tourney_id = ['French Open 2018','French Open 2018','Wimbledon 2018','Wimbledon 2018','Australian Open 2019','Australian Open 2019','US Open 2019','US Open 2019']
player_name = ['Novak Djokovic','Roger Federer','Andy Murray','Rafael Nadal','John Isner','Novak Djokovic','Andy Murray','Roger Federer']
match_num = [103, 103, 217, 217, 104, 104, 243, 243]

df = pd.DataFrame(list(zip(tourney_id, player_name, match_num)),
                columns=['TournamentID', 'Name', 'MatchID'])

# Create a DataFrame with pairs of players in each match
df_pairs = pd.merge(df, df, how='inner', on=['TournamentID', 'MatchID'])
df_pairs = df_pairs[df_pairs['Name_x'] != df_pairs['Name_y']]  # Remove rows where a player is paired with themselves

# Group by each player and aggregate the opponents into a list
opponents_dict = df_pairs.groupby('Name_x')['Name_y'].agg(list).to_dict()

print(opponents_dict)

Output:

{
  "Andy Murray": [
    "Rafael Nadal",
    "Roger Federer"
  ],
  "John Isner": [
    "Novak Djokovic"
  ],
  "Novak Djokovic": [
    "Roger Federer",
    "John Isner"
  ],
  "Rafael Nadal": [
    "Andy Murray"
  ],
  "Roger Federer": [
    "Novak Djokovic",
    "Andy Murray"
  ]
}
0
On
opponents_dict = {key: group['Name'].tolist() 
for key, group in df.groupby(['TournamentID', 'MatchID'])['Name'] 
    if len(group) > 1}
        print(opponents_dict)
0
On

Using set operations:

out = {}

for _, g in df.groupby(['TournamentID', 'MatchID'])['Name']:
    for n in g:
        out.setdefault(n, set()).update(set(g)-{n})

Output:

{'Novak Djokovic': {'John Isner', 'Roger Federer'},
 'Roger Federer': {'Andy Murray', 'Novak Djokovic'},
 'John Isner': {'Novak Djokovic'},
 'Andy Murray': {'Rafael Nadal', 'Roger Federer'},
 'Rafael Nadal': {'Andy Murray'}}

You can also use networkx to compute a graph of each player's opponents, then loop over the nodes to get the direct neighbors:

import networkx as nx

G = nx.compose_all(nx.complete_graph(set(g)) for _, g in
                   df.groupby(['TournamentID', 'MatchID'])['Name'])

out = {n: list(nx.neighbors(G, n)) for n in G}

Output:

{'Novak Djokovic': ['Roger Federer', 'John Isner'],
 'Roger Federer': ['Novak Djokovic', 'Andy Murray'],
 'John Isner': ['Novak Djokovic'],
 'Andy Murray': ['Rafael Nadal', 'Roger Federer'],
 'Rafael Nadal': ['Andy Murray']}

Graph:

enter image description here

0
On

With collections.defaultdict object (to accumulate players in same match):

from collections import defaultdict

plays = defaultdict(list)
for _, (n1, n2) in df.groupby(['TournamentID','MatchID'])['Name']:
    plays[n1].append(n2)
    plays[n2].append(n1)

plays = dict(plays)
print(plays)

{'Andy Murray': ['Roger Federer', 'Rafael Nadal'],
 'John Isner': ['Novak Djokovic'],
 'Novak Djokovic': ['John Isner', 'Roger Federer'],
 'Rafael Nadal': ['Andy Murray'],
 'Roger Federer': ['Novak Djokovic', 'Andy Murray']}
0
On

One option using itertools.permutations:

s = df.groupby('MatchID')['Name'].agg(lambda x: list(itertools.permutations(x,2)))

d = {}
for l in s:
    for k,v in l:
        d.setdefault(k,[]).append(v)

Output:

{'Novak Djokovic': ['Roger Federer', 'John Isner'],
 'Roger Federer': ['Novak Djokovic', 'Andy Murray'],
 'John Isner': ['Novak Djokovic'],
 'Andy Murray': ['Rafael Nadal', 'Roger Federer'],
 'Rafael Nadal': ['Andy Murray']}