python reset mutable global dataframe between function calls


I have two dataframes, 'matches_df' and 'ratings_df'. The matches dataframe stores the players, date and winner of each match of a two-player game. The ratings dataframe stores the current rating of each player, starting at an arbitrary value. I want to update this dataframe and later reset it.

matches_df

date | player_1 | player_2 | winner
1/11 | 'A'      | 'B'      | 'A'
2/11 | 'C'      | 'B'      | 'C'
3/11 | 'A'      | 'D'      | 'A'
4/11 | 'A'      | 'C'      | 'C'

ratings_df

player | rating
'A'    | 1000
'B'    | 1000
'C'    | 1000
'D'    | 1000

I have a function, update_ratings, that does the following (pseudocode).

def update_ratings(match, parameter):
    # (1) Use the current ratings to predict the likelihood of either player winning the match.
    # (2) Use the outcome of the match to compute the players' new ratings.
    # (3) Write the two players' new ratings back to the global dataframe.
    # (4) Return the square of the forecast's prediction error.
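
For concreteness, the four steps could look something like the sketch below. The question does not say which rating model is used, so the Elo-style logistic formula, the 400-point scale, and the use of a plain global dict named ratings (instead of the dataframe) are all assumptions made purely for illustration.

```python
# Hypothetical global rating store; the question uses a dataframe instead.
ratings = {"A": 1000.0, "B": 1000.0, "C": 1000.0, "D": 1000.0}


def update_ratings(match, parameter):
    p1, p2 = match["player_1"], match["player_2"]

    # (1) Predicted probability that player_1 wins (assumed Elo-style curve).
    expected_p1 = 1.0 / (1.0 + 10 ** ((ratings[p2] - ratings[p1]) / 400.0))

    # (2) Actual outcome from player_1's point of view: 1.0 for a win, 0.0 for a loss.
    outcome_p1 = 1.0 if match["winner"] == p1 else 0.0

    # (3) Move both ratings in opposite directions, scaled by the parameter.
    delta = parameter * (outcome_p1 - expected_p1)
    ratings[p1] += delta
    ratings[p2] -= delta

    # (4) Squared prediction error for this match.
    return (outcome_p1 - expected_p1) ** 2


# First match from the table above: A beats B, both start at 1000.
first_match = {"player_1": "A", "player_2": "B", "winner": "A"}
error = update_ratings(first_match, 32)
```

Because step (3) writes to shared state, every call changes what the next call sees, which is why the ratings have to be reset before each new parameter value is evaluated.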

I want to compare how accurately the model predicts match outcomes for different parameter values. However, I am struggling to either make copies of the 'ratings_df' dataframe or reset it between function calls. I am using the following code to calculate the performance of a given parameter value:

def calc_brier(parameter,matches_df):
    #reset dataframe to initial values (1000 for all players)
    start_ratings = np.repeat(1000.0,len(unique_players))
    ratings_df = pd.DataFrame(data=[start_ratings],columns=unique_players)
    brier = 0
    for index, row in matches_df.iterrows():
        brier += update_ratings(row,parameter)
    return brier

However, this does not give correct results. The global ratings dataframe is not reset upon calling the 'calc_brier' function, and as a result my calc_brier function is inconsistent if called multiple times with the same parameters. What should I do to either correctly reset the global ratings dataframe before/after calling 'calc_brier', or use an alternative structure to achieve my ultimate goal of comparing the performance of different parameter values?

Best answer

It works if I use a dictionary rather than a dataframe to store the ratings. Here's the version that works (the ratings are now held in a dictionary, ratings_dict, with player names as keys and ratings as values, initialized at 1000). I'm not sure what was wrong with the original code.

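def calc_brier(parameter):
    # ratings_dict, unique_players and matches_df are module-level globals.
    # Reset every rating in place so each call starts from the same state.
    for player in unique_players:
        ratings_dict[player] = 1000.0
    brier = 0
    for index, row in matches_df.iterrows():
        # Pass the function's own argument through to update_ratings.
        brier += update_ratings(row, parameter)
    return brier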
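
A likely explanation for the difference, offered as a sketch rather than a confirmed diagnosis: in the original calc_brier, the line ratings_df = pd.DataFrame(...) is a plain assignment inside a function, so Python binds a new local name and leaves the global dataframe untouched. The dictionary version uses item assignment, which changes the existing global object in place. The example below shows both behaviours with a plain dict; the names ratings, reset_by_rebinding and reset_in_place are made up for the illustration.

```python
unique_players = ["A", "B", "C", "D"]
ratings = {player: 1000.0 for player in unique_players}  # module-level (global) state


def reset_by_rebinding():
    # Plain assignment binds a new *local* name called ratings;
    # the global dict above is left untouched.
    ratings = {player: 1000.0 for player in unique_players}
    return ratings


def reset_in_place():
    # Item assignment mutates the existing global object; no new name is created.
    for player in unique_players:
        ratings[player] = 1000.0


ratings["A"] = 1016.0            # pretend an earlier run changed a rating

reset_by_rebinding()
after_rebinding = ratings["A"]   # still 1016.0, the reset was lost

reset_in_place()
after_in_place = ratings["A"]    # 1000.0, the reset took effect
```

If this is indeed what happened, the dataframe version could be kept by declaring global ratings_df at the top of calc_brier before assigning to it. An alternative that avoids global state altogether is to build a fresh ratings object inside calc_brier and pass it to update_ratings as an argument.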