Is there a way to run a loop for the pingouin.anova analysis?

I am sure this is simple, but I am still learning Python. I need help figuring out how to iterate over the columns of a pandas DataFrame and run the pingouin ANOVA for each one. As of now, I can run:

pg.anova(data=df, dv='variable1', between='Group', detailed=True)

This gives me the results I want, but I have 180 variables, so automating it would go a long way. If there were also a way to add the p-values to the DataFrame as a vector, I would be most grateful. Alternatively, saving the results to another file would be fine, as long as I can tie each set of ANOVA results to its variable name.

Best answer

pg.anova returns its results as a pandas DataFrame, so it is easy to combine the results into a single DataFrame. You only have to add a new column that identifies the dependent variable for each result:

import pingouin as pg
import pandas as pd

# load data
df = pg.read_dataset('mixed_anova')
# list of dependent variables
dep_vars = ['Scores', 'Subject']
# List with anova results 
list_results = []

for dv in dep_vars:
    # run anova and add a dv column to identify the dependent variable
    aov = pg.anova(data=df, dv=dv, between='Group', detailed=True).assign(dv=dv)
    # append to list of results
    list_results.append(aov)

# concat all results into a DataFrame
df_results = pd.concat(list_results, axis=0)

# Export to Excel
df_results.to_excel('results.xlsx')
df_results

The DataFrame df_results contains all the ANOVA results, and the dv column identifies the dependent variable for each row. The results are also exported to Excel. If you only want the p-values and the dependent variable, you can filter the DataFrame:

df_results.query('Source=="Group"').filter(['p-unc', 'dv'])