Python PanelOLS different statistics with single categorical and multiple dummy columns

235 Views Asked by OJT At 28 July 2025 at 08:19

I am trying to run a balance check on a Pandas DataFrame using OLS with entity fixed effects. An example DataFrame is below:

county	year	treatment_vs_control	age	gender
Jefferson	2022	1	24	M
Jackson	2022	1	31	M
Jefferson	2022	0	28	F
Jackson	2022	1	24	null
Adams	2022	0	72	F

First I try to run the model with the gender field as-is.

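from linearmodels.panel import PanelOLS

# df is assumed to be indexed by (county, year); PanelOLS expects an entity-time MultiIndex
model_as_is = PanelOLS.from_formula(
    formula="treatment_vs_control ~ age + gender + EntityEffects",
    data=df
).fit()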

model_as_is.summary

I get an F-statistic of ~3.05 with a p-value of 0.0001.

Then, I try to run the model with one-hot encoded dummy gender columns. The DataFrame looks like below:

county	year	treatment_vs_control	age	gender_m	gender_f
Jefferson	2022	1	24	1	0
Jackson	2022	1	31	1	0
Jefferson	2022	0	28	0	1
Jackson	2022	1	24	0	0
Adams	2022	0	72	0	1

My model now looks like:

model_dummy = PanelOLS(
    dependent=df["treatment_vs_control"],
    exog=df[["age", "gender_m", "gender_f"]],  # dummy columns from the table above
    entity_effects=True,
    time_effects=False,
).fit()

model_dummy.summary

My F-statistic is now ~2.61 with a p-value of 0.0002.

If I instead keep a single gender column but make it numeric rather than string-typed, I get yet a third set of statistics.

Why might this happen?
