I have a pandas DataFrame like this (abridged):
| | age | gender | control | county |
|---|---|---|---|---|
| 11877 | 67.0 | F | 0 | AL-Calhoun |
| 11552 | 60.0 | F | 0 | AL-Coosa |
| 11607 | 60.0 | F | 0 | AL-Talladega |
| 13821 | NaN | NaN | 1 | AL-Mobile |
| 11462 | 59.0 | F | 0 | AL-Dale |
I want to run a linear regression with county fixed effects (entity effects, not time effects) to balance-check my control and treatment groups for an experimental design. My dependent variable is group membership: `control = 1` or `control = 0`.
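Written out, this is a linear probability model with one intercept per county. A sketch, with gender recoded as a hypothetical 0/1 indicator $\text{female}$, $i$ indexing people and $c$ counties:

$$\text{control}_{ic} = \beta_1\,\text{age}_{ic} + \beta_2\,\text{female}_{ic} + \alpha_c + \varepsilon_{ic}$$

Entity (county) effects remove $\alpha_c$ by subtracting county means from every variable, which is why no time dimension enters the estimate itself:

$$\text{control}_{ic} - \overline{\text{control}}_{c} = \beta_1\left(\text{age}_{ic} - \overline{\text{age}}_{c}\right) + \beta_2\left(\text{female}_{ic} - \overline{\text{female}}_{c}\right) + \left(\varepsilon_{ic} - \bar{\varepsilon}_{c}\right)$$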
As far as I can tell, to do this I need to use `linearmodels.panel.PanelOLS` and set my entity field (`county`) as the index.
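For reference, `PanelOLS` reads a long-format DataFrame through a two-level `MultiIndex` ordered (entity, time), so setting only `county` leaves it one level short. The sketch below rebuilds the five sample rows from the table and adds a within-county counter, a hypothetical `obs` column, as the numeric second level. In this abridged sample every county appears once, so `obs` is always 0; entity effects need several rows per county to leave anything to estimate.

```python
import numpy as np
import pandas as pd

# The five sample rows from the table (original row labels as the index)
to_model = pd.DataFrame(
    {
        "age": [67.0, 60.0, 60.0, np.nan, 59.0],
        "gender": ["F", "F", "F", np.nan, "F"],
        "control": [0, 0, 0, 1, 0],
        "county": ["AL-Calhoun", "AL-Coosa", "AL-Talladega", "AL-Mobile", "AL-Dale"],
    },
    index=[11877, 11552, 11607, 13821, 11462],
)

# Hypothetical `obs` column: 0, 1, 2, ... within each county (numeric "time")
to_model["obs"] = to_model.groupby("county").cumcount()

# Entity first, time second: the (entity, time) layout PanelOLS expects
panel = to_model.set_index(["county", "obs"])
print(panel.index.nlevels)
```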
As far as I'm aware, my model should look like this:

```python
from linearmodels.panel import PanelOLS

# set index on entity effects field:
to_model = to_model.set_index(["county"])
# implement fixed effects linear model
model = PanelOLS.from_formula("control ~ age + gender + EntityEffects", to_model)
```
When I try this, I get the following error:

```
ValueError: The index on the time dimension must be either numeric or date-like
```
I have seen many implementations of such models online, and they all seem to use a time effect, which is not relevant in my case. If I try to encode my `county` field as numbers, I get a different error:
```python
# create a dict to map county values to numerics
county_map = dict(zip(to_model["county"].unique(), range(to_model["county"].nunique())))
# create a numeric column as alternative to county
to_model["county_numeric"] = to_model["county"].map(county_map)
# set index on numeric entity effects field
to_model = to_model.set_index(["county_numeric"])
# same model as before
model = PanelOLS.from_formula("control ~ age + gender + EntityEffects", to_model)
```

```
FactorEvaluationError: Unable to evaluate factor `control`. [KeyError: 'control']
```
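A likely explanation for both errors, based on my reading of the `linearmodels` source and worth checking against your installed version: a DataFrame without a two-level `MultiIndex` is treated as a wide 2-D array whose row index is the time dimension and whose columns are entities. A string `county` index then fails the numeric/date-like time check, and with `county_numeric` the columns `age`, `gender`, `control` are read as entities of a single variable, so the formula cannot find `control`. The helper below, `check_panel_index`, is hypothetical (not part of `linearmodels`) and only catches that shape problem before calling `PanelOLS`.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def check_panel_index(df: pd.DataFrame) -> None:
    """Hypothetical pre-flight check for the (entity, time) layout
    PanelOLS expects; raises ValueError with a readable message."""
    if not isinstance(df.index, pd.MultiIndex) or df.index.nlevels != 2:
        raise ValueError(
            f"expected a 2-level (entity, time) MultiIndex, got {df.index.nlevels} level(s)"
        )
    time_level = df.index.levels[1]
    if not (is_numeric_dtype(time_level) or is_datetime64_any_dtype(time_level)):
        raise ValueError("time level must be numeric or date-like")


df = pd.DataFrame({"county": ["AL-Dale", "AL-Dale", "AL-Coosa"], "control": [0, 1, 0]})

for candidate in (
    df.set_index(["county"]),  # one level: the failing setup from the question
    df.assign(obs=df.groupby("county").cumcount()).set_index(["county", "obs"]),
):
    try:
        check_panel_index(candidate)
        print("ok")
    except ValueError as err:
        print("rejected:", err)
```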
How can I implement this model with `county` as a unit fixed effect?
Assuming you have multiple entries for each `county`, you could use the following. The key step is to use a `groupby`/`transform` to create a distinct numeric index for the rows within each county, which can then be used as a fake time index. This just shows that the model can be estimated.