Here is a reproducible example:
import numpy as np
import pandas as pd
from linearmodels.panel import PanelOLS

# Outer index level is the entity, inner level is time
entity = list(map(chr, range(65, 91)))  # 'A'..'Z'
time = list(pd.date_range('1-1-2014', freq='A', periods=4))
index = pd.MultiIndex.from_product([entity, time])
df = pd.DataFrame(np.random.randn(26 * 4, 2), index=index, columns=['y', 'x'])

# Entity fixed effects, standard errors clustered by entity
mod = PanelOLS(df.y, df.x, entity_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)
print(res)
This yields a parameter estimate of -0.1425 and a standard error of 0.1396. Fitting the same model with statsmodels:
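import statsmodels.formula.api as smf

# reset_index() turns the unnamed index levels into columns level_0 (entity) and level_1 (time)
df = df.reset_index()
# Entity dummies via C(level_0), no intercept, clustered by entity
lm = smf.ols('y ~ x - 1 + C(level_0)', df).fit(cov_type='cluster', cov_kwds={'groups': df['level_0']})
print(lm.params['x'], lm.bse['x'])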
This yields -0.14249279008084645 and 0.16390753835717325. The parameter estimates agree, but the standard errors are not even close.
partial answer
The statsmodels cluster-robust standard errors have a "use_correction" option (on by default); turning it off makes the standard errors very close, but still different.
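To get a feel for how much this option matters here, a back-of-the-envelope sketch follows. It assumes that statsmodels scales the one-way cluster covariance by G/(G-1) * (N-1)/(N-K) when use_correction is on, which is my reading of statsmodels.stats.sandwich_covariance.cov_cluster and not something confirmed above. The helper name cluster_correction is made up for illustration.

```python
import math


def cluster_correction(nobs, n_groups, k_params):
    """Assumed small-sample factor applied to the cluster-robust covariance."""
    return (n_groups / (n_groups - 1)) * ((nobs - 1) / (nobs - k_params))


nobs = 26 * 4      # 26 entities observed over 4 periods
n_groups = 26      # one cluster per entity
k_params = 1 + 26  # x plus the 26 entity dummies from C(level_0)

factor = cluster_correction(nobs, n_groups, k_params)

# The factor multiplies the covariance, so the standard error scales by its square root.
se_corrected = 0.16390753835717325  # statsmodels SE reported in the question
se_uncorrected = se_corrected / math.sqrt(factor)

print(f"covariance factor: {factor:.4f}")
print(f"implied SE without the correction: {se_uncorrected:.4f}")
```

Under that assumption the implied uncorrected value lands near the 0.1396 from linearmodels without matching it, which is consistent with "very close but still different".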
I am using a random seed for reproducibility:
np.random.seed(9865378)
And linearmodels has an auto_df=False fit option that brings its standard errors close to those of the statsmodels default, to 2 decimals.