Data head:
country isced retired age female couple saddepressed \
0 Belgium Isced-97 code 2 0 61 0 1 0
1 Belgium Isced-97 code 3 0 61 1 1 0
2 Belgium Isced-97 code 5 1 74 0 1 0
3 Belgium Isced-97 code 5 0 58 1 0 1
4 Belgium Isced-97 code 5 1 85 0 0 0
eligibleSR id age2 ... Estonia Croatia isced1 isced2 isced3 \
0 0 809 3721 ... 0 0 0 1 0
1 0 810 3721 ... 0 0 0 0 1
2 1 811 5476 ... 0 0 0 0 0
3 0 812 3364 ... 0 0 0 0 0
4 1 814 7225 ... 0 0 0 0 0
isced4 isced5 isced6 retired_female_interaction predicted_retired
0 0 0 0 0 0.232326
1 0 0 0 0 0.231212
2 0 1 0 0 0.963570
3 0 1 0 0 0.196197
4 0 1 0 0 1.079642
[5 rows x 37 columns]
Data info :
country Code isced retired age female couple saddepressed eligibleSR id ... Portugal Slovenia Estonia Croatia isced1 isced2 isced3 isced4 isced5 isced6
0 Belgium BE Isced-97 code 2 0 61 0 1 0 0 809 ... 0 0 0 0 0 1 0 0 0 0
1 Belgium BE Isced-97 code 3 0 61 1 1 0 0 810 ... 0 0 0 0 0 0 1 0 0 0
2 Belgium BE Isced-97 code 5 1 74 0 1 0 1 811 ... 0 0 0 0 0 0 0 0 1 0
3 Belgium BE Isced-97 code 5 0 58 1 0 1 0 812 ... 0 0 0 0 0 0 0 0 1 0
4 Belgium BE Isced-97 code 5 1 85 0 0 0 1 814 ... 0 0 0 0 0 0 0 0 1 0
5 rows × 36 columns
I want to create an OLS model with the code below, but I am not sure how to define the countries in the independent variables.
import statsmodels.api as sm

independent_vars = [
    'Germany', 'Sweden', 'Netherland', 'Spain', 'Italy', 'France', 'Denmark',
    'Greece', 'Switzerland', 'Belgium', 'Israel', 'CzechRepublic', 'Poland',
    'Luxembourg', 'Hungary', 'Portugal', 'Slovenia', 'Estonia', 'Croatia',
    'female', 'couple', 'retired', 'eligibleSR'
]
dependent_var = 'saddepressed'

X = sm.add_constant(data[independent_vars])
y = data[dependent_var]

model = sm.OLS(y, X).fit()
print(model.summary())
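When I run the code above, I get the output below, which looks wrong: the constant and every country dummy have the same huge coefficient (with opposite sign for the constant).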
OLS Regression Results
==============================================================================
Dep. Variable: saddepressed R-squared: 0.037
Model: OLS Adj. R-squared: 0.036
Method: Least Squares F-statistic: 60.78
Date: Fri, 24 Nov 2023 Prob (F-statistic): 4.04e-264
Time: 20:53:44 Log-Likelihood: -13913.
No. Observations: 35017 AIC: 2.787e+04
Df Residuals: 34994 BIC: 2.807e+04
Df Model: 22
Covariance Type: nonrobust
=================================================================================
coef std err t P>|t| [0.025 0.975]
---------------------------------------------------------------------------------
const -7.562e+10 2.34e+11 -0.324 0.746 -5.34e+11 3.82e+11
Germany 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Sweden 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Netherland 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Spain 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Italy 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
France 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Denmark 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Greece 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Switzerland 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Belgium 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Israel 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
CzechRepublic 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Poland 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Luxembourg 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Hungary 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Portugal 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Slovenia 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Estonia 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
Croatia 7.562e+10 2.34e+11 0.324 0.746 -3.82e+11 5.34e+11
female 0.0887 0.004 22.081 0.000 0.081 0.097
couple -0.0316 0.004 -7.262 0.000 -0.040 -0.023
retired 0.0272 0.007 3.717 0.000 0.013 0.041
eligibleSR -0.0113 0.007 -1.548 0.122 -0.026 0.003
==============================================================================
Omnibus: 9822.655 Durbin-Watson: 1.894
Prob(Omnibus): 0.000 Jarque-Bera (JB): 20454.670
Skew: 1.750 Prob(JB): 0.00
Kurtosis: 4.330 Cond. No. 9.78e+14
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.19e-25. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
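The identical 7.562e+10 coefficients and note [2] are what one would expect if the country dummies cover every row of the sample while a constant is also included, so that the dummies add up to the constant column (the "dummy variable trap"). That is an assumption about this dataset, not something the output proves. A minimal sketch of the effect on a synthetic three-country frame (the `toy` data and the choice of Belgium as the reference are hypothetical):

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (assumption): three countries only.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"country": rng.choice(["Belgium", "Germany", "Sweden"], size=200)})
toy.loc[:2, "country"] = ["Belgium", "Germany", "Sweden"]  # make sure each country appears

dummies = pd.get_dummies(toy["country"], dtype=float)
const = np.ones(len(toy))

# Constant plus ALL country dummies: the dummies sum to the constant column.
X_full = np.column_stack([const, dummies.to_numpy()])

# Constant plus all dummies except one reference country (Belgium here).
X_drop = np.column_stack([const, dummies.drop(columns="Belgium").to_numpy()])

rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)

print("all dummies:  columns =", X_full.shape[1], "rank =", rank_full)
print("one dropped:  columns =", X_drop.shape[1], "rank =", rank_drop)
```

When the rank is lower than the number of columns, the coefficients are not uniquely determined. Leaving one country out of the regressor list makes it the reference category, and the remaining country coefficients are then differences relative to it.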
Given the data above, how can I set up the model asked for in the following exercise?
Estimate with OLS a model relating depression (measured by the variable saddepressed) with being retired, age, being a couple, gender, education and country of residence.
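One possible way to specify such a model, as a sketch rather than a definitive answer, is the statsmodels formula interface: wrapping the text columns `country` and `isced` in `C(...)` builds the dummies and leaves out one reference level per variable. The column names are taken from the head output above; the `toy` frame is synthetic and only there to make the example runnable.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the real data (assumption): random values, three
# countries and three education levels, same column names as in the question.
rng = np.random.default_rng(42)
n = 500
toy = pd.DataFrame({
    "country": rng.choice(["Belgium", "Germany", "Sweden"], size=n),
    "isced": rng.choice(
        ["Isced-97 code 2", "Isced-97 code 3", "Isced-97 code 5"], size=n
    ),
    "retired": rng.integers(0, 2, size=n),
    "age": rng.integers(50, 90, size=n),
    "female": rng.integers(0, 2, size=n),
    "couple": rng.integers(0, 2, size=n),
    "saddepressed": rng.integers(0, 2, size=n),
})

# C(...) marks a column as categorical; one level of each becomes the
# reference category, so the intercept is not duplicated by the dummies.
formula = "saddepressed ~ retired + age + couple + female + C(isced) + C(country)"
model = smf.ols(formula, data=toy).fit()

print(model.params)
```

With the real data, passing `data` instead of `toy` should be enough, provided `country` and `isced` are still present as text columns as in the head output. The same result can be had with the existing 0/1 columns by removing one country and one isced dummy from the regressor list before calling `sm.add_constant`.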
