OLS Regression with dummy variable

Data Head :

   country            isced  retired  age  female  couple  saddepressed  \
0  Belgium  Isced-97 code 2        0   61       0       1             0   
1  Belgium  Isced-97 code 3        0   61       1       1             0   
2  Belgium  Isced-97 code 5        1   74       0       1             0   
3  Belgium  Isced-97 code 5        0   58       1       0             1   
4  Belgium  Isced-97 code 5        1   85       0       0             0   

   eligibleSR   id  age2  ...  Estonia  Croatia  isced1  isced2  isced3  \
0           0  809  3721  ...        0        0       0       1       0   
1           0  810  3721  ...        0        0       0       0       1   
2           1  811  5476  ...        0        0       0       0       0   
3           0  812  3364  ...        0        0       0       0       0   
4           1  814  7225  ...        0        0       0       0       0   

   isced4  isced5  isced6  retired_female_interaction  predicted_retired  
0       0       0       0                           0           0.232326  
1       0       0       0                           0           0.231212  
2       0       1       0                           0           0.963570  
3       0       1       0                           0           0.196197  
4       0       1       0                           0           1.079642  

[5 rows x 37 columns]

Data info :

country Code    isced   retired age female  couple  saddepressed    eligibleSR  id  ... Portugal    Slovenia    Estonia Croatia isced1  isced2  isced3  isced4  isced5  isced6
0   Belgium BE  Isced-97 code 2 0   61  0   1   0   0   809 ... 0   0   0   0   0   1   0   0   0   0
1   Belgium BE  Isced-97 code 3 0   61  1   1   0   0   810 ... 0   0   0   0   0   0   1   0   0   0
2   Belgium BE  Isced-97 code 5 1   74  0   1   0   1   811 ... 0   0   0   0   0   0   0   0   1   0
3   Belgium BE  Isced-97 code 5 0   58  1   0   1   0   812 ... 0   0   0   0   0   0   0   0   1   0
4   Belgium BE  Isced-97 code 5 1   85  0   0   0   1   814 ... 0   0   0   0   0   0   0   0   1   0
5 rows × 36 columns

I want to create an OLS model with the code below, but I am not sure how to define the countries in the independent variables.

    import statsmodels.api as sm

    independent_vars = [
        'Germany', 'Sweden', 'Netherland', 'Spain', 'Italy', 'France', 'Denmark', 
        'Greece', 'Switzerland', 'Belgium', 'Israel', 'CzechRepublic', 'Poland', 
        'Luxembourg', 'Hungary', 'Portugal', 'Slovenia', 'Estonia', 'Croatia', 
        'female', 'couple', 'retired', 'eligibleSR'
    ]
    
    
    dependent_var = 'saddepressed'
    
    X = sm.add_constant(data[independent_vars])
    y = data[dependent_var]
    
    model = sm.OLS(y, X).fit()
    
    print(model.summary())

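When I run the code above, I get the output below, which looks wrong: the constant and every country dummy have the same huge coefficient and standard error.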

                            OLS Regression Results                            
==============================================================================
Dep. Variable:           saddepressed   R-squared:                       0.037
Model:                            OLS   Adj. R-squared:                  0.036
Method:                 Least Squares   F-statistic:                     60.78
Date:                Fri, 24 Nov 2023   Prob (F-statistic):          4.04e-264
Time:                        20:53:44   Log-Likelihood:                -13913.
No. Observations:               35017   AIC:                         2.787e+04
Df Residuals:                   34994   BIC:                         2.807e+04
Df Model:                          22                                         
Covariance Type:            nonrobust                                         
=================================================================================
                    coef    std err          t      P>|t|      [0.025      0.975]
---------------------------------------------------------------------------------
const         -7.562e+10   2.34e+11     -0.324      0.746   -5.34e+11    3.82e+11
Germany        7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Sweden         7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Netherland     7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Spain          7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Italy          7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
France         7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Denmark        7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Greece         7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Switzerland    7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Belgium        7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Israel         7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
CzechRepublic  7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Poland         7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Luxembourg     7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Hungary        7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Portugal       7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Slovenia       7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Estonia        7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
Croatia        7.562e+10   2.34e+11      0.324      0.746   -3.82e+11    5.34e+11
female            0.0887      0.004     22.081      0.000       0.081       0.097
couple           -0.0316      0.004     -7.262      0.000      -0.040      -0.023
retired           0.0272      0.007      3.717      0.000       0.013       0.041
eligibleSR       -0.0113      0.007     -1.548      0.122      -0.026       0.003
==============================================================================
Omnibus:                     9822.655   Durbin-Watson:                   1.894
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            20454.670
Skew:                           1.750   Prob(JB):                         0.00
Kurtosis:                       4.330   Cond. No.                     9.78e+14
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.19e-25. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
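
Note [2] is consistent with perfect collinearity between the constant and a full set of country dummies: if every row has exactly one country dummy equal to 1, the dummies sum to the constant column. Whether the 19 listed dummies really cover every country in the data is an assumption here, since the question does not confirm it. A minimal sketch on synthetic data (not the dataset above; `n`, `k`, and all variable names are made up for illustration) shows the effect on the rank of the design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 4  # hypothetical sizes: 200 rows, 4 countries

# One-hot country dummies: exactly one 1 per row
codes = rng.integers(0, k, size=n)
codes[:k] = np.arange(k)  # make sure every country appears at least once
dummies = np.eye(k)[codes]
const = np.ones((n, 1))

X_full = np.hstack([const, dummies])         # constant + all k dummies
X_drop = np.hstack([const, dummies[:, 1:]])  # constant + k-1 dummies (one reference country omitted)

rank_full = np.linalg.matrix_rank(X_full)
rank_drop = np.linalg.matrix_rank(X_drop)

print("all dummies:   columns =", X_full.shape[1], "rank =", rank_full)
print("one dropped:   columns =", X_drop.shape[1], "rank =", rank_drop)
```

With all dummies included, the rank is one less than the number of columns, so the coefficients are not uniquely determined. Dropping one dummy (or the constant) restores full column rank.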

Given this data, how should I specify the model to answer the following task?

Estimate with OLS a model relating depression (measured by the variable saddepressed ) with being retired, age, being a couple, gender, education and country of residence.
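
One way the model in this task might be specified is with the statsmodels formula interface, where `C(...)` turns a categorical column into dummies and omits one reference level automatically. This is only a sketch under assumptions: the frame `df` below is synthetic and stands in for `data`, the column names are taken from the data head above, and using `isced` and `country` as the education and country variables is one reading of the task.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 500

# Synthetic stand-in for the real `data`; all values are made up.
df = pd.DataFrame({
    "saddepressed": rng.integers(0, 2, n),
    "retired": rng.integers(0, 2, n),
    "age": rng.integers(50, 90, n),
    "couple": rng.integers(0, 2, n),
    "female": rng.integers(0, 2, n),
    "isced": rng.choice(["Isced-97 code 2", "Isced-97 code 3", "Isced-97 code 5"], n),
    "country": rng.choice(["Belgium", "Germany", "Sweden", "Spain"], n),
})

# C(...) expands each categorical into dummies and leaves out one reference level,
# so the intercept is not collinear with the country or education dummies.
formula = "saddepressed ~ retired + age + couple + female + C(isced) + C(country)"
model = smf.ols(formula, data=df).fit()

print(model.summary())
```

On the real data the same formula would be passed with `data=data`. The omitted country and education level become the baseline, and each remaining dummy coefficient is read as a difference from that baseline.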
