How to Run a Geographically Weighted Logistic Regression in R?

228 Views Asked by At

I have the following data set containing socioeconomic variables:

> glimpse(df)
Rows: 730,099
Columns: 9
$ id     <int> 25500, 25501, 25502, 25503, 25504, 25505, 25506, 25507, 25508, 25509, 25510, 25511, 25512, 25513, 25514, 25515, 25516, 255…
$ Prov   <fct> Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al Hoceïma, Al…
$ Age    <dbl> 45, 15, 65, 55, 35, 75, 45, 25, 55, 40, 35, 50, 70, 20, 30, 75, 50, 35, 50, 40, 35, 70, 70, 35, 35, 75, 55, 35, 25, 50, 30…
$ Edu    <dbl> 5, 4, 0, 0, 0, 0, 0, 16, 0, 5, 4, 0, 0, 6, 0, 0, 0, 0, 0, 0, 9, 0, 0, 14, 4, 0, 0, 3, 5, 0, 9, 0, 0, 0, 15, 0, 0, 0, 14, 3…
$ Kids   <int> 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 3, 4, 1, 0, 2, 0, 3, 2, 2, 1, 0, 5, 0, 2, 0, 0, 0, 1, 0, 2, 2, 1, 0, 5, 0, 0, 3, 0, 0, 1, 0,…
$ Mil    <fct> Urban, Rural, Rural, Rural, Rural, Rural, Rural, Rural, Rural, Rural, Rural, Rural, Urban, Urban, Rural, Urban, Urban, Rur…
$ DSize  <dbl+lbl> 2, 1, 4, 4, 1, 1, 3, 2, 3, 4, 1, 6, 3, 2, 5, 3, 4, 2, 4, 2, 3, 6, 3, 2, 4, 4, 3, 2, 2, 4, 1, 4, 6, 4, 5, 5, 4, 2, 5, 3…
$ taille <dbl> 2, 5, 5, 7, 5, 1, 2, 1, 4, 1, 5, 9, 5, 1, 5, 3, 7, 4, 6, 7, 2, 14, 3, 4, 1, 2, 9, 3, 3, 12, 4, 7, 2, 9, 2, 6, 5, 2, 4, 3, …
$ EP     <fct> 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1,…

The variable Prov indicates the 72 provinces of the country, it looks like this:

> levels(df$Prov)
 [1] "Agadir-Ida-Ou-Tanane"     "Al Haouz"                 "Al Hoceïma"               "Meknès"                   "Azilal"                  
 [6] "Béni Mellal"              "Benslimane"               "Berkane"                  "Berrechid"                "Boujdour"                
[11] "Boulemane"                "Casablanca"               "Chefchaouen"              "Chichaoua"                "Chtouka-Ait Baha"        
[16] "Driouch"                  "El Hajeb"                 "El Jadida"                "El Kelâa des Sraghna"     "Errachidia"              
[21] "Essaouira"                "Fahs-Anjra"               "Fès"                      "Figuig"                   "Fquih Ben Salah"         
[26] "Guelmim"                  "Guercif"                  "Ifrane"                   "Inezgane-Ait Melloul"     "Jerada"                  
[31] "Kénitra"                  "Khémisset"                "Khénifra"                 "Khouribga"                "Laâyoune"                
[36] "Larache"                  "Marrakech"                "Médiouna"                 "Midelt"                   "Mohammadia"              
[41] "Nador"                    "Nouaceur"                 "Ouarzazate"               "Ouezzane"                 "Oujda-Angad"             
[46] "Rabat"                    "Rehamna"                  "Safi"                     "Salé"                     "Sefrou"                  
[51] "Settat"                   "Sidi Bennour"             "Sidi Ifni"                "Sidi Kacem"               "Sidi Slimane"            
[56] "Skhirate-Témara"          "Tanger-Assilah"           "Taounate"                 "Taourirt"                 "Taroudannt"              
[61] "Tata"                     "Taza"                     "Tétouan"                  "M'Diq-Fnideq"             "Tinghir"                 
[66] "Tiznit"                   "Youssoufia"               "Zagora"                   "Moulay Yacoub"            "Tan-Tan / Assa-Zag"      
[71] "Es-Semara / Tarfaya"      "Oued Ed Dahab / Aousserd"

and I have another shapefile containing the location polygons for each province, it looks like this:

> map_3
Simple feature collection with 72 features and 1 field
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -17.10496 ymin: 20.7715 xmax: -0.9987581 ymax: 35.92243
Geodetic CRS:  WGS 84
# A tibble: 72 × 2
   PROV                                                                                                     geometry
 * <chr>                                                                                               <POLYGON [°]>
 1 Agadir-Ida-Ou-Tanane ((-9.504189 30.3498, -9.503555 30.34989, -9.503493 30.34991, -9.502746 30.35016, -9.50251...
 2 Al Haouz             ((-7.344661 31.67321, -7.344881 31.67344, -7.345 31.6736, -7.346125 31.67503, -7.346162 3...
 3 Al Hoceïma           ((-3.926041 35.26096, -3.926289 35.26113, -3.926532 35.26142, -3.926776 35.26171, -3.9270...
 4 Azilal               ((-5.908287 32.30598, -5.908415 32.3063, -5.913961 32.32005, -5.915668 32.32325, -5.92306...
 5 Benslimane           ((-7.195987 33.18768, -7.191716 33.18865, -7.169873 33.19075, -7.164904 33.19066, -7.1557...
 6 Berkane              ((-2.313083 34.58991, -2.312776 34.58996, -2.312381 34.59001, -2.308861 34.59052, -2.3085...
 7 Berrechid            ((-7.250729 33.21856, -7.249378 33.22028, -7.246216 33.22429, -7.245882 33.22472, -7.2454...
 8 Boujdour             ((-12.54495 24.46466, -12.49897 24.47041, -12.4321 24.48361, -12.38604 24.49452, -12.3777...
 9 Boulemane            ((-4.202123 32.58081, -4.200447 32.58095, -4.183846 32.58234, -4.17291 32.58488, -4.15931...
10 Casablanca           ((-7.669198 33.49706, -7.668895 33.49699, -7.66875 33.49705, -7.668693 33.49707, -7.66854...
# ℹ 62 more rows
# ℹ Use `print(n = ...)` to see more rows

When I want to run a simple logistic regression I simply run this command and it gives me the results I want:

> model <- glm(EP~Age+Edu+Kids+Mil+DSize+taille,family = binomial(link = "logit"), data = df)
> summary(model)

Call:
glm(formula = EP ~ Age + Edu + Kids + Mil + DSize + taille, family = binomial(link = "logit"), 
    data = df)

Coefficients:
              Estimate Std. Error  z value Pr(>|z|)    
(Intercept) -0.0040682  0.0165088   -0.246    0.805    
Age         -0.0060011  0.0002926  -20.511   <2e-16 ***
Edu         -0.1165767  0.0010738 -108.562   <2e-16 ***
Kids         0.0927567  0.0038183   24.293   <2e-16 ***
MilRural     1.5594703  0.0078742  198.048   <2e-16 ***
DSize       -0.4697581  0.0034187 -137.409   <2e-16 ***
taille      -0.1368148  0.0026621  -51.393   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 613335  on 719910  degrees of freedom
Residual deviance: 497842  on 719904  degrees of freedom
  (10188 observations deleted due to missingness)
AIC: 497856

Number of Fisher Scoring iterations: 6

Now I want to run a Geographically Weighted Logistic Regression, and for that I checked the GWModel package manual, and found the function ggwr.basic, which has an option for logistic regression.

DM<-gw.dist(dp.locat=coordinates(londonhp))
bw.f2 <- bw.ggwr(BATH2~FLOORSZ,data=londonhp, dMat=DM,family ="binomial")
res.binomial<-ggwr.basic(BATH2~FLOORSZ, bw=bw.f2,data=londonhp, dMat=DM,
              family ="binomial")

But in their example, the location variable is a point (X Y coordinates), whereas in my case the locations are polygons, also they have one observation per location (they use the LondonHP data set), whereas I have more than 730,000 observations scattered across 72 provinces.

Therefore, how can I run a Geographically Weighted Logistic Regression model on my data? I just gave the GWModel package as an example; if you know a different package or function that can get the job done, please feel free to use it.

1

There are 1 best solutions below

1
On

I guess the trick is to convert your data to a spatial format {GWmodel} can process. Please adapt below example to suit your purpose:

library(dplyr)
library(rnaturalearth)
provinces <- ne_states(country = 'Morocco', returnclass = 'sf')


## keep only name and geometry:
provinces <- provinces |> select(name)
## > provinces
## Simple feature collection with 16 features and 1 field
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -17.01374 ymin: 21.41997 xmax: -1.031999 ymax: 35.92652
## CRS:           +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 +towgs84=0,0,0
## First 10 features:
##                                      name                       geometry
## 58                    Guelmim - Es-Semara POLYGON ((-8.817035 27.6614...
## 60   Laâyoune - Boujdour - Sakia El Hamra POLYGON ((-12.05582 25.9958...
  • join attributes (model parameters) from other sources (I created some dummy data instead):
provinces <- 
    provinces |>
    mutate(geometry = st_centroid(geometry),
           x1 = rnorm(16), # predictor 1
           x2 = rnorm(16), # predictor 2
           Y = rbinom(16, 1, .5) # binomial outcome
           )
  • convert object provinces from class sf to class sp, which is required by {GWmodel} functions:
provinces <- as_Spatial(provinces)
  • done (technically at least):
the_model <- ggwr.basic(Y ~ x1 + x2,
                     data = provinces,
                     bw = 50, ## I just put a fixed bandwith here
                     family ="binomial"
                     )
## > summary(the_model)
##               Length Class                  Mode   
## GW.arguments  11     -none-                 list   
## GW.diagnostic  4     -none-                 list   
## glms          22     -none-                 list   
## SDF           16     SpatialPointsDataFrame S4     
## CV            16     -none-                 numeric
## timings        2     -none-                 list   
## this.call      5     -none-                 call