R visualization: sensible repel points on map (beeswarm?)

423 Views Asked by At

I'm trying to replicate approximately a map like this. desc

It depicts a small number of items (schools) spread across an area. For input I have the map of areas with a number for each of them. I would like to lay that out into a that many points around the area. It would be even better if they wouldn't diffuse across area boundaries, but simply distributing them is enough. Some nice repel points within area might work.

Beeswarm plots do something quite similar, could this be done on a map. Bonus question - in fact I've been looking to animate this, but can only think of very complicated ways to do this, so that new points are added as sum nrs increase in time.

The code below places the points in centroids on the map, and takes the number as a size. (I was unable to export the map properly as a single file, so coordinates are a bit messed up, but principle is the same.)

places = st_read("https://gist.githubusercontent.com/peeter-t2/9646a4169e993948fa97f6f503a0688b/raw/cb4e910bf153e51e3727dc9d1c73dd9ef86d2556/kih1897m.geojson", stringsAsFactors = FALSE)

schools <- read_tsv("https://gist.github.com/peeter-t2/34467636b3c1017e89f33284d7907b42/raw/6ea7dd6c005ef8577b36f5e84338afcb6c76b707/school_nums.tsv")
schools_geo <- merge(places,schools,by.x="KIHELKOND",by.y="Kihelkond") #94 matches

p<- schools_geo %>% 
  ggplot()+
  geom_sf(data=schools_geo)+
  geom_sf(data=st_centroid(schools_geo),aes(size=value))+
  theme_bw()
p

Thanks!

2

There are 2 best solutions below

3
On BEST ANSWER

As I noted in my comments, the when I read in the file it is setting the crs to lat/lon (epsg: 4326) while the geometry column is a different crs. I have guessed that the correct crs is espg: 3301 and proceeded on that basis which seems to work fine.

st_crs(schools_geo) <- 3301

We can use st_sample to get a sample of points within the polygons in relation to our 'value' column:

# we can set type = 'hexagonal', 'regular' or 'random'
school_pts <- schools_geo %>% st_sample(size = .$value, type = 'hexagonal')


schools_geo %>% 
  ggplot()+
  geom_sf()+
  geom_sf(data=school_pts, size = .8)+
  theme_bw()

This produces the following plot which I think looks messy due to the fact st_sample spreads the points out to the extents of the polygons.

enter image description here

It might look nicer to have the points more centered in each of the polygons like in the example you posted. To do that we could rescale the polygons depending on the number of points we want to plot within them. In the code below, I shrink the polygons by 90% if they have the least points inside (1) and by 20% if they have the most points (27).

# put values on scale between 0 and 1
scale_fact <- (max(schools_geo$value) -  schools_geo$value) / (max(schools_geo$value) - min(schools_geo$value)) 
# re-scale between 0.2 and 0.9
scale_fact <- scale_fact * (0.9 - 0.2) + 0.2
# reverse the scale 
scale_fact <-  max(scale_fact) + min(scale_fact) - scale_fact 

# apply the scale factor
schools_centroid <- st_geometry(st_centroid(schools_geo))
schools_geo_rescaled <- (st_geometry(schools_geo) - schools_centroid) * scale_fact + schools_centroid

school_pts <- schools_geo_rescaled %>% 
  st_sf(crs = 3301) %>% 
  bind_cols(value = schools_geo$value) %>%
  st_sample(size = .$value, type = 'hexagonal')


# plot
schools_geo %>% 
  ggplot()+
  geom_sf()+
  geom_sf(data=school_pts, size = .8)+
  theme_bw()

enter image description here

2
On

it is not an easy problem. I decided to simplify it choosing only one area not all. In theory, the solution is reproductible for all your areas.

We first import our library

library(rgdal)
library(sf)
library(readr)
library(ggplot2)

We use the proposed data :

places <- st_read("https://gist.githubusercontent.com/peeter-t2/9646a4169e993948fa97f6f503a0688b/raw/cb4e910bf153e51e3727dc9d1c73dd9ef86d2556/kih1897m.geojson", stringsAsFactors = FALSE)

schools <- read_tsv("https://gist.github.com/peeter-t2/34467636b3c1017e89f33284d7907b42/raw/6ea7dd6c005ef8577b36f5e84338afcb6c76b707/school_nums.tsv")
schools_geo <- merge(places,schools,by.x="KIHELKOND",by.y="Kihelkond") #94 matches

We select one state

one <- places$geometry[[1]]

We split the polygon in several sub-polygon thanks to a grid

grid <- st_make_grid(one, n = c(10, 10))
area <- st_area(grid)
grid <- st_as_sf(data.frame(ID=c(1:length(area)),
                            area=area,
                            geometry=grid))
tmp <- st_intersection(grid, one)
tmp$area <- st_area(tmp)

We display all centroids of our grid made by small square

plot(st_geometry(tmp['area']))
plot(st_geometry(st_centroid(tmp['area'])),
     pch = 16, col = 'red', add = TRUE)

At the end we want to keep only the number of points you want which is nbr an equivalent of value (number of schools) in your example.

nbr <- 20
plot(st_geometry(one))
plot(st_geometry(st_centroid(tmp[order(tmp$area, decreasing = T),][1:nbr,])),
     pch = 16, col = 'red', add = TRUE)

I hope it will help you. enter image description here