I would like to create a geographic heatmap to depict the frequency distribution of a genetic lineage.
My data file contains four columns: location, latitude, longitude, and the frequency of the genetic lineage. Only the first few rows are shown; the full file has many more.
| Location | Latitude | Longitude | Frequency |
|---|---|---|---|
| 01 | 43.03879 | 42.72047 | 0.1304 |
| 02 | 38.58569 | 68.76037 | 0.0500 |
| 03 | 42.87779 | 74.60669 | 0.0500 |
| ... | ... | ... | ... |
Here is my R script for producing the heatmap:
```r
library(ggplot2)
library(ggmap)
library(RColorBrewer)

# Stadia Maps tiles need an API key registered once per session, e.g.
# register_stadiamaps("YOUR-API-KEY")

# retrieving the freq data
freq <- read.csv("freq_data.csv", sep = ",", header = TRUE,
                 strip.white = TRUE)

# defining the map bounds (7-degree margin around all sites)
map_bounds <- c(left   = min(freq$Longitude) - 7,
                right  = max(freq$Longitude) + 7,
                top    = max(freq$Latitude) + 7,
                bottom = min(freq$Latitude) - 7)

# create a base map using Stadia Maps
base_map <- get_stadiamap(map_bounds, zoom = 3, scale = 2,
                          maptype = "stamen_terrain_background")

# convert the map into a ggmap object
ggmap_map <- ggmap(base_map, extent = "device", legend = "none")

# add heatmap layer: density contour lines, then filled density polygons
ht_map <- ggmap_map + geom_density2d(data = freq,
                                     aes(x = Longitude,
                                         y = Latitude))
ht_map <- ht_map + stat_density2d(data = freq,
                                  aes(x = Longitude,
                                      y = Latitude,
                                      color = Frequency,
                                      fill = after_stat(level),
                                      alpha = after_stat(level)),
                                  geom = "polygon")

# define the contour fill colors
ht_map01 <- ht_map + scale_fill_gradientn(colors = rev(brewer.pal(7, "Spectral")))

# add freq info: one circle per site, sized by Frequency
ht_map02 <- ht_map01 + geom_point(data = freq,
                                  aes(x = Longitude, y = Latitude),
                                  fill = "salmon",
                                  shape = 21,
                                  size = freq$Frequency * 100,
                                  alpha = 0.8)

# display the final map
print(ht_map02)
```
This is the heatmap produced by the script.
The resulting geographic heatmap, however, is not what I expected: it highlights the areas where many sites are clustered together. The ideal heatmap should instead highlight the sites with higher frequencies (drawn as the large circles on the map).
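A minimal sketch of why this happens, using base R plus MASS. `stat_density2d()` estimates a 2D kernel density with `MASS::kde2d()`, and in the script above it only receives `Longitude` and `Latitude` as positions, so its surface peaks wherever sites cluster. For contrast, the sketch also computes an inverse distance weighted (IDW) interpolation of `Frequency`; IDW is a different technique from kernel density, used here only to show a surface that peaks at high-frequency sites. Assumptions: the toy sites, the bandwidth `h = 4`, the grid limits, and the helper functions `idw()` and `peak()` are all hypothetical and not taken from `freq_data.csv`.

```r
library(MASS)  # kde2d(); ggplot2's stat_density_2d() uses MASS::kde2d()

# Hypothetical toy sites: a tight clump of low-frequency sites
# around (40.2, 40) and one isolated high-frequency site at (70, 45).
toy <- data.frame(
  Longitude = c(40, 40.2, 40.4, 40.2, 40.2, 70),
  Latitude  = c(40, 40, 40, 39.8, 40.2, 45),
  Frequency = c(0.01, 0.02, 0.01, 0.02, 0.01, 0.60)
)

# Shared 100 x 100 grid covering all sites
lon_grid <- seq(35, 75, length.out = 100)
lat_grid <- seq(35, 50, length.out = 100)

# 1) Kernel density of site locations: Frequency is never used.
#    h = 4 follows MASS's convention (kernel SD = h / 4 = 1 degree),
#    chosen only for this toy example.
dens <- kde2d(toy$Longitude, toy$Latitude, h = 4, n = 100,
              lims = c(range(lon_grid), range(lat_grid)))

# 2) Hypothetical IDW helper: weighted mean of Frequency with weights 1 / d^p
idw <- function(lon, lat, sites, p = 2) {
  d <- sqrt((sites$Longitude - lon)^2 + (sites$Latitude - lat)^2)
  if (any(d == 0)) return(sites$Frequency[which(d == 0)[1]])
  w <- 1 / d^p
  sum(w * sites$Frequency) / sum(w)
}
idw_z <- outer(lon_grid, lat_grid,
               Vectorize(function(x, y) idw(x, y, toy)))

# Hypothetical helper: grid coordinates of a surface's maximum
peak <- function(z) {
  ij <- arrayInd(which.max(z), dim(z))
  c(lon = lon_grid[ij[1]], lat = lat_grid[ij[2]])
}

peak(dens$z)  # near the clump around (40.2, 40)
peak(idw_z)   # near the high-frequency site at (70, 45)
```

The density surface is driven by how many sites are close together, while the IDW surface is driven by the `Frequency` values themselves, which is the distinction described above.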
What changes should be made to the script to produce the ideal geographic heatmap? I would really appreciate your responses!