All the code and datasets are available at this GitHub repo.
I am currently building a Species Distribution Modeling workflow with the targets R package, and I have run into two issues in one part of it. For context, I download presences in parallel with the crew package, because the full dataset contains around 40,000 species. The relevant `_targets.R` code is below:
```r
library(targets)
source("R/functions.R")
library(crew)

tar_option_set(
  packages = c("readr", "SDMWorkflows", "janitor", "data.table"),
  controller = crew_controller_local(workers = 6),
  error = "null"
)

list(
  tar_target(file, "First_10_species.csv", format = "file"),
  # Read the file
  tar_target(data, get_data(file)),
  # Filter the species to only plants
  tar_target(Only_Plants, filter_plants(data)),
  # Parallelize and retrieve species presences for species within Denmark
  tar_target(Presences,
    get_plant_presences(Only_Plants),
    pattern = map(Only_Plants)
  ),
  # Summarize the number of presences per species
  tar_target(Presence_summary, summarise_presences(Presences),
    pattern = map(Presences)
  ),
  # Keep only the species with more than 5 presences
  tar_target(Over_5, Filter_Over_5(Presence_summary))
)
```
SDMWorkflows is a package I wrote; you can install it from GitHub with:

```r
remotes::install_github("Sustainscapes/SDMWorkflows")
```
The accompanying function script (R/functions.R) is as follows:
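```r
# Read the species list and standardise the column names
get_data <- function(file) {
  readr::read_csv(file) |>
    janitor::clean_names()
}

# Keep unique plant species (only the first 10 for this test run)
filter_plants <- function(df) {
  result <- df |>
    dplyr::filter(kingdom == "Plantae") |>
    dplyr::pull(species) |>
    unique() |>
    head(10)
  return(result)
}

# Download occurrences in Denmark (country = "DK") for one species
get_plant_presences <- function(species) {
  SDMWorkflows::GetOccs(
    Species = unique(species),
    WriteFile = FALSE,
    Log = FALSE,
    country = "DK",
    limit = 100000,
    year = "1999,2023"
  )
}

# Count presences per family/genus/species
# (data.table is attached through tar_option_set(packages = ...))
summarise_presences <- function(df) {
  Sum <- as.data.table(df)[, .N, keyby = .(family, genus, species)]
  return(Sum)
}

# Keep the species with more than 5 presences
Filter_Over_5 <- function(DT) {
  DT[N > 5]
}
```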
The workflow appears to run well, but some branches of Presence_summary fail. The errors are listed in the following table and figure:
name | error |
---|---|
Presence_summary_24c8afe2 | object genus not found |
Presence_summary_7044ad96 | object genus not found |
Presence_summary_a8f163ad | object genus not found |
Presence_summary_c7ecffc9 | object genus not found |
[Figure: PlotTarget.png]

These errors are expected for species that have no presences recorded within Denmark. The summary itself looks fine, though: from the initial 10 species, it produces a data.table with 6 species, as shown in this table:
family | genus | species | N |
---|---|---|---|
Pinaceae | Abies | Abies cephalonica | 1 |
Pinaceae | Abies | Abies koreana | 3 |
Pinaceae | Abies | Abies nordmanniana | 1130 |
Pinaceae | Abies | Abies sibirica | 14 |
Pinaceae | Abies | Abies veitchii | 2 |
Thuidiaceae | Abietinella | Abietinella abietina | 9 |
I have two specific questions:
1. Addressing the errors in summarise_presences: despite the errors, the combined results of summarise_presences are as expected. How can I get rid of these errors in the summary step?
2. Filtering species in Presences for plotting: suppose I want to use the results of Presences to plot coordinates with a function like PlotPres, but only for species that appear in the Over_5 object. How can I do this mapping, given that Over_5 identifies species by name while Presences is split into branches?
```r
library(ggplot2)

# Scatter plot of the occurrence coordinates of one species
PlotPres <- function(df) {
  G <- ggplot(df, aes(x = decimalLongitude, y = decimalLatitude)) +
    geom_point() +
    theme_bw()
  print(G)
}
```
As you can see, it works if I do this manually for branch 6:

```r
PlotPres(tar_read("Presences", branches = 6)[[1]])
```
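For the second question, one possible approach is to match on the species column instead of on branch names. The sketch below rebuilds the summary table posted above, applies the same `N > 5` rule as `Filter_Over_5()`, and then keeps only the presence tables whose species appears in that result. It assumes each `Presences` branch is a data frame with a `species` column (if `GetOccs()` returns a list, as the `[[1]]` in the branch 6 example suggests, the data frame would need to be extracted first). `keep_selected_species` is a hypothetical helper, and the coordinates are made-up mock values.

```r
library(data.table)

# The summary table posted above
summary_dt <- data.table(
  family = c(rep("Pinaceae", 5), "Thuidiaceae"),
  genus = c(rep("Abies", 5), "Abietinella"),
  species = c("Abies cephalonica", "Abies koreana", "Abies nordmanniana",
              "Abies sibirica", "Abies veitchii", "Abietinella abietina"),
  N = c(1L, 3L, 1130L, 14L, 2L, 9L)
)

# Same rule as Filter_Over_5(): strictly more than 5 presences
over5 <- summary_dt[N > 5]

# Hypothetical helper: keep only the per-branch presence tables whose
# species appears in Over_5 (NULL branches are dropped)
keep_selected_species <- function(presence_list, over5) {
  selected <- unique(over5$species)
  keep <- vapply(
    presence_list,
    function(df) !is.null(df) && any(df$species %in% selected),
    logical(1)
  )
  presence_list[keep]
}

# Mock branches: a frequent species, a rare one, a failed branch, another frequent one
presences <- list(
  data.frame(species = "Abies nordmanniana", decimalLongitude = 10.2, decimalLatitude = 56.1),
  data.frame(species = "Abies koreana", decimalLongitude = 12.5, decimalLatitude = 55.7),
  NULL,
  data.frame(species = "Abietinella abietina", decimalLongitude = 9.9, decimalLatitude = 57.0)
)

kept <- keep_selected_species(presences, over5)
length(kept)
```

Inside the pipeline, the same species check could be done in a plotting target branched with `pattern = map(Presences)` that also depends on `Over_5`: because `Over_5` is not mapped, every branch receives the whole table and can skip species that are not in it.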
```
Warning: program compiled against libxml 210 using older 209
Warning messages:
1: In normalizePath(Sys.getenv("TMPDIR", Sys.getenv("TMP"))) :
  path[1]="": No such file or directory
2: In normalizePath(Sys.getenv("TMPDIR", Sys.getenv("TMP"))) :
  path[1]="": No such file or directory
3: In normalizePath(Sys.getenv("TMPDIR", Sys.getenv("TMP"))) :
  path[1]="": No such file or directory
```
Session info
sessioninfo::session_info()
#;-) ─ Session info ───────────────────────────────────────────────────────────────
#;-) setting value
#;-) version R version 4.3.2 (2023-10-31)
#;-) os Ubuntu 20.04.6 LTS
#;-) system x86_64, linux-gnu
#;-) ui X11
#;-) language en_US:en
#;-) collate en_US.UTF-8
#;-) ctype en_US.UTF-8
#;-) tz Europe/Copenhagen
#;-) date 2023-11-29
#;-) pandoc 2.19.2 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#;-)
#;-) ─ Packages ───────────────────────────────────────────────────────────────────
#;-) package * version date (UTC) lib source
#;-) backports 1.4.1 2021-12-13 [1] CRAN (R 4.3.0)
#;-) base64url 1.4 2018-05-14 [1] CRAN (R 4.3.2)
#;-) callr 3.7.3 2022-11-02 [3] CRAN (R 4.2.2)
#;-) cli 3.6.1 2023-03-23 [3] CRAN (R 4.2.3)
#;-) codetools 0.2-19 2023-02-01 [4] CRAN (R 4.2.2)
#;-) colorspace 2.1-0 2023-01-23 [3] CRAN (R 4.2.2)
#;-) curl 5.1.0 2023-10-02 [1] CRAN (R 4.3.1)
#;-) data.table * 1.14.8 2023-02-17 [1] CRAN (R 4.3.0)
#;-) digest 0.6.33 2023-07-07 [1] CRAN (R 4.3.1)
#;-) dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.3.2)
#;-) evaluate 0.23 2023-11-01 [1] CRAN (R 4.3.2)
#;-) fansi 1.0.5 2023-10-08 [1] CRAN (R 4.3.1)
#;-) farver 2.1.1 2022-07-06 [3] CRAN (R 4.2.1)
#;-) fastmap 1.1.1 2023-02-24 [3] CRAN (R 4.2.2)
#;-) fs 1.6.3 2023-07-20 [1] CRAN (R 4.3.1)
#;-) generics 0.1.3 2022-07-05 [1] CRAN (R 4.3.0)
#;-) ggplot2 * 3.4.4 2023-10-12 [3] CRAN (R 4.3.1)
#;-) glue 1.6.2 2022-02-24 [3] CRAN (R 4.1.2)
#;-) gtable 0.3.4 2023-08-21 [3] CRAN (R 4.3.1)
#;-) highr 0.10 2022-12-22 [1] CRAN (R 4.3.0)
#;-) htmltools 0.5.7 2023-11-03 [1] CRAN (R 4.3.2)
#;-) igraph 1.5.1 2023-08-10 [3] CRAN (R 4.3.1)
#;-) knitr 1.45 2023-10-30 [1] CRAN (R 4.3.2)
#;-) labeling 0.4.3 2023-08-29 [3] CRAN (R 4.3.1)
#;-) lifecycle 1.0.4 2023-11-07 [3] CRAN (R 4.3.2)
#;-) magrittr 2.0.3 2022-03-30 [3] CRAN (R 4.1.3)
#;-) munsell 0.5.0 2018-06-12 [3] CRAN (R 4.0.0)
#;-) pillar 1.9.0 2023-03-22 [3] CRAN (R 4.2.3)
#;-) pkgconfig 2.0.3 2019-09-22 [3] CRAN (R 4.0.0)
#;-) png 0.1-8 2022-11-29 [1] CRAN (R 4.3.0)
#;-) processx 3.8.2 2023-06-30 [1] CRAN (R 4.3.1)
#;-) ps 1.7.5 2023-04-18 [1] CRAN (R 4.3.0)
#;-) purrr 1.0.2 2023-08-10 [3] CRAN (R 4.3.1)
#;-) R.cache 0.16.0 2022-07-21 [1] CRAN (R 4.3.0)
#;-) R.methodsS3 1.8.2 2022-06-13 [1] CRAN (R 4.3.0)
#;-) R.oo 1.25.0 2022-06-12 [1] CRAN (R 4.3.0)
#;-) R.utils 2.12.2 2022-11-11 [1] CRAN (R 4.3.0)
#;-) R6 2.5.1 2021-08-19 [3] CRAN (R 4.1.1)
#;-) reprex 2.0.2 2022-08-17 [1] CRAN (R 4.3.0)
#;-) rlang 1.1.2 2023-11-04 [1] CRAN (R 4.3.2)
#;-) rmarkdown 2.25 2023-09-18 [1] CRAN (R 4.3.1)
#;-) rstudioapi 0.15.0 2023-07-07 [3] CRAN (R 4.3.1)
#;-) scales 1.3.0 2023-11-28 [1] CRAN (R 4.3.2)
#;-) sessioninfo 1.2.2 2021-12-06 [3] CRAN (R 4.1.2)
#;-) styler 1.10.0 2023-05-24 [1] CRAN (R 4.3.0)
#;-) targets * 1.3.2 2023-10-12 [1] CRAN (R 4.3.2)
#;-) tibble 3.2.1 2023-03-20 [3] CRAN (R 4.3.1)
#;-) tidyselect 1.2.0 2022-10-10 [1] CRAN (R 4.3.0)
#;-) utf8 1.2.4 2023-10-22 [1] CRAN (R 4.3.1)
#;-) vctrs 0.6.4 2023-10-12 [1] CRAN (R 4.3.1)
#;-) withr 2.5.2 2023-10-30 [1] CRAN (R 4.3.2)
#;-) xfun 0.41 2023-11-01 [1] CRAN (R 4.3.2)
#;-) xml2 1.3.5 2023-07-06 [1] CRAN (R 4.3.1)
#;-) yaml 2.3.7 2023-01-23 [1] CRAN (R 4.3.1)
#;-)
#;-) [1] /home/au687614/R/x86_64-pc-linux-gnu-library/4.3
#;-) [2] /usr/local/lib/R/site-library
#;-) [3] /usr/lib/R/site-library
#;-) [4] /usr/lib/R/library
#;-)
#;-) ──────────────────────────────────────────────────────────────────────────────