/EDIT: My problem has been resolved. It turns out it was indeed an error unrelated to parallel.
Hi StackOverflow Community,
today I have a rather odd problem. I have a function which I am calling to run on 8 cores, and it has a parameter called type. Running it with type = "points" works perfectly, but I am having trouble running it with type = "lines". Even though it follows almost the same code, at some point the script crashes, and all I get (after almost 2 days of running!) is:
Warning message:
In mclapply(cluster_times, the_nt_function, all_trips, ellipses, :
all scheduled cores encountered errors in user code
Error: unexpected symbol in:
"
endtime"
Execution halted
The first thing I checked is, of course, the code of the_nt_function. I provide a summary of the code below.
The important points here are:
- The code for type = "points" runs perfectly and gives the expected output
- The code for type = "lines" seems to run perfectly for nearly 2 days until the error stated above appears
- The error stated above is inexplicable: there is no 'endtime' or anything in the code below!
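As a side note on the wording of that message: "unexpected symbol" is how R's parser reports a syntax error, so the message may come from code being parsed (for example the calling script) rather than from inside the worker function. A minimal sketch, using a made-up snippet in which a call is left unclosed before a line starting with endtime, reproduces the same kind of message:

```r
# Made-up snippet: the c( call is never closed before "endtime" appears,
# so the parser stops at that symbol.
broken_code <- "x <- c(1, 2\nendtime)"

# parse() only checks syntax; nothing is evaluated here.
msg <- tryCatch(
  {
    parse(text = broken_code)
    "parsed without error"
  },
  error = function(e) conditionMessage(e)
)

cat(msg, "\n")
```

If the message is reproduced by parsing alone, the place to look is the text of the script around the reported symbol, not the parallel call.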
the_nt_function takes some inputs, and for each row in times_df it creates a kind of sample of dat. If type="points", it then counts how many (geographical) points in dat are in the ell object, using st_intersection from sf. If type="lines", it basically does the same, but instead of counting points intersecting with the ell object, it counts lines intersecting with the ell object.
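To make the counting step concrete, here is a minimal sketch of the "points" case. The trips data frame, its coordinates, and the square used in place of the ell object are all made up for illustration; only st_as_sf, st_sfc, st_polygon and st_intersection from sf are real.

```r
library(sf)

# Hypothetical trips with made-up start coordinates (lon/lat, WGS84)
trips <- data.frame(
  tripid      = c(1, 2, 3),
  startloclon = c(0.5, 2.0, 0.25),
  startloclat = c(0.5, 2.0, 0.75)
)
pts_sf <- st_as_sf(trips, coords = c("startloclon", "startloclat"), crs = 4326)

# Hypothetical stand-in for the ell object: a small square instead of an ellipse
ring <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))
ell  <- st_sfc(st_polygon(list(ring)), crs = 4326)

# IDs of the trips whose start point falls inside ell, and how many there are
inside_ids <- suppressWarnings(suppressMessages(
  st_intersection(pts_sf, ell)$tripid
))
n_t <- length(inside_ids)
```

The "lines" case would follow the same pattern, with an sf object that carries line geometries instead of points.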
I let the code run on a high performance cluster (but only using 8 cores). On my computer, it runs fine on 7 cores, at least for a test set of seven entries in the times_df object. It does not crash but gives the expected output.
I believe that the error is not in the_nt_function. I've googled it and looked for similar problems here on StackOverflow, but the only thing I found so far is this (which I am now trying out of desperation).
Do you have any idea what this error is trying to tell me?
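Since the mclapply warning only says that errors occurred in user code, one way to find out what actually failed is to catch errors inside each worker and return them as values. This is a minimal sketch, not my real code: toy_worker is a hypothetical stand-in for the_nt_function, and catch_errors and the worker_error class are names invented here.

```r
library(parallel)

# Hypothetical stand-in for the_nt_function: fails for one specific input.
toy_worker <- function(x) {
  if (x == 3) stop("simulated failure for input 3")
  x^2
}

# Wrap a worker so that an error is returned as a value (with the offending
# input and the message) instead of only surfacing as a generic warning.
catch_errors <- function(fun) {
  function(x, ...) {
    tryCatch(
      fun(x, ...),
      error = function(e) {
        structure(
          list(input = x, message = conditionMessage(e)),
          class = "worker_error"
        )
      }
    )
  }
}

# Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L
out <- mclapply(1:4, catch_errors(toy_worker), mc.cores = cores)

# Report every element that failed, together with its error message.
failed <- vapply(out, inherits, logical(1), what = "worker_error")
for (res in out[failed]) {
  message("input ", res$input, " failed: ", res$message)
}
```

With a wrapper like this, the successful elements are still returned, so a long run is not lost entirely when a single element fails.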
/Edit: I call the function in parallel by:
system.time(
out <- mclapply(cluster_times, the_nt_function, all_trips, ellipses, TRUE, opts$type, mc.cores = cpus)
)
where I give the arguments in the correct order.
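Because mclapply forwards everything between FUN and the mc.* options positionally to the function, a mismatch between the extra arguments and the function's parameters is easy to miss. A minimal sketch of passing them by name instead, where toy_fun is a hypothetical function with the same parameter names as the_nt_function and the string values are made up:

```r
library(parallel)

# Hypothetical function with the same parameter layout as the_nt_function;
# it only pastes its arguments together so the mapping is visible.
toy_fun <- function(times_df, dat, sf_object, type) {
  paste(times_df, dat, sf_object, type)
}

# Forking is not available on Windows, so fall back to one core there.
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each element of X becomes times_df; the other arguments are matched by name.
res <- mclapply(
  c("a", "b"),
  toy_fun,
  dat = "trips",
  sf_object = "ellipses",
  type = "lines",
  mc.cores = cores
)
```

With named arguments, an argument that does not correspond to any parameter raises an "unused argument" error immediately instead of being shifted silently into the wrong position.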
the_nt_function <- function(times_df, dat, sf_object, type) {
  times_df$N_t <- NA_real_
  # Here is code that performs some kind of preprocessing and filtering down data, basically creating
  # the data.table dat_filt_spc
  # (ell, the object intersected below, is assumed to be created in this elided part)
  for (i in seq_len(nrow(times_df))) {
    # do some more filtering and end up with a data table called dat_filt_time
    if (type == "points") {
      # Convert filtered DT to sf object (using startloc)
      suppressWarnings(
        dat_start_sf <- st_as_sf(
          as.data.frame(dat_filt_time),
          coords = c("startloclon", "startloclat"),
          crs = 4326
        )
      )
    } else if (type == "lines") {
      # Convert filtered DT to sf object (no coords given, so an existing geometry column is used)
      suppressWarnings(
        dat_start_sf <- st_as_sf(
          as.data.frame(dat_filt_time),
          crs = 4326
        )
      )
    }
    # Intersect startlocs with ellipse
    suppressMessages(
      start_intersect <- st_intersection(dat_start_sf, ell)$tripid
    )
    # Convert filtered DT to sf object (using endloc) and filter out trips which are already intersected
    if (type == "points") {
      suppressWarnings(
        dat_end_sf <- st_as_sf(
          as.data.frame(dat_filt_time[!tripid %in% start_intersect]),
          coords = c("endloclon", "endloclat"),
          crs = 4326
        )
      )
      # Intersect endlocs with ellipse
      suppressMessages(
        end_intersect <- st_intersection(dat_end_sf, ell)$tripid
      )
      # concatenate start and endloc intersections, assess Nt and add to times_df
      trips_intersect <- unique(c(start_intersect, end_intersect))
    } else if (type == "lines") {
      trips_intersect <- start_intersect
    }
    times_df[i, "N_t"] <- length(trips_intersect)
  }
  return(times_df)
}