I am using R and the bupaR package to do process analysis. Suppose my data, stored in a CSV file, looks like this:
STATUS;timestamp;CASEID
created;16-02-2023 09:46:32;1
accepted;13-04-2023 23:59:59;1
created;16-02-2023 09:46:32;2
accepted;13-04-2023 23:59:59;2
created;14-12-2022 13:17:54;3
accepted;02-01-2023 23:59:59;3
created;28-02-2023 19:37:01;4
accepted;03-03-2023 23:59:59;4
created;02-01-2023 07:45:43;5
created;24-01-2022 16:05:58;6
accepted;03-02-2022 23:59:59;6
created;24-01-2022 15:52:53;7
accepted;03-02-2022 23:59:59;7
created;15-08-2022 12:54:23;8
rejected;18-08-2022 23:59:59;8
created;21-03-2022 15:32:05;9
accepted;26-04-2022 23:59:59;9
created;21-03-2022 15:42:39;10
Now when I run the following code I get the process map:
library(bupaR)
library(processmapR)
library(edeaR)
datafile <- read.csv(file="pathtofile\\testfile.csv",header=T, sep=";")
datafile$timestampcolumn <- as.POSIXct(datafile$timestamp, format="%d-%m-%Y %H:%M:%S")
print(datafile)
mytest <- simple_eventlog(datafile, case_id = "CASEID", activity_id = "STATUS", timestamp = "timestampcolumn")
process_map(mytest, type = frequency("absolute"))
mytest %>%
precedence_matrix(type = "absolute") %>%
plot
(I don't know why 9 is displayed on the edge from Start to "created"; it should be 10.)
Now, I would like to have for example the mean displayed on the traces. The following output shows the desired process map:
I tried the following code (according to this post):
mytest %>%
  process_map(type_nodes = frequency(value = "absolute_case"),
              type_edges = performance(FUN = mean, units = "days")) %>%
  plot
or (according to this post)
mytest %>%
  process_map(performance(mean, "days"),
              type_nodes = performance(median, "days"),
              sec_nodes = frequency("relative"),
              type_edges = performance(median, "days"),
              sec_edges = frequency("relative")) %>%
  plot
But I get an error message:
Error in xy.coords(x, y, xlabel, ylabel, log)
So what is the correct code for this? I need mean, median and maximum.



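Your code is correct; you just have to drop the trailing "plot" step. process_map() already renders the map itself, so piping its result into plot hands the rendered widget to base graphics, which is most likely what triggers the xy.coords error.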
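For reference, here is a minimal sketch of the second snippet with the plot step removed. It assumes a small inline sample in the same shape as the CSV (three cases only) instead of reading the file. The leading positional performance(mean, "days") is left out because, as far as I can tell, type_nodes and type_edges override it anyway.

```r
library(bupaR)
library(processmapR)

# Small inline sample with the same columns as the CSV in the question
# (an assumption for illustration; replace with your read.csv() call).
datafile <- data.frame(
  STATUS    = c("created", "accepted", "created", "accepted", "created", "rejected"),
  timestamp = c("16-02-2023 09:46:32", "13-04-2023 23:59:59",
                "14-12-2022 13:17:54", "02-01-2023 23:59:59",
                "15-08-2022 12:54:23", "18-08-2022 23:59:59"),
  CASEID    = c(1, 1, 3, 3, 8, 8)
)
datafile$timestampcolumn <- as.POSIXct(datafile$timestamp,
                                       format = "%d-%m-%Y %H:%M:%S")

mytest <- simple_eventlog(datafile,
                          case_id = "CASEID",
                          activity_id = "STATUS",
                          timestamp = "timestampcolumn")

# No trailing `%>% plot`: process_map() renders the map on its own.
mytest %>%
  process_map(type_nodes = performance(median, "days"),
              sec_nodes  = frequency("relative"),
              type_edges = performance(median, "days"),
              sec_edges  = frequency("relative"))
```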
This code shows the median performance (in days) on both the edges and the nodes as the primary label, and the relative frequency as the secondary label.
Note that, since the nodes and edges use the same settings here, you can also write this more concisely by passing them once through the type and sec arguments of process_map().
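A sketch of the shorter form, together with the mean and maximum variants asked for in the question. It relies on the type and sec arguments of process_map(), which type_nodes/type_edges and sec_nodes/sec_edges default to. The inline data frame is again only an illustrative stand-in for the CSV.

```r
library(bupaR)
library(processmapR)

# Illustrative stand-in for the CSV from the question (assumption).
datafile <- data.frame(
  STATUS    = c("created", "accepted", "created", "rejected"),
  timestamp = c("16-02-2023 09:46:32", "13-04-2023 23:59:59",
                "15-08-2022 12:54:23", "18-08-2022 23:59:59"),
  CASEID    = c(1, 1, 8, 8)
)
datafile$timestampcolumn <- as.POSIXct(datafile$timestamp,
                                       format = "%d-%m-%Y %H:%M:%S")

mytest <- simple_eventlog(datafile,
                          case_id = "CASEID",
                          activity_id = "STATUS",
                          timestamp = "timestampcolumn")

# Median on nodes and edges, relative frequency as the secondary label.
mytest %>%
  process_map(type = performance(median, "days"),
              sec  = frequency("relative"))

# Same idea with the mean and the maximum.
mytest %>% process_map(type = performance(mean, "days"))
mytest %>% process_map(type = performance(max, "days"))
```

With simple_eventlog each activity has a single timestamp, so I would expect the node durations to be 0 and only the edge values to be informative; if that is what you see, the first variant from the question (frequency on the nodes, performance on the edges) is probably the more useful one.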