Avoiding/disabling lazy evaluation for pipeline processing

Imagine I have a set of functions for data processing, for example:

procA <- function(input){
  cat('\n Now processing #A') # message just to log pipeline flow 
  
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #A: ', dim(input))
  input$procA <- 'procA'
  
  return(input)
}

procB <- function(input){
  cat('\n Now processing #B') # message just to log pipeline flow 
  
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #B: ', dim(input))
  input$procB <- 'procB' 
  
  return(input)
}

procC <- function(input){
  cat('\n Now processing #C') # message just to log pipeline flow 
  
  # Actual data processing, may include some diagnostic messaging:
  cat('\n #C: ', dim(input))
  input$procC <- 'procC' 
  
  return(input)
}

And I combine them in a pipeline, for example:

data(iris)

iris_processed <-
  iris %>% 
  procA %>% 
  procB %>% 
  procC

The logging output is as follows:

Now processing #C
Now processing #B
Now processing #A
#A: 150 5
#B: 150 6
#C: 150 7

Due to lazy evaluation, the "Now processing" log messages appear in reverse order, which makes the pipeline harder to debug. So far my workaround is to add input <- eval(input) at the beginning of each function. Is there a better solution, or an established good practice for this?

Best answer

We can use the magrittr eager pipe, %!>%. Note that library(magrittr) is needed. It is not sufficient to just use library(dplyr), since dplyr re-exports only %>%.

library(magrittr)

iris_processed <-
  iris %!>% 
  procA %!>% 
  procB %!>% 
  procC

## Now processing #A
##  #A:  150 5
##  Now processing #B
##  #B:  150 6
##  Now processing #C
##  #C:  150 7
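
If switching pipes is not an option, the workaround from the question can be written with base R's force(), which evaluates the argument before anything is logged. The following is only a minimal sketch: the two step functions are simplified stand-ins for the ones in the question, and plain nested calls are used instead of a pipe so that the example needs no packages.

```r
# Simplified stand-ins for the question's functions (illustrative only).
# force(input) evaluates the promise for `input` first, so the upstream
# step runs and logs before this step prints its own message.
procA <- function(input) {
  force(input)
  cat("\n Now processing #A")
  input$procA <- "procA"
  input
}

procB <- function(input) {
  force(input)
  cat("\n Now processing #B")
  input$procB <- "procB"
  input
}

# Nested calls are evaluated lazily, just like the question's pipeline:
# procB is entered first, but force() makes procA run before procB logs.
result <- procB(procA(iris))
```

With force() in place, the #A message is printed before the #B message. The trade-off is that every step function has to remember to call it, whereas the eager pipe fixes the ordering in one place.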