Writing a knitr document when data munging happens in functions


I have a large and complicated workflow in R (lots of initial inputs, recoding, merges, dropped observations, etc.). I do that work inside many isolated functions, one for each input type, merge, and data manipulation step. Right now only the final "analysis dataset" is returned to the global environment.

I want to write a knitr document that documents the data assembly process. However, all of the intermediate objects (data frames/tibbles) are local to the functions in which they are assembled, which I take to be good practice.

The options seem to be:

  • I could write lots of interim data objects to the global environment, but that would clutter it, and I would like to keep it neat.

  • I could return lists of interesting attributes (N, merge success info, structures, etc.) from each function to the global environment. That is a little neater, but still adds bookkeeping to every function.

This is clearly not a new problem. I would welcome suggestions on the best way(s) forward.

There are 2 best solutions below

0
On

Return objects with a class attribute, and define a print method for those classes. In the main document, print the objects. That's the standard R approach to this problem.
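
As a minimal sketch of that idea, assuming a hypothetical step function `filter_step()` and a hypothetical class name `assembly_step` (neither comes from the question), each step can return its data together with a few diagnostics, and a print method decides what the knitr document shows:

```r
# Hypothetical assembly step: keep the rows flagged by `keep` and record
# how many rows went in and came out. The names here are illustrative only.
filter_step <- function(data, keep, label) {
  result <- data[keep, , drop = FALSE]
  structure(
    list(
      data  = result,
      label = label,
      n_in  = nrow(data),
      n_out = nrow(result)
    ),
    class = "assembly_step"
  )
}

# S3 print method: invoked when the object is printed at top level,
# including when it is auto-printed inside a knitr chunk.
print.assembly_step <- function(x, ...) {
  cat("Step:", x$label, "\n")
  cat("Rows in:", x$n_in, "\n")
  cat("Rows out:", x$n_out, "\n")
  cat("Rows dropped:", x$n_in - x$n_out, "\n")
  invisible(x)
}

step <- filter_step(mtcars, mtcars$am == 1, "manual transmissions only")
step                          # shows the summary, not the full data frame
analysis_data <- step$data    # the data itself is still available
```

With this layout only one object per step reaches the calling environment, and the report text lives in the print method rather than in the document.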

1
On

Have you considered using knitr::spin? It turns an ordinary R script into a report, and it recognizes three kinds of comment lines that control how the output is rendered:

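  1. # starts a standard R comment, which stays in the code
  2. #' at the beginning of a line marks text that will be rendered as markdown
  3. #+ at the beginning of a line sets chunk options for the code that follows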

If you write your data-assembly.R script this way and then call knitr::spin("data-assembly.R"), an .html file will be generated that may provide the needed detail.

Example data-assembly.R file:

#' # Data Assembly Process
#' This document provides details on the construction of the final analysis data
#' set.
#' 
#' The namespaces needed for this work are:
#+ message = FALSE
library(tidyverse)

#' Our first step is to read in the data sets.  For this example, we'll just use
#' the `mtcars` data set
mtcars

#' A summary of the `mtcars` data set is below
summary(mtcars)

#' Let's only use data records for cars with manual transmissions (`am == 1`)
mt_am_cars <- dplyr::filter(mtcars, am == 1)
mt_am_cars
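
To see what spin does with those comment types before rendering a full report, here is a small sketch that passes a few script lines through the `text` argument with `knit = FALSE`, so the script is only converted to R Markdown and not executed. The script lines and the chunk label `counts` are made up for illustration.

```r
library(knitr)

# A tiny stand-in for data-assembly.R (illustrative content only)
script_lines <- c(
  "#' # Data Assembly Process",
  "#' Records per transmission type in `mtcars`:",
  "#+ counts, echo = FALSE",
  "table(mtcars$am)"
)

# knit = FALSE stops after the conversion step and returns the R Markdown text
rmd <- knitr::spin(text = script_lines, knit = FALSE)
cat(rmd, sep = "\n")

# For the full report, point spin at the script file instead:
# knitr::spin("data-assembly.R")
```

The `#'` lines become markdown text and the `#+` line becomes the header of the code chunk that follows it.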