Retrieving actual source expression that defines a S4 Reference Class from its associated object

434 Views Asked by At

In short (actual question)

How can I access the actual source code/expression that defines a S4 Reference Class (see ?setRefClass) from the object returned by either getClass("MyClass") or getRefClass("MyClass") (so after it has been sourced, not by investigating the actual source file)?

The homework I've done

As everything is an object in R, I can retrieve the source code/expression of

1) A regular function by simply investigating the respective object:

foo <- function(x) print(x)

> foo
function(x) print(x)

2) A formal method by getting the method object of a specific method via selectMethod:

setGeneric(name="myMethod", signature=c("x"),
    def=function(x) standardGeneric("myMethod")       
)
setMethod(
    f="myMethod", 
    signature=signature(x="numeric"), 
    definition=function(x) print(x)
)
def <- selectMethod(f="myMethod", signature=c(x="numeric"))

# Get actual source code/expression
> attributes(def)$srcref
function(x) print(x)

But things seem to be different for S4 Reference Classes:

setRefClass(Class="MyClass", fields=list(x.1="character"))

def <- getRefClass("MyClass")

# Inspect object >> no expression
> def
Generator object for class "MyClass":

Class fields:

Name:        x.1
Class: character

 Class Methods:  
    "callSuper", "copy", "export", "field", "getClass", "getRefClass", "import", 
"initFields", "show", "trace", "untrace"


 Reference Superclasses:  
    "envRefClass"

def.temp <- attributes(attributes(def)$.xData$def)

# Inspect attributes >> no expression
> attributes(def.temp)
$names
 [1] "fieldClasses"    "fieldPrototypes" "refMethods"      "refSuperClasses"
 [5] "slots"           "contains"        "virtual"         "prototype"      
 [9] "validity"        "access"          "className"       "package"        
[13] "subclasses"      "versionKey"      "sealed"          "class"          

# Alternatively:
> names(attributes(getClass("MyClass")))
 [1] "fieldClasses"    "fieldPrototypes" "refMethods"      "refSuperClasses"
 [5] "slots"           "contains"        "virtual"         "prototype"      
 [9] "validity"        "access"          "className"       "package"        
[13] "subclasses"      "versionKey"      "sealed"          "class"  

I can't seem to find the attribute where the actual source code/expression that exactly defines the class is stored.

Just to make sure: this expression is what I'd like to access

setRefClass(Class="MyClass", fields=list(x.1="character"))

Background/Motivation

I work a lot with S4 Reference Classes (?setRefClass) and thus OOP aspects like class inheritance play a big role in my daily work. I also follow a "one def per file" paradigm in order to keep things organized, so the various class defs are stored in separate files where the file names correspond to the names of the respective classes.

As with everything in life, there are some advantages, but also some inherent disadvantages to such an approach:

Aspect 1

Over short or long, you'll end up with an inheritance structure that doesn't match the alphabetical file order of the individual source files anymore. So simply sourcing one file after another will result in an error at a particular point where some required superclass hasn't been sourced yet.

dir.create("classes", showWarnings=FALSE)
write("setRefClass(Class=\"A\", contains=\"B\", fields=list(x.3=\"logical\"))", 
    file="classes/class_A.R")
write("setRefClass(Class=\"B\", contains=\"C\", fields=list(x.2=\"numeric\"))", 
    file="classes/class_B.R")
write("setRefClass(Class=\"C\", fields=list(x.1=\"character\"))", 
    file="classes/class_C.R")

class_A.R is the first file in folder classes, but in order to source it we first need to source class_B.R (as this file defines class B) which in turn requires class C and thus the prior sourcing of class_C.R.

The correct collation thus is:

c("class_C.R", "class_B.R", "class_A.R")

Aspect 2

For certain tasks you do want/need a "multiple defs per file" paradigm: quick & easy distribution of necessary objects/functions/classes to worker processes when parallalizing, organizing code when actually building a package etc.

path <- "classes/classes.R"
file.create(path)
write("setRefClass(Class=\"C\", fields=list(x.1=\"character\"))", 
    file=path, append=TRUE)
write("setRefClass(Class=\"B\", contains=\"C\", fields=list(x.2=\"numeric\"))", 
    file=path, append=TRUE)
write("setRefClass(Class=\"A\", contains=\"B\", fields=list(x.3=\"logical\"))", 
    file=path, append=TRUE)

Ad Aspect 1

I don't like the idea of keeping some sort of manual collation reference that specifies the correct sourcing order as I think that's a job the computer can easily do for me (finding out the correct collation). The only thing you need to do here is to find out each class' superclasses (sort of its dependencies), then retrieving the correct collation is a piece of cake.

EDIT

In case someone's interested: I did come up with a working approach for this. Just drop me a line in case you'd like to see some code. It's based on parsing (without evaluating) the respective class def source files in order to investigate the value of the contains argument which lists the superclasses. The whole process is then recursively repeated for the source files of those superclasses until you eventually end up with the correct collation. It's not that time consuming either.

Here's the outline:

x <- list.files("classes", full.names=TRUE)    
code <- base::parse(file=x[1])

> code 
expression(setRefClass(Class="A", contains="B", fields=list(x.3="logical")))

superclasses <- eval(code[[1]][["contains"]])
> superclasses
[1] "B"

# Next: investigate source file for class 'B'

Ad Aspect 2

I also don't like manual copy & pasting, so I implemented a routine that allows me to consolidate source code either stored in individual files or drawn from the respective objects to a single "consolidated" file (via deparse(<source_code_expression>) and write(..., append=TRUE)). As for classes, the correct collation also matters here, otherwise there'll be errors again when you try to source the consolidated file.

For both aspects it'd be nice to be able to choose how to get to the actual source code/expression for classes/functions/methods:

  1. either based on investigating the code stored in the respective source files (parse(file=*))
  2. or based on accessing the desired information directly from the respective object.

The second option would be the link to the actual question above.

2

There are 2 best solutions below

1
On

Class definitions

Since we can't evaluate the files (because we don't know the collation order), source or exploring the defined classes is not on the table. Here we parse the text in each file into a character vector, keeping the parsed lines that start with 'setRefClass'. The parser strips white space and does other transformations to make the text more uniformly formatted, though the following will rely on consistent class definition (e.g., using named arguments).

fls <- dir()
names(fls) <- fls
p0 <- lapply(fls, function(fl) as.character(parse(text=readLines(fl))))
p1 <- lapply(p0, grep, pattern="^setRefClass", value=TRUE)

Let's aim for a data.frame containing the class being defined, the contained class, and the file name in which the class definition occurs

df <- data.frame(Class=NA_character_, contains=NA_character_,
                 File=rep(fls, sapply(p1, length)),
                 stringsAsFactors=FALSE)

and then fill it using pattern matching / subsitution

p2 <- unlist(p1, use.names=FALSE)
pat <- '.*Class = "([[:alnum:]]+)".*'
df[,"Class"] <- sub(pat, "\\1", p2)

pat <- '.*contains = "([[:alnum:]]+)".*'
idx <- grep(pat, p2)
df[idx,"contains"] <- sub(pat, "\\1", p2[idx])

Ending with (I added a class A1, also derived from B, to the class_A.R file, for fun)

> df
  Class contains      File
1     A        B class_A.R
2    A1        B class_A.R
3     B        C class_B.R
4     C     <NA> class_C.R

Another strategy for harvesting the Class / contains data frame is to intercept the calls to setRefClass

gClass <- character()
gcontains <- character()
setRefClass <- function(Class, fields, contains=NA_character_, ...)
{
    gClass <<- c(gClass, Class)
    gcontains <<- c(gcontains, contains)
}

gClass and gcontains will contain the relevant data for constructing the dependency graph after source'ing the relevant files (assuming the files can be sourced without the class definitions being available).

for (fl in dir()) source(fl)

Dependencies

What we'd like is the dependency graph, for the classes that have dependencies. So we'll use the graph and RBGL packages from Bioconductor to construct an appropriate graph

## From --> To == contains --> Class
m <- as.matrix(df[!is.na(df$contains), c("contains", "Class")])
gr <- ftM2graphNEL(m, edgemode="directed")

then ask for a breadth-first search starting at each of our base package (df[is.na(df$contains), "Class"]), and use the resulting order to retrieve the appropriate collation order

o <- bfs(gr, "C")                       # order: breadth-first search
unique(df[match(o, df$Class), "File"])

So

classDefFilesCollateOrder <- function(fls)
{
    names(fls) <- fls
    p0 <- lapply(fls, function(fl) as.character(parse(text=readLines(fl))))
    p1 <- lapply(p0, grep, pattern="^setRefClass", value=TRUE)

    df <- data.frame(Class=NA_character_, contains=NA_character_,
                     File=rep(fls, sapply(p1, length)),
                     stringsAsFactors=FALSE)

    p2 <- unlist(p1, use.names=FALSE)
    pat <- '.*Class = "([[:alnum:]]+)".*'
    df[,"Class"] <- sub(pat, "\\1", p2)

    pat <- '.*contains = "([[:alnum:]]+)".*'
    idx <- grep(pat, p2)
    df[idx, "contains"] <- sub(pat, "\\1", p2[idx])

    ## From --> To == contains --> Class
    m <- as.matrix(df[!is.na(df$contains), c("contains", "Class")])
    gr <- ftM2graphNEL(m, edgemode="directed")

    ## single class only
    base <- df$Class[is.na(df$contains)]
    if (length(base) != 1)
        stop("don't yet know how to deal with more than 1 base class")
    o <- bfs(gr, base)

    unique(df[match(o, df$Class), "File"])
}
3
On

The "source" code of the command isn't stored so you won't see it by inspecting the object.

Take a look at the source of setRefClass by typing it at the console and hitting [ENTER]. Notice that what you have done is just passed arguments to a function... not defined a new expression. So, when you getRefClass you are getting everything that the class knows about itself.

You could rebuild it by creating a parseRefClassArgs function that rebuilds the arguments for setRefClass.