In short (actual question)
How can I access the actual source code/expression that defines a S4 Reference Class (see ?setRefClass
) from the object returned by either getClass("MyClass")
or getRefClass("MyClass")
(so after it has been sourced, not by investigating the actual source file)?
The homework I've done
As everything is an object in R, I can retrieve the source code/expression of
1) A regular function by simply investigating the respective object:
foo <- function(x) print(x)
> foo
function(x) print(x)
2) A formal method by getting the method object of a specific method via selectMethod
:
setGeneric(name="myMethod", signature=c("x"),
def=function(x) standardGeneric("myMethod")
)
setMethod(
f="myMethod",
signature=signature(x="numeric"),
definition=function(x) print(x)
)
def <- selectMethod(f="myMethod", signature=c(x="numeric"))
# Get actual source code/expression
> attributes(def)$srcref
function(x) print(x)
But things seem to be different for S4 Reference Classes:
setRefClass(Class="MyClass", fields=list(x.1="character"))
def <- getRefClass("MyClass")
# Inspect object >> no expression
> def
Generator object for class "MyClass":
Class fields:
Name: x.1
Class: character
Class Methods:
"callSuper", "copy", "export", "field", "getClass", "getRefClass", "import",
"initFields", "show", "trace", "untrace"
Reference Superclasses:
"envRefClass"
def.temp <- attributes(attributes(def)$.xData$def)
# Inspect attributes >> no expression
> attributes(def.temp)
$names
[1] "fieldClasses" "fieldPrototypes" "refMethods" "refSuperClasses"
[5] "slots" "contains" "virtual" "prototype"
[9] "validity" "access" "className" "package"
[13] "subclasses" "versionKey" "sealed" "class"
# Alternatively:
> names(attributes(getClass("MyClass")))
[1] "fieldClasses" "fieldPrototypes" "refMethods" "refSuperClasses"
[5] "slots" "contains" "virtual" "prototype"
[9] "validity" "access" "className" "package"
[13] "subclasses" "versionKey" "sealed" "class"
I can't seem to find the attribute where the actual source code/expression that exactly defines the class is stored.
Just to make sure: this expression is what I'd like to access
setRefClass(Class="MyClass", fields=list(x.1="character"))
Background/Motivation
I work a lot with S4 Reference Classes (?setRefClass
) and thus OOP aspects like class inheritance play a big role in my daily work. I also follow a "one def per file" paradigm in order to keep things organized, so the various class defs are stored in separate files where the file names correspond to the names of the respective classes.
As with everything in life, there are some advantages, but also some inherent disadvantages to such an approach:
Aspect 1
Over short or long, you'll end up with an inheritance structure that doesn't match the alphabetical file order of the individual source files anymore. So simply sourcing one file after another will result in an error at a particular point where some required superclass hasn't been sourced yet.
dir.create("classes", showWarnings=FALSE)
write("setRefClass(Class=\"A\", contains=\"B\", fields=list(x.3=\"logical\"))",
file="classes/class_A.R")
write("setRefClass(Class=\"B\", contains=\"C\", fields=list(x.2=\"numeric\"))",
file="classes/class_B.R")
write("setRefClass(Class=\"C\", fields=list(x.1=\"character\"))",
file="classes/class_C.R")
class_A.R
is the first file in folder classes
, but in order to source it we first need to source class_B.R
(as this file defines class B
) which in turn requires class C
and thus the prior sourcing of class_C.R
.
The correct collation thus is:
c("class_C.R", "class_B.R", "class_A.R")
Aspect 2
For certain tasks you do want/need a "multiple defs per file" paradigm: quick & easy distribution of necessary objects/functions/classes to worker processes when parallalizing, organizing code when actually building a package etc.
path <- "classes/classes.R"
file.create(path)
write("setRefClass(Class=\"C\", fields=list(x.1=\"character\"))",
file=path, append=TRUE)
write("setRefClass(Class=\"B\", contains=\"C\", fields=list(x.2=\"numeric\"))",
file=path, append=TRUE)
write("setRefClass(Class=\"A\", contains=\"B\", fields=list(x.3=\"logical\"))",
file=path, append=TRUE)
Ad Aspect 1
I don't like the idea of keeping some sort of manual collation reference that specifies the correct sourcing order as I think that's a job the computer can easily do for me (finding out the correct collation). The only thing you need to do here is to find out each class' superclasses (sort of its dependencies), then retrieving the correct collation is a piece of cake.
EDIT
In case someone's interested: I did come up with a working approach for this. Just drop me a line in case you'd like to see some code. It's based on parsing (without evaluating) the respective class def source files in order to investigate the value of the contains
argument which lists the superclasses. The whole process is then recursively repeated for the source files of those superclasses until you eventually end up with the correct collation. It's not that time consuming either.
Here's the outline:
x <- list.files("classes", full.names=TRUE)
code <- base::parse(file=x[1])
> code
expression(setRefClass(Class="A", contains="B", fields=list(x.3="logical")))
superclasses <- eval(code[[1]][["contains"]])
> superclasses
[1] "B"
# Next: investigate source file for class 'B'
Ad Aspect 2
I also don't like manual copy & pasting, so I implemented a routine that allows me to consolidate source code either stored in individual files or drawn from the respective objects to a single "consolidated" file (via deparse(<source_code_expression>)
and write(..., append=TRUE)
). As for classes, the correct collation also matters here, otherwise there'll be errors again when you try to source the consolidated file.
For both aspects it'd be nice to be able to choose how to get to the actual source code/expression for classes/functions/methods:
- either based on investigating the code stored in the respective
source files (
parse(file=*)
) - or based on accessing the desired information directly from the respective object.
The second option would be the link to the actual question above.
Class definitions
Since we can't evaluate the files (because we don't know the collation order),
source
or exploring the defined classes is not on the table. Here we parse the text in each file into a character vector, keeping the parsed lines that start with 'setRefClass'. The parser strips white space and does other transformations to make the text more uniformly formatted, though the following will rely on consistent class definition (e.g., using named arguments).Let's aim for a data.frame containing the class being defined, the contained class, and the file name in which the class definition occurs
and then fill it using pattern matching / subsitution
Ending with (I added a class A1, also derived from B, to the class_A.R file, for fun)
Another strategy for harvesting the Class / contains data frame is to intercept the calls to setRefClass
gClass
andgcontains
will contain the relevant data for constructing the dependency graph after source'ing the relevant files (assuming the files can be sourced without the class definitions being available).Dependencies
What we'd like is the dependency graph, for the classes that have dependencies. So we'll use the graph and RBGL packages from Bioconductor to construct an appropriate graph
then ask for a breadth-first search starting at each of our base package (
df[is.na(df$contains), "Class"]
), and use the resulting order to retrieve the appropriate collation orderSo