Removing object metadata in R

I'm writing some code to anonymize an R dataset so that it strips any useful information out of the data while preserving the structure that would be important for running regressions, etc. on it. I want to be sure I've covered every place where telling information about the data could be hiding. My process so far is:

  1. Replace variable names of data frame with uninformative names (x1, x2, ...)
  2. Turn all categorical variables into factors with simple numerical levels
  3. Scale and center all numerical variables (except logical or 0/1)
  4. Use attributes(x) <- NULL to strip things like variable labels added through haven, etc.
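
For concreteness, here is a minimal base-R sketch of the four steps above. The helper name `anonymize_df`, the toy data, and the rule for detecting 0/1 columns are assumptions made for illustration, not an established recipe. One caveat it works around: calling `attributes(x) <- NULL` on a factor or on the data frame as a whole also removes `levels`, `class`, and `names`, so the sketch strips attributes only from non-factor columns and then rebuilds the data frame.

```r
# Hypothetical helper: applies steps 1-4 column by column using base R only.
anonymize_df <- function(df) {
  cols <- lapply(df, function(col) {
    if (is.factor(col) || is.character(col)) {
      # Step 2: replace category labels with integer codes ("1", "2", ...).
      # Note: the codes still follow the sort order of the original labels.
      return(factor(as.integer(factor(col))))
    }
    # Step 4: drop variable labels, classes, and any other attributes.
    attributes(col) <- NULL
    is_binary <- is.logical(col) || all(col %in% c(0, 1, NA))
    if (is.numeric(col) && !is_binary) {
      # Step 3: center and scale; as.vector() drops the "scaled:center" and
      # "scaled:scale" attributes that scale() attaches to its result.
      col <- as.vector(scale(col))
    }
    col
  })
  # Step 1: uninformative names. Rebuilding the data frame from a plain list
  # also discards custom row names and data-frame-level attributes.
  names(cols) <- paste0("x", seq_along(cols))
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Made-up example data with metadata hidden in three different places.
df <- data.frame(
  income = c(52000, 61000, 47000, 58000),
  region = c("north", "south", "south", "east"),
  smoker = c(0, 1, 0, 1),
  row.names = c("alice", "bob", "carol", "dan")
)
attr(df$income, "label") <- "Annual household income"  # haven-style label
attr(df, "notes") <- "Survey wave 3"                   # frame-level attribute

anon <- anonymize_df(df)

# Audit what is left: every attribute on the frame and on each column.
print(attributes(anon))
print(lapply(anon, attributes))
```

Printing `attributes(anon)` and `lapply(anon, attributes)` at the end is one way to see everything R has attached to the object; in this sketch only `names`, `class`, and `row.names` should remain on the frame, and only `levels` and `class` on the factor column.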

I'm trying to keep my tin foil hat on when speccing out this procedure. Have I covered all my bases, or is there some other way information about the data contents could be hiding in my dataset?

NB: I'm specifically asking whether I have removed all the information explicitly contained in the R objects. For example, a novice R user who didn't know about attributes might think that steps 1-3 on their own were sufficient to strip an object of readable information. I would like to ascertain whether there are other features I might need to strip out. The question of whether there's any telling information in the structure of the data itself is pertinent to my broader task but out of scope for this site, and I imagine there could be reams written on it.
