I'm currently evaluating different solutions to improve reproducibility in my R/Python scientific workflow: data, reproducible analyses (plots, statistics), and the paper.
As you know, two Linux ecosystems offer solutions for this: Nix and Guix.
In Nix, the commonly described way to develop with R is to use rWrapper and rPackages, for example:
pkgs.rWrapper.override{ packages = with pkgs.rPackages; [tidyverse rmarkdown]; };
My problem is (not so...) simple: like Python, R is well known to be a nightmare in terms of reproducibility, even in the medium term. For fun, you could try to run ggplot2 code from 2 years ago with a recent version of R...
To provide a flake that produces the same result from the same data for a scientific paper, I would like to pin, in the derivation, the version of R and the versions of the R packages used to compute the analyses and plots.
{
  description = "Generate R result from simulation";

  inputs = {
    nixpkgs.url = "nixpkgs/nixos-20.09";
    utils.url = "github:numtide/flake-utils";
  };

  # mach-nix was listed here without a matching input, so it is dropped
  outputs = { self, nixpkgs, utils }:
    utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        # R together with the packages needed to render the report
        REnv = pkgs.rWrapper.override {
          packages = with pkgs.rPackages; [ tidyverse rmarkdown ];
        };
        buildRScripts = pkgs.stdenv.mkDerivation {
          name = "myscript";
          src = self;
          nativeBuildInputs = [ REnv ];
          dontBuild = true;
          buildInputs = [ pkgs.pandoc pkgs.unzip ];
          installPhase = ''
            mkdir -p $out
            # render test.Rmd from the source tree and write the output to $out
            ${REnv}/bin/Rscript -e 'rmarkdown::render("test.Rmd", output_dir = Sys.getenv("out"))'
          '';
        };
      in {
        packages.buildRScripts = buildRScripts;
        defaultPackage = self.packages.${system}.buildRScripts;
      });
}
For example, how could I state more precisely that I want to compile my test.Rmd with only tidyverse 1.3.1 and R 4.1.0? Even in 5 years?
I found that Guix lists the different available versions of R and tidyverse, and that the versions of the dependencies needed by tidyverse 1.3.1 are clearly presented.
With rPackages in Nix, I searched for a way to achieve something similar, i.e. a way to refer explicitly to a version of R or of R packages in a derivation, but I didn't find one.
With rPackages, the Nix developers already offer a great foundation, but perhaps we need more...
How could we collectively achieve better reproducibility for R packages with Nix? I'm interested in any ideas.
Perhaps we could fetch package sources directly from the CRAN archive and compile them? For example, with tidyverse:
- https://cran.r-project.org/web/packages/tidyverse/index.html,
- https://cran.r-project.org/src/contrib/Archive/tidyverse/ ?
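To make the idea concrete, here is a minimal sketch of what fetching one archived release could look like. It is not an established recipe: it assumes that tidyverse 1.3.1 is available under the Archive directory linked above, that overriding the `src` of the existing `rPackages.tidyverse` derivation is enough, and that the dependency versions in the surrounding nixpkgs are compatible with that release. The name `tidyverse_1_3_1` is only illustrative, and the hash is a deliberate placeholder.

```nix
# Sketch only. In a real project `pkgs` should come from a pinned nixpkgs
# (for example the flake input), not from the ambient <nixpkgs> channel.
{ pkgs ? import <nixpkgs> { } }:

let
  # Reuse the existing tidyverse derivation (and its dependency list),
  # but build it from one specific archived CRAN tarball.
  tidyverse_1_3_1 = pkgs.rPackages.tidyverse.overrideAttrs (old: {
    name = "r-tidyverse-1.3.1";
    src = pkgs.fetchurl {
      # Assumption: the release lives in the CRAN Archive directory.
      url = "https://cran.r-project.org/src/contrib/Archive/tidyverse/tidyverse_1.3.1.tar.gz";
      # Placeholder hash: the first build fails and reports the real one.
      sha256 = pkgs.lib.fakeSha256;
    };
  });
in
# An R wrapper that sees the overridden package plus rmarkdown
pkgs.rWrapper.override {
  packages = [ tidyverse_1_3_1 pkgs.rPackages.rmarkdown ];
}
```

Note that only tidyverse itself is pinned here. Its dependencies (ggplot2, dplyr, ...) still come from whatever nixpkgs provides, so this only addresses part of the question.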
PS: I know that Nix and Guix each partner with https://archive.softwareheritage.org/, a great way to archive and retrieve CRAN packages:
- https://guix.gnu.org/fr/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
- https://www.tweag.io/blog/2020-06-18-software-heritage/
PS: the answer could also be added to https://nixos.wiki/wiki/R
Update 1
After a discussion with some great people on the Nix Discord, I understand that Nix doesn't need explicit versions, because flake.nix + flake.lock store a hash (see nix flake metadata) that ties my build and downloads to one very specific nixpkgs commit.
But that doesn't solve:
- the availability of the tar.gz sources needed by the packages that rPackages declares at that very specific commit. I suppose Software Heritage will help on this point?
- the common problem of incompatibility between an R version and the package versions built for it. For example, you write code with R 3.0.0 and tidyverse 1.2.3, then you update R because some other package needs an update whose dependencies are only available with R 3.2.0, but tidyverse 1.2.3 doesn't exist for R 3.2.0... Pinning versions and keeping access to old tar.gz files solves part of this problem, I suppose.
How can we define something like this using Nix?
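One direction that might cover part of this is sketched below, under stated assumptions: take R and its packages from one nixpkgs input, and everything else from another. The input name `nixpkgs-r` and the choice of release branches are only illustrative. flake.lock records the exact commit of each input, so R and the R packages always come from the same snapshot and stay consistent with each other, while the other tools can be updated independently.

```nix
{
  description = "Sketch: R from one pinned nixpkgs, other tools from another";

  inputs = {
    # Tools that may be updated freely (branch chosen for illustration)
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-21.05";
    # Hypothetical second input, used only for R and its packages
    nixpkgs-r.url = "github:NixOS/nixpkgs/nixos-20.09";
    utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, nixpkgs-r, utils }:
    utils.lib.eachDefaultSystem (system:
      let
        pkgs = nixpkgs.legacyPackages.${system};
        rpkgs = nixpkgs-r.legacyPackages.${system};
        # R and the R packages both come from the same nixpkgs snapshot
        REnv = rpkgs.rWrapper.override {
          packages = with rpkgs.rPackages; [ tidyverse rmarkdown ];
        };
      in {
        devShell = pkgs.mkShell {
          buildInputs = [ REnv pkgs.pandoc ];
        };
      });
}
```

This does not let me pick an arbitrary R version together with an arbitrary package version. It only selects a snapshot in which a given combination already exists.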
Update 2
It seems someone built an unofficial index to help people search for old versions of a package. Example with tidyverse: https://lazamar.co.uk/nix-versions/?channel=nixpkgs-unstable&package=r-tidyverse
Thanks @dram for the link and the discussion on this.