Better reproducibility of rPackages (pinning package versions) in Nix compared to Guix


I'm currently evaluating different solutions to improve reproducibility in my R/Python scientific workflow: data with reproducible analysis (plots, statistics) and the paper.

As you know, two big Linux flavours offer some solutions: Nix and Guix.

In Nix, the commonly described way to develop with R is to use rWrapper and rPackages, for example:

pkgs.rWrapper.override{ packages = with pkgs.rPackages; [tidyverse rmarkdown]; };

My problem is (not so...) simple: like Python, R is well known to be a nightmare in terms of reproducibility, even in the medium term. For fun, you could try to run two-year-old ggplot2 code with a recent version of R...

To provide a flake that produces the same result from the same data for a scientific paper, I want to fix, in the derivation, the version of R and the versions of the R packages used to compute the analysis or plots.

{
  description = "Generate R result from simulation";

  inputs = {
    nixpkgs.url = "nixpkgs/nixos-20.09";
    utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, utils }: utils.lib.eachDefaultSystem (system:
    let
      pkgs = nixpkgs.legacyPackages.${system};

      # R together with the packages needed to render the report
      REnv = pkgs.rWrapper.override {
        packages = with pkgs.rPackages; [ tidyverse rmarkdown ];
      };

      buildRScripts = pkgs.stdenv.mkDerivation {
        name = "myscript";
        src = self;
        nativeBuildInputs = [ REnv ];
        dontBuild = true;
        buildInputs = [ pkgs.pandoc pkgs.unzip ];

        # Render test.Rmd from the unpacked source tree and write the result to $out
        installPhase = ''
          mkdir -p $out
          ${REnv}/bin/Rscript -e 'rmarkdown::render("test.Rmd", output_dir = Sys.getenv("out"))'
        '';
      };
    in {
      packages.buildRScripts = buildRScripts;
      defaultPackage = self.packages.${system}.buildRScripts;
    });
}

For example, how could I state more precisely that I want to compile my test.Rmd only with tidyverse 1.3.1 and R 4.1.0? Even in 5 years?

I found that Guix shows the different available packages/versions of R and tidyverse:

The versions needed by tidyverse 1.3.1 are clearly presented:

With rPackages in Nix I am looking for a way to achieve something similar, i.e. a way to refer explicitly to a version of R or of R packages in a derivation, but I didn't find one.

With rPackages, Nix developers already offer a great foundation, but perhaps we need more...

How could we collectively achieve better reproducibility for R packages with Nix? I'm interested in any ideas.

Perhaps we could fetch the package sources directly from the CRAN archive and compile them? For example with tidyverse:
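A minimal sketch of that idea, not a tested recipe: it takes the existing rPackages.tidyverse derivation and swaps only its source for an archived tarball. The URL follows the usual CRAN archive layout and is an assumption, the name tidyverse_1_3_1 is just illustrative, and the hash is a deliberate placeholder (lib.fakeSha256), so the first build fails and prints the real hash to paste in. The dependencies of tidyverse still come from whatever nixpkgs is in use, so this pins one package, not the whole tree.

```nix
# default.nix (sketch): rebuild tidyverse from a CRAN archive tarball
{ pkgs ? import <nixpkgs> { } }:

let
  # Illustrative name; reuses the build recipe of rPackages.tidyverse
  tidyverse_1_3_1 = pkgs.rPackages.tidyverse.overrideAttrs (old: {
    name = "r-tidyverse-1.3.1";
    src = pkgs.fetchurl {
      # Assumed URL, based on the usual CRAN archive layout
      url = "https://cran.r-project.org/src/contrib/Archive/tidyverse/tidyverse_1.3.1.tar.gz";
      # Placeholder hash: replace with the value reported by the failed build
      sha256 = pkgs.lib.fakeSha256;
    };
  });
in
# R wrapper exposing the overridden tidyverse next to rmarkdown
pkgs.rWrapper.override {
  packages = [ tidyverse_1_3_1 pkgs.rPackages.rmarkdown ];
}
```

If the archived version needs dependencies that differ from those in the current nixpkgs, each of them would need the same treatment, which is probably why this does not scale well by hand.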

PS: I know that Nix and Guix are both partners of https://archive.softwareheritage.org/, a great way to archive and retrieve CRAN packages.

PS: the answer could also be added to https://nixos.wiki/wiki/R

Update 1

After discussing with some great people on the Nix Discord, I understand that Nix doesn't need versions, because flake.nix + flake.lock store a hash (see nix flake metadata) that ties my build and downloads to one very specific commit of nixpkgs.

But that doesn't solve:

  • the problem of the tar.gz sources needed by the packages declared in rPackages at this very specific commit. I suppose Software Heritage will help on this point?
  • the common problem of incompatibility between an R version and the versions of R packages. For example, you write code with R 3.0.0 and tidyverse 1.2.3, then you update R because some other package needs an update and only works with dependencies available for R 3.2.0, but, ahem, tidyverse 1.2.3 doesn't exist for R 3.2.0... Pinning versions and having access to old tar.gz files resolves part of this problem, I suppose.

How do we define something like this using Nix?
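To make that concrete, here is a hedged sketch of a flake whose nixpkgs input names one commit explicitly, instead of relying only on flake.lock. The string <nixpkgs-commit-sha> is a placeholder to replace with a real nixpkgs commit, not an actual revision; everything else reuses names from the flake above.

```nix
{
  description = "Generate R result from simulation";

  inputs = {
    # Placeholder: replace <nixpkgs-commit-sha> with a real nixpkgs commit
    nixpkgs.url = "github:NixOS/nixpkgs/<nixpkgs-commit-sha>";
    utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, utils }: utils.lib.eachDefaultSystem (system:
    let
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      # R and every rPackages version are whatever this commit ships
      defaultPackage = pkgs.rWrapper.override {
        packages = with pkgs.rPackages; [ tidyverse rmarkdown ];
      };
    });
}
```

With this, the versions of R and tidyverse are chosen indirectly, by choosing the commit, rather than by writing version numbers in the derivation.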

Update 2

It seems someone built an unofficial index to help people search for old versions of a package. Example with tidyverse: https://lazamar.co.uk/nix-versions/?channel=nixpkgs-unstable&package=r-tidyverse

Thanks @dram for the link and the discussion on this.
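Outside of flakes, a revision found through such an index could be used roughly as follows. This is only a sketch: <nixpkgs-commit-sha> is a placeholder for the revision the index reports, and the file name shell.nix is just the usual convention.

```nix
# shell.nix (sketch): R environment taken from one fixed nixpkgs revision
let
  pkgs = import (builtins.fetchTarball {
    # Placeholder: paste the nixpkgs revision reported by the index
    url = "https://github.com/NixOS/nixpkgs/archive/<nixpkgs-commit-sha>.tar.gz";
  }) { };
in
pkgs.mkShell {
  buildInputs = [
    # tidyverse and R both come from the revision above
    (pkgs.rWrapper.override { packages = [ pkgs.rPackages.tidyverse ]; })
  ];
}
```

Running nix-shell in the same directory should then give an R whose tidyverse is the version recorded at that revision, as long as the sources it references can still be downloaded.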
