**Nix for reproducible research**

Justin Bedő, Leon Di Stefano, and Tony Papenfuss

> A challenge for bioinformaticians is to make our computations reproducible: that is, easy to rerun, combine, and share, and guaranteed to generate the same results.
> We show how Nix, a next-generation cross-platform software deployment system, cleanly overcomes problems usually tackled with a combination of package managers (e.g., conda), containers (e.g., Docker, Singularity), and workflow engines (e.g., Toil, Ruffus).
> 
> On its own, Nix can be used as a package manager; it can also easily create isolated development environments and export portable containers to share with others.
> We have created a number of transparent and lightweight extensions that enable Nix to succinctly specify bioinformatics analysis environments and pipelines locally, in HPC environments, or in the cloud.
> 
> Nix uses hash-based naming to ensure that what it builds is uniquely specified, isolation and completeness to ensure that its build processes are deterministic, and a simple programming language to ensure that the whole system is easy to manage.
> Its extensive package collection includes all of CRAN and Bioconductor, as well as the conda package manager, which gives access to Bioconda recipes.
> Nix is well-supported, general-purpose software that has been in development for over 10 years.
> 
> We will demonstrate how Nix with our extensions can be used to succinctly specify a typical bioinformatics pipeline, and contrast this with other dedicated bioinformatics pipeline languages.
> We then show how it can be executed in whole or in part on an HPC queuing system.
> Finally, we show that the pipeline can also be executed using cloud resources.
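
The hash-based naming idea in the abstract can be illustrated with a toy model. The sketch below is not how Nix computes its store paths; the function `store_name`, the step names, and the command strings are all hypothetical. It only shows the principle: an output is named by a hash of everything its build depends on, so a change anywhere upstream yields a different name downstream.

```python
import hashlib
import json


def store_name(name, build_script, inputs):
    """Toy model: name an output by hashing everything its build depends on.

    Assumption: this is an illustration only, not Nix's actual scheme.
    """
    payload = json.dumps(
        {"name": name, "script": build_script, "inputs": sorted(inputs)},
        sort_keys=True,
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:32]
    return f"{digest}-{name}"


# Hypothetical two-step pipeline: index a reference, then align reads to it.
index_v1 = store_name("ref-index", "bwa index ref.fa", [])
align_v1 = store_name("aligned", "bwa mem ref.fa reads.fq", [index_v1])

# Changing the upstream step changes its name, and therefore the name of
# everything built from it, even though the alignment command is unchanged.
index_v2 = store_name("ref-index", "bwa index -a bwtsw ref.fa", [])
align_v2 = store_name("aligned", "bwa mem ref.fa reads.fq", [index_v2])

print(index_v1 != index_v2, align_v1 != align_v2)
```

Because a name is a pure function of the build's inputs, identical inputs always map to the same name. In this toy model, that is what would let a result be reused rather than rebuilt.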

### Stuff to match in competitors

-   **A few standard pipelines**
-   Dealing with big files
-   Slightly complicated analyses
-   Local, HPC, and cloud execution
-   Resumable, parallel
-   Bioconda import

### Points of difference

-   **Full-stack reproducibility with one tool**
-   **A language rather than a configuration format (cf. CWL/JavaScript)**
-   Not bioinformatics-specific
-   Mature (~10y)
-   Containers obsolete (but easy to generate)
-   Higher level of reproducibility overall (hashing of inputs, outputs, derivations)
-   Safety
    -   Declarative language
    -   Type/tag system (to do)

### Weaknesses

-   Small bioinformatics collection
-   No build execution stats
-   Subtleties around filesystems and the Nix store