|
**Nix for reproducible research**
Justin Bedő, Leon Di Stefano, and Tony Papenfuss
> A challenge for bioinformaticians is to make our computations reproducible — that is, easy to rerun, combine, share, and guaranteed to generate the same results.
> We show how Nix, a next-generation cross-platform software deployment system, cleanly overcomes problems usually tackled with a combination of package managers (e.g., conda), containers (e.g., Docker, Singularity), and workflow engines (e.g., Toil, Ruffus).
>
> On its own Nix can be used as a package manager; it can also easily create isolated development environments and export portable containers to share with others.
> We have created a number of transparent and lightweight extensions that enable Nix to succinctly specify bioinformatics analysis environments and pipelines locally, in HPC environments, or in the cloud.
>
> Nix uses hash-based naming to ensure that what it builds is uniquely specified, isolation and completeness to ensure that its build processes are deterministic, and a simple programming language to ensure that the whole system is easy to manage.
> It has an extensive package collection, which includes all of CRAN and Bioconductor as well as the conda package manager, which in turn gives access to Bioconda recipes.
> Nix is well-supported, general-purpose software that has been in development for over 10 years.
>
> We will demonstrate how Nix with our extensions can be used to succinctly specify a typical bioinformatics pipeline and contrast this with dedicated bioinformatics pipeline languages.
> We then show how it can be executed in whole or in part on an HPC queuing system.
> Finally, we show that the pipeline can also be executed using cloud resources.
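
The hash-based naming mentioned in the abstract can be illustrated with a small Python sketch. This is a toy scheme written for this note, not Nix's actual store-path algorithm: the `store_path` function, the `/toy/store` prefix, and the tool names are all hypothetical. The idea it tries to capture is that an output's name is derived from a hash of everything that defines the build, so changing any input yields a different name.

```python
import hashlib
import json


def store_path(name, builder, inputs, store="/toy/store"):
    """Derive an output path from a hash of everything that defines the build.

    Toy scheme for illustration only; Nix's real store-path computation differs.
    """
    recipe = json.dumps(
        {"name": name, "builder": builder, "inputs": sorted(inputs)},
        sort_keys=True,
    )
    digest = hashlib.sha256(recipe.encode("utf-8")).hexdigest()[:32]
    return f"{store}/{digest}-{name}"


# Hypothetical step and tool names, used only to show the effect.
ref = store_path("reference-index", "index.sh", [])
align_a = store_path("alignment", "align.sh", [ref, "aligner-1.0"])
align_b = store_path("alignment", "align.sh", [ref, "aligner-1.1"])

print(align_a)
print(align_b)
print(align_a != align_b)  # a changed dependency yields a different path
```

Because each path embeds the hash of its inputs, and those inputs are themselves hashed paths, a change anywhere upstream propagates to every downstream name in this sketch.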
### Stuff to match in competitors
- **A few standard pipelines**
- Dealing with big files
- Slightly complicated analyses
- local, HPC, and cloud execution
- Resumable, parallel
- Bioconda import
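
As a rough illustration of the "resumable" requirement, the following Python sketch keys each stage's result by a hash of its name and inputs, so a rerun skips work that is already stored. It is a toy model under stated assumptions, not how Nix or any particular workflow engine implements resumption: `run_stage`, `pipeline`, and the in-memory `store` are hypothetical, and the key deliberately omits the stage's code, which a real system would also need to hash.

```python
import hashlib
import json

store = {}      # stands in for an on-disk store keyed by recipe hash
executed = []   # records which stages actually ran


def run_stage(name, func, *inputs):
    """Run func(*inputs) unless a result for the same name and inputs is stored."""
    key = hashlib.sha256(json.dumps([name, inputs]).encode("utf-8")).hexdigest()
    if key not in store:
        executed.append(name)
        store[key] = func(*inputs)
    return store[key]


def pipeline(reads):
    # Two toy stages: strip flanking N bases, then count what remains.
    trimmed = run_stage("trim", lambda r: r.strip("N"), reads)
    return run_stage("count", lambda t: len(t), trimmed)


pipeline("NNACGTNN")  # first run executes both stages
pipeline("NNACGTNN")  # rerun finds both results in the store
pipeline("NACGTN")    # new input: trim reruns, count is reused

print(executed)
```

In the third call the trimmed sequence is identical to the earlier one, so only the `trim` stage runs again; the `count` result is looked up by its unchanged key.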
### Points of difference
- **Full-stack reproducibility with one tool**
- **A language rather than a configuration format (cf. CWL/JavaScript)**
- Not bioinformatics-specific
- Mature (over 10 years of development)
- Containers obsolete (but easy to generate)
- Higher level of reproducibility overall (hashing of inputs, outputs, derivations)
- Safety
- Declarative language
- Type/tag system (to do)
### Weaknesses
- Small bioinformatics collection
- No build execution stats
- Subtleties around filesystems and the Nix store
|