author | l-d-s <distefano.l@wehi.edu.au> | 2018-09-06 19:40:04 +1000 |
---|---|---|
committer | GitHub <noreply@github.com> | 2018-09-06 19:40:04 +1000 |
commit | 0d7eba04fe4d2d932443b9dd3d37984002fa752c | |
tree | 29a7d0320e925f6e5864093a9ddbe619462da03d | |
parent | 0427cd1c270d3e64b368ed1da2ddef95816fab5a |
Take 2, sent to TP
-rw-r--r-- | abcbs_2018.md | 19 |
1 files changed, 8 insertions, 11 deletions
diff --git a/abcbs_2018.md b/abcbs_2018.md
index 63bb0a7..4b913cd 100644
--- a/abcbs_2018.md
+++ b/abcbs_2018.md
@@ -1,15 +1,12 @@
-**Reproducible bioinformatics with Nix**
+**Nix for reproducible research**
-Justin Bedő, Leon Di Stefano, and Tony Papenfuss
+_Justin Bedő, Leon Di Stefano, and Tony Papenfuss_
-> A cornerstone of science is reproducibility, the ability to independently verify experimental research. For bioinformatics to support scientific reproducibility, the computational portion of a research project has to be well specified and recomputable. However, it is difficult to guarantee reproducibility for a bioinformatics pipeline, in part due to the large number of software invoked, their complicated interactions, and the size of our data. Recent approaches such as containerisation does not solve this problem as it simply shifts the difficulty to managing containers instead of managing software artifacts. Furthermore, the execution of a pipeline is usually disjoint from the container construction, adding further management difficulties.
-> We show how Nix, a next generation cross-platform software deployment system, can cleanly solve a number of reproducibility headaches in bioinformatics and computational biology.
-> Nix uses hash-based naming to ensure that its builds are uniquely specified, isolation and completeness to ensure that they are deterministic, and a simple programming language to ensure that they are easily manageable.
-> Nix is well supported and mature software with a large community that has been in development for over 10 years.
->
-> With our transparent and lightweight extensions Nix succinctly describe computational pipelines, manage their execution in HPC environments or in parallel across a collection of machines, and produce containers (e.g., Docker, Singularity) to share with others.
-> Nix has an extensive package collection that includes the whole of CRAN and Bioconductor, which can be leveraged in our pipelines.
-> While Nix lacks Bioconda's coverage of standalone bioinformatics tools, we show that Bioconda can be used within Nix expressions, with some attendant loss of reproducibility.
+> A challenge for bioinformaticians is to make our computations reproducible — that is, easy to rerun, combine, and share. We show how Nix, a next generation cross-platform software deployment system, cleanly overcomes problems usually tackled with a combination of package managers (conda), containers (Docker, Singularity), and workflow engines (Toil, Ruffus).
 >
-> In our talk we will use Nix to specify a typical bioinformatics pipeline, and show how it can be executed in whole or in part on an HPC queuing system.
+> On its own Nix can be used as a package manager; it can also easily create isolated development environments and export portable containers to share with others. But with a small number of transparent and lightweight extensions, we are also able to use Nix to succinctly specify bioinformatics pipelines and to manage their execution — whether locally, in HPC environments, or in the cloud.
+>
+> Nix uses hash-based naming to ensure that what it builds is uniquely-specified, isolation and completeness to ensure that its build processes are deterministic, and a very simple programming language to ensure that the whole system is easy to manage. It has an extensive package collection which includes all of CRAN and Bioconductor, and while it lacks Bioconda’s coverage of standalone bioinformatics tools, we show that Bioconda packages can be called from within Nix expressions, with some attendant loss of reproducibility. Nix is well supported and general-purpose software that has been in development for over 10 years.
+>
+> In our talk we will use Nix to specify a typical bioinformatics pipeline, and show how it can be executed in whole or in part on an HPC queuing system, or in the cloud.