From d12f635279ec1b43a57293c9c95919a489606d8c Mon Sep 17 00:00:00 2001
From: Justin Bedo
Date: Tue, 11 Jun 2019 16:44:36 +1000
Subject: rewrite README documentation

Significant expansion of README to include installation instructions and
instructions on how to use with HPC.
---
 README.md | 122 ++++++++++++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 100 insertions(+), 22 deletions(-)

diff --git a/README.md b/README.md
index 8858a25..be73a22 100644
--- a/README.md
+++ b/README.md
@@ -1,40 +1,118 @@

BioNix

-BioNix is a tool for reproducible bioinformatics that unifies workflow engines, package managers, and containers.
-It is implemented as a lightweight library on top of the [Nix](https://nixos.org/nix/) deployment system.
+BioNix is a tool for reproducible bioinformatics that unifies workflow
+engines, package managers, and containers. It is implemented as a
+lightweight library on top of the [Nix](https://nixos.org/nix/)
+deployment system. BioNix is currently a work in progress, so documentation is sparse.
+Please get in contact with us for more information, help, and
+contributing (see bottom of this page).
-## Getting started
-
-Install [Nix](http://nixos.org/nix):
+## Installation
+BioNix requires no dependencies beyond [Nix](http://nixos.org/nix),
+which may be installed by:
 ```{sh}
 curl https://nixos.org/nix/install | sh
 ```
-To run a sample pipeline, clone this project and run `nix-build` in the `/examples` directory:
+If you do not have root access a variety of [rootless
+install](https://nixos.wiki/wiki/Nix_Installation_Guide#Installing_without_root_permissions)
+options are available.
-```{sh}
-$ git clone https://github.com/PapenfussLab/bionix
-$ cd examples
-$ nix-build
-```
+API docs can be generated by executing `nix build` in the `doc`
+directory and viewing `result/OEBPS/index.html`.
+
+## Examples
+
+Several examples are available in `./examples/`. The main example is
+presented in `./examples/default.nix` and can be built using `nix build`
+in `./examples/`. This sample pipeline performs variant calling using
+[`platypus`](https://github.com/andyrimmer/Platypus), alignment using
+[`bwa mem`](https://github.com/lh3/bwa), and preprocessing using
+[`samtools`](http://www.htslib.org/).
+
+See the documentation in `./examples/README.md` for more detail about
+this pipeline and the other examples.
+
+- The pipeline itself is specified in `examples/call.nix` and
+  `examples/default.nix`.
+- The BioNix wrapper to run `platypus` is in
+  `tools/platypus-callVariants.nix`.
+- The Nix expression for the `platypus` software itself can be found in
+  [nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/biology/platypus/default.nix).
+
+## Constructing workflows
+
+Writing workflows requires some familiarity with the Nix
+programming language and deployment system. Good introductions can be
+found [here](https://learnxinyminutes.com/docs/nix/) and
+[here](https://ebzzry.io/en/nix/).
+
+To understand how to construct workflows it is recommended to study the
+examples provided. Thanks to the flexibility of Nix, the workflows can
+be constructed in different ways to suit the intended purposes and the
+examples illustrate some of the ways one might approach various
+problems.
+
+For constructing tool wrappers, take a look in the `./tools/`
+directory for the currently existing tool wrappers. A good starting
+point are the wrappers for BWA.
+
+## HPC execution
+
+BioNix supports submission of jobs to computing queues rather than
+directly building them using the Nix build engine. The two supported
+engines are Slurm and PBS represented by the `slurm` and `qsub` entries
+in the root BioNix tree, which take an attribute set of default
+parameters to a new tree of tools. Simply use tools out of these trees
+to submit jobs, and specify resource requirements as ordinary
+configuration options to the tools.
+
+The following resource parameters can be specified:
+
+- *ppn*: The number of cores to request;
+- *mem*: The amount of memory to request (GB);
+- *walltime*: A string defining the maximum walltime.
+
+As we rely on side effects to submit jobs sandbox builds cannot be used
+and must be disabled (`--option sandbox false` with `nix-build` or
+`--no-sandbox` with `nix build`).
+
+### Slurm specifics
+
+Slurm jobs are submitted by executing the `salloc` binary on the
+cluster. By default this is assumed to be `/usr/bin/salloc`; if this is
+not the case on your cluster then you need to additionally specify the
+path to salloc via the `salloc` parameter.
-The sample pipeline performs variant calling using [`platypus`](https://github.com/andyrimmer/Platypus), alignment using [`bwa mem`](https://github.com/lh3/bwa), and preprocessing using [`samtools`](http://www.htslib.org/).
-BioNix will download or build all of the necessary software and create a soft link (`result`) to the workflow output.
+When launching the build, it is important that the `TMPDIR`
+environment variable points to a location which is on shared storage
+(i.e., available from all nodes). This will be the location used for
+temporary files during the execution of stages.
-Next, check out the code:
+### PBS specifics
-- The pipeline itself is specified in `examples/call.nix` and `examples/default.nix`.
-- The BioNix wrapper to run `platypus` is in `tools/platypus-callVariants.nix`.
-- The Nix expression for the `platypus` software itself can be found in [nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/biology/platypus/default.nix).
+The PBS wrapper is considerably more complicated as initiating
+interactive processes is not as reliable as Slurm's `salloc`.
+Consequently, jobs are submitted via non-interactive queue submissions
+and the queue polled to determine when the submitted job has completed.
-BioNix pipelines can be easily wrapped in shell scripts: see `examples/ex-tnpair/tnpair` for an example script that accepts a reference fasta, along with paired normal and tumor fastq files, and performs alignment, preprocessing, and variant calling with [`strelka`](https://github.com/Illumina/strelka).
+The path to the PBS executables (i.e., `qsub` and `qstat`) has to be
+given in the `qsubPath` attribute. Furthermore, a temporary directory
+that's shared across all nodes must be specified in `tmpDir`.
-Writing your own pipelines requires some familiarity with the Nix programming language and deployment system. Good introductions can be found [here](https://learnxinyminutes.com/docs/nix/) and [here](https://ebzzry.io/en/nix/).
+## Distributed execution
-We have successfully run BioNix pipelines in a zero-install manner (using a [statically linked binary](https://matthewbauer.us/blog/static-nix.html) and [user namespaces](https://www.redhat.com/en/blog/whats-next-containers-user-namespaces)), but this feature is currently unstable. Stay tuned!
+Nix has support for distributing jobs amongst a collection of
+distributed machines. See the
+[manual](https://nixos.org/nix/manual/#chap-distributed-builds) and
+[wiki](https://nixos.wiki/wiki/Distributed_build) for more information.
-## Contact
+## Getting help and contributing
-Please come chat with us at [#bionix:cua0.org](http://matrix.to/#/#bionix:cua0.org).
+For general questions, issues, and
+[contributing](https://git-send-email.io), please
+[email](mailto:bionix@cua0.org) or [subscribe
+to](mailto:bionix+subscribe@cua0.org) our mailing list. You may also
+chat with us at [#bionix:cua0.org](http://matrix.to/#/#bionix:cua0.org).
-- 
cgit v1.2.3
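
The launch requirements described in the "HPC execution" and "Slurm specifics" sections of this patch can be condensed into a small shell helper. This is a minimal sketch and not part of the commit: the function name `bionix_build`, the `DRY_RUN` switch, and the default `SHARED_TMP` path are hypothetical. Only two things come from the README text: `TMPDIR` must point at shared storage, and the sandbox must be disabled with `nix-build --option sandbox false`.

```sh
# Hypothetical default; replace with a directory visible from every node.
SHARED_TMP="${SHARED_TMP:-$HOME/bionix-tmp}"

# Launch a build with TMPDIR on shared storage and the sandbox disabled.
# Arguments are passed through to nix-build unchanged.
# DRY_RUN=1 (an assumption of this sketch) prints the command instead of running it.
bionix_build() {
    # Create the shared temporary directory if it does not exist yet.
    mkdir -p "$SHARED_TMP" || return 1
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "TMPDIR=$SHARED_TMP nix-build --option sandbox false $*"
    else
        # Job submission is a side effect, so sandboxed builds cannot be used.
        TMPDIR="$SHARED_TMP" nix-build --option sandbox false "$@"
    fi
}
```

Running `DRY_RUN=1 bionix_build examples` prints the command rather than executing it, which may help when checking the shared path before anything is submitted to the queue.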