From d12f635279ec1b43a57293c9c95919a489606d8c Mon Sep 17 00:00:00 2001
From: Justin Bedo
Date: Tue, 11 Jun 2019 16:44:36 +1000
Subject: rewrite README documentation

Significant expansion of README to include installation instructions and instructions on how to use with HPC.
---
 README.md          | 122 +++++++++++++++++++++++++++++++++++++++++++----------
 doc/default.nix    |  47 +++++++++++++++++++++
 doc/tools-doc.nix  |  47 ---------------------
 examples/README.md |  25 +++++------
 4 files changed, 160 insertions(+), 81 deletions(-)
 create mode 100644 doc/default.nix
 delete mode 100644 doc/tools-doc.nix

diff --git a/README.md b/README.md
index 8858a25..be73a22 100644
--- a/README.md
+++ b/README.md
@@ -1,40 +1,118 @@

BioNix

-BioNix is a tool for reproducible bioinformatics that unifies workflow engines, package managers, and containers.
-It is implemented as a lightweight library on top of the [Nix](https://nixos.org/nix/) deployment system.
+BioNix is a tool for reproducible bioinformatics that unifies workflow
+engines, package managers, and containers. It is implemented as a
+lightweight library on top of the [Nix](https://nixos.org/nix/)
+deployment system.
 BioNix is currently a work in progress, so documentation is sparse.
+Please get in contact with us for more information, help, and
+contributing (see bottom of this page).
 
-## Getting started
-
-Install [Nix](http://nixos.org/nix):
+## Installation
+BioNix requires no dependencies beyond [Nix](http://nixos.org/nix),
+which may be installed by:
 
 ```{sh}
 curl https://nixos.org/nix/install | sh
 ```
 
-To run a sample pipeline, clone this project and run `nix-build` in the `/examples` directory:
+If you do not have root access a variety of [rootless
+install](https://nixos.wiki/wiki/Nix_Installation_Guide#Installing_without_root_permissions)
+options are available.
 
-```{sh}
-$ git clone https://github.com/PapenfussLab/bionix
-$ cd examples
-$ nix-build
-```
+API docs can be generated by executing `nix build` in the `doc`
+directory and viewing `result/OEBPS/index.html`.
+
+## Examples
+
+Several examples are available in `./examples/`. The main example is
+presented in `./examples/default.nix` and can be built using `nix build`
+in `./examples/`. This sample pipeline performs variant calling using
+[`platypus`](https://github.com/andyrimmer/Platypus), alignment using
+[`bwa mem`](https://github.com/lh3/bwa), and preprocessing using
+[`samtools`](http://www.htslib.org/).
+
+See the documentation in `./examples/README.md` for more detail about
+this pipeline and the other examples.
+
+- The pipeline itself is specified in `examples/call.nix` and
+  `examples/default.nix`.
+- The BioNix wrapper to run `platypus` is in
+  `tools/platypus-callVariants.nix`.
+- The Nix expression for the `platypus` software itself can be found in
+  [nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/biology/platypus/default.nix).
+
+## Constructing workflows
+
+Writing workflows requires some familiarity with the Nix
+programming language and deployment system. Good introductions can be
+found [here](https://learnxinyminutes.com/docs/nix/) and
+[here](https://ebzzry.io/en/nix/).
+
+To understand how to construct workflows it is recommended to study the
+examples provided. Thanks to the flexibility of Nix, the workflows can
+be constructed in different ways to suit the intended purposes and the
+examples illustrate some of the ways one might approach various
+problems.
+
+For constructing tool wrappers, take a look in the `./tools/`
+directory for the currently existing tool wrappers. A good starting
+point are the wrappers for BWA.
+
+## HPC execution
+
+BioNix supports submission of jobs to computing queues rather than
+directly building them using the Nix build engine. The two supported
+engines are Slurm and PBS represented by the `slurm` and `qsub` entries
+in the root BioNix tree, which take an attribute set of default
+parameters to a new tree of tools. Simply use tools out of these trees
+to submit jobs, and specify resource requirements as ordinary
+configuration options to the tools.
+
+The following resource parameters can be specified:
+
+- *ppn*: The number of cores to request;
+- *mem*: The amount of memory to request (GB);
+- *walltime*: A string defining the maximum walltime.
+
+As we rely on side effects to submit jobs sandbox builds cannot be used
+and must be disabled (`--option sandbox false` with `nix-build` or
+`--no-sandbox` with `nix build`).
+
+### Slurm specifics
+
+Slurm jobs are submitted by executing the `salloc` binary on the
+cluster. By default this is assumed to be `/usr/bin/salloc`; if this is
+not the case on your cluster then you need to additionally specify the
+path to salloc via the `salloc` parameter.
 
-The sample pipeline performs variant calling using [`platypus`](https://github.com/andyrimmer/Platypus), alignment using [`bwa mem`](https://github.com/lh3/bwa), and preprocessing using [`samtools`](http://www.htslib.org/).
-BioNix will download or build all of the necessary software and create a soft link (`result`) to the workflow output.
+When launching the build, it is important that the `TMPDIR`
+environment variable points to a location which is on shared storage
+(i.e., available from all nodes). This will be the location used for
+temporary files during the execution of stages.
 
-Next, check out the code:
+### PBS specifics
 
-- The pipeline itself is specified in `examples/call.nix` and `examples/default.nix`.
-- The BioNix wrapper to run `platypus` is in `tools/platypus-callVariants.nix`.
-- The Nix expression for the `platypus` software itself can be found in [nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/applications/science/biology/platypus/default.nix).
+The PBS wrapper is considerably more complicated as initiating
+interactive processes is not as reliable as Slurm's `salloc`.
+Consequently, jobs are submitted via non-interactive queue submissions
+and the queue polled to determine when the submitted job has completed.
 
-BioNix pipelines can be easily wrapped in shell scripts: see `examples/ex-tnpair/tnpair` for an example script that accepts a reference fasta, along with paired normal and tumor fastq files, and performs alignment, preprocessing, and variant calling with [`strelka`](https://github.com/Illumina/strelka).
+The path to the PBS executables (i.e., `qsub` and `qstat`) has to be
+given in the `qsubPath` attribute. Furthermore, a temporary directory
+that's shared across all nodes must be specified in `tmpDir`.
 
-Writing your own pipelines requires some familiarity with the Nix programming language and deployment system. Good introductions can be found [here](https://learnxinyminutes.com/docs/nix/) and [here](https://ebzzry.io/en/nix/).
+## Distributed execution
 
-We have successfully run BioNix pipelines in a zero-install manner (using a [statically linked binary](https://matthewbauer.us/blog/static-nix.html) and [user namespaces](https://www.redhat.com/en/blog/whats-next-containers-user-namespaces)), but this feature is currently unstable. Stay tuned!
+Nix has support for distributing jobs amongst a collection of
+distributed machines. See the
+[manual](https://nixos.org/nix/manual/#chap-distributed-builds) and
+[wiki](https://nixos.wiki/wiki/Distributed_build) for more information.
 
-## Contact
+## Getting help and contributing
 
-Please come chat with us at [#bionix:cua0.org](http://matrix.to/#/#bionix:cua0.org).
+For general questions, issues, and
+[contributing](https://git-send-email.io), please
+[email](mailto:bionix@cua0.org) or [subscribe
+to](mailto:bionix+subscribe@cua0.org) our mailing list. You may also
+chat with us at [#bionix:cua0.org](http://matrix.to/#/#bionix:cua0.org).
diff --git a/doc/default.nix b/doc/default.nix
new file mode 100644
index 0000000..d68a6b1
--- /dev/null
+++ b/doc/default.nix
@@ -0,0 +1,47 @@
+{ bionix ? import ./.. {} }:
+
+with bionix;
+
+stage {
+  name = "tools-docs";
+  src = ../tools;
+
+  xsltFlags = lib.concatStringsSep " " [
+    #"--param section.autolabel 1"
+    #"--param section.label.includes.component.label 1"
+    #"--stringparam html.stylesheet 'style.css overrides.css highlightjs/mono-blue.css'"
+    #"--stringparam html.script './highlightjs/highlight.pack.js ./highlightjs/loader.js'"
+    #"--param xref.with.number.and.title 1"
+    #"--param toc.section.depth 3"
+    #"--stringparam admon.style ''"
+    #"--stringparam callout.graphics.extension .svg"
+  ];
+
+
+  buildInputs = with pkgs; [ nixdoc libxslt libxml2 ];
+  installPhase = ''
+    function docgen {
+      nixdoc -c "$1" -d "$2" -f "$1.nix" | sed 's/lib\./bionix./g' |grep -v locations.xml > "$1.xml"
+    }
+
+    docgen ascat 'ascatNGS CNV caller'
+    docgen bowtie 'Bowtie aligner'
+    docgen bwa 'BWA aligner'
+    docgen cnvkit 'CNVkit CNV caller'
+    docgen facets 'Facets CNV caller'
+    docgen fastqc 'FastQC quality control'
+    docgen gridss 'GRIDSS SV caller'
+    docgen strelka 'Strelka2 variant caller'
+
+    mkdir $out
+    cp ${./tools.xml} tools.xml
+    xmllint --nonet --xinclude --noxincludenode tools.xml --output tools-full.xml
+    cat tools-full.xml
+    xsltproc $xsltFlags \
+      --nonet \
+      --xinclude \
+      --output $out/index.html \
+      ${pkgs.docbook_xsl_ns}/xml/xsl/docbook/epub/docbook.xsl \
+      tools-full.xml
+  '';
+}
diff --git a/doc/tools-doc.nix b/doc/tools-doc.nix
deleted file mode 100644
index d68a6b1..0000000
--- a/doc/tools-doc.nix
+++ /dev/null
@@ -1,47 +0,0 @@
-{ bionix ? import ./.. {} }:
-
-with bionix;
-
-stage {
-  name = "tools-docs";
-  src = ../tools;
-
-  xsltFlags = lib.concatStringsSep " " [
-    #"--param section.autolabel 1"
-    #"--param section.label.includes.component.label 1"
-    #"--stringparam html.stylesheet 'style.css overrides.css highlightjs/mono-blue.css'"
-    #"--stringparam html.script './highlightjs/highlight.pack.js ./highlightjs/loader.js'"
-    #"--param xref.with.number.and.title 1"
-    #"--param toc.section.depth 3"
-    #"--stringparam admon.style ''"
-    #"--stringparam callout.graphics.extension .svg"
-  ];
-
-
-  buildInputs = with pkgs; [ nixdoc libxslt libxml2 ];
-  installPhase = ''
-    function docgen {
-      nixdoc -c "$1" -d "$2" -f "$1.nix" | sed 's/lib\./bionix./g' |grep -v locations.xml > "$1.xml"
-    }
-
-    docgen ascat 'ascatNGS CNV caller'
-    docgen bowtie 'Bowtie aligner'
-    docgen bwa 'BWA aligner'
-    docgen cnvkit 'CNVkit CNV caller'
-    docgen facets 'Facets CNV caller'
-    docgen fastqc 'FastQC quality control'
-    docgen gridss 'GRIDSS SV caller'
-    docgen strelka 'Strelka2 variant caller'
-
-    mkdir $out
-    cp ${./tools.xml} tools.xml
-    xmllint --nonet --xinclude --noxincludenode tools.xml --output tools-full.xml
-    cat tools-full.xml
-    xsltproc $xsltFlags \
-      --nonet \
-      --xinclude \
-      --output $out/index.html \
-      ${pkgs.docbook_xsl_ns}/xml/xsl/docbook/epub/docbook.xsl \
-      tools-full.xml
-  '';
-}
diff --git a/examples/README.md b/examples/README.md
index 66ed04f..1689709 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,17 +1,14 @@
 # Bionix examples
 
-This directory has a few example workflows in bionix along with example data.
-A basic workflow is defined in `call.nix`, and an example of applying it to the
-sample data is in `default.nix`. To build the `default.nix` workflow, run
-```
-nix build
-```
-from this directory.
+This directory has a few example workflows in bionix along with example
+data. A basic workflow is defined in `call.nix`, and an example of
+applying it to the sample data is in `default.nix`. To build the
+`default.nix` workflow, run ```nix build``` from this directory.
 
 ## NextFlow and WDL translations
 
-The directories `ex-nextflow` and `ex-wdl` contain translated examples from the
-NextFlow and WDL documentation respectively.
+The directories `ex-nextflow` and `ex-wdl` contain translated examples
+from the NextFlow and WDL documentation respectively.
 
 The NextFlow translated example does not come with example data. It can be built with
 ```
@@ -25,6 +22,10 @@ nix build -f wdl-scatter-gather.nix
 
 ## Example script wrapper
 
-`ex-tnpair` contains a shell script based example on how a front-end for users
-might be constructed. It is a simple tumour-normal somatic calling workflow
-using the Strelka variant caller.
+`ex-tnpair` contains a shell script based example on how a front-end for
+users might be constructed. It is a simple tumour-normal somatic calling
+workflow using the Strelka variant caller. The script accepts a
+reference fasta along with paired normal and tumor fastq files and
+performs alignment, preprocessing, and variant calling with
+[`strelka`](https://github.com/Illumina/strelka).
+
-- 
cgit v1.2.3