From dda29ca7685d360d5428dd827d11a9e4139a0872 Mon Sep 17 00:00:00 2001 From: Justin Bedo Date: Thu, 6 Oct 2022 10:21:34 +1100 Subject: init --- day1/README.md | 93 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 day1/README.md (limited to 'day1') diff --git a/day1/README.md b/day1/README.md new file mode 100644 index 0000000..f5be657 --- /dev/null +++ b/day1/README.md @@ -0,0 +1,93 @@ +# Day 1 - BioNix Workshop + +Let's start by defining *computational reproducibility* as always +obtaining the same output from a computation given the same inputs. In +other words, computational reproducibility is about making computations +*deterministic*. In the research context, this is important as +reproducibility allows others (and ourselves) to verify and build upon +what we have done in future. + +# A functional view of things and why Nix is needed + +What makes reproducibility difficult is the management of *state*, or +the context within with a computation takes place. State manipulation is +widespread: how many apps updates or system updates do you recall +automatically being installed over the past year? Do you think your +analysis today will be the same in one years time if your software stack +has changed? + +One way to deal with this problem is to make computations *pure* by forbidding +the use of anything that is not explicitly stated as an input. This is +the same idea of pure functional programming, only at the higher level of +executing software. + +Nix effectively enforces purity for software execution by ensuring the software +cannot access anything outside of the specified inputs. By this way, it can +guarantee a very high degree of reproducibility. Nix is a general build engine +most commonly used for building software today, but as we will see a bit later +it can also execute computational biology workflows in a pure manner with a small +library called BioNix. + +# Pipelines in BioNix + +``` +# This is an example pipeline specification to do multi-sample variant calling +# with the Platypus variant caller. Each input is preprocessed by aligning +# against a reference genome (defaults to GRCH38), fixing mate information, and +# marking duplicates. Finally platypus is called over all samples. +{ bionix ? import { } +, inputs +, ref ? bionix.ref.grch38.seq +}: + +with bionix; +with lib; + +let + preprocess = flip pipe [ + (bwa.align { inherit ref; }) + (samtools.sort { nameSort = true; }) + (samtools.fixmate { }) + (samtools.sort { }) + (samtools.markdup { }) + ]; + +in +platypus.call { } (map preprocess inputs) +``` + +# Nix the language + +We will start with learning Nix the langauge, which is used for +specifying workflows. If you are familar with JSON, it is very similar +in terms of availble data types but has one very important addition: +functions. Let's cover the basic data types and their syntax: + +- Booleans: `true` and `false` +- Strings: `"this is a string"` +- Numbers: `0`, `1.234` +- Lists: `[ 0 1.234 "string" ]` +- Attribute sets: `{ a = 5; b = "something else"; }` +- Comments: `# this is a comment` +- Functions: `x: x + 1` +- Variable binding: `let x = 5; in x #=> 5` +- Function application: `let f = x: x + 1; in f 5 #=> 6` +- File paths: `/path/to/file` + +Some common operators: +- Boolean conjunctions and disjunctions: `true || false #=> true` `true && false #=> false` +- Ordering: `3 < 3 #=> false`, `3 <= 3 #=> true` +- Conditionals: `if 3 < 4 then "a" else "b" #=> a` +- Addition and subtraction: `3 + 4 #=> 7`, `3 - 4 #=> -1` +- Multiplication and division: `3 * 4 #=> 12`, `3.0 / 4 #=> 0.75` +- String concatenation: `"hello " + "world" #=> "hello world"` +- String interpolation: `"hello ${"world"}" #=> "hello world"`, `"1 + 2 = ${toString (1 + 2)}" #=> "1 + 2 = 3"` +- Attribute set unions: `{ a = 5; } // { b = 6; } #=> { a = 5; b = 6; }` + +# About this interface + +This workshop uses [A tour of +nix](https://github.com/nixcloud/tour_of_nix) with some altered content +for the purposes of learning enough of Nix the language to write +workflows in BioNix during the second part. Click next to continue to +the exercises. -- cgit v1.2.3