|
1 | 1 | # weaver |
2 | 2 |
|
3 | | -*Code is available upon request and will be made publicly available following preprint/publication* |
| 3 | +*Source code is available upon request and will be made publicly available prior to publication* |
4 | 4 |
|
5 | | -Weaver is a short read mapper for pangenome references. Weaver supports both linear (FASTA) and graphs (rGFA) references inputs and additionally uses small variants in a supplementary VCF file. It is designed to reduce reference bias in variant calling while being a simple drop-in replacement for linear reference mappers. |
| 5 | +Weaver is a short read mapper for pangenome references. Weaver supports both linear (FASTA) and graph (rGFA) inputs and additionally uses small variants in a supplementary VCF file. It is designed to reduce reference bias in variant calling while being a simple drop-in replacement for linear reference mappers. |
6 | 6 |
|
7 | 7 | ## Getting started |
8 | 8 |
|
9 | | -Go to the [release page](https://github.com/DecodeGenetics/weaver/releases) and get the latest statically linked x86_64 binary. If you prefer, you can also build weaver locally (see below). |
| 9 | +### Installation |
| 10 | + |
| 11 | +Go to the [release page](https://github.com/DecodeGenetics/weaver/releases), download the latest statically linked x86_64 binary, and add executable permissions. The binary should run on any 64-bit Linux OS. It does not require a GPU, FPGA, or any other non-standard hardware. The binary was built and tested on Red Hat Enterprise Linux 9 (RHEL9). |
| 12 | + |
| 13 | +```sh |
| 14 | +mkdir -p bin && cd bin |
| 15 | +wget https://github.com/DecodeGenetics/weaver/releases/download/v0.2.0/weaver |
| 16 | +chmod a+x weaver |
| 17 | +``` |
| 18 | + |
| 19 | +Optionally, add the new `bin` directory to your `$PATH`. |
10 | 20 |
|
11 | 21 | ### Usage |
12 | 22 |
|
| 23 | +Weaver has two main subcommands: `weaver index`, which constructs the Weaver index, and `weaver map`, which maps the reads to the pangenome. The Weaver index is constructed once and loaded during mapping. Typical usage: |
| 24 | + |
13 | 25 | ```sh |
14 | | -# Builds an weaver minimizer index (wmi) at genome.gfa.gz.wmi |
| 26 | +# Builds a weaver minimizer index (wmi) once at genome.gfa.gz.wmi |
15 | 27 | weaver index genome.gfa.gz --vcf=small_variants.vcf.gz |
16 | 28 |
|
17 | 29 | # Maps paired short-reads to the graph |
18 | 30 | weaver map genome.gfa.gz interleaved.fq.gz > out.sam |
19 | 31 |
|
20 | | -# Output contains all necessary SAM tags for samtools markdup so it can be piped directly through samtools into a BAM/CRAM |
| 32 | +# Output contains all necessary SAM tags for samtools markdup so it can be piped directly through samtools into a BAM/CRAM (requires samtools) |
21 | 33 | weaver map genome.gfa.gz interleaved.fq.gz | samtools markdup -u - - | samtools view --remove-tag ms -b -o final.bam |
22 | 34 | samtools index final.bam # Output is already position sorted |
| 35 | + |
| 36 | +# The alignments are in the context of the stable sequences of the graph, a linear representation of the GFA. You can write those sequences to a FASTA file with (requires gfatools) |
| 37 | +gfatools gfa2fa -s genome.gfa.gz > genome.fa |
| 38 | +samtools faidx genome.fa |
23 | 39 | ``` |
24 | 40 |
|
25 | | -### Build |
| 41 | +### Build from source code |
26 | 42 |
|
27 | | -Requirements: C++17 compiler (GCC 7+ or clang 12+), CMake 3.2, libzstd, libz. |
| 43 | +Requirements: 64-bit Linux OS, C++17 compiler (GCC 7+ or clang 12+), CMake 3.2+, libzstd, libz. Tested on RHEL9 (64-bit). |
28 | 44 |
|
29 | 45 | Recursively git clone the repo and then build with: |
30 | 46 |
|
31 | 47 | ```sh |
32 | | -mkdir build && cd build |
33 | | -cmake .. |
34 | | -make weaver |
35 | | -./weaver version # Check version |
| 48 | +git clone --recursive https://github.com/DecodeGenetics/weaver && cd weaver |
| 49 | +mkdir -p build && cd build |
| 50 | +cmake .. # Builds a release build by default |
| 51 | +make -j4 weaver # Go grab some coffee, building takes a couple of minutes... |
36 | 52 | ``` |
37 | 53 |
|
38 | | -### Test |
| 54 | +I recommend retrying in an empty `build` directory if you encounter any errors from cmake, e.g. a dependency that was not found. |
| 55 | + |
| 56 | +### Demo |
| 57 | + |
| 58 | +To run a demo with your prebuilt binary or local build, run the following (the working directory should still be your `bin` or `build` directory): |
| 59 | + |
| 60 | +```sh |
| 61 | +$ ./weaver version # Check version |
| 62 | +0.2.0-5aabbfa |
| 63 | +5aabbfac6d86953c78e15e79a36bfcc28d5fa784 |
| 64 | +$ ./weaver index ../test/data/test_human_10k.gfa.gz --threads 4 -k 17 -w 5 --vcf=../test/data/truth.vcf.gz --log=./demo_weaver_index.log --vverbose |
| 65 | +$ ./weaver idxstats ../test/data/test_human_10k.gfa.gz.wmi # Print some index stats |
| 66 | +Index stats: |
| 67 | + k = 17 |
| 68 | + w = 5 |
| 69 | + graph size = 3363 bytes |
| 70 | + map size = 3512 keys |
| 71 | +$ ./weaver map ../test/data/test_human_10k.gfa.gz ../test/data/small.read1.fq.gz --fq2=../test/data/small.read2.fq.gz \ |
| 72 | + --extra-header-lines=../test/data/extra_header_lines.tsv --threads=4 > ./output.sam |
| 73 | +$ grep -v ^@PG ./output.sam | md5sum # Requires grep and md5sum |
| 74 | +7bf29e8b638070e95c9a1313c1eb6f53 - |
| 75 | +``` |
| 76 | + |
| 77 | +Use the default values for k and w when running on the full human genome. Details about the available options are in `./weaver [subcommand] --help`. |
| 78 | + |
| 79 | +### Unit tests |
39 | 80 |
|
40 | | -To run the test suite use: |
| 81 | +To run the unit test suite (possible once the source code has been released), use: |
41 | 82 |
|
42 | 83 | ```sh |
43 | | -make |
44 | | -make test # Runs unit tests |
| 84 | +make # Compiles everything, including the unit tests |
| 85 | +make test # Runs the unit tests |
45 | 86 | ``` |
46 | 87 |
|
47 | 88 | ## License |
|