Commit e07b078

Updated README, added demo data
1 parent 87ced56 commit e07b078

5 files changed

Lines changed: 58 additions & 15 deletions

File tree

README.md

Lines changed: 56 additions & 15 deletions
@@ -1,47 +1,88 @@
11
# weaver
22

3-
*Code is available upon request and will be made publicly available following preprint/publication*
3+
*Source code is available upon request and will be made publicly available prior to publication*
44

5-
Weaver is a short read mapper for pangenome references. Weaver supports both linear (FASTA) and graphs (rGFA) references inputs and additionally uses small variants in a supplementary VCF file. It is designed to reduce reference bias in variant calling while being a simple drop-in replacement for linear reference mappers.
5+
Weaver is a short read mapper for pangenome references. Weaver supports both linear (FASTA) and graph (rGFA) inputs and additionally uses small variants in a supplementary VCF file. It is designed to reduce reference bias in variant calling while being a simple drop-in replacement for linear reference mappers.
66

77
## Getting started
88

9-
Go to the [release page](https://github.com/DecodeGenetics/weaver/releases) and get the latest statically linked x86_64 binary. If you prefer, you can also build weaver locally (see below).
9+
### Installation
10+
11+
Go to the [release page](https://github.com/DecodeGenetics/weaver/releases), download the latest statically linked x86_64 binary, and add executable permissions. The binary should run on any 64-bit Linux OS. It does not require a GPU, FPGA or any non-standard hardware. The binary was built and tested on Red Hat Enterprise Linux 9 (RHEL9).
12+
13+
```sh
14+
mkdir -p bin && cd bin
15+
wget https://github.com/DecodeGenetics/weaver/releases/download/v0.2.0/weaver
16+
chmod a+x weaver
17+
```
18+
19+
Optionally, add the new `bin` directory to your `$PATH`.
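A minimal sketch of the `$PATH` step for the current shell session, assuming you are still inside the `bin` directory created above. To make the change persistent you would typically add the same `export` line to your shell startup file; which file that is depends on your shell and is not specified here.

```sh
# Assumes the current directory is the `bin` directory containing the weaver binary.
export PATH="$PWD:$PATH"

# Check whether the shell can now find weaver (prints its path if so).
command -v weaver || echo "weaver not found on PATH" >&2
```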
1020

1121
### Usage
1222

23+
Weaver has two main subcommands, `weaver index` to construct the Weaver index and `weaver map` to map the reads to the pangenome. The Weaver index is constructed once and loaded during mapping. Typical usage:
24+
1325
```sh
14-
# Builds an weaver minimizer index (wmi) at genome.gfa.gz.wmi
26+
# Builds a Weaver minimizer index (wmi) once, at genome.gfa.gz.wmi
1527
weaver index genome.gfa.gz --vcf=small_variants.vcf.gz
1628

1729
# Maps paired short-reads to the graph
1830
weaver map genome.gfa.gz interleaved.fq.gz > out.sam
1931

20-
# Output contains all necessary SAM tags for samtools markdup so it can be piped directly through samtools into a BAM/CRAM
32+
# Output contains all necessary SAM tags for samtools markdup so it can be piped directly through samtools into a BAM/CRAM (requires samtools)
2133
weaver map genome.gfa.gz interleaved.fq.gz | samtools markdup -u - - | samtools view --remove-tag ms -b -o final.bam
2234
samtools index final.bam # Output is already position sorted
35+
36+
# Alignments are reported in the context of the stable sequences of the graph, a linear representation of the GFA. To write those sequences to a FASTA file (requires gfatools):
37+
gfatools gfa2fa -s genome.gfa.gz > genome.fa
38+
samtools faidx genome.fa
2339
```
2440

25-
### Build
41+
### Build from source code
2642

27-
Requirements: C++17 compiler (GCC 7+ or clang 12+), CMake 3.2, libzstd, libz.
43+
Requirements: Linux OS (64-bit), C++17 compiler (GCC 7+ or clang 12+), CMake 3.2+, libzstd, libz. Tested on RHEL9 (64-bit).
2844

2945
Recursively git clone the repo and then build with:
3046

3147
```sh
32-
mkdir build && cd build
33-
cmake ..
34-
make weaver
35-
./weaver version # Check version
48+
git clone --recursive https://github.com/DecodeGenetics/weaver && cd weaver
49+
mkdir -p build && cd build
50+
cmake .. # Builds a release build by default
51+
make -j4 weaver # Go grab some coffee, building takes a couple of minutes...
3652
```
3753

38-
### Test
54+
If you encounter any errors from cmake (e.g. a dependency was not found), I recommend retrying in an empty `build` directory.
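Retrying in an empty `build` directory could look like the sketch below. It assumes you run it from the repository root (the `weaver` directory created by `git clone`) and that a `CMakeLists.txt` sits there, which the `cmake ..` step above implies. The guard is only an illustration to avoid deleting an unrelated `build` directory elsewhere.

```sh
# Run from the repository root, i.e. the `weaver` directory created by `git clone`.
if [ -f CMakeLists.txt ]; then
    rm -rf build                 # Discard any stale CMake cache and partial build output
    mkdir -p build && cd build
    cmake ..                     # Reconfigure from scratch
    make -j4 weaver
else
    echo "No CMakeLists.txt here; change to the repository root first." >&2
fi
```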
55+
56+
### Demo
57+
58+
If you want to run a demo with your prebuilt binary or your own build, run the following (the working directory should still be your `bin` or `build` directory):
59+
60+
```sh
61+
$ ./weaver version # Check version
62+
0.2.0-5aabbfa
63+
5aabbfac6d86953c78e15e79a36bfcc28d5fa784
64+
$ ./weaver index ../test/data/test_human_10k.gfa.gz --threads 4 -k 17 -w 5 --vcf=../test/data/truth.vcf.gz --log=./demo_weaver_index.log --vverbose
65+
$ ./weaver idxstats ../test/data/test_human_10k.gfa.gz.wmi # Print some index stats
66+
Index stats:
67+
k = 17
68+
w = 5
69+
graph size = 3363 bytes
70+
map size = 3512 keys
71+
$ ./weaver map ../test/data/test_human_10k.gfa.gz ../test/data/small.read1.fq.gz --fq2=../test/data/small.read2.fq.gz \
72+
--extra-header-lines=../test/data/extra_header_lines.tsv --threads=4 > ./output.sam
73+
$ grep -v ^@PG ./output.sam | md5sum # Requires grep and md5sum
74+
7bf29e8b638070e95c9a1313c1eb6f53 -
75+
```
76+
77+
Use the default values for k and w when running on the full human genome. Details about the available options are in `./weaver [subcommand] --help`.
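The checksum comparison at the end of the demo can also be scripted rather than checked by eye. The sketch below is one way to do that; `check_sam_md5` is a hypothetical helper name, not part of Weaver, and the expected value is copied from the demo output above. Like the demo, it drops the `@PG` header lines before hashing.

```sh
# Hypothetical helper (not part of Weaver): compare the md5 checksum of a SAM
# file, ignoring @PG header lines, against an expected value.
check_sam_md5() {
    sam_file=$1
    expected=$2
    actual=$(grep -v '^@PG' "$sam_file" | md5sum | cut -d' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $sam_file matches the expected checksum"
    else
        echo "MISMATCH: got $actual, expected $expected" >&2
        return 1
    fi
}

# Expected checksum taken from the demo output; only runs if the demo output exists.
if [ -f ./output.sam ]; then
    check_sam_md5 ./output.sam 7bf29e8b638070e95c9a1313c1eb6f53
fi
```

A non-zero return status signals a mismatch, so the function can be used in a larger script or CI step.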
78+
79+
### Unit tests
3980

40-
To run the test suite use:
81+
To run the unit test suite (possible once the source code has been released), use:
4182

4283
```sh
43-
make
44-
make test # Runs unit tests
84+
make # Compiles everything, including the unit tests
85+
make test # Runs the unit tests
4586
```
4687

4788
## License

test/data/extra_header_lines.tsv

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
@RG ID:test PL:ILLUMINA SM:bar
2+
@RG ID:foo PL:ILLUMINA SM:bar

test/data/small.read1.fq.gz

131 KB
Binary file not shown.

test/data/small.read2.fq.gz

131 KB
Binary file not shown.

test/data/test_human_10k.gfa.gz

3.28 KB
Binary file not shown.
