To run Plassembler, first install the database in a directory of your choosing:
plassembler download -d <database directory>
Once this is finished, you can run Plassembler as follows:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated lower bound of chromosome length>
-c or --chromosome will default to 1000000 if not specified.
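For scripted runs, the basic command above can be assembled programmatically. The following is a minimal Python sketch and not part of Plassembler: the helper name `build_run_command` and all file paths are hypothetical, and only the flags shown above are used. It builds the argument list and prints it rather than executing anything.

```python
import shlex

DEFAULT_CHROMOSOME = 1_000_000  # documented default for -c / --chromosome


def build_run_command(database, longreads, short_one, short_two, outdir,
                      chromosome=DEFAULT_CHROMOSOME):
    """Return the argument list for a basic `plassembler run` call."""
    if chromosome <= 0:
        raise ValueError("chromosome length must be a positive integer")
    return [
        "plassembler", "run",
        "-d", str(database),
        "-l", str(longreads),
        "-o", str(outdir),
        "-1", str(short_one),
        "-2", str(short_two),
        "-c", str(chromosome),
    ]


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_run_command("plassembler_db", "long.fastq.gz",
                            "R1.fastq.gz", "R2.fastq.gz", "out_dir")
    print(shlex.join(cmd))
    # To actually launch it, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list (rather than one string) avoids shell quoting problems when paths contain spaces.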
To specify more threads to speed up Plassembler, use -t or --threads:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads>
plassembler defaults to 1 thread.
To specify a prefix for the output files, use -p or --prefix:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> -p <prefix>
To specify a minimum read length and a minimum read quality (Q-score) for chopper, use -m and -q:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> -p <prefix> -m <min length> -q <min quality>
-m will default to 500 and -q will default to 9.
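To make the effect of -m and -q concrete, here is a simplified Python illustration of a length and mean-quality read filter. It is not chopper's implementation: the function names are hypothetical, Phred+33 quality encoding is assumed, and computing the mean Q-score by averaging per-base error probabilities is an assumption about how such filters typically work. Treating both thresholds as inclusive is also an assumption.

```python
import math


def mean_qscore(quality_string, offset=33):
    """Mean Phred score of a read, averaged in error-probability space.

    Assumes Phred+33 encoding (the `offset` argument).
    """
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    mean_error = sum(probs) / len(probs)
    return -10 * math.log10(mean_error)


def passes_filter(sequence, quality_string, min_length=500, min_quality=9):
    """True if the read meets both thresholds (defaults mirror -m and -q).

    Inclusive comparisons are an assumption made for this sketch.
    """
    return (len(sequence) >= min_length
            and mean_qscore(quality_string) >= min_quality)
```

With the defaults, a 600 bp read whose bases all have Q20 would be kept, while a 100 bp read of the same quality would be dropped for being too short.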
To overwrite an existing output directory, use -f or --force:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> -f
To use Raven instead of Flye as the long-read assembler, use --use_raven:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> --use_raven
To keep the Flye-assembled chromosome(s) (as chromosome.fasta), use --keep_chromosome:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> --keep_chromosome
To use PacBio reads, pass the matching Flye model with --pacbio_model (e.g. pacbio-raw for regular CLR reads):
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> --pacbio_model pacbio-raw
To skip quality control (chopper and fastp), use --skip_qc:
plassembler run -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> --skip_qc
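The optional flags described above can be combined freely. As a rough sketch of how a wrapper script might collect them, the Python helper below returns the extra arguments to append to a `plassembler run` command. The function name `optional_run_args` and its keyword names are hypothetical; the flags themselves and the three permitted PacBio model names are taken from the help text for `plassembler run`.

```python
PACBIO_MODELS = ("pacbio-raw", "pacbio-corr", "pacbio-hifi")


def optional_run_args(threads=1, prefix=None, force=False, use_raven=False,
                      keep_chromosome=False, skip_qc=False, pacbio_model=None):
    """Return extra arguments to append to a `plassembler run` command."""
    if threads < 1:
        raise ValueError("threads must be at least 1")
    if pacbio_model is not None and pacbio_model not in PACBIO_MODELS:
        raise ValueError(f"pacbio_model must be one of {PACBIO_MODELS}")

    args = ["-t", str(threads)]
    if prefix is not None:
        args += ["-p", prefix]
    if force:
        args.append("-f")
    if use_raven:
        args.append("--use_raven")
    if keep_chromosome:
        args.append("--keep_chromosome")
    if pacbio_model is not None:
        args += ["--pacbio_model", pacbio_model]
    if skip_qc:
        args.append("--skip_qc")
    return args


if __name__ == "__main__":
    # Example: 8 threads, overwrite the output directory, PacBio CLR reads.
    print(optional_run_args(threads=8, force=True, pacbio_model="pacbio-raw"))
```

Validating the PacBio model name before launching saves a failed run when the value is mistyped.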
To calculate plasmid copy numbers for an existing assembly, use plassembler assembled and provide the already assembled chromosome with --input_chromosome and the plasmids with --input_plasmids:
plassembler assembled -d <database directory> -l <long read fastq> -o <output dir> -1 < short read R1 fastq> -2 < short read R2 fastq> -c <estimated chromosome length> -t <threads> -a --input_chromosome <path to chromosome FASTA> --input_plasmids <path to plasmids FASTA>
You can also use plassembler long, which simply runs Flye, keeps all contigs shorter than -c and labels them as 'plasmids'. This mode is experimental for now and I do not vouch for its performance.
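The length rule described for plassembler long can be pictured with a short Python sketch. It only illustrates the stated idea (contigs shorter than -c are labelled as plasmids) and is not Plassembler's code; the function name and the contig names and lengths are made up for the example.

```python
def split_contigs_by_length(contig_lengths, chromosome=1_000_000):
    """Split contigs into putative plasmids (< chromosome) and the rest.

    `contig_lengths` maps contig name -> length in base pairs.
    """
    plasmids = {}
    others = {}
    for name, length in contig_lengths.items():
        if length < chromosome:
            plasmids[name] = length
        else:
            others[name] = length
    return plasmids, others


if __name__ == "__main__":
    # Made-up contig lengths in base pairs.
    example = {"contig_1": 4_800_000, "contig_2": 95_000, "contig_3": 5_200}
    plasmids, others = split_contigs_by_length(example)
    print("putative plasmids:", sorted(plasmids))
    print("other contigs:", sorted(others))
```

A pure length cut-off like this cannot tell a small plasmid from a chromosome fragment, which is one reason to treat the output of this mode with caution.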
Usage: plassembler run [OPTIONS]

  Runs Plassembler

Options:
  -h, --help                 Show this message and exit.
  -V, --version              Show the version and exit.
  -d, --database PATH        Directory of PLSDB database. [required]
  -l, --longreads PATH       FASTQ file of long reads. [required]
  -1, --short_one PATH       R1 short read FASTQ file. [required]
  -2, --short_two PATH       R2 short read FASTQ file. [required]
  -c, --chromosome INTEGER   Approximate lower-bound chromosome length of
                             bacteria (in base pairs). [default: 1000000]
  -o, --outdir PATH          Directory to write the output to.
                             [default: plassembler.output/]
  -m, --min_length TEXT      minimum length for filtering long reads with
                             chopper. [default: 500]
  -q, --min_quality TEXT     minimum quality q-score for filtering long reads
                             with chopper. [default: 9]
  -t, --threads TEXT         Number of threads. [default: 1]
  -f, --force                Force overwrites the output directory.
  -p, --prefix TEXT          Prefix for output files. This is not required.
                             [default: plassembler]
  --skip_qc                  Skips qc (chopper and fastp).
  --pacbio_model TEXT        Pacbio model for Flye. Must be one of pacbio-raw,
                             pacbio-corr or pacbio-hifi. Use pacbio-raw for
                             PacBio regular CLR reads (<20 percent error),
                             pacbio-corr for PacBio reads that were corrected
                             with other methods (<3 percent error) or
                             pacbio-hifi for PacBio HiFi reads (<1 percent
                             error).
  -r, --raw_flag             Use --nano-raw for Flye. Designed for Guppy fast
                             configuration reads. By default, Flye will assume
                             SUP or HAC reads and use --nano-hq.
  --keep_fastqs              Whether you want to keep FASTQ files containing
                             putative plasmid reads and long reads that map to
                             multiple contigs (plasmid and chromosome).
  --keep_chromosome          If you want to keep the chromosome assembly.
  --use_raven                Uses Raven instead of Flye for long read assembly.
                             May be useful if you want to reduce runtime.

All options
Usage: plassembler [OPTIONS] COMMAND [ARGS]...

Options:
  -h, --help     Show this message and exit.
  -V, --version  Show the version and exit.

Commands:
  assembled  Runs assembled mode
  citation   Print the citation(s) for this tool
  download   Downloads Plassembler DB
  long       Plassembler with long reads only - experimental and untested
  run        Runs Plassembler