`pathotypr match`

Find the best matching reference genome for a set of FASTQ reads.

How it works

1. Count k-mers from the FASTQ reads (in parallel, with optional noise filtering).
2. Compare against the references in streaming mode: references are processed in batches, so memory use stays constant.
3. Score each reference by its weighted k-mer containment fraction.
4. Report the best match.
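
The steps above can be sketched in a few lines of Python. This is only an illustration of weighted k-mer containment, not pathotypr's actual implementation: the function names (`count_kmers`, `weighted_containment`, `best_match`) and the `filter_above` parameter are hypothetical, weighting each k-mer by its read count is an assumption about what "weighted" means, and reverse-complement (canonical) k-mers are ignored for brevity.

```python
from collections import Counter


def count_kmers(reads, k=31, min_count=2, filter_above=100_000):
    """Count k-mers across all reads.

    When the number of distinct k-mers exceeds `filter_above`, k-mers seen
    fewer than `min_count` times are dropped as likely sequencing errors.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    if len(counts) > filter_above:
        return {kmer: n for kmer, n in counts.items() if n >= min_count}
    return dict(counts)


def weighted_containment(read_counts, reference, k=31):
    """Fraction of read k-mer occurrences (weighted by count) found in the reference."""
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    total = sum(read_counts.values())
    if total == 0:
        return 0.0
    shared = sum(n for kmer, n in read_counts.items() if kmer in ref_kmers)
    return shared / total


def best_match(read_counts, references, k=31):
    """Return (name, score) for the highest-scoring reference."""
    scored = ((name, weighted_containment(read_counts, seq, k))
              for name, seq in references.items())
    return max(scored, key=lambda pair: pair[1])
```

With toy inputs such as `k=4` and reads `["ACGTAC", "ACGTAC", "TTTTGG"]`, the k-mers from the third read occur only once and are removed once filtering is active, so they do not dilute the score of a reference that contains `ACGTAC`.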

Usage

# Match reads against reference genomes
pathotypr match \
  -i reads_R1.fastq.gz reads_R2.fastq.gz \
  -r references.fasta \
  -o match.tsv --excel

# From sample list
pathotypr match \
  -l samples.tsv \
  -r references.fasta \
  -o match.tsv

Options

| Flag | Default | Description |
|------|---------|-------------|
| `-i, --input` | — | One or more FASTQ files |
| `-l, --input-list` | — | TSV: sample_name → FASTQ path(s) |
| `-r, --references` | — | Multi-FASTA with reference genomes |
| `-k, --kmer-size` | `31` | K-mer size |
| `-o, --output` | stdout | Output TSV path |
| `-t, --threads` | all cores | Number of CPU threads |
| `--min-kmer-count` | `2` | Discard k-mers with fewer occurrences (noise filter) |
| `--excel` | off | Also generate .xlsx |
| `--strict-percentages` | on | Legacy-compatible weighted scoring |
| `--early-stop-confidence` | `0` (off) | Stop when confidence exceeds threshold |
| `--early-stop-min-kmers` | `1,000,000` | Minimum k-mers before early stop can trigger |

Output columns

| Column | Description |
|--------|-------------|
| `Query_Files` | Comma-separated input FASTQ paths |
| `Best_Match_Reference` | Header of the best-scoring reference |
| `Shared_Kmer_Fraction` | Weighted containment score (0–1) |
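
For downstream scripting, the TSV can be loaded with Python's standard `csv` module. The sketch below assumes the file starts with a header row carrying the three column names listed above; the helper name `read_matches` and the sample row are made up for illustration and are not real pathotypr output.

```python
import csv
import io


def read_matches(handle):
    """Parse a match TSV into a list of dicts with typed fields."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        {
            "files": row["Query_Files"].split(","),
            "reference": row["Best_Match_Reference"],
            "score": float(row["Shared_Kmer_Fraction"]),
        }
        for row in reader
    ]


# Illustrative content only; replace with open("match.tsv", newline="").
example = io.StringIO(
    "Query_Files\tBest_Match_Reference\tShared_Kmer_Fraction\n"
    "reads_R1.fastq.gz,reads_R2.fastq.gz\tref_A\t0.9731\n"
)
rows = read_matches(example)
```

Splitting `Query_Files` on commas gives back the individual FASTQ paths, which is convenient when joining results to a sample sheet.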

Technical details

Streaming batch size: each batch holds num_threads references, balancing parallelism against memory use
Constant memory: memory usage stays bounded regardless of the number of references (from 1 to 500+)
Noise filtering: k-mers that appear only once are likely sequencing errors; they are filtered out when the total number of unique k-mers exceeds 100K
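
The bounded-memory behaviour can be pictured as follows: only one batch of references is held at a time, and only the running best score survives between batches. This is a single-threaded sketch under that assumption, with hypothetical names (`batched`, `stream_best`) and a caller-supplied `score` function; the real tool scores each batch in parallel.

```python
from itertools import islice


def batched(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def stream_best(references, score, batch_size):
    """Scan (name, sequence) pairs batch by batch, keeping only the best hit.

    `references` may be a generator, so the full reference set never has to
    sit in memory; peak usage is governed by `batch_size`.
    """
    best_name, best_score = None, -1.0
    for batch in batched(references, batch_size):
        for name, seq in batch:
            s = score(seq)
            if s > best_score:
                best_name, best_score = name, s
    return best_name, best_score
```

Setting `batch_size` to the thread count mirrors the trade-off described above: larger batches expose more parallel work, smaller ones keep fewer references in memory at once.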

Algorithm Details

For in-depth documentation of the underlying algorithms:

Reference Matching — K-mer containment scoring, noise filtering, streaming batch processing, adaptive batch sizing, weighted score calculation
