Documentation

Command Reference

Usage guides, CLI options, and output formats for each command.

Command	Description
train.md	Train a Random Forest classifier from labeled genomes
predict.md	Predict lineages using a trained model
classify.md	Call known SNP markers in assembled genomes
split-fastq.md	Alignment-free genotyping from raw FASTQ reads
match.md	Find the closest reference genome for a sample

Guides

Document	Description
input-formats.md	Format specifications for FASTA, marker TSV, and input lists
gui.md	Building and using the Tauri desktop application
benchmarks.md	Performance benchmarks: speed, throughput, memory, and comparisons

Algorithm Details

In-depth descriptions of the algorithms, data structures, and design decisions behind each module.

Document	Topic
algorithms/feature-hashing.md	The hashing trick: k-mers → fixed-size sparse vectors
algorithms/random-forest.md	Sparse CART trees with bootstrap aggregation and out-of-bag (OOB) accuracy estimates
algorithms/training.md	End-to-end pipeline: vectorize → cross-validation or train/test split → train → serialize
algorithms/prediction.md	Streaming batch prediction with majority voting
algorithms/marker-genotyping.md	Diagnostic k-mers + Bloom filter for FASTQ scanning
algorithms/reference-matching.md	K-mer containment scoring with streaming batches
algorithms/assembly-classification.md	Marker calling on FASTA assemblies with GFF annotation
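The feature-hashing step listed above can be pictured with a short sketch. The Python below is illustrative only and is not the project's implementation. The k-mer length (`k=21`), the vector dimension (`n_features=2**20`), the use of BLAKE2b as the hash function, and the function names `kmers` and `hash_kmers` are all assumptions chosen for the example. It shows the core idea of the hashing trick: each k-mer is hashed to a bucket index in a fixed range, and counts are accumulated in a sparse map, so the vector's dimension never depends on how many distinct k-mers a genome contains.

```python
import hashlib
from collections import Counter


def kmers(seq: str, k: int):
    """Yield every length-k substring (k-mer) of seq, uppercased."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash_kmers(seq: str, k: int = 21, n_features: int = 2**20) -> dict[int, int]:
    """Map k-mer counts into a sparse vector of fixed dimension n_features.

    Each k-mer is hashed to a bucket in [0, n_features). Colliding k-mers
    share a bucket and their counts add up. Only non-zero buckets are
    stored, so memory scales with distinct k-mers, not with n_features.
    """
    vec = Counter()
    for kmer in kmers(seq, k):
        # Assumption: BLAKE2b is used here purely for illustration; any fast,
        # deterministic hash would demonstrate the same technique.
        digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
        bucket = int.from_bytes(digest, "little") % n_features
        vec[bucket] += 1
    return dict(vec)
```

For example, `hash_kmers("ACGTACGTAC", k=4, n_features=16)` yields at most four non-zero buckets whose counts sum to 7, one count per 4-mer window. A production pipeline would likely also handle ambiguous bases and reverse complements (canonical k-mers), which this sketch omits.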