The documents in this directory describe the algorithms and data structures behind each pathotypr module. For CLI usage and options, see the command docs.

| Module | Document | Core Idea |
|---|---|---|
| Feature Hashing | feature-hashing.md | The hashing trick: k-mers → fixed-size sparse vectors |
| Random Forest | random-forest.md | Sparse CART trees with bootstrap aggregation |
| Training Pipeline | training.md | End-to-end: vectorize → evaluate → train → OOB → export |
| Prediction | prediction.md | Streaming batch prediction with majority voting |
| Marker Genotyping | marker-genotyping.md | Diagnostic k-mers + Bloom filter for FASTQ scanning |
| Reference Matching | reference-matching.md | K-mer containment scoring with streaming batches |
| Assembly Classification | assembly-classification.md | Marker calling on FASTA assemblies with GFF annotation |
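
As a rough illustration of the core idea in the first row, the hashing trick can be sketched as below. This is a minimal sketch and not pathotypr's implementation: the function name `kmer_hash_vector`, the choice of CRC32 as the hash, and the default values for `k` and `n_buckets` are all assumptions made for the example. See feature-hashing.md for the actual design.

```python
import zlib
from collections import Counter


def kmer_hash_vector(seq: str, k: int = 5, n_buckets: int = 16) -> dict[int, int]:
    """Return a sparse count vector {bucket_index: count} for the k-mers of seq.

    Hypothetical helper for illustration only; the hash function and
    parameters used by pathotypr may differ.
    """
    counts: Counter[int] = Counter()
    seq = seq.upper()
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # CRC32 is a stand-in hash (an assumption); the modulo folds it
        # into a fixed number of buckets, so distinct k-mers may collide.
        bucket = zlib.crc32(kmer.encode("ascii")) % n_buckets
        counts[bucket] += 1
    # Only non-empty buckets are stored, which keeps the vector sparse.
    return dict(counts)


if __name__ == "__main__":
    vec = kmer_hash_vector("ACGTACGTAC", k=4, n_buckets=16)
    print(vec)
```

The vector length is fixed by `n_buckets` regardless of how many distinct k-mers occur, at the cost of occasional collisions between unrelated k-mers.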