Identifying genomes in metagenomic samples is complicated by taxonomic profiling tools that lack uncertainty quantification and rely on incomplete reference databases. YACHT (**Y**es/No **A**nswers to **C**ommunity membership via **H**ypothesis **T**esting) introduces a $k$-mer sketching-based statistical framework that incorporates average nucleotide identity (ANI) and coverage, defined as the fraction of a microbe’s genomic $k$-mers that are observed in a sample. The framework assesses genetic similarity between reference and sample genomes by applying a binomial hypothesis test to exclusive $k$-mers, which yields a statistically supported presence/absence call for each genome [@koslicki2024yacht]. This paper describes the software implementation of this methodology as a command-line tool that detects low-abundance species while controlling the false-negative rate, making it applicable to functional profiling, metatranscriptomics, and clinical microbiome analysis despite incomplete genomes and variable coverage. YACHT is implemented in C++ and Python and depends on `sourmash` [@irber2024sourmash] for $k$-mer extraction and management.
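
To make the idea of a binomial presence/absence test concrete, the following is a minimal sketch and not YACHT's actual implementation. It assumes a simple independent point-mutation model, under which each exclusive $k$-mer of a reference genome survives in the sample with probability $c \cdot \mathrm{ANI}^k$ for coverage $c$. The function names (`binom_pmf`, `min_kmers_for_presence`, `is_present`) and their parameters are hypothetical and chosen for illustration only; the exact model and thresholds are described in [@koslicki2024yacht].

```python
import math


def binom_pmf(i, n, p):
    """P[X = i] for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )
    return math.exp(log_pmf)


def min_kmers_for_presence(n_exclusive, ani, ksize, coverage, alpha):
    """Largest threshold t with P[X < t] <= alpha under the 'present' hypothesis.

    Assumption (illustrative): each of the n_exclusive k-mers is observed
    independently with probability coverage * ani**ksize.
    """
    p = coverage * ani ** ksize
    cumulative = 0.0
    threshold = 0
    for i in range(n_exclusive + 1):
        cumulative += binom_pmf(i, n_exclusive, p)  # now equals P[X <= i]
        if cumulative > alpha:
            break
        threshold = i + 1  # P[X < i + 1] <= alpha, so i + 1 is still valid
    return threshold


def is_present(observed, n_exclusive, ani, ksize, coverage, alpha):
    """Call the genome present unless the observed count falls below the threshold."""
    return observed >= min_kmers_for_presence(n_exclusive, ani, ksize, coverage, alpha)
```

In this sketch, `alpha` bounds the probability of calling a genome absent when it is in fact present at the stated ANI and coverage, which is how a false-negative rate can be controlled. Lowering `coverage` lowers the required number of observed exclusive $k$-mers, which illustrates why low-abundance organisms can still be detected.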