Commit bcbe409

docs(paper): fix layout spacing in paper.md
1 parent dbf7836 commit bcbe409

1 file changed

Lines changed: 4 additions & 1 deletion

paper/paper.md

@@ -32,18 +32,21 @@ Comparative metagenomics and microbiome studies depend fundamentally on cross-sa
`KrakenParser` is implemented in Python 3 (distributed via PyPI as `krakenparser`) and follows a modular architecture split into three distinct operational layers: Data Processing, Statistical Analysis, and Visualization. The pipeline can be executed in an end-to-end automated mode by providing global input and output paths directly to the main command, or controlled step-by-step through granular subcommands.

## Data Processing and Filtering
+
Individual taxonomic reports are programmatically parsed, converted into MetaPhlAn (MPA) tables, and merged into a unified cross-sample master count matrix. This matrix is subsequently deconstructed into distinct tables for each major taxonomic rank. During deconstruction, `KrakenParser` purges internal structural prefixes (e.g., stripping `s__` from species names) and normalizes taxonomic strings by replacing underscores with spaces to ensure human readability and compatibility with downstream software.
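
The name normalization and per-rank split described above can be sketched as follows. This is a minimal illustration, not the `KrakenParser` implementation: the function names `clean_taxon` and `split_by_rank`, the in-memory dictionary layout, and the `|`-separated lineage strings with single-letter rank prefixes are all assumptions made for the example.

```python
import re

# Assumed MPA-style rank prefix: one lowercase letter followed by two underscores.
RANK_PREFIX = re.compile(r"^([a-z])__")


def clean_taxon(label: str) -> str:
    """Strip a rank prefix such as 's__' and replace underscores with spaces."""
    return RANK_PREFIX.sub("", label).replace("_", " ")


def split_by_rank(merged: dict[str, list[int]]) -> dict[str, dict[str, list[int]]]:
    """Group rows of a merged count table by the rank of the last lineage field."""
    tables: dict[str, dict[str, list[int]]] = {}
    for lineage, counts in merged.items():
        leaf = lineage.split("|")[-1]
        match = RANK_PREFIX.match(leaf)
        rank = match.group(1) if match else "unranked"
        tables.setdefault(rank, {})[clean_taxon(leaf)] = counts
    return tables


# Hypothetical merged matrix: lineage string -> counts per sample.
merged = {
    "d__Bacteria": [120, 80],
    "d__Bacteria|g__Escherichia": [75, 35],
    "d__Bacteria|g__Escherichia|s__Escherichia_coli": [70, 30],
}
tables = split_by_rank(merged)
```

Here `tables["s"]` holds the species-level table keyed by human-readable names, and `tables["g"]` and `tables["d"]` hold the genus and domain tables.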

The core data engine provides flexible filtering. Users can include or exclude specific biological domains or kingdoms (Bacteria, Viruses, Archaea, Fungi) during extraction. Non-target host reads (e.g., human contamination) are filtered out by default to focus on microbial signatures; the `--keep-human` flag preserves host read counts in the output matrices. `--keep-human` can also be combined with domain-specific filters, so that host-to-microbe or host-to-pathogen abundance ratios can be evaluated within a single run.
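
The interaction between domain filters and host retention can be illustrated with a small sketch. It assumes a hypothetical helper `filter_lineages`, lineage strings in the MPA style, and host reads labeled `Homo_sapiens`; none of these are taken from the `KrakenParser` source, and the actual command-line behavior may differ.

```python
HOST_LABEL = "Homo_sapiens"  # assumed label for host reads in the lineage string


def filter_lineages(rows, include=None, keep_human=False):
    """Keep rows matching any name in `include`; retain host rows only if requested.

    rows:       dict mapping a '|'-separated lineage string to per-sample counts
    include:    set of domain or kingdom names to keep, or None to keep all
    keep_human: mirrors the idea of the --keep-human flag (host dropped by default)
    """
    kept = {}
    for lineage, counts in rows.items():
        # Names at every level, with rank prefixes such as 'd__' removed.
        names = {field.split("__", 1)[-1] for field in lineage.split("|")}
        if HOST_LABEL in names:
            if keep_human:
                kept[lineage] = counts
            continue
        if include is None or names & include:
            kept[lineage] = counts
    return kept


# Hypothetical input rows.
rows = {
    "d__Bacteria|s__Escherichia_coli": [70, 30],
    "d__Viruses|s__Tobacco_mosaic_virus": [5, 2],
    "d__Eukaryota|k__Fungi|s__Candida_albicans": [3, 1],
    "d__Eukaryota|s__Homo_sapiens": [900, 850],
}

bacteria_only = filter_lineages(rows, include={"Bacteria"})
bacteria_and_host = filter_lineages(rows, include={"Bacteria"}, keep_human=True)
```

With both rows present in `bacteria_and_host`, a host-to-microbe ratio can be computed per sample from a single filtered table.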

## Statistical Analysis
+
Following matrix generation, the statistical module computes normalization metrics and ecological indices directly:

* **Relative Abundance:** Normalizes absolute counts into percentages using the formula $\text{Relative Abundance} = \left( \frac{\text{Number of individuals of a taxon}}{\text{Total number of individuals of all taxa}} \right) \times 100$. Taxa falling below a user-defined abundance threshold are aggregated into a consolidated `Other` category to simplify downstream parsing and plotting.
* **Alpha Diversity:** Calculates *Shannon* [@shannon1948mathematical], *Pielou’s evenness* [@pielou1966measurement], and *Chao1* [@chao2002estimating] indices. To mitigate artifacts caused by uneven sequencing depths across different sequencing runs, a built-in rarefaction procedure subsamples reads to a uniform user-specified depth prior to calculating indices.
* **Beta Diversity:** Computes compositional dissimilarity between samples via *Bray-Curtis* [@bray10jt] and *Jaccard* [@jaccard1901etude] distance metrics, exporting standard distance matrices ready for ordination.
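
The metrics listed above can be sketched in plain Python as follows. This is a standalone illustration under stated assumptions, not the package's code: all function names are hypothetical, samples are represented as `{taxon: count}` dictionaries, Shannon uses the natural logarithm, and Chao1 is shown in its common bias-corrected form.

```python
import math
import random


def relative_abundance(counts, threshold=0.0):
    """Percent per taxon; taxa below `threshold` percent are pooled into 'Other'."""
    total = sum(counts.values())
    kept, other = {}, 0.0
    for taxon, n in counts.items():
        pct = 100 * n / total
        if pct < threshold:
            other += pct
        else:
            kept[taxon] = pct
    if other > 0:
        kept["Other"] = other
    return kept


def rarefy(counts, depth, seed=0):
    """Subsample `depth` reads without replacement (depth must not exceed the total)."""
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    sub = {}
    for taxon in random.Random(seed).sample(pool, depth):
        sub[taxon] = sub.get(taxon, 0) + 1
    return sub


def shannon(counts):
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def pielou(counts):
    richness = sum(1 for n in counts.values() if n > 0)
    return shannon(counts) / math.log(richness) if richness > 1 else 0.0


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    values = [n for n in counts.values() if n > 0]
    f1 = sum(1 for n in values if n == 1)  # singletons
    f2 = sum(1 for n in values if n == 2)  # doubletons
    return len(values) + f1 * (f1 - 1) / (2 * (f2 + 1))


def bray_curtis(a, b):
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    return diff / sum(a.get(t, 0) + b.get(t, 0) for t in taxa)


def jaccard(a, b):
    """Jaccard distance on presence/absence."""
    present_a = {t for t, n in a.items() if n > 0}
    present_b = {t for t, n in b.items() if n > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


# Hypothetical samples.
sample_a = {"A": 50, "B": 30, "C": 19, "D": 1}
sample_b = {"A": 40, "B": 40, "E": 20}
```

For example, `relative_abundance(sample_a, threshold=5)` pools taxon `D` into `Other`, and `rarefy(sample_a, 50)` returns a subsample suitable as input to the alpha-diversity functions.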

## Visualization
+
The `kpplot` module uses an object-oriented design in which every plot class inherits from a unified base configuration class (`KpPlotBase`), enforcing consistent rendering settings such as DPI, bounding-box scaling, and layout. Built on top of `matplotlib` [@Hunter2007], `pandas` [@reback2020pandas], and `seaborn` [@Waskom2021], the visualization engine exposes four primary programmatic layouts:

* **Stacked Bar Plots:** For comparing relative taxonomic proportions across multi-sample cohorts.
@@ -59,4 +62,4 @@ The functional reliability and execution integrity of `KrakenParser` are validat

Generative AI tools were used during the development of this work to assist with code refactoring, documentation drafting, and manuscript text editing. All software design decisions, implementation, validation, and scientific interpretation were performed and reviewed by the authors. No generative AI tools were used to generate or analyze research data, and all results reported are reproducible from the publicly available source code and documentation.

-# References
+# References
