Commit 8e0a47a

Fix Draft PDF #7
1 parent a141cbb commit 8e0a47a

2 files changed

Lines changed: 10 additions & 12 deletions

.github/workflows/draft-pdf.yml

Lines changed: 9 additions & 9 deletions
@@ -1,12 +1,9 @@
-# Workflow derived from https://github.com/marketplace/actions/open-journals-pdf-generator
-
+name: Draft PDF
 on:
   push:
-    branches: [dev]
-  pull_request:
-    branches: [dev]
-
-name: paper
+    paths:
+      - paper/**
+      - .github/workflows/draft-pdf.yml
 
 jobs:
   paper:
@@ -19,10 +16,13 @@ jobs:
         uses: openjournals/openjournals-draft-action@master
         with:
           journal: joss
+          # This should be the path to the paper within your repo.
           paper-path: paper/paper.md
       - name: Upload
         uses: actions/upload-artifact@v4
         with:
           name: paper
-          path: paper/paper.pdf
-
+          # This is the output path where Pandoc will write the compiled
+          # PDF. Note, this should be the same directory as the input
+          # paper.md
+          path: paper/paper.pdf
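
The comment added in this hunk states that the compiled PDF is written to the same directory as the input paper.md. A minimal Python sketch of that relationship follows; `expected_pdf_path` is a hypothetical helper for illustration only, not part of the action or of this repository. It could serve as a sanity check that `paper-path` and the upload `path` agree.

```python
from pathlib import PurePosixPath


def expected_pdf_path(paper_path: str) -> str:
    """Return the PDF path implied by the workflow comment (an assumption):
    same directory and file stem as the input Markdown, with a .pdf suffix."""
    return str(PurePosixPath(paper_path).with_suffix(".pdf"))


# Values copied from the workflow in this commit.
paper_path = "paper/paper.md"
upload_path = "paper/paper.pdf"

# True when the upload step points at the file the build step should produce.
print(expected_pdf_path(paper_path) == upload_path)
```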

paper/paper.md

Lines changed: 1 addition & 3 deletions
@@ -35,9 +35,7 @@ Metagenomic classification has seen rapid development, with numerous tools avail
 3. Deconstruct taxonomic levels: The combined data is split out by rank. `KrakenParser` extracts separate text files for phylum, class, order, family, genus, and species counts. During this step, it can optionally isolate certain domains; for example, using `--deconstruct_viruses` will produce a file of only viral species counts, ignoring other domains. Also, the default `--deconstruct` excludes reads classified as human to focus on microbial content.
 4. Process extracted data: Each rank-specific text file is cleaned and formatted. `KrakenParser` removes classification prefixes (like “s__” for species, “g__” for genus) and replaces underscores with spaces for readability. This step ensures taxon names are human-friendly (e.g. “s__Escherichia_coli” becomes “Escherichia coli”).
 5. Convert to CSV: The cleaned text tables are converted to CSV files (comma-separated values). In this transpose operation, taxa become columns and sample identifiers become rows, yielding a standard matrix format. This structured CSV is easy to import into statistical software, spreadsheets, or R/Python data frames for further analysis.
-6. Calculate relative abundances: For each count table, `KrakenParser` can create a corresponding relative abundance table (`--relabund` option) by computing percentages of total reads per sample, using the formula:
-$ \text{Relative Abundance} = \left( \frac{\text{Number of individuals of taxa}}{\text{Total number of individuals of all taxa}} \right) \times 100 $.
-Users can specify a threshold to group low-abundance taxa into an “Other” category. This results in a normalized profile for each sample, often more interpretable in comparative studies than raw counts.
+6. Calculate relative abundances: For each count table, `KrakenParser` can create a corresponding relative abundance table (`--relabund` option) by computing percentages of total reads per sample, using the formula: $\text{Relative Abundance} = \left( \frac{\text{Number of individuals of taxa}}{\text{Total number of individuals of all taxa}} \right) \times 100$. Users can specify a threshold to group low-abundance taxa into an “Other” category. This results in a normalized profile for each sample, often more interpretable in comparative studies than raw counts.
 
Each of these steps is exposed as a sub-command in the CLI, so advanced users can integrate KrakenParser into custom workflows. By default, running `KrakenParser --complete -i <reports_dir>/kreports` executes all steps sequentially, writing outputs to a structured directory tree (with subfolders for each step). The outputs include one CSV file per rank (e.g. counts_phylum.csv, counts_species.csv) containing absolute read counts, and similarly named files under a `csv_relabund/` directory for percentages if requested. KrakenParser is optimized for speed and memory efficiency given the nature of the task: it processes text files line by line and uses `pandas` data frames for merging and calculations, which easily handle dozens of samples and tens of thousands of taxa on a standard workstation. The reliance on `KrakenTools` for the initial conversion ensures that the parsing logic benefits from the robustness of well-tested scripts, while the unified interface adds convenience. The tool also includes built-in help for each subcommand (`-h`), guiding users on required inputs and options. `KrakenParser`’s design reflects practical needs observed in the metagenomics community—it was tested during the 2025 “Bioinformatics Bootcamp” hackathon organized by ITMO University, where teams analyzing metagenomic datasets were able to obtain meaningful results in a short time thanks to `KrakenParser`’s streamlined processing pipeline. By combining established methods with new automation, `KrakenParser` provides an efficient, reproducible, and user-friendly means to handle the otherwise tedious steps of post-classification data processing.
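
Steps 4 and 6 of the paper text in this hunk (name cleaning and relative abundance with an "Other" bucket) can be illustrated with a short, standard-library-only Python sketch. This is not KrakenParser's code: the function names, the percent-based `other_threshold` parameter, and the strict "less than" comparison are all assumptions made for illustration.

```python
def clean_taxon_name(raw: str) -> str:
    """Strip a one-letter rank prefix such as 's__' or 'g__' and
    replace underscores with spaces (hypothetical helper)."""
    if len(raw) > 3 and raw[1:3] == "__":
        raw = raw[3:]
    return raw.replace("_", " ")


def relative_abundance(counts: dict[str, int],
                       other_threshold: float = 0.0) -> dict[str, float]:
    """Convert one sample's read counts to percentages of the sample total.

    Taxa whose percentage is below `other_threshold` (assumed to be in
    percent) are summed into a single "Other" entry.
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    result: dict[str, float] = {}
    other = 0.0
    for taxon, count in counts.items():
        pct = count * 100 / total  # relative abundance formula from step 6
        if pct < other_threshold:
            other += pct
        else:
            result[taxon] = pct
    if other > 0:
        result["Other"] = other
    return result


# Illustrative counts, not real data.
sample = {"s__Escherichia_coli": 60, "s__Bacteroides_fragilis": 35, "s__Rare_taxon": 5}
cleaned = {clean_taxon_name(name): n for name, n in sample.items()}
print(relative_abundance(cleaned, other_threshold=10.0))
```

With the illustrative counts above, the taxon at 5% falls under the 10% threshold and is reported under "Other", while the remaining percentages still sum to 100.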