Fix Draft PDF #7

ilypopv · ilypopv · commit 8e0a47a70d4e · 2025-06-01T12:46:09.000+03:00
diff --git a/.github/workflows/draft-pdf.yml b/.github/workflows/draft-pdf.yml
@@ -1,12 +1,9 @@
-# Workflow derived from https://github.com/marketplace/actions/open-journals-pdf-generator
-
+name: Draft PDF
 on:
   push:
-    branches: [dev]
-  pull_request:
-    branches: [dev]
-    
-name: paper
+    paths:
+      - paper/**
+      - .github/workflows/draft-pdf.yml
 
 jobs:
   paper:
@@ -19,10 +16,13 @@ jobs:
         uses: openjournals/openjournals-draft-action@master
         with:
           journal: joss
+          # This should be the path to the paper within your repo.
           paper-path: paper/paper.md
       - name: Upload
         uses: actions/upload-artifact@v4
         with:
           name: paper
-          path: paper/paper.pdf
-          
+          # This is the output path where Pandoc will write the compiled
+          # PDF. Note, this should be the same directory as the input
+          # paper.md
+          path: paper/paper.pdf
diff --git a/paper/paper.md b/paper/paper.md
@@ -35,9 +35,7 @@ Metagenomic classification has seen rapid development, with numerous tools avail
 3. Deconstruct taxonomic levels: The combined data is split out by rank. `KrakenParser` extracts separate text files for phylum, class, order, family, genus, and species counts. During this step, it can optionally isolate certain domains; for example, using `--deconstruct_viruses` will produce a file of only viral species counts, ignoring other domains. Also, the default `--deconstruct` excludes reads classified as human to focus on microbial content.
 4. Process extracted data: Each rank-specific text file is cleaned and formatted. `KrakenParser` removes classification prefixes (like “s__” for species, “g__” for genus) and replaces underscores with spaces for readability. This step ensures taxon names are human-friendly (e.g. “s__Escherichia_coli” becomes “Escherichia coli”).
 5. Convert to CSV: The cleaned text tables are converted to CSV files (comma-separated values). In this transpose operation, taxa become columns and sample identifiers become rows, yielding a standard matrix format. This structured CSV is easy to import into statistical software, spreadsheets, or R/Python data frames for further analysis.
-6. Calculate relative abundances: For each count table, `KrakenParser` can create a corresponding relative abundance table (`--relabund` option) by computing percentages of total reads per sample, using the formula:
-$ \text{Relative Abundance} = \left( \frac{\text{Number of individuals of taxa}}{\text{Total number of individuals of all taxa}} \right) \times 100 $.
-Users can specify a threshold to group low-abundance taxa into an “Other” category. This results in a normalized profile for each sample, often more interpretable in comparative studies than raw counts.
+6. Calculate relative abundances: For each count table, `KrakenParser` can create a corresponding relative abundance table (`--relabund` option) by expressing each taxon's read count as a percentage of the total reads in the sample, using the formula: $\text{Relative Abundance} = \left( \frac{\text{Number of reads assigned to the taxon}}{\text{Total number of reads assigned to all taxa}} \right) \times 100$. Users can specify a threshold to group low-abundance taxa into an “Other” category. This results in a normalized profile for each sample, often more interpretable in comparative studies than raw counts.
 
 Each of these steps is exposed as a subcommand in the CLI, so advanced users can integrate `KrakenParser` into custom workflows. By default, running `KrakenParser --complete -i <reports_dir>/kreports` executes all steps sequentially and writes outputs to a structured directory tree with one subfolder per step. The outputs include one CSV file per rank (e.g. `counts_phylum.csv`, `counts_species.csv`) containing absolute read counts, and similarly named files under a `csv_relabund/` directory for percentages if requested. `KrakenParser` is designed for speed and memory efficiency: it processes text files line by line and uses `pandas` data frames for merging and calculations, which easily handle dozens of samples and tens of thousands of taxa on a standard workstation. Relying on `KrakenTools` for the initial conversion means the parsing logic benefits from well-tested scripts, while the unified interface adds convenience. The tool also includes built-in help for each subcommand (`-h`), which describes required inputs and options. `KrakenParser`’s design reflects practical needs observed in the metagenomics community: it was tested during the 2025 “Bioinformatics Bootcamp” hackathon organized by ITMO University, where teams analyzing metagenomic datasets were able to obtain meaningful results in a short time thanks to `KrakenParser`’s streamlined processing pipeline. By combining established methods with new automation, `KrakenParser` provides an efficient, reproducible, and user-friendly means to handle the otherwise tedious steps of post-classification data processing.
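
For readers of this diff, the name cleaning in step 4 and the relative abundance calculation in step 6 can be illustrated with a minimal sketch. This is not `KrakenParser`'s code: the function names `clean_taxon_name` and `relative_abundance`, the `other_threshold` parameter, and the sample counts are hypothetical, and plain dictionaries stand in for the `pandas` data frames the paper mentions.

```python
import re

# Hypothetical helpers for illustration only; not KrakenParser's actual API.
_RANK_PREFIX = re.compile(r"^[a-z]__")


def clean_taxon_name(raw):
    """Strip a rank prefix such as 's__' and replace underscores with spaces."""
    return _RANK_PREFIX.sub("", raw).replace("_", " ")


def relative_abundance(counts, other_threshold=0.0):
    """Convert one sample's read counts to percentages of the sample total.

    Taxa whose percentage is below other_threshold are summed into "Other".
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    result = {}
    other = 0.0
    for taxon, reads in counts.items():
        pct = reads * 100 / total
        if pct < other_threshold:
            other += pct
        else:
            result[taxon] = pct
    if other > 0:
        result["Other"] = other
    return result


# Made-up counts for a single sample, keyed by prefixed taxon names.
sample = {
    "s__Escherichia_coli": 90,
    "s__Bacteroides_fragilis": 9,
    "s__Rare_taxon": 1,
}
cleaned = {clean_taxon_name(name): reads for name, reads in sample.items()}
profile = relative_abundance(cleaned, other_threshold=5.0)
print(profile)
```

With a 5% threshold, the single read of the rare taxon is folded into “Other”, while the two taxa at or above the threshold keep their own entries.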