@@ -107,15 +107,9 @@ X9,0.7232472324723247,0.7352941176470589,...,0.8066914498141264,0.0
107107| ![ combined_white] ( https://github.com/user-attachments/assets/48b3f6e3-6dd5-4298-a793-23dcd549e90c ) | ![ kpclust] ( https://github.com/user-attachments/assets/98a4d540-7c43-4802-8f77-277a5637a7a1 ) |
108108
109109## Quick Start (Full Pipeline)
110- To run the full pipeline, use the following command:
111- ``` bash
112- KrakenParser --complete -i data/kreports -o results/
113- # Having troubles? Run KrakenParser --complete -h
114- ```
115110
116- For ** reproducible** β-diversity (rarefaction is stochastic by default):
117111``` bash
118- KrakenParser -i data/kreports -o results/ -s 42
112+ KrakenParser -i data/kreports -o results/
119113```
120114
121115This will:
@@ -127,147 +121,165 @@ This will:
1271216 . Calculate relative abundance
1281227 . Calculate α & β-diversities
129123
130- ## Installation
124+ > [ !TIP]
125+ > After the pipeline finishes, the output window will remind you about calibrating
126+ > rarefaction depth for β-diversity and re-running relative abundance normalization
127+ > before visualization — with ready-to-paste example commands tailored to your output paths.
128+
129+ ### Full help output
131130
132131```
133- pip install krakenparser
132+ usage: KrakenParser [-h] [-i INPUT] [-o OUTPUT] [--viruses] [--keep-human]
133+ [-V] [-d DEPTH] [-s SEED] [--overwrite]
134+ [--step {mpa,combine,split,process,csv,relabund,diversity}]
135+
136+ KrakenParser: Convert Kraken2 Reports to CSV.
137+
138+ options:
139+ -h, --help show this help message and exit
140+
141+ Core Arguments:
142+ -i, --input INPUT Directory containing Kraken2 report files
143+ -o, --output OUTPUT Output directory (default: parent of input)
144+ --viruses Extract only VIRUSES domain taxa in the pipeline
145+ --keep-human Do not filter human-related taxa
146+ -V, --version show program's version number and exit
147+
148+ Pipeline Options (Full Run):
149+ -d, --depth DEPTH Rarefaction depth for β-diversity (default: 1000)
150+ -s, --seed SEED Random seed for reproducible rarefaction (default: random)
151+ --overwrite Overwrite the output directory if it already exists
152+
153+ Advanced (Step-by-step control):
154+ --step {mpa,combine,split,process,csv,relabund,diversity}
155+ Run only a specific part of the pipeline.
156+ Type 'krakenparser --step <name> -h' for more.
134157```
135158
136- ## Before Visualization: Grouping Low-Abundance Taxa
137-
138- The full pipeline automatically calculates relative abundance. Before passing data to visualization, it is strongly recommended to re-run ` --relabund ` with the ` -O ` flag — this collapses all taxa below the chosen threshold into a single ** "Other"** group, producing much cleaner and more readable plots.
159+ ## Installation
139160
140- ``` bash
141- KrakenParser --relabund -i data/counts/counts_species.csv -o data/rel_abund/ra_species.csv -O 4
142161```
143-
144- This groups every taxon with relative abundance ** < 4 %** into ` Other (<4.0%) ` . Adjust the threshold to your data.
145-
146- > ** Note:** The pipeline-generated ` rel_abund/ra_*.csv ` files (no ` -O ` ) preserve the full unfiltered data — use them for statistical analysis. Use the ` -O ` variant specifically for visualization.
162+ pip install krakenparser
163+ ```
147164
148165---
149166
150167<details >
151168<summary ><b >Using Individual Modules (Advanced)</b ></summary >
152169<br >
153170
154- Each step of the pipeline can also be run individually. This is useful for re-running a single step, debugging, or integrating KrakenParser into a custom workflow.
171+ Each step of the pipeline can be run individually via `--step`. This is useful for re-running a single step, debugging, or integrating KrakenParser into a custom workflow. Run `krakenparser --step <name> -h` to see the full argument list for any step.
155172
156173### ** Step 1: Convert Kraken2 Reports to MPA Format**
157174``` bash
158175# Batch mode (directory)
159- KrakenParser --kreport2mpa -i data/kreports -o data/intermediate/mpa
176+ KrakenParser --step mpa -i data/kreports -o data/intermediate/mpa
160177# Single file
161- KrakenParser --kreport2mpa -r data/kreports/sample.kreport -o data/intermediate/mpa/sample.MPA.TXT
162- # Having troubles? Run KrakenParser --kreport2mpa -h
178+ KrakenParser --step mpa -r data/kreports/sample.kreport -o data/intermediate/mpa/sample.MPA.TXT
163179```
164180Converts Kraken2 `.kreport` files into **MPA format**.
165181
166182### ** Step 2: Combine MPA Files**
167183``` bash
168- KrakenParser --combine_mpa -i data/intermediate/mpa/* -o data/intermediate/COMBINED.txt
169- # Having troubles? Run KrakenParser --combine_mpa -h
184+ KrakenParser --step combine -i data/intermediate/mpa/* -o data/intermediate/COMBINED.txt
170185```
171186Merges multiple MPA files into a single combined table.
172187
173188### ** Step 3: Extract Taxonomic Levels**
174189``` bash
175- KrakenParser --deconstruct -i data/intermediate/COMBINED.txt -o data/intermediate
176- # Having troubles? Run KrakenParser --deconstruct -h
190+ KrakenParser --step split -i data/intermediate/COMBINED.txt -o data/intermediate
177191```
178192
179193By default, human-related taxa (Homo sapiens, Hominidae, Primates, Mammalia, Chordata) are removed. To keep them:
180194``` bash
181- KrakenParser --deconstruct -i data/intermediate/COMBINED.txt -o data/intermediate --keep-human
195+ KrakenParser --step split -i data/intermediate/COMBINED.txt -o data/intermediate --keep-human
182196```
183197
184- To inspect the ** Viruses** domain separately :
198+ To inspect the ** Viruses** domain only :
185199``` bash
186- KrakenParser --deconstruct_viruses -i data/intermediate/COMBINED.txt -o data/counts_viruses
187- # Having troubles? Run KrakenParser --deconstruct_viruses -h
200+ KrakenParser --step split -i data/intermediate/COMBINED.txt -o data/counts_viruses --viruses-only
188201```
189202
190203### ** Step 4: Process Extracted Taxonomic Data**
191204``` bash
192- KrakenParser --process -i data/intermediate/COMBINED.txt -o data/intermediate/txt/counts_phylum.txt
193- # Having troubles? Run KrakenParser --process -h
205+ KrakenParser --step process -i data/intermediate/COMBINED.txt -o data/intermediate/txt/counts_phylum.txt
194206```
195207
196- Repeat on other 5 taxonomical levels (class, order, family, genus, species) or wrap up ` KrakenParser -- process` in a loop.
208+ Repeat for the other 5 taxonomic levels (class, order, family, genus, species), or wrap `--step process` in a loop.
197209
198210Cleans up taxonomic names: removes prefixes (`s__`, `g__`, etc.) and replaces underscores with spaces.
199211
200212### ** Step 5: Convert TXT to CSV**
201213``` bash
202- KrakenParser --txt2csv -i data/intermediate/txt/counts_phylum.txt -o data/counts/counts_phylum.csv
203- # Having troubles? Run KrakenParser --txt2csv -h
214+ KrakenParser --step csv -i data/intermediate/txt/counts_phylum.txt -o data/counts/counts_phylum.csv
204215```
205216Repeat for the other 5 taxonomic levels, or wrap the command in a loop. This step transposes the data so that sample names become rows.
206217
207218### ** Step 6: Calculate Relative Abundance**
208219``` bash
209- KrakenParser --relabund -i data/counts/counts_phylum.csv -o data/rel_abund/ra_phylum.csv
210- # Having troubles? Run KrakenParser --relabund -h
220+ KrakenParser --step relabund -i data/counts/counts_phylum.csv -o data/rel_abund/ra_phylum.csv
211221```
212222Repeat for the other 5 taxonomic levels, or wrap the command in a loop.
213223
214224With "Other" grouping:
215225``` bash
216- KrakenParser --relabund -i data/counts/counts_phylum.csv -o data/rel_abund/ra_phylum.csv -O 3.5
226+ KrakenParser --step relabund -i data/counts/counts_phylum.csv -o data/rel_abund/ra_phylum.csv -O 3.5
217227```
218228Groups all taxa with abundance < 3.5 % into ` Other (<3.5%) ` .
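
Steps 4 to 6 each ask you to repeat the command for every taxonomic level. The sketch below shows one way to wrap them in a loop. It uses only the `--step process`, `--step csv` and `--step relabund` invocations shown above. It assumes every level follows the same `counts_<level>` / `ra_<level>` file naming as the phylum examples. The `run` and `process_all_levels` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration, not part of KrakenParser.

```shell
#!/bin/sh
# Sketch: run Steps 4-6 for all six taxonomic levels.
# Assumption: every level uses the counts_<level> / ra_<level> naming
# shown for phylum; adjust the paths if your layout differs.
set -eu

LEVELS="phylum class order family genus species"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

process_all_levels() {
  for level in $LEVELS; do
    # Step 4: clean up taxon names for this level
    run KrakenParser --step process \
      -i data/intermediate/COMBINED.txt \
      -o "data/intermediate/txt/counts_${level}.txt"
    # Step 5: convert the processed TXT to CSV
    run KrakenParser --step csv \
      -i "data/intermediate/txt/counts_${level}.txt" \
      -o "data/counts/counts_${level}.csv"
    # Step 6: calculate relative abundance
    run KrakenParser --step relabund \
      -i "data/counts/counts_${level}.csv" \
      -o "data/rel_abund/ra_${level}.csv"
  done
}

process_all_levels
```

By default the script only prints the 18 commands so you can review them. Run it with `DRY_RUN=0` to execute them.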
219229
220230### ** Step 7: Calculate α & β-Diversities**
221231``` bash
222- KrakenParser --diversity -i data/counts/counts_species.csv -o data/diversity
223- # Having troubles? Run KrakenParser --diversity -h
232+ KrakenParser --step diversity -i data/counts/counts_species.csv -o data/diversity
224233```
225234
226235With a custom rarefaction depth:
227236``` bash
228- KrakenParser --diversity -i data/counts/counts_species.csv -o data/diversity -d 750
237+ KrakenParser --step diversity -i data/counts/counts_species.csv -o data/diversity -d 750
229238```
230239
231- For reproducible results (rarefaction uses random subsampling — fix the seed to get the same matrix every run):
240+ For reproducible results (fix the seed to get the same matrix every run):
232241``` bash
233- KrakenParser --diversity -i data/counts/counts_species.csv -o data/diversity -s 42
242+ KrakenParser --step diversity -i data/counts/counts_species.csv -o data/diversity -s 42
234243```
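
To check that a fixed seed really gives the same result, you can run the diversity step twice and compare the β-diversity matrices byte for byte. The helper below is a minimal sketch. It assumes `--step diversity -o DIR` writes `beta_div_bray.csv` and `beta_div_jaccard.csv` into `DIR`, as the file names in the output tree suggest. The function name `same_beta_matrices` and the directories `run_a` and `run_b` are hypothetical.

```shell
# Sketch: succeed only if both β-diversity matrices are identical
# in the two diversity output directories passed as $1 and $2.
# Assumption: each directory contains beta_div_bray.csv and
# beta_div_jaccard.csv (file names taken from the output tree).
same_beta_matrices() {
  for name in beta_div_bray.csv beta_div_jaccard.csv; do
    # cmp -s is silent; it fails if the files differ or one is missing
    cmp -s "$1/$name" "$2/$name" || return 1
  done
  return 0
}

# Usage (requires KrakenParser and real data):
#   KrakenParser --step diversity -i data/counts/counts_species.csv -o run_a -s 42
#   KrakenParser --step diversity -i data/counts/counts_species.csv -o run_b -s 42
#   same_beta_matrices run_a run_b && echo "identical"
```

If the two seeded runs differ, check that both used the same rarefaction depth (`-d`) as well as the same seed.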
235244
236245---
237246
238247## Arguments Breakdown
239248
240- ### ** --complete** (Full Pipeline)
241- - Requires ` -i ` : path to the Kraken2 reports directory (e.g., ` data/kreports ` ).
242- - Optional ` -o ` : output directory (default: parent of ` -i ` ).
243- - Optional ` --keep-human ` : retain human-related taxa (default: filtered out).
244- - Optional ` -s INT ` : random seed for reproducible β-diversity rarefaction (default: random).
249+ ### ** Full Pipeline** (` -i ` )
250+ - ` -i / --input ` : path to the Kraken2 reports directory (e.g., ` data/kreports ` ). Triggers the full pipeline.
251+ - ` -o / --output ` : output directory (default: parent of ` -i ` ).
252+ - ` --viruses ` : extract only Viruses domain taxa throughout the pipeline.
253+ - ` --keep-human ` : retain human-related taxa (default: filtered out).
254+ - ` -d INT / --depth ` : rarefaction depth for β-diversity (default: 1000).
255+ - ` -s INT / --seed ` : random seed for reproducible β-diversity rarefaction (default: random).
256+ - ` --overwrite ` : overwrite the output directory if it already exists.
245257
246- ### ** --kreport2mpa ** (Step 1)
258+ ### ** --step mpa ** (Step 1)
247259- Batch mode: ` -i DIR -o DIR ` — converts all files in a directory.
248260- Single-file mode: ` -r FILE -o FILE ` .
249261
250- ### ** --combine_mpa ** (Step 2)
262+ ### ** --step combine ** (Step 2)
251263- ` -i FILE [FILE ...] ` : one or more MPA files.
252264- ` -o FILE ` : output merged table.
253265
254- ### ** --deconstruct ** & ** --deconstruct_viruses ** (Step 3)
266+ ### ** --step split ** (Step 3)
255267- Extracts ** phylum, class, order, family, genus, species** into separate text files.
256- - ` --deconstruct ` removes human-related reads by default; use ` --keep-human ` to retain them.
257- - ` --deconstruct_viruses ` extracts only the Viruses domain.
268+ - Removes human-related reads by default; use ` --keep-human ` to retain them.
269+ - Use ` --viruses-only ` to extract only the Viruses domain.
258270
259- ### ** --process** (Step 4)
271+ ### ** --step process** (Step 4)
260272- Removes prefixes (` s__ ` , ` g__ ` , etc.), replaces underscores with spaces.
261273- ` -i ` : COMBINED.txt (source for sample-name header); ` -o ` : target txt file.
262274
263- ### ** --txt2csv ** (Step 5)
275+ ### ** --step csv ** (Step 5)
264276- Transposes a processed txt file into a CSV with sample names as rows.
265277
266- ### ** --relabund** (Step 6)
278+ ### ** --step relabund** (Step 6)
267279- Calculates relative abundance from a total-counts CSV.
268280- ` -O FLOAT ` : group taxa below FLOAT % into ` Other (<FLOAT%) ` .
269281
270- ### ** --diversity** (Step 7)
282+ ### ** --step diversity** (Step 7)
271283- Shannon, Pielou & Chao1 for α-diversity.
272284- Bray-Curtis & Jaccard for β-diversity.
273285- ` -d INT ` : rarefaction depth for β-diversity (default: 1000).
@@ -293,16 +305,17 @@ results/
293305│ ├─ alpha_div.csv
294306│ ├─ beta_div_bray.csv
295307│ └─ beta_div_jaccard.csv
296- └─ intermediate/ # Intermediate files
297- ├─ mpa/ # Converted MPA files
298- │ ├─ {sample}.txt
299- │ ├─ ...
300- ├─ COMBINED.txt # Merged MPA table
301- └─ txt/ # Extracted taxonomic levels in TXT
302- ├─ counts_species.txt
303- ├─ counts_genus.txt
304- ├─ ...
305- └─ counts_phylum.txt
308+ ├─ intermediate/ # Intermediate files
309+ │ ├─ mpa/ # Converted MPA files
310+ │ │ ├─ {sample}.txt
311+ │ │ ├─ ...
312+ │ ├─ COMBINED.txt # Merged MPA table
313+ │ └─ txt/ # Extracted taxonomic levels in TXT
314+ │ ├─ counts_species.txt
315+ │ ├─ counts_genus.txt
316+ │ ├─ ...
317+ │ └─ counts_phylum.txt
318+ └─ krakenparser.log # Pipeline execution logs
306319```
307320
308321## Conclusion