Commit e6efa95

combine gtdb and user data
1 parent 0acf111 commit e6efa95

2 files changed

Lines changed: 3 additions & 3 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -39,7 +39,7 @@ Sensitive mode selects representative genomes based on high-ANI connectivity tha

Benchmarking results comparing the default --no-reassembly and --sensitive modes are available in [sensitive_mode.md](https://github.com/soedinglab/MAGmax/blob/main/sensitive.md).

- ## Generate a custom species-level database
+ ## Generate a custom species-level reference database
MAGmax provides a `customdb` subcommand to build a species-level non-redundant genome database by combining GTDB-Tk taxonomic classification with ANI-based dereplication. Bins confidently assigned to a known GTDB-Tk species are grouped by species and one representative is selected per species. Remaining unclassified bins are dereplicated by ANI clustering, enabling discovery of novel species. Isolate genomes can be prioritized as representatives using `--isolate-genomes`.

magmax customdb -g gtdbtk.summary.tsv -b <binsdir> -q quality_report.tsv -t 24
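
The grouping step described above can be illustrated with a minimal Python sketch. It is not MAGmax's implementation: the function names, the in-memory tuple layout, and the scoring rule (completeness minus five times contamination, a common dereplication heuristic) are assumptions made for illustration. It only relies on GTDB-style lineage strings ending in an `s__` rank, which is empty when no species is assigned.

```python
from collections import defaultdict


def species_of(lineage: str) -> str:
    """Return the species label from a GTDB-style lineage, or '' if unassigned."""
    last_rank = lineage.split(";")[-1].strip()
    return last_rank[3:] if last_rank.startswith("s__") else ""


def pick_representatives(bins, isolates=frozenset()):
    """Group bins by species and choose one representative per species.

    bins: iterable of (bin_id, lineage, completeness, contamination).
    isolates: hypothetical set of bin IDs to prioritize, mimicking the
    intent of --isolate-genomes.
    """
    by_species = defaultdict(list)
    unclassified = []
    for bin_id, lineage, completeness, contamination in bins:
        species = species_of(lineage)
        if species:
            by_species[species].append((bin_id, completeness, contamination))
        else:
            # No species label: left for ANI-based clustering instead.
            unclassified.append(bin_id)

    representatives = {}
    for species, members in by_species.items():
        # Assumed ranking: isolates first, then the quality score.
        best = max(members, key=lambda m: (m[0] in isolates, m[1] - 5 * m[2]))
        representatives[species] = best[0]
    return representatives, unclassified
```

Calling `pick_representatives(bins, isolates={"isolate_1"})` would make `isolate_1` win its species group regardless of score, while bins without a species label are returned separately for ANI clustering.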

generate_customdatabase.md

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
- ## Tutorial: generate custom species-level database
+ ## Tutorial: generate a custom species-level reference catalog

Genome dereplication is not always perfect due to inherent limitations of hierarchical clustering algorithms used in dereplication tools (dRep and Galah). Alternatively, taxonomic classification using GTDB-Tk followed by grouping genomes by taxonomy assignment is another option for dereplication, but it has limitations too: 1) ANI radius of under-represented species may be inaccurate, causing wrong taxonomy labeling; 2) novel species cannot be assigned. Combining dereplication and taxonomic classification can enhance the discovery of novel species with improved accuracy.


@@ -206,7 +206,7 @@ Remaining bins are clustered by pairwise ANI (default 95%, aligned fraction ≥
5. **The `unclassified_clusterrepresentatives_gtdbtkspecies_ani_connections.tsv` file is a diagnostic resource.** It lists novel-cluster representatives whose ANI to a known GTDB-Tk species representative meets or exceeds that species' ANI radius. This happens when unclassified cluster representatives have lower ANI to the GTDB reference species than the representatives selected from the user's input dataset.
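
The check behind this diagnostic file can be sketched in a few lines of Python. This is an illustrative assumption, not the code MAGmax runs: the function name, the tuple layout, and the fallback radius of 95% for species without a listed radius are all hypothetical.

```python
def flag_connections(ani_pairs, species_radius, default_radius=95.0):
    """Return novel-cluster representatives that fall inside a known species' ANI radius.

    ani_pairs: iterable of (novel_rep, species, ani_percent), where ani_percent
    is the ANI between the novel representative and that species' GTDB
    representative.
    species_radius: dict mapping species to its ANI radius in percent.
    default_radius: assumed fallback when a species has no listed radius.
    """
    flagged = []
    for novel_rep, species, ani in ani_pairs:
        radius = species_radius.get(species, default_radius)
        if ani >= radius:  # "meets or exceeds that species' ANI radius"
            flagged.append((novel_rep, species, ani, radius))
    return flagged
```

Each flagged row points to a novel cluster that may belong to a known species and is worth inspecting manually.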

- ## Building a unified species-level database: integrating MAGmax dereplication results with GTDB reference genomes
+ ## Building a unified species-level genome catalog: integrating MAGmax dereplication results with GTDB reference genomes

The `unifygtdb.sh` script combines `magmax customdb` output with GTDB reference genomes. This is useful when users want to create a complete species-level genome reference database that includes all known species as well as the unknown species covered in the input data.