
Commit 29c816f

Merge pull request #1999 from dialvarezs/mag
Update GUNC test db to use official one
2 parents 783def9 + 5e9e853 commit 29c816f

3 files changed

Lines changed: 6 additions & 24 deletions


README.md

Lines changed: 6 additions & 24 deletions
@@ -79,6 +79,12 @@ We have uploaded a copy of the official [GTDB-Tk mock database](https://data.gtd
 The database is available as a gzipped tarball at:
 - `databases/gtdbtk/gtdbtk_mockup_20250422.tar.gz`

+### GUNC
+
+The GUNC test database is provided by the GUNC team as a minimal test set for CI/CD pipelines.
+
+It was introduced in GUNC v1.1.0 and can be downloaded by running `gunc download_db --db test_data`. It is also available on [Zenodo](https://zenodo.org/records/19631420).
+
 ## Broken samplesheets

 For testing input validation, the `samplesheets` directory contains the `broken/` subdirectory containing samplesheets with errors that should be caught by the pipeline.
@@ -96,30 +102,6 @@ For testing input validation, the `samplesheets` directory contains the `broken/
 - `samplesheets/broken/samplesheet_nonunique_sample_run_combination.csv`: has invalid duplicate sample-run combinations
 - `samplesheets/broken/samplesheet_spaces_in_name.csv`: incorrect sample name with spaces

-### GUNC
-
-For GUNC we created a mock database with the following commands:
-
-```bash
-curl "https://www.ncbi.nlm.nih.gov/sviewer/viewer.cgi?tool=portal&save=file&log$=seqview&db=nuccore&report=fasta_cds_aa&id=1992822979&extrafeat=null&conwithfeat=on&hide-cdd=on&ncbi_phid=CE8C15326D6BB8C10000000006490560" -o sequence.fasta
-diamond makedb --in sequence.fasta -d gunc-mock
-```
-
-Which resulted in the `gunc-mock.dmnd` database file that can be used this way:
-```bash
-gunc run --db_file gunc-mock.dmnd -i bin.fa.gz
-```
-
-The only caveat is that the output is mostly NaNs, but it produces an output good enough to test if the tool is running.
-```tsv
-genome n_genes_called n_genes_mapped n_contigs taxonomic_level proportion_genes_retained_in_major_clades genes_retained_index clade_separation_score contamination_portion n_effective_surplus_clades mean_hit_identity reference_representation_score pass.GUNC
-MEGAHIT-MetaBAT2-test_minigut.1 736 663 76 kingdom nan nan nan nan nan nan nan nan
-```
-
-Versions used for the db creation / testing
-- diamond: 2.0.4
-- GUNC: 1.0.6
-
 ## Support

 For further information or help, don't hesitate to get in touch on our [Slack organisation](https://nf-co.re/join/slack) (a tool for instant messaging).

databases/gunc/ci_test.dmnd

4.21 KB
Binary file not shown.

databases/gunc/gunc-mock.dmnd

-2.26 MB
Binary file not shown.
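The new README text presents the GUNC test database as a CI/CD fixture, and the removed notes make a related point about the old mock database: even when most score columns are `nan`, the output is good enough to show that GUNC ran. A minimal shell sketch of such a smoke check is below. It assumes a tab-separated GUNC output file whose header includes an `n_genes_called` column (as in the removed example); the function name `check_gunc_output` is hypothetical and is not part of GUNC or the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical smoke check for a GUNC output TSV (not part of GUNC or the pipeline).
# Passes when the file has at least one genome row and every row reports a
# positive integer in the n_genes_called column; score columns such as
# clade_separation_score may be "nan" and are ignored.
set -euo pipefail

check_gunc_output() {
    local tsv="$1"
    awk -F '\t' '
        NR == 1 {
            # Find the n_genes_called column by header name, not by position.
            for (i = 1; i <= NF; i++) if ($i == "n_genes_called") col = i
            if (!col) { err = "missing n_genes_called column"; exit }
            next
        }
        {
            rows++
            # Require a positive integer gene count; "nan", "0", or empty fail.
            if ($col !~ /^[0-9]+$/ || $col + 0 == 0) {
                print "no genes called for " $1 > "/dev/stderr"
                bad = 1
            }
        }
        END {
            if (err == "" && rows == 0) err = "no genome rows"
            if (err != "") { print err > "/dev/stderr"; exit 1 }
            exit bad
        }
    ' "$tsv"
}
```

For example, `check_gunc_output path/to/gunc_output.tsv` (hypothetical path) exits non-zero when the file has no genome rows, lacks the `n_genes_called` column, or reports zero called genes for any genome, while tolerating `nan` scores. Matching the column by name keeps the check independent of column order.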
