"description": "# nf-cmgg/preprocessing\n\n[](https://github.com/codespaces/new/nf-cmgg/preprocessing)\n[](https://github.com/nf-cmgg/preprocessing/actions/workflows/nf-test.yml)\n[](https://github.com/nf-cmgg/preprocessing/actions/workflows/linting.yml)[](https://doi.org/10.5281/zenodo.XXXXXXX)\n[](https://www.nf-test.com)\n\n[](https://www.nextflow.io/)\n[](https://github.com/nf-core/tools/releases/tag/3.5.1)\n[](https://www.docker.com/)\n[](https://sylabs.io/docs/)\n[](https://cloud.seqera.io/launch?pipeline=https://github.com/nf-cmgg/preprocessing)\n\n## Introduction\n\n**nf-cmgg/preprocessing** is a bioinformatics pipeline that demultiplexes and aligns raw sequencing data.\nIt also performs basic QC and coverage analysis.\n\nThe pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker containers making installation trivial and results highly reproducible.\n\nSteps include:\n\n- Demultiplexing using [`BCLconvert`](https://emea.support.illumina.com/sequencing/sequencing_software/bcl-convert.html)\n- Run QC using [`MultiQC SAV`](https://github.com/MultiQC/MultiQC_SAV)\n- Read QC and trimming using [`fastp`](https://github.com/OpenGene/fastp) or [`falco`](https://github.com/smithlabcode/falco)\n- Alignment using either [`bwa`](https://github.com/lh3/bwa), [`bwa-mem2`](https://github.com/bwa-mem2/bwa-mem2), [`bowtie2`](https://github.com/BenLangmead/bowtie2), [`dragmap`](https://github.com/Illumina/DRAGMAP), [`snap`](https://github.com/amplab/snap) or [`strobe`](https://github.com/ksahlin/strobealign) for DNA-seq and [`STAR`](https://github.com/alexdobin/STAR) for RNA-seq\n- Duplicate marking using [`bamsormadup`](https://gitlab.com/german.tischler/biobambam2) or [`samtools markdup`](http://www.htslib.org/doc/samtools-markdup.html)\n- Coverage analysis using [`mosdepth`](https://github.com/brentp/mosdepth) and [`samtools coverage`](http://www.htslib.org/doc/samtools-coverage.html)\n- Alignment QC using [`samtools flagstat`](http://www.htslib.org/doc/samtools-flagstat.html), [`samtools stats`](http://www.htslib.org/doc/samtools-stats.html), [`samtools idxstats`](http://www.htslib.org/doc/samtools-idxstats.html) and [`picard CollectHsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectHsMetrics), [`picard CollectWgsMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectWgsMetrics), [`picard CollectMultipleMetrics`](https://broadinstitute.github.io/picard/command-line-overview.html#CollectMultipleMetrics)\n- QC aggregation using [`multiqc`](https://multiqc.info/)\n\n<picture>\n\n <source media=\"(prefers-color-scheme: dark)\" srcset=\"docs/images/metro_map_dark.svg\">\n <source media=\"(prefers-color-scheme: light)\" srcset=\"docs/images/metro_map_light.svg\">\n <img alt=\"Fallback image description\" src=\"docs/images/metro_map_light.svg\">\n</picture>\n\n## Usage\n\n> [!NOTE]\n> If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.\n\nThe full documentation can be found [here](docs/README.md)\n\nFirst, prepare a samplesheet with your input data. Check the [usage docs](docs/usage.md) for details on the required format and example files.\n\nNow, you can run the pipeline using:\n\n```bash\nnextflow run nf-cmgg/preprocessing \\\n -profile <docker/singularity/...> \\\n --igenomes_base /path/to/genomes \\\n --input samplesheet.<csv|yaml|json> \\\n --outdir <OUTDIR>\n```\n\n> [!WARNING]\n> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_;\n> see [docs](https://nf-co.re/usage/configuration#custom-configuration-files).\n\n## Development environment\n\nA [pixi](https://pixi.prefix.dev/latest/) development environment is available for this pipeline. Run the following command to install the environment:\n\n```\npixi install\n```\n\nThen run `pixi shell` to enter the environment and start developing.\n\n## Credits\n\nnf-cmgg/preprocessing was originally written by the CMGG ICT team.\n\n## Support\n\nThis pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).\n\n> **The nf-core framework for community-curated bioinformatics pipelines.**\n>\n> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.\n>\n> _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x).\n",
0 commit comments