@@ -20,7 +20,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
2020 - [ bedtools] ( #bedtools )
2121 - [ bedGraphToBigWig] ( #bedGraphToBigWig )
2222- [ Extensive QC of alignments] ( #Alignment-quality-control )
23- - [ samtools] ( #samtools-flagstats )
23+ - [ samtools] ( #samtools-flagstat )
2424 - [ cramino] ( #cramino )
2525 - [ alfred] ( #alfred )
2626 - [ ngs-bits] ( #ngs-bits )
@@ -38,7 +38,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
3838- [ MultiQC] ( #multiqc ) - Aggregate report describing results and QC from the whole pipeline
3939- [ Pipeline information] ( #pipeline-information ) - Report metrics generated during the workflow execution
4040
41- ## FASTQ- quality- control
41+ ## FASTQ quality control
4242
4343### NANOQ
4444
@@ -105,9 +105,152 @@ Top ranking read lengths (bp)
105105
106106</details >
107107
108- [ NANOQ] ( https://github.com/esteinig/nanoq ) provides general quality statistics
109- about the nanopore sequence reads. It outputs the statistics in both verbose and
110- minimal reports, which can be formatted in ` json ` format.
108+ [ SEQUALI] ( https://github.com/rhpvorderman/sequali ) provides general quality statistics
109+ about the sequence reads, along with several other features, including
110+ overrepresentation analysis and duplication rate estimation. It outputs the
111+ statistics in both ??
112+
113+ ## Reference genome mapping
114+
115+ ### minimap2
116+
117+ <details markdown =" 1 " >
118+ <summary >Output files</summary >
119+
120+ - ` multiqc/ `
121+ - ` multiqc_report.html ` : a standalone HTML file that can be viewed in your web browser.
122+ - ` multiqc_data/ ` : directory containing parsed statistics from the different tools used in the pipeline.
123+ - ` multiqc_plots/ ` : directory containing static images from the report in various formats.
124+
125+ </details >
126+
127+ [ minimap2] ( https://github.com/lh3/minimap2 ) is perhaps the most popular
128+ long-read sequence aligner. Here it aligns the sequence reads to the reference
129+ genome/transcriptome provided by the user. In the developers' own words:
130+ > Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA
131+ sequences against a large reference database. Typical use cases include: (1)
132+ mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2)
133+ finding overlaps between long reads with error rate up to ~ 15%; (3)
134+ splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads
135+ against a reference genome; (4) aligning Illumina single- or paired-end reads;
136+ (5) assembly-to-assembly alignment; (6) full-genome alignment between two
137+ closely related species with divergence below ~ 15%.
138+
139+ ### samtools sort index
140+
141+ <details markdown =" 1 " >
142+ <summary >Output files</summary >
143+
144+ - ` multiqc/ `
145+ - ` multiqc_report.html ` : a standalone HTML file that can be viewed in your web browser.
146+ - ` multiqc_data/ ` : directory containing parsed statistics from the different tools used in the pipeline.
147+ - ` multiqc_plots/ ` : directory containing static images from the report in various formats.
148+
149+ </details >
150+
151+ [ samtools] ( http://www.htslib.org/doc/#manual-pages ) is a multipurpose
152+ toolkit for working with SAM/BAM files. Here it sorts the output from
153+ minimap2 (SAM format), writes it in compressed BAM format, and then indexes
154+ the resulting file.
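
The align, sort, and index steps described above can be sketched as a small Python helper that only builds the command lines. This is an illustrative sketch, not the pipeline's actual module code: the function name, the file naming scheme, the `map-ont` preset, and the thread count are all assumptions, and a splice-aware preset may be more appropriate for RNA reads.

```python
import shlex


def mapping_commands(reference, reads, prefix, preset="map-ont", threads=4):
    """Build argv lists for align -> sort -> index (hypothetical helper)."""
    sam = f"{prefix}.sam"          # assumed naming scheme
    bam = f"{prefix}.sorted.bam"   # assumed naming scheme
    return [
        # -a: emit SAM, -x: preset, -t: threads, -o: output file
        ["minimap2", "-a", "-x", preset, "-t", str(threads), "-o", sam, reference, reads],
        # coordinate-sort the SAM and write compressed BAM
        ["samtools", "sort", "-@", str(threads), "-o", bam, sam],
        # create the index next to the sorted BAM
        ["samtools", "index", bam],
    ]


if __name__ == "__main__":
    for cmd in mapping_commands("genome.fa", "reads.fastq.gz", "sample1"):
        print(shlex.join(cmd))
```

Returning argv lists rather than running the tools keeps the sketch easy to inspect; each list could be passed to `subprocess.run(cmd, check=True)` on a machine where minimap2 and samtools are installed.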
155+
156+ ## Create files to visualise mapping
157+
158+ ### bedtools
159+
160+ <details markdown =" 1 " >
161+ <summary >Output files</summary >
162+
163+ - ` multiqc/ `
164+ - ` multiqc_report.html ` : a standalone HTML file that can be viewed in your web browser.
165+ - ` multiqc_data/ ` : directory containing parsed statistics from the different tools used in the pipeline.
166+ - ` multiqc_plots/ ` : directory containing static images from the report in various formats.
167+
168+ </details >
169+
170+ [ bedtools] ( https://github.com/arq5x/bedtools2 ) is a multipurpose
171+ toolkit for working with tab-separated genomic formats such as GTF/GFF/BED, but
172+ also SAM/BAM/CRAM files. Here it is used to convert the mapped BAM file to bedGraph
173+ format, in preparation for conversion to BigWig.
174+
175+ ### bedGraphToBigWig
176+
177+ <details markdown =" 1 " >
178+ <summary >Output files</summary >
179+
180+ - ` multiqc/ `
181+ - ` multiqc_report.html ` : a standalone HTML file that can be viewed in your web browser.
182+ - ` multiqc_data/ ` : directory containing parsed statistics from the different tools used in the pipeline.
183+ - ` multiqc_plots/ ` : directory containing static images from the report in various formats.
184+
185+ </details >
186+
187+ [ bedGraphToBigWig] ( https://hgdownload.soe.ucsc.edu/admin/exe/ ) is a
188+ single-purpose tool from the broader UCSC software suite. As its name
189+ suggests, it converts a bedGraph file to a BigWig file and nothing more. Once
190+ created, the BigWig files can be loaded into a genome browser such as IGV,
191+ allowing the mapping to be visualised in a
192+ lightweight way.
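
As a rough illustration of the BAM to bedGraph to BigWig conversion, the sketch below builds the two shell commands as strings. The helper name, the output file names, and the intermediate sort are assumptions rather than the pipeline's actual implementation; `chrom_sizes` stands for a two-column chromosome sizes file, which bedGraphToBigWig requires.

```python
import shlex


def bigwig_commands(bam, chrom_sizes, prefix):
    """Build shell commands for BAM -> bedGraph -> BigWig (hypothetical helper)."""
    q = shlex.quote
    bedgraph = f"{prefix}.bedGraph"  # assumed naming scheme
    bigwig = f"{prefix}.bw"          # assumed naming scheme
    return [
        # genomecov -bg writes bedGraph to stdout; sort it by chromosome and
        # start position, which bedGraphToBigWig expects
        f"bedtools genomecov -ibam {q(bam)} -bg"
        f" | LC_ALL=C sort -k1,1 -k2,2n > {q(bedgraph)}",
        # arguments: input bedGraph, chromosome sizes, output BigWig
        f"bedGraphToBigWig {q(bedgraph)} {q(chrom_sizes)} {q(bigwig)}",
    ]


if __name__ == "__main__":
    for line in bigwig_commands("sample1.sorted.bam", "chrom.sizes", "sample1"):
        print(line)
```

For spliced RNA alignments, coverage across introns may need different handling in `bedtools genomecov`; that choice is left out of this sketch.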
193+
194+ ## Alignment quality control
195+
196+ ### samtools flagstat
197+
198+ [ samtools] ( http://www.htslib.org/doc/#manual-pages ) flagstat provides summary
199+ statistics on the mapped BAM file. Specifically, it counts the number of
200+ alignments for each FLAG type.
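
The default flagstat text report has one line per category in the form `passed + failed label`. A minimal parsing sketch follows; the function name is hypothetical and the sample numbers are made up purely for illustration, not taken from any real run.

```python
import re

# "<QC-passed> + <QC-failed> <label> (optional details)"
FLAGSTAT_LINE = re.compile(r"^(\d+) \+ (\d+) (.+?)(?: \(.*\))?$")


def parse_flagstat(text):
    """Map each category label to a (qc_passed, qc_failed) tuple."""
    counts = {}
    for line in text.splitlines():
        match = FLAGSTAT_LINE.match(line.strip())
        if match:
            passed, failed, label = match.groups()
            counts[label] = (int(passed), int(failed))
    return counts


# Illustrative input only; these counts are invented for the example.
SAMPLE = """\
1000 + 0 in total (QC-passed reads + QC-failed reads)
50 + 0 secondary
20 + 0 supplementary
900 + 0 mapped (90.00% : N/A)
"""

if __name__ == "__main__":
    for label, (passed, failed) in parse_flagstat(SAMPLE).items():
        print(f"{label}: {passed} passed, {failed} failed")
```

In practice MultiQC already parses this report for the aggregate summary, so a helper like this is only useful for ad hoc checks.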
201+
202+ ### cramino
203+
204+ [ cramino] ( https://github.com/wdecoster/cramino ) is a tool for quick quality assessment of CRAM and BAM files, intended for long-read sequencing.
205+
206+ ```
207+ File name example.cram
208+ Number of reads 14108020
209+ % from total reads 83.45
210+ Yield [Gb] 139.91
211+ N50 17447
212+ Median length 6743.00
213+ Mean length 9917
214+ Median identity 94.27
215+ Mean identity 92.53
216+ Path alignment/example.cram
217+ Creation time 09/09/2022 10:53:36
218+ ```
219+
220+ ### alfred
221+
222+ [ alfred] ( https://www.gear-genomics.com/docs/alfred/cli/ ) computes various
223+ alignment metrics and summary statistics by read group.
224+
225+ ### ngs-bits
226+
227+ [ ngs-bits
228+ mappingQC] ( https://github.com/imgag/ngs-bits/blob/master/doc/tools/MappingQC/index.md )
229+ provides one more technique for quality control of the mapped BAM files. Its
230+ advantage is that its output is compatible with
231+ [ MultiQC] ( https://github.com/MultiQC/MultiQC/blob/main/docs/markdown/modules/ngsbits.md ) .
232+
233+
234+ ## Transcriptome reconstruction
235+
236+ ### FLAIR
237+
238+ [ FLAIR] ( https://github.com/BrooksLabUCSC/flair ) ** F** ull ** L** ength
239+ ** A** lternative ** I** soform analysis of ** R** NA is used for the correction,
240+ isoform definition, and alternative splicing analysis of noisy reads. FLAIR has
241+ primarily been used for nanopore cDNA, native RNA, and PacBio sequencing reads.
242+ FLAIR can be run with or without read correction, making it amenable to
243+ sensitive sample types, such as those coming from cancer, where apparent errors
244+ may instead be putative variants which should not be corrected.
245+
246+ ![ FLAIR - example schematic] ( images/flair_workflow_compartmentalized.png )
247+
248+ ### bambu
249+
250+ ### IsoQuant
251+
252+ ### StringTie
253+
111254
112255
113256<details markdown =" 1 " >