
Commit a060766

committed
more docs outputs.md
1 parent 1257951 commit a060766

2 files changed

Lines changed: 148 additions & 5 deletions


docs/output.md

Lines changed: 148 additions & 5 deletions
@@ -20,7 +20,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
2020
- [bedtools](#bedtools)
2121
- [bedGraphToBigWig](#bedGraphToBigWig)
2222
- [Extensive QC of alignments](#Alignment-quality-control)
23-
- [samtools](#samtools-flagstats)
23+
- [samtools](#samtools-flagstat)
2424
- [cramino](#cramino)
2525
- [alfred](#alfred)
2626
- [ngs-bits](#ngs-bits)
@@ -38,7 +38,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
3838
- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
3939
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
4040

41-
## FASTQ-quality-control
41+
## FASTQ quality control
4242

4343
### NANOQ
4444

@@ -105,9 +105,152 @@ Top ranking read lengths (bp)
105105

106106
</details>
107107

108-
[NANOQ](https://github.com/esteinig/nanoq) provides general quality statistics
109-
about the nanopore sequence reads. It outputs the statistics in both verbose and
110-
minimal reports, which can be formatted in `json` format.
108+
[SEQUALI](https://github.com/rhpvorderman/sequali) provides general quality statistics
109+
about the sequence reads, along with several other features, including
110+
overrepresentation analysis and duplication rate estimation. It outputs the
111+
statistics in both ??
112+
113+
## Reference genome mapping
114+
115+
### minimap2
116+
117+
<details markdown="1">
118+
<summary>Output files</summary>
119+
120+
- `multiqc/`
121+
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
122+
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
123+
- `multiqc_plots/`: directory containing static images from the report in various formats.
124+
125+
</details>
126+
127+
[minimap2](https://github.com/lh3/minimap2) is perhaps the most popular
128+
long-read sequence aligner. In general, it aligns the sequence reads to the reference
129+
genome/transcriptome provided by the user. Quoting the developers:
130+
> Minimap2 is a versatile sequence alignment program that aligns DNA or mRNA
131+
sequences against a large reference database. Typical use cases include: (1)
132+
mapping PacBio or Oxford Nanopore genomic reads to the human genome; (2)
133+
finding overlaps between long reads with error rate up to ~15%; (3)
134+
splice-aware alignment of PacBio Iso-Seq or Nanopore cDNA or Direct RNA reads
135+
against a reference genome; (4) aligning Illumina single- or paired-end reads;
136+
(5) assembly-to-assembly alignment; (6) full-genome alignment between two
137+
closely related species with divergence below ~15%.
138+
139+
### samtools sort index
140+
141+
<details markdown="1">
142+
<summary>Output files</summary>
143+
144+
- `multiqc/`
145+
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
146+
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
147+
- `multiqc_plots/`: directory containing static images from the report in various formats.
148+
149+
</details>
150+
151+
[samtools](http://www.htslib.org/doc/#manual-pages) is a multipurpose
152+
toolkit for working with SAM/BAM files. It is used to sort the output from
153+
minimap2 (SAM format) and output it in compressed BAM format, and then index
154+
this file.
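
The sort-then-index step described above can be sketched as two `samtools` calls. This is a minimal illustration only, not the pipeline's actual module: the function names, the `.sorted.bam` naming, and the default thread count are assumptions made for the example.

```python
import subprocess
from pathlib import Path


def sort_and_index_commands(sam_path, threads=4):
    """Build the two samtools commands: coordinate-sort to BAM, then index.

    The output name (<input stem>.sorted.bam) is a hypothetical convention.
    """
    sam = Path(sam_path)
    bam = sam.with_suffix(".sorted.bam")
    # `samtools sort` infers BAM output from the .bam extension given to -o.
    sort_cmd = ["samtools", "sort", "-@", str(threads), "-o", str(bam), str(sam)]
    # `samtools index` writes an index file next to the sorted BAM.
    index_cmd = ["samtools", "index", str(bam)]
    return [sort_cmd, index_cmd]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Calling `run_commands(sort_and_index_commands("sample.sam"))` would require `samtools` on the `PATH`; building the commands separately makes them easy to inspect before running.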
155+
156+
## Create files to visualise mapping
157+
158+
### bedtools
159+
160+
<details markdown="1">
161+
<summary>Output files</summary>
162+
163+
- `multiqc/`
164+
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
165+
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
166+
- `multiqc_plots/`: directory containing static images from the report in various formats.
167+
168+
</details>
169+
170+
[bedtools](https://github.com/arq5x/bedtools2) is a multipurpose
171+
toolkit for working with tab-separated genomic formats such as GTF/GFF/BED, but
172+
also SAM/BAM/CRAM files. Here it is used to convert the mapped BAM file to bedGraph
173+
format, in preparation for conversion to BigWig.
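
To make the bedGraph format concrete, the sketch below collapses alignment intervals on a single contig into runs of constant depth, which is the kind of content a bedGraph coverage track holds. It is a pure-Python illustration, not what bedtools does internally; the function names and the half-open `(start, end)` input are assumptions for the example.

```python
from collections import defaultdict


def coverage_runs(intervals):
    """Turn half-open (start, end) intervals into (start, end, depth) runs.

    Zero-depth regions are skipped and adjacent runs of equal depth are merged.
    """
    deltas = defaultdict(int)
    for start, end in intervals:
        deltas[start] += 1  # depth rises where an alignment begins
        deltas[end] -= 1    # and falls where it ends
    runs = []
    depth = 0
    prev = None
    for pos in sorted(deltas):
        if prev is not None and depth > 0:
            if runs and runs[-1][1] == prev and runs[-1][2] == depth:
                # Extend the previous run instead of emitting a duplicate depth.
                runs[-1] = (runs[-1][0], pos, depth)
            else:
                runs.append((prev, pos, depth))
        depth += deltas[pos]
        prev = pos
    return runs


def to_bedgraph_lines(chrom, runs):
    """Format runs as tab-separated bedGraph lines: chrom, start, end, value."""
    return [f"{chrom}\t{start}\t{end}\t{depth}" for start, end, depth in runs]
```

For example, two overlapping alignments at `(0, 10)` and `(5, 15)` give three runs with depths 1, 2 and 1.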
174+
175+
### bedGraphToBigWig
176+
177+
<details markdown="1">
178+
<summary>Output files</summary>
179+
180+
- `multiqc/`
181+
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
182+
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
183+
- `multiqc_plots/`: directory containing static images from the report in various formats.
184+
185+
</details>
186+
187+
[bedGraphToBigWig](https://hgdownload.soe.ucsc.edu/admin/exe/) is a specific
188+
tool that is part of a broad UCSC software suite. It has one specific
189+
function that can be guessed from its name: it converts a
190+
bedGraph file to a BigWig file. Once created, the BigWig files can be
191+
loaded into a genome browser such as IGV, allowing the mapping to be visualised
192+
in a lightweight way.
193+
194+
## Alignment quality control
195+
196+
### samtools flagstat
197+
198+
[samtools](http://www.htslib.org/doc/#manual-pages) flagstat provides summary
199+
statistics on the mapped BAM file. Specifically, it counts the number of
200+
alignments for each FLAG type.
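
As a rough illustration of what counting by FLAG type means, the sketch below decodes the bitwise FLAG field defined in the SAM specification and tallies each category. It is a simplified stand-in, not a reimplementation of `samtools flagstat`: the category names and the `count_flags` helper are assumptions for the example, and the real tool reports further categories.

```python
from collections import Counter

# Bit values from the SAM specification; the short names are illustrative.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def count_flags(flags):
    """Tally records per FLAG category, plus total and mapped counts."""
    counts = Counter()
    for flag in flags:
        counts["total"] += 1
        if not flag & 0x4:
            counts["mapped"] += 1  # any record without the unmapped bit
        for name in decode_flag(flag):
            counts[name] += 1
    return counts
```

A FLAG of `2064` (`0x800 + 0x10`), for instance, decodes to a supplementary alignment on the reverse strand.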
201+
202+
### cramino
203+
204+
[cramino](https://github.com/wdecoster/cramino) is a tool for quick quality assessment of CRAM and BAM files, intended for long-read sequencing.
205+
206+
```
207+
File name example.cram
208+
Number of reads 14108020
209+
% from total reads 83.45
210+
Yield [Gb] 139.91
211+
N50 17447
212+
Median length 6743.00
213+
Mean length 9917
214+
Median identity 94.27
215+
Mean identity 92.53
216+
Path alignment/example.cram
217+
Creation time 09/09/2022 10:53:36
218+
```
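
A report like the one above is simple to load for downstream checks. The sketch below assumes each line is a metric name and a value separated by a tab; that separator, and the `parse_cramino` name, are assumptions for illustration and should be checked against the actual file.

```python
def parse_cramino(text):
    """Parse 'name<TAB>value' lines into a dict of strings.

    Lines without a tab are ignored; values are left as strings so the
    caller decides which ones to convert to numbers.
    """
    metrics = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # blank or malformed line
        metrics[key.strip()] = value.strip()
    return metrics
```

For example, `float(parse_cramino(report)["Median identity"])` would give the median identity as a number, provided the report follows the assumed layout.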
219+
220+
### alfred
221+
222+
[alfred](https://www.gear-genomics.com/docs/alfred/cli/) computes various
223+
alignment metrics and summary statistics by read group.
224+
225+
### ngs-bits
226+
227+
[ngs-bits
228+
mappingQC](https://github.com/imgag/ngs-bits/blob/master/doc/tools/MappingQC/index.md)
229+
provides one more technique for quality control of the mapped BAM files. Its
230+
advantage is that it has an output that is compatible with
231+
[MultiQC](https://github.com/MultiQC/MultiQC/blob/main/docs/markdown/modules/ngsbits.md).
232+
233+
234+
## Transcriptome reconstruction
235+
236+
### FLAIR
237+
238+
[FLAIR](https://github.com/BrooksLabUCSC/flair) **F**ull **L**ength
239+
**A**lternative **I**soform analysis of **R**NA is used for the correction,
240+
isoform definition, and alternative splicing analysis of noisy reads. FLAIR has
241+
primarily been used for nanopore cDNA, native RNA, and PacBio sequencing reads.
242+
FLAIR can be used with or without read correction, making it amenable to
243+
sensitive sample types, such as those coming from cancer where errors may
244+
instead be putative variants which should not be corrected.
245+
246+
![FLAIR - example schematic](images/flair_workflow_compartmentalized.png)
247+
248+
### bambu
249+
250+
### IsoQuant
251+
252+
### StringTie
253+
111254

112255

113256
<details markdown="1">
