[MultiQC](https://github.com/MultiQC/MultiQC/blob/main/docs/markdown/modules/ngsbits.md). Output is in XML format, so it is not particularly human readable.

## Transcriptome reconstruction
@@ -276,20 +272,26 @@ advantage is that it has an output that is compatible with
<details markdown="1">
<summary>Output files</summary>

-
- `multiqc/`
-
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-
- `multiqc_plots/`: directory containing static images from the report in various formats.
+
- `transcript_reconstruction/flair/bam_to_bed/`
+
- `*_.bed`: genome alignment that has been converted from BAM to BED format, to be used as input to FLAIR.
+
- `correct/` (optional)
+
- `*_flair_correct_all_corrected.bed`: BED file of corrected reads that is used in subsequent steps.
+
- `*_flair_correct_all_inconsistent.bed`: BED file of rejected alignments.
+
- `*_flair_correct_cannot_verify.bed`: BED file of alignments that could not be verified (produced only if the chromosome is not found in the annotation).
+
- `collapse/`
+
- `*_collapsed_isoforms.bed`: BED file of high-confidence isoforms.
+
- `*_collapsed_isoforms.gtf`: as above but in GTF format.
+
- `*_collapsed_isoforms.fa`: FASTA sequences of high-confidence isoforms.
**A**lternative **I**soform analysis of **R**NA is used for the correction,
isoform definition, and alternative splicing analysis of noisy reads. FLAIR has
primarily been used for nanopore cDNA, native RNA, and PacBio sequencing reads.
-
FLAIR is able to be used with and without read correction, making it amenable to
+
FLAIR can be used with or without read correction (splice site correction), making it amenable to
sensitive sample types, such as those coming from cancer where errors may
-
instead be putative variants which should not be corrected.
+
instead be putative variants which should not be corrected. FLAIR accepts a BED file as input; therefore, the aligned BAM file is always converted to BED format beforehand.
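
To illustrate what a BAM to BED conversion involves, here is a minimal Python sketch that turns a single spliced alignment into a BED12 line. It is not FLAIR's own converter and not the pipeline's code; the function name `alignment_to_bed12` and its arguments are hypothetical, and it assumes the alignment is given as a 0-based start position plus a CIGAR string, with `N` operations marking introns.

```python
import re

# Matches one CIGAR operation, e.g. "50M" or "200N".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_to_bed12(chrom, start, cigar, name, strand="+"):
    """Convert one spliced alignment to a BED12 line (hypothetical helper).

    `start` is the 0-based leftmost reference position of the alignment.
    """
    blocks = []            # (offset relative to start, block length)
    ref_pos = start        # current position on the reference
    block_start = start    # start of the exon block being built
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op in "M=XD":
            # These operations consume reference and stay inside the block.
            ref_pos += length
        elif op == "N":
            # A skipped region (intron) closes the current block.
            blocks.append((block_start - start, ref_pos - block_start))
            ref_pos += length
            block_start = ref_pos
        # I, S, H and P do not consume reference, so they are ignored here.
    blocks.append((block_start - start, ref_pos - block_start))

    end = ref_pos
    sizes = ",".join(str(size) for _, size in blocks)
    offsets = ",".join(str(offset) for offset, _ in blocks)
    fields = [chrom, start, end, name, 0, strand, start, end,
              "0,0,0", len(blocks), sizes, offsets]
    return "\t".join(str(field) for field in fields)


print(alignment_to_bed12("chr1", 100, "50M200N30M", "read1"))
```

Each exon becomes one BED block, so the splice junctions that FLAIR later corrects and collapses are carried in the block sizes and block starts columns.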
@@ -298,10 +300,10 @@ instead be putative variants which should not be corrected.
<details markdown="1">
<summary>Output files</summary>

-
- `multiqc/`
-
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-
- `multiqc_plots/`: directory containing static images from the report in various formats.
- `extended_annotations.gtf`: contains all transcript models from the reference annotations and any novel high-confidence transcript models (below NDR threshold).

</details>
@@ -312,9 +314,20 @@ instead be putative variants which should not be corrected.
<details markdown="1">
<summary>Output files</summary>

-
- `multiqc/`
-
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
+
- `transcript_reconstruction/isoquant/`
+
- `*_isoquant.corrected_reads.bed.gz`: BED file with corrected read alignments (gzipped by default).
+
- `*_isoquant.discovered_gene_counts.tsv`: raw read counts for discovered genes (corresponds to SAMPLE_ID.transcript_models.gtf).
+
- `*_isoquant.discovered_gene_tpm.tsv`: expression of discovered genes in TPM (corresponds to SAMPLE_ID.transcript_models.gtf).
+
- `*_isoquant.discovered_transcript_counts.tsv`: raw read counts for discovered transcript models (corresponds to SAMPLE_ID.transcript_models.gtf).
+
- `*_isoquant.discovered_transcript_tpm.tsv`: expression of discovered transcript models in TPM (corresponds to SAMPLE_ID.transcript_models.gtf).
+
- `*_isoquant.extended_annotation.gtf`: GTF file with the entire reference annotation plus all discovered novel transcripts.
+
- `*_isoquant.gene_counts.tsv`: TSV file with raw read counts for reference genes.
+
- `*_isoquant.gene_tpm.tsv`: TSV file with reference gene expression in TPM.
+
- `*_isoquant.transcript_counts.tsv`: TSV file with raw read counts for reference transcripts.
+
- `*_isoquant.transcript_tpm.tsv`: TSV file with reference transcript expression in TPM.
+
- `*_isoquant.read_assignments.tsv.gz`: TSV file with read-to-isoform assignments (gzipped by default).
+
- `*_isoquant.transcript_model_reads.tsv.gz`: TSV file indicating which reads contributed to transcript models (gzipped by default).
+
- `*_isoquant.transcript_models.gtf`: GTF file with discovered expressed transcripts (both known and novel).
- `*_{flair,isoquant,bambu,stringtie}.annotated.gtf`: input transcriptome GTF file annotated with the reference transcriptome provided.
+
- `*_{flair,isoquant,bambu,stringtie}.gtf.refmap`: this tab-delimited file lists, for each reference transcript, which query transcript either fully or partially matches that reference transcript.
+
- `*_{flair,isoquant,bambu,stringtie}.gtf.tmap`: this tab-delimited file lists the most closely matching reference transcript for each query transcript.
+
- `*_{flair,isoquant,bambu,stringtie}.stats`: in this output file, GffCompare reports various statistics related to the “accuracy” (or a measure of agreement) of the input transcripts when compared to reference annotation data. These accuracy measures are calculated under the assumption that the input GFF/GTF file(s) (the "query" transcripts, or transfrags, from one or multiple "samples") are coming from some transcript discovery/assembly pipeline (e.g. Cufflinks or StringTie), or from any other gene/transcript prediction pipeline. GffCompare can be used to assess the accuracy of such pipelines when comparing their results to a known reference annotation.
+
- `*_{flair,isoquant,bambu,stringtie}.tracking`: this file matches transcripts up between samples. Each row represents a transcript structure that is preserved (structurally equivalent) across all the input GTF files. GffCompare considers transcripts "matching" (i.e. structurally equivalent) if all their introns are identical. Note that "matching" transcripts are allowed to differ on the length of the first and last exons, since these lengths can usually vary across samples for the same biological transcript.
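
The "matching" rule described for the `.tracking` file can be sketched in a few lines of Python. This only illustrates the intron-chain idea and is not GffCompare's implementation; the function names are hypothetical, exons are assumed to be `(start, end)` tuples on the same chromosome and strand, and single-exon transcripts (which have no introns) are simply treated as non-matching here.

```python
def intron_chain(exons):
    """Return the introns of a transcript as (exon_end, next_exon_start) pairs."""
    ordered = sorted(exons)
    return tuple(
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
    )


def structurally_equivalent(exons_a, exons_b):
    """True if both transcripts have at least one intron and identical intron chains."""
    chain_a = intron_chain(exons_a)
    chain_b = intron_chain(exons_b)
    return len(chain_a) > 0 and chain_a == chain_b


# Same introns, different first-exon start and last-exon end.
sample_1 = [(100, 200), (300, 400), (500, 650)]
sample_2 = [(150, 200), (300, 400), (500, 600)]
# Second exon starts 10 bases later, so the first intron differs.
sample_3 = [(100, 200), (310, 400), (500, 650)]

print(structurally_equivalent(sample_1, sample_2))
print(structurally_equivalent(sample_1, sample_3))
```

Because only the intron boundaries are compared, the outer ends of the first and last exons never enter the comparison, which is why terminal exon lengths may differ between "matching" transcripts.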
- `*.quant.gz`: a tab-separated file listing the quantified targets, as well as information about their length and other metadata. The num_reads column provides the estimate of the number of reads originating from each target.
- `*.meta_info.json`: a JSON format file containing information about relevant parameters with which oarfish was run, and other relevant information from the processed sample apart from the actual transcript quantifications.
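
As a quick sanity check on a `*.quant.gz` file, the `num_reads` column can be summed with the Python standard library. This is a minimal sketch: only the `num_reads` column name comes from the description above, while the `tname` and `len` columns in the example data, the helper name `total_reads`, and the presence of a header row are assumptions.

```python
import csv
import gzip
import io


def total_reads(handle, column="num_reads"):
    """Sum a numeric column of a tab-separated file that has a header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(float(row[column]) for row in reader)


# Small in-memory stand-in for a gzipped quantification file (assumed layout).
example = "tname\tlen\tnum_reads\nT1\t1000\t12.5\nT2\t500\t7.5\n"
compressed = gzip.compress(example.encode("utf-8"))

# gzip.open accepts a path or a file object; "rt" decodes to text.
with gzip.open(io.BytesIO(compressed), "rt", newline="") as handle:
    print(total_reads(handle))
```

For a real file, replace the `io.BytesIO` object with the path to the sample's `*.quant.gz` file.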