Commit 02d4b13

linting failures fixing
1 parent 2b3442e commit 02d4b13

9 files changed

Lines changed: 52 additions & 35 deletions

README.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -12,7 +12,8 @@
1212

1313
## Introduction
1414

15-
**rich_longTranscriptomics** is a nextflow pipeline that is used for the processing of direct RNA nanopore sequencing data, providing multiple transcript reconstruction options, and quantification with the use of a reference genome, and transcriptome annotation.
15+
**LongTranscriptomics** is a nextflow pipeline that is used for the processing of direct RNA nanopore sequencing data, providing multiple transcript reconstruction options, and quantification with the use of a reference genome, and transcriptome annotation.
16+
1617
<!-- Additionally, it performs post transcriptome reconstruction assessment, and recovery. -->
1718

1819
The pipeline currently _only_ accepts sequencing data from directRNA Oxford

assets/multiqc_config.yml

Lines changed: 2 additions & 2 deletions
@@ -2,11 +2,11 @@ report_comment: >
22
This report has been generated by the <a href="https://github.com/number-25/LongTranscriptomics/tree/dev" target="_blank">number-25/LongTranscriptomics</a>
33
analysis pipeline.
44
report_section_order:
5-
"number-25-rich_directRNA-methods-description":
5+
"number-25-LongTranscriptomics-methods-description":
66
order: -1000
77
software_versions:
88
order: -1001
9-
"number-25-rich_directRNA-summary":
9+
"number-25-LongTranscriptomics-summary":
1010
order: -1002
1111

1212
export_plots: true
File renamed without changes.

bin/run_bambu.r

Lines changed: 3 additions & 0 deletions
@@ -48,3 +48,6 @@ print(readlist)
4848
grlist <- prepareAnnotations(annot_gtf)
4949
se <- bambu(reads = readlist, annotations = grlist, genome = genomeSequence, ncore = ncore, verbose = TRUE)
5050
writeBambuOutput(se, output_tag)
51+
52+
#se.novel = se[mcols(se)$novelTranscript&(apply(assays(se)$fullLengthCounts >= 1,1,sum)>=1),]
53+
#writeBambuOutput(se.novel, output_tag)
File renamed without changes.
File renamed without changes.

docs/output.md

Lines changed: 16 additions & 10 deletions
@@ -16,7 +16,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
1616
- [Reference genome mapping](#Reference-genome-mapping)
1717
- [minimap2](#minimap2)
1818
- [samtools](#samtools-sort-index)
19-
- [Create bigWig coverage files](#Create-files-to-visualise-mapping)
19+
- [Create bigWig coverage files](#Create-files-to-visualise-mapping-coverage)
2020
- [bedtools](#bedtools)
2121
- [bedGraphToBigWig](#bedGraphToBigWig)
2222
- [Extensive QC of alignments](#Alignment-quality-control)
@@ -150,7 +150,7 @@ toolkit for working with SAM/BAM files. It is used to sort the output from
150150
minimap2 (SAM format) and output it in compressed BAM format, and then index
151151
this file.
152152

153-
## Create files to visualise mapping
153+
## Create files to visualise mapping coverage
154154

155155
### bedtools
156156

@@ -184,7 +184,15 @@ tool that is part of a broad UCSC software suite. It has one specific
184184
function that can be guessed from its very name. You guessed it, it converts a
185185
bedgraph to a BigWig file, that's it. Once created, the BigWig files can be
186186
loaded into a genome browser such as IGV, allowing the mapping to be visualised
187-
in a lightweight way.
187+
in a lightweight way. The bigWig format is an indexed binary format useful for
188+
displaying dense, continuous data in genome browsers such as UCSC and IGV.
189+
This mitigates the need to load the much larger BAM file for data visualisation
190+
purposes which will be slower and result in memory issues. The bigWig format is
191+
also supported by various bioinformatics software for downstream processing
192+
such as meta-profile plotting.
193+
194+
bigBed files are more useful for displaying the distribution of reads across exon
195+
intervals, as is typically observed for RNA-seq data.
188196

189197
## Alignment quality control
190198

@@ -360,8 +368,7 @@ variants for each gene locus. StringTie does not perform read correction.
360368

361369
</details>
362370

363-
[gffcompare](https://ccb.jhu.edu/software/stringtie/gff.shtml#gffcompare) can be used to compare, merge, annotate and estimate
364-
accuracy of one or more GFF files (the "query" files), when compared with a
371+
[gffcompare](https://ccb.jhu.edu/software/stringtie/gff.shtml#gffcompare) can be used to compare, merge, annotate and estimate accuracy of one or more GFF files (the "query" files), when compared with a
365372
reference annotation (also provided as GFF/GTF).
366373

367374
```
@@ -386,17 +393,16 @@ Intron chain level: 56.9 | 52.4 |
386393
<details markdown="1">
387394
<summary>Output files</summary>
388395

389-
- `multiqc/`
390-
- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
391-
- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
392-
- `multiqc_plots/`: directory containing static images from the report in various formats.
396+
- `transcript_quantification/oarfish/<samplename>/`
397+
- `*.quant.gz`: a tab separated file listing the quantified targets, as well as information about their length and other metadata. The num_reads column provides the estimate of the number of reads originating from each target.
398+
- `*.meta_info.json`: a JSON format file containing information about relevant parameters with which oarfish was run, and other relevant information from the processed sample apart from the actual transcript quantifications.
393399

394400
</details>
395401

396402
[oarfish](https://github.com/COMBINE-lab/oarfish) is a program for quantifying
397403
transcript-level expression from long-read (i.e. Oxford nanopore cDNA and
398404
direct RNA and PacBio) sequencing technologies. oarfish requires a sample of
399-
sequencing reads aligned to the transcriptome (currntly not to the genome). It
405+
sequencing reads aligned to the transcriptome (currently not to the genome). It
400406
handles multi-mapping reads through the use of probabilistic allocation via an
401407
expectation-maximization (EM) algorithm.
402408
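The oarfish output description in `docs/output.md` says the `*.quant.gz` file is a tab-separated table with a `num_reads` column. As a rough illustration of how such a file could be read downstream, here is a minimal Python sketch. The column names `tname` and `len`, the helper names `read_quant` and `read_quant_gz`, and the example rows are all assumptions made for this sketch; only `num_reads` is named in the docs.

```python
import csv
import gzip
import io


def read_quant(handle):
    """Parse a tab-separated quantification table into {target: num_reads}.

    Assumes a header row with a 'tname' column (hypothetical name for the
    target identifier) and a 'num_reads' column (named in the docs).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = {}
    for row in reader:
        counts[row["tname"]] = float(row["num_reads"])
    return counts


def read_quant_gz(path):
    """Same as read_quant, but for a gzip-compressed file on disk."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_quant(handle)


# Made-up example content, not real oarfish output.
example = "tname\tlen\tnum_reads\ntx1\t1500\t12.5\ntx2\t800\t0.0\n"
counts = read_quant(io.StringIO(example))

# Keep only targets with a non-zero read estimate.
expressed = {name: n for name, n in counts.items() if n > 0}
print(expressed)
```

For a real sample, `read_quant_gz` would be pointed at the file under `transcript_quantification/oarfish/<samplename>/`, after checking the actual header of that file.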

modules/local/bambu/bambu/main.nf

Lines changed: 7 additions & 4 deletions
@@ -18,10 +18,13 @@ process BAMBU {
1818
tuple val(meta), path(bam)
1919

2020
output:
21-
path "counts_gene.txt" , emit: ch_gene_counts
22-
path "counts_transcript.txt" , emit: ch_transcript_counts
23-
tuple val(meta), path("extended_annotations.gtf"), emit: bambu_extended_gtf
24-
path "versions.yml" , emit: versions
21+
path "counts_gene.txt" , emit: ch_gene_counts
22+
path "counts_transcript.txt" , emit: ch_transcript_counts
23+
tuple val(meta) , path("extended_annotations.gtf") , emit: bambu_extended_gtf
24+
//path "allTranscriptModels.gtf" , emit: bambu_all_gtf
25+
//path "supportedTranscriptModels.gtf" , emit: bambu_supported_gtf
26+
//path "novelTranscripts.gtf" , emit: bambu_novel_only_gtf
27+
path "versions.yml" , emit: versions
2528

2629
/*
2730
tuple val(meta), path("extendedAnnotations.gtf"), emit: bambu_extended_gtf

workflows/directrna.nf

Lines changed: 22 additions & 18 deletions
@@ -365,15 +365,15 @@ workflow DIRECTRNA{
365365
// BAMBU
366366
if (!params.skip_bambu) {
367367
BAMBU( ch_genome_fasta, ch_annotation_gtf, ch_bam )
368-
ch_bambu_gtf = BAMBU.out.bambu_extended_gtf
368+
ch_bambu_extended_gtf = BAMBU.out.bambu_extended_gtf
369+
//ch_bambu_supported_gtf = BAMBU.out.bambu_supported_gtf
369370
ch_versions = ch_versions.mix(BAMBU.out.versions.first())
370371
// MIX genome fasta with fasta index as this will improve GFFREADs speed
371-
GFFREAD_GETFASTA_BAMBU( ch_bambu_gtf, ch_genome_fasta_with_index, 'bambu' )
372-
ch_bambu_transcripts = GFFREAD_GETFASTA_BAMBU.out.transcripts_fa
373-
ch_versions = ch_versions.mix(GFFREAD_GETFASTA_BAMBU.out.versions.first())
372+
//GFFREAD_GETFASTA_BAMBU( ch_bambu_extended_gtf, ch_genome_fasta_with_index, 'bambu' )
373+
//ch_bambu_transcripts = GFFREAD_GETFASTA_BAMBU.out.transcripts_fa
374+
//ch_versions = ch_versions.mix(GFFREAD_GETFASTA_BAMBU.out.versions.first())
374375
}
375376

376-
// TODO
377377
// ISOQUANT
378378
if (!params.skip_isoquant) {
379379
GTF2DB( ch_annotation_gtf )
@@ -404,35 +404,39 @@ workflow DIRECTRNA{
404404
ch_jaffal_csv = JAFFAL.out.jaffal_results
405405
ch_versions = ch_versions.mix(JAFFAL.out.versions.first())
406406
}
407-
/*
407+
408408
//
409409
// Transcriptome assessment
410410
// SQANTI, gffcompare
411-
// TODO
412411
// Not done yet
413412
if (!params.skip_gffcompare) {
414413
if (!params.skip_flair) {
415414
GFFCOMPARE_FLAIR( ch_genome_fasta_with_index, ch_flair_collapsed_gtf, ch_annotation_gtf, 'flair' )
416415
ch_flair_gffcompare_stats = GFFCOMPARE_FLAIR.out.gffcompare_stats.collect{it[1]}.flatten()
417416
ch_multiqc_files = ch_multiqc_files.mix(ch_flair_gffcompare_stats.ifEmpty([]))
418417
}
419-
if (!params.skip_bambu) {
420-
GFFCOMPARE_BAMBU( ch_genome_fasta_with_index, ch_bambu_gtf, ch_annotation_gtf, 'bambu' )
421-
ch_bambu_gffcompare_stats = GFFCOMPARE_BAMBU.out.gffcompare_stats.collect{it[1]}.flatten()
422-
ch_multiqc_files = ch_multiqc_files.mix(ch_bambu_gffcompare_stats.ifEmpty([]))
418+
// TODO - this version of bambu currently only outputs the "extended annotation", which is the reference annotation + the detected transcripts in the sample, so there's no point to doing gffcompare as sensitivity and accuracy are 100%
419+
//if (!params.skip_bambu) {
420+
// GFFCOMPARE_BAMBU( ch_genome_fasta_with_index, ch_bambu_extended_gtf, ch_annotation_gtf, 'bambu' )
421+
// ch_bambu_gffcompare_stats = GFFCOMPARE_BAMBU.out.gffcompare_stats.collect{it[1]}.flatten()
422+
// ch_multiqc_files = ch_multiqc_files.mix(ch_bambu_gffcompare_stats.ifEmpty([]))
423+
//}
424+
if (!params.skip_isoquant) {
425+
GFFCOMPARE_ISOQUANT(ch_genome_fasta_with_index, ch_isoquant_gtf, ch_annotation_gtf, 'isoquant' )
426+
ch_isoquant_gffcompare_stats = GFFCOMPARE_ISOQUANT.out.gffcompare_stats.collect{it[1]}.flatten()
427+
ch_multiqc_files = ch_multiqc_files.mix(ch_isoquant_gffcompare_stats.ifEmpty([]))
423428
}
424-
// if (!params.skip_isoquant) {
425-
// GFFCOMPARE_ISOQUANT( ch_isoquant_gtf, ch_annotation_gtf, ch_genome_fasta_with_index, 'isoquant' )
426-
// ch_multiqc_files = ch_multiqc_files.mix(GFFCOMPARE_ISOQUANT.out.gffcompare_stats.ifEmpty([]))
427-
// }
428429
if (!params.skip_stringtie) {
429430
GFFCOMPARE_STRINGTIE( ch_genome_fasta_with_index, ch_stringtie_gtf, ch_annotation_gtf, 'stringtie' )
430431
ch_stringtie_gffcompare_stats = GFFCOMPARE_STRINGTIE.out.gffcompare_stats.collect{it[1]}.flatten()
431432
ch_multiqc_files = ch_multiqc_files.mix(ch_stringtie_gffcompare_stats.ifEmpty([]))
432433
}
433434
}
434435

436+
ch_multiqc_files.view()
437+
435438
/*
439+
// TODO
436440
// if (!skip_sqanti_all) {
437441
if (!skip_sqanti_qc) {
438442
if (run_flair){
@@ -468,9 +472,9 @@ workflow DIRECTRNA{
468472
if (!params.skip_flair) {
469473
OARFISH_FLAIR( ch_flair_collapsed_fa, ch_transcriptome_minimap2_index, ch_sequencing_type, 'flair' )
470474
}
471-
if (!params.skip_bambu) {
472-
OARFISH_BAMBU( ch_bambu_transcripts, ch_transcriptome_minimap2_index, ch_sequencing_type, 'bambu' )
473-
}
475+
//if (!params.skip_bambu) {
476+
// OARFISH_BAMBU( ch_bambu_transcripts, ch_transcriptome_minimap2_index, ch_sequencing_type, 'bambu' )
477+
//}
474478
if (!params.skip_isoquant) {
475479
OARFISH_ISOQUANT( ch_isoquant_transcripts, ch_transcriptome_minimap2_index, ch_sequencing_type, 'isoquant' )
476480
}
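The TODO comment in the gffcompare block of `workflows/directrna.nf` argues that comparing bambu's extended annotation against the reference is uninformative, because the extended annotation already contains every reference transcript. A small Python sketch of that reasoning follows. It treats transcripts as opaque identifiers, which is a simplification (gffcompare matches structures such as intron chains), and the function name and transcript IDs are hypothetical. Under this simplification, sensitivity is necessarily 100%, while precision still depends on how many novel transcripts were added.

```python
def sensitivity_precision(reference, query):
    """Return (sensitivity, precision) for two sets of transcript IDs.

    sensitivity = matched / reference size
    precision   = matched / query size
    """
    matched = reference & query
    sensitivity = len(matched) / len(reference)
    precision = len(matched) / len(query)
    return sensitivity, precision


# Hypothetical transcript IDs for illustration only.
reference = {"tx1", "tx2", "tx3"}
novel = {"novel1"}

# The extended annotation is the reference plus the novel transcripts.
extended = reference | novel

sens, prec = sensitivity_precision(reference, extended)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
```

Because every reference transcript is present in the query by construction, the sensitivity figure carries no information about how well the sample was reconstructed, which is the point the TODO makes.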
