Fix linting failures

number-25 · number-25 · commit 02d4b1307b3a · 2025-10-30T08:00:12.000+10:00
diff --git a/README.md b/README.md
@@ -12,7 +12,8 @@
 
 ## Introduction
 
-**rich_longTranscriptomics** is a nextflow pipeline that is used for the processing of direct RNA nanopore sequencing data, providing multiple transcript reconstruction options, and quantification with the use of a reference genome, and transcriptome annotation.
+**LongTranscriptomics** is a Nextflow pipeline for processing direct RNA nanopore sequencing data. It provides multiple transcript reconstruction options, as well as quantification, using a reference genome and transcriptome annotation.
+
 <!-- Additionally, it performs post transcriptome reconstruction assessment, and recovery. -->
 
 The pipeline currently _only_ accepts sequencing data from directRNA Oxford
diff --git a/assets/multiqc_config.yml b/assets/multiqc_config.yml
@@ -2,11 +2,11 @@ report_comment: >
   This report has been generated by the <a href="https://github.com/number-25/LongTranscriptomics/tree/dev" target="_blank">number-25/LongTranscriptomics</a>
   analysis pipeline.
 report_section_order:
-  "number-25-rich_directRNA-methods-description":
+  "number-25-LongTranscriptomics-methods-description":
     order: -1000
   software_versions:
     order: -1001
-  "number-25-rich_directRNA-summary":
+  "number-25-LongTranscriptomics-summary":
     order: -1002
 
 export_plots: true
diff --git a/assets/nf-core-LongTranscriptomics_logo_light.png b/assets/nf-core-LongTranscriptomics_logo_light.png
diff --git a/bin/run_bambu.r b/bin/run_bambu.r
@@ -48,3 +48,6 @@ print(readlist)
 grlist <- prepareAnnotations(annot_gtf)
 se     <- bambu(reads = readlist, annotations = grlist, genome = genomeSequence, ncore = ncore, verbose = TRUE)
 writeBambuOutput(se, output_tag)
+
+#se.novel = se[mcols(se)$novelTranscript&(apply(assays(se)$fullLengthCounts >= 1,1,sum)>=1),]
+#writeBambuOutput(se.novel, output_tag)
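The commented-out lines above would restrict the bambu output to novel transcripts that have full-length read support: a transcript is kept only if `novelTranscript` is true and at least one sample has one or more full-length counts. A minimal Python sketch of that filter condition follows. `TranscriptRow` and `keep_supported_novel` are hypothetical names that are not part of bambu; the sketch only mirrors the logic of the R expression on plain data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TranscriptRow:
    # Hypothetical stand-in for one row of bambu's SummarizedExperiment
    tx_id: str
    novel: bool                      # mirrors mcols(se)$novelTranscript
    full_length_counts: List[float]  # one value per sample, mirrors assays(se)$fullLengthCounts


def keep_supported_novel(rows, min_count=1, min_samples=1):
    """Keep novel transcripts with >= min_count full-length reads in >= min_samples samples."""
    kept = []
    for row in rows:
        # Equivalent of apply(fullLengthCounts >= 1, 1, sum) for this row
        n_supporting = sum(1 for c in row.full_length_counts if c >= min_count)
        if row.novel and n_supporting >= min_samples:
            kept.append(row)
    return kept


rows = [
    TranscriptRow("tx1", True, [0, 2]),   # novel, supported in one sample
    TranscriptRow("tx2", True, [0, 0]),   # novel, no full-length support
    TranscriptRow("tx3", False, [5, 5]),  # annotated, not novel
]
print([r.tx_id for r in keep_supported_novel(rows)])
```

Exposing `min_count` and `min_samples` as parameters makes the hard-coded thresholds of the R line (`>= 1` and `>= 1`) explicit.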
diff --git a/docs/images/nf-core-LongTranscriptomics_logo_dark.png b/docs/images/nf-core-LongTranscriptomics_logo_dark.png
diff --git a/docs/images/nf-core-LongTranscriptomics_logo_light.png b/docs/images/nf-core-LongTranscriptomics_logo_light.png
diff --git a/docs/output.md b/docs/output.md
@@ -16,7 +16,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [Reference genome mapping](#Reference-genome-mapping)
   - [minimap2](#minimap2)
   - [samtools](#samtools-sort-index)
-- [Create bigWig coverage files](#Create-files-to-visualise-mapping)
+- [Create bigWig coverage files](#Create-files-to-visualise-mapping-coverage)
   - [bedtools](#bedtools)
   - [bedGraphToBigWig](#bedGraphToBigWig)
 - [Extensive QC of alignments](#Alignment-quality-control)
@@ -150,7 +150,7 @@ toolkit for working with SAM/BAM files. It is used to sort the output from
 minimap2 (SAM format) and output it in compressed BAM format, and then index
 this file.
 
-## Create files to visualise mapping
+## Create files to visualise mapping coverage
 
 ### bedtools
 
@@ -184,7 +184,15 @@ tool that is part of a broad UCSC software suite. It has one specific
 function that can be guessed from it's very name. You guessed it, it converts a
 bedgraph to a BigWig file, that's it. Once created, the BigWig files can be
 loaded into a genome browser such as IGV, allowing the mapping to be visualised
-in a lightweight way.
+in a lightweight way. The bigWig format is an indexed binary format suited to
+displaying dense, continuous data in genome browsers such as the UCSC Genome
+Browser and IGV. Using it avoids loading the much larger BAM file purely for
+visualisation, which would be slower and can exhaust memory. The bigWig format
+is also supported by various bioinformatics tools for downstream processing,
+such as meta-profile plotting.
+
+bigBed files are better suited to displaying the distribution of reads across
+exon intervals, as is typically observed for RNA-seq data.
 
 ## Alignment quality control
 
@@ -360,8 +368,7 @@ variants for each gene locus. StringTie does not perform read correction.
 
 </details>
 
-[gffcompare](https://ccb.jhu.edu/software/stringtie/gff.shtml#gffcompare) can be used to compare, merge, annotate and estimate
-accuracy of one or more GFF files (the "query" files), when compared with a
+[gffcompare](https://ccb.jhu.edu/software/stringtie/gff.shtml#gffcompare) can be used to compare, merge, annotate and estimate accuracy of one or more GFF files (the "query" files), when compared with a
 reference annotation (also provided as GFF/GTF).
 
 ```
@@ -386,17 +393,16 @@ Intron chain level:    56.9     |    52.4    |
 <details markdown="1">
 <summary>Output files</summary>
 
-- `multiqc/`
-  - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-  - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-  - `multiqc_plots/`: directory containing static images from the report in various formats.
+- `transcript_quantification/oarfish/<samplename>/`
+  - `*.quant.gz`: a tab-separated file listing the quantified targets along with their length and other metadata. The `num_reads` column gives the estimated number of reads originating from each target.
+  - `*.meta_info.json`: a JSON file recording the parameters with which oarfish was run, plus other information about the processed sample apart from the transcript quantifications themselves.
 
 </details>
 
 [oarfish](https://github.com/COMBINE-lab/oarfish) is a program for quantifying
 transcript-level expression from long-read (i.e. Oxford nanopore cDNA and
 direct RNA and PacBio) sequencing technologies. oarfish requires a sample of
-sequencing reads aligned to the transcriptome (currntly not to the genome). It
+sequencing reads aligned to the transcriptome (currently not to the genome). It
 handles multi-mapping reads through the use of probabilistic allocation via an
 expectation-maximization (EM) algorithm.
 
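The EM-based allocation of multi-mapping reads described above can be illustrated with a deliberately simplified sketch. This is not oarfish's implementation: it assumes each read is summarised only by the set of transcripts it aligns to, and it ignores factors such as transcript length and alignment quality. `em_allocate` is a hypothetical helper name.

```python
def em_allocate(read_compat, transcripts, n_iter=100, tol=1e-8):
    """Estimate per-transcript read counts from multi-mapping reads by EM.

    read_compat: list of sets, each holding the transcripts one read aligns to.
    Simplification: all alignments of a read are treated as equally likely
    given the transcript (no length, coverage or alignment-score terms).
    """
    counts = {t: 0.0 for t in transcripts}
    n_reads = len(read_compat)
    if n_reads == 0:
        return counts
    # Start from uniform abundances
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read across its compatible transcripts
        # in proportion to the current abundance estimates
        for compat in read_compat:
            total = sum(theta[t] for t in compat)
            if total == 0:
                continue
            for t in compat:
                counts[t] += theta[t] / total
        # M-step: new abundances are the expected fractions of reads
        new_theta = {t: counts[t] / n_reads for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return counts


# 4 reads unique to A, 2 unique to B, 3 ambiguous between A and B
reads = [{"A"}] * 4 + [{"B"}] * 2 + [{"A", "B"}] * 3
print(em_allocate(reads, ["A", "B"]))
```

In this example the ambiguous reads are not split evenly. They are shared in proportion to the evidence from the unique reads, so A converges to about 6 reads and B to about 3.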
diff --git a/modules/local/bambu/bambu/main.nf b/modules/local/bambu/bambu/main.nf
@@ -18,10 +18,13 @@ process BAMBU {
     tuple val(meta), path(bam)
 
     output:
-    path "counts_gene.txt"         , emit: ch_gene_counts
-    path "counts_transcript.txt"   , emit: ch_transcript_counts
-    tuple val(meta), path("extended_annotations.gtf"), emit: bambu_extended_gtf
-    path "versions.yml"            , emit: versions
+    path "counts_gene.txt"               , emit: ch_gene_counts
+    path "counts_transcript.txt"         , emit: ch_transcript_counts
+    tuple val(meta)                      , path("extended_annotations.gtf") , emit: bambu_extended_gtf
+    //path "allTranscriptModels.gtf"       , emit: bambu_all_gtf
+    //path "supportedTranscriptModels.gtf" , emit: bambu_supported_gtf
+    //path "novelTranscripts.gtf"          , emit: bambu_novel_only_gtf
+    path "versions.yml"                  , emit: versions
 
 /*
     tuple val(meta), path("extendedAnnotations.gtf"),        emit: bambu_extended_gtf
diff --git a/workflows/directrna.nf b/workflows/directrna.nf
@@ -365,15 +365,15 @@ workflow DIRECTRNA{
     // BAMBU
     if (!params.skip_bambu) {
         BAMBU( ch_genome_fasta, ch_annotation_gtf, ch_bam )
-        ch_bambu_gtf = BAMBU.out.bambu_extended_gtf
+        ch_bambu_extended_gtf = BAMBU.out.bambu_extended_gtf
+        //ch_bambu_supported_gtf = BAMBU.out.bambu_supported_gtf
         ch_versions = ch_versions.mix(BAMBU.out.versions.first())
         // MIX genome fasta with fasta index as this will improve GFFREADs speed
-        GFFREAD_GETFASTA_BAMBU( ch_bambu_gtf, ch_genome_fasta_with_index, 'bambu' )
-        ch_bambu_transcripts = GFFREAD_GETFASTA_BAMBU.out.transcripts_fa
-        ch_versions = ch_versions.mix(GFFREAD_GETFASTA_BAMBU.out.versions.first())
+        //GFFREAD_GETFASTA_BAMBU( ch_bambu_extended_gtf, ch_genome_fasta_with_index, 'bambu' )
+        //ch_bambu_transcripts = GFFREAD_GETFASTA_BAMBU.out.transcripts_fa
+        //ch_versions = ch_versions.mix(GFFREAD_GETFASTA_BAMBU.out.versions.first())
         }
 
-    // TODO
     // ISOQUANT
     if (!params.skip_isoquant) {
         GTF2DB( ch_annotation_gtf )
@@ -404,35 +404,39 @@ workflow DIRECTRNA{
         ch_jaffal_csv = JAFFAL.out.jaffal_results
         ch_versions = ch_versions.mix(JAFFAL.out.versions.first())
         }
-/*
+
     //
     // Transcriptome assessment
     // SQANTI, gffcompare
-    // TODO
     // Not done yet
     if (!params.skip_gffcompare) {
         if (!params.skip_flair) {
             GFFCOMPARE_FLAIR( ch_genome_fasta_with_index, ch_flair_collapsed_gtf, ch_annotation_gtf, 'flair' )
             ch_flair_gffcompare_stats = GFFCOMPARE_FLAIR.out.gffcompare_stats.collect{it[1]}.flatten()
             ch_multiqc_files = ch_multiqc_files.mix(ch_flair_gffcompare_stats.ifEmpty([]))
         }
-        if (!params.skip_bambu) {
-            GFFCOMPARE_BAMBU( ch_genome_fasta_with_index, ch_bambu_gtf, ch_annotation_gtf, 'bambu' )
-            ch_bambu_gffcompare_stats = GFFCOMPARE_BAMBU.out.gffcompare_stats.collect{it[1]}.flatten()
-            ch_multiqc_files = ch_multiqc_files.mix(ch_bambu_gffcompare_stats.ifEmpty([]))
+        // TODO - this version of bambu currently only outputs the "extended annotation" (the reference annotation plus the transcripts detected in the sample), so there is no point running gffcompare against the reference: sensitivity and accuracy would trivially be 100%
+        //if (!params.skip_bambu) {
+        //    GFFCOMPARE_BAMBU( ch_genome_fasta_with_index, ch_bambu_extended_gtf, ch_annotation_gtf, 'bambu' )
+        //    ch_bambu_gffcompare_stats = GFFCOMPARE_BAMBU.out.gffcompare_stats.collect{it[1]}.flatten()
+        //    ch_multiqc_files = ch_multiqc_files.mix(ch_bambu_gffcompare_stats.ifEmpty([]))
+        //}
+        if (!params.skip_isoquant) {
+            GFFCOMPARE_ISOQUANT( ch_genome_fasta_with_index, ch_isoquant_gtf, ch_annotation_gtf, 'isoquant' )
+            ch_isoquant_gffcompare_stats = GFFCOMPARE_ISOQUANT.out.gffcompare_stats.collect{it[1]}.flatten()
+            ch_multiqc_files = ch_multiqc_files.mix(ch_isoquant_gffcompare_stats.ifEmpty([]))
         }
-        //  if (!params.skip_isoquant) {
-        //      GFFCOMPARE_ISOQUANT( ch_isoquant_gtf, ch_annotation_gtf, ch_genome_fasta_with_index, 'isoquant' )
-        //     ch_multiqc_files = ch_multiqc_files.mix(GFFCOMPARE_ISOQUANT.out.gffcompare_stats.ifEmpty([]))
-        // }
         if (!params.skip_stringtie) {
             GFFCOMPARE_STRINGTIE( ch_genome_fasta_with_index, ch_stringtie_gtf, ch_annotation_gtf, 'stringtie' )
             ch_stringtie_gffcompare_stats = GFFCOMPARE_STRINGTIE.out.gffcompare_stats.collect{it[1]}.flatten()
             ch_multiqc_files = ch_multiqc_files.mix(ch_stringtie_gffcompare_stats.ifEmpty([]))
         }
     }
 
+    ch_multiqc_files.view()
+
 /*
+    // TODO
     // if (!skip_sqanti_all) {
         if (!skip_sqanti_qc) {
             if (run_flair){
@@ -468,9 +472,9 @@ workflow DIRECTRNA{
             if (!params.skip_flair) {
                 OARFISH_FLAIR( ch_flair_collapsed_fa, ch_transcriptome_minimap2_index, ch_sequencing_type, 'flair' )
             }
-            if (!params.skip_bambu) {
-                OARFISH_BAMBU( ch_bambu_transcripts, ch_transcriptome_minimap2_index, ch_sequencing_type, 'bambu' )
-            }
+            //if (!params.skip_bambu) {
+            //    OARFISH_BAMBU( ch_bambu_transcripts, ch_transcriptome_minimap2_index, ch_sequencing_type, 'bambu' )
+            //}
             if (!params.skip_isoquant) {
                 OARFISH_ISOQUANT( ch_isoquant_transcripts, ch_transcriptome_minimap2_index, ch_sequencing_type, 'isoquant' )
             }