more output docs, fixing up multiqc(added nanoq and flagstats outputs), some small renaming

number-25 · number-25 · commit 2b3442ee983a · 2025-10-28T16:20:00.000+10:00
diff --git a/README.md b/README.md
@@ -12,8 +12,8 @@
 
 ## Introduction
 
-**rich_longTranscriptomics** is a nextflow pipeline that is used for the processing of direct RNA nanopore sequencing data, providing multiple transcript reconstruction, and quantification
-options with the use of a reference genome, and transcriptome annotation. Additionally, it performs post transcriptome reconstruction assessment, and recovery.
+**rich_longTranscriptomics** is a Nextflow pipeline for processing direct RNA nanopore sequencing data. It provides multiple transcript reconstruction options, as well as quantification, using a reference genome and a transcriptome annotation.
+<!-- Additionally, it performs post transcriptome reconstruction assessment, and recovery. -->
 
 The pipeline currently _only_ accepts sequencing data from directRNA Oxford
 Nanopore Technologies (ONT) libraries. It is recommended to provide raw FASTQ
@@ -33,14 +33,14 @@ reads in BAM format. These are provided to the samplesheet as input.
    3. [`alfred`](https://www.gear-genomics.com/docs/alfred/)
    4. [`ngs-bits`](https://github.com/imgag/ngs-bits/tree/master)
 6. Multiple transcriptome reconstruction options, with read correction options.
-   1. [`FLAIR`](github.com/BrooksLabUCSC/flair) - allows read correction
-   2. [`bambu`](github.com/GoekeLab/bambu) - very minor read correction
+   1. [`FLAIR`](https://github.com/BrooksLabUCSC/flair) - allows read correction
+   2. [`bambu`](https://github.com/GoekeLab/bambu) - very minor read correction
    3. [`IsoQuant`](https://ablab.github.io/IsoQuant/) - allows read correction
    4. [`StringTie`](https://github.com/skovaka/stringtie2)
    <!-- 7. Fusion gene detection [`JAFFA`](github.com/Oshlack/JAFFA) -->
 7. Transcriptome assessment [`gffcompare`](https://ccb.jhu.edu/software/stringtie/gff.shtml)
 8. Transcript quantification
-   1. [oarfish](https://github.com/COMBINE-lab/oarfish) )
+   1. [`oarfish`](https://github.com/COMBINE-lab/oarfish)
      <!-- ( [`TranSigner`](https://github.com/haydenji0731/TranSigner),
    Small test datasets for the pipeline are included in the [assets directory](https://github.com/number-25/LongTranscriptomics/assets/test_data). -->
 
diff --git a/docs/output.md b/docs/output.md
@@ -46,8 +46,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 
 - `fastq_qc/nanoq/`
   - `*_nanoq.json`: `json` formatted file containing quality metrics.
-  - `*_nanoq.stats`: basic NANOQ report containing quality metrics.
-  - `*_nanoq_stats.verbose`: verbose NANOQ report containing quality metrics.
+  - `*_nanoq_stats.txt`: basic NANOQ report containing quality metrics.
+  - `*_nanoq_stats_verbose.txt`: verbose NANOQ report containing quality metrics.
 
 </details>
 
@@ -106,8 +106,8 @@ Top ranking read lengths (bp)
 
 [SEQUALI](https://github.com/rhpvorderman/sequali) provides general quality statistics
 about the sequence reads, along with several other features including,
-overrepresentation analysis and duplication rate estimation. It outputs the
-statistics in both ??
+over-representation analysis and duplication rate estimation. It outputs the
+statistics in both JSON and HTML format.
 
 ## Reference genome mapping
 
@@ -165,7 +165,7 @@ this file.
 
 [bedtools](https://github.com/arq5x/bedtools2) is a multipurpose
 toolkit for working with tab separated genomic formats such as GTF/GFF/BED, but
-also SAM/BAM/CRAM files. Here it is used convert the mapped BAM file to BEDGRAPH
+also SAM/BAM/CRAM files. Here it is used to convert the mapped BAM file to BEDGRAPH
 format, in preparation for conversion to BigWig.
 
 ### bedGraphToBigWig
@@ -194,7 +194,7 @@ in a lightweight way.
 <summary>Output files</summary>
 
 - `mapping_qc/samtools_flagstat/`
-  - `*.flagstat.tsv`: the output of samtools flagstat in tsv format.
+  - `*.flagstat.txt`: the output of samtools flagstat in txt format.
   </details>
 
 [samtools](http://www.htslib.org/doc/#manual-pages) flagstats provides summary
@@ -211,7 +211,7 @@ alignments for each FLAG type.
 
 </details>
 
-[cramino](https://github.com/wdecoster/cramino) is a tool for quick quality assessment of cram and bam files, intended for long read sequencing.
+[cramino](https://github.com/wdecoster/cramino) is a tool for quick quality assessment of CRAM and BAM files, intended for long read sequencing. It writes the statistics to a simple, human-readable text file.
 
 ```
 File name       example.cram
@@ -239,14 +239,16 @@ Creation time   09/09/2022 10:53:36
 </details>
 
 [alfred](https://www.gear-genomics.com/docs/alfred/cli/) computes various
-alignment metrics and summary statistics by read group.
+alignment metrics and summary statistics by read group. The transposed output is a transformation of the alignment metrics from column format to row format for readability. TSV output is gzipped by default.
 
 ### ngs-bits
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `multiqc/`
+#### TODO
+
+- `mapping_qc/ngs-bits/`
   - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
   - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
   - `multiqc_plots/`: directory containing static images from the report in various formats.
@@ -305,7 +307,6 @@ instead be putative variants which should not be corrected.
 - `multiqc/`
   - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
   - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-  - `multiqc_plots/`: directory containing static images from the report in various formats.
 
 </details>
 
@@ -323,10 +324,9 @@ grouping. IsoQuant, like FLAIR, provides optional read correction capabilities,
 <details markdown="1">
 <summary>Output files</summary>
 
-- `multiqc/`
-  - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-  - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-  - `multiqc_plots/`: directory containing static images from the report in various formats.
+- `transcript_reconstruction/stringtie/`
+  - `KCMF1.1.stringtie.coverage.gtf`: GTF file reporting read coverage of the reference transcripts.
+  - `KCMF1.1.stringtie.transcripts.gtf`: GTF file containing the transcripts assembled by StringTie.
 
 </details>
 
diff --git a/modules/local/gffread/gffread/main.nf b/modules/local/gffread/gffread/main.nf
@@ -1,5 +1,4 @@
 process GFFREAD_GETFASTA {
-    tag "$fasta"
     label 'process_single'
     conda "${moduleDir}/environment.yml"
     container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
@@ -11,6 +10,8 @@ process GFFREAD_GETFASTA {
     tuple path(genome_fasta), path(genome_fasta_index)
     val origin
 
+    //tag "$fasta"
+
     output:
     tuple val(meta), path("*.fa"),  emit: transcripts_fa
     path "versions.yml",            emit: versions
diff --git a/modules/local/gffread/gffread/nextflow.config b/modules/local/gffread/gffread/nextflow.config
@@ -1,7 +1,7 @@
 process {
     withName: 'GFFREAD_GETFASTA' {
         publishDir = [
-        path: { "${params.outdir}/transcript_reconstruction/gtf_transcripts/${meta.id}_${meta.replicate}" },
+        path: { "${params.outdir}/transcript_reconstruction/transcripts_fasta/${meta.id}_${meta.replicate}" },
         //mode: params.publish_dir_mode
         ]
     }
diff --git a/modules/local/nanoq/nanoq/main.nf b/modules/local/nanoq/nanoq/main.nf
@@ -12,8 +12,8 @@ process NANOQ {
     //val(output_format) //One of the following: fastq, fastq.gz, fastq.bz2, fastq.lzma, fasta, fasta.gz, fasta.bz2, fasta.lzma.
 
     output:
-    tuple val(meta), path("*.stats")            , emit: stats
-    tuple val(meta), path("*_stats.verbose")    , emit: verbose_stats
+    tuple val(meta), path("*_stats.txt")            , emit: stats
+    tuple val(meta), path("*_stats_verbose.txt")    , emit: verbose_stats
     tuple val(meta), path("*.json")             , emit: json_stats
     //tuple val(meta), path("*_filtered.${output_format}")                              , emit: reads
     path "versions.yml"                         , emit: versions
@@ -29,16 +29,16 @@ process NANOQ {
     //else
     """
     nanoq \\
-        -H \\
         -s \\
+        -v \\
         -i ${fastq} \\
-        > ${prefix}.stats
+        > ${prefix}_stats.txt
 
     nanoq \\
         -s \\
         -vvv \\
         -i ${fastq} \\
-        > ${prefix}_stats.verbose
+        > ${prefix}_stats_verbose.txt
 
     nanoq \\
         -s \\
@@ -56,8 +56,8 @@ process NANOQ {
     def args    = task.ext.args ?: ''
     def prefix  = task.ext.prefix ?: "${meta.id}_${meta.replicate}_nanoq"
     """
-    touch ${prefix}.stats
-    touch ${prefix}_stats.verbose
+    touch ${prefix}_stats.txt
+    touch ${prefix}_stats_verbose.txt
     touch ${prefix}.json
 
     cat <<-END_VERSIONS > versions.yml
diff --git a/modules/local/samtools/flagstat/main.nf b/modules/local/samtools/flagstat/main.nf
@@ -12,7 +12,7 @@ process SAMTOOLS_FLAGSTAT {
     //tuple val(meta2), path(fasta)
 
     output:
-    tuple val(meta), path("*.tsv"),     emit: flagstat
+    tuple val(meta), path("*.txt"),     emit: flagstat
     //tuple val(meta), path("*.cram"),    emit: cram, optional: true
     //tuple val(meta), path("*.crai"),    emit: crai, optional: true
     //tuple val(meta), path("*.csi"),     emit: csi,  optional: true
@@ -24,7 +24,7 @@ process SAMTOOLS_FLAGSTAT {
     script:
     def args      = task.ext.args ?: ''
     def prefix    = task.ext.prefix ?: "${meta.id}_${meta.replicate}.flagstat"
-    def extension = task.ext.extension ?: "tsv"
+    def extension = task.ext.extension ?: "txt"
     /*def extension = args.contains("--output-fmt sam") ? "sam" :
                     args.contains("--output-fmt cram") ? "cram" :
                     "bam"
@@ -37,7 +37,7 @@ process SAMTOOLS_FLAGSTAT {
         -@ $task.cpus \\
         -O ${extension} \\
         ${bam} \\
-        > ${prefix}.tsv \\
+        > ${prefix}.txt \\
 
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
@@ -48,9 +48,9 @@ process SAMTOOLS_FLAGSTAT {
     stub:
     def args      = task.ext.args ?: ''
     def prefix    = task.ext.prefix ?: "${meta.id}_${meta.replicate}.flagstat"
-    def extension = task.ext.extension ?: "tsv"
+    def extension = task.ext.extension ?: "txt"
     """
-    touch ${prefix}.tsv
+    touch ${prefix}.txt
     cat <<-END_VERSIONS > versions.yml
     "${task.process}":
         samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
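
Review note: the renamed NANOQ output globs in this commit can be sanity-checked with a small sketch. It uses Python's `fnmatch` as a stand-in for Nextflow's glob matching (an assumption; the two should agree for simple `*` patterns like these) and a hypothetical prefix, `sampleA_1_nanoq`, built from the default `${meta.id}_${meta.replicate}_nanoq`. It is not part of the pipeline.

```python
from fnmatch import fnmatchcase

# Hypothetical prefix, following the default "${meta.id}_${meta.replicate}_nanoq".
prefix = "sampleA_1_nanoq"

# Files written by the NANOQ script block after this commit.
produced = [
    f"{prefix}_stats.txt",
    f"{prefix}_stats_verbose.txt",
    f"{prefix}.json",
]

# Output globs declared in the process, keyed by emit name.
patterns = {
    "stats": "*_stats.txt",
    "verbose_stats": "*_stats_verbose.txt",
    "json_stats": "*.json",
}


def matches(pattern, names):
    """Return the names that match a glob pattern (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


# Map each emit channel to the files its glob would capture.
captured = {emit: matches(pat, produced) for emit, pat in patterns.items()}

for emit, files in captured.items():
    print(f"{emit}: {files}")
```

Under these assumptions each emit captures exactly one file, so the basic and verbose reports stay in separate channels after the rename.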
