Commit 2b3442e

more output docs, fixing up multiqc(added nanoq and flagstats outputs), some small renaming
1 parent c3b3c02 commit 2b3442e

6 files changed

Lines changed: 34 additions & 33 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -12,8 +12,8 @@
 
 ## Introduction
 
-**rich_longTranscriptomics** is a nextflow pipeline that is used for the processing of direct RNA nanopore sequencing data, providing multiple transcript reconstruction, and quantification
-options with the use of a reference genome, and transcriptome annotation. Additionally, it performs post transcriptome reconstruction assessment, and recovery.
+**rich_longTranscriptomics** is a nextflow pipeline that is used for the processing of direct RNA nanopore sequencing data, providing multiple transcript reconstruction options, and quantification with the use of a reference genome, and transcriptome annotation.
+<!-- Additionally, it performs post transcriptome reconstruction assessment, and recovery. -->
 
 The pipeline currently _only_ accepts sequencing data from directRNA Oxford
 Nanopore Technologies (ONT) libraries. It is recommended to provide raw FASTQ
@@ -33,14 +33,14 @@ reads in BAM format. These are provided to the samplesheet as input.
 3. [`alfred`](https://www.gear-genomics.com/docs/alfred/)
 4. [`ngs-bits`](https://github.com/imgag/ngs-bits/tree/master)
 6. Multiple transcriptome reconstruction options, with read correction options.
-1. [`FLAIR`](github.com/BrooksLabUCSC/flair) - allows read correction
-2. [`bambu`](github.com/GoekeLab/bambu) - very minor read correction
+1. [`FLAIR`](https://github.com/BrooksLabUCSC/flair) - allows read correction
+2. [`bambu`](http://github.com/GoekeLab/bambu) - very minor read correction
 3. [`IsoQuant`](https://ablab.github.io/IsoQuant/) - allows read correction
 4. [`StringTie`](https://github.com/skovaka/stringtie2)
 <!-- 7. Fusion gene detection [`JAFFA`](github.com/Oshlack/JAFFA) -->
 7. Transcriptome assessment [`gffcompare`](https://ccb.jhu.edu/software/stringtie/gff.shtml)
 8. Transcript quantification
-1. [oarfish](https://github.com/COMBINE-lab/oarfish) )
+1. [`oarfish`](https://github.com/COMBINE-lab/oarfish) )
 <!-- ( [`TranSigner`](https://github.com/haydenji0731/TranSigner),
 Small test datasets for the pipeline are included in the [assets directory](https://github.com/number-25/LongTranscriptomics/assets/test_data). -->
 

docs/output.md

Lines changed: 14 additions & 14 deletions
@@ -46,8 +46,8 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 
 - `fastq_qc/nanoq/`
 - `*_nanoq.json`: `json` formatted file containing quality metrics.
-- `*_nanoq.stats`: basic NANOQ report containing quality metrics.
-- `*_nanoq_stats.verbose`: verbose NANOQ report containing quality metrics.
+- `*_nanoq_stats.txt`: basic NANOQ report containing quality metrics.
+- `*_nanoq_stats_verbose.txt`: verbose NANOQ report containing quality metrics.
 
 </details>
 
@@ -106,8 +106,8 @@ Top ranking read lengths (bp)
 
 [SEQUALI](https://github.com/rhpvorderman/sequali) provides general quality statistics
 about the sequence reads, along with several other features including,
-overrepresentation analysis and duplication rate estimation. It outputs the
-statistics in both ??
+over-representation analysis and duplication rate estimation. It outputs the
+statistics in both JSON and HTML format.
 
 ## Reference genome mapping
 
@@ -165,7 +165,7 @@ this file.
 
 [bedtools](https://github.com/arq5x/bedtools2) is a multipurpose
 toolkit for working with tab separated genomic formats such as GTF/GFF/BED, but
-also SAM/BAM/CRAM files. Here it is used convert the mapped BAM file to BEDGRAPH
+also SAM/BAM/CRAM files. Here it is used to convert the mapped BAM file to BEDGRAPH
 format, in preparation for conversion to BigWig.
 
 ### bedGraphToBigWig
@@ -194,7 +194,7 @@ in a lightweight way.
 <summary>Output files</summary>
 
 - `mapping_qc/samtools_flagstat/`
-- `*.flagstat.tsv`: the output of samtools flagstat in tsv format.
+- `*.flagstat.txt`: the output of samtools flagstat in txt format.
 </details>
 
 [samtools](http://www.htslib.org/doc/#manual-pages) flagstats provides summary
@@ -211,7 +211,7 @@ alignments for each FLAG type.
 
 </details>
 
-[cramino](https://github.com/wdecoster/cramino) is a tool for quick quality assessment of cram and bam files, intended for long read sequencing.
+[cramino](https://github.com/wdecoster/cramino) is a tool for quick quality assessment of cram and bam files, intended for long read sequencing. It will output the statistics in a simple text file which is human readable.
 
 ```
 File name example.cram
@@ -239,14 +239,16 @@ Creation time 09/09/2022 10:53:36
 </details>
 
 [alfred](https://www.gear-genomics.com/docs/alfred/cli/) computes various
-alignment metrics and summary statistics by read group.
+alignment metrics and summary statistics by read group. The transposed output is a transformation of the alignment metrics from column format to row format for readability. TSV output is gzipped by default.
 
 ### ngs-bits
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `multiqc/`
+#### TODO
+
+- `mapping_qc/ngf-bits/`
 - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
 - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
 - `multiqc_plots/`: directory containing static images from the report in various formats.
@@ -305,7 +307,6 @@ instead be putative variants which should not be corrected.
 - `multiqc/`
 - `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
 - `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-- `multiqc_plots/`: directory containing static images from the report in various formats.
 
 </details>
 
@@ -323,10 +324,9 @@ grouping. IsoQuant, like FLAIR, provides optional read correction capabilities,
 <details markdown="1">
 <summary>Output files</summary>
 
-- `multiqc/`
-- `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser.
-- `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline.
-- `multiqc_plots/`: directory containing static images from the report in various formats.
+- `transcript_reconstruction/stringtie`
+- `KCMF1.1.stringtie.coverage.gtf`: a standalone HTML file that can be viewed in your web browser.
+- `KCMF1.1.stringtie.transcripts.gtf`: directory containing parsed statistics from the different tools used in the pipeline.
 
 </details>
 
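
Review note on the alfred hunk in docs/output.md: the new sentence describes the transposed output as the same metrics turned from column format into row format. A minimal Python sketch of that kind of transformation follows. It is not alfred's implementation, and the column names (`Sample`, `MappedReads`, `MedianReadLength`) and values are made up for illustration.

```python
import csv
import io


def transpose_tsv(text: str) -> str:
    """Turn a wide TSV (one column per metric) into a tall TSV (one row per metric)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    # zip(*rows) pairs up the i-th field of every row, i.e. it swaps rows and columns.
    # It truncates to the shortest row, so ragged input would silently lose fields.
    transposed = zip(*rows)
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(transposed)
    return out.getvalue()


# Hypothetical metrics table: header row plus a single sample row.
example = "Sample\tMappedReads\tMedianReadLength\nKCMF1\t1000\t850\n"
print(transpose_tsv(example))
```

With one metric per row, a table with many metrics and few samples can be read top to bottom instead of scrolling sideways, which is the readability gain the docs sentence refers to.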

modules/local/gffread/gffread/main.nf

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,4 @@
 process GFFREAD_GETFASTA {
-tag "$fasta"
 label 'process_single'
 conda "${moduleDir}/environment.yml"
 container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
@@ -11,6 +10,8 @@ process GFFREAD_GETFASTA {
 tuple path(genome_fasta), path(genome_fasta_index)
 val origin
 
+//tag "$fasta"
+
 output:
 tuple val(meta), path("*.fa"), emit: transcripts_fa
 path "versions.yml", emit: versions

modules/local/gffread/gffread/nextflow.config

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 process {
 withName: 'GFFREAD_GETFASTA' {
 publishDir = [
-path: { "${params.outdir}/transcript_reconstruction/gtf_transcripts/${meta.id}_${meta.replicate}" },
+path: { "${params.outdir}/transcript_reconstruction/transcripts_fasta/${meta.id}_${meta.replicate}" },
 //mode: params.publish_dir_mode
 ]
 }

modules/local/nanoq/nanoq/main.nf

Lines changed: 7 additions & 7 deletions
@@ -12,8 +12,8 @@ process NANOQ {
 //val(output_format) //One of the following: fastq, fastq.gz, fastq.bz2, fastq.lzma, fasta, fasta.gz, fasta.bz2, fasta.lzma.
 
 output:
-tuple val(meta), path("*.stats") , emit: stats
-tuple val(meta), path("*_stats.verbose") , emit: verbose_stats
+tuple val(meta), path("*_stats.txt") , emit: stats
+tuple val(meta), path("*_stats_verbose.txt") , emit: verbose_stats
 tuple val(meta), path("*.json") , emit: json_stats
 //tuple val(meta), path("*_filtered.${output_format}") , emit: reads
 path "versions.yml" , emit: versions
@@ -29,16 +29,16 @@ process NANOQ {
 //else
 """
 nanoq \\
--H \\
 -s \\
+-v \\
 -i ${fastq} \\
-> ${prefix}.stats
+> ${prefix}_stats.txt
 
 nanoq \\
 -s \\
 -vvv \\
 -i ${fastq} \\
-> ${prefix}_stats.verbose
+> ${prefix}_stats_verbose.txt
 
 nanoq \\
 -s \\
@@ -56,8 +56,8 @@ process NANOQ {
 def args = task.ext.args ?: ''
 def prefix = task.ext.prefix ?: "${meta.id}_${meta.replicate}_nanoq"
 """
-touch ${prefix}.stats
-touch ${prefix}_stats.verbose
+touch ${prefix}_stats.txt
+touch ${prefix}_stats_verbose.txt
 touch ${prefix}.json
 
 cat <<-END_VERSIONS > versions.yml
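
Review note on the NANOQ rename: both text reports now end in `.txt`, so the three `output:` globs need to stay mutually exclusive, otherwise one file would be emitted on two channels. The Python sketch below checks this for the new names. It assumes Python's `fnmatch` behaves like Nextflow's glob matching for these simple `*` patterns, and the prefix `KCMF1_1_nanoq` is a hypothetical instance of `${meta.id}_${meta.replicate}_nanoq`.

```python
from fnmatch import fnmatchcase

# Output globs as declared in the NANOQ process after this commit.
OUTPUT_GLOBS = {
    "stats": "*_stats.txt",
    "verbose_stats": "*_stats_verbose.txt",
    "json_stats": "*.json",
}


def channels_for(filename: str) -> list[str]:
    """Return the names of every output channel whose glob matches the file."""
    return sorted(
        name
        for name, pattern in OUTPUT_GLOBS.items()
        if fnmatchcase(filename, pattern)
    )


prefix = "KCMF1_1_nanoq"  # hypothetical sample prefix, not taken from the commit
for suffix in ("_stats.txt", "_stats_verbose.txt", ".json"):
    filename = prefix + suffix
    print(filename, "->", channels_for(filename))
```

Each file should land on exactly one channel. A broader pattern such as `*.txt` for the basic report would also capture the verbose file, which is why the `_stats` and `_stats_verbose` suffixes matter here.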

modules/local/samtools/flagstat/main.nf

Lines changed: 5 additions & 5 deletions
@@ -12,7 +12,7 @@ process SAMTOOLS_FLAGSTAT {
 //tuple val(meta2), path(fasta)
 
 output:
-tuple val(meta), path("*.tsv"), emit: flagstat
+tuple val(meta), path("*.txt"), emit: flagstat
 //tuple val(meta), path("*.cram"), emit: cram, optional: true
 //tuple val(meta), path("*.crai"), emit: crai, optional: true
 //tuple val(meta), path("*.csi"), emit: csi, optional: true
@@ -24,7 +24,7 @@ process SAMTOOLS_FLAGSTAT {
 script:
 def args = task.ext.args ?: ''
 def prefix = task.ext.prefix ?: "${meta.id}_${meta.replicate}.flagstat"
-def extension = task.ext.extension ?: "tsv"
+def extension = task.ext.extension ?: "txt"
 /*def extension = args.contains("--output-fmt sam") ? "sam" :
 args.contains("--output-fmt cram") ? "cram" :
 "bam"
@@ -37,7 +37,7 @@ process SAMTOOLS_FLAGSTAT {
 -@ $task.cpus \\
 -O ${extension} \\
 ${bam} \\
-> ${prefix}.tsv \\
+> ${prefix}.txt \\
 
 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
@@ -48,9 +48,9 @@ process SAMTOOLS_FLAGSTAT {
 stub:
 def args = task.ext.args ?: ''
 def prefix = task.ext.prefix ?: "${meta.id}_${meta.replicate}.flagstat"
-def extension = task.ext.extension ?: "tsv"
+def extension = task.ext.extension ?: "txt"
 """
-touch ${prefix}.tsv
+touch ${prefix}.txt
 cat <<-END_VERSIONS > versions.yml
 "${task.process}":
 samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
