Commit e87659a

committed
removing all references to sequencing summary (samplesheet input)
1 parent 7289a08 commit e87659a

8 files changed

+23
-39
lines changed

README.md

Lines changed: 5 additions & 8 deletions
@@ -1,5 +1,3 @@
-![cover image](docs/images/cover_image.jpg)
-
 [![nf-core CI](https://github.com/number-25/LongTranscriptomics/actions/workflows/ci.yml/badge.svg?branch=dev)](https://github.com/number-25/LongTranscriptomics/actions/workflows/ci.yml)
 [![nf-core linting comment](https://github.com/number-25/LongTranscriptomics/actions/workflows/linting_comment.yml/badge.svg)](https://github.com/number-25/LongTranscriptomics/actions/workflows/linting_comment.yml)
 [![GitHub Actions Linting Status](https://github.com/number-25/LongTranscriptomics/actions/workflows/linting.yml/badge.svg)](https://github.com/number-25/LongTranscriptomics/actions/workflows/linting.yml)
@@ -41,9 +39,8 @@ reads in BAM format. These are provided to the samplesheet as input.
 3. [`IsoQuant`](https://ablab.github.io/IsoQuant/) - allows read correction
 4. [`StringTie`](https://github.com/skovaka/stringtie2)
 <!-- 7. Fusion gene detection [`JAFFA`](github.com/Oshlack/JAFFA) -->
-7. Transcriptome assessment [`gffcompare`](https://ccb.jhu.edu/software/stringtie/gff.shtml)
-8. Transcript quantification
-1. [`oarfish`](https://github.com/COMBINE-lab/oarfish)
+7. Transcriptome assessment ( [`gffcompare`](https://ccb.jhu.edu/software/stringtie/gff.shtml) )
+8. Transcript quantification ( [`oarfish`](https://github.com/COMBINE-lab/oarfish) )
 <!-- ( [`TranSigner`](https://github.com/haydenji0731/TranSigner),
 Small test datasets for the pipeline are included in the [assets directory](https://github.com/number-25/LongTranscriptomics/assets/test_data). -->
 
@@ -57,9 +54,9 @@ First, prepare a samplesheet with your input data that looks as follows:
 `samplesheet.csv`:
 
 ```csv
-sample,replicate,sequencing_summary_path,read_path
-CONTROL1,1,data/long_reads_sequencingsummary_1.txt,data/long_reads_1.fastq.gz
-CONTROL1,2,data/long_reads_sequencingsummary_2.txt,data/long_reads_2.fastq.gz
+sample,replicate,read_path
+CONTROL1,1,data/long_reads_1.fastq.gz
+CONTROL1,2,data/long_reads_2.fastq.gz
 ```
 
 Each row represents a fastq file. Replicate refers to a technical replicate, biological replicates should be named uniquely. Be sure to pay attention to sample naming, in

assets/samplesheet.csv

Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-sample,replicate,sequencing_summary_path,read_path
-COLOBL_RNA_600,,/mnt/hdd1/MedGen_RNA_datasets/COLOBL_RNA_600/fastq_pass/sequencing_summary_PAW55908_25243b4b_de7ed5dd.txt,/mnt/hdd1/MedGen_RNA_datasets/COLOBL_RNA_600/fastq_pass/COLOBL_RNA_660.fastq.gz
+sample,replicate,read_path
+COLOBL_RNA_600,,/mnt/hdd1/MedGen_RNA_datasets/COLOBL_RNA_600/fastq_pass/COLOBL_RNA_660.fastq.gz

assets/schema_input.json

Lines changed: 1 addition & 8 deletions
@@ -19,13 +19,6 @@
 "errorMessage": "Replicate must be an integer",
 "meta": ["replicate"]
 },
-"sequencing_summary": {
-"type": "string",
-"format": "file-path",
-"exists": true,
-"pattern": "^\S+.txt$",
-"errorMessage": "Sequencing summary file cannot contain spaces and must have extension '.txt'"
-},
 "fastq": {
 "type": "string",
 "format": "file-path",
@@ -34,6 +27,6 @@
 "errorMessage": "FastQ file must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
 }
 },
-"required": ["sample", "sequencing_summary", "fastq"]
+"required": ["sample", "fastq"]
 }
 }
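
Reviewer note: the schema change above leaves only `sample` and `fastq` as required properties. As a rough illustration of what that means for a parsed row, here is a minimal Python sketch. It assumes the row is already available as a dict keyed by the schema property names; `missing_required` and `fastq_name_ok` are hypothetical helpers, not part of the pipeline, and the filename rule is taken from the `errorMessage` text rather than from the schema's `pattern`, which is not shown in this hunk.

```python
# Required properties after this commit, copied from the "+" line of the hunk.
REQUIRED = ("sample", "fastq")

# Extensions named in the fastq errorMessage (an assumption about the real pattern).
FASTQ_SUFFIXES = (".fq.gz", ".fastq.gz")


def missing_required(row: dict) -> list:
    """Return the required keys that are absent or empty in a parsed row."""
    return [key for key in REQUIRED if not row.get(key)]


def fastq_name_ok(path: str) -> bool:
    """True if the path has no whitespace and ends in an accepted extension."""
    has_whitespace = any(ch.isspace() for ch in path)
    return not has_whitespace and path.endswith(FASTQ_SUFFIXES)
```

A row that still carries only a sequencing summary would now be reported as missing both required fields, while a row with just `sample` and `fastq` passes.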

bin/check_samplecsv.jl

Lines changed: 6 additions & 10 deletions
@@ -35,13 +35,13 @@ header = input_samplesheet[1]
 split_header = split(header, ',')
 
 ## length
-length(split_header) == 4 || throw("The header is the incorrect size, check how many columns you have provided (4 is required")
+length(split_header) == 3 || throw("The header is the incorrect size, check how many columns you have provided (3 are required)")
 println("sample sheet has the correct number of columns")
 
 ## check that the header names are correct
-header_names = ("sample", "replicate", "sequencing_summary_path", "read_path")
+header_names = ("sample", "replicate", "read_path")
 for colname in header_names
-colname ∈ split_header || throw("column names are incorrectly spelled, ensure that they are sample,replicate,sequencing_summary_path,readpath")
+colname ∈ split_header || throw("column names are incorrectly spelled, ensure that they are sample,replicate,read_path")
 end
 println("the header names are spelled correctly")
 
@@ -50,9 +50,9 @@ samplesheet_body = input_samplesheet[2:end]
 for row in samplesheet_body
 rownumber = 1
 split_row = split(row, ',')
-length(split(row, ',')) == 4 || throw("row number $(rownumber) has the incorrect number of columns, please check formatting")
+length(split(row, ',')) == 3 || throw("row number $(rownumber) has the incorrect number of columns, please check formatting")
 # check to see if sample name is a single string and not spaced
-first_column, second_column, third_column, fourth_column = split_row[1:end]
+first_column, second_column, third_column = split_row[1:end]
 !occursin(' ', first_column) || throw("sample name is separated by a space, please format it so that it is one continuous string")
 # check to see if the replicate is an integer (if provided)
 if !isempty(second_column)
@@ -62,12 +62,8 @@ for row in samplesheet_body
 throw("The replicate is not an integer, please change it to one e.g 1, 2")
 end
 end
-# check to see if sequencing summary exists and isn't empty
-path_to_summary = nextflow_path * '/' * third_column
-ispath(path_to_summary) || throw("sequencing summary file doesn't exist, or the path pointing to it is incorrect")
-# is it empty?
 # check to see if the reads path points to a valid path or a valid file
-path_to_reads = nextflow_path * '/' * fourth_column
+path_to_reads = nextflow_path * '/' * third_column
 ispath(path_to_reads) || isfile(path_to_reads) || throw("the path to the reads either doesn't exist, or the path pointing to a specific fastq file doesn't exist, please check paths")
 if ispath(path_to_reads) && !isfile(path_to_reads)
 !isempty(readdir(glob"*.fq", path_to_reads)) ||
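
Reviewer note: the checks this script performs on the three-column layout can be summarised in a short sketch. The Python below is an illustration only, not a port of `check_samplecsv.jl`: `check_samplesheet` is a hypothetical name, the filesystem checks on `read_path` are left out, and "integer" is assumed to mean a non-negative whole number.

```python
EXPECTED_HEADER = ("sample", "replicate", "read_path")


def check_samplesheet(lines):
    """Validate a three-column samplesheet given as a list of CSV lines.

    Raises ValueError on the first problem found and returns the number of
    data rows checked otherwise. Paths are not checked against the disk.
    """
    header = lines[0].rstrip("\r\n").split(",")
    if len(header) != len(EXPECTED_HEADER):
        raise ValueError("header must have exactly 3 columns")
    for name in EXPECTED_HEADER:
        if name not in header:
            raise ValueError(f"missing or misspelled column name: {name}")

    # enumerate keeps the row number in step with the row being checked
    for rownumber, row in enumerate(lines[1:], start=1):
        fields = row.rstrip("\r\n").split(",")
        if len(fields) != 3:
            raise ValueError(f"row {rownumber} has the wrong number of columns")
        sample, replicate, _read_path = fields
        if " " in sample:
            raise ValueError(f"row {rownumber}: sample name contains a space")
        # replicate is optional, but must be a whole number when present
        if replicate and not replicate.isdigit():
            raise ValueError(f"row {rownumber}: replicate is not an integer")
    return len(lines) - 1
```

Using `enumerate` for the row counter means an error message always names the row that actually failed.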

docs/usage.md

Lines changed: 3 additions & 4 deletions
@@ -19,16 +19,15 @@ The pipeline will auto-detect whether the sequencing summary files, and reads ar
 A final samplesheet file consisting of long-read data may look something like the one below. This is for **one biological** sample which has been sequenced twice, giving two technical replicates.
 
 ```csv title="samplesheet.csv"
-sample,replicate,sequencing_summary_path,read_path
-CONTROL1,1,data/long_reads_sequencingsummary_1.txt,data/long_reads_1.fastq.gz
-CONTROL1,2,data/long_reads_sequencingsummary_2.txt,data/long_reads_2.fastq.gz
+sample,replicate,read_path
+CONTROL1,1,data/long_reads_1.fastq.gz
+CONTROL1,2,data/long_reads_2.fastq.gz
 ```
 
 | Column | Description |
 | ------------------------- | ------------------------------------------------------------------------ |
 | `sample` | Sample name. |
 | `replicate` | Technical replicate number |
-| `sequencing_summary_path` | Full path to nanopore sequencing summary file (usually a .txt file).gz". |
 | `read_path` | Full path to fastq reads. |
 
 An [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.
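
Reviewer note: the documentation above treats rows that share a `sample` name as technical replicates of one biological sample. The sketch below shows one way to read the new three-column samplesheet and collect read paths per sample. It is an assumption-laden illustration in Python using only the standard library; `reads_by_sample` is a hypothetical helper and does not describe how the pipeline itself groups replicates.

```python
import csv
import io
from collections import defaultdict


def reads_by_sample(csv_text):
    """Map each sample name to the read paths of its technical replicates."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["sample"]].append(row["read_path"])
    return dict(groups)


example = (
    "sample,replicate,read_path\n"
    "CONTROL1,1,data/long_reads_1.fastq.gz\n"
    "CONTROL1,2,data/long_reads_2.fastq.gz\n"
)
grouped = reads_by_sample(example)
```

With the example samplesheet from the docs, `grouped` holds a single key, `CONTROL1`, with both fastq paths in file order.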

subworkflows/local/input_check/main.nf

Lines changed: 2 additions & 3 deletions
@@ -23,7 +23,7 @@ workflow INPUT_CHECK {
 ch_sample.view() */
 
 emit:
-ch_sample // [ sample, replicate, sequencing_summary_file, path_to_reads ]
+ch_sample // [ sample, replicate, path_to_reads ]
 //ch_versions = ch_versions.mix(CHECK_SAMPLESHEET.out.versions.first())
 }
 
@@ -32,8 +32,7 @@ def get_sample_info(LinkedHashMap row) {
 // create meta map
 def meta = [:]
 meta.id = row.sample
-meta.replicate = row.replicate
-meta.sequencing_summary = row.sequencing_summary_path
+meta.replicate = row.replicate
 //meta.fastq = row.read_path
 
 // add path(s) of the fastq file to the meta map
Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-sample,replicate,sequencing_summary_path,read_path
-KCMF1,1,assets/test_data/dummy_sequencing_summary.txt,assets/test_data/sample_nobc_dx.fastq.gz
+sample,replicate,read_path
+KCMF1,1,assets/test_data/sample_nobc_dx.fastq.gz
Lines changed: 2 additions & 2 deletions
@@ -1,2 +1,2 @@
-sample,replicate,sequencing_summary_path,read_path
-COLOBL_RNA600,1,assets/fulltest_data/sequencing_summary_PAW55908_25243b4b_de7ed5dd.txt,assets/fulltest_data/COLOBL_RNA_660.fastq.gz
+sample,replicate,read_path
+COLOBL_RNA600,1,assets/fulltest_data/COLOBL_RNA_660.fastq.gz
