|
4 | 4 |
|
5 | 5 | ## Introduction |
6 | 6 |
|
7 | | -<!-- nf-core: Add documentation about anything specific to running your pipeline. For general topics, please point to (and add to) the main nf-core website. --> |
8 | 7 |
|
9 | 8 | ## Samplesheet input |
10 | 9 |
|
11 | | -You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use this parameter to specify its location. It has to be a comma-separated file with 3 columns, and a header row as shown in the examples below. |
| 10 | +You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. Use the `--input` parameter to specify its location. It must be a comma-separated file with 4 columns and a header row, as shown in the example below. It is best to set this parameter in `nextflow.config` or your own custom config file. |
12 | 11 |
|
13 | 12 | ```bash |
14 | 13 | --input '[path to samplesheet file]' |
15 | 14 | ``` |
16 | | - |
17 | | -### Multiple runs of the same sample |
18 | | - |
19 | | -The `sample` identifiers have to be the same when you have re-sequenced the same sample more than once e.g. to increase sequencing depth. The pipeline will concatenate the raw reads before performing any downstream analysis. Below is an example for the same sample sequenced across 3 lanes: |
20 | | - |
21 | | -```csv title="samplesheet.csv" |
22 | | -sample,fastq_1,fastq_2 |
23 | | -CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz |
24 | | -CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz |
25 | | -CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz |
26 | | -``` |
27 | | - |
28 | 15 | ### Full samplesheet |
29 | 16 |
|
30 | | -The pipeline will auto-detect whether a sample is single- or paired-end using the information provided in the samplesheet. The samplesheet can have as many columns as you desire, however, there is a strict requirement for the first 3 columns to match those defined in the table below. |
| 17 | +The pipeline will auto-detect whether the sequencing summary files and reads are present at the paths listed in the samplesheet. Each row represents one FASTQ file. The `replicate` column refers to a technical replicate; biological replicates should be given unique `sample` names. Pay close attention to sample naming
| 18 | +to avoid duplication and file overwriting. |
31 | 19 |
|
32 | | -A final samplesheet file consisting of both single- and paired-end data may look something like the one below. This is for 6 samples, where `TREATMENT_REP3` has been sequenced twice. |
| 20 | +A final samplesheet file consisting of long-read data may look something like the one below. This is for **one biological sample** that has been sequenced twice, giving two technical replicates. |
33 | 21 |
|
34 | 22 | ```csv title="samplesheet.csv" |
35 | | -sample,fastq_1,fastq_2 |
36 | | -CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz |
37 | | -CONTROL_REP2,AEG588A2_S2_L002_R1_001.fastq.gz,AEG588A2_S2_L002_R2_001.fastq.gz |
38 | | -CONTROL_REP3,AEG588A3_S3_L002_R1_001.fastq.gz,AEG588A3_S3_L002_R2_001.fastq.gz |
39 | | -TREATMENT_REP1,AEG588A4_S4_L003_R1_001.fastq.gz, |
40 | | -TREATMENT_REP2,AEG588A5_S5_L003_R1_001.fastq.gz, |
41 | | -TREATMENT_REP3,AEG588A6_S6_L003_R1_001.fastq.gz, |
42 | | -TREATMENT_REP3,AEG588A6_S6_L004_R1_001.fastq.gz, |
| 23 | +sample,replicate,sequencing_summary_path,read_path |
| 24 | +CONTROL1,1,data/long_reads_sequencingsummary_1.txt,data/long_reads_1.fastq.gz |
| 25 | +CONTROL1,2,data/long_reads_sequencingsummary_2.txt,data/long_reads_2.fastq.gz |
43 | 26 | ``` |
44 | 27 |
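The naming advice above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: `check_samplesheet` is a hypothetical helper, and it assumes that "duplication" means two rows sharing the same `sample` and `replicate` values. It only inspects the CSV text and does not touch the files on disk.

```python
import csv
import io
from collections import Counter

# Header taken from the example samplesheet in this section.
EXPECTED_HEADER = ["sample", "replicate", "sequencing_summary_path", "read_path"]


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none).

    Hypothetical helper: checks the header, the column count of each row,
    and (as an assumption) that each sample/replicate pair appears once.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or rows[0] != EXPECTED_HEADER:
        return ["header must be: " + ",".join(EXPECTED_HEADER)]

    problems = []
    seen = Counter()
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:  # skip blank lines
            continue
        if len(row) != len(EXPECTED_HEADER):
            problems.append(
                f"line {line_no}: expected {len(EXPECTED_HEADER)} columns, found {len(row)}"
            )
            continue
        sample, replicate = row[0], row[1]
        seen[(sample, replicate)] += 1
        if seen[(sample, replicate)] > 1:
            problems.append(
                f"line {line_no}: duplicate sample/replicate pair {sample},{replicate}"
            )
    return problems


example = """sample,replicate,sequencing_summary_path,read_path
CONTROL1,1,data/long_reads_sequencingsummary_1.txt,data/long_reads_1.fastq.gz
CONTROL1,2,data/long_reads_sequencingsummary_2.txt,data/long_reads_2.fastq.gz
"""
print(check_samplesheet(example))  # prints []
```

An empty list means the sheet passed these checks; any other result lists the offending lines by their position in the file.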
|
45 | 28 | | Column | Description | |
|