| 1 | +### Introduction |
| 2 | +The sub-workflow runs all steps required to generate an [athena](https://github.com/msk-access/athena) coverage report: |
| 3 | + |
| 4 | +1. Annotate the BED file. |
| 5 | +2. Generate coverage statistics. |
| 6 | +3. Generate the coverage report. |
| 7 | + |
| 8 | + |
| 9 | +## CWL athena_report.cwl |
| 10 | + |
| 11 | +- CWL specification 1.2 |
| 12 | +- See example_inputs.yaml for the inputs to the CWL |
| 13 | +- Example command using [cwltool](https://github.com/common-workflow-language/cwltool): |
| 14 | + |
| 15 | +```bash |
| 16 | + > cwltool athena_report/athena_report.cwl athena_report/example_inputs_juno.yaml |
| 17 | +``` |
| 18 | +**If you are at MSK on the JUNO cluster and have toil-msk version 3.21.1 installed, you can use the following command:** |
| 19 | + |
| 20 | +### Using toil-cwl-runner |
| 21 | + |
| 22 | +```bash |
| 23 | +# Using toil-cwl-runner |
| 24 | +> toil-cwl-runner --singularity athena_report/athena_report.cwl athena_report/example_inputs_juno.yaml |
| 25 | +``` |
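The two invocations above differ only in the runner. As a minimal sketch, assuming a bash shell, a small helper could pick whichever runner is installed. `pick_cwl_runner` is a hypothetical name and not part of this repository; the `--singularity` flag is simply carried over from the command above.

```shell
# Hypothetical helper (not part of this repository): print the CWL runner
# command to use, preferring toil-cwl-runner when it is on PATH.
pick_cwl_runner() {
    if command -v toil-cwl-runner >/dev/null 2>&1; then
        # Same flag as in the toil example above.
        echo "toil-cwl-runner --singularity"
    elif command -v cwltool >/dev/null 2>&1; then
        echo "cwltool"
    else
        echo "no CWL runner found on PATH" >&2
        return 1
    fi
}
```

It could then be used as `$(pick_cwl_runner) athena_report/athena_report.cwl athena_report/example_inputs_juno.yaml`.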
| 26 | + |
| 27 | + |
| 28 | +### Usage: |
| 29 | + |
| 30 | +``` |
| 31 | +Usage: athena_report.cwl [OPTIONS] |
| 32 | + |
| 33 | + This tool runs all steps associated with generating an athena coverage report. |
| 34 | + |
| 35 | +Options: |
| 36 | + |
| 37 | +-p / --panel_bed : Input panel BED file; must have ONLY the following 4 columns: chromosome, start position, end position, gene/transcript |
| 38 | + |
| 39 | +-t / --transcript_file : Transcript annotation file containing the required gene and exon information. Must have ONLY the following 6 columns: |
| 40 | +chromosome, start, end, gene, transcript, exon |
| 41 | + |
| 42 | +-c / --coverage_file : Per-base coverage file (output from mosdepth or similar) |
| 43 | + |
| 44 | +-s / --chunk_size : (optional) number of rows per chunk when splitting the per-base coverage file for intersecting; with <16GB RAM, bedtools intersect can fail. Recommended values: 16GB RAM -> 20000000; 8GB RAM -> 10000000 |
| 45 | + |
| 46 | +-n / --output_name : (optional) Prefix for naming the output file; if not given, the name of the per-base coverage file is used |
| 47 | + |
| 48 | +--file: annotated BED file from which to generate the report |
| 49 | + |
| 50 | +--build: text file with the build number used for alignment, output from mosdepth (optional) |
| 51 | + |
| 52 | +--outfile: output file name prefix; if not given, the input file name is used as the prefix |
| 53 | + |
| 54 | +--thresholds: threshold values to calculate coverage for, as comma-separated integers (default: 10, 20, 30, 50, 100) |
| 55 | + |
| 56 | +--flagstat: flagstat file for the sample, required for generating run statistics (in development) |
| 57 | + |
| 58 | +--cores: number of CPU cores to use; for larger numbers of genes this will drastically reduce run time. If not given, the maximum available is used |
| 59 | + |
| 60 | +-s / --snps: VCF(s) of known SNPs to check coverage of (optional; e.g. HGMD, ClinVar) |
| 61 | + |
| 62 | +-t / --threshold: threshold value defining sub-optimal coverage (optional; default: 20) |
| 63 | + |
| 64 | +-n / --sample_name: name for the title of the report (optional; the gene_stats file name is used if not given) |
| 65 | + |
| 66 | +-o / --output: name for the output report (optional; the sample name is used if not given) |
| 67 | + |
| 68 | +-p / --panel: panel BED file used for the initial annotation; its name is displayed in the report summary (optional) |
| 69 | + |
| 70 | +-l / --limit: number of genes at which to stop including full gene plots; plots for large numbers of genes may take a long time to generate (optional) |
| 71 | + |
| 72 | +-m / --summary: boolean flag to add clinical report summary text to the summary section, including a list of all genes with transcripts (optional; default: False) |
| 73 | + |
| 74 | +``` |
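The usage text requires exactly 4 columns in the panel BED file and exactly 6 in the transcript file. As a rough pre-flight sketch, assuming both files are tab-separated with no header or comment lines, the column counts could be checked before launching the workflow. `check_columns` is a hypothetical helper, not part of athena or this repository.

```shell
# Hypothetical pre-flight check (not part of athena): succeed only if every
# line of a tab-separated file has exactly the expected number of columns.
check_columns() {
    local file="$1" expected="$2"
    # awk stops at the first offending line, reports it, and exits non-zero.
    # awk prints nothing but the error message, so all output goes to stderr.
    awk -F'\t' -v n="$expected" '
        NF != n {
            printf "%s: line %d has %d columns, expected %d\n", FILENAME, NR, NF, n
            bad = 1
            exit
        }
        END { exit bad }
    ' "$file" >&2
}
```

For example, `check_columns panel.bed 4 && check_columns transcripts.tsv 6` would only succeed when both files match the stated layouts; the file names here are placeholders.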