V2 #25
base: devel
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| subworkflows/prepare_input_visium_hd.nf | ||
| .claude/ |
This file was deleted.
Clare (human) comment:
Another note: the sample sheet that Clare generated for review uses the example data in the repo, which contains only 10x 5' v2 ONT reads. Both files are also identical, so Bambu will most likely generate the same read class files for both. This makes it hard to validate whether the pipeline still functions as intended with other chemistries/technologies. Could an extra 1 to 2 samples be sourced and included in the sample sheet to reflect that functionality?
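The duplicate-input concern in the comment above is easy to confirm before a test run. The following is a minimal sketch, not part of this PR: it compares two files by SHA-256 digest, and the function names and the command-line usage are hypothetical.

```python
import hashlib
import sys


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """True when both files have the same bytes (same digest)."""
    return file_sha256(path_a) == file_sha256(path_b)


# Hypothetical usage: python check_inputs.py sample1.fastq.gz sample2.fastq.gz
if __name__ == "__main__" and len(sys.argv) == 3:
    same = files_identical(sys.argv[1], sys.argv[2])
    print("identical" if same else "different")
```

This compares raw bytes, so two gzip files holding the same reads but compressed differently would be reported as different; comparing decompressed content would need an extra step.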
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| technology,fwd_primer_f,fwd_primer_r,rev_primer_f,rev_primer_r,TSO_f,TSO_r | ||
| 10x3v2,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCATGTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTACATGGG,, | ||
| 10x3v3,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCCTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGGGG,, | ||
| 10x3v4,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCCTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGGGG,, | ||
| 10x5v2,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,GTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTAC,TTTCTTATATGGG,CCCATATAAGAAA | ||
| 10x5v3,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,GTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTAC,TTTCTTATATGGG,CCCATATAAGAAA | ||
| visium-v1,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCATGTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTACATGGG,, | ||
| visium-v2,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCATGTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTACATGGG,, | ||
| visium-v3,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCATGTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTACATGGG,, | ||
| visium-v4,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCATGTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTACATGGG,, | ||
| visium-v5,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCATGTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTACATGGG,, |
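In the primer table above, each `_r` column appears to hold the reverse complement of the matching `_f` column. A minimal sketch of a consistency check follows, assuming that relationship is intended; the helper name `find_mismatched_pairs` is hypothetical, and the two sample rows are copied from the table.

```python
import csv
import io

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# Two rows copied from the primer table in this diff.
PRIMER_CSV = """technology,fwd_primer_f,fwd_primer_r,rev_primer_f,rev_primer_r,TSO_f,TSO_r
10x3v3,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,CCCCTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGGGG,,
10x5v2,CTACACGACGCTCTTCCGATCT,AGATCGGAAGAGCGTCGTGTAG,GTACTCTGCGTTGATACCACTGCTT,AAGCAGTGGTATCAACGCAGAGTAC,TTTCTTATATGGG,CCCATATAAGAAA
"""


def find_mismatched_pairs(csv_text):
    """Return (technology, column prefix) for every _f/_r pair that disagrees."""
    mismatches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for prefix in ("fwd_primer", "rev_primer", "TSO"):
            fwd, rev = row[f"{prefix}_f"], row[f"{prefix}_r"]
            if not fwd and not rev:
                continue  # both empty, e.g. no TSO for 3' chemistries
            if reverse_complement(fwd) != rev:
                mismatches.append((row["technology"], prefix))
    return mismatches


print(find_mismatched_pairs(PRIMER_CSV))  # prints [] when every pair is consistent
```

A check like this could run against the full CSV in CI so that a typo in one primer does not silently change trimming behavior.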
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| technology,barcode_path,spatial_coordinate_path | ||
| 10x3v2,737K-august-2016.txt, | ||
| 10x3v3,3M-february-2018_TRU.txt.gz, | ||
| 10x3v4,3M-3pgex-may-2023_TRU.txt.gz, | ||
| 10x5v2,737K-august-2016.txt, | ||
| 10x5v3,3M-5pgex-jan-2023.txt.gz, | ||
| visium-v1,visium-v1.txt,visium-v1_coordinates.txt | ||
| visium-v2,visium-v2.txt,visium-v2_coordinates.txt | ||
| visium-v3,visium-v3.txt,visium-v3_coordinates.txt | ||
| visium-v4,visium-v4.txt,visium-v4_coordinates.txt | ||
| visium-v5,visium-v5.txt,visium-v5_coordinates.txt |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| technology,left_flank,barcode,umi,right_flank | ||
| 10x3v2,CTACACGACGCTCTTCCGATCT,????????????????,??????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| 10x3v3,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| 10x3v4,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| 10x5v2,CTACACGACGCTCTTCCGATCT,????????????????,??????????,TTTCTTATATGGG | ||
| 10x5v3,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTCTTATATGGG | ||
| visium-v1,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| visium-v2,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| visium-v3,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| visium-v4,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT | ||
| visium-v5,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT |
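The `barcode` and `umi` columns above encode their lengths as runs of `?` wildcards. As a hedged illustration, the sketch below derives those lengths per technology and rejects any pattern containing other characters; `pattern_lengths` is a hypothetical helper, and the two sample rows are copied from the table.

```python
import csv
import io

# Two rows copied from the pattern table in this diff.
PATTERN_CSV = """technology,left_flank,barcode,umi,right_flank
10x3v2,CTACACGACGCTCTTCCGATCT,????????????????,??????????,TTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
10x5v3,CTACACGACGCTCTTCCGATCT,????????????????,????????????,TTTCTTATATGGG
"""


def pattern_lengths(csv_text):
    """Map technology -> (barcode length, UMI length) from '?' wildcard runs."""
    lengths = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for field in ("barcode", "umi"):
            # Anything other than '?' suggests a malformed pattern.
            if set(row[field]) - {"?"}:
                raise ValueError(f"{row['technology']}: unexpected characters in {field}")
        lengths[row["technology"]] = (len(row["barcode"]), len(row["umi"]))
    return lengths


for tech, (barcode_len, umi_len) in pattern_lengths(PATTERN_CSV).items():
    print(f"{tech}: barcode={barcode_len} bp, umi={umi_len} bp")
```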
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,116 @@ | ||
| #!/usr/bin/env python3 | ||
| import argparse | ||
| import sys | ||
| | ||
| DNA_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan") | ||
| | ||
| def parse_args(): | ||
|     parser = argparse.ArgumentParser(description = "Reverse complement FASTQ files generated by Flexiplex") | ||
|     parser.add_argument("-i", "--input", default = '-', help = "Input FASTQ file") | ||
|     parser.add_argument("-o", "--output", default = '-', help = "Output FASTQ file") | ||
|     args = parser.parse_args() | ||
| | ||
|     return args | ||
| | ||
| def modify_read_id(read_id): | ||
|     """ | ||
|     Reverses strand direction tag in the Read ID. Example: GGAATCTCAAGCGCAA_TGGTCTTATTAA#9862034a-576a-44ad-bab9-30e8e9927dde_+1of1 | ||
|     will be modified to GGAATCTCAAGCGCAA_TGGTCTTATTAA#9862034a-576a-44ad-bab9-30e8e9927dde_-1of1 | ||
| | ||
|     Args: | ||
|         read_id (str): Read ID | ||
| | ||
|     Returns: | ||
|         str: Modified Read ID | ||
|     """ | ||
|     rev_dict = {'+': '-', '-': '+'} | ||
|     id_list = read_id.split('_') | ||
| | ||
|     # Exception handling in the event of unusual Flexiplex Read ID format | ||
|     try: | ||
|         strand_dir = id_list[-1][0] | ||
|         id_list[-1] = rev_dict.get(strand_dir, strand_dir) + id_list[-1][1:] | ||
|     except IndexError: | ||
|         pass | ||
| | ||
|     return "_".join(id_list) | ||
| | ||
| def modify_read_description(header): | ||
|     """ | ||
|     Reverses strand direction tag in the Description Header. Example: GGAATCTCAAGCGCAA_TGGTCTTATTAA#9862034a-576a-44ad-bab9-30e8e9927dde_+1of1 CB:Z:GGAATCTCAAGCGCAA UB:Z:TGGTCTTATTAA | ||
|     will be modified to GGAATCTCAAGCGCAA_TGGTCTTATTAA#9862034a-576a-44ad-bab9-30e8e9927dde_-1of1 CB:Z:GGAATCTCAAGCGCAA UB:Z:TGGTCTTATTAA | ||
| | ||
|     Args: | ||
|         header (str): Description header | ||
| | ||
|     Returns: | ||
|         str: Modified description header | ||
|     """ | ||
| | ||
|     header_list = header.split() | ||
|     original_read_id = header_list[0] | ||
|     modified_read_id = modify_read_id(original_read_id) | ||
| | ||
|     # Preserves trailing tags like CB:Z: / UB:Z: | ||
|     return modified_read_id + header[len(original_read_id):] | ||
| | ||
| def reverse_complement_seq(seq): | ||
|     """ | ||
|     Reverse complements a DNA sequence | ||
| | ||
|     Args: | ||
|         seq (str): DNA sequence | ||
| | ||
|     Returns: | ||
|         str: Sequence of the reverse complement | ||
|     """ | ||
|     return seq[::-1].translate(DNA_COMPLEMENT) | ||
| | ||
| def reverse_phred_scores(phred_scores): | ||
|     """ | ||
|     Reverses Phred Quality Sequence | ||
| | ||
|     Args: | ||
|         phred_scores (str): Phred quality score of the forward strand | ||
| | ||
|     Returns: | ||
|         str: Phred quality score of the reverse complement | ||
|     """ | ||
|     return phred_scores[::-1] | ||
| | ||
| if __name__ == "__main__": | ||
|     # Parse arguments | ||
|     args = parse_args() | ||
|     f_in = sys.stdin if args.input == '-' else open(args.input, 'r') | ||
|     f_out = sys.stdout if args.output == '-' else open(args.output, 'w') | ||
| | ||
|     # Track number of reads processed | ||
|     reads_processed = 0 | ||
| | ||
|     while True: | ||
|         # Retrieve information for each read (stored in 4 lines) | ||
|         header = f_in.readline().rstrip() | ||
|         # Stop once header is empty | ||
|         if not header: | ||
|             break | ||
| | ||
|         dna_seq = f_in.readline().rstrip() | ||
|         f_in.readline() # Read separator line but do not store it | ||
|         phred_seq = f_in.readline().rstrip() | ||
| | ||
|         # Get header, DNA sequence and Phred sequence for reverse complement | ||
|         rc_header = modify_read_description(header) | ||
|         rc_dna_seq = reverse_complement_seq(dna_seq) | ||
|         rc_phred_seq = reverse_phred_scores(phred_seq) | ||
| | ||
|         # Write output | ||
|         f_out.write(f"{rc_header}\n{rc_dna_seq}\n+\n{rc_phred_seq}\n") | ||
| | ||
|         # Increment read counter | ||
|         reads_processed += 1 | ||
|         if reads_processed % 1000000 == 0: | ||
|             sys.stderr.write(f"\rProcessed {reads_processed // 1000000} million reads") | ||
| | ||
|     # Close file handles if not stdin/stdout | ||
|     if args.input != '-': f_in.close() | ||
|     if args.output != '-': f_out.close() |
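To make the script's effect on a single record concrete, here is a condensed, standalone sketch of the same three transformations (strand tag flip, reverse complement, reversed qualities). It is an illustration rather than the PR's code: the function names are hypothetical, the header is split on the first space instead of arbitrary whitespace, and the sequence and quality strings are made up. The read ID is the example from the docstrings above.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
FLIP = {"+": "-", "-": "+"}


def flip_strand_tag(header):
    """Flip the leading +/- of the last '_'-separated field of the read ID."""
    read_id, sep, rest = header.partition(" ")
    prefix, underscore, suffix = read_id.rpartition("_")
    if suffix and suffix[0] in FLIP:
        suffix = FLIP[suffix[0]] + suffix[1:]
    # Trailing tags such as CB:Z: / UB:Z: are kept untouched.
    return prefix + underscore + suffix + sep + rest


def reverse_complement_record(header, seq, qual):
    """Return the reverse-complemented (header, sequence, quality) triple."""
    return flip_strand_tag(header), seq[::-1].translate(COMPLEMENT), qual[::-1]


header = ("@GGAATCTCAAGCGCAA_TGGTCTTATTAA#9862034a-576a-44ad-bab9-30e8e9927dde_+1of1 "
          "CB:Z:GGAATCTCAAGCGCAA UB:Z:TGGTCTTATTAA")
rc_header, rc_seq, rc_qual = reverse_complement_record(header, "ACGTN", "IIII#")
print(rc_header)  # strand tag becomes _-1of1, tags unchanged
print(rc_seq)     # NACGT
print(rc_qual)    # #IIII
```

Applying the function twice returns the original record, which makes a convenient round-trip test for the real script as well.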
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,11 @@ | ||
| FROM mambaorg/micromamba:git-c0f93d2-amazon2023 | ||
| | ||
| RUN micromamba install -y -n base \ | ||
|     -c conda-forge \ | ||
|     -c bioconda \ | ||
|     minimap2=2.30 \ | ||
|     samtools=1.23 \ | ||
|     procps-ng \ | ||
|     && micromamba clean -ay | ||
| | ||
| ENV PATH=/opt/conda/bin:$PATH |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| FROM mambaorg/micromamba:git-c0f93d2-amazon2023 | ||
| | ||
| RUN micromamba install -y -n base \ | ||
|     -c conda-forge \ | ||
|     -c bioconda \ | ||
|     chopper=0.12.0 \ | ||
|     flexiplex=1.02.5 \ | ||
|     cutadapt=5.2 \ | ||
|     pigz=2.8 \ | ||
|     procps-ng \ | ||
|     && micromamba clean -ay | ||
| | ||
| ENV PATH=/opt/conda/bin:$PATH |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,10 @@ | ||
| FROM rocker/r-base:4.4.1 | ||
| | ||
| # install system dependencies | ||
| RUN apt-get update && apt-get install -y \ | ||
|     libcurl4-openssl-dev \ | ||
|     procps && rm -rf /var/lib/apt/lists/* | ||
| | ||
| # install Seurat Object (v5.3.0), Seurat (v5.4.0), and Bambu | ||
| RUN R -e "install.packages(c('pak', 'devtools', 'BiocManager'), repos='https://cloud.r-project.org')" | ||
| RUN R -e "pak::pkg_install(c('SeuratObject@5.3.0', 'Seurat@5.4.0', 'GoekeLab/bambu@devel_pre_v4'))" |
Clare (human) comments :)
Suggest a more comprehensive .gitignore, e.g.