
make_fragment_file silently drops marked duplicates, making frac_duplicates always 0 and fragment counts always 1 #464

Description

@anagchar

Hello,

I am passing an already pre-processed BAM file to snap.pp.make_fragment_file, and all duplicate-flagged reads are silently dropped before SnapATAC2's own deduplication pass. As a result:

  1. frac_duplicates in the returned stats is always 0.0
  2. Column 5 (count) in the output fragments file is always 1 for every row
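
For reference, symptom 2 can be checked with a minimal sketch along these lines. It assumes a tab-separated fragments file with the count in the fifth column; the function name and the example rows are made up for illustration and are not part of SnapATAC2.

```python
from collections import Counter


def count_column_summary(lines):
    """Tally the values found in column 5 (count) of fragment-file lines.

    `lines` is any iterable of text lines; blank lines and lines
    starting with '#' are skipped.
    """
    tally = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        tally[int(fields[4])] += 1
    return tally


# Illustrative rows only (not real data): chrom, start, end, barcode, count
example_rows = [
    "chr1\t100\t350\tAAACGT\t1\n",
    "chr1\t400\t900\tAAACGT\t1\n",
    "chr2\t50\t700\tTTTGCA\t1\n",
]
summary = count_column_summary(example_rows)
print(dict(summary))
```

On a real gzipped fragments file, the lines could be supplied with `gzip.open(path, "rt")`. If the tally has the single key 1, every row has count 1, which is the behaviour described above.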

How can I retain the duplicates as well, rather than only the primary reads? Is there a parameter I missed, or a recommended workflow for users whose BAMs come from a pipeline that already runs MarkDuplicates? I'd like column 5 and frac_duplicates to reflect the real per-fragment read counts.

Environment

  • SnapATAC2 2.9.0, Python 3.11
  • Long-read single-end scATAC BAM file (ONT), Picard MarkDuplicates with --BARCODE_TAG CB
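
To confirm that the input really carries duplicate marks, a rough sketch like the following can compute the fraction of records with the SAM duplicate flag (0x400) set. It assumes SAM-formatted text lines (for example, the output of `samtools view`); the function name and the example records are hypothetical and only for illustration.

```python
DUPLICATE_FLAG = 0x400  # SAM flag bit: PCR or optical duplicate


def duplicate_fraction(sam_lines):
    """Return the fraction of SAM records whose FLAG has the duplicate bit set.

    Header lines (starting with '@') and blank lines are ignored.
    Returns 0.0 when there are no records.
    """
    total = 0
    duplicates = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        total += 1
        if flag & DUPLICATE_FLAG:
            duplicates += 1
    return duplicates / total if total else 0.0


# Illustrative records only (not real data); just the first columns are shown
example_sam = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "read1\t0\tchr1\t100\t60\n",
    "read2\t1024\tchr1\t100\t60\n",   # duplicate, forward strand
    "read3\t16\tchr1\t500\t60\n",
    "read4\t1040\tchr1\t500\t60\n",   # duplicate, reverse strand (16 + 1024)
]
fraction = duplicate_fraction(example_sam)
print(fraction)
```

A non-zero fraction here, next to frac_duplicates of 0.0 from make_fragment_file, would be consistent with the flagged reads being discarded before deduplication.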
