Commit 725cb9f

Merge branch 'ar/changelog050' into 'master'
Updates Changelog for v0.5.0
See merge request machine-learning/modkit!286
2 parents e8b411e + f9b5a74 commit 725cb9f

47 files changed

Lines changed: 825 additions & 203 deletions

CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -4,6 +4,16 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [v0.5.0]
+### Adds
+- [open-chromatin] Adds open chromatin prediction subcommand for 6mA MTase-treated DNA
+- [all] Fallback to ML 254 when threshold is estimated as 1.0
+### Changes
+- [all] Refactor to workspaces
+- [modbam, check-tags] Adds `--head <n>` option to take first `n` reads
+### Fixes
+- [stats] Allow BED5 input regions and header/comment lines
+
 ## [v0.4.4]
 ### Adds
 - [extract] Adds alignment start and end columns
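
The `[stats]` fix listed above concerns BED5 regions and header/comment lines. As a rough illustration of what tolerating such input can look like, the following Python sketch parses BED text with three to five columns. The names `Region` and `parse_bed_regions`, and the choice of `#`, `track`, and `browser` as header markers, are assumptions made for this example; it is not modkit's implementation, and the set of lines modkit actually skips may differ.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive (BED convention)
    name: Optional[str] = None
    score: Optional[str] = None


# Assumed header keywords; modkit's accepted set may differ.
HEADER_KEYWORDS = ("track", "browser")


def parse_bed_regions(lines: Iterable[str]) -> Iterator[Region]:
    """Yield regions from BED3 to BED5 lines, skipping blanks, comments, and headers."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Header lines start with a keyword followed by whitespace.
        if line.split(None, 1)[0] in HEADER_KEYWORDS:
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 BED columns, got: {raw!r}")
        name = fields[3] if len(fields) > 3 else None
        score = fields[4] if len(fields) > 4 else None
        yield Region(fields[0], int(fields[1]), int(fields[2]), name, score)


if __name__ == "__main__":
    example = ["# my regions\n", "track name=demo\n", "chr1\t10\t20\tpeak1\t0\n"]
    for region in parse_bed_regions(example):
        print(region)
```

The score column is kept as a string here because the sketch does not need to interpret it.
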

book/src/advanced_usage.md

Lines changed: 126 additions & 53 deletions
@@ -15,58 +15,60 @@ Nanopore
 Usage: modkit <COMMAND>
 
 Commands:
-  pileup        Tabulates base modification calls across genomic positions. This
-                command produces a bedMethyl formatted file. Schema and
-                description of fields can be found in the README
-  adjust-mods   Performs various operations on BAM files containing base
-                modification information, such as converting base modification
-                codes and ignoring modification calls. Produces a BAM output
-                file
-  update-tags   Renames Mm/Ml to tags to MM/ML. Also allows changing the mode
-                flag from silent '.' to explicitly '?' or '.'
-  sample-probs  Calculate an estimate of the base modification probability
-                distribution
-  summary       Summarize the mod tags present in a BAM and get basic
-                statistics. The default output is a totals table (designated by
-                '#' lines) and a modification calls table. Descriptions of the
-                columns can be found in the README
-  call-mods     Call mods from a modbam, creates a new modbam with probabilities
-                set to 100% if a base modification is called or 0% if called
-                canonical
-  extract       Extract read-level base modification information from a modBAM
-                into a tab-separated values table
-  repair        Repair MM and ML tags in one bam with the correct tags from
-                another. To use this command, both modBAMs _must_ be sorted by
-                read name. The "donor" modBAM's reads must be a superset of the
-                acceptor's reads. Extra reads in the donor are allowed, and
-                multiple reads with the same name (secondary, etc.) are allowed
-                in the acceptor. Reads with an empty SEQ field cannot be
-                repaired and will be rejected. Reads where there is an ambiguous
-                alignment of the acceptor to the donor will be rejected (and
-                logged). See the full documentation for details
-  dmr           Perform DMR test on a set of regions. Output a BED file of
-                regions with the score column indicating the magnitude of the
-                difference. Find the schema and description of fields can in the
-                README as well as a description of the model and method. See
-                subcommand help for additional details
-  pileup-hemi   Tabulates double-stranded base modification patters (such as
-                hemi-methylation) across genomic motif positions. This command
-                produces a bedMethyl file, the schema can be found in the online
-                documentation
-  validate      Validate results from a set of mod-BAM files and associated BED
-                files containing the ground truth modified base status at
-                reference positions
-  motif         Various commands to search for, evaluate, or further regine
-                sequence motifs enriched for base modification. Also can
-                generate BED files of motif locations
-  entropy       Use a mod-BAM to calculate methylation entropy over genomic
-                windows
-  localize      Investigate patterns of base modifications, by aggregating
-                pileup counts "localized" around genomic features of interest
-  stats         Calculate base modification levels over regions
-  bedmethyl     Utilities to work with bedMethyl files
-  modbam        Utilities to work with modBAM files
-  help          Print this message or the help of the given subcommand(s)
+  pileup          Tabulates base modification calls across genomic positions.
+                  This command produces a bedMethyl formatted file. Schema and
+                  description of fields can be found in the README
+  adjust-mods     Performs various operations on BAM files containing base
+                  modification information, such as converting base modification
+                  codes and ignoring modification calls. Produces a BAM output
+                  file
+  update-tags     Renames Mm/Ml to tags to MM/ML. Also allows changing the mode
+                  flag from silent '.' to explicitly '?' or '.'
+  sample-probs    Calculate an estimate of the base modification probability
+                  distribution
+  summary         Summarize the mod tags present in a BAM and get basic
+                  statistics. The default output is a totals table (designated
+                  by '#' lines) and a modification calls table. Descriptions of
+                  the columns can be found in the README
+  call-mods       Call mods from a modbam, creates a new modbam with
+                  probabilities set to 100% if a base modification is called or
+                  0% if called canonical
+  extract         Extract read-level base modification information from a modBAM
+                  into a tab-separated values table
+  repair          Repair MM and ML tags in one bam with the correct tags from
+                  another. To use this command, both modBAMs _must_ be sorted by
+                  read name. The "donor" modBAM's reads must be a superset of
+                  the acceptor's reads. Extra reads in the donor are allowed,
+                  and multiple reads with the same name (secondary, etc.) are
+                  allowed in the acceptor. Reads with an empty SEQ field cannot
+                  be repaired and will be rejected. Reads where there is an
+                  ambiguous alignment of the acceptor to the donor will be
+                  rejected (and logged). See the full documentation for details
+  dmr             Perform DMR test on a set of regions. Output a BED file of
+                  regions with the score column indicating the magnitude of the
+                  difference. Find the schema and description of fields can in
+                  the README as well as a description of the model and method.
+                  See subcommand help for additional details
+  pileup-hemi     Tabulates double-stranded base modification patters (such as
+                  hemi-methylation) across genomic motif positions. This command
+                  produces a bedMethyl file, the schema can be found in the
+                  online documentation
+  validate        Validate results from a set of mod-BAM files and associated
+                  BED files containing the ground truth modified base status at
+                  reference positions
+  motif           Various commands to search for, evaluate, or further regine
+                  sequence motifs enriched for base modification. Also can
+                  generate BED files of motif locations
+  entropy         Use a mod-BAM to calculate methylation entropy over genomic
+                  windows
+  localize        Investigate patterns of base modifications, by aggregating
+                  pileup counts "localized" around genomic features of interest
+  stats           Calculate base modification levels over regions
+  bedmethyl       Utilities to work with bedMethyl files
+  modbam          Utilities to work with modBAM files
+  open-chromatin  Identify regions of open chromatin based on exogenous 6mA
+                  signal
+  help            Print this message or the help of the given subcommand(s)
 
 Options:
   -h, --help  Print help
@@ -1507,7 +1509,7 @@ Options:
 
 Output Options:
   -o, --out-table <OUT_TABLE>  Specify the output file to write the results
-                               table
+                               table, or "-"/"stdout" for stdout
       --force                  Force overwrite the output file
       --no-header              Don't add the header describing the columns to
                                the output
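
The hunk above lets `--out-table` accept `-` or `stdout` in place of a file path. One common way to support that kind of sentinel is sketched below in Python. `open_output` is a hypothetical helper written for this example, and the refusal to overwrite without `force` is an assumption inferred from the `--force` description; modkit's own handling lives in its Rust code and may differ.

```python
import contextlib
import sys
from typing import IO, Iterator


@contextlib.contextmanager
def open_output(path: str, force: bool = False) -> Iterator[IO[str]]:
    """Yield a writable text handle, treating "-" and "stdout" as standard output."""
    if path in ("-", "stdout"):
        # Hand back stdout and leave it open when the context exits.
        yield sys.stdout
        return
    # Mode "x" fails if the file exists; "w" truncates it (assumed --force behaviour).
    mode = "w" if force else "x"
    with open(path, mode) as handle:
        yield handle


if __name__ == "__main__":
    with open_output("-") as out:
        out.write("chrom\tstart\tend\n")
```
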
@@ -1521,6 +1523,73 @@ Compute Options:
 2]
 ```
 
+## open-chromatin predict
+```text
+Usage: modkit open-chromatin predict [OPTIONS] --model <MODEL_PATH> --output <OUTPUT> <IN_BAM>
+
+Arguments:
+  <IN_BAM>
+          Input modBAM with 6mA base modification calls
+
+Options:
+      --model <MODEL_PATH>
+          Path to directory with open-chromatin model
+
+      --batch-size <BATCH_SIZE>
+          Collect this many windows of data for each run through the model
+
+          [default: 1024]
+
+      --super-batch-size <SUPER_BATCH_SIZE>
+          Number of "batches" to collect at once, see documentation for exact
+          calculation or run with --dryrun to see output
+
+          [default: 100]
+
+      --step-size <STEP_SIZE>
+          Number of base pairs to step along the genome or genomic intervals to
+          make predictions. Smaller numbers will result in better resolution but
+          take more computation
+
+          [default: 25]
+
+  -t, --min-coverage <MIN_COVERAGE>
+          Require this many reads covering each prediction window
+
+          [default: 5]
+
+      --threshold <FILTER_THRESHOLD>
+          Only emit records/windows where the open chromatin probability is
+          greater than or equal to this value
+
+  -o, --output <OUTPUT>
+          Output bedGraph file, or "-"/"stdout" to write to stdout
+
+      --force
+          Force overwrite output file
+
+      --log-filepath <LOG_FILEPATH>
+          Path to file to write logs to, setting this parameter is recommended
+
+      --include-bed <INCLUDE_BED>
+          BED file of regions over which to make predictions
+
+      --region <REGION>
+          Region in the format <chrom>:start-end over which to make predictions
+
+      --suppress-progress
+          Hide the progress bar
+
+      --no-header
+          Don't print header in output
+
+      --dryrun
+          Perform setup and validations, but stop before performing inference
+
+  -h, --help
+          Print help (see a summary with '-h')
+```
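
The `open-chromatin predict` help text above requires `--model`, `--output`, and a positional `<IN_BAM>`. A minimal Python sketch of assembling such an invocation follows. It uses only flags that appear in that help text; the function name `build_predict_command` and all file paths are hypothetical, and the sketch builds the argument list without running modkit.

```python
import shlex
from typing import List, Optional


def build_predict_command(
    in_bam: str,
    model_dir: str,
    output: str,
    *,
    step_size: Optional[int] = None,
    min_coverage: Optional[int] = None,
    threshold: Optional[float] = None,
    region: Optional[str] = None,
    log_filepath: Optional[str] = None,
    dryrun: bool = False,
) -> List[str]:
    """Build an argv list for `modkit open-chromatin predict` from documented flags."""
    argv = ["modkit", "open-chromatin", "predict", "--model", model_dir, "--output", output]
    optional_values = [
        ("--step-size", step_size),
        ("--min-coverage", min_coverage),
        ("--threshold", threshold),
        ("--region", region),
        ("--log-filepath", log_filepath),
    ]
    for flag, value in optional_values:
        # Omitted options fall back to the defaults listed in the help text.
        if value is not None:
            argv += [flag, str(value)]
    if dryrun:
        argv.append("--dryrun")
    argv.append(in_bam)  # the positional input modBAM goes last
    return argv


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    argv = build_predict_command(
        "sample.bam", "models/open_chromatin", "-", region="chr1:0-1000", dryrun=True
    )
    print(shlex.join(argv))
```

The resulting list could be passed to `subprocess.run(argv, check=True)`. Starting with `dryrun=True` exercises the setup and validation step that the help text describes, without performing inference.
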
+
 ## extract full
 ```text
 Transform the probabilities from the MM/ML tags in a modBAM into a table
@@ -2750,4 +2819,8 @@ Selection Options:
           Process only the specified region of the BAM when collecting
           probabilities. Format should be <chrom_name>:<start>-<end> or
           <chrom_name>
+
+      --head <HEAD>
+          Use the first N reads without sampling. Shorthand for --ignore-index
+          --num-reads N
 ```
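
The help text above describes `--head` as shorthand for `--ignore-index --num-reads N`. That equivalence can be spelled out as an argument rewrite, sketched here in Python. `expand_head` is a hypothetical helper for illustration only; modkit presumably resolves the shorthand inside its own argument parser.

```python
from typing import List


def expand_head(argv: List[str]) -> List[str]:
    """Rewrite `--head N` as `--ignore-index --num-reads N`, leaving other arguments as-is."""
    expanded: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--head":
            if i + 1 >= len(argv):
                raise ValueError("--head requires a value")
            expanded += ["--ignore-index", "--num-reads", argv[i + 1]]
            i += 2  # skip the flag and its value
        elif arg.startswith("--head="):
            expanded += ["--ignore-index", "--num-reads", arg.split("=", 1)[1]]
            i += 1
        else:
            expanded.append(arg)
            i += 1
    return expanded


if __name__ == "__main__":
    print(expand_head(["--head", "1000", "reads.bam"]))
```
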
