Commit e859c4d

Adding option to change exon order for 1:M fields, leafcutter annotation needs genomic position.
1 parent 08d76a7 commit e859c4d

1 file changed

Lines changed: 47 additions & 2 deletions

workflow/scripts/splicing_annotation.py

@@ -13,6 +13,7 @@
 _HELP = dedent("""
 @Usage:
     $ ./splicing_annotation.py [-h] [--version] \\
+            [--sort-exons-by-exon-order] \\
             --exon-ann EXON_ANN_FILE \\
             --output OUTPUT_FILE
 @About:
@About:
@@ -28,7 +29,7 @@
       • exon_id.1|exon_id.2|...
       • exon_number.1|exon_number.2|...
       • exon_seqname
-      • exon_start:exon_end.1|exon_start.2:exon_end.2|...
+      • exon_start.1:exon_end.1|exon_start.2:exon_end.2|...
       • exon_strand
 
     This file has 1:M exon information collapsed by
@@ -47,6 +48,42 @@
     exon information. This represents the
     transcript model for each gene.
 @Options:
+    --sort-exons-by-exon-order
+        By default, 1:M exon information is
+        sorted by seqname, exon_start, exon_end,
+        and strand. This results in 1:M exon
+        information being sorted by their genomic
+        position which is not the same as their
+        splicing order for transcripts on the
+        negative strand.
+        The default behavior will result in 1:M
+        exon information being reported in the
+        following order:
+          • Positive strand transcripts:
+              • exon.1, exon.2, exon.3, ...
+          • Negative strand transcripts:
+              • ..., exon.3, exon.2, exon.1
+        If this option IS provided, the order
+        will be reversed for negative strand
+        transcripts to reflect the correct
+        splicing order, meaning it will be
+        sorted by exon order instead of
+        genomic position. The order will be:
+          • Positive strand transcripts:
+              • exon.1, exon.2, exon.3, ...
+          • Negative strand transcripts:
+              • exon.1, exon.2, exon.3, ...
+        It is worth noting that if this option
+        IS NOT provided (default behavior),
+        1:M exon_start_end information related
+        to exon location will be listed in
+        increasing order for negative strand
+        transcripts, whereas if this option
+        is provided, 1:M exon_start_end info
+        will be listed in decreasing order for
+        negative strand transcripts.
+          • Default: False (i.e., exons are
+            sorted by genomic position).
     -h, --help
         Shows help message and exits.
     -v, --version
@@ -170,6 +207,14 @@ def parse_cli_arguments():
         required=True,
         help=argparse.SUPPRESS
     )
+    # Sort exons by exon order,
+    # not by genomic position
+    parser.add_argument(
+        '--sort-exons-by-exon-order',
+        action='store_true',
+        default=False,
+        help=argparse.SUPPRESS,
+    )
     # Get version information
     parser.add_argument(
         '-v', '--version',
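
For readers unfamiliar with how the new flag surfaces in the parsed arguments, here is a minimal, self-contained sketch. It only reproduces the `add_argument` call from the hunk above on a fresh, hypothetical parser; the real script's other arguments (`--exon-ann`, `--output`) are left out, so this is an illustration rather than the script's actual parser.

```python
import argparse

# Hypothetical stand-alone parser; the real one is built in parse_cli_arguments().
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument(
    '--sort-exons-by-exon-order',
    action='store_true',     # presence of the flag stores True
    default=False,           # absence keeps genomic-position sorting
    help=argparse.SUPPRESS,  # hidden from argparse's auto-generated help
)

# argparse derives the attribute name by replacing '-' with '_'.
default_args = parser.parse_args([])
flag_args = parser.parse_args(['--sort-exons-by-exon-order'])

print(default_args.sort_exons_by_exon_order)  # False
print(flag_args.sort_exons_by_exon_order)     # True
```

Because `action='store_true'` already implies a default of `False`, the explicit `default=False` in the commit is redundant but harmless, and it documents the intent.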
@@ -426,7 +471,7 @@ def get_with_default(line_list, column_name_idx_dict, column_name, default_value
     # for the first exon in the list
     # to determine if the order
     # needs to be reversed.
-    if v[EXON_1toM_KEY][0][PARSE_1toM_COLUMNS.index("exon_strand")] == "-":
+    if v[EXON_1toM_KEY][0][PARSE_1toM_COLUMNS.index("exon_strand")] == "-" and args.sort_exons_by_exon_order:
         # If the strand is negative,
         # reverse the order of the exon
         # information to reflect the
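
The effect of the gated strand check on the collapsed `exon_start:exon_end` field can be sketched as follows. This is not the script's code: the `Exon` tuple layout, the `collapse_start_end` helper, and the coordinates are all made up for illustration. It assumes, as the help text states, that exons are first sorted by seqname, start, end, and strand, and that the reversal happens only when the first exon is on the negative strand and the flag is set.

```python
from typing import List, Tuple

# Hypothetical exon record: (exon_number, seqname, start, end, strand).
Exon = Tuple[int, str, int, int, str]


def collapse_start_end(exons: List[Exon], sort_by_exon_order: bool = False) -> str:
    """Collapse one transcript's exons into 'start:end|start:end|...'."""
    # Default ordering: genomic position (seqname, start, end, strand).
    ordered = sorted(exons, key=lambda e: (e[1], e[2], e[3], e[4]))
    # Mirror the gated check from the diff: reverse only when the first
    # exon is on the negative strand AND the flag was provided.
    if ordered and ordered[0][4] == "-" and sort_by_exon_order:
        ordered = ordered[::-1]
    return "|".join(f"{start}:{end}" for _, _, start, end, _ in ordered)


# Made-up negative-strand transcript: exon 1 has the highest coordinates.
minus_exons = [
    (1, "chr1", 900, 1000, "-"),
    (2, "chr1", 500, 600, "-"),
    (3, "chr1", 100, 200, "-"),
]
# Made-up positive-strand transcript: exon 1 has the lowest coordinates.
plus_exons = [
    (1, "chr1", 100, 200, "+"),
    (2, "chr1", 500, 600, "+"),
]

default_order = collapse_start_end(minus_exons)
splicing_order = collapse_start_end(minus_exons, sort_by_exon_order=True)
plus_order = collapse_start_end(plus_exons, sort_by_exon_order=True)

print(default_order)   # 100:200|500:600|900:1000
print(splicing_order)  # 900:1000|500:600|100:200
print(plus_order)      # 100:200|500:600
```

With the flag off, negative-strand coordinates come out in increasing order, which matches the commit message's note that leafcutter annotation needs genomic position. With the flag on, they come out in decreasing order, which is the splicing order. Positive-strand transcripts are unaffected either way.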
