Commit ed68cce

updated readme and improved performance
1 parent: 671617d

3 files changed

Lines changed: 56 additions & 8 deletions

CAI/CAI.py

Lines changed: 8 additions & 5 deletions
@@ -1,5 +1,5 @@
 from itertools import chain
-from . import genetic_codes
+from genetic_codes import genetic_codes
 from scipy.stats.mstats import gmean

 def _synonymous_codons(genetic_code_dict):
@@ -31,12 +31,12 @@ def RSCU(sequences, genetic_code=1):
 if len(sequence) % 3 != 0:
 raise ValueError("Input sequence not divisible by three")
 if len(sequence) == 0:
-raise ValueError("Cannot include empty sequence in input")
+raise ValueError("Input sequence cannot be empty")

 # count the number of each codon in the sequences
 sequences = [[sequence[i:i+3].upper() for i in range(0, len(sequence), 3)] for sequence in sequences]
-codons = list(chain.from_iterable(sequences))
-counts = {i: codons.count(i) for i in set(genetic_code.keys())}
+codons = list(chain.from_iterable(sequences)) # flat list of all codons (to be used for counting)
+counts = {i: codons.count(i) for i in genetic_code.keys()}

 # "if a certain codon is never used in the reference set... assign [it] a value of 0.5" (page 1285)
 for codon in counts:
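
The RSCU() hunk above drops only the redundant `set()` around `genetic_code.keys()`; the counting still calls `codons.count()` once per codon in the genetic code (up to 64 full scans of the codon list). A minimal sketch of the same computation with `collections.Counter`, which counts every codon in a single pass, is shown below. `rscu_sketch`, `STANDARD_CODE`, and the grouping logic are hypothetical names and choices, not the package's API; the sketch assumes the genetic code maps each codon to a one-letter amino acid and, as the loop over `counts` after the quoted comment suggests, that the 0.5 value replaces a zero count.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for genetic_codes[1]: the standard code (NCBI table 1),
# bases ordered T, C, A, G; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))


def rscu_sketch(sequences, genetic_code=STANDARD_CODE):
    """Hypothetical RSCU helper: returns {codon: RSCU} for every codon in genetic_code."""
    codon_lists = []
    for sequence in sequences:
        # Same validation and error messages as the RSCU() hunk.
        if len(sequence) % 3 != 0:
            raise ValueError("Input sequence not divisible by three")
        if len(sequence) == 0:
            raise ValueError("Input sequence cannot be empty")
        codon_lists.append([sequence[i:i + 3].upper() for i in range(0, len(sequence), 3)])

    # One pass with Counter instead of one list.count() scan per codon.
    observed = Counter(chain.from_iterable(codon_lists))
    # Codons never seen in the reference set get a count of 0.5 (assumption, see lead-in).
    counts = {codon: observed[codon] or 0.5 for codon in genetic_code}

    # Group codons into synonymous families keyed by amino acid (stops form their own family).
    families = {}
    for codon, amino_acid in genetic_code.items():
        families.setdefault(amino_acid, []).append(codon)

    # RSCU = observed count divided by the mean count of its synonymous family.
    rscu = {}
    for family in families.values():
        mean_count = sum(counts[codon] for codon in family) / len(family)
        for codon in family:
            rscu[codon] = counts[codon] / mean_count
    return rscu
```

With the README's example reference sequences `["ATGTTTGCTAAA", "ATGCGATACAGC"]`, TTT is seen once and TTC never, so TTC is counted as 0.5 and the Phe family gets RSCU values of 4/3 and 2/3, which sum to the family size of 2.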
@@ -103,9 +103,12 @@ def CAI(sequence, weights=[], RSCUs=[], sequences=[], genetic_code=1):
 # determine the synonymous codons in the genetic code
 synonymous_codons = _synonymous_codons(genetic_codes[genetic_code])

+# find codons without synonyms
+non_synonymous_codons = [codon for codon in synonymous_codons.keys() if len(synonymous_codons[codon]) == 1]
+
 # create a list of the weights for the sequqence, not counting codons without synonyms (page 1285)
 try:
-sequence_weights = [weights[codon] for codon in sequence if (len(synonymous_codons[codon]) != 1)]
+sequence_weights = [weights[codon] for codon in sequence if codon not in non_synonymous_codons]
 except KeyError, e:
 raise KeyError("Bad weights dictionary passed: missing weight for codon " + str(e))
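
The CAI() hunk above precomputes the codons that have no synonyms and then filters the sequence with a membership test against that list. A minimal sketch of the same step follows, under stated assumptions: it stores those codons in a set, so each membership test is O(1) for any genetic code (with the standard code's two such codons, ATG and TGG, a list is equally cheap); it groups codons by amino acid rather than keying them by codon as `_synonymous_codons` appears to; it uses Python 3 `except ... as e` instead of the diff's Python 2 `except KeyError, e:`; and it computes the geometric mean with `math` rather than scipy's `gmean`. `synonymous_families`, `relative_adaptiveness_sketch`, `cai_sketch`, and `STANDARD_CODE` are hypothetical names.

```python
import math

# Hypothetical stand-in for genetic_codes[1]: the standard code (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD_CODE = dict(zip(CODONS, AMINO_ACIDS))


def synonymous_families(genetic_code):
    """Hypothetical helper: map each amino acid to the list of codons encoding it."""
    families = {}
    for codon, amino_acid in genetic_code.items():
        families.setdefault(amino_acid, []).append(codon)
    return families


def relative_adaptiveness_sketch(rscus, genetic_code=STANDARD_CODE):
    """Weight of a codon = its RSCU divided by the largest RSCU in its family."""
    weights = {}
    for family in synonymous_families(genetic_code).values():
        best = max(rscus[codon] for codon in family)
        for codon in family:
            weights[codon] = rscus[codon] / best
    return weights


def cai_sketch(sequence, weights, genetic_code=STANDARD_CODE):
    """Geometric mean of codon weights, skipping codons without synonyms."""
    # Built once per call as a set, so each membership test below is O(1).
    non_synonymous_codons = {
        family[0]
        for family in synonymous_families(genetic_code).values()
        if len(family) == 1
    }
    codons = [sequence[i:i + 3].upper() for i in range(0, len(sequence), 3)]
    try:
        sequence_weights = [weights[codon] for codon in codons
                            if codon not in non_synonymous_codons]
    except KeyError as e:
        raise KeyError("Bad weights dictionary passed: missing weight for codon " + str(e)) from e
    if not sequence_weights:
        raise ValueError("Sequence has no codons with synonyms")
    # Geometric mean in log space (the same quantity scipy's gmean returns).
    return math.exp(sum(math.log(w) for w in sequence_weights) / len(sequence_weights))
```

Following the README's N.B., the weights dictionary would be computed once from the reference set and the same dictionary passed to `cai_sketch` for every query sequence, so the reference-set work is not repeated per call.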

README.md

Lines changed: 47 additions & 2 deletions
@@ -7,6 +7,8 @@ An implementation of Sharp and Li's 1987 formulation of the codon adaption index
 Sharp, P. M., & Li, W. H. (1987). The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. _Nucleic Acids Research_, 15(3), 1281–1295.

 ## Installation
+This module is available from PyPi and can be downloaded with the following command:
+
 pip install CAI

 ## Usage
@@ -23,14 +25,57 @@ print CAI("ATGGATTAC...", sequences=["ATGTTTGCTAAA", "ATGCGATACAGC",...])
 ### Advanced Usage
 If you have already computed the weights or RSCU values of the reference set, you can supply `CAI()` with one or the other as arguments. They must be formatted as a dictionary and contain values for every codon.

-To calculate the RSCU without calculating the CAI, you can use `RSCU()`. `RSCU()`s only required parameter a list of sequences.
+**_N.B._ if you are computing large numbers of CAIs with the same reference sequences, first calculate their weights and then pass that to `CAI()` to eliminate redundant computation.**
+
+To calculate RSCU without calculating CAI, you can use `RSCU()`. `RSCU()`'s only required argument is a list of sequences.

-Similarly, to calculate the weights of a reference set, you can use `relative_adaptiveness()`. `relative_adaptiveness()` takes either a list of sequences as the `sequences` parameter or a dictionary of RSCUs as the `RSCUs` parameter.
+Similarly, to calculate the weights of reference sequences, you can use `relative_adaptiveness()`. `relative_adaptiveness()` takes either a list of sequences as the `sequences` parameter or a dictionary of RSCUs as the `RSCUs` parameter.

 ### Other Genetic Codes

 All functions in CAI support an optional `genetic_code` parameter, which is set by default to 1 (the standard genetic code). You may set it to any genetic code within [gc.prt](/gc.prt).

+## API Reference
+### `RSCU(sequences, genetic_code=1)`
+
+Argument | Details
+--------- | -------
+sequences | List of DNA sequence strings. Required.
+genetic_code | Integer containing the genetic code ID. Optional.
+
+#### Output
+A dictionary containing every codon as the key and its RSCU as the value.
+
+### `relative_adaptiveness(sequences=[], RSCUs={}, genetic_code=1)`
+
+Argument | Details
+--------- | -------
+sequences | List of DNA sequence strings. Optional.
+RSCUs | Dictionary of RSCU values for each codon. Optional.
+genetic_code | Integer containing the genetic code ID. Optional.
+
+#### Note
+One of `sequences` or `RSCUs` is required.
+
+#### Output
+A dictionary containing every codon as the key and its weight as the value.
+
+### `CAI(sequence, weights=[], RSCUs=[], sequences=[], genetic_code=1)`
+
+Argument | Details
+--------- | -------
+sequence | String of DNA sequence to calculate CAI for. Required.
+weights | Dictionary of weight values for each codon. Optional.
+RSCUs | Dictionary of RSCU values for each codon. Optional.
+sequences | List of DNA sequence strings. Required.
+genetic_code | Integer containing the genetic code ID. Optional.
+
+#### Note
+One of `sequences`, `RSCUs`, or `weights` is required.
+
+#### Output
+A float of the CAI of the sequence.
+
 ## Contributing
 Feel free to contribute, open issues, or let me know about bugs. Anything is welcome!
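
The three functions in the API reference return the quantities defined by Sharp and Li (1987). For reference, with X_ij the count of the j-th codon for amino acid i in the reference sequences and n_i the number of synonymous codons for that amino acid, RSCU divides each count by the family mean, the weight w divides each RSCU by the largest RSCU in its family, and the CAI is the geometric mean of the weights of the L counted codons in the query sequence (in CAI.py, codons without synonyms are not counted, and zero counts are replaced by 0.5 per the comment in `RSCU()`).

```latex
\mathrm{RSCU}_{ij} = \frac{X_{ij}}{\frac{1}{n_i}\sum_{k=1}^{n_i} X_{ik}}
\qquad
w_{ij} = \frac{\mathrm{RSCU}_{ij}}{\mathrm{RSCU}_{i,\max}} = \frac{X_{ij}}{X_{i,\max}}
\qquad
\mathrm{CAI} = \Bigl(\prod_{\ell=1}^{L} w_{\ell}\Bigr)^{1/L}
```

Because w depends only on the reference set, computing it once and reusing it for every query sequence avoids recomputing RSCU and w on each call, which is what the N.B. recommends.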

setup.py

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
 setup(
 name = 'CAI',
 packages = ["CAI"],
-version = '0.1.6',
+version = '0.1.7',
 description = 'Python implementation of codon adaptation index',
 author = 'Benjamin Lee',
 author_email = 'benjamin_lee@college.harvard.edu',
