Commit 86dc6ca

Merge pull request #40 from saxbophone/develop
v0.7.0 - Public Release
2 parents 8238b47 + 583f2a7, commit 86dc6ca

20 files changed: +1084 -105 lines changed

.travis.yml

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+language: python
+python:
+  - "2.7"
+  - "3.3"
+  - "3.4"
+  - "3.5"
+  - "3.6"
+  - "pypy3.5"
+install:
+  - make install-deps
+script:
+  - make
+  - make stress-test

LICENSE

Lines changed: 373 additions & 1 deletion
Large diffs are not rendered by default.

Makefile

Lines changed: 5 additions & 2 deletions
@@ -13,9 +13,9 @@ clean:
 	rm -rf basest.egg-info build dist
 
 lint:
-	flake8 basest tests setup.py
+	flake8 basest tests setup.py stress_test.py
 	isort -rc -c basest tests
-	isort -c setup.py
+	isort -c setup.py stress_test.py
 
 test:
 	coverage run --source='basest' tests/__main__.py
@@ -25,5 +25,8 @@ cover:
 
 tests: clean lint test cover
 
+stress-test:
+	python stress_test.py
+
 package:
 	python setup.py sdist bdist_wheel

README.md

Lines changed: 80 additions & 78 deletions
@@ -12,6 +12,8 @@ It is also not just 8-bit binary data that could be serialised. Any collection o
 
 This library is my implementation of a generic, base-to-base converter which addresses this last point. An encoder and decoder for every binary-to-text format currently existing can be created and used with this library, requiring only for the details of the desired format to be given. Due to its flexibility, the library also makes it trivial to invent new wonderful and interesting base-to-base serialisation/conversion formats (I myself plan to work on and release one that translates binary files into a purely emoji-based format!).
 
+One limitation of the library is that it cannot encode data from a smaller input base to a larger output base with padding on the input (i.e. if you're encoding from base 2 to base 1000, you need to ensure that the number of input symbols exactly matches the encoding ratio you're using). This is an accepted limitation due to the complexities of implementing a padding system that works in the same manner as base-64 and others but which can be extended to any arbitrary base.
+
 So, I hope you find this library fun, useful or both!
 
 ## Installation
## Installation
@@ -43,20 +45,20 @@ There is a functional interface and a class-based interface (the class-based one
 To use the class-based interface, you will need to create a subclass of `basest.encoders.Encoder` and override attributes of the class, as shown below (using base64 as an example):
 
 ```py
->>> from basest.encoders import Encoder
->>>
->>> class CustomEncoder(Encoder):
-...     input_base = 256
-...     output_base = 64
-...     input_ratio = 3
-...     output_ratio = 4
-...     # these attributes are only required if using decode() and encode()
-...     input_symbol_table = [chr(c) for c in range(256)]
-...     output_symbol_table = [
-...         s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
-...     ]
-...     padding_symbol = '='
->>>
+from basest.encoders import Encoder
+
+class CustomEncoder(Encoder):
+    input_base = 256
+    output_base = 64
+    input_ratio = 3
+    output_ratio = 4
+    # these attributes are only required if using decode() and encode()
+    input_symbol_table = [chr(c) for c in range(256)]
+    output_symbol_table = [
+        s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
+    ]
+    padding_symbol = '='
+
 ```
 
 > **Note:** You must subclass `Encoder`, you cannot use it directly!
@@ -67,36 +69,36 @@ Subclasses of `Encoder` have the following public methods available:
 `encode()` will encode an iterable of symbols in the class' **input symbol table** into an iterable of symbols in the class' **output symbol table**, observing the chosen encoding ratios and padding symbol.
 
 ```py
->>> encoder = CustomEncoder()
->>> encoder.encode(['c', 'a', 'b', 'b', 'a', 'g', 'e', 's'])
-['Y', '2', 'F', 'i', 'Y', 'm', 'F', 'n', 'Z', 'X', 'M', '=']
+encoder = CustomEncoder()
+encoder.encode(['c', 'a', 'b', 'b', 'a', 'g', 'e', 's'])
+# -> ['Y', '2', 'F', 'i', 'Y', 'm', 'F', 'n', 'Z', 'X', 'M', '=']
 ```
 
 #### Encode Raw
 `encode_raw()` works just like `encode()`, except that symbols are not interpreted. Instead, plain integers within range 0->(base - 1) should be used. the value of the base is used as the padding symbol.
 
 ```py
->>> encoder = CustomEncoder()
->>> encoder.encode_raw([1, 2, 3, 4, 5, 6, 7])
-[0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64]
+encoder = CustomEncoder()
+encoder.encode_raw([1, 2, 3, 4, 5, 6, 7])
+# -> [0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64]
 ```
 
 #### Decode from one base to another
 `decode()` works in the exact same way as `encode()`, but in the inverse.
 
 ```py
->>> encoder = CustomEncoder()
->>> encoder.decode(['Y', '2', 'F', 'i', 'Y', 'm', 'F', 'n', 'Z', 'X', 'M', '='])
-['c', 'a', 'b', 'b', 'a', 'g', 'e', 's']
+encoder = CustomEncoder()
+encoder.decode(['Y', '2', 'F', 'i', 'Y', 'm', 'F', 'n', 'Z', 'X', 'M', '='])
+# -> ['c', 'a', 'b', 'b', 'a', 'g', 'e', 's']
 ```
 
 #### Decode Raw
 `decode_raw()` works just like `decode()`, except that symbols are not interpreted. Instead, plain integers within range 0->(base - 1) should be used. the value of the base is used as the padding symbol.
 
 ```py
->>> encoder = CustomEncoder()
->>> encoder.decode_raw([0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64])
-[1, 2, 3, 4, 5, 6, 7]
+encoder = CustomEncoder()
+encoder.decode_raw([0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64])
+# -> [1, 2, 3, 4, 5, 6, 7]
 ```
 
 ### Functional Interface
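
To make the raw examples in this hunk easier to follow, here is one way such a chunked conversion with padding could work. This is an illustrative sketch, not the library's implementation: `encode_raw_sketch` and `digits_needed` are hypothetical names, and the padding rule (pad every output position that a partial chunk does not need) is an assumption modelled on base64.

```python
def digits_needed(base_from, base_to, length):
    """Smallest m such that base_to ** m >= base_from ** length."""
    target = base_from ** length
    capacity, m = 1, 0
    while capacity < target:
        capacity *= base_to
        m += 1
    return m


def encode_raw_sketch(input_base, output_base, input_ratio, output_ratio,
                      input_data):
    """Convert integers chunk by chunk, padding with the value output_base."""
    padding = output_base
    output = []
    for start in range(0, len(input_data), input_ratio):
        chunk = list(input_data[start:start + input_ratio])
        present = len(chunk)
        # fill a short final chunk with zeros so it has a full-width value
        chunk += [0] * (input_ratio - present)
        # interpret the chunk as one big number in the input base
        value = 0
        for symbol in chunk:
            value = value * input_base + symbol
        # write that number as exactly output_ratio digits, most significant first
        digits = []
        for _ in range(output_ratio):
            value, digit = divmod(value, output_base)
            digits.append(digit)
        digits.reverse()
        # keep only the digits the real symbols need, pad the rest
        keep = digits_needed(input_base, output_base, present)
        output.extend(digits[:keep] + [padding] * (output_ratio - keep))
    return output


print(encode_raw_sketch(256, 64, 3, 4, [1, 2, 3, 4, 5, 6, 7]))
# -> [0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64]
```

For this input the sketch reproduces the list shown in the `encode_raw()` example above; whether basest pads in exactly this way for every base pair is not something the diff confirms.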
@@ -107,33 +109,33 @@ Return the input data, encoded into the specified base using the specified encod
 Returns the output data as a list of items that are guaranteed to be in the **output symbol table**, or the **output padding** symbol.
 
 ```py
->>> import basest
->>>
->>> basest.core.encode(
-...     input_base=256,
-...     input_symbol_table=[chr(c) for c in range(256)],
-...     output_base=64,
-...     output_symbol_table=[
-...         s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
-...     ],
-...     output_padding='=', input_ratio=3, output_ratio=4,
-...     input_data='falafel'
-... )
-['Z', 'm', 'F', 's', 'Y', 'W', 'Z', 'l', 'b', 'A', '=', '=']
+import basest
+
+basest.core.encode(
+    input_base=256,
+    input_symbol_table=[chr(c) for c in range(256)],
+    output_base=64,
+    output_symbol_table=[
+        s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
+    ],
+    output_padding='=', input_ratio=3, output_ratio=4,
+    input_data='falafel'
+)
+# -> ['Z', 'm', 'F', 's', 'Y', 'W', 'Z', 'l', 'b', 'A', '=', '=']
 ```
 
 #### Encode Raw
 Similar to the function above, `basest.core.encode_raw` will encode one base into another, but only accepts and returns arrays of integers (e.g. bytes would be passed as integers between 0-255, not as `byte` objects). As such, it omits the **padding** and **symbol table** arguments, but is otherwise identical in function and form to `encode`.
 
 ```py
->>> import basest
->>>
->>> basest.core.encode_raw(
-...     input_base=256, output_base=85,
-...     input_ratio=4, output_ratio=5,
-...     input_data=[99, 97, 98, 98, 97, 103, 101, 115]
-... )
-[31, 79, 81, 71, 52, 31, 25, 82, 13, 76]
+import basest
+
+basest.core.encode_raw(
+    input_base=256, output_base=85,
+    input_ratio=4, output_ratio=5,
+    input_data=[99, 97, 98, 98, 97, 103, 101, 115]
+)
+# -> [31, 79, 81, 71, 52, 31, 25, 82, 13, 76]
 ```
 
 #### Decode from one encoded base to another.
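
The ratios used in the examples in this hunk (3:4 for base 64, 4:5 for base 85) work because the output side offers at least as many distinct values as the input side. A small sketch of that arithmetic check, using a hypothetical helper `ratio_is_lossless` that is not part of basest:

```python
def ratio_is_lossless(input_base, output_base, input_ratio, output_ratio):
    """
    Return True if every chunk of input_ratio input symbols can be written
    as output_ratio output symbols without losing information.
    """
    # number of distinct values a full chunk can take on each side
    input_values = input_base ** input_ratio
    output_values = output_base ** output_ratio
    return output_values >= input_values


print(ratio_is_lossless(256, 64, 3, 4))  # 256 ** 3 == 64 ** 4
print(ratio_is_lossless(256, 85, 4, 5))  # 85 ** 5 > 256 ** 4
print(ratio_is_lossless(256, 64, 3, 3))  # 64 ** 3 < 256 ** 3
```

The first two calls describe the README's own ratios; the third shows a ratio that is too tight to round-trip.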
@@ -143,33 +145,33 @@ Returns the output data as a list of items that are guaranteed to be in the **ou
 > This is essentially the inverse of `encode()`
 
 ```py
->>> import basest
->>>
->>> basest.core.decode(
-...     input_base=64,
-...     input_symbol_table=[
-...         s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
-...     ],
-...     input_padding='=',
-...     output_base=256, output_symbol_table=[chr(c) for c in range(256)],
-...     input_ratio=4, output_ratio=3,
-...     input_data='YWJhY3VzIFpaWg=='
-... )
-['a', 'b', 'a', 'c', 'u', 's', ' ', 'Z', 'Z', 'Z']
+import basest
+
+basest.core.decode(
+    input_base=64,
+    input_symbol_table=[
+        s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
+    ],
+    input_padding='=',
+    output_base=256, output_symbol_table=[chr(c) for c in range(256)],
+    input_ratio=4, output_ratio=3,
+    input_data='YWJhY3VzIFpaWg=='
+)
+# -> ['a', 'b', 'a', 'c', 'u', 's', ' ', 'Z', 'Z', 'Z']
 ```
 
 #### Decode Raw
 Similar to the function above, `basest.core.decode_raw` will decode from one base to another, but only accepts and returns arrays of integers (e.g. base64 would be passed as integers between 0-65 (65 is for the padding symbol), not as `str` objects). As such, it omits the **padding** and **symbol table** arguments, but is otherwise identical in function and form to `decode`.
 
 ```py
->>> import basest
->>>
->>> basest.core.decode_raw(
-...     input_base=85, output_base=256,
-...     input_ratio=5, output_ratio=4,
-...     input_data=[31, 79, 81, 71, 52, 31, 25, 82, 13, 76]
-... )
-[99, 97, 98, 98, 97, 103, 101, 115]
+import basest
+
+basest.core.decode_raw(
+    input_base=85, output_base=256,
+    input_ratio=5, output_ratio=4,
+    input_data=[31, 79, 81, 71, 52, 31, 25, 82, 13, 76]
+)
+# -> [99, 97, 98, 98, 97, 103, 101, 115]
 ```
 
 #### Finding the best encoding ratio from one base to any base within a given range
@@ -178,14 +180,14 @@ For a given **input base** (e.g. base-256 / 8-bit Bytes), a given desired **outp
 Returns tuples containing an integer as the first item (representing the output base that is most efficient) and a tuple as the second, containing two integers representing the ratio of **input base** symbols to **output base** symbols.
 
 ```py
->>> import basest
->>>
->>> basest.core.best_ratio(input_base=256, output_bases=[94], chunk_sizes=range(1, 256))
-(94, (68, 83))
->>> basest.core.best_ratio(input_base=256, output_bases=[94], chunk_sizes=range(1, 512))
-(94, (458, 559))
->>> basest.core.best_ratio(input_base=256, output_bases=range(2, 95), chunk_sizes=range(1, 256))
-(94, (68, 83))
->>> basest.core.best_ratio(input_base=256, output_bases=range(2, 334), chunk_sizes=range(1, 256))
-(333, (243, 232))
+import basest
+
+basest.core.best_ratio(input_base=256, output_bases=[94], chunk_sizes=range(1, 256))
+# -> (94, (68, 83))
+basest.core.best_ratio(input_base=256, output_bases=[94], chunk_sizes=range(1, 512))
+# -> (94, (458, 559))
+basest.core.best_ratio(input_base=256, output_bases=range(2, 95), chunk_sizes=range(1, 256))
+# -> (94, (68, 83))
+basest.core.best_ratio(input_base=256, output_bases=range(2, 334), chunk_sizes=range(1, 256))
+# -> (333, (243, 232))
 ```
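
The base64 and base85 outputs quoted in this README diff can be cross-checked against Python's standard library. The sketch below assumes Python 3 and uses only the stdlib `base64` module; it does not call basest at all, and the variable names are mine:

```python
import base64

# Expected outputs copied from the README examples in this diff
readme_cabbages = ''.join(
    ['Y', '2', 'F', 'i', 'Y', 'm', 'F', 'n', 'Z', 'X', 'M', '=']
)
readme_falafel = ''.join(
    ['Z', 'm', 'F', 's', 'Y', 'W', 'Z', 'l', 'b', 'A', '=', '=']
)
readme_base85_raw = [31, 79, 81, 71, 52, 31, 25, 82, 13, 76]

# base64: the stdlib should agree with the symbol lists in the README
print(base64.b64encode(b'cabbages').decode('ascii') == readme_cabbages)
print(base64.b64encode(b'falafel').decode('ascii') == readme_falafel)
print(base64.b64decode('YWJhY3VzIFpaWg==') == b'abacus ZZZ')

# Ascii85 stores each base-85 digit as the character with code digit + 33,
# so the raw digits from the README can be mapped onto stdlib output
ascii85_text = bytes(digit + 33 for digit in readme_base85_raw)
print(base64.a85encode(b'cabbages') == ascii85_text)
```

Each comparison is expected to print `True`, which suggests the README examples follow the conventional base64 and Ascii85 digit layouts.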

basest/__init__.py

Lines changed: 9 additions & 3 deletions
@@ -1,10 +1,16 @@
-#!/usr/bin/python
 # -*- coding: utf-8 -*-
+#
+# Copyright (C) 2016, 2018, Joshua Saxby <joshua.a.saxby@gmail.com>
+#
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#
 from __future__ import (
     absolute_import, division, print_function, unicode_literals
 )
 
-from . import core, encoders
+from . import core, encoders, exceptions
 
 
-__all__ = ['core', 'encoders']
+__all__ = ['core', 'encoders', 'exceptions']

basest/core/__init__.py

Lines changed: 7 additions & 1 deletion
@@ -1,5 +1,11 @@
-#!/usr/bin/python
 # -*- coding: utf-8 -*-
+#
+# Copyright (C) 2016, 2018, Joshua Saxby <joshua.a.saxby@gmail.com>
+#
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#
 from __future__ import (
     absolute_import, division, print_function, unicode_literals
 )

basest/core/best_ratio.py

Lines changed: 45 additions & 3 deletions
@@ -1,25 +1,55 @@
-#!/usr/bin/python
 # -*- coding: utf-8 -*-
+#
+# Copyright (C) 2016, 2018, Joshua Saxby <joshua.a.saxby@gmail.com>
+#
+# This Source Code Form is subject to the terms of the Mozilla Public
+# License, v. 2.0. If a copy of the MPL was not distributed with this
+# file, You can obtain one at http://mozilla.org/MPL/2.0/.
+#
 from __future__ import (
     absolute_import, division, print_function, unicode_literals
 )
 
 from math import ceil, log
 
 
-INF = float('infinity')
+# an easy way to store positive infinity in a manner compatible with Python 2.x
+INF = float('inf')
 
 
 def _encoding_ratio(base_from, base_to, chunk_sizes):
     """
     An algorithm for finding the most efficient encoding ratio
     from one base to another within a range limit.
     """
+    # a ratio of 1:Infinity is the theoretical worst possible ratio
     best_ratio = (1.0, INF)
     for s in chunk_sizes:
-        match = ceil(log(base_from ** s, base_to))
+        # validate each chunk size here
+        if not isinstance(s, int):
+            raise TypeError('chunk sizes must be list of ints')
+        '''
+        We need to work out how many digits in the output base are needed to
+        represent a number s digits long in the input base.
+
+        The number of values represented by an s digit long number in the input
+        base is `base_from ** s`
+
+        The number of digits in base x needed to represent n values is
+        `ceil(logx(n))`
+
+        Altogether this is `ceil(logx(base_from ** s))`
+
+        This can be simplified using the law `n log(x) = log(x ** n)`
+
+        To become the following:
+        '''
+        match = ceil(s * log(base_from, base_to))
+        # the efficiency ratio is input:output
         ratio = (float(s), match)
+        # ratio efficiences can be compared by dividing them like fractions
         if (ratio[0] / ratio[1]) > (best_ratio[0] / best_ratio[1]):
+            # this is the new best ratio found so far
             best_ratio = ratio
     return (int(best_ratio[0]), int(best_ratio[1]))
 
@@ -30,14 +60,26 @@ def best_ratio(input_base, output_bases, chunk_sizes):
     sizes, find the most efficient encoding ratio.
     Returns the chosen output base, and the chosen encoding ratio.
     """
+    # validate input base type
+    if not isinstance(input_base, int):
+        raise TypeError('input base must be of int type')
+
+    # we will store the most efficient output base here
     encoder = 0
+    # a ratio of 1:Infinity is the theoretical worst possible ratio
     best_ratio = (1.0, INF)
     for base_to in output_bases:
+        # validate each output base here
+        if not isinstance(base_to, int):
+            raise TypeError('output bases must be list of ints')
+        # get the best encoding ratio for this base out of all chunk sizes
         ratio = _encoding_ratio(input_base, base_to, chunk_sizes)
+        # if it's more efficient, then set it as the most efficient one yet
         if (
             (float(ratio[0]) / float(ratio[1])) >
             (float(best_ratio[0]) / float(best_ratio[1]))
         ):
             best_ratio = ratio
             encoder = base_to
+    # we now have the best output base and ratio for it
     return encoder, (int(best_ratio[0]), int(best_ratio[1]))
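
`ceil(s * log(base_from, base_to))` relies on floating-point logarithms, which may land slightly above or below an exact integer result for some inputs. As an alternative sketch (not what this commit does), the same search can be written with exact integer arithmetic. The names `digits_needed` and `best_ratio_exact` are hypothetical, and the range checks are my assumption about sensible inputs:

```python
from fractions import Fraction


def digits_needed(base_from, base_to, length):
    """Smallest m such that base_to ** m >= base_from ** length."""
    target = base_from ** length
    capacity, m = 1, 0
    while capacity < target:
        capacity *= base_to
        m += 1
    return m


def best_ratio_exact(input_base, output_bases, chunk_sizes):
    """Return (output_base, (input_len, output_len)) with the best ratio."""
    if input_base < 2:
        raise ValueError('input base must be at least 2')
    best = None  # (efficiency, output_base, (input_len, output_len))
    for base_to in output_bases:
        if base_to < 2:
            raise ValueError('output bases must be at least 2')
        for s in chunk_sizes:
            if s < 1:
                raise ValueError('chunk sizes must be at least 1')
            m = digits_needed(input_base, base_to, s)
            # exact input:output efficiency, compared as a fraction
            efficiency = Fraction(s, m)
            # strict comparison keeps the first (smallest) ratio on ties
            if best is None or efficiency > best[0]:
                best = (efficiency, base_to, (s, m))
    if best is None:
        raise ValueError('no output bases or chunk sizes given')
    return best[1], best[2]


print(best_ratio_exact(256, [64], range(1, 4)))  # -> (64, (3, 4))
print(best_ratio_exact(256, [85], range(1, 5)))  # -> (85, (4, 5))
```

The loop order and the strict `>` comparison mirror the committed code, so ties resolve the same way; the cost is big-integer arithmetic for large chunk sizes.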
