README.md: 80 additions & 78 deletions
@@ -12,6 +12,8 @@ It is also not just 8-bit binary data that could be serialised. Any collection o
This library is my implementation of a generic, base-to-base converter which addresses this last point. An encoder and decoder for any existing binary-to-text format can be created and used with this library; you only need to supply the details of the desired format. Due to its flexibility, the library also makes it trivial to invent new, wonderful and interesting base-to-base serialisation/conversion formats (I myself plan to work on and release one that translates binary files into a purely emoji-based format!).

+One limitation of the library is that it cannot encode data from a smaller input base to a larger output base with padding on the input (i.e. if you're encoding from base 2 to base 1000, you need to ensure that the number of input symbols exactly matches the encoding ratio you're using). This is an accepted limitation due to the complexities of implementing a padding system that works in the same manner as base-64 and others but which can be extended to any arbitrary base.
+
So, I hope you find this library fun, useful or both!
## Installation
@@ -43,20 +45,20 @@ There is a functional interface and a class-based interface (the class-based one
To use the class-based interface, you will need to create a subclass of `basest.encoders.Encoder` and override attributes of the class, as shown below (using base64 as an example):
```py
->>> from basest.encoders import Encoder
->>>
->>> class CustomEncoder(Encoder):
-...     input_base = 256
-...     output_base = 64
-...     input_ratio = 3
-...     output_ratio = 4
-...     # these attributes are only required if using decode() and encode()
-...     input_symbol_table = [chr(c) for c in range(256)]
-...     output_symbol_table = [
-...         s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
-...     ]
-...     padding_symbol = '='
->>>
+from basest.encoders import Encoder
+
+class CustomEncoder(Encoder):
+    input_base = 256
+    output_base = 64
+    input_ratio = 3
+    output_ratio = 4
+    # these attributes are only required if using decode() and encode()
+    input_symbol_table = [chr(c) for c in range(256)]
+    output_symbol_table = [
+        s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
+    ]
+    padding_symbol = '='
+
```
> **Note:** You must subclass `Encoder`, you cannot use it directly!
@@ -67,36 +69,36 @@ Subclasses of `Encoder` have the following public methods available:
`encode()` will encode an iterable of symbols in the class' **input symbol table** into an iterable of symbols in the class' **output symbol table**, observing the chosen encoding ratios and padding symbol.
`encode_raw()` works just like `encode()`, except that symbols are not interpreted. Instead, plain integers in the range 0 to (base - 1) should be used, and the value of the base itself is used as the padding symbol.
```py
->>> encoder = CustomEncoder()
->>> encoder.encode_raw([1, 2, 3, 4, 5, 6, 7])
-[0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64]
+encoder = CustomEncoder()
+encoder.encode_raw([1, 2, 3, 4, 5, 6, 7])
+# -> [0, 16, 8, 3, 1, 0, 20, 6, 1, 48, 64, 64]
```
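
The first eight values of the output above can be reproduced by hand. What follows is a minimal, self-contained sketch of the chunk arithmetic for whole chunks only. It is not the library's implementation: `convert_chunks` is a hypothetical helper name, and padding of a trailing partial chunk (the final `7` in the example) is deliberately left out.

```python
def convert_chunks(data, input_base, output_base, input_ratio, output_ratio):
    """Convert whole chunks of `input_ratio` symbols into `output_ratio` symbols.

    Hypothetical helper for illustration only; it does not handle padding.
    """
    if len(data) % input_ratio != 0:
        raise ValueError("input length must be a multiple of input_ratio")
    if output_base ** output_ratio < input_base ** input_ratio:
        raise ValueError("an output chunk is too small to hold an input chunk")

    result = []
    for start in range(0, len(data), input_ratio):
        # Interpret the chunk as one big-endian number in the input base.
        value = 0
        for symbol in data[start:start + input_ratio]:
            if not 0 <= symbol < input_base:
                raise ValueError("symbol out of range for the input base")
            value = value * input_base + symbol

        # Write that number as exactly `output_ratio` digits in the output base.
        digits = []
        for _ in range(output_ratio):
            value, digit = divmod(value, output_base)
            digits.append(digit)
        result.extend(reversed(digits))
    return result


print(convert_chunks([1, 2, 3, 4, 5, 6], 256, 64, 3, 4))
# -> [0, 16, 8, 3, 1, 0, 20, 6]
```

Each group of three base-256 symbols becomes one 24-bit number, which is then split into four base-64 digits; that is why the ratio for this pair of bases is 3:4.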
#### Decode from one base to another
`decode()` works in the exact same way as `encode()`, but in the inverse.
`decode_raw()` works just like `decode()`, except that symbols are not interpreted. Instead, plain integers in the range 0 to (base - 1) should be used, and the value of the base itself is used as the padding symbol.
Similar to the function above, `basest.core.encode_raw` will encode one base into another, but only accepts and returns arrays of integers (e.g. bytes would be passed as integers between 0 and 255, not as `bytes` objects). As such, it omits the **padding** and **symbol table** arguments, but is otherwise identical in function and form to `encode`.
```py
->>> import basest
->>>
->>> basest.core.encode_raw(
-...     input_base=256, output_base=85,
-...     input_ratio=4, output_ratio=5,
-...     input_data=[99, 97, 98, 98, 97, 103, 101, 115]
-... )
-[31, 79, 81, 71, 52, 31, 25, 82, 13, 76]
+import basest
+
+basest.core.encode_raw(
+    input_base=256, output_base=85,
+    input_ratio=4, output_ratio=5,
+    input_data=[99, 97, 98, 98, 97, 103, 101, 115]
+)
+# -> [31, 79, 81, 71, 52, 31, 25, 82, 13, 76]
```
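
Running the same arithmetic in the opposite direction recovers the original integers from the base-85 output above. The sketch below is an illustration under stated assumptions, not the library's code: `decode_chunks` is a hypothetical name, padding is ignored, and the range check reflects the fact that five base-85 digits can represent slightly more values than four bytes can hold.

```python
def decode_chunks(data, input_base, output_base, input_ratio, output_ratio):
    """Turn whole chunks of encoded symbols back into the original symbols.

    Hypothetical helper for illustration only; it does not handle padding.
    """
    if len(data) % input_ratio != 0:
        raise ValueError("input length must be a multiple of input_ratio")

    # Largest value (exclusive) that fits in one chunk of decoded symbols.
    limit = output_base ** output_ratio

    result = []
    for start in range(0, len(data), input_ratio):
        # Interpret the encoded chunk as one big-endian number.
        value = 0
        for symbol in data[start:start + input_ratio]:
            if not 0 <= symbol < input_base:
                raise ValueError("symbol out of range for the input base")
            value = value * input_base + symbol
        if value >= limit:
            raise ValueError("chunk does not correspond to any valid original data")

        # Split the number into exactly `output_ratio` digits of the output base.
        digits = []
        for _ in range(output_ratio):
            value, digit = divmod(value, output_base)
            digits.append(digit)
        result.extend(reversed(digits))
    return result


decoded = decode_chunks([31, 79, 81, 71, 52, 31, 25, 82, 13, 76], 85, 256, 5, 4)
print(decoded)         # -> [99, 97, 98, 98, 97, 103, 101, 115]
print(bytes(decoded))  # -> b'cabbages'
```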
#### Decode from one encoded base to another.
@@ -143,33 +145,33 @@ Returns the output data as a list of items that are guaranteed to be in the **ou
> This is essentially the inverse of `encode()`
```py
->>> import basest
->>>
->>> basest.core.decode(
-...     input_base=64,
-...     input_symbol_table=[
-...         s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
-...     ],
-...     input_padding='=',
-...     output_base=256, output_symbol_table=[chr(c) for c in range(256)],
-...     input_ratio=4, output_ratio=3,
-...     input_data='YWJhY3VzIFpaWg=='
-... )
-['a', 'b', 'a', 'c', 'u', 's', ' ', 'Z', 'Z', 'Z']
+import basest
+
+basest.core.decode(
+    input_base=64,
+    input_symbol_table=[
+        s for s in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
+    ],
+    input_padding='=',
+    output_base=256, output_symbol_table=[chr(c) for c in range(256)],
```
Similar to the function above, `basest.core.decode_raw` will decode from one base to another, but only accepts and returns arrays of integers (e.g. base64 would be passed as integers between 0 and 64, where 64 is the padding symbol, not as `str` objects). As such, it omits the **padding** and **symbol table** arguments, but is otherwise identical in function and form to `decode`.
#### Finding the best encoding ratio from one base to any base within a given range
@@ -178,14 +180,14 @@ For a given **input base** (e.g. base-256 / 8-bit Bytes), a given desired **outp
Returns tuples containing an integer as the first item (representing the output base that is most efficient) and a tuple as the second, containing two integers representing the ratio of **input base** symbols to **output base** symbols.
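
To give a feel for what such a search involves, here is a minimal sketch for a single output base. It assumes that "best" means the most input symbols per output symbol and that ties go to the smallest chunk; the library's actual criteria may differ, and `best_ratio` and `max_chunk_size` are hypothetical names rather than the library's API.

```python
from fractions import Fraction


def best_ratio(input_base, output_base, max_chunk_size):
    """Return (input_ratio, output_ratio) with the least wasted output space.

    Hypothetical helper for illustration only.
    """
    if input_base < 2 or output_base < 2:
        raise ValueError("bases must be at least 2")

    best = None
    best_efficiency = Fraction(0)
    for input_ratio in range(1, max_chunk_size + 1):
        # Smallest number of output symbols able to hold every possible
        # chunk of `input_ratio` input symbols.
        needed = input_base ** input_ratio
        output_ratio = 1
        while output_base ** output_ratio < needed:
            output_ratio += 1

        # Exact rational comparison avoids floating-point ties.
        efficiency = Fraction(input_ratio, output_ratio)
        if efficiency > best_efficiency:
            best, best_efficiency = (input_ratio, output_ratio), efficiency
    return best


print(best_ratio(256, 64, 10))  # -> (3, 4)
print(best_ratio(256, 85, 10))  # -> (4, 5)
```

Both results match the ratios used in the base64 and base85 examples earlier in this README.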