Merge pull request #21 from saxbophone/develop

saxbophone · web-flow · commit 8238b47d0556 · 2016-11-14T21:54:26.000Z
v0.6.2
diff --git a/README.md b/README.md
@@ -189,165 +189,3 @@ Returns tuples containing an integer as the first item (representing the output
 >>> basest.core.best_ratio(input_base=256, output_bases=range(2, 334), chunk_sizes=range(1, 256))
 (333, (243, 232))
 ```
-
-## Further Examples
-
-#### Base-78, using emoji as output (just for fun)
-> **Note:** This example is aimed at Python 3 and may not work on Python 2 without some modification (or at all).
-
-Unicode character ranges `0x1F601` through to `0x1F64F` are allocated for *emoticon emoji*. This range provides us with 78 characters to play with.
-
-First of all, let's find us some appropriate encoding ratios within given ranges:
-
-```py
->>> from basest.core import best_ratio
->>> best_ratio(256, [78], range(2, 1024))
-(78, (1019, 1297))  # hmm, maybe a bit too big
->>> best_ratio(256, [78], range(2, 16))
-(78, (7, 9))  # we could probably go a bit larger but this will do
-```
-
-Now, let's choose a padding character from one of the other Unicode emoji codepages. I decided to choose the `bear face` emoji (:bear: / 🐻), codepoint `0x1F43B`.
-
-With these chosen parameters and a body of input data (will use text for this example), we can put it all together:
-
-```py
->>> from basest.core import encode
->>> # input data variable
->>> message = ...
->>> output = encode(
-...     256, [chr(i) for i in range(256)],  # input base and symbol table
-...     78, [chr(0x1F601 + o) for o in range(78)],  # output base and symbol table
-...     chr(0x1F43B),  # padding character
-...     7, 9,  # encoding ratio
-...     message
-... )
-```
-
-Given this input message (in ASCII):
-
-```
-Fourscore and seven years ago our fathers brought forth on this
-continent a new nation, conceived in liberty and dedicated to the
-proposition that all men are created equal.
-Now we are engaged in a great civil war, testing whether that nation
-or any nation so conceived and so dedicated can long endure. We are
-met on a great battle field of that war. We have come to dedicate a
-portion of that field, as a final resting place for those who here
-gave their lives that that nation might live. It is altogether
-fitting and proper that we should do this.
-But, in a larger sense, we can not dedicate - we can not consecrate
-- we can not hallow - this ground. The brave men, living and dead,
-who struggled here, have consecrated it, far above our poor power to
-add or detract. The world will little note, nor long remember, what
-we say here, but it can never forget what they did here. It is for
-us the living, rather, to be dedicated here to the unfinished work
-which they who fought here have thus far so nobly advanced. It is
-rather for us to be here dedicated to the great task remaining
-before us - that from these honored dead we take increased devotion
-to that cause for which they gave the last full measure of devotion
-- that we here highly resolve that these dead shall not have died in
-vain - that this nation, under God, shall have a new birth of
-freedom - and that government of the people, by the people, for the
-people, shall not perish from the earth.
-```
-
-We get this output:
-
-😃😉😳😿😷😤😿😺🙆😗🙆🙃😢😼🙊🙋😧😡😇😴🙎🙉😋😧😲😑😍😙🙊😖😿😰😿😂😼😤😖😔😤
-
-😜😅😬😃😝😉😖😃😭😷🙇😥😅😗😰😇😳🙊😎😟😔😌😝🙍😘🙃🙂😬😧😻😟😠😏😇😴😻😬😹😆
-
-😌🙊😈😘😲😣😐😜😣😐😆😨😗😶😂😔😟🙎😃😗🙍😗😶😂😡😘😜😓😛😩😖😴😩😰😸😩😈😜😊
-
-😗😵🙆😙😗🙌😹😼😃😇😴😞😕😼😟😲🙊😡😕🙂😰🙀😫😊😼😗😗😕😭😤😕😝🙃🙇😽😔😕😂😲
-
-😹😺😏😎😬😂😇😵😅🙄😚😎😛😑😣😗🙆😹🙄😦😃😂😝😾😗🙆😮😯😘🙍🙃🙂😏😇😳🙅😎😱😈
-
-😛🙌😼😗😱😷😊😄😐😎😵🙀😘😨😉😭😇😧🙁🙇😝😕🙂😫🙋😅🙌😺🙀🙍😑😉🙄😹😞😕😣😟😅
-
-😕😂😨🙋😶😯😨😟😈😕😁🙁😔🙎😡😾😅😓😇😳🙃😹😪😥😞🙍😖😘🙃🙂😧🙎🙊😈😦😃😇😵😔
-
-😞😥😒😗😎😏😕🙂😵😷😖😤😣😧🙍😙😪😡😻😄😓😟😄😱😇😵😅🙄😘😫😩😖😩😕😂😲😿😻😣
-
-😆😦🙆😘😣😾😶😄😓😓😨😅😕😂😲😿😻😣😇😡😌😗🙁😹😄😨😝🙌😡😊😖😴🙋😱😢😷😹🙈😍
-
-😕😭😤😫😺😼😲🙇😖😕😲😂😴😎😟😯😭😅😇😳🙎😹🙈😩🙆😨🙊😗😶😊😩🙉😲🙍😃😲😘😨😈
-
-😮😹😸🙍😩😙😕😂😨🙋😬😲🙁😆😇😇😴😻😬😹😃😧😓🙄😘😨😉😭😇😞😘😭😦😘🙉😈😾😼😧
-
-😵😭😒😕🙂😓😑😕😺😱😁😩😘🙈😛🙎😭😰🙅😖😄😘😤😳😾😽🙂😆🙋😱😕😂😼😦🙎😝😨😊🙄
-
-😕😽😦😤😺😔😳😁😆😕😲😂😴😎😟😯😬😏😔🙊😁😠😮😷😥😺😼😗🙆😮😯😖😸😽😭🙅😖😣😱
-
-😦😧😺😭😲😐😗😕🙆😙🙆😨😇😫😥😔🙋😞😱🙃😟😬😪😓😇😴🙊😄😗😹😆😧😇😖😏😪😍😽😾
-
-😊😯😦😇😴😏😳😔😋😃😝😹😗🙆🙈😙😯😢😬😪😮😇😴😙😒😂😿🙈😑😬😕😂😼😦🙎😟🙂🙋😖
-
-😖😴😶😽😪😠🙅😘😼😘😳🙀🙉😽😬😌🙃😱😘🙈😛🙎😭😰🙄🙆😨😘🙈😡😌😏😿😚😾😠😖😔😃
-
-😼😽😟🙊🙎😭😕😾😛😑😊😰😱😢🙄😘😳🙀😭😪😽😙😑😸😕🙂😺😜😒😷😒😹😓😖😴🙂😌😊😸
-
-😹😮😓😕😂😕😠🙆🙀😏😌😦😘😈😆😈🙅😉😅🙄😈😘🙃🙂🙅😹😲😕😖😻😗🙇😄😒😅😑😺🙍🙄
-
-😇😵😅🙄😜😎😑🙃😇😎😳🙋😕🙄😴😒😌😹😇😳🙃😹😬😷🙀😟😈😕🙂😯😑😫😼😇😗😠😕😾😑
-
-😣😺😫😶😴🙆😕😂😔😊😏🙈😸😔😳😕😱😽😌😠😔😞😥😑😕😽😥😈😴😕😅😟😽😕😡😧🙈😄😗
-
-😈🙇😥😇😳🙎🙎😻😋😝😦😾😘😧🙄😟🙋😟😪😴😃😙😪😑😽🙀🙈😐😭🙅😗😶😳😏😈😵😝🙂😟
-
-😗😖😯😤😫😼😄😳🙅😖😤😊😩😱😔😶😰😢😙😊😻😓😈😈😳😚🙋😕😽😦😉🙍😋😞🙅🙊😇😴😱
-
-😲😋😉😦😙😉😖😴🙋😷😣😧😚😭😦😗😵🙉🙅😣🙂😵😢😯😊😄😹😙😌😵🙀🙄😾😘🙈🙍😐😨🙊
-
-😑😉😮😕😭😤😛😚😂😋😃😡😇😴😙😌😈😧😄🙆😲😗🙆😰😎😶🙈😟😙😢😘🙈😍😠😆😣😔😢😳
-
-😇😴😏😞😡😵😁😝😛😗🙇😈🙌😆😻😍🙂😕😇😴🙀😥😧😛🙉😣😥😗🙇😍🙃😞😑🙂😂🙎😃😋😛
-
-😡😫😑😨😗🙃😇😴😅😶😹🙌😏🙃🙈😘🙄😷😭😽😟😚😕😕😙😪🙄😎😊😋😟😁😠😖😴😚🙉😏🙀
-
-😧🙄😋😘🙈😯😯😛😮😦🙈🙇😕😾😑😣😶😦🙉😤😯😗😖😯😗😭😱😕😧😂😗😥🙎😛😘😥😐😣😅
-
-😇😵😔😨😽🙅😑😗😄😕😽😦😣😎😃😝🙁😹😕🙂😰😩😽😹🙉😫🙊😘🙃🙂😰🙍🙌😫😛🙆😗😱😷
-
-😝😘😼😠😧🙋😇😴😏😳😔😔🙁😫😑😇😵😔😨😽🙅😒😑😑😖😣🙅😉😓😵😢😱😁😇😴😙😒😂😿
-
-🙉😸😿😐😈😄😂🙍😡😭😇😃😗🙆🙁😷🙆😨😝🙌😓😖😣🙃😡😗😐😅🙁😲😗😶😊😻😞😼🙍😺😯
-
-😖😣🙄🙌🙆😼😗😌😦😇😳🙉🙈😮🙍😠😨😷😖😳😼😽🙅😩🙂😘😛😖😣🙄🙍😒😯😤😋😸😇😵😅
-
-🙄😚😑😞😠😙😖😄😆😳😯😗😣😿🙊😕😭😤😱😷😄😕😉😃😙😪😡🙀🙆😴😱😲😯😖😣🙅😉😓😸
-
-😅😓😼😇😴😏😳😕😰🙆😤😯😇😴😙😒😂😿🙉😋😽😕😂😼😦🙎😟🙂🙋😦😘😳🙀😴🙇🙊🙌😆🙊
-
-😗🙁😹😔🙄😸😰😼😬😇😳🙅😂😽😦😖🙆😒😕🙁😹😊😚😛🙈😖😿😖😴😻😓😴🙁🙄😥😵😕🙂😯
-
-😑😥🙅🙄😠🙃😙😋😄😜😥🙊😐😸😍😕😽😦😒🙁😥😬😥😽😕😱😽😌😠😔😞😥😑😕🙁😸🙃🙎😂
-
-🙀🙍😉😖😣🙃😡😔🙊😪😇😂😘🙃🙂🙁😓🙄😮😲😷😘😨😉😾🙁😗😪🙍😮😗😶😊😉😏😫😆🙈😪
-
-😘😨😈😚😟🙁🙄😸😠😇😵😅🙄😘😫😩😖😡😘😨😺😰😸😖😺😬🙀😘😸😊😑🙁😟😏😰😖😘😨😉
-
-😱😅😩😷😆😄😕😭😤😱😲😝😠😓😺😗😅🙉😇😾😐🙌😢😡😕🙁😫😾🙆😰🙀😎😻😕🙂🙄😔😞😯
-
-😧😐😅😃😌😪😟😑😚😋😡😂😘🙃🙂😧🙋😙😋😜😤😇😴😏😳😔😋😃😧😭😖😳😼🙇😿😸😐😭😚
-
-😙🙅🙌😃😢😸😷😝😩😘🙈😜😆😃😿🙉😫😸😘🙃🙂😬😪😤😵🙎🙃😗😥🙎😉😫😿😬😄😠😇😴😻
-
-😠🙀😽😝😐🙅😗🙆🙍😖🙁😿😆😵😒😇😵😅🙄😘😫😩😖😲😕😽😦😒🙁😥😬😥😽😖😤😊😘😏😧
-
-🙍😾🙋😘😨😉🙇🙁🙂😛🙉😖😇😵😅🙄😘😫😩😖😯😖😣🙄🙎😸😲😂😊😜😕😁😱😗🙋😺🙀🙊😛
-
-😗😑😳😮😜😞😽😋😅😕😂😼😦🙎😝😲😳😖😕😭😤😜🙃🙊😪😨😤😖😴😣😓🙃🙋😓😆😋😕😂😱
-
-😡😏😗😕🙂😦😇😴😶😣😄😔😛🙍😇😊😆😈😶😟😅🙀😞😁😇😲😔😗😬😆😇😋😈😖😣😱😛😃😿
-
-😑😱😒😙😚😏🙆😘😣😇😽🙆😙😥🙈😍😃😕😕😩😩😇😴😻😠😶😿🙄😎😤😕🙁😺😝😥😙😵😘😏
-
-😕😂😕😠🙆🙀😹🙀😘😘🙃🙂😭🙍😾😨😩😏😗😶😩😋😮😴🙄😟🙈😕🙍😨😜😐😴😇😵😉😕🙂😢
-
-😈😃🙆😴🙎😇😕😒🙋😸🙀🙋😪😳🙁😘😈😆😄🙅😔😫😗😉😇😴😏😳😔😋😃😝😹😕😼😈🙌😫🙍
-
-😓😅🙍😕😾😑😣😸😺😹😚🙇😗😑😳😮😜😞😽😋😍😕🙂😰😰😌😎😔😙😣😘😨😺😰😸😖😺😬🙀
-
-😇😴😊😧😯😞😄😎😵😃😅😓🐻🐻🐻🐻🐻🐻
diff --git a/basest/core/decode.py b/basest/core/decode.py
@@ -18,21 +18,24 @@ def decode_raw(input_base, output_base, input_ratio, output_ratio, input_data):
     base64 input would be in the range 0-63).
     """
     # create a 'workon' copy of the input data so we don't end up changing it
-    before = list(input_data)
+    input_workon = list(input_data)
     # count number of padding symbols
-    padding_length = before.count(input_base)
+    padding_length = input_workon.count(input_base)
     # now, replace all padding symbols with the maximum symbol
     '''
     Explanation: This solution is for bases that don't match up exactly, given
     their chosen ratios. It was inspired by the same technique that is used in
     base85/ascii85 decoding and does not negatively impact 'perfect' aligning
     bases such as base64.
     '''
-    before = [(s if s != input_base else input_base - 1) for s in before]
+    input_workon = [
+        (s if s != input_base else input_base - 1) for s in input_workon
+    ]
     # use the encode_raw function to convert the data
     output_data = encode_raw(
         input_base=input_base, output_base=output_base,
-        input_ratio=input_ratio, output_ratio=output_ratio, input_data=before
+        input_ratio=input_ratio, output_ratio=output_ratio,
+        input_data=input_workon
     )
     # strip off the unnecessary padding symbols if there was padding
     [output_data.pop() for _ in range(padding_length)]
@@ -53,11 +56,14 @@ def decode(
     """
     # create workon copy of input data and convert symbols to raw ints
     # NOTE: input symbol table here includes the padding character
-    before = symbols_to_ints(input_data, input_symbol_table + [input_padding])
+    input_workon = symbols_to_ints(
+        input_data, input_symbol_table + [input_padding]
+    )
     # use decode_raw() to decode the data
     output_data = decode_raw(
         input_base=input_base, output_base=output_base,
-        input_ratio=input_ratio, output_ratio=output_ratio, input_data=before
+        input_ratio=input_ratio, output_ratio=output_ratio,
+        input_data=input_workon
     )
     # convert raw output data back to symbols using output symbol table
     return ints_to_symbols(output_data, output_symbol_table)
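The padding trick explained in `decode_raw()` (replace every padding symbol with the largest symbol, re-encode, then drop one decoded symbol per padding symbol) can be checked with a standalone round trip. The two functions below are condensed re-creations of the `encode_raw()` and `decode_raw()` logic shown in this diff, written for illustration only: the names `encode_chunks` and `decode_chunks` are hypothetical, and the digit loop uses `divmod` in place of the diff's explicit powers of `output_base`.

```python
import random


def encode_chunks(input_base, output_base, input_ratio, output_ratio, data):
    """Re-encode `data` chunk by chunk; padding is marked with `output_base`."""
    data = list(data)
    padding = -len(data) % input_ratio  # zeros needed to fill the last chunk
    data.extend([0] * padding)
    output = []
    for i in range(0, len(data), input_ratio):
        # interpret the chunk as one number, most significant symbol first
        store = 0
        for symbol in data[i:i + input_ratio]:
            store = store * input_base + symbol
        # write that number out as output_ratio digits in output_base
        digits = []
        for _ in range(output_ratio):
            store, digit = divmod(store, output_base)
            digits.append(digit)
        output.extend(reversed(digits))
    # one padding symbol replaces each trailing output digit
    for i in range(len(output) - padding, len(output)):
        output[i] = output_base
    return output


def decode_chunks(input_base, output_base, input_ratio, output_ratio, data):
    """Invert encode_chunks(); here `input_base` is the encoded base."""
    padding = data.count(input_base)
    # replace padding with the maximum symbol, as decode_raw() does
    filled = [input_base - 1 if s == input_base else s for s in data]
    output = encode_chunks(input_base, output_base, input_ratio, output_ratio, filled)
    # drop one decoded symbol per padding symbol
    return output[:len(output) - padding]


rng = random.Random(0)
for length in range(30):
    message = [rng.randrange(256) for _ in range(length)]
    encoded = encode_chunks(256, 78, 7, 9, message)
    assert decode_chunks(78, 256, 9, 7, encoded) == message
print("round trip ok")
```

Why this works here: filling the last `p` digits with the maximum symbol raises the chunk's value by less than `78 ** p`, which is smaller than `256 ** p`, so the leading real bytes decode unchanged. The argument relies on the output base being smaller than the input base, as it is for base-78 and base-85 output.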
diff --git a/basest/core/encode.py b/basest/core/encode.py
@@ -7,6 +7,20 @@
 from .utils import ints_to_symbols, symbols_to_ints
 
 
+def _nearest_length(input_length, input_ratio):
+    """
+    Returns the nearest data length from the input data that is divisible by
+    the input ratio, using overlap if there is any.
+    """
+    # calculate the amount of overlap (if any)
+    overlap = input_length % input_ratio
+    # calculate the nearest input length that can contain our length
+    return (
+        input_length if overlap == 0
+        else ((((input_length - overlap) // input_ratio) + 1) * input_ratio)
+    )
+
+
 def encode_raw(input_base, output_base, input_ratio, output_ratio, input_data):
     """
     Given an input base, an output base, input ratio, output ratio and input
@@ -17,39 +31,25 @@ def encode_raw(input_base, output_base, input_ratio, output_ratio, input_data):
     output would be in the range 0-63).
     """
     # create a 'workon' copy of the input data so we don't end up changing it
-    before = list(input_data)
+    input_workon = list(input_data)
     # store length of input data for future reference
-    input_length = len(before)
-    # calculate the amount of overlap (if any)
-    overlap = input_length % input_ratio
-    '''
-    get the nearest data length from the input data that is divisible by
-    the input ratio, using overlap if there is any
-    '''
-    input_nearest_length = (
-        input_length if overlap == 0
-        else (
-            (
-                (
-                    (input_length - overlap) // input_ratio
-                ) + 1
-            ) * input_ratio
-        )
-    )
+    input_length = len(input_workon)
+    # get nearest data length that the input data fits in
+    input_nearest_length = _nearest_length(input_length, input_ratio)
     # calculate the amount of padding needed
     padding_length = (input_nearest_length - input_length)
     # get the output length, based on nearest divisible input length
     output_length = (input_nearest_length // input_ratio) * output_ratio
     # create a new list for the output data
     output_data = [0] * output_length
     # extend the input_data to the nearest divisible length (for padding)
-    before.extend([0] * padding_length)
+    input_workon.extend([0] * padding_length)
     # encode the data - store each group of input_ratio symbols in a number
     for i in range(0, input_nearest_length, input_ratio):
         store = 0
         for j in range(0, input_ratio):
             # store value of symbol
-            symbol = before[i + j]
+            symbol = input_workon[i + j]
             # upscale it if necessary, in a big-endian manner (first symbol most significant)
             symbol *= (input_base ** (input_ratio - j - 1))
             # add to store
@@ -58,15 +58,15 @@ def encode_raw(input_base, output_base, input_ratio, output_ratio, input_data):
         now that store contains the value of a number of symbols, separate this
         out to the output symbols
         '''
-        for j in range(0, output_ratio):
+        for k in range(0, output_ratio):
             # convert output array index
-            index = ((i // input_ratio) * output_ratio) + j
+            index = ((i // input_ratio) * output_ratio) + k
             # re-interpret the number in terms of output base
-            symbol = store // (output_base ** (output_ratio - j - 1))
+            symbol = store // (output_base ** (output_ratio - k - 1))
             # store at the calculated position
             output_data[index] = symbol
             # decrement the store variable, having now encoded part of it
-            store -= (symbol * (output_base ** (output_ratio - j - 1)))
+            store -= (symbol * (output_base ** (output_ratio - k - 1)))
     # set padding bytes to padding symbol, if needed
     for i in range(output_length - padding_length, output_length):
         output_data[i] = output_base
@@ -86,11 +86,12 @@ def encode(
     symbol.
     """
     # create workon copy of input data and convert symbols to raw ints
-    before = symbols_to_ints(input_data, input_symbol_table)
+    input_workon = symbols_to_ints(input_data, input_symbol_table)
     # use encode_raw() to encode the data
     output_data = encode_raw(
         input_base=input_base, output_base=output_base,
-        input_ratio=input_ratio, output_ratio=output_ratio, input_data=before
+        input_ratio=input_ratio, output_ratio=output_ratio,
+        input_data=input_workon
     )
     # convert raw output data back to symbols using output symbol table
     # NOTE: output symbol table here includes the padding character
diff --git a/setup.py b/setup.py
@@ -30,7 +30,7 @@ def retrieve_deps(filepath):
 
 setup(
     name='basest',
-    version='0.6.1',
+    version='0.6.2',
     description=(
         'Converts symbols from any number base to any other number base'
     ),