|
| 1 | +# jRLE |
| 2 | +## A Simple Run-Length Encoder |
| 3 | +A simple run-length encoder written in C++ for an employment test. |
| 4 | + |
| 5 | +## How To Download And Use |
| 6 | +Download the latest release from [the releases tab](https://github.com/Trimatix/cpp-run-length-encoder/releases). |
| 7 | + |
| 8 | +To use jRLE, invoke jRLE.exe from the command line with the function as the first argument and the file path as the second argument, for example `jRLE.exe -e input.txt` (where `input.txt` stands in for your own file). |
| 9 | +- Function must be either `-e` for encoding, or `-d` for decoding. |
| 10 | +- File must have the extension ".txt" and use ASCII encoding. |
| 11 | + |
| 12 | +## Terminology |
| 13 | +### Tokens |
| 14 | +In this program, a "token" is a description of a run: a string made up of one or more repetitions of a single character. |
| 15 | +Note: Unencoded and encoded tokens are generally treated as distinct, but it is valid to encode a string that has already been encoded. A string must be encoded at least once before it can be decoded. |
| 16 | + |
| 17 | ++ An example of an unencoded token ("dToken"): aaa |
| 18 | ++ An example of an encoded token ("eToken"): 3a |
| 19 | + |
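The idea of a token can be illustrated with a minimal sketch that splits a string into its runs, one per unencoded token. The function name `splitRuns` and the pair-based representation are assumptions made for illustration; they are not taken from the jRLE source.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not taken from the jRLE source: splits the input into
// (character, run length) pairs, one pair per unencoded token ("dToken").
std::vector<std::pair<char, std::size_t>> splitRuns(const std::string& text) {
    std::vector<std::pair<char, std::size_t>> runs;
    for (char c : text) {
        if (!runs.empty() && runs.back().first == c) {
            ++runs.back().second;      // same character: extend the current run
        } else {
            runs.emplace_back(c, 1);   // different character: start a new run
        }
    }
    return runs;
}
```

Under these assumptions, `splitRuns("aaabcc")` yields three runs: three `a`, one `b`, and two `c`.
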
| 20 | +### Encoding Format |
| 21 | +- For character sequences of length 9 or less, the encoding is the character count followed by the character. |
| 22 | + |
| 23 | +E.g.: "aaa" -> "3a" |
| 24 | + |
| 25 | +- For character sequences of length more than 9, the encoding is prefixed by a '#' character. |
| 26 | + |
| 27 | +E.g.: "aaaaaaaaaa" -> "#10a" |
| 28 | + |
| 29 | +- For sequences of digit characters, the encoding is postfixed by a '#' character. |
| 30 | + |
| 31 | +E.g.: "111" -> "31#" |
| 32 | + |
| 33 | +- For sequences of '#' characters, the encoding is postfixed by a '#' character. |
| 34 | + |
| 35 | +E.g.: "###" -> "3##" |
| 36 | + |
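The four rules above can be sketched as a small encoder. This is an illustrative sketch rather than the jRLE implementation: the names `encodeRun` and `encode` are hypothetical, and the sketch assumes that a long sequence of digits or '#' characters receives both the prefix and the postfix, a combination the rules above do not spell out.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: encodes one run of `count` copies of `c`.
std::string encodeRun(char c, std::size_t count) {
    std::string token;
    if (count >= 10) {
        token += '#';                  // long sequence: '#' prefix
    }
    token += std::to_string(count);    // character count
    token += c;                        // the repeated character
    if (c == '#' || std::isdigit(static_cast<unsigned char>(c))) {
        token += '#';                  // digit or '#' run: '#' postfix
    }                                  // (assumed to combine with the prefix)
    return token;
}

// Hypothetical helper: encodes a whole string run by run.
std::string encode(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        std::size_t j = i;
        while (j < text.size() && text[j] == text[i]) {
            ++j;                       // advance to the end of the current run
        }
        out += encodeRun(text[i], j - i);
        i = j;
    }
    return out;
}
```

With this sketch, `encode` reproduces each documented example: "aaa" becomes "3a", "aaaaaaaaaa" becomes "#10a", "111" becomes "31#", and "###" becomes "3##".
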
| 37 | +### Long Sequences |
| 38 | +"Long sequence" refers to a string containing 10 or more of a single character. |
| 39 | + |
| 40 | ++ An example of an unencoded long sequence: aaaaaaaaaa |
| 41 | ++ An example of an encoded long sequence: #10a |
| 42 | + |
| 43 | +### #-Cases |
| 44 | +1. "#-case a" refers to the special case where a token is a long sequence. |
| 45 | +2. "#-case b" refers to the case where a token consists of '#' characters. |
| 46 | +3. "#-case c" refers to the case where a token consists of digit characters. |
| 47 | + |
| 48 | +#-case c needs care: the token must be separated from the next one so that its digit character is not mistaken for the next token's character count. |
| 49 | + |
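As a rough illustration, the three #-cases can be expressed as flags computed from a run's character and length. The `HashCases` struct and `classifyRun` function are hypothetical names, and the sketch assumes that #-case a can apply together with #-case b or #-case c, which the list above does not state.

```cpp
#include <cctype>
#include <cstddef>

// Hypothetical type: records which of the #-cases apply to one run.
struct HashCases {
    bool a;  // long sequence (10 or more characters)
    bool b;  // run of '#' characters
    bool c;  // run of digit characters
};

// Hypothetical helper: classifies a run of `count` copies of `ch`.
HashCases classifyRun(char ch, std::size_t count) {
    HashCases cases;
    cases.a = count >= 10;
    cases.b = ch == '#';
    cases.c = std::isdigit(static_cast<unsigned char>(ch)) != 0;
    return cases;
}
```

For example, under these assumptions a run of twelve '1' characters falls under both #-case a and #-case c, while a run of three 'a' characters falls under none.
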
| 50 | + Written by Jasper Law 2020 |
| 51 | + |
| 52 | + https://github.com/Trimatix/cpp-run-length-encoder |