Commit 8fe5c0d

Merge pull request #8 from CenterForSecureEnergyInformatics/readme-markdown
Converted documentation to Markdown
2 parents 41c00d8 + 0d4194e commit 8fe5c0d

7 files changed, 87 additions and 59 deletions
Lines changed: 15 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,18 @@
1+
Overview
2+
---
3+
14
DCCLI is a command line application which allows compressing and decompressing (referred to as encoding and decoding henceforth) files using DCLib.
25

3-
Usage: <input file> <output file> <list of encoders/decoders with options>
4-
The list of encoders/decoders is separated by a separate #. Each encoder/decoder must specify either 'encode' or 'decode', followed by the encoder/decoder name. Options can be specified separately after that. They affect only encoder/decoder that precedes them in the command line. Options are specified as <name>=<value> or <name> for boolean options.
5-
Example: input.dat output.dat encode copy # decode copy blocksize=8
6+
Usage: `<input file> <output file> <list of encoders/decoders with options>`
7+
8+
The encoders/decoders in the list are separated by a standalone `#`. Each encoder/decoder must specify either `encode` or `decode`, followed by the encoder/decoder name. Options may follow the name; they affect only the encoder/decoder that precedes them on the command line. Options are specified as `<name>=<value>`, or as `<name>` for boolean options.
9+
10+
Example: `input.dat output.dat encode copy # decode copy blocksize=8`
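The command-line grammar above can be illustrated with a small sketch. This is not DCCLI's actual parser; `count_stages` and `option_value` are hypothetical helper names, and the sketch assumes the arguments after the two file names are already available as an array of tokens.

```c
#include <string.h>

/* Hypothetical helper: counts encoder/decoder stages in a token list
 * where stages are separated by a standalone "#" token. */
int count_stages(const char *const *tokens, int n) {
    int stages = 0;
    int in_stage = 0;
    for (int i = 0; i < n; i++) {
        if (strcmp(tokens[i], "#") == 0) {
            in_stage = 0;          /* separator: the next token starts a new stage */
        } else if (!in_stage) {
            stages++;              /* first token of a stage ("encode" or "decode") */
            in_stage = 1;
        }
    }
    return stages;
}

/* Hypothetical helper: returns the value part of "<name>=<value>",
 * or NULL for a boolean "<name>" option. */
const char *option_value(const char *token) {
    const char *eq = strchr(token, '=');
    return eq ? eq + 1 : NULL;
}
```

For the example above, the tokens `encode copy # decode copy blocksize=8` form two stages, and `blocksize=8` carries the value `8`.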
11+
12+
Notes on usage
13+
---
614

7-
Notes on usage:
8-
* If a fractional number of bytes (i.e., a number of bits not divisible by eight) is written to the output file, decoding said output file later may lead to errors at the last byte when processing the superfluous bits at the end of the file
9-
* When using only one encoder/decoder, data read from the input file is processed and written directly (buffered) to the output file, requiring no additional memory. If, however, multiple encoders/decoders are used, data read from the input file is processed and written to a temporary buffer. For all but the last encoder/decoder, data is read from this temporary buffer, processed and written to another temporary buffer. For the last encoder/decoder, data from this temporary buffer is read, processed and written to the output file. Since all data is processed by one encoder/decoder after another, all intermediate data will be held in the described temporary buffers. Processing large files can therefore lead to high memory consumption
10-
* The size of the temporary buffers described above may be reduced at compile-time via TEMP_BUFFER_SIZE. However, since the buffers resize themselves automatically, TEMP_BUFFER_SIZE is only their initial size, which is no indicator of the acutal memory consumption when processing larger files with more than one encoder/decoder
11-
* The size of the input and output file buffers may be reduced at compile-time via READ_BUFFER_SIZE and WRITE_BUFFER_SIZE. Both are guaranteed to remain unchanged throughout the execution of the program
15+
* If a fractional number of bytes (i.e., a number of bits not divisible by eight) is written to the output file, decoding said output file later may lead to errors at the last byte when processing the superfluous bits at the end of the file.
16+
* When using only one encoder/decoder, data read from the input file is processed and written directly (buffered) to the output file, requiring no additional memory. If, however, multiple encoders/decoders are used, data read from the input file is processed and written to a temporary buffer. For all but the last encoder/decoder, data is read from this temporary buffer, processed and written to another temporary buffer. For the last encoder/decoder, data from this temporary buffer is read, processed and written to the output file. Since all data is processed by one encoder/decoder after another, all intermediate data will be held in the described temporary buffers. Processing large files can therefore lead to high memory consumption.
17+
* The size of the temporary buffers described above may be reduced at compile-time via `TEMP_BUFFER_SIZE`. However, since the buffers resize themselves automatically, `TEMP_BUFFER_SIZE` is only their initial size, which is no indicator of the actual memory consumption when processing larger files with more than one encoder/decoder.
18+
* The size of the input and output file buffers may be reduced at compile-time via `READ_BUFFER_SIZE` and `WRITE_BUFFER_SIZE`. Both are guaranteed to remain unchanged throughout the execution of the program.
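The first note can be made concrete with a little arithmetic. The following is a minimal sketch, not part of DCCLI; the function names are hypothetical. It computes how many bytes a given number of bits occupies and how many superfluous bits end up in the last byte.

```c
#include <stddef.h>

/* Number of bytes needed to store `bits` bits (rounded up). */
size_t bytes_for_bits(size_t bits) {
    return bits / 8 + (bits % 8 != 0);
}

/* Number of superfluous (padding) bits in the last byte. */
unsigned padding_bits(size_t bits) {
    return (unsigned)((8 - bits % 8) % 8);
}
```

For instance, 13 bits of encoder output occupy 2 bytes, leaving 3 superfluous bits that a later decoder may try to interpret.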

DataCompressor/DCIOLib/doc/overview.txt

Lines changed: 0 additions & 12 deletions
This file was deleted.
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
Overview
2+
---
3+
4+
DCIOLib is a library which allows performing bit-wise reading and writing operations on buffers which are linked to either files or memory.
5+
6+
`buffer.h` (`buffer_t`): A buffer implementation for byte-wise reading and writing operations. It can be resized, if necessary, while retaining the old data. Life cycle: `AllocateBuffer` -> `InitBuffer` -> (read, write or other operations) -> `UninitBuffer` -> `FreeBuffer`.
7+
8+
`file_buffer.h` (`file_buffer_t`): A buffer implementation for byte-wise reading and writing operations on files. It wraps a `buffer_t` and can thus also be used to read or write in memory. It is possible to switch between reading and writing. Life cycle: `AllocateFileBuffer` -> `InitFileBuffer` with an opened file or `InitFileBufferInMemory` -> (read, write or other operations) -> `UninitFileBuffer` -> `FreeFileBuffer`.
9+
10+
`bit_file_buffer.h` (`bit_file_buffer_t`): A buffer implementation for bit-wise reading and writing on files or in memory. It provides single-bit and constant-bit-size read/write access. It uses a `file_buffer_t` which needs to be initialized and uninitialized separately. It is possible to switch from writing to reading; the opposite way is not supported. Life cycle: `AllocateBitFileBuffer` -> `InitBitFileBuffer` with an initialized `file_buffer_t` instance -> (read, write or other operations) -> `UninitBitFileBuffer` -> `FreeBitFileBuffer`.
11+
12+
Notes on usage
13+
---
14+
15+
* Although `bit_file_buffer_t` cannot be changed from reading mode back to writing mode, it is possible to reset the buffer, which discards buffered data.
16+
* When `file_buffer_t` is used to write to memory, the underlying buffer will be automatically resized when it is too small.
17+
* `file_buffer_t` and `bit_file_buffer_t` flush contents automatically when they are uninitialized. To do so before uninitializing, an explicit flush operation is required.
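To illustrate what bit-wise writing into a byte buffer involves, here is a minimal, self-contained sketch. It does not use the DCIOLib API; `bit_writer_sketch_t` and `sketch_write_bit` are hypothetical names, and the MSB-first bit order within each byte is an assumption rather than documented DCIOLib behaviour.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory bit writer (not the DCIOLib type). */
typedef struct {
    uint8_t *data;   /* caller-provided storage */
    size_t   cap;    /* capacity in bytes */
    size_t   nbits;  /* number of bits written so far */
} bit_writer_sketch_t;

/* Appends one bit (assumed MSB-first within each byte).
 * Returns 0 on success, or a negative value if the storage is full. */
int sketch_write_bit(bit_writer_sketch_t *w, int bit) {
    size_t byte = w->nbits / 8;
    if (byte >= w->cap) {
        return -1;
    }
    if (w->nbits % 8 == 0) {
        w->data[byte] = 0;  /* start a fresh byte so unused bits are zero */
    }
    if (bit) {
        w->data[byte] |= (uint8_t)(0x80u >> (w->nbits % 8));
    }
    w->nbits++;
    return 0;
}
```

Writing the bits 1, 0, 1 leaves the first byte as `0xA0`, with five unused bits that only become part of the output once the byte is flushed.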

DataCompressor/DCLib/doc/overview.txt

Lines changed: 0 additions & 27 deletions
This file was deleted.

DataCompressor/DCLib/doc/readme.md

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,38 @@
1+
Overview
2+
---
3+
4+
DCLib is a library which allows compressing and decompressing (referred to as encoding and decoding henceforth) data from buffers (see DCIOLib).
5+
6+
`enc_dec.h`: Allows listing and using all implemented encoders/decoders as well as their options. Life cycle: `GetEncoder` -> (optional option configuration, see below) -> `enc_dec_t.encoder` (for encoding) or `enc_dec_t.decoder` (for decoding) call on initialized input and output bit buffers. Optional option configuration: (optional) `OptionNameExists` -> (optional) `EncoderSupportsOption` -> `GetOptionType` -> `GetAllowedOptionValueRange` -> `SetOptionValue<Type>`.
7+
8+
Encoders/decoders
9+
---
10+
11+
* aggregate: Computes sums of `num_values` (option name) consecutive floating-point values (no decoder!).
12+
* bac: Performs binary arithmetic coding as implemented by Witten et al.
13+
* copy: Copies the input to the output, i.e., it performs no compression whatsoever. This encoder/decoder operates on blocks of `blocksize` (option name) bits size.
14+
* csv: Reads lines of comma-separated values and converts the strings in column number `column` (option name) of each line to a list of (binary) floating-point values when encoding; performs the reverse conversion when decoding and inserts blank columns if necessary.
15+
* diff: Encodes (signed) differences between consecutive (unsigned) values of `valuesize` (option name) bits size when encoding; reconstructs (unsigned) values of `valuesize` (option name) bits size from their consecutive (signed) differences when decoding.
16+
* lzmh: Performs LZMH coding and decoding from Ringwelski et al. This is an integrated third-party implementation.
17+
* normalize: Converts floating-point values to (signed) integer values of `valuesize` (option name) bits size when encoding; performs the reverse conversion when decoding. To preserve decimal places after the decimal point, all values are multiplied by `normalization_factor` (option name) when encoding, and divided when decoding.
18+
* seg: Creates Exponential Golomb code words from values when encoding; reconstructs Exponential Golomb code words when decoding. All values are `valuesize` (option name) bits in size and signed.
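As an illustration of the `diff` idea described in the list above, the following sketch encodes unsigned values as signed differences and reconstructs them again. It is not DCLib's implementation: the function names are hypothetical, the value size is fixed at 16 bits, and the predecessor of the first value is assumed to be 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode: out[i] = in[i] - in[i-1], assuming a predecessor of 0 for i == 0. */
void diff_encode(const uint16_t *in, int32_t *out, size_t n) {
    uint16_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = (int32_t)in[i] - (int32_t)prev;
        prev = in[i];
    }
}

/* Decode: rebuild each value by accumulating the signed differences. */
void diff_decode(const int32_t *in, uint16_t *out, size_t n) {
    int32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];
        out[i] = (uint16_t)prev;
    }
}
```

Slowly changing inputs yield small differences, which a later stage such as `seg` can represent with short code words.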
19+
20+
Supported encoder input and output formats
21+
---
22+
23+
Note: Decoder input and output formats are reversed (if there is a decoder).
24+
25+
* aggregate: binary float in, binary float out
26+
* bac: arbitrary in, binary out
27+
* copy: arbitrary in, arbitrary out
28+
* csv: ASCII float in, binary float out
29+
* diff: unsigned int in, signed int out
30+
* lzmh: ASCII float in, binary out
31+
* normalize: float in, signed int out
32+
* seg: signed int in, binary out
33+
34+
Notes on usage
35+
---
36+
37+
* `GetEncoderNames` requires a `char*` array with `GetNumberOfEncoders` fields.
38+
* When adding or renaming encoders/decoders or options, make sure the arrays remain sorted by name. Otherwise, the find operations will not work as expected.
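The sorting requirement makes sense if the find operations rely on a binary search, which is an assumption here and not confirmed by the documentation. The sketch below shows such a lookup with the standard `bsearch` function; `sketch_names`, `cmp_name` and `find_encoder_index` are hypothetical and not part of DCLib.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical name table; it must stay sorted for bsearch to work. */
static const char *const sketch_names[] = {
    "aggregate", "bac", "copy", "csv", "diff", "lzmh", "normalize", "seg"
};

/* bsearch comparator: the key is a string, each element a pointer to a string. */
static int cmp_name(const void *key, const void *elem) {
    return strcmp((const char *)key, *(const char *const *)elem);
}

/* Returns the index of `name` in the table, or -1 if it is not found. */
int find_encoder_index(const char *name) {
    const size_t n = sizeof sketch_names / sizeof sketch_names[0];
    const char *const *hit =
        bsearch(name, sketch_names, n, sizeof sketch_names[0], cmp_name);
    return hit ? (int)(hit - sketch_names) : -1;
}
```

If an entry were inserted out of order, a binary search could skip over it and report an existing name as missing.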

DataCompressor/common/doc/overview.txt

Lines changed: 0 additions & 12 deletions
This file was deleted.
Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,17 @@
1+
Overview
2+
---
3+
4+
common is a collection of headers for other libraries
5+
6+
`log.h`: Provides a printf-style logging macro
7+
8+
`io.h`: Provides types for file I/O as well as ftell and fopen macros (for 64-bit file I/O on platforms that support it)
9+
10+
`err_codes.h`: Provides constants for common errors
11+
12+
Notes on usage
13+
---
14+
15+
* `IO_SIZE_BITS` specifies the number of bits used for file-I/O-related operations. In particular, the size of return values for Read/Write functions in dependent libraries is based on it.
16+
* If `IO_SIZE_BITS` equals the size of `size_t` in bits, the Read/Write functions in dependent libraries do not work properly if the MSB of a `size_t` variable specifying the size to be read/written is used. For example, if `IO_SIZE_BITS` is 32 and `sizeof(size_t)` is 4, the maximum size (parameter value) that the Read/Write functions can work with is `2^31 - 1`, i.e., the 32nd bit cannot be used. If it is used, the return value of the functions will be interpreted as an error (since it is interpreted as a negative number).
17+
* Error codes have to be negative in order to distinguish them from return values which signal the number of bytes read/written (which is never negative).
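The limit described in these notes can be checked before calling a Read/Write function. The sketch below assumes `IO_SIZE_BITS` is 32; `MAX_IO_SIZE_SKETCH` and `size_is_representable` are hypothetical names that do not exist in the common headers.

```c
#include <stdint.h>

/* Largest size a signed 32-bit return value can report: 2^31 - 1. */
#define MAX_IO_SIZE_SKETCH ((uint32_t)INT32_MAX)

/* Returns 1 if `size` can be reported back without being mistaken
 * for a (negative) error code, 0 otherwise. */
int size_is_representable(uint32_t size) {
    return size <= MAX_IO_SIZE_SKETCH;
}
```

A request of `2^31` bytes or more fails this check, because its MSB is set and the same bit pattern read as a signed return value would be negative.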
