Commit 3acedbf

tighten readme
1 parent 7355a4f commit 3acedbf

1 file changed

Lines changed: 77 additions & 149 deletions

README.md

@@ -7,8 +7,7 @@
77
# KEncode
88

99
**Compact, ASCII-safe encodings and ultra-small binary serialization for Kotlin,
10-
ideal for URLs, headers, file names, and other size-limited channels. Produces
11-
short and predictable payloads.**
10+
ideal for URLs, headers, file names, and other size-limited channels.**
1211

1312
![Maven Central](https://img.shields.io/maven-central/v/com.eignex/kencode.svg?label=Maven%20Central)
1413
![Build](https://github.com/eignex/kencode/actions/workflows/build.yml/badge.svg)
@@ -19,38 +18,28 @@ short and predictable payloads.**
1918
> with strict character or length limits such as URLs, file names, Kubernetes
2019
> labels, and log keys.
2120
22-
> It provides high-performance radix and base encoders, efficient integer
23-
> coding, optional checksums, and a compact bit-packed serializer for flat data
24-
> models.
21+
> It provides compact radix/base encoders, efficient integer coding, optional
22+
> checksums, and a bit-packed serializer for flat data models.
2523
2624
---
2725

2826
## Overview
2927

30-
KEncode provides **three focused entry points**, all aimed at producing compact,
31-
ASCII-safe representations:
28+
KEncode has **three focused entry points**, all aimed at compact, ASCII-safe
29+
representations:
3230

33-
1. **ByteEncoding codecs**: `Base62` / `Base36` / `Base64` / `Base85`
34-
Low-level encoders/decoders for `ByteArray` values.
35-
Useful when you already have binary data and only need an ASCII-safe
36-
representation.
31+
1. **ByteEncoding codecs**: `Base62` / `Base36` / `Base64` / `Base85`
32+
Low-level encoders/decoders for byte arrays when you already have binary
33+
data.
3734

38-
2. **Standalone BinaryFormat**: `PackedFormat`
39-
Produce compact binary payloads from Kotlin objects using
40-
`kotlinx.serialization` `BinaryFormat`.
41-
`PackedFormat` produces very small from flat structures using bitmasks,
42-
varints, but no object nesting. Use `kotlinx.serialization.ProtoBuf` instead
43-
when you need nested types, lists, or maps.
35+
2. **Standalone BinaryFormat**: `PackedFormat`
36+
Compact binary serialization for flat structures using bitmasks and
37+
varints.
38+
Use `kotlinx.serialization.ProtoBuf` instead when you need nesting, lists, or
39+
maps.
4440

45-
3. **Standalone StringFormat**: `EncodedFormat`
46-
Produce string payloads from Kotlin objects using `kotlinx.serialization`
47-
`StringFormat`.
48-
Encompasses a `BinaryFormat` + optional checksum + `ByteEncoding` text codec.
49-
Use when you want a single `encodeToString` / `decodeFromString` API that
50-
yields short, deterministic tokens.
51-
52-
KEncode focuses on minimal outputs; encrypt the payload first if it contains
53-
sensitive information.
41+
3. **Standalone StringFormat**: `EncodedFormat`
42+
Adds checksum + text encoding on top of a binary format, providing very s
5443

5544
---
5645

@@ -76,18 +65,14 @@ Minimal example using the default `EncodedFormat` (`Base62` + `PackedFormat`):
7665
```kotlin
7766
@Serializable
7867
data class Payload(
79-
@VarUInt
80-
val id: ULong, // varint
81-
82-
@VarInt
83-
val delta: Int, // zig-zag + varint
84-
85-
val urgent: Boolean, // joined to bitset
68+
@VarUInt val id: ULong, // varint
69+
@VarInt val delta: Int, // zig-zag + varint
70+
val urgent: Boolean, // joined to bitset
8671
val sensitive: Boolean,
8772
val external: Boolean,
88-
val handled: Instant?, // nullable, tracked via bitset
73+
val handled: Instant?, // nullable, tracked via bitset
8974

90-
val type: PayloadType // encoded as varint
75+
val type: PayloadType // encoded as varint
9176
)
9277

9378
enum class PayloadType { TYPE1, TYPE2, TYPE3 }
@@ -103,11 +88,8 @@ val payload = Payload(
10388
)
10489

10590
val encoded = EncodedFormat.encodeToString(payload)
106-
println(encoded)
10791
// Example: 0fiXYI (this specific payload fits in 4 raw bytes)
108-
10992
val decoded = EncodedFormat.decodeFromString<Payload>(encoded)
110-
assert(payload == decoded)
11193
```
11294

11395
---
@@ -117,25 +99,13 @@ assert(payload == decoded)
11799
You can use the encoders standalone on raw byte arrays.
118100

119101
```kotlin
120-
121102
val bytes = "any byte data".encodeToByteArray()
103+
println(Base36.encode(bytes)) // 0ksef5o4kvegb70nre15t
104+
println(Base62.encode(bytes)) // 2BVj6VHhfNlsGmoMQF
105+
println(Base64.encode(bytes)) // YW55IGJ5dGUgZGF0YQ==
106+
println(Base85.encode(bytes)) // @;^?5@X3',+Cno&@/
122107

123-
println(Base36.encode(bytes))
124-
// 0ksef5o4kvegb70nre15t
125-
126-
println(Base62.encode(bytes))
127-
// 2BVj6VHhfNlsGmoMQF
128-
129-
println(Base64.encode(bytes))
130-
// YW55IGJ5dGUgZGF0YQ==
131-
132-
println(Base85.encode(bytes))
133-
// @;^?5@X3',+Cno&@/
134-
```
135-
136-
Decoding is symmetric:
137-
138-
```kotlin
108+
// Decoding is symmetric:
139109
val back = Base62.decode("2BVj6VHhfNlsGmoMQF")
140110
```
141111

@@ -150,20 +120,13 @@ binary format and still get compact, ASCII-safe strings:
150120
@Serializable
151121
data class ProtoBufRequired(val map: Map<String, Int>)
152122

153-
val payload = ProtoBufRequired(
154-
mapOf("k1" to 1285, "k2" to 9681)
155-
)
156-
123+
val payload = ProtoBufRequired(mapOf("k1" to 1285, "k2" to 9681))
157124
val format = EncodedFormat(binaryFormat = ProtoBuf)
158-
159-
val encoded = format.encodeToString(payload)
160-
println(encoded)
161-
// 05cAKYGWf6gBgtZVpkqPEWOYH
162-
125+
val encoded = format.encodeToString(payload) // 05cAKYGWf6gBgtZVpkqPEWOYH
163126
val decoded = format.decodeFromString<ProtoBufRequired>(encoded)
164127
```
165128

166-
This example relies on kotlinx protobuf implementation, which you install:
129+
This example relies on the kotlinx protobuf module, which you install with:
167130

168131
```kotlin
169132
implementation("org.jetbrains.kotlinx:kotlinx-serialization-protobuf:1.9.0")
@@ -173,7 +136,7 @@ implementation("org.jetbrains.kotlinx:kotlinx-serialization-protobuf:1.9.0")
173136

174137
## Encryption
175138

176-
Typical pattern when you need confidentiality:
139+
Typical pattern for confidential payloads:
177140

178141
1. Serialize (PackedFormat or ProtoBuf).
179142
2. Encrypt with your crypto library.
@@ -216,7 +179,6 @@ cipher.init(Cipher.DECRYPT_MODE, key, IvParameterSpec(iv8received))
216179
val decoded = Base62.decode(encoded)
217180
val decrypted = cipher.doFinal(received)
218181
val result: SensitiveData = PackedFormat.decodeFromByteArray(decrypted)
219-
println(result)
220182
```
221183

222184
---
@@ -227,116 +189,88 @@ You can add a CRC checksum to an `EncodedFormat`. On decode, a mismatch
227189
throws, so you get a simple integrity check on the serialized payload.
228190

229191
```kotlin
230-
231192
@Serializable
232193
data class Command(val id: Int, val payload: String)
233194

234-
val format = EncodedFormat(
235-
checksum = Crc32, // or Crc16
236-
)
237-
238-
val original = Command(42, "restart-worker")
239-
val encoded = format.encodeToString(original)
240-
println(encoded)
241-
242-
// Tampering will fail:
243-
// val corrupted = encoded.dropLast(1) + "x"
244-
// format.decodeFromString<Command>(corrupted) // throws "Checksum mismatch."
245-
195+
val format = EncodedFormat(checksum = Crc32)
196+
val encoded = format.encodeToString(Command(42, "restart-worker"))
246197
val decoded = format.decodeFromString<Command>(encoded)
247198
```
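
The append-and-verify idea behind the checksum can be sketched with the JDK's `java.util.zip.CRC32`. This is an illustration only: the helper names (`crc32Of`, `appendChecksum`, `verifyAndStrip`) are hypothetical, and the 4-byte big-endian trailer is an assumption, not necessarily the layout `EncodedFormat` uses.

```kotlin
import java.util.zip.CRC32

// CRC-32 of a byte array, truncated to the low 32 bits.
fun crc32Of(data: ByteArray): Int {
    val crc = CRC32()
    crc.update(data)
    return crc.value.toInt()
}

// Appends the checksum as 4 trailing bytes (big-endian order is an assumption).
fun appendChecksum(payload: ByteArray): ByteArray {
    val c = crc32Of(payload)
    val tail = ByteArray(4) { i -> (c ushr (24 - 8 * i)).toByte() }
    return payload + tail
}

// Recomputes the checksum over the payload part and compares it with the trailer.
fun verifyAndStrip(framed: ByteArray): ByteArray {
    require(framed.size >= 4) { "Too short to contain a checksum." }
    val payload = framed.copyOfRange(0, framed.size - 4)
    val expected = framed.copyOfRange(framed.size - 4, framed.size)
    val actual = appendChecksum(payload).copyOfRange(payload.size, payload.size + 4)
    if (!expected.contentEquals(actual)) throw IllegalArgumentException("Checksum mismatch.")
    return payload
}
```

A CRC only detects accidental corruption; it does not protect against deliberate tampering, which is what the encryption pattern above is for.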
248199

249200
---
250201

251202
## PackedFormat explanation
252203

253-
`PackedFormat` is a `BinaryFormat` optimized for small, flat structures:
254-
255-
* No nested objects, lists, or maps.
256-
* Booleans and nullability encoded as bitmasks.
257-
* Optional varint / zig-zag for `Int`/`Long` via annotations.
204+
`PackedFormat` is a `BinaryFormat` designed to produce very small payloads for
205+
**flat** Kotlin data classes. It avoids nesting and collections, allowing a
206+
compact and deterministic binary layout.
258207

259208
### Field layout
260209

261-
For a single class:
210+
For a single data class, the encoding consists of:
262211

263-
1. **Flags varlong**:
212+
1. **Flags varlong**
264213

265-
* First `N` bits for booleans (in declaration order).
266-
* Next `M` bits for nullable fields (in declaration order).
267-
* Boolean bit = `true`/`false`.
268-
Nullable bit = `1` means `null`, `0` means non-null.
214+
A single varlong encodes:
215+
* **Boolean bits** — one per boolean property, in declaration order
216+
(`1` = true, `0` = false)
217+
* **Nullability bits** — one per nullable property
218+
(`1` = null, `0` = non-null)
269219

270-
2. **Payload bytes** (non-boolean fields only, in declaration order):
220+
2. **Payload bytes**
271221

272-
* Fixed-size primitives: `Byte`, `Short`, `Int`, `Long`, `Float`, `Double`.
273-
* `String`: `[varint length][UTF-8 bytes]`.
274-
* `Char`: UTF-8 encoding of a single `Char`.
275-
* Enum: varint ordinal.
276-
* Nullable fields:
277-
* If null: only the null-bit is set; no payload bytes.
278-
* If non-null: encoded exactly like the non-null case.
222+
After the flags, each non-boolean field is encoded in declaration order:
279223

280-
Top-level nullable values are encoded with one varlong flag:
224+
* Fixed primitives (`Byte`, `Short`, `Int`, `Long`, `Float`, `Double`)
225+
* `String`: `[varint length][UTF-8 bytes]`
226+
* `Char`: UTF-8 encoding
227+
* `Enum`: ordinal as varint
228+
* Nullable fields omit payload bytes when null
281229

282-
* Bit 0 = `0` → non-null (value follows).
283-
* Bit 0 = `1` → null (no payload).
230+
A top-level nullable value is encoded with a single bit: `1` = null, `0` =
231+
present.
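
As a rough sketch of the flags word described above: boolean bits come first, then nullability bits, both in declaration order. The helper names (`packFlags`, `flagAt`) are hypothetical, and placing the first flag in the least significant bit is an assumption; the README does not state the bit order.

```kotlin
// Hypothetical helper: packs boolean values, then null markers, into one Long.
// Assumption: the first flag occupies the least significant bit.
fun packFlags(booleans: List<Boolean>, nullables: List<Any?>): Long {
    var flags = 0L
    var bit = 0
    for (b in booleans) {          // boolean bits: 1 = true, 0 = false
        if (b) flags = flags or (1L shl bit)
        bit++
    }
    for (n in nullables) {         // nullability bits: 1 = null, 0 = non-null
        if (n == null) flags = flags or (1L shl bit)
        bit++
    }
    return flags
}

// Reads a single flag back out of the packed word.
fun flagAt(flags: Long, index: Int): Boolean = ((flags shr index) and 1L) == 1L
```

For the `Payload` class from the quick-start example, `packFlags(listOf(urgent, sensitive, external), listOf(handled))` would produce a 4-bit value, which fits in a single varlong byte.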
284232

285233
### VarInt / VarUInt annotations
286234

287-
Varint support is opt-in to keep fixed-width behavior as default:
235+
Use varint-style encodings for compact integer fields:
288236

289237
```kotlin
290238
@Serializable
291239
data class Counters(
292-
@VarUInt val seq: Long, // good for monotonically increasing IDs
293-
@VarInt val delta: Int // good for small positive/negative changes
240+
@VarUInt val seq: Long, // unsigned varint
241+
@VarInt val delta: Int // zig-zag + varint
294242
)
295243
```
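
To make the two annotations concrete, here is a minimal sketch of an unsigned varint and of zig-zag mapping. It assumes the common LEB128/protobuf-style layout (7 payload bits per byte, high bit as continuation marker); the function names are illustrative and KEncode's exact byte layout is not confirmed by this README.

```kotlin
// Unsigned varint: 7 payload bits per byte, high bit set while more bytes follow.
fun encodeVarUInt(value: ULong): ByteArray {
    val out = ArrayList<Byte>()
    var v = value
    while (v >= 0x80uL) {
        out.add(((v and 0x7FuL) or 0x80uL).toByte())
        v = v shr 7
    }
    out.add(v.toByte())
    return out.toByteArray()
}

// Reads one varint from the start of the array.
fun decodeVarUInt(bytes: ByteArray): ULong {
    var result = 0uL
    var shift = 0
    for (b in bytes) {
        result = result or ((b.toULong() and 0x7FuL) shl shift)
        if ((b.toInt() and 0x80) == 0) break   // last byte of this varint
        shift += 7
    }
    return result
}

// Zig-zag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay small.
fun zigZag(n: Int): UInt = ((n shl 1) xor (n shr 31)).toUInt()

fun unZigZag(z: UInt): Int = (z shr 1).toInt() xor -(z and 1u).toInt()
```

With this layout, values below 128 take one byte and `300` takes two, whereas a fixed-width `Long` always takes eight.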
296244

297-
Internally:
298-
299-
* `@VarUInt` uses an unsigned varint.
300-
* `@VarInt` uses zig-zag + varint, so small negative numbers are compact.
301-
302-
### Limitations
303-
304-
If you need:
305-
306-
* Nested objects
307-
* Lists / arrays / maps
308-
* Polymorphism
309-
310-
then use `ProtoBuf` or another `BinaryFormat` with `EncodedFormat`.
311-
312245
---
313246

314247
## EncodedFormat explanation
315248

316-
`EncodedFormat` composes three concerns:
249+
`EncodedFormat` provides a single `StringFormat` API that produces short,
250+
ASCII-safe tokens by composing three layers:
317251

318-
1. **Binary format** (`BinaryFormat`):
319-
* Default is `PackedFormat`.
320-
* Can be any `BinaryFormat` (ProtoBuf, CBOR, etc.).
252+
1. **Binary format**
253+
Default is `PackedFormat`, but any `BinaryFormat` (e.g. ProtoBuf) can be used.
321254

322-
2. **Checksum** (`Checksum?`):
323-
* Optional CRC-16 or CRC-32, or a custom implementation.
324-
* Appended to the binary payload and verified on decode.
255+
2. **Checksum (optional)**
256+
Supports `Crc16`, `Crc32`, or a custom implementation.
257+
The checksum is appended to the binary payload and verified during decode.
325258

326-
3. **Text codec** (`ByteEncoding`):
327-
* Base62 by default, but you can swap `Base36`, `Base64`, `Base64UrlSafe`,
328-
`Base85`, or custom.
259+
3. **Text codec**
260+
Converts the final bytes into a compact ASCII representation.
261+
Default is `Base62`, with alternatives such as `Base36`, `Base64`,
262+
`Base64UrlSafe`, `Base85`, or custom alphabets.
329263

330-
Typical customization:
264+
This makes it easy to generate stable, compact identifiers suitable for URLs,
265+
headers, filenames, cookies, and system labels.
331266

332267
```kotlin
333-
334268
@Serializable
335269
data class Event(val id: Long, val name: String)
336270

337271
val format = EncodedFormat(
338-
codec = Base36, // file-name friendly
339-
checksum = Crc16, // short checksum
272+
codec = Base36,
273+
checksum = Crc16,
340274
binaryFormat = ProtoBuf
341275
)
342276

@@ -348,25 +282,19 @@ val back = format.decodeFromString<Event>(token)
348282

349283
## Base encoders
350284

351-
KEncode does not intend to support ALL encoding variants, just the useful ones.
352-
They are the standard Base64, compact Base85, and alphanumeric only Base36/62.
353-
For all the implementations you can customize the alphabet if needed.
285+
KEncode includes a focused set of practical ASCII-safe encoders: `Base36`,
286+
`Base62`, `Base64`, and `Base85`. All implementations allow custom alphabets.
354287

355288
### Base64 and URL-safe Base64
356-
357289
* RFC 4648–compatible.
358-
* 3 input bytes → 4 characters, with `=` padding.
359-
* URL-safe variant (`Base64UrlSafe`) uses `-` and `_` instead of `+` and `/`.
360-
361-
### Base85 (ASCII85 / Z85-style)
290+
* 3 bytes → 4 characters (`=` padding).
291+
* URL-safe variant substitutes `-` and `_`.
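
For comparison, the same RFC 4648 behaviour can be seen with the JDK's own `java.util.Base64` (not KEncode's `Base64` object). The wrapper names below are illustrative only.

```kotlin
import java.util.Base64

// Standard alphabet: uses '+' and '/', pads with '='.
fun jdkBase64(bytes: ByteArray): String = Base64.getEncoder().encodeToString(bytes)

// URL-safe alphabet: '-' replaces '+', '_' replaces '/'; padding is kept by default.
fun jdkBase64Url(bytes: ByteArray): String = Base64.getUrlEncoder().encodeToString(bytes)
```

The 13 input bytes of `"any byte data"` become 20 characters: four full 3-byte groups plus one final byte padded with `==`.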
362292

293+
### Base85
363294
* 4 bytes → 5 characters.
364-
* Supports final partial group (1–3 bytes → 2–4 chars).
365-
* No delimiters (`<~ ~>`) and no `z` compression.
295+
* Supports partial final groups (1–3 bytes).
296+
* No delimiters or `z` compression.
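
The grouping rule can be sketched as follows, assuming the classic ASCII85 alphabet (`'!'` through `'u'`); the function names are hypothetical and this is not KEncode's implementation. A partial final group of `n` bytes is zero-padded to 4 bytes and only its first `n + 1` characters are kept.

```kotlin
// Encodes one group of 1..4 bytes into 2..5 characters.
fun encodeAscii85Group(group: ByteArray): String {
    require(group.size in 1..4) { "A group holds 1 to 4 bytes." }
    // Build a 32-bit big-endian value, zero-padding any missing bytes.
    var value = 0L
    for (i in 0 until 4) {
        val b = if (i < group.size) group[i].toLong() and 0xFFL else 0L
        value = (value shl 8) or b
    }
    // Write five base-85 digits, most significant first.
    val chars = CharArray(5)
    for (i in 4 downTo 0) {
        chars[i] = '!' + (value % 85).toInt()
        value /= 85
    }
    return chars.concatToString(0, group.size + 1)
}

// Splits the input into 4-byte groups; no delimiters, no 'z' shortcut.
fun encodeAscii85(data: ByteArray): String =
    data.asList().chunked(4).joinToString("") { encodeAscii85Group(it.toByteArray()) }
```

Under these assumptions, `encodeAscii85("any byte data".encodeToByteArray())` gives `@;^?5@X3',+Cno&@/`, the same 17-character string shown in the standalone encoder example.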
366297

367298
### Base36 / Base62 / custom alphabets
368-
369-
Backed by `BaseRadix`, these encoders operate in fixed-size blocks with
370-
deterministic lengths for safe decoding. No padding is used. A naïve
371-
implementation without blocks is simpler but has an `O(n^2)` run time, where `n`
372-
is the length of the bytes to encode.
299+
Built on `BaseRadix`, these encoders use fixed-size blocks for predictable
300+
lengths and safe decoding, without padding. Custom alphabets are supported.
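
One way to picture the fixed-size block idea is the sketch below. It is an illustration, not `BaseRadix` itself: the 8-byte block size, the `0-9A-Za-z` digit order, and all function names are assumptions. Each block is converted independently to a fixed number of digits, so output length depends only on input length and decoding needs no padding.

```kotlin
import java.math.BigInteger

const val ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" // assumed order
const val BLOCK_BYTES = 8                                                             // assumed size
val RADIX: BigInteger = BigInteger.valueOf(ALPHABET.length.toLong())

// Smallest digit count that can represent every value of `byteCount` bytes.
fun charsFor(byteCount: Int): Int {
    val limit = BigInteger.ONE.shiftLeft(8 * byteCount)
    var capacity = BigInteger.ONE
    var n = 0
    while (capacity < limit) { capacity = capacity.multiply(RADIX); n++ }
    return n
}

// Converts one block to a fixed-width digit string, left-padded with the zero digit.
fun encodeBlock(block: ByteArray): String {
    val width = charsFor(block.size)
    var value = BigInteger(1, block)
    val out = CharArray(width)
    for (i in width - 1 downTo 0) {
        val qr = value.divideAndRemainder(RADIX)
        out[i] = ALPHABET[qr[1].toInt()]
        value = qr[0]
    }
    return out.concatToString()
}

// The digit count of a block identifies how many bytes it carries.
fun decodeBlock(text: String): ByteArray {
    val byteCount = (1..BLOCK_BYTES).first { charsFor(it) == text.length }
    var value = BigInteger.ZERO
    for (c in text) {
        value = value.multiply(RADIX).add(BigInteger.valueOf(ALPHABET.indexOf(c).toLong()))
    }
    val raw = value.toByteArray()   // may have a sign byte or fewer bytes than needed
    val out = ByteArray(byteCount)
    val n = minOf(raw.size, byteCount)
    raw.copyInto(out, byteCount - n, raw.size - n, raw.size)
    return out
}

fun encodeBlocks(data: ByteArray): String =
    data.asList().chunked(BLOCK_BYTES).joinToString("") { encodeBlock(it.toByteArray()) }

fun decodeBlocks(text: String): ByteArray =
    text.chunked(charsFor(BLOCK_BYTES)).fold(ByteArray(0)) { acc, s -> acc + decodeBlock(s) }
```

With these assumptions an 8-byte block maps to 11 characters and a 5-byte tail to 7, so 13 input bytes yield 18 characters. Because each block is bounded in size, total work grows linearly with input length rather than quadratically as it would when treating the whole input as one big number.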
