Skip to content

Commit e673775

Browse files
LessUpCopilot
andcommitted
docs: add ordered architecture deepening design
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent cbdfd7c commit e673775

1 file changed

Lines changed: 305 additions & 0 deletions

File tree

Lines changed: 305 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,305 @@
1+
# Ordered Architecture Deepening Design
2+
3+
## Summary
4+
5+
This design decomposes the current architecture work into five ordered, contract-preserving phases:
6+
7+
1. Shared Buffer Layer policy module
8+
2. Streaming Layer wrapper module
9+
3. Frequency Table module
10+
4. Magic Number and header parsing module
11+
5. Writer adapter module
12+
13+
The goal is to increase **depth** at the existing **seams** without changing CompressKit's public CLI semantics, binary formats, security limits, or cross-language behavior.
14+
15+
## Context
16+
17+
CompressKit currently exposes the right top-level layers:
18+
19+
- CLI Entry Point
20+
- Buffer Layer
21+
- Streaming Layer
22+
- Algorithm Core
23+
- Shared Utilities
24+
25+
The friction is inside several shallow modules where the public **interface** is small, but important policy is duplicated or leaked across nearby callers. The strongest examples are:
26+
27+
- Buffer growth and `BUF_TOO_SMALL` retry semantics split across Go and Rust Buffer Layer implementations
28+
- Repeated Streaming Layer wrappers around `BufferedEncoder` and `BufferedDecoder`
29+
- Frequency Table logic split between Shared Utilities and Algorithm Core modules
30+
- Magic Number and header validation repeated across algorithms and languages
31+
- Writer adapters that also own output-buffer policy
32+
33+
## Constraints
34+
35+
- Preserve the unified CLI shape: `<binary> <encode|decode> <input> <output>`
36+
- Preserve current binary formats and cross-language compatibility
37+
- Preserve Streaming Layer and Buffer Layer public behavior required by:
38+
- `openspec/specs/core-architecture/spec.md`
39+
- `openspec/specs/encoding-project/spec.md`
40+
- `openspec/specs/cross-language-testing/spec.md`
41+
- Preserve security limits: 4 GiB max input, 1 GiB max decoded output
42+
- Keep error messages in English
43+
- Avoid OpenSpec changes by keeping all public contracts stable
44+
45+
## Approaches Considered
46+
47+
### Approach A: Big-bang refactor across all five modules
48+
49+
Refactor all five shallow areas in one pass, then repair tests and conformance issues at the end.
50+
51+
**Pros**
52+
53+
- Fastest path to the target architecture on paper
54+
- Can shape the final implementation holistically
55+
56+
**Cons**
57+
58+
- High risk of cross-language drift
59+
- Large review surface
60+
- Poor fault isolation when conformance fails
61+
- Hard to prove which phase introduced a behavior change
62+
63+
### Approach B: Ordered phase-by-phase deepening at existing seams
64+
65+
Deepen one module cluster at a time, starting with the Buffer Layer policy and finishing with the Writer adapter after the lower-level policy is concentrated.
66+
67+
**Pros**
68+
69+
- Best locality for review and debugging
70+
- Each phase can preserve the existing interface while improving implementation shape
71+
- Easier to validate against existing test and conformance gates
72+
- Minimizes risk of accidental public API drift
73+
74+
**Cons**
75+
76+
- Requires temporary coexistence of old and new internal paths during each phase
77+
- Takes longer than a big-bang change
78+
79+
**Recommendation:** choose this approach.
80+
81+
### Approach C: Start from algorithm-specific modules and generalize later
82+
83+
Refactor Huffman, Arithmetic, Range Coder, and RLE one by one, then extract common modules afterward.
84+
85+
**Pros**
86+
87+
- Keeps each algorithm change small
88+
- Fits teams that work algorithm-by-algorithm
89+
90+
**Cons**
91+
92+
- Repeats the same design work four times
93+
- Encourages shallow abstractions because the shared seam is postponed
94+
- Weak leverage until the final extraction
95+
96+
## Recommended Design
97+
98+
### Architecture principle
99+
100+
Each phase deepens an existing module by moving policy-heavy implementation behind a smaller, more stable **interface**. The rule is to concentrate behavior where the deletion test is strongest: deleting the deepened module should make complexity reappear across many callers.
101+
102+
### Phase 1: Shared Buffer Layer policy module
103+
104+
**Scope**
105+
106+
- Go: `algorithms/shared/go/codec/buffer.go`, `buffer_loop.go`, nearby callers such as `files.go`
107+
- Rust: `algorithms/shared/rust/src/codec/buffer.rs`
108+
109+
**Problem**
110+
111+
The Buffer Layer public interface is already correct, but the implementation of retry loops, growth policy, output limits, and `BUF_TOO_SMALL` transactionality is duplicated per language and partially reused by adjacent modules.
112+
113+
**Design**
114+
115+
- Concentrate full-buffer orchestration into one deeper internal module per language
116+
- Put these concerns behind the Buffer Layer seam:
117+
- initial buffer sizing
118+
- growth policy
119+
- retry loop for `process()` and `finish()`
120+
- size-limit enforcement
121+
- preservation of already-written bytes across retries
122+
- Keep the external Buffer Layer entry points unchanged
123+
124+
**Dependency category**
125+
126+
In-process only
127+
128+
**Why first**
129+
130+
Phase 5 depends on this policy. Starting here avoids re-solving the same output-buffer problem later in the Writer adapter.
131+
132+
### Phase 2: Streaming Layer wrapper module
133+
134+
**Scope**
135+
136+
- Go algorithm wrapper modules under `algorithms/*/go/streaming.go`
137+
- Parallel Rust wrapper entry points built on `BufferedEncoder` and `BufferedDecoder`
138+
139+
**Problem**
140+
141+
The wrappers are shallow modules that mainly wire Algorithm Core functions into shared streaming helpers. Understanding the Streaming Layer requires bouncing between repeated wrapper files and the shared implementation.
142+
143+
**Design**
144+
145+
- Move wrapper construction into a deeper shared module per language
146+
- Let algorithm modules provide only the Algorithm Core encode/decode hook
147+
- Keep algorithm-facing constructor names and Streaming Layer behavior stable
148+
149+
**Dependency category**
150+
151+
In-process only
152+
153+
**Why second**
154+
155+
This phase builds directly on the stronger shared buffering/story from Phase 1 and reduces repeated wrapper code before touching shared binary-format helpers.
156+
157+
### Phase 3: Frequency Table module
158+
159+
**Scope**
160+
161+
- Shared Utilities for Frequency Table behavior
162+
- Static-model algorithms: Huffman, Arithmetic, Range Coder
163+
164+
**Problem**
165+
166+
Frequency Table build, scaling, cumulative-table generation, and serialization are spread across Shared Utilities and Algorithm Core modules. The seam leaks binary-format knowledge and scaling policy.
167+
168+
**Design**
169+
170+
- Introduce one deep Frequency Table module per language for static-model algorithms
171+
- Concentrate:
172+
- symbol counting
173+
- EOF insertion
174+
- scaling policy
175+
- cumulative-table construction
176+
- serialization and deserialization
177+
- Leave algorithm-specific coding logic in Algorithm Core modules
178+
179+
**Dependency category**
180+
181+
In-process only
182+
183+
**Why third**
184+
185+
This is the first phase that materially touches shared binary-format helpers. By this point, the Buffer Layer and Streaming Layer refactors should already have reduced surrounding noise.
186+
187+
### Phase 4: Magic Number and header parsing module
188+
189+
**Scope**
190+
191+
- Decode-time format identification and header reads in Go, Rust, and C++
192+
193+
**Problem**
194+
195+
Magic Number validation and basic header parsing are scattered across many decode implementations with inconsistent error wording and duplicated structure.
196+
197+
**Design**
198+
199+
- Introduce one dedicated format/header module per language
200+
- Concentrate:
201+
- Magic Number validation
202+
- truncated-header detection
203+
- basic header shape checks
204+
- consistent mapping to existing error kinds
205+
- Keep algorithm-specific payload decoding in the Algorithm Core module
206+
207+
**Dependency category**
208+
209+
In-process only
210+
211+
**Why fourth**
212+
213+
It depends on the shared format concepts clarified during Phase 3, but should remain separate because format identification is a distinct seam from Frequency Table policy.
214+
215+
### Phase 5: Writer adapter module
216+
217+
**Scope**
218+
219+
- Go: `algorithms/shared/go/codec/writer.go`
220+
- Rust: `algorithms/shared/rust/src/codec/write.rs`
221+
222+
**Problem**
223+
224+
The Writer adapter is supposed to bridge an `Encoder` to an output writer, but it also owns buffer growth and retry policy. That makes the adapter shallow at its transport role and too deep in the wrong place.
225+
226+
**Design**
227+
228+
- Slim the Writer adapter down to transport responsibilities
229+
- Reuse the policy concentrated in Phase 1 for output-buffer handling
230+
- Keep the Writer adapter interface stable
231+
232+
**Dependency category**
233+
234+
In-process only
235+
236+
**Why last**
237+
238+
This phase should consume the deeper Buffer Layer policy rather than invent its own policy. Doing it earlier would either duplicate design work or create a premature seam.
239+
240+
## Resulting Module Shape
241+
242+
After all five phases, the high-level call flow remains:
243+
244+
`CLI Entry Point -> Buffer Layer -> Streaming Layer -> Algorithm Core`
245+
246+
The difference is that more policy sits behind deeper shared modules:
247+
248+
- Buffer Layer policy becomes one concentrated module per language
249+
- Streaming Layer wrappers become shared construction logic instead of repeated glue
250+
- Frequency Table behavior becomes a coherent static-model module
251+
- Magic Number and header checks become a dedicated format module
252+
- Writer adapter becomes a thin adapter over already-deep buffer policy
253+
254+
## Error Handling
255+
256+
- Preserve existing error kinds and security-limit behavior
257+
- Keep `BUF_TOO_SMALL` transactional
258+
- Keep decode-side truncation/corruption reporting aligned with current contract
259+
- Normalize error paths only when they map to existing semantics; do not invent new public error categories
260+
261+
## Testing Strategy
262+
263+
### Phase gates
264+
265+
- After every phase: `make test` and `make lint`
266+
- After Phases 3 and 4: also run `make test-conformance`
267+
268+
### Test focus by phase
269+
270+
- **Phase 1:** strengthen Buffer Layer interface tests around retries, preserved bytes, growth stops, and limit enforcement
271+
- **Phase 2:** keep Streaming Layer tests focused on constructor behavior and shared lifecycle semantics, not duplicated wrapper internals
272+
- **Phase 3:** add focused Frequency Table tests for scaling, EOF handling, cumulative tables, and serialization invariants
273+
- **Phase 4:** add focused header parsing tests for bad magic, truncation, and corrupt header cases
274+
- **Phase 5:** test the Writer adapter through its own interface while reusing Phase 1 policy coverage
275+
276+
### Cross-language stance
277+
278+
- Public behavior must stay aligned across C++17, Go, and Rust
279+
- Internal module shapes may differ by language as long as the seam and observable behavior remain consistent
280+
281+
## Risks and Mitigations
282+
283+
### Risk: accidental public interface drift
284+
285+
**Mitigation:** keep all public constructor names, Buffer Layer functions, and CLI behavior unchanged.
286+
287+
### Risk: cross-language divergence
288+
289+
**Mitigation:** keep each phase contract-preserving and use conformance gates when touching shared binary-format behavior.
290+
291+
### Risk: creating new shallow pass-through modules
292+
293+
**Mitigation:** apply the deletion test at every extraction. If deleting a module only removes indirection, do not keep it.
294+
295+
## Non-goals
296+
297+
- Changing binary formats
298+
- Changing CLI semantics
299+
- Introducing new algorithms
300+
- Reworking Range Coder large-file performance
301+
- Adding speculative abstraction across languages
302+
303+
## Decision
304+
305+
Proceed with **Approach B**: one contract-preserving deepening phase at a time, in the order listed above.

0 commit comments

Comments
 (0)