Commit 5c80c27

refactor: migrate to OpenSpec framework

- Replace custom SDD with OpenSpec spec-driven development
- Convert specs/ to openspec/specs/ with OpenSpec format
- Requirements now use SHALL/MUST keywords with Scenario blocks
- Rewrite AGENTS.md for OpenSpec workflow (/opsx:propose, /opsx:apply, /opsx:archive)
- Add Makefile targets for OpenSpec commands (spec-init, spec-list, spec-status)
- Generate AI tool integration for Claude Code and Cursor
- Archive legacy specs/ to .archive/specs-legacy-20260423/

Breaking change: specs/ directory moved to openspec/specs/

1 parent e896446 commit 5c80c27

27 files changed

Lines changed: 3353 additions & 38 deletions

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
# Specifications

This directory contains all specification documents for the Encoding project. The project follows **Spec-Driven Development (SDD)**, meaning all code implementations must be based on these specs as the Single Source of Truth.

## Directory Structure

```
specs/
├── product/              # Product requirements (PRD)
│   └── encoding-project.md
├── rfc/                  # Technical design documents (RFCs)
│   └── 0001-core-architecture.md
├── api/                  # API interface definitions
│   └── README.md         # (Not applicable for this CLI project)
├── db/                   # Database schemas
│   └── README.md         # (Not applicable for this CLI project)
└── testing/              # Test specifications
    └── cross-language.md
```

| Directory | Purpose | Status |
|-----------|---------|--------|
| `product/` | Product feature definitions and acceptance criteria | ✅ Active |
| `rfc/` | Technical design documents (architecture, patterns, decisions) | ✅ Active |
| `api/` | API interface definitions (OpenAPI, GraphQL schemas) | ⚪ N/A |
| `db/` | Database model definitions | ⚪ N/A |
| `testing/` | Test specifications and cross-language verification rules | ✅ Active |

## Current Specs

### Product Requirements
- [Encoding Project](product/encoding-project.md) - Project goals, algorithm implementations, file format compatibility, security and quality requirements

### Technical Design (RFCs)
- [RFC-0001: Core Architecture](rfc/0001-core-architecture.md) - Directory structure, CLI patterns, frequency table format, CI/CD workflow design, error handling strategy

### Testing Specifications
- [Cross-Language Testing](testing/cross-language.md) - Correctness tests, benchmark tests, known issues, future improvements

## SDD Workflow

This project follows Spec-Driven Development. The workflow is:

1. **Review Specs First** - Read relevant specs before coding
2. **Update Specs First** - Propose spec changes for new features before implementation
3. **Implement to Spec** - Code must 100% adhere to spec definitions
4. **Test Against Spec** - Write tests based on acceptance criteria

For complete AI workflow instructions, see [AGENTS.md](../AGENTS.md).

## Contributing to Specs

When adding new features or making architectural changes:

1. Create or update the relevant spec document first
2. Follow the existing naming conventions:
   - Product specs: `feature-name.md` in `product/`
   - RFCs: `NNNN-short-title.md` in `rfc/` (e.g., `0002-oauth2-implementation.md`)
   - Test specs: `feature-or-area.md` in `testing/`
3. Include clear acceptance criteria
4. Reference related specs where applicable
Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# Encoding Project - Product Requirements

## Overview

This project is a collection of encoding algorithms. It aims to provide educational implementations of compression algorithms in several languages, with cross-language verification.

## Project Goals

1. **Educational Value**: Clear implementations for learning compression algorithms
2. **Cross-Language Comparison**: Same algorithm in C++17, Go, and Rust
3. **Verification**: Cross-language encode/decode compatibility testing
4. **Open Source Best Practices**: Full documentation, CI/CD, and community standards

## Algorithm Implementations

| Algorithm | Languages | Status |
|-----------|-----------|--------|
| Huffman | C++, Go, Rust | ✅ Complete |
| Arithmetic Coding | C++, Go, Rust | ✅ Complete |
| Range Coder | C++, Go, Rust | ✅ Complete |
| RLE | C++, Go, Rust | ✅ Complete |

## File Format Compatibility

All implementations must share identical binary formats to enable cross-language verification:

- **Huffman**: Magic `HFMN` + frequency table + bit stream
- **Arithmetic**: Magic `AENC` + frequency table + bit stream
- **Range Coder**: Magic `RCNC` + frequency table + byte stream
- **RLE**: (count, value) pairs with 4-byte LE count
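
The RLE layout in the list above can be illustrated with a minimal Go sketch. It assumes each pair is stored as the 4-byte little-endian count followed by a single value byte, with no header; the spec only says "(count, value) pairs with 4-byte LE count", so the exact ordering and the function names `rleEncode` and `rleDecode` are hypothetical.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// rleEncode emits one pair per run: a 4-byte little-endian count
// followed by the byte value (assumed pair layout, see lead-in).
func rleEncode(data []byte) []byte {
	var out []byte
	for i := 0; i < len(data); {
		j := i
		for j < len(data) && data[j] == data[i] {
			j++
		}
		var count [4]byte
		binary.LittleEndian.PutUint32(count[:], uint32(j-i))
		out = append(out, count[:]...)
		out = append(out, data[i])
		i = j
	}
	return out
}

// rleDecode reverses rleEncode and rejects input that is not a whole
// number of 5-byte pairs.
func rleDecode(enc []byte) ([]byte, error) {
	if len(enc)%5 != 0 {
		return nil, fmt.Errorf("truncated RLE stream: %d bytes is not a multiple of 5", len(enc))
	}
	var out []byte
	for i := 0; i < len(enc); i += 5 {
		count := binary.LittleEndian.Uint32(enc[i : i+4])
		for k := uint32(0); k < count; k++ {
			out = append(out, enc[i+4])
		}
	}
	return out, nil
}

func main() {
	enc := rleEncode([]byte("aaab"))
	dec, err := rleDecode(enc)
	fmt.Println(len(enc), string(dec), err) // prints: 10 aaab <nil>
}
```

A real decoder would also need to cap the total decoded size, since a single pair can request up to 2^32 - 1 output bytes.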

## Security Requirements

1. **Input Size Validation**: Maximum 4 GiB to prevent frequency overflow
2. **Output Size Validation**: Maximum 1 GiB to prevent decompression bombs
3. **Memory Safety**: RAII in C++, proper error handling in all languages
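
As a rough illustration of the two size limits above, the following Go sketch checks a size against a cap before any data is read or written. The constant and function names are hypothetical, and treating the limits as inclusive is an assumption; the spec does not say whether a file of exactly 4 GiB is accepted.

```go
package main

import (
	"fmt"
	"os"
)

const (
	maxInputSize  = int64(4) << 30 // 4 GiB input cap from the requirements
	maxOutputSize = int64(1) << 30 // 1 GiB output cap from the requirements
)

// checkSize returns an error when size is larger than limit.
func checkSize(size, limit int64, what string) error {
	if size > limit {
		return fmt.Errorf("%s size %d bytes exceeds limit of %d bytes", what, size, limit)
	}
	return nil
}

// checkInputFile stats the file so the size is validated before reading.
func checkInputFile(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return err
	}
	return checkSize(info.Size(), maxInputSize, "input")
}

func main() {
	fmt.Println(checkSize(1<<20, maxInputSize, "input"))             // within the limit: <nil>
	fmt.Println(checkSize(maxOutputSize+1, maxOutputSize, "output")) // over the limit: error
}
```

The output check is shown on a plain number here; a decoder would apply it to the running count of bytes produced so far.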

## Quality Requirements

1. **Code Style**: Consistent formatting per language conventions
2. **Error Messages**: All in English for consistency
3. **Documentation**: Bilingual README, English code comments
4. **Testing**: Unit tests + cross-language correctness tests

## Infrastructure Requirements

### CI/CD Pipeline

- Build all implementations on every push/PR
- Run unit tests for Go and Rust
- Verify cross-language encode/decode correctness
- Check required files (LICENSE, CONTRIBUTING, etc.)

### Documentation

- VitePress documentation site
- Algorithm guides with complexity analysis
- Getting started guide
- Project structure reference

### Open Source Standards

- MIT License
- CONTRIBUTING.md with development setup
- CODE_OF_CONDUCT.md (Contributor Covenant)
- SECURITY.md with vulnerability reporting
- Issue and PR templates

## Acceptance Criteria

- [ ] All four algorithms implemented in C++17, Go, and Rust
- [ ] Cross-language encode/decode compatibility verified
- [ ] CI/CD pipeline passing on all pushes
- [ ] VitePress documentation site published
- [ ] All security validations in place
- [ ] Bilingual README (English/Chinese)
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
# RFC-0001: Core Architecture

## Status
**Status**: Accepted
**Created**: 2024
**Updated**: 2026

## Architecture Overview

```
encoding/
├── .github/
│   ├── workflows/
│   │   ├── ci.yml               # Build, test, correctness verification
│   │   └── pages.yml            # VitePress docs deployment
│   ├── ISSUE_TEMPLATE/
│   │   ├── bug_report.md
│   │   └── feature_request.md
│   └── PULL_REQUEST_TEMPLATE.md
├── specs/                       # Spec-Driven Development documents
│   ├── product/                 # Product requirements
│   ├── rfc/                     # Technical design documents
│   ├── api/                     # API definitions
│   ├── db/                      # Database schemas (if applicable)
│   └── testing/                 # Test specifications
├── docs/
│   ├── .vitepress/config.mts    # VitePress configuration
│   ├── index.md                 # Documentation landing page
│   ├── guide/
│   │   ├── getting-started.md
│   │   ├── algorithms.md
│   │   └── project-structure.md
│   └── public/
├── algorithms/huffman/
│   ├── cpp/main.cpp
│   ├── go/main.go
│   └── rust/main.rs
├── algorithms/arithmetic/
│   ├── cpp/main.cpp
│   ├── go/main.go
│   └── rust/main.rs
├── algorithms/range/
│   ├── cpp/main.cpp
│   ├── go/ (library + cmd/)
│   └── rust/ (Cargo.toml + src/)
├── algorithms/rle/
│   ├── cpp/main.cpp
│   ├── go/main.go
│   └── rust/main.rs
├── tests/
│   ├── gen_testdata.py
│   └── data/
├── Makefile
├── package.json
├── LICENSE
├── README.md
├── README.zh-CN.md
├── CHANGELOG.md
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
└── SECURITY.md
```

## Component Design

### Algorithm Modules

Each algorithm follows a consistent CLI pattern across all languages:

#### C++ Pattern
```cpp
#include <string>

int main(int argc, char** argv) {
    if (argc != 4) { /* usage */ return 1; }
    std::string mode = argv[1];  // "encode" or "decode"
    std::string input = argv[2];
    std::string output = argv[3];
    // Process...
    return 0;
}
```

#### Go Pattern
```go
package main

import "os"

func main() {
	if len(os.Args) != 4 {
		os.Exit(1) // print usage first
	}
	mode := os.Args[1] // "encode" or "decode"
	inputPath := os.Args[2]
	outputPath := os.Args[3]
	// Process...
	_, _, _ = mode, inputPath, outputPath // remove once the values are used
}
```

#### Rust Pattern
```rust
use std::env;
use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();
    if args.len() != 4 {
        process::exit(1); // print usage first
    }
    let mode = &args[1]; // "encode" or "decode"
    let input = &args[2];
    let output = &args[3];
    // Process...
}
```

## Frequency Table Format

All static-model algorithms share the same frequency table structure:

```
+------------------+------------------------+
| Field            | Format                 |
+------------------+------------------------+
| Symbol count     | 4 bytes LE (uint32)    |
| Frequency[0]     | 4 bytes LE (uint32)    |
| ...              | ...                    |
| Frequency[256]   | 4 bytes LE (uint32)    |
| Frequency[EOF]   | 4 bytes LE (uint32)    |
+------------------+------------------------+
```
```
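
To make the layout above concrete, here is a minimal Go sketch that writes the symbol count followed by one little-endian `uint32` per symbol, and reads it back. The function names are hypothetical. The number of entries is taken from the slice length rather than hard-coded, because the table above does not pin down whether EOF is a separate entry after index 256; the 257-entry table in `main` (256 byte values plus one EOF symbol) is an assumption.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeFreqTable serializes the symbol count and then each frequency
// as a 4-byte little-endian unsigned integer.
func encodeFreqTable(freqs []uint32) []byte {
	out := make([]byte, 4+4*len(freqs))
	binary.LittleEndian.PutUint32(out[0:4], uint32(len(freqs)))
	for i, f := range freqs {
		binary.LittleEndian.PutUint32(out[4+4*i:], f)
	}
	return out
}

// decodeFreqTable parses the layout written by encodeFreqTable and
// rejects input that is shorter than the declared symbol count requires.
func decodeFreqTable(data []byte) ([]uint32, error) {
	if len(data) < 4 {
		return nil, fmt.Errorf("frequency table: need 4 bytes for the symbol count, got %d", len(data))
	}
	n := binary.LittleEndian.Uint32(data[0:4])
	if uint64(len(data)-4) < 4*uint64(n) {
		return nil, fmt.Errorf("frequency table: truncated, %d symbols declared", n)
	}
	freqs := make([]uint32, n)
	for i := range freqs {
		freqs[i] = binary.LittleEndian.Uint32(data[4+4*i:])
	}
	return freqs, nil
}

func main() {
	freqs := make([]uint32, 257) // assumed: 256 byte values plus one EOF symbol
	freqs['a'] = 3
	freqs[256] = 1
	enc := encodeFreqTable(freqs)
	dec, err := decodeFreqTable(enc)
	fmt.Println(len(enc), len(dec), err) // prints: 1032 257 <nil>
}
```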

## CI/CD Workflow Design

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│  build-cpp  │     │  build-go   │     │ build-rust  │
│  (matrix)   │     │             │     │             │
└──────┬──────┘     └──────┬──────┘     └──────┬──────┘
       │                   │                   │
       └───────────────────┼───────────────────┘

                    ┌──────▼──────┐
                    │ correctness │
                    │    tests    │
                    └─────────────┘
```

## Error Handling Strategy

1. **Input Validation**: Check file size before reading
2. **Decompression Limits**: Track output size to prevent bombs
3. **Error Messages**: English, descriptive, actionable
4. **Exit Codes**: 0 for success, 1 for errors
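
One way to track output size during decoding, as the list above requires, is to wrap the output in a writer that refuses to grow past a limit. The Go sketch below is an illustration under that assumption, not the project's implementation; `limitedWriter`, `byteCounter`, and `exitCode` are hypothetical names, and the 8-byte limit is only for the demonstration.

```go
package main

import (
	"fmt"
	"io"
)

// limitedWriter forwards writes to w and fails once the running total
// would exceed limit bytes.
type limitedWriter struct {
	w       io.Writer
	limit   int64
	written int64
}

func (l *limitedWriter) Write(p []byte) (int, error) {
	if l.written+int64(len(p)) > l.limit {
		return 0, fmt.Errorf("decoded output would exceed the %d-byte limit", l.limit)
	}
	n, err := l.w.Write(p)
	l.written += int64(n)
	return n, err
}

// byteCounter is a stand-in sink that only counts what it receives.
type byteCounter struct{ n int }

func (c *byteCounter) Write(p []byte) (int, error) {
	c.n += len(p)
	return len(p), nil
}

// exitCode maps an error to the process exit code: 0 for success, 1 for errors.
func exitCode(err error) int {
	if err != nil {
		return 1
	}
	return 0
}

func main() {
	sink := &byteCounter{}
	out := &limitedWriter{w: sink, limit: 8} // tiny limit, for demonstration only
	_, err1 := out.Write([]byte("12345"))
	_, err2 := out.Write([]byte("67890"))
	fmt.Println(sink.n, exitCode(err1), exitCode(err2)) // prints: 5 0 1
}
```

A command-line tool would pass the result of `exitCode` to `os.Exit` after printing the error message.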

## Performance Considerations

1. **Buffered I/O**: Use bufio in Go, ifstream/ofstream in C++
2. **Frequency Scaling**: Scale to maxTotal (2^24) for numerical stability
3. **Memory**: Process streams where possible, avoid loading entire files
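
The frequency scaling step in the list above could look roughly like the Go sketch below. The RFC only states the target (`maxTotal` of 2^24); the rounding rule used here, which reserves one count per occurring symbol so that no such symbol is scaled down to zero, is an assumption and may differ from what the implementations do. The function name `scaleFreqs` is hypothetical.

```go
package main

import "fmt"

const maxTotal = 1 << 24 // cap on the sum of scaled frequencies

// scaleFreqs returns counts whose sum does not exceed maxTotal.
// Every symbol with a nonzero count keeps a frequency of at least 1.
func scaleFreqs(freqs []uint32) ([]uint32, error) {
	var total, nonzero uint64
	for _, f := range freqs {
		total += uint64(f)
		if f > 0 {
			nonzero++
		}
	}
	if nonzero >= maxTotal {
		return nil, fmt.Errorf("too many distinct symbols: %d", nonzero)
	}
	out := make([]uint32, len(freqs))
	if total <= maxTotal {
		copy(out, freqs) // already small enough, keep exact counts
		return out, nil
	}
	// Reserve one count per occurring symbol, share the rest proportionally.
	budget := maxTotal - nonzero
	for i, f := range freqs {
		if f > 0 {
			out[i] = uint32(uint64(f)*budget/total) + 1
		}
	}
	return out, nil
}

func main() {
	scaled, err := scaleFreqs([]uint32{1 << 30, 1 << 30, 1, 0})
	fmt.Println(scaled, err)
}
```

Because each scaled value is a floor of its proportional share of `budget` plus one, the sum is bounded by `budget + nonzero`, which equals `maxTotal`.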

## Documentation Site Design

- **Landing Page**: Project overview, target audience, reading paths
- **Getting Started**: Environment requirements, build commands, CLI usage
- **Algorithms**: Theory, complexity analysis, implementation differences
- **Project Structure**: Directory layout, CLI conventions, file formats

## Decisions

### Why Three Languages?
- **C++17**: Industry standard; teaches manual memory management
- **Go**: Modern systems language with excellent concurrency support
- **Rust**: Memory safety without a garbage collector

### Why This Directory Structure?
- Algorithm-first organization for easy navigation
- Language subdirectories within each algorithm
- Shared test data and CI configuration at root level
Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
# Cross-Language Testing Specification

## Overview

This spec defines the cross-language verification strategy for the encoding project, ensuring that implementations in C++17, Go, and Rust produce compatible output.

## Test Strategy

### 1. Correctness Tests

All implementations must pass cross-language encode/decode tests:

```bash
# Encode with language A, decode with language B
./algorithms/huffman/cpp/huffman_cpp encode input.txt output.huf
./algorithms/huffman/go/huffman_go decode output.huf restored.txt
diff input.txt restored.txt  # Must be identical
```
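
The "language A, language B" pattern above implies a matrix of encoder and decoder combinations. The Go sketch below enumerates that matrix for the three languages; whether CI actually runs all nine ordered pairs, including same-language pairs, is an assumption, and the `pair` type and `crossPairs` function are hypothetical.

```go
package main

import "fmt"

// pair names one encoder language and one decoder language.
type pair struct{ encoder, decoder string }

func (p pair) String() string {
	return fmt.Sprintf("encode with %s, decode with %s", p.encoder, p.decoder)
}

// crossPairs lists every ordered (encoder, decoder) combination,
// including same-language pairs as a baseline.
func crossPairs(langs []string) []pair {
	pairs := make([]pair, 0, len(langs)*len(langs))
	for _, e := range langs {
		for _, d := range langs {
			pairs = append(pairs, pair{e, d})
		}
	}
	return pairs
}

func main() {
	for _, p := range crossPairs([]string{"cpp", "go", "rust"}) {
		fmt.Println(p)
	}
}
```

Each printed pair corresponds to one encode, decode, and `diff` sequence like the shell example above.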

### 2. Test Data Generation

Test data is generated using `tests/gen_testdata.py` and includes:

- Random binary data
- Text files
- Repetitive patterns (for RLE)
- Edge cases (empty files, single byte, etc.)
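
The categories above can be sketched as follows. This is a Go illustration of the idea only; the project's generator is the Python script `tests/gen_testdata.py`, whose contents are not shown here. The case names, the sample text, and the fixed seed are all assumptions made for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"math/rand"
)

// testCase is one named input for the encoders.
type testCase struct {
	name string
	data []byte
}

func (c testCase) String() string {
	return fmt.Sprintf("%s (%d bytes)", c.name, len(c.data))
}

// testCases builds one input per category. A fixed seed keeps the
// random case reproducible between runs.
func testCases(seed int64, size int) []testCase {
	rng := rand.New(rand.NewSource(seed))
	random := make([]byte, size)
	for i := range random {
		random[i] = byte(rng.Intn(256))
	}
	return []testCase{
		{"random", random},
		{"text", bytes.Repeat([]byte("abracadabra "), size/12)},
		{"repetitive", bytes.Repeat([]byte{0x41}, size)},
		{"empty", []byte{}},
		{"single-byte", []byte{0x42}},
	}
}

func main() {
	for _, c := range testCases(1, 1024) {
		fmt.Println(c)
	}
}
```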

### 3. Benchmark Tests

Performance benchmarks run across all implementations:

```bash
make bench
```

Results are compared across:
- Compression ratio
- Encode speed
- Decode speed
- Memory usage

## Known Issues

### Range Coder Performance

- **Issue**: Decode hangs for files larger than 500 KB
- **Workaround**: CI uses a 100 KB test file
- **Status**: Under investigation

## Future Test Improvements

- [ ] Fix range coder decode performance issue
- [ ] Add adaptive probability model tests
- [ ] Add LZ77/LZSS algorithm tests
- [ ] Add benchmark visualization
- [ ] Add WebAssembly builds
- [ ] Add Python bindings tests

## Acceptance Criteria

- [ ] All algorithms produce identical output across C++, Go, and Rust
- [ ] Benchmarks run successfully on all implementations
- [ ] No memory leaks or safety issues
- [ ] CI pipeline passes on all platforms
