Skip to content

Commit 7ba1c57

Browse files
committed
feat: Introduce encoding modes with restructured examples, updated core encoder/decoder, and new related tests and documentation.
1 parent de74046 commit 7ba1c57

165 files changed

Lines changed: 10209 additions & 576 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.github/workflows/llm-evals.yml

Lines changed: 168 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,168 @@
1+
name: LLM Evaluations
2+
3+
on:
4+
pull_request:
5+
paths:
6+
- 'src/core/**'
7+
- 'src/schema/**'
8+
- 'src/binary/**'
9+
- 'src/evals/**'
10+
- 'benchmarks/**'
11+
push:
12+
branches: [main]
13+
schedule:
14+
# Run weekly on Sunday at midnight
15+
- cron: '0 0 * * 0'
16+
workflow_dispatch:
17+
inputs:
18+
eval_type:
19+
description: 'Type of evaluation to run'
20+
required: true
21+
default: 'smoke'
22+
type: choice
23+
options:
24+
- smoke
25+
- full
26+
27+
jobs:
28+
smoke-test:
29+
name: Smoke Test Evaluations
30+
runs-on: ubuntu-latest
31+
if: github.event_name == 'pull_request' || github.event.inputs.eval_type == 'smoke'
32+
33+
steps:
34+
- name: Checkout code
35+
uses: actions/checkout@v4
36+
37+
- name: Setup Node.js
38+
uses: actions/setup-node@v4
39+
with:
40+
node-version: '18'
41+
cache: 'npm'
42+
43+
- name: Install dependencies
44+
run: npm ci
45+
46+
- name: Build project
47+
run: npm run build
48+
49+
- name: Run smoke evaluations
50+
env:
51+
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
52+
AZURE_API_KEY: ${{ secrets.AZURE_API_KEY }}
53+
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
54+
run: npm run eval:smoke
55+
continue-on-error: true
56+
57+
- name: Check for regressions
58+
id: regression_check
59+
run: npm run eval:check-regressions
60+
continue-on-error: true
61+
62+
- name: Upload eval results
63+
uses: actions/upload-artifact@v4
64+
with:
65+
name: smoke-eval-results
66+
path: benchmarks/results/
67+
retention-days: 30
68+
69+
- name: Comment PR with results
70+
if: github.event_name == 'pull_request'
71+
uses: actions/github-script@v7
72+
with:
73+
script: |
74+
const fs = require('fs');
75+
const path = require('path');
76+
77+
try {
78+
const resultPath = path.join(process.cwd(), 'benchmarks/results/latest.json');
79+
if (!fs.existsSync(resultPath)) {
80+
console.log('No results file found');
81+
return;
82+
}
83+
84+
const results = JSON.parse(fs.readFileSync(resultPath, 'utf-8'));
85+
86+
const passed = results.passed ? '✅' : '❌';
87+
const duration = (results.duration / 1000).toFixed(1);
88+
89+
let comment = `## ${passed} LLM Evaluation Results\n\n`;
90+
comment += `**Duration:** ${duration}s\n`;
91+
comment += `**Status:** ${results.passed ? 'Passed' : 'Failed'}\n\n`;
92+
93+
comment += `### Metrics\n\n`;
94+
comment += `| Model | Exact Match | Token Efficiency |\n`;
95+
comment += `|-------|-------------|------------------|\n`;
96+
97+
for (const [model, metrics] of Object.entries(results.results)) {
98+
const exactMatch = ((metrics.exactMatch || 0) * 100).toFixed(1);
99+
const tokenEff = (metrics.tokenEfficiency || 0).toFixed(0);
100+
comment += `| ${model} | ${exactMatch}% | ${tokenEff} |\n`;
101+
}
102+
103+
if (results.regressions && results.regressions.length > 0) {
104+
comment += `\n### ⚠️ Regressions Detected\n\n`;
105+
results.regressions.forEach(r => {
106+
const emoji = r.severity === 'critical' ? '🔴' : r.severity === 'major' ? '🟠' : '🟡';
107+
comment += `${emoji} **${r.model}/${r.metric}:** ${r.change.toFixed(1)}% change\n`;
108+
});
109+
}
110+
111+
await github.rest.issues.createComment({
112+
issue_number: context.issue.number,
113+
owner: context.repo.owner,
114+
repo: context.repo.repo,
115+
body: comment
116+
});
117+
} catch (error) {
118+
console.error('Error posting comment:', error);
119+
}
120+
121+
full-evaluation:
122+
name: Full Evaluation Suite
123+
runs-on: ubuntu-latest
124+
if: github.event_name == 'push' || github.event_name == 'schedule' || github.event.inputs.eval_type == 'full'
125+
126+
steps:
127+
- name: Checkout code
128+
uses: actions/checkout@v4
129+
130+
- name: Setup Node.js
131+
uses: actions/setup-node@v4
132+
with:
133+
node-version: '18'
134+
cache: 'npm'
135+
136+
- name: Install dependencies
137+
run: npm ci
138+
139+
- name: Build project
140+
run: npm run build
141+
142+
- name: Run full evaluation suite
143+
env:
144+
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
145+
AZURE_API_KEY: ${{ secrets.AZURE_API_KEY }}
146+
ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
147+
run: npm run eval:full
148+
timeout-minutes: 30
149+
150+
- name: Check for regressions
151+
id: regression_check
152+
run: npm run eval:check-regressions
153+
continue-on-error: true
154+
155+
- name: Save as new baseline
156+
if: github.ref == 'refs/heads/main' && steps.regression_check.outcome == 'success'
157+
run: npm run eval:baseline
158+
159+
- name: Upload eval results
160+
uses: actions/upload-artifact@v4
161+
with:
162+
name: full-eval-results
163+
path: benchmarks/results/
164+
retention-days: 90
165+
166+
- name: Fail on critical regressions
167+
if: steps.regression_check.outcome == 'failure'
168+
run: exit 1

CHANGELOG.md

Lines changed: 86 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,6 +5,91 @@ All notable changes to this project will be documented in this file.
55
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
66
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
77

8+
## [1.2.0] - 2025-12-03
9+
10+
### Major Release: Enterprise Features & Production Readiness
11+
12+
This release transforms ZON into an enterprise-grade data format with versioning, adaptive encoding, binary format, comprehensive developer tools, automated testing, and production documentation.
13+
14+
### Added
15+
16+
#### Phase 1: Document-Level Schema Versioning
17+
- **Version Embedding/Extraction**: `embedVersion()` and `extractVersion()` for metadata management
18+
- **Migration Manager**: `ZonMigrationManager` with BFS path-finding for schema evolution
19+
- **Backward/Forward Compatibility**: Automatic migration between schema versions
20+
- **Test Coverage**: 45 tests covering all versioning scenarios
21+
22+
#### Phase 2: Adaptive Format Selection
23+
- **4 Encoding Modes**: `auto`, `compact`, `readable`, `llm-optimized`
24+
- **Data Complexity Analyzer**: Automatic analysis of nesting depth, irregularity, field count
25+
- **Mode Recommendation**: `recommendMode()` suggests optimal encoding based on data structure
26+
- **Test Coverage**: 20 tests for adaptive encoding
27+
28+
#### Phase 3: Binary ZON Format (ZON-B)
29+
- **MessagePack-Inspired Encoding**: Compact binary format with magic header (`ZNB\x01`)
30+
- **40-60% Space Savings**: Significantly smaller than JSON while maintaining structure
31+
- **Full Type Support**: Primitives, arrays, objects, nested structures
32+
- **APIs**: `encodeBinary()`, `decodeBinary()` with round-trip validation
33+
- **Test Coverage**: 18 tests for binary format
34+
35+
#### Phase 5: Developer Experience Tools
36+
- **Format Converters**: JSON ↔ ZON ↔ Binary with `BatchConverter`
37+
- **Helper Utilities**: `size()`, `compareFormats()`, `analyze()`, `inferSchema()`, `compare()`, `isSafe()`
38+
- **Pretty Printer**: Syntax highlighting with colors, `diffPrint()` for visual diffs
39+
- **Enhanced Validator**: Linting rules for depth, fields, performance with best practice suggestions
40+
- **CLI Enhancements**:
41+
- `zon analyze` - Data complexity analysis
42+
- `zon diff` - Visual file comparison
43+
- `zon validate --strict` - Strict validation with linting
44+
- `zon convert --to=binary` - Binary format conversion
45+
- `zon format --colors` - Pretty printing with syntax highlighting
46+
47+
#### Phase 6: LLM Evaluation Framework
48+
- **ZonEvaluator Engine**: Core evaluation framework with metric registration
49+
- **7 Built-in Metrics**: exactMatch, tokenEfficiency, structuralValidity, formatCorrectness, partialMatch, hallucination, latency
50+
- **Regression Detection**: Compare baseline vs current results
51+
- **Dataset Management**: `DatasetRegistry` with versioning and tagging
52+
- **Storage Backends**: `FileEvalStorage` and `MemoryEvalStorage`
53+
54+
#### Phase 7: CI/CD Integration
55+
- **GitHub Actions Workflow**: Automated evaluations on PRs and main branch
56+
- **Smoke Tests**: Fast <1min tests on every PR
57+
- **Regression Detection**: Automatic baseline comparison with severity levels (critical/major/minor)
58+
- **PR Comments**: Auto-post eval results with metrics tables
59+
- **Baseline Management**: Auto-save successful builds as new baselines
60+
- **NPM Scripts**: `eval:smoke`, `eval:check-regressions`, `eval:baseline`
61+
62+
#### Phase 4: Production Documentation
63+
- **Production Architecture Guide**: Multi-format strategy, versioning workflows, API patterns
64+
- **Best Practices Guide**: Code organization, error handling, testing, security
65+
- **Migration Examples**: Batch migration scripts with stats tracking
66+
- **Express Middleware**: Content negotiation for JSON/ZON/Binary formats
67+
68+
### Changed
69+
- **Code Quality**: Removed inline comments from core files (`encoder.ts`, `decoder.ts`)
70+
- **Documentation**: All functions have proper JSDoc/TSDoc documentation
71+
- **Build System**: Still compiles cleanly with TypeScript
72+
73+
### Performance
74+
- **Binary Format**: 40-60% smaller than JSON
75+
- **ZON Text**: Maintains 16-19% smaller than JSON
76+
- **Test Suite**: All 288 tests passing
77+
78+
### Documentation
79+
- **New Guides**: Production architecture, best practices, developer tools
80+
- **Working Examples**: Express middleware, migration scripts
81+
- **API Reference**: Complete documentation for all new APIs
82+
83+
### Testing
84+
- **Total Tests**: 288 passing (up from 175)
85+
- **Test Coverage**: 100% for all new features
86+
- **No Regressions**: Full backward compatibility maintained
87+
88+
### Development
89+
- **Total Code**: ~4250+ lines of production code
90+
- **Files Added**: 21 new files (binary/, evals/, tools/, docs/, examples/)
91+
- **Quality**: Professional-grade documentation, no inline comments
92+
893
## [1.1.0] - 2025-12-01
994

1095
### Major Release: Ecosystem Integrations & Streaming
@@ -19,7 +104,7 @@ This release transforms ZON into a production-ready format with first-class supp
19104
- **OpenAI Helper** (`zon-format/openai`): `ZOpenAI` wrapper with automatic format injection
20105

21106
#### Phase 4: Developer Experience
22-
- **VS Code Extension**: Syntax highlighting for `.zon` and `.zonf` files
107+
- **VS Code Extension**: Syntax highlighting for `.zonf` files
23108
- **Performance Benchmarks**: Automated benchmark suite comparing ZON vs JSON/MsgPack
24109
- **CLI Enhancements**: `validate`, `stats`, and `format` commands
25110

0 commit comments

Comments
 (0)