
Commit 6b02783

tbitcs and oz-agent committed
chore: release v1.1.0 — CHANGELOG trimmed, pre-1.0.0 history removed
- [1.1.0] released 2026-05-19: multi-GPU support, ROCm/XPU Dockerfiles, --device flag, REQ/TEST-OEA-023, scaffold fix, doc cross-check
- Pre-v1.0.0 history removed from CHANGELOG (kept in git history)
- Reference links updated: [Unreleased] now compares v1.1.0...HEAD

Co-Authored-By: Oz <oz-agent@warp.dev>
1 parent b26ec16 commit 6b02783

1 file changed

Lines changed: 6 additions & 156 deletions

CHANGELOG.md

@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.1.0] - 2026-05-19
+
 ### Added
 - `Dockerfile.cuda`: NVIDIA CUDA 12.1 GPU image (verified on RTX 4070 SUPER)
 - `Dockerfile.rocm`: AMD ROCm 6.x GPU image (community-tested; `rocm/dev-ubuntu-22.04:6.3` base)
@@ -36,9 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `docs/REQUIREMENTS.md`: REQ-OEA-020 updated to reference `Dockerfile.cuda` alongside
 `Dockerfile`
 - `docs/TESTS.md`: TEST-OEA-020 updated to reference `Dockerfile.cuda`
-- `scaffold.yml`: pinned `detected_type: aee-research` to suppress specsmith audit false-positive
-(scanner infers `research-python` from file heuristics; `aee-research` is the intentional
-governance type set at project bootstrap)
+- `AGENTS.md`: spec version updated `0.10.1` → `0.11.3.dev427`; type updated to `research-python`
 
 ## [1.0.0] - 2026-05-14
 
@@ -76,156 +76,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Markdown lint: MD010/MD040/MD060 violations resolved
 - Dependabot: all 5 GitHub Actions PRs merged (checkout v6, setup-python v6, etc.)
 
-## [0.4.0] - 2026-05-13
-
-### Added
-- `experiments/data/scientific_corpus.txt` — 50-sentence scientific/natural-philosophy public
-domain corpus (Newton, Feynman, Sagan); second independent domain for credibility suite
-- Two-model real LLM validation: `--model` CLI arg to `real_lm_experiment.py`;
-`results/real_lm/distilgpt2/` and `results/real_lm/gpt2/` committed
-- distilgpt2 (82M): `oea_anchored` log-prob +1.14 nats; `oea_miscalibrated` -0.82 nats
-- gpt2 (124M): `oea_anchored` log-prob +1.61 nats; `oea_miscalibrated` -0.80 nats
-- Causal mechanism model-size independent; effect strengthens with capacity
-- 5 new verified citations: drayson2025detection (EMNLP 2025), zhu2025synthesize (ICML 2025),
-kovac2025recursive, keisha2025knowledge (NeurIPS 2025 workshop), abbasiyadkori2024believe
-- Differentiation paragraphs in §2.1 (OEA vs. Drayson/Zhu) and §2.3 (Abbasi Yadkori)
-- UNK-002 resolved in UNCERTAINTY-MAP.md; stable two-corpus setup documented
-
-### Changed
-- `experiments/data/public_domain_corpus.txt`: expanded from 18 lines to 62 lines spanning
-Carroll, Austen, Melville, Hume, Darwin (~1600 words; 5 domains)
-- `experiments/credibility_suite.py`: corpus plan v2 uses `[public_domain_snippets,
-scientific_snippets]` — removes `arxiv/main.tex` self-reference (UNK-002 fix)
-- `experiments/config/credibility_plan.json`: study_name → oea_credibility_suite_v2;
-corpora → `[public_domain_snippets, scientific_snippets]`
-- `real_lm_experiment.py`: N_SEEDS 5→10, N_ITERATIONS 5→10; results dir per model
-- Table 2: refreshed with 2-domain values; Cohen d: 3.10→4.56, p<0.001
-- Table 3: restructured as two-model comparison; log-probability as primary metric;
-JSD-anchoring interaction finding documented in dedicated subsection
-- Abstract, conclusion updated with two-model results and new statistics
-- CITATION.cff: version 0.4.0, date 2026-05-13, abstract updated
-- references.bib: citation audit cycle 3 — all 13/13 VERIFIED
109-
## [0.3.2] - 2026-05-12
110-
111-
### Added
112-
- `results/real_lm/`: real LLM experiment artifacts (distilgpt2, BM25 RAG, 5 seeds x 5 iter)
113-
- `oea_anchored`: JSD=0.088 (41% less drift), mean log-prob +0.574 nats vs control
114-
- `oea_miscalibrated`: mean log-prob −0.387 vs control — causal proof of mechanism
115-
- `oea_rag_only`: RAG without filter degrades log-prob; epistemic filter is operative
116-
- Table 3 (Real LLM results) in `arxiv/main.tex` with actual numbers
117-
- Saturation note for TRR metric in manuscript Limitations
118-
- SEAL-0008: manuscript lock milestone
119-
120-
### Changed
121-
- `experiments/credibility_suite.py`: annotation on `oea_full` CQ=0.83 noting provisional
122-
status pending threshold recalibration (TRR saturated at 1.0 in real LLM run)
123-
124-
### Fixed
125-
- `arxiv/main.tex`: removed `\citet{}` (natbib not loaded); replaced with `\cite{}`
126-
- `.github/workflows/ci.yml`: added `-bibtex` flag to latexmk args
127-
128-
## [0.3.1]
129-
130-
### Added
131-
- `BM25Retriever` class in `real_lm_experiment.py`: corpus-grounded token-overlap RAG
132-
implementing OEA Layer 1 (Ontological Anchoring). Not a log-probability proxy.
133-
- `oea_rag_only` variant: retrieval without epistemic filtering; isolates RAG contribution
134-
- REQ-OEA-010/011/012 and TEST-OEA-010/011/012 (RAG spec, manuscript results, CQ chain)
135-
- `\section{Real LLM Validation}` in `arxiv/main.tex`: design, frozen-weights scope as
136-
necessary-condition framing, results placeholder (pending `real_lm_experiment.py` run)
137-
- DEC-004 in `docs/ARCHITECTURE.md`: explicit frozen-weights scope decision
138-
- CQ Measurement output in `real_lm_experiment.py main()`: derives `_CALIBRATION_QUALITY`
139-
updates from measured `true_reject_rate` (closes evidence chain: REQ-OEA-012)
140-
141-
### Changed
142-
- `real_lm_experiment.py`: N_SEEDS 3 → 5; RAG added to all non-control variants;
143-
`oea_anchored` = RAG + K=3 + highest log-prob + vocab anchoring;
144-
`oea_miscalibrated` = RAG + K=3 + lowest log-prob (anti-calibrated falsification control)
145-
- `docs/ARCHITECTURE.md`: updated component table, data flow, key decisions
-
-## [0.3.0]
-
-### Added
-- `experiments/real_lm_experiment.py` — distilgpt2 (82M) recursive stability experiment
-with genuine neural log-probability epistemic filter; no hardcoded constants.
-Variants: `control`, `oea_anchored` (keep highest log-prob), `oea_miscalibrated`
-(keep lowest, anti-calibrated). Proves the mechanism is causal, not definitional.
-- `requirements-experiments.txt` — torch (CPU) + transformers install spec
-- `experiments/config/credibility_plan_fast.json` — minimal plan for CI validation
-(10 runs vs 7,128; covers key variants including `ablation_miscalibrated`)
-- `scripts/run-experiments.sh` / `scripts/run-experiments.cmd` — one-command experiment
-runner; `--all` flag includes real LLM experiment
-- `.venv` bootstrap: `scripts/setup.sh --experiments` / `scripts/setup.cmd --experiments`
-
-### Changed
-- `experiments/credibility_suite.py`: replaced hardcoded `p_reject_false`/`p_reject_true`
-constants with `_CALIBRATION_QUALITY` dict + `_rejection_rates(cq)` formula. Rejection
-rates are now derived from calibration quality via linear interpolation between random
-baseline (CQ=0.5) and perfect discrimination (CQ=1.0). Addresses "simulation fallacy".
-- `experiments/credibility_suite.py`: added `ablation_miscalibrated` variant (CQ=0.22)
-demonstrating that anti-calibrated selection degrades faster than control.
-- `experiments/config/credibility_plan.json`: added `ablation_miscalibrated` to variant list
-- `scripts/setup.sh` / `scripts/setup.cmd`: full venv bootstrap with dependency tiers
-
-## [0.2.2]
-
-### Fixed
-- 26 markdownlint errors eliminated across governance, docs, and CHANGELOG files
-(MD009 trailing spaces, MD010 hard tabs, MD012 blank lines, MD031/MD040 code
-fence language/spacing, MD037 emphasis markers, MD001 heading levels)
-- `.markdownlint.json`: added `siblings_only` for MD024, disabled MD026/MD034
-- `pytest.ini`: added `pythonpath = .` so `tests/` can import `experiments.*`
-- `.github/workflows/ci.yml`: removed `cm-super` apk (not available in Alpine TeX image)
-
-### Added
-- CI: `python-tests` job (pytest + pip-audit security scan)
-- `.github/dependabot.yml`: weekly pip and GitHub Actions dependency updates
-
-## [0.2.1] - 2026-05-12
-
-### Fixed
-- `arxiv/references.bib`: all 8/8 citations VERIFIED (citation lock closed)
-- `fu2025selfverification`: NeurIPS 2025 poster confirmed (Accept, submission 12388)
-- `roumeliotis2025trust`: arXiv:2507.10571 v3 confirmed; trailing comma removed
-- Trace vault SEAL-0007: citation lock audit-gate sealed
-- REQ-OEA-006 submission guardrail: citation lock now satisfied
-
-## [0.2.0] - 2026-05-12
-
-### Added
-- `arxiv/main.tex`: `\section{Conclusion}` with OEA hypothesis restatement, scope-bounded
-evidence summary, stability/epistemic orthogonality finding, 4-item future-work agenda
-- `arxiv/main.tex`: Table 2 — full ablation study (11 variants × 648 runs) with stability,
-true-reject, false-reject means + 95% CI, Cohen's d vs baselines (sourced from artifacts)
-- `arxiv/references.bib`: full citation audit; 6/8 VERIFIED, 2 flagged for human check
-- REQ-OEA-007/008/009 and TEST-OEA-007/008/009 added to belief artifact registry
-- Trace vault: SEAL-0004 (cycle 2 architecture), SEAL-0005 (verification), SEAL-0006 (release)
-
-## [0.1.0] - 2026-05-12
-
-### Added
-- specsmith 0.10.1 governance overlay (`aee-research` type, `enable_epistemic=true`)
-- `AGENTS.md` — agent governance hub with OEA protocol and H13 epistemic boundary rules
-- `docs/ARCHITECTURE.md` — OEA tri-layer architecture, experiment harness components, data flow
-- `REQUIREMENTS.md` / `docs/REQUIREMENTS.md` — 6 REQ-OEA-\* belief artifacts (all P1, Accepted)
-- `docs/TESTS.md` — 6 TEST-OEA-\* specifications with 100% REQ coverage
-- `docs/governance/` — 11 governance files including full epistemic layer (EPISTEMIC-AXIOMS,
-BELIEF-REGISTRY, FAILURE-MODES, UNCERTAINTY-MAP)
-- `scaffold.yml` — aee-research project type, specsmith 0.10.1
-- `arxiv/main.tex` — manuscript scaffold with pilot results and ablation tables
-- `experiments/credibility_suite.py` — bigram-proxy ablation harness (11 variants)
-- `results/summary_metrics.json` — pilot: stability delta +0.121, true-reject delta +0.232
-- Trace vault: SEAL-0001 (architecture), SEAL-0002 (verification), SEAL-0003 (v0.1.0 release)
-- Community files: `CODE_OF_CONDUCT.md`, `.github/ISSUE_TEMPLATE/`, `.editorconfig`
-[Unreleased]: https://github.com/BitConcepts/oea-framework-paper/compare/v1.0.0...HEAD
-[1.0.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.4.0...v1.0.0
-[0.4.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.3.2...v0.4.0
-[0.3.2]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.3.1...v0.3.2
-[0.3.1]:
-[0.3.1]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.3.0...v0.3.1
-[0.3.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.2.2...v0.3.0
-[0.2.2]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.2.1...v0.2.2
-[0.2.1]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.2.0...v0.2.1
-[0.2.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.1.0...v0.2.0
-[0.1.0]: https://github.com/BitConcepts/oea-framework-paper/releases/tag/v0.1.0
+[Unreleased]: https://github.com/BitConcepts/oea-framework-paper/compare/v1.1.0...HEAD
+[1.1.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v1.0.0...v1.1.0
+[1.0.0]: https://github.com/BitConcepts/oea-framework-paper/releases/tag/v1.0.0
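
The reference-link footer follows a regular pattern: `[Unreleased]` compares the newest tag against `HEAD`, each release compares against its predecessor, and the oldest retained release points at its tag page. A minimal sketch of regenerating that footer is below. The function name `reference_links` is hypothetical and not part of this repository; only the URL shapes are taken from the links in the diff.

```python
from typing import List

# Repository URL as it appears in the CHANGELOG reference links.
REPO = "https://github.com/BitConcepts/oea-framework-paper"


def reference_links(versions: List[str], repo: str = REPO) -> List[str]:
    """Build Keep a Changelog style reference links.

    `versions` lists the retained releases, newest first (e.g. ["1.1.0", "1.0.0"]).
    Hypothetical helper for illustration; not taken from the repository.
    """
    if not versions:
        raise ValueError("at least one released version is required")

    # [Unreleased] always compares the newest tag against HEAD.
    lines = [f"[Unreleased]: {repo}/compare/v{versions[0]}...HEAD"]

    # Each release compares against the release immediately before it.
    for newer, older in zip(versions, versions[1:]):
        lines.append(f"[{newer}]: {repo}/compare/v{older}...v{newer}")

    # The oldest retained release has no predecessor left in the file,
    # so it links to its tag page instead of a compare view.
    oldest = versions[-1]
    lines.append(f"[{oldest}]: {repo}/releases/tag/v{oldest}")
    return lines


if __name__ == "__main__":
    print("\n".join(reference_links(["1.1.0", "1.0.0"])))
```

Called with `["1.1.0", "1.0.0"]`, this reproduces the three added link lines in this commit, including the switch of `[1.0.0]` from a compare URL to a tag URL once the pre-1.0.0 entries are gone.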

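The removed 0.3.0 entry describes rejection rates derived from calibration quality by linear interpolation between a random baseline (CQ=0.5) and perfect discrimination (CQ=1.0). The sketch below shows one way such an interpolation could look. The endpoint values are assumptions, not taken from `credibility_suite.py`: a random filter is assumed to reject true and false statements equally often (0.5 each), and a perfect filter is assumed to reject every false statement and no true one.

```python
from typing import Tuple


def rejection_rates(cq: float) -> Tuple[float, float]:
    """Return (p_reject_false, p_reject_true) for a calibration quality `cq`.

    Illustrative only. Assumed endpoints (not confirmed by the source):
      CQ = 0.5 -> (0.5, 0.5)  random baseline
      CQ = 1.0 -> (1.0, 0.0)  perfect discrimination
    Values below 0.5 extrapolate along the same line, so an anti-calibrated
    filter rejects true statements more often than false ones.
    """
    if not 0.0 <= cq <= 1.0:
        raise ValueError("cq must lie in [0, 1]")

    # Position along the segment from the random baseline (t=0) to perfect (t=1).
    t = (cq - 0.5) / 0.5

    p_reject_false = 0.5 + t * (1.0 - 0.5)
    p_reject_true = 0.5 + t * (0.0 - 0.5)
    return p_reject_false, p_reject_true


if __name__ == "__main__":
    # CQ values mentioned in the changelog: 0.83 (oea_full), 0.22 (ablation_miscalibrated).
    for cq in (0.5, 0.83, 1.0, 0.22):
        print(cq, rejection_rates(cq))
```

Under these assumed endpoints the formula reduces to `p_reject_false = cq` and `p_reject_true = 1 - cq`, which is why a CQ below 0.5 inverts the filter rather than merely weakening it.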