@@ -7,6 +7,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [1.1.0] - 2026-05-19
+
 ### Added
 - `Dockerfile.cuda`: NVIDIA CUDA 12.1 GPU image (verified on RTX 4070 SUPER)
 - `Dockerfile.rocm`: AMD ROCm 6.x GPU image (community-tested; `rocm/dev-ubuntu-22.04:6.3` base)
@@ -36,9 +38,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - `docs/REQUIREMENTS.md`: REQ-OEA-020 updated to reference `Dockerfile.cuda` alongside
   `Dockerfile`
 - `docs/TESTS.md`: TEST-OEA-020 updated to reference `Dockerfile.cuda`
-- `scaffold.yml`: pinned `detected_type: aee-research` to suppress specsmith audit false-positive
-  (scanner infers `research-python` from file heuristics; `aee-research` is the intentional
-  governance type set at project bootstrap)
+- `AGENTS.md`: spec version updated `0.10.1` → `0.11.3.dev427`; type updated to `research-python`
 
 ## [1.0.0] - 2026-05-14
 
@@ -76,156 +76,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Markdown lint: MD010/MD040/MD060 violations resolved
 - Dependabot: all 5 GitHub Actions PRs merged (checkout v6, setup-python v6, etc.)
 
-## [0.4.0] - 2026-05-13
-
-### Added
-- `experiments/data/scientific_corpus.txt` — 50-sentence scientific/natural-philosophy public
-  domain corpus (Newton, Feynman, Sagan); second independent domain for credibility suite
-- Two-model real LLM validation: `--model` CLI arg to `real_lm_experiment.py`;
-  `results/real_lm/distilgpt2/` and `results/real_lm/gpt2/` committed
-  - distilgpt2 (82M): `oea_anchored` log-prob +1.14 nats; `oea_miscalibrated` -0.82 nats
-  - gpt2 (124M): `oea_anchored` log-prob +1.61 nats; `oea_miscalibrated` -0.80 nats
-  - Causal mechanism model-size independent; effect strengthens with capacity
-- 5 new verified citations: drayson2025detection (EMNLP 2025), zhu2025synthesize (ICML 2025),
-  kovac2025recursive, keisha2025knowledge (NeurIPS 2025 workshop), abbasiyadkori2024believe
-- Differentiation paragraphs in §2.1 (OEA vs. Drayson/Zhu) and §2.3 (Abbasi Yadkori)
-- UNK-002 resolved in UNCERTAINTY-MAP.md; stable two-corpus setup documented
-
-### Changed
-- `experiments/data/public_domain_corpus.txt`: expanded from 18 lines to 62 lines spanning
-  Carroll, Austen, Melville, Hume, Darwin (~1600 words; 5 domains)
-- `experiments/credibility_suite.py`: corpus plan v2 uses `[public_domain_snippets,
-  scientific_snippets]` — removes `arxiv/main.tex` self-reference (UNK-002 fix)
-- `experiments/config/credibility_plan.json`: study_name → oea_credibility_suite_v2;
-  corpora → `[public_domain_snippets, scientific_snippets]`
-- `real_lm_experiment.py`: N_SEEDS 5→10, N_ITERATIONS 5→10; results dir per model
-- Table 2: refreshed with 2-domain values; Cohen d: 3.10→4.56, p<0.001
-- Table 3: restructured as two-model comparison; log-probability as primary metric;
-  JSD-anchoring interaction finding documented in dedicated subsection
-- Abstract, conclusion updated with two-model results and new statistics
-- CITATION.cff: version 0.4.0, date 2026-05-13, abstract updated
-- references.bib: citation audit cycle 3 — all 13/13 VERIFIED
-
-## [0.3.2] - 2026-05-12
-
-### Added
-- `results/real_lm/`: real LLM experiment artifacts (distilgpt2, BM25 RAG, 5 seeds x 5 iter)
-  - `oea_anchored`: JSD=0.088 (41% less drift), mean log-prob +0.574 nats vs control
-  - `oea_miscalibrated`: mean log-prob −0.387 vs control — causal proof of mechanism
-  - `oea_rag_only`: RAG without filter degrades log-prob; epistemic filter is operative
-- Table 3 (Real LLM results) in `arxiv/main.tex` with actual numbers
-- Saturation note for TRR metric in manuscript Limitations
-- SEAL-0008: manuscript lock milestone
-
-### Changed
-- `experiments/credibility_suite.py`: annotation on `oea_full` CQ=0.83 noting provisional
-  status pending threshold recalibration (TRR saturated at 1.0 in real LLM run)
-
-### Fixed
-- `arxiv/main.tex`: removed `\citet{}` (natbib not loaded); replaced with `\cite{}`
-- `.github/workflows/ci.yml`: added `-bibtex` flag to latexmk args
-
-## [0.3.1]
-
-### Added
-- `BM25Retriever` class in `real_lm_experiment.py`: corpus-grounded token-overlap RAG
-  implementing OEA Layer 1 (Ontological Anchoring). Not a log-probability proxy.
-- `oea_rag_only` variant: retrieval without epistemic filtering; isolates RAG contribution
-- REQ-OEA-010/011/012 and TEST-OEA-010/011/012 (RAG spec, manuscript results, CQ chain)
-- `\section{Real LLM Validation}` in `arxiv/main.tex`: design, frozen-weights scope as
-  necessary-condition framing, results placeholder (pending `real_lm_experiment.py` run)
-- DEC-004 in `docs/ARCHITECTURE.md`: explicit frozen-weights scope decision
-- CQ Measurement output in `real_lm_experiment.py main()`: derives `_CALIBRATION_QUALITY`
-  updates from measured `true_reject_rate` (closes evidence chain: REQ-OEA-012)
-
-### Changed
-- `real_lm_experiment.py`: N_SEEDS 3 → 5; RAG added to all non-control variants;
-  `oea_anchored` = RAG + K=3 + highest log-prob + vocab anchoring;
-  `oea_miscalibrated` = RAG + K=3 + lowest log-prob (anti-calibrated falsification control)
-- `docs/ARCHITECTURE.md`: updated component table, data flow, key decisions
-
-## [0.3.0]
-
-### Added
-- `experiments/real_lm_experiment.py` — distilgpt2 (82M) recursive stability experiment
-  with genuine neural log-probability epistemic filter; no hardcoded constants.
-  Variants: `control`, `oea_anchored` (keep highest log-prob), `oea_miscalibrated`
-  (keep lowest, anti-calibrated). Proves the mechanism is causal, not definitional.
-- `requirements-experiments.txt` — torch (CPU) + transformers install spec
-- `experiments/config/credibility_plan_fast.json` — minimal plan for CI validation
-  (10 runs vs 7,128; covers key variants including `ablation_miscalibrated`)
-- `scripts/run-experiments.sh` / `scripts/run-experiments.cmd` — one-command experiment
-  runner; `--all` flag includes real LLM experiment
-- `.venv` bootstrap: `scripts/setup.sh --experiments` / `scripts/setup.cmd --experiments`
-
-### Changed
-- `experiments/credibility_suite.py`: replaced hardcoded `p_reject_false`/`p_reject_true`
-  constants with `_CALIBRATION_QUALITY` dict + `_rejection_rates(cq)` formula. Rejection
-  rates are now derived from calibration quality via linear interpolation between random
-  baseline (CQ=0.5) and perfect discrimination (CQ=1.0). Addresses "simulation fallacy".
-- `experiments/credibility_suite.py`: added `ablation_miscalibrated` variant (CQ=0.22)
-  demonstrating that anti-calibrated selection degrades faster than control.
-- `experiments/config/credibility_plan.json`: added `ablation_miscalibrated` to variant list
-- `scripts/setup.sh` / `scripts/setup.cmd`: full venv bootstrap with dependency tiers
-
-## [0.2.2]
-
-### Fixed
-- 26 markdownlint errors eliminated across governance, docs, and CHANGELOG files
-  (MD009 trailing spaces, MD010 hard tabs, MD012 blank lines, MD031/MD040 code
-  fence language/spacing, MD037 emphasis markers, MD001 heading levels)
-- `.markdownlint.json`: added `siblings_only` for MD024, disabled MD026/MD034
-- `pytest.ini`: added `pythonpath = .` so `tests/` can import `experiments.*`
-- `.github/workflows/ci.yml`: removed `cm-super` apk (not available in Alpine TeX image)
-
-### Added
-- CI: `python-tests` job (pytest + pip-audit security scan)
-- `.github/dependabot.yml`: weekly pip and GitHub Actions dependency updates
-
-## [0.2.1] - 2026-05-12
-
-### Fixed
-- `arxiv/references.bib`: all 8/8 citations VERIFIED (citation lock closed)
-  - `fu2025selfverification`: NeurIPS 2025 poster confirmed (Accept, submission 12388)
-  - `roumeliotis2025trust`: arXiv:2507.10571 v3 confirmed; trailing comma removed
-- Trace vault SEAL-0007: citation lock audit-gate sealed
-- REQ-OEA-006 submission guardrail: citation lock now satisfied
-
-## [0.2.0] - 2026-05-12
-
-### Added
-- `arxiv/main.tex`: `\section{Conclusion}` with OEA hypothesis restatement, scope-bounded
-  evidence summary, stability/epistemic orthogonality finding, 4-item future-work agenda
-- `arxiv/main.tex`: Table 2 — full ablation study (11 variants × 648 runs) with stability,
-  true-reject, false-reject means + 95% CI, Cohen's d vs baselines (sourced from artifacts)
-- `arxiv/references.bib`: full citation audit; 6/8 VERIFIED, 2 flagged for human check
-- REQ-OEA-007/008/009 and TEST-OEA-007/008/009 added to belief artifact registry
-- Trace vault: SEAL-0004 (cycle 2 architecture), SEAL-0005 (verification), SEAL-0006 (release)
-
-## [0.1.0] - 2026-05-12
-
-### Added
-- specsmith 0.10.1 governance overlay (`aee-research` type, `enable_epistemic=true`)
-- `AGENTS.md` — agent governance hub with OEA protocol and H13 epistemic boundary rules
-- `docs/ARCHITECTURE.md` — OEA tri-layer architecture, experiment harness components, data flow
-- `REQUIREMENTS.md` / `docs/REQUIREMENTS.md` — 6 REQ-OEA-\* belief artifacts (all P1, Accepted)
-- `docs/TESTS.md` — 6 TEST-OEA-\* specifications with 100% REQ coverage
-- `docs/governance/` — 11 governance files including full epistemic layer (EPISTEMIC-AXIOMS,
-  BELIEF-REGISTRY, FAILURE-MODES, UNCERTAINTY-MAP)
-- `scaffold.yml` — aee-research project type, specsmith 0.10.1
-- `arxiv/main.tex` — manuscript scaffold with pilot results and ablation tables
-- `experiments/credibility_suite.py` — bigram-proxy ablation harness (11 variants)
-- `results/summary_metrics.json` — pilot: stability delta +0.121, true-reject delta +0.232
-- Trace vault: SEAL-0001 (architecture), SEAL-0002 (verification), SEAL-0003 (v0.1.0 release)
-- Community files: `CODE_OF_CONDUCT.md`, `.github/ISSUE_TEMPLATE/`, `.editorconfig`
-[Unreleased]: https://github.com/BitConcepts/oea-framework-paper/compare/v1.0.0...HEAD
-[1.0.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.4.0...v1.0.0
-[0.4.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.3.2...v0.4.0
-[0.3.2]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.3.1...v0.3.2
-[0.3.1]:
-[0.3.1]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.3.0...v0.3.1
-[0.3.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.2.2...v0.3.0
-[0.2.2]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.2.1...v0.2.2
-[0.2.1]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.2.0...v0.2.1
-[0.2.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v0.1.0...v0.2.0
-[0.1.0]: https://github.com/BitConcepts/oea-framework-paper/releases/tag/v0.1.0
+[Unreleased]: https://github.com/BitConcepts/oea-framework-paper/compare/v1.1.0...HEAD
+[1.1.0]: https://github.com/BitConcepts/oea-framework-paper/compare/v1.0.0...v1.1.0
+[1.0.0]: https://github.com/BitConcepts/oea-framework-paper/releases/tag/v1.0.0