|
16 | 16 |
|
17 | 17 | --- |
18 | 18 |
|
| 19 | +> **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET `other_case` transcoded + byte-parity proven (E-CPP-PARITY-5), the fifth leaf.** `UniCharSet` now parses the case-pair id (the token right after the script) into `other_cases: Vec<i32>`, applying the load-time clamp (`unicharset.cpp:901`: a value `>= size`, incl. the absent default, folds to the id itself). Exposes `get_other_case` + `dump_other_case`, mirroring `unicharset.h:703` (out-of-range id → `INVALID_UNICHAR_ID` -1). **Byte-identical 112/112** on real `eng.lstm-unicharset` vs tesseract's own `get_other_case` (self-validating oracle, `other_case` mode; 60/112 self, 52 real pairs, e.g. `C`→`c`). Last field cleanly reachable by token-offset; direction/mirror/bbox need the multi-tier parser (next, larger leaf). Additive, zero-dep; +4 contract tests (23 unicharset total), clippy `-D warnings` + fmt clean; reproducible via `examples/unicharset_dump.rs other_case`. Consumed by `tesseract-core::CharSet::get_other_case` (+1 boundary test, 6/6). No Core gap. EPIPHANIES `E-CPP-PARITY-5`. |
| 20 | +> |
19 | 21 | > **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET script table transcoded + byte-parity proven (E-CPP-PARITY-4), the fourth leaf — first to transcode an INTERNING side-table.** `UniCharSet` now parses the per-line script name (the token after the optional bbox/stats CSV), interns it via an `add_script`-equivalent (`unicharset.cpp:1063`, insertion-order dedup) into `scripts: Vec<String>` with `null_script` ("NULL") seeded at sid 0 (the `unichar_insert` set_script, `unicharset.cpp:680`; so `null_sid_ == 0` always), and stores `script_ids: Vec<i32>`. Exposes `get_script` / `get_script_table_size` / `script_from_script_id` / `script_of` / `dump_script`, mirroring `unicharset.h:681` (out-of-range → `null_sid_` 0). **Byte-identical 112/112** on real `eng.lstm-unicharset` vs tesseract's own `get_script` (same self-validating oracle, `script` mode; oracle table = `["NULL","Common","Latin"]` confirmed empirically before writing the Rust). Mixed-tier safe (eng id 0 is tier-5 no-CSV, others tier-1 CSV). Additive, zero-dep; +4 contract tests (19 unicharset total), clippy `-D warnings` + fmt clean; reproducible via `examples/unicharset_dump.rs script`. Consumed by `tesseract-core::CharSet::{get_script,script_of}` (+1 boundary test, 5/5). No Core gap. EPIPHANIES `E-CPP-PARITY-4`. Next leaf: the full column tier-parser (unlocks other_case/mirror/direction/bbox). |
20 | 22 | > |
21 | 23 | > **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET property accessors transcoded + byte-parity proven (E-CPP-PARITY-3), the third leaf through PROBE-OGAR-ADAPTER-UNICHARSET.** `lance_graph_contract::unicharset::UniCharSet` now parses the per-line hex property bitmask (`unicharset.cpp:824`) into a `props: Vec<u8>` and exposes `get_is{alpha,lower,upper,digit,punctuation}` + `get_isngram` + `dump_properties()`, mirroring the C++ inline accessors (`unicharset.h:497+`; out-of-range id → `false`, `INVALID_UNICHAR_ID` semantics). **Byte-identical 112/112** on real `eng.lstm-unicharset` vs tesseract's own `get_is*` via a **self-validating** oracle: the same harness dumps the id↔unichar bijection (proven 112/112 reference, E-CPP-PARITY-1) AND the properties — the bijection half diffing 0 proves the 5.5.0-header/5.3.4-lib layout is sound, making the property diff (also 0) trustworthy despite the version skew. Additive, zero-dep; +5 contract tests (15 unicharset total), clippy `-D warnings` + fmt clean. Consumed by `tesseract-core` as `CharSet::get_is*` (+1 consumer-boundary test, 4/4 green). Incidental: rustfmt-1.9.0 normalized two pre-existing test-assert wraps in `class_view.rs` (whitespace-only). No Core gap, no adapter state (per `E-CPP-KEYSTONE-1` "repetition of a validated pattern"). EPIPHANIES `E-CPP-PARITY-3`. |
|
110 | 112 |
|
111 | 113 | > **2026-06-18 — ADDED (D-DO-ARM-1, the OGAR DO arm)**: `lance_graph_contract::action::{ActionState, StateGuard, ActionDef, ClassActions, actions_for, effective_actions, ActionInvocation}` — the Perdurant DO arm completing the OGAR IR (the action-axis sibling of `codegen_manifest`'s `MethodSig`/THINK). Both the 4-agent `sale_order` AR→DO probe (runtime-archaeologist) AND the merged cross-repo PR survey (ruff/OGAR/lance-graph/openproject/tesseract) agreed this was the ONE missing wire: the THINK arm (`classid → ClassView`, `has_function → MethodSig`) is converged + merged; the DO-arm `ActionInvocation`/`ActionDef` type was ABSENT. **`ActionDef`** (static, `const`-constructible, all `&'static`/`Copy`): `predicate` (= harvested `has_function` method), `object_class` (classid), `exec` (`ExecTarget` incl `SurrealQl`), `guard` (`StateGuard` = KausalSpec field==value), `required_role` (RBAC), `overrides` (OGAR `classid→ClassView` inheritance). **`ClassActions`+`actions_for`** (zero-fallback) mirror `ClassMethods`/`methods_for`. **`effective_actions(parent, child)`** = OGAR inheritance on the action axis (child overrides parent by predicate). **`ActionInvocation`** (dynamic, `Copy`): lifecycle `ActionState{Pending→Committed|Failed|Cancelled}` (sticky terminals), S2.5 `cycle` stamp, idempotency/trace keys, HLC `emitted_at_millis`. **`ActionInvocation::commit(def, actor, impact, now)`** is the gated egress — RBAC FIRST (`auth::ActorContext` must hold `required_role` or be admin → else `Failed`), THEN MUL impact (`mul::GateDecision`: `Flow→Committed`+stamped, `Hold→`Pending/escalate, `Block→Cancelled`). This IS "commit to the external consumer (odoo/openproject/woa/tesseract) after the cycle decides sound." Dispatched via `UnifiedStep`/`ExecTarget`, NOT a per-crate endpoint. Additive, zero-dep. +5 tests green. Consumer reference: `docs/OGAR_CONSUMER_API.md`. Branch `claude/soa-write-deinterlace-inc2`. |
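The gate order described in the entry above (RBAC first, then the MUL impact decision, with sticky terminal states) can be sketched as follows. This is a minimal illustration under stated assumptions, not the contract crate's code: `Actor`, the free function `commit_gate`, and the string-typed role are hypothetical stand-ins for `auth::ActorContext`, `ActionInvocation::commit`, and the real role type; only the state and decision names come from the entry.

```rust
/// Lifecycle states named in the entry; everything except `Pending` is terminal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActionState {
    Pending,
    Committed,
    Failed,
    Cancelled,
}

/// MUL impact decision named in the entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GateDecision {
    Flow,
    Hold,
    Block,
}

/// Hypothetical stand-in for `auth::ActorContext`.
pub struct Actor<'a> {
    pub roles: &'a [&'a str],
    pub is_admin: bool,
}

/// Hypothetical sketch of the gated egress: RBAC is checked before the impact.
pub fn commit_gate(
    state: ActionState,
    required_role: &str,
    actor: &Actor,
    impact: GateDecision,
) -> ActionState {
    // Sticky terminals: once out of Pending, the state never changes again.
    if state != ActionState::Pending {
        return state;
    }
    // RBAC first: the actor must hold the required role or be admin.
    let authorised = actor.is_admin || actor.roles.iter().any(|r| *r == required_role);
    if !authorised {
        return ActionState::Failed;
    }
    // Only then is the MUL impact decision consulted.
    match impact {
        GateDecision::Flow => ActionState::Committed,
        GateDecision::Hold => ActionState::Pending,
        GateDecision::Block => ActionState::Cancelled,
    }
}
```

Under these assumptions, an unauthorised actor yields `Failed` even when the impact is `Flow`, which is the point of checking RBAC before the impact decision.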
112 | 114 |
|
| 115 | +> **2026-06-20 — ADDED (D-UNICHARSET-OTHERCASE, the case-pair leaf)**: `lance_graph_contract::unicharset::UniCharSet` gained `get_other_case(id) -> i32` + `dump_other_case()`, backed by `other_cases: Vec<i32>`. The case-paired unichar id (`'C'`→`'c'`), parsed as the token after the script and clamped at load (`unicharset.cpp:901`: a value `>= size`, and the absent default = size, fold to the id itself). Out-of-range id → `INVALID_UNICHAR_ID` -1 (`unicharset.h:703`). **Byte-identical 112/112** vs tesseract's own `get_other_case` on real `eng.lstm-unicharset` (self-validating oracle `other_case` mode; 60 self / 52 pairs). Additive, zero-dep. +4 tests (23 unicharset total). Consumed by `tesseract-core::CharSet::get_other_case`. EPIPHANIES `E-CPP-PARITY-5`; fifth leaf of `PROBE-OGAR-ADAPTER-UNICHARSET`; the last field reachable by token-offset (direction/mirror/bbox need the multi-tier parser). Branch `claude/happy-hamilton-0azlw4`. |
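The load-time clamp and the out-of-range guard described in the entry above can be sketched roughly like this. It is an illustration only, not the crate's code: the free functions and their signatures are hypothetical, and folding negative values to the id itself is an assumption the entry does not state.

```rust
/// Sentinel returned for an out-of-range id (-1, per the entry above).
pub const INVALID_UNICHAR_ID: i32 = -1;

/// Hypothetical helper: clamp a parsed `other_case` token at load time.
/// `raw` is `None` when the token is absent, which behaves like the default `size`.
pub fn clamp_other_case(id: usize, raw: Option<i32>, size: usize) -> i32 {
    match raw {
        // In-range value: keep the case-pair id as parsed.
        Some(v) if v >= 0 && (v as usize) < size => v,
        // `>= size` or absent: fold to the id itself.
        // ASSUMPTION: negative values are folded the same way.
        _ => id as i32,
    }
}

/// Hypothetical accessor over the clamped table: out-of-range id -> -1.
pub fn get_other_case(other_cases: &[i32], id: i32) -> i32 {
    if id < 0 || (id as usize) >= other_cases.len() {
        INVALID_UNICHAR_ID
    } else {
        other_cases[id as usize]
    }
}
```

With this shape, a unichar that has no case partner reports itself, which matches the "60 self / 52 pairs" split quoted in the entry.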
| 116 | +
|
113 | 117 | > **2026-06-20 — ADDED (D-UNICHARSET-SCRIPT, the script-table leaf)**: `lance_graph_contract::unicharset::UniCharSet` gained `get_script(id) -> i32` / `get_script_table_size()` / `script_from_script_id(sid) -> Option<&str>` / `script_of(id) -> Option<&str>` / `dump_script()`, backed by new `script_ids: Vec<i32>` + an interned `scripts: Vec<String>`. The first leaf to transcode an **interning side-table** (`add_script`, `unicharset.cpp:1063`): `null_script` "NULL" seeded at sid 0 (the `unichar_insert` set_script, `unicharset.cpp:680` → `null_sid_ == 0`), real scripts intern from 1 in id order. Script name = token after the optional bbox/stats CSV (mixed-tier safe). Out-of-range → `null_sid_` 0 (`unicharset.h:681`). **Byte-identical 112/112** vs tesseract's own `get_script` on real `eng.lstm-unicharset` (self-validating oracle `script` mode; table `["NULL","Common","Latin"]`). Additive, zero-dep, behaviour-preserving on the bijection. +4 tests (19 unicharset total). Consumed by `tesseract-core::CharSet::{get_script,script_of}`. EPIPHANIES `E-CPP-PARITY-4`; fourth leaf of `PROBE-OGAR-ADAPTER-UNICHARSET`. Branch `claude/happy-hamilton-0azlw4`. |
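The interning behaviour described in the entry above (insertion-order dedup, with "NULL" seeded at sid 0) can be sketched as below. This is a minimal sketch, assuming a standalone `ScriptTable` type that is hypothetical; the real state lives in `UniCharSet` as `scripts: Vec<String>`, and only `add_script` and `script_from_script_id` are names taken from the entry.

```rust
/// Hypothetical standalone interning table for script names.
pub struct ScriptTable {
    scripts: Vec<String>,
}

impl ScriptTable {
    /// Seed the null script at sid 0, so the null sid is always 0.
    pub fn new() -> Self {
        ScriptTable {
            scripts: vec!["NULL".to_string()],
        }
    }

    /// Return the existing sid for `name`, or append it and return the new sid.
    pub fn add_script(&mut self, name: &str) -> i32 {
        if let Some(pos) = self.scripts.iter().position(|s| s == name) {
            return pos as i32;
        }
        self.scripts.push(name.to_string());
        (self.scripts.len() - 1) as i32
    }

    /// Look up a script name; an out-of-range sid yields `None`.
    pub fn script_from_script_id(&self, sid: i32) -> Option<&str> {
        if sid < 0 {
            return None;
        }
        self.scripts.get(sid as usize).map(|s| s.as_str())
    }

    /// Number of interned scripts, including the seeded "NULL".
    pub fn table_size(&self) -> usize {
        self.scripts.len()
    }
}
```

Interning "Common", then "Latin", then "Common" again in this sketch gives sids 1, 2, 1 and a table of three entries, consistent with the `["NULL","Common","Latin"]` oracle table quoted above.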
114 | 118 |
|
115 | 119 | > **2026-06-20 — ADDED (D-UNICHARSET-PROPS, the property-accessor leaf)**: `lance_graph_contract::unicharset::UniCharSet` gained the character-category surface `get_isalpha` / `get_islower` / `get_isupper` / `get_isdigit` / `get_ispunctuation` / `get_isngram` + `dump_properties()`, backed by a new `props: Vec<u8>` parsed from the per-line hex bitmask (`unicharset.cpp:824`; masked to `ISALPHA=0x1 ISLOWER=0x2 ISUPPER=0x4 ISDIGIT=0x8 ISPUNCTUATION=0x10`). Accessors mirror the C++ inline guard (`unicharset.h:497+`): out-of-range id → `false` (`INVALID_UNICHAR_ID`); `get_isngram` is always-false on the plain-table load path (`unicharset.cpp:893`). **Byte-identical 112/112** vs tesseract's own `get_is*` on real `eng.lstm-unicharset` (self-validating oracle: bijection half cross-checks the 5.5.0-header/5.3.4-lib layout, then the property half diffs 0). Additive, zero-dep, behaviour-preserving on the existing id↔unichar bijection (lenient: a missing or non-hex token defaults to 0). +5 tests (15 unicharset total). Consumed by `tesseract-core::CharSet::get_is*`. EPIPHANIES `E-CPP-PARITY-3`; the third leaf of `PROBE-OGAR-ADAPTER-UNICHARSET` (after D-UNICHARSET-1 + D-UNICHAR-1). Branch `claude/happy-hamilton-0azlw4`. |
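The bitmask parsing and guarded accessors described in the entry above can be sketched as follows. The mask values are the ones quoted in the entry; the free functions `parse_props` and `has_prop` and their signatures are hypothetical, chosen only to keep the sketch self-contained.

```rust
// Property bits as quoted in the entry above.
pub const ISALPHA: u8 = 0x1;
pub const ISLOWER: u8 = 0x2;
pub const ISUPPER: u8 = 0x4;
pub const ISDIGIT: u8 = 0x8;
pub const ISPUNCTUATION: u8 = 0x10;

/// Union of the five known property bits.
const PROP_MASK: u32 = 0x1f;

/// Hypothetical helper: parse the per-line hex token into a masked byte.
/// A missing or non-hex token leniently defaults to 0.
pub fn parse_props(token: Option<&str>) -> u8 {
    token
        .and_then(|t| u32::from_str_radix(t, 16).ok())
        .map(|v| (v & PROP_MASK) as u8)
        .unwrap_or(0)
}

/// Hypothetical guarded accessor: an out-of-range id reports `false`.
pub fn has_prop(props: &[u8], id: i32, bit: u8) -> bool {
    if id < 0 || (id as usize) >= props.len() {
        return false;
    }
    props[id as usize] & bit != 0
}
```

In this sketch `get_isalpha(id)` would correspond to `has_prop(&props, id, ISALPHA)`, and likewise for the other four category bits.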
|