
Commit 9c904b3

Author: miranov25 (committed)
docs: PHASE_HISTORY update for Phases 13.23, 13.24, 13.25 + BUG_20260427
Added entries for:
- 13.23.ADF: multi-level dotted expression resolution (A.B.C.val)
- 13.24.ADF: read-only aliases hardening (two-cycle, 2 failed attempts documented)
- 13.25.ADF: quantiles pass-through tests (Q1_1-Q1_5, informal)
- BUG_20260427: alias invalidation cascade (P0 Safety)

Updated: overview metrics (1584 tests, 177 invariance, 47 features, pipeline 2.1x), test counts table (+6 rows), architecture decisions (+5 entries), pending items (13.23/13.24 done, Phase 14 added). No content deleted (append-only update).
1 parent e827853 commit 9c904b3

1 file changed

Lines changed: 119 additions & 11 deletions

UTILS/dfextensions/AliasDataFrame/docs/PHASE_HISTORY.md

@@ -1,7 +1,7 @@
 # AliasDataFrame Phase History
 
 > **Purpose**: Development history for architecture reviews and restart prompts.
-> **Last Updated**: 2026-04-12
+> **Last Updated**: 2026-05-10
 > **Maintained By**: Marian Ivanov (miranov25)
 
 
 ## How to Use This File
@@ -41,10 +41,10 @@ This file is intended for AI reviewers and human collaborators as a **restart co
 AliasDataFrame is a high-performance data analysis framework for particle physics research at CERN's ALICE experiment. It provides schema-driven, lazy-evaluated columns and hierarchical joins for ROOT/Parquet data.
 
 **Key Metrics:**
-- Performance: 60-770x speedups achieved; production pipeline 2× faster (1452s → 722s)
-- Test Coverage: 1538 tests passing, 125 invariance tests
-- Lines of Code: ~12,930 (AliasDataFrame.py)
-- Features: 44 in taxonomy (26 verified, 14 smoke-only, 3 broken, 1 planned)
+- Performance: 60-770x speedups achieved; production pipeline 2.1× faster (1452s → 692s)
+- Test Coverage: 1584 tests passing, 177 invariance tests
+- Lines of Code: ~13,600 (AliasDataFrame.py)
+- Features: 47 in taxonomy (28 verified, 14 smoke-only, 4 broken, 1 planned)
 
 **Development Team:**
 - Coordinator: Marian Ivanov (miranov25)
@@ -55,8 +55,8 @@ AliasDataFrame is a high-performance data analysis framework for particle physic
 
 ## Phase 13: Advanced Features
 
-**Dates**: 2026-03-22 to 2026-03-27
-**Status**: 🔄 In Progress
+**Dates**: 2026-03-22 to 2026-05-05
+**Status**: 🔄 In Progress (ADF maintenance mode; dfdraw Phase A active)
 
 ### Phase 13.12.DF: Profile Enhancements
 **Date**: 2026-03-22
@@ -249,6 +249,85 @@ Mode #11 discipline).
 
 ---
 
+### Phase 13.23.ADF: Multi-Level Dotted Expression Resolution
+**Dates**: 2026-04-27 to 2026-04-29
+**Status**: ✅ Merged
+**Commits**: `7c4116e1` (step 1: dependency_tree), `878ee941` (step 2: multi-level + invalidation), `cb33ae66` (close: taxonomy 44→47)
+
+**Proposal**: `PHASE_13_23_ADF_v1.2_Proposal.md` (8-reviewer panel, 5×[OK][!][X])
+
+**Step 1: dependency_tree HTML/list output modes** (`7c4116e1`):
+- `dependency_tree()` enhanced with 3 output modes: `output='text'` (unchanged), `output='html'` (interactive collapsible tree), `output='list'` (flat topological order)
+- Accepts a str or a list of str for multiple roots
+- 4 new methods: `dependency_tree`, `_dependency_tree_build`, `_dependency_tree_list`, `_dependency_tree_html`
+- Tests T1-T10 (10 tests)
+
+**Step 2: multi-level dotted expression resolution** (`878ee941`):
+- Enables `A.B.C.val` syntax for nested subframe references
+- `_scatter_subframe_column`: helper factored out of `_prepare_subframe_joins` (70 lines)
+- `_prepare_subframe_joins`: greedy left→right walk + bottom-up scatter (85 lines)
+- `MAX_SUBFRAME_DEPTH = 10` + `visited_ids` cycle guard (`ValueError` on cycles)
+- `add_alias` regex fix: `\.\w+` → `(?:\.\w+)+` at 3 locations, so multi-segment chains match
+- All 3 draw resolvers updated to use the greedy walk
+- `KeyError` preserved for a confirmed subframe reference with an invalid leaf column
+- Tests N1_0-N1_10 (11 invariance tests)
+
+**Also includes**:
+- `_invalidate_alias_cascade()`: alias invalidation bug fix (BUG_20260427, see Bug Fixes)
+- Removed `test_M1_metadata_skip.py` (reverted feature from Phase 13.22)
+
+**Phase close** (`cb33ae66`): `feature_taxonomy.py` updated from 44 to 47 features (+SUB.multilevel, +CORE.dependency_tree, +CORE.invalidation). CAPABILITY_MATRIX regenerated: 29 verified, 177 invariance tests.
+
+**Test results**: 1566 passed; 7 failed + 1 error, all pre-existing.
+
+---
+
+### Phase 13.24.ADF: Read-Only Aliases Hardening
+**Dates**: 2026-04-29 to 2026-04-30
+**Status**: ✅ Merged
+**Commits**: `9fe1e620` (Part A), `99881100` (Part B)
+
+**Proposal**: `PHASE_13_24_ADF_v1.2_Proposal.md` (5-reviewer panel, reviewer-drafted by Claude36 after 2 failed coder attempts)
+
+**Background: two failed attempts** (both reverted):
+- Attempt 1 (`MappingProxyType`): 47 tests broken, 49 errors. `MappingProxyType` is not JSON-serializable, which broke `export_tree`
+- Attempt 2 (`_ReadOnlyAliasDict` without internal audit): 15 tests broken. `apply_schema` (line 9303) writes through `self.aliases[name] = expr`, a pre-existing silent no-op exposed by the property change
+
+**Root cause**: An internal code path in `apply_schema()` used the public `aliases` property as a write surface. The audit found exactly 1 such site (line 9303).
+
+**Part A: Internal write redirection** (`9fe1e620`, behavior-neutral):
+- Single-line redirect: `self.aliases[name] = expr` → `self._restore_aliases_from_dict({name: expr})`
+- Zero test delta (behavior-neutral)
+
+**Part B: Property hardening** (`99881100`):
+- `_ReadOnlyAliasDict(dict)` subclass: blocks `__setitem__`, `__delitem__`, `update`, `pop`, `popitem`, `clear`; `setdefault` returns the existing value for a present key (read path) and raises for an absent key (write path); `__reduce__` for pickle
+- `_ReadOnlyConstantAliasSet(set)` subclass: blocks `add`, `remove`, `discard`, `pop`, `clear`, `update`, `intersection_update`, `difference_update`, `symmetric_difference_update`; `__reduce__` for pickle
+- Three properties return read-only views: `aliases`, `alias_dtypes`, `constant_aliases`
+- Updated 6 existing MutationSafety tests to assert `TypeError`
+- New tests: V8_1-V8_9 (9 tests), V9_1-V9_3 (3 tests), N1_11+N1_11b (2 tests) = 14 total
+
+**Key decisions**:
+- `dict` subclass (not `MappingProxyType`) preserves `isinstance(x, dict)` and JSON serialization
+- Module-level classes with leading underscore
+- Parameterized mutation message per property
+
+**Test results**: 1579 passed; 6 failed + 1 error, all pre-existing.
+
+---
+
+### Phase 13.25.ADF: Quantiles Pass-Through Tests
+**Date**: 2026-05-05
+**Status**: ✅ Merged (informal, test infrastructure only)
+**Commit**: `16c3ca4c`
+
+No formal phase: test infrastructure verifying that ADF correctly forwards the `quantiles=`, `central=`, and `quantile_mode=` kwargs to dfdraw `profile()`. Added after dfdraw Phase 13.25.DF shipped quantile support.
+
+**Tests**: Q1_1-Q1_5 (5 tests): error_bars via ADF, band via ADF, parity ADF vs dfdraw (stats numerical equality), central='median' forwarded, group_by + quantiles.
+
+**Test results**: 1584 passed; 7 failed + 1 error, all pre-existing.
+
+---
+
 ## Bug Fixes
 
 ### BUG_AliasDataFrame_20260420_draw_selection_alias
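The Phase 13.23.ADF step 2 notes in this hunk describe resolving `A.B.C.val` with a greedy left→right walk, a depth limit, and a `visited_ids` cycle guard. A minimal Python sketch of that walk follows. It is not the `_prepare_subframe_joins` code. `Frame`, `resolve_dotted()`, and the choice to raise `KeyError` when a middle segment is not a subframe are assumptions made for illustration, and the bottom-up scatter step is left out.

```python
from dataclasses import dataclass, field

MAX_SUBFRAME_DEPTH = 10  # limit quoted in the Phase 13.23.ADF notes


@dataclass(eq=False)
class Frame:
    """Hypothetical stand-in for a frame with registered subframes."""
    name: str
    columns: set = field(default_factory=set)
    subframes: dict = field(default_factory=dict)  # segment name -> Frame


def resolve_dotted(root, expr):
    """Split 'A.B.C.val' into (["A", "B", "C"], "val") by a greedy walk.

    Each segment that names a registered subframe of the current frame
    descends one level; the first segment that does not is the leaf column.
    """
    segments = expr.split(".")
    frame, chain = root, []
    visited_ids = {id(root)}
    for i, seg in enumerate(segments):
        sub = frame.subframes.get(seg)
        if sub is None:
            if i != len(segments) - 1:
                # Assumption: a non-subframe segment in the middle is an error.
                raise KeyError(f"{seg!r} is not a subframe of {frame.name!r}")
            if chain and seg not in frame.columns:
                # Confirmed subframe reference, but the leaf column is invalid.
                raise KeyError(f"column {seg!r} not found in subframe {frame.name!r}")
            return chain, seg
        if id(sub) in visited_ids:
            raise ValueError(f"subframe cycle at {'.'.join(segments[: i + 1])}")
        if len(chain) >= MAX_SUBFRAME_DEPTH:
            raise ValueError(f"subframe nesting exceeds {MAX_SUBFRAME_DEPTH}")
        visited_ids.add(id(sub))
        chain.append(seg)
        frame = sub
    raise KeyError(f"{expr!r} names only subframes, no leaf column")


c = Frame("C", columns={"val"})
b = Frame("B", subframes={"C": c})
a = Frame("A", subframes={"B": b})
root = Frame("root", columns={"x"}, subframes={"A": a})
print(resolve_dotted(root, "A.B.C.val"))  # (['A', 'B', 'C'], 'val')
```

Tracking the `id()` of each visited frame catches a frame registered under itself, directly or indirectly, before the depth limit is reached. The depth limit then only has to bound chains that are acyclic but very deep.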
@@ -294,6 +373,21 @@ Mode #11 discipline).
 
 **Discovered**: O2DistAI Phase 0.3 (`makeTrackPairGB` QA plots).
 
+### BUG_AliasDataFrame_20260427_alias_invalidation
+**Dates**: 2026-04-27
+**Status**: ✅ Fixed (included in the Phase 13.23.ADF step 2 commit)
+**Commit**: `878ee941` (part of Phase 13.23.ADF)
+
+**Problem**: `add_alias()` updates the expression in `_schema["columns"]` but does NOT drop the old materialized column from `self.df`, so stale values persist silently. Aliases that transitively depend on the redefined alias also keep their stale materialized values. Production impact: iterative calibration using the coefficient-swap pattern produces wrong corrections.
+
+**Root cause**: No invalidation mechanism. `add_alias()` overwrites the schema entry but never checks whether the old value was materialized.
+
+**Fix**: New `_invalidate_alias_cascade(name)` method. It builds a reverse dependency map via `_resolve_dependencies()`, runs a BFS from the changed alias to find all transitive dependents, and drops every stale materialized column. Raw columns are never dropped. The method is called in `add_alias()` after the schema write.
+
+**Tests**: V1-V7 (7 invariance tests): basic redefine, cascade, 3-level cascade, unrelated not dropped, raw columns protected, new alias no drop, production pattern (iterative calibration with subframe coefficient swap).
+
+**Severity**: P0 Safety (silent wrong results in iterative calibration workflows).
+
 ### BUG_AliasDataFrame_20260324_draw_subframe_resolution
 **Dates**: 2026-03-25
 **Status**: ✅ Fixed
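The fix for BUG_AliasDataFrame_20260427 (reverse dependency map, BFS from the redefined alias, drop stale materialized columns, keep raw columns) can be sketched with pandas. This is an illustration, not the ADF method. `invalidate_alias_cascade()` is a hypothetical free function, and the `deps` mapping (alias name to the names it reads) stands in for whatever `_resolve_dependencies()` actually returns.

```python
from collections import deque

import pandas as pd


def invalidate_alias_cascade(df, deps, raw_columns, name):
    """Drop the materialized columns of `name` and of every alias depending on it.

    deps: alias -> iterable of names it reads (aliases or raw columns).
    Returns the set of alias names that were invalidated.
    """
    # Reverse map: dependency -> aliases that read it.
    reverse = {}
    for alias, uses in deps.items():
        for dep in uses:
            reverse.setdefault(dep, set()).add(alias)

    # BFS from the changed alias along the reverse edges.
    stale = {name}
    queue = deque([name])
    while queue:
        current = queue.popleft()
        for dependent in reverse.get(current, ()):
            if dependent not in stale:
                stale.add(dependent)
                queue.append(dependent)

    # Only materialized alias columns are dropped; raw columns are protected.
    to_drop = [col for col in stale if col in df.columns and col not in raw_columns]
    df.drop(columns=to_drop, inplace=True)
    return stale


df = pd.DataFrame({"x": [1.0, 2.0], "a": [2.0, 4.0], "b": [3.0, 5.0], "u": [0.0, 0.0]})
deps = {"a": ["x"], "b": ["a"], "u": ["x"]}  # b depends on a; u only on raw x
stale = invalidate_alias_cascade(df, deps, raw_columns={"x"}, name="a")
print(sorted(stale), sorted(df.columns))  # ['a', 'b'] ['u', 'x']
```

An alias that was never materialized (the "new alias, no drop" case) finds nothing to drop. The `raw_columns` filter is a guard in case a raw column name ever appears in the stale set.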
@@ -679,6 +773,11 @@ All major decisions require consensus from 3+ AI reviewers:
 | Evaluator schema stores contract only | User must re-register after load; GBAI handles serialization |
 | Multi-predictor requires explicit selection | Silent default on multi-predictor is P1 violation |
 | pd.merge on index columns only | Memory O(N × index_cols), not O(N × all_cols) |
+| `_ReadOnlyAliasDict(dict)` not `MappingProxyType` | JSON-serializable + `isinstance(x, dict)` True; `MappingProxyType` broke 47 tests |
+| Two-cycle merge for property changes | Audit internal writes (Part A) before changing return type (Part B); validated by 2 failed single-cycle attempts |
+| `_restore_aliases_from_dict` as sanctioned write path | 3 independent reviewers (GPT10, GPT11, Claude37) converged; avoids spreading `setdefault({})["expr"]` across the file |
+| Greedy left→right walk for multi-level subframes | `A.B.C.val` parsed segment-by-segment; first non-subframe segment = leaf column |
+| MAX_SUBFRAME_DEPTH = 10 + visited_ids cycle guard | Prevents infinite recursion on self-referential subframe registration |
 
 ---
 
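The decision to return a `dict` subclass instead of a `MappingProxyType` can be checked with the standard library alone. `ReadOnlyAliasDict` below is a hypothetical re-creation of the Part B behavior listed for Phase 13.24.ADF: blocked mutators, a split `setdefault`, and `__reduce__` for pickle. Its `label` parameter and error text are illustrative. It also blocks `__ior__` (`d |= other`), which the phase notes do not mention but which would otherwise mutate a dict subclass without going through an overridden `update`.

```python
import json
import pickle
from types import MappingProxyType


class ReadOnlyAliasDict(dict):
    """Hypothetical read-only view: still a dict, still JSON/pickle friendly."""

    def __init__(self, *args, label="aliases", **kwargs):
        super().__init__(*args, **kwargs)  # dict.__init__ fills storage directly
        self._label = label  # property name used in the error message

    def _blocked(self, *args, **kwargs):
        raise TypeError(f"{self._label} is read-only")

    # Every mutating entry point routes to the same error.
    __setitem__ = __delitem__ = __ior__ = _blocked
    update = pop = popitem = clear = _blocked

    def setdefault(self, key, default=None):
        if key in self:  # read path: acts like a plain lookup
            return self[key]
        self._blocked()  # write path: would insert a new key, so refuse

    def __reduce__(self):
        # Rebuild from a plain dict copy. Default dict-subclass pickling
        # replays items through __setitem__, which is blocked here.
        return (self.__class__, (dict(self),), {"_label": self._label})


aliases = ReadOnlyAliasDict({"pt2": "px**2 + py**2"})
print(isinstance(aliases, dict), json.dumps(aliases))  # True {"pt2": "px**2 + py**2"}
try:
    aliases["pt2"] = "0"
except TypeError as err:
    print(err)  # aliases is read-only
try:
    json.dumps(MappingProxyType(dict(aliases)))
except TypeError as err:
    print("MappingProxyType:", err)  # MappingProxyType: Object of type mappingproxy is not JSON serializable
restored = pickle.loads(pickle.dumps(aliases))
print(type(restored).__name__, restored == aliases)  # ReadOnlyAliasDict True
```

Because the object passes `isinstance(x, dict)`, code such as `json.dumps` that accepts plain dicts keeps working unchanged. This is the property the `MappingProxyType` attempt lost.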
@@ -694,8 +793,8 @@ All major decisions require consensus from 3+ AI reviewers:
 | Phase 5 | 25× | vs TTree::Draw |
 | Phase 13.9 | 42× | Numba polynomial evaluator vs eval |
 | Phase 13.20 | ~29s saved | export_tree metadata batching (1 TFile.Open vs N+1) |
-| Phase 13.21 | ~40-50s expected | Join cache survives materialize_aliases |
-| **Production** | **2× (1452→722s)** | **Cross-team: GB + ADF + O2DistAI fixes combined** |
+| Phase 13.21 | ~37s saved | Join cache survives materialize_aliases (56s → 19.5s) |
+| **Production** | **2.1× (1452→692s)** | **Cross-team: GB + ADF + O2DistAI fixes combined** |
 
 ### Memory Savings
 
@@ -742,6 +841,11 @@ Remaining overhead is Python/Pandas framework cost.
 | BUG draw_selection_alias | 5 (S1-S4) | 1528 |
 | BUG dtype_loss_subframe | 5 (D1-D5) | 1533 |
 | BUG draw_index_col | 6 (S5_1-S5_6) | 1538 |
+| BUG alias_invalidation | 7 (V1-V7) | 1545 |
+| 13.23.ADF step 1 | 10 (T1-T10) | 1548 |
+| 13.23.ADF step 2 | 11 (N1_0-N1_10) | 1566 |
+| 13.24.ADF Part B | 14 (V8+V9+N1_11) | 1579 |
+| 13.25.ADF (Q1) | 5 (Q1_1-Q1_5) | 1584 |
 
 ---
 
@@ -750,15 +854,19 @@ Remaining overhead is Python/Pandas framework cost.
 - [x] ~~CAPABILITY_MATRIX.md creation~~ (Phase 13.11)
 - [x] ~~PHASE_BEGIN_AliasDataFrame tag~~ (Phase 13.20 close)
 - [x] ~~`read_tree` recursive subframe loading~~ (Phase 13.22)
-- [ ] Phase 13.23.ADF: Multi-level dotted expression resolution (v1.2 approved)
+- [x] ~~Phase 13.23.ADF: Multi-level dotted expression resolution~~ (v1.2 approved, merged 2026-04-29)
+- [x] ~~Phase 13.24.ADF: Read-only aliases hardening~~ (v1.2 approved, merged 2026-04-30)
 - [ ] A2: LZ4 default compression (one-line + compat test, ~15-20s savings)
 - [ ] A3: Batch metadata serialization (~50-55s savings, needs minimal-UserInfo approach)
-- [ ] GB tuple support for `linear_columns` (PolynomialSpec production blocker)
+- [ ] Phase 14: ADFStore concept (formal architect review proposal needed)
+- [ ] AD-50 Option C1b: `_cached_last_ax` for Drawer state (~15 lines, deferred, not urgent)
 - [ ] Technical Summary v1.6 full public API documentation (~90 methods)
 - [ ] P1 tests: I2_6, I4_2, I4_3 fixes
 - [ ] Fix `register_subframe_lazy()` bug (BUG_AliasDataFrame_20260116)
 - [ ] Axis title lookup for subframe columns (`Sub_dy` vs `Side.dy`)
 - [ ] draw() lazy=False doesn't materialize selection aliases (S1 xfail)
+- [ ] draw() resolver unification (3 parallel implementations)
+- [ ] Feature taxonomy: 61 unmatched tests remaining (schema-versioning, fill-handling gaps)
 
 ---
 