
Commit dba7757

miranov25 committed
docs: Phase 13.22 + 3 bug fixes in PHASE_HISTORY, S5_4 test fix
PHASE_HISTORY.md: add Phase 13.22 (recursive read_tree), three bug fix entries (draw_selection_alias, dtype_loss_subframe, draw_index_col). Update test counts (1521 → 1538), pending items, overview metrics. CAPABILITY_MATRIX.md: updated from test run (1538 passed). test_S5_draw_index_col_collision.py: S5_4 uses deterministic fixture (review P1-1 — random fixture could miss test point).
1 parent d318852 commit dba7757

3 files changed

Lines changed: 146 additions & 71 deletions


UTILS/dfextensions/AliasDataFrame/docs/CAPABILITY_MATRIX.md

Lines changed: 48 additions & 45 deletions
@@ -1,23 +1,23 @@
11
# Capability Matrix — AliasDataFrame
22

3-
**Generated:** 2026-04-12 11:11 UTC
3+
**Generated:** 2026-04-26 07:48 UTC
44
**Phase:** 13.11.B
5-
**Taxonomy:** 41 features (PHASE_13_11_B approved)
5+
**Taxonomy:** 44 features (PHASE_13_11_B approved)
66
**Generator:** `scripts/generate_capability_matrix.py` v2 (taxonomy-based)
77

88
## Summary
99

1010
| Status | Count | % |
1111
|--------|------:|--:|
12-
| ✅ Verified | 23 | 56% |
13-
| ☑️ Smoke-only | 14 | 34% |
14-
| 🧨 Broken | 3 | 7% |
12+
| ✅ Verified | 26 | 59% |
13+
| ☑️ Smoke-only | 14 | 31% |
14+
| 🧨 Broken | 3 | 6% |
1515
| 📋 Planned | 1 | 2% |
16-
| **Total features** | **41** | |
17-
| **Matched tests** | **1525** | |
18-
| **Invariance tests** | **114** | |
16+
| **Total features** | **44** | |
17+
| **Matched tests** | **1536** | |
18+
| **Invariance tests** | **125** | |
1919

20-
**Unmatched tests:** 42 (not mapped to any feature)
20+
**Unmatched tests:** 88 (not mapped to any feature)
2121

2222
## CORE
2323

@@ -60,6 +60,9 @@
6060
|| **FUNC.polynomial** — PolynomialSpec & register_polynomial_from_subframe | 20 | 20 | 0 | 3 |
6161
|| **FUNC.evaluator** — register_evaluator | 24 | 24 | 0 | 3 |
6262
|| **FUNC.persistence** — Function persistence through schema | 9 | 9 | 0 | 2 |
63+
|| **FUNC.regression_metadata** — Regression metadata registration & update | 4 | 4 | 0 | 4 |
64+
|| **FUNC.evaluator_from_metadata** — Bridge: metadata → evaluator binding | 6 | 6 | 0 | 6 |
65+
|| **FUNC.regression_persistence** — Regression metadata schema roundtrip | 1 | 1 | 0 | 1 |
6366

6467
## DRAWING
6568

@@ -118,52 +121,52 @@
118121
## 🧨 Broken Features — Details
119122

120123
### COMP.roundtrip
121-
-`test_invariance_compression.py::TestInvarianceCompression::test_I4_3_asinh_compression_roundtrip`
122124
-`test_invariance_compression.py::TestInvarianceCompression::test_I4_2_scaled_linear_compression_roundtrip`
125+
-`test_invariance_compression.py::TestInvarianceCompression::test_I4_3_asinh_compression_roundtrip`
123126

124127
### BACK.invariance
125128
-`test_invariance_backend.py::TestInvarianceBackend::test_I2_6_chained_subframe_expressions_numba_vs_numpy`
126129

127130
### RDF.export
128-
-`test_AliasDataFrameRDF.py::TestTMemFileBranch::test_missing_keys_in_friend`
129-
-`test_AliasDataFrameRDF.py::TestRDataFrameFriendAccess::test_composite_index_friend`
130131
-`test_AliasDataFrameRDF.py::TestAddDefinesCollision::test_collision_from_friend_tree`
132+
-`test_AliasDataFrameRDF.py::TestRDataFrameFriendAccess::test_composite_index_friend`
133+
-`test_AliasDataFrameRDF.py::TestTMemFileBranch::test_missing_keys_in_friend`
131134

132135
## Unmatched Tests
133136

134-
42 tests not mapped to any feature.
135-
136-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_aliases_preserved`
137-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_backward_compatibility_no_metadata`
138-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_basic_read`
139-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_entry_range_start_stop`
140-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_entry_range_stop`
141-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_invalid_tree_raises_error`
142-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_subframe_loaded`
143-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_subframe_warning_with_entry_range`
144-
- `test_alias_dataframe.py::TestReadTreeOptimized::test_threaded_vs_unthreaded_equivalence`
145-
- `test_alias_dataframe.py::TestReadTreeWithCompression::test_compressed_columns_dtype_restored`
146-
- `test_alias_dataframe.py::TestReadTreeWithCompression::test_compression_info_preserved`
147-
- `test_alias_dataframe.py::TestReadTreeWithCompression::test_decompression_alias_works`
148-
- `test_alias_dataframe.py::TestReadTreeWithCompression::test_entry_range_with_compression`
149-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_groups_roundtrip`
150-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_groups_simple_lists`
151-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_order_agnostic_load`
152-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_order_canonical`
153-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_order_full`
154-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_order_strict_positions`
155-
- `test_alias_dataframe.py::TestSchemaV2Ordering::test_schema_v2_without_groups`
156-
- `test_fill_handling.py::TestModeComparison::test_direct_and_safe_give_same_values_when_no_missing`
157-
- `test_fill_handling.py::TestMultipleSubframes::test_different_fill_per_subframe`
158-
- `test_fill_handling.py::TestSubframeFillConfig::test_clear_subframe_fill`
159-
- `test_fill_handling.py::TestSubframeFillConfig::test_clear_subframe_fill_validates_subframe`
160-
- `test_fill_handling.py::TestSubframeFillConfig::test_set_subframe_fill_partial_override`
161-
- `test_fill_handling.py::TestSubframeFillConfig::test_set_subframe_fill_rejects_unknown_fill_mode`
162-
- `test_fill_handling.py::TestSubframeFillConfig::test_set_subframe_fill_stores_config`
163-
- `test_fill_handling.py::TestSubframeFillConfig::test_set_subframe_fill_unknown_subframe_raises`
164-
- `test_fill_handling.py::TestSubframeFillConfig::test_subframe_config_overrides_global`
165-
- `test_fill_handling.py::TestSubframeFillConfig::test_subframe_fill_mode_overrides_global`
166-
- ... +12 more
137+
88 tests not mapped to any feature.
138+
139+
- `test_D1_dtype_subframe_join.py::TestDtypeLossSubframeJoin::test_D1_int8_dtype_preserved_through_join`
140+
- `test_D1_dtype_subframe_join.py::TestDtypeLossSubframeJoin::test_D2_bool_dtype_preserved_through_join`
141+
- `test_D1_dtype_subframe_join.py::TestDtypeLossSubframeJoin::test_D3_float_dtype_unaffected`
142+
- `test_D1_dtype_subframe_join.py::TestDtypeLossSubframeJoin::test_D4_no_missing_keys_no_warning`
143+
- `test_D1_dtype_subframe_join.py::TestDtypeLossSubframeJoin::test_D5_boolean_and_operator_works`
144+
- `test_E1_export_tree_roundtrip.py::TestExportTreeMetadataRoundtrip::test_E1_1_basic_roundtrip_no_subframes`
145+
- `test_E1_export_tree_roundtrip.py::TestExportTreeMetadataRoundtrip::test_E1_2_roundtrip_3_subframes`
146+
- `test_E1_export_tree_roundtrip.py::TestExportTreeMetadataRoundtrip::test_E1_3_roundtrip_production_scale_subframes`
147+
- `test_E1_export_tree_roundtrip.py::TestExportTreeMetadataRoundtrip::test_E1_4_roundtrip_timing_report`
148+
- `test_E2_export_tree_fix_a.py::TestE2ExportTreeFixA::test_E2_1_single_tfile_open_per_export`
149+
- `test_E2_export_tree_fix_a.py::TestE2ExportTreeFixA::test_E2_2_nested_subframe_roundtrip`
150+
- `test_E2_export_tree_fix_a.py::TestE2ExportTreeFixA::test_E2_3_read_tree_backward_compatibility`
151+
- `test_E2_export_tree_fix_a.py::TestE2ExportTreeFixA::test_E2_4_standalone_write_metadata_to_root`
152+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_10_dematerialize_drop_keep_mutual_exclusion`
153+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_1_cached_equals_uncached`
154+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_2_cache_survives_materialize_aliases`
155+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_3_cache_invalidates_on_register_subframe`
156+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_4_cache_invalidates_on_subframe_data_change`
157+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_5_multi_subframe_pipeline`
158+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_6_dematerialize_drop_and_recover`
159+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_7_dematerialize_keep`
160+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_8_dematerialize_all`
161+
- `test_J1_join_cache.py::TestJ1JoinCacheCorrectness::test_J1_9_dematerialize_ignores_raw_columns`
162+
- `test_J1_join_cache.py::TestJ2JoinCachePerformance::test_J2_1_cache_hit_count_across_materialize_calls`
163+
- `test_K1_vector_draw_kwarg_diagnostic.py::TestK1VectorDrawKwargDiagnostic::test_K1_1_draw_accepts_all_documented_kwargs`
164+
- `test_K1_vector_draw_kwarg_diagnostic.py::TestK1VectorDrawKwargDiagnostic::test_K1_2_draw_forwards_kwargs_to_dfdraw`
165+
- `test_K1_vector_draw_kwarg_diagnostic.py::TestK1VectorDrawKwargDiagnostic::test_K1_3_draw_batch_forwards_batch_kwargs`
166+
- `test_K1_vector_draw_kwarg_diagnostic.py::TestK1VectorDrawKwargDiagnostic::test_K1_4_draw_figures_forwards_figure_kwargs`
167+
- `test_K1_vector_draw_kwarg_diagnostic.py::TestK1VectorDrawKwargDiagnostic::test_K1_5_vector_expression_each_call_gets_full_kwargs`
168+
- `test_K2_vector_draw_end_to_end.py::TestK2VectorDrawEndToEnd::test_K2_1_scalar_groupby_baseline`
169+
- ... +58 more
167170

168171
---
169172
*Generated from pytest JSON + feature_taxonomy.py (v2 taxonomy-based).*

UTILS/dfextensions/AliasDataFrame/docs/PHASE_HISTORY.md

Lines changed: 68 additions & 4 deletions
@@ -42,8 +42,8 @@ AliasDataFrame is a high-performance data analysis framework for particle physic
4242

4343
**Key Metrics:**
4444
- Performance: 60-770x speedups achieved; production pipeline 2× faster (1452s → 722s)
45-
- Test Coverage: 1521 tests passing, 125 invariance tests
46-
- Lines of Code: ~12,800 (AliasDataFrame.py)
45+
- Test Coverage: 1538 tests passing, 125 invariance tests
46+
- Lines of Code: ~12,930 (AliasDataFrame.py)
4747
- Features: 44 in taxonomy (26 verified, 14 smoke-only, 3 broken, 1 planned)
4848

4949
**Development Team:**
@@ -234,8 +234,66 @@ Mode #11 discipline).
234234

235235
---
236236

237+
### Phase 13.22.ADF: Recursive Subframe Loading in read_tree
238+
**Dates**: 2026-04-19
239+
**Status**: ✅ Merged
240+
**Commit**: `cf62cf33`
241+
242+
**Fix**: One-line change — `read_tree` line 5298 `load_subframes=False` → `True`. `export_tree` already writes nested subframes recursively (Phase 13.20). `read_tree` now loads them recursively too.
243+
244+
**Result**: E2_2 (nested subframe roundtrip) changed from FAILING to PASSING. 1523 passed (+2).
245+
246+
**Out of scope**: Multi-level dotted expression resolution (`Outer.Inner.val`) — deferred to Phase 13.23.ADF proposal. Metadata skip (A3) — reverted, needs minimal-UserInfo approach.
247+
248+
**Test results**: 1523 passed, 6F+1E pre-existing.
249+
250+
---
251+
237252
## Bug Fixes
238253

254+
### BUG_AliasDataFrame_20260420_draw_selection_alias
255+
**Dates**: 2026-04-20
256+
**Status**: ✅ Fixed
257+
**Commit**: `6f93e2d1`
258+
259+
**Problem**: `draw_batch()` and `draw_figures()` do not pass `selection` or `weights` to `_parse_expr_aliases()`. Aliases used only in selections (e.g., `isNotEdge`) are never auto-materialized. `pandas.eval` fails with `name 'isNotEdge' is not defined`.
260+
261+
**Root cause**: Two call sites pass `(expr, group_by, color)` but omit `selection=` and `weights=`. `draw()` was correct (already passes all params).
262+
263+
**Fix**: 2 lines per method — pass `selection` and `weights`.
264+
265+
**Tests**: S1 (xfail — draw lazy=False limitation), S1b, S2, S3, S4.
266+
267+
**Discovered**: Production QA (`makeIterationFit123_QA`) on gr17.
268+
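The failure mode in the draw_selection_alias entry can be pictured with a small stand-in for the alias parser. This is only a sketch: `referenced_names` is a hypothetical helper, not the real `_parse_expr_aliases()`, and it assumes every expression is valid Python syntax. It shows why a call site that omits `selection=` never sees `isNotEdge`.

```python
import ast


def referenced_names(expr, selection=None, weights=None):
    """Collect identifiers used in the plotted expression, selection and weights.

    Hypothetical illustration only; the real parser lives in AliasDataFrame.
    """
    names = set()
    for text in (expr, selection, weights):
        if not text:
            continue  # argument not supplied by the call site
        for node in ast.walk(ast.parse(text, mode="eval")):
            if isinstance(node, ast.Name):
                names.add(node.id)
    return names


# Call site before the fix: selection is not forwarded.
before_fix = referenced_names("dy")

# Call site after the fix: selection is forwarded.
after_fix = referenced_names("dy", selection="isPrimITS & isNotEdge")
```

With the selection forwarded, the alias names that appear only in the cut become candidates for auto-materialization before `pandas.eval` runs.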
269+
### BUG_AliasDataFrame_20260424_dtype_loss_subframe_join
270+
**Dates**: 2026-04-24
271+
**Status**: ✅ Fixed
272+
**Commit**: `bfb4d22f`
273+
274+
**Problem**: Aliases declared with integer/bool dtype lose their dtype through subframe joins. The join produces NaN for missing keys; pandas raises `IntCastingNaNError` on `.astype(int8)`; the cast fails silently; result stays float32. Downstream bool operators (`isPrimITS & isNotEdge`) fail.
275+
276+
**Root cause**: Three `.astype()` calls caught only `AttributeError`, not `IntCastingNaNError`.
277+
278+
**Fix**: New `_safe_dtype_cast()` helper fills NaN with 0 (int) or False (bool) before casting, with `RuntimeWarning`. Replaces all 3 raw `.astype()` calls.
279+
280+
**Tests**: D1-D5 (int8, bool, float unaffected, no-NaN no-warning, bool & bool production pattern).
281+
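A minimal sketch of the idea behind the dtype fix, assuming a pandas Series as input. The function name `safe_dtype_cast` and its exact behavior are illustrative assumptions; the real `_safe_dtype_cast()` may differ in signature and in how it fills. Here NaN is filled with 0, which becomes `False` under a bool cast.

```python
import warnings

import numpy as np
import pandas as pd


def safe_dtype_cast(series, dtype):
    """Cast a Series to dtype, filling NaN first for integer/bool targets.

    Illustrative stand-in, not the library's helper.
    """
    dtype = np.dtype(dtype)
    # 'i' = signed int, 'u' = unsigned int, 'b' = bool: none can hold NaN.
    if dtype.kind in "iub" and series.isna().any():
        warnings.warn(
            f"NaN values filled before cast to {dtype}",
            RuntimeWarning,
            stacklevel=2,
        )
        series = series.fillna(0)  # 0 casts to False for bool targets
    return series.astype(dtype)


# A join with missing keys typically leaves NaN in a float column.
joined = pd.Series([1.0, np.nan, 3.0], dtype=np.float32)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    as_int8 = safe_dtype_cast(joined, np.int8)
    as_bool = safe_dtype_cast(joined, bool)
```

Filling before the cast keeps the declared dtype, so bitwise operators such as `isPrimITS & isNotEdge` keep operating on bool columns instead of floats.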
282+
### BUG_AliasDataFrame_20260426_draw_index_col_collision
283+
**Dates**: 2026-04-26
284+
**Status**: ✅ Fixed
285+
**Commit**: `d3188527`
286+
287+
**Problem**: `adf.draw('Sub.col:Sub.index_col')` raises `KeyError` when the plotted column is also one of the subframe's `index_columns`. The draw resolver selects the column twice via `sf.df[index_cols + [col_name]]`, then `.rename()` renames both copies (pandas rename is name-based), destroying the join key.
288+
289+
**Root cause**: No guard for `col_name in index_cols` in two draw resolvers.
290+
291+
**Fix**: 5 lines per resolver — guard `col_name in index_cols`, copy + add instead of select + rename. Two locations: `draw()` and `draw_figures()`. `draw_batch()` uses direct index lookup — not affected.
292+
293+
**Tests**: S5_1-S5_6 (index col on x/y axis, selection, correctness, draw_figures, both axes as index cols).
294+
295+
**Discovered**: O2DistAI Phase 0.3 (`makeTrackPairGB` QA plots).
296+
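The collision can be reproduced with plain pandas. The sketch below uses a hypothetical helper `select_for_join` and a made-up output name `Stats_sec`; it is not the resolver code itself, only an illustration of the select + rename problem and of the "copy + add" guard described in the fix.

```python
import pandas as pd

sub = pd.DataFrame({"sec": [0, 1, 2], "dy_median": [10.0, 20.0, 30.0]})
index_cols = ["sec"]

# Select + rename: 'sec' is selected twice, and rename() is name-based,
# so both copies are renamed and the join key disappears.
broken = sub[index_cols + ["sec"]].rename(columns={"sec": "Stats_sec"})
broken_columns = list(broken.columns)


def select_for_join(sub_df, index_cols, col_name, out_name):
    """Return the index columns plus the requested column under out_name."""
    if col_name in index_cols:
        # Guard: keep the key columns untouched and add a renamed copy.
        out = sub_df[index_cols].copy()
        out[out_name] = sub_df[col_name]
        return out
    return sub_df[index_cols + [col_name]].rename(columns={col_name: out_name})


fixed = select_for_join(sub, index_cols, "sec", "Stats_sec")
main = pd.DataFrame({"sec": [2, 0]})
merged = main.merge(fixed, on="sec", how="left")
```

The guarded path leaves `sec` available as the merge key while still exposing the plotted values under the new column name.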
239297
### BUG_AliasDataFrame_20260324_draw_subframe_resolution
240298
**Dates**: 2026-03-25
241299
**Status**: ✅ Fixed
@@ -680,21 +738,27 @@ Remaining overhead is Python/Pandas framework cost.
680738
| 13.19.ADF.FIX1 | 8 (K1+K2) | 1499 |
681739
| 13.20.ADF | 8 (E1+E2) | 1510 |
682740
| 13.21.ADF | 11 (J1+J2) | 1521 |
741+
| 13.22.ADF | 0 (E2_2 fixed) | 1523 |
742+
| BUG draw_selection_alias | 5 (S1-S4) | 1528 |
743+
| BUG dtype_loss_subframe | 5 (D1-D5) | 1533 |
744+
| BUG draw_index_col | 6 (S5_1-S5_6) | 1538 |
683745

684746
---
685747

686748
## Pending Items
687749

688750
- [x] ~~CAPABILITY_MATRIX.md creation~~ (Phase 13.11)
689751
- [x] ~~PHASE_BEGIN_AliasDataFrame tag~~ (Phase 13.20 close)
752+
- [x] ~~`read_tree` recursive subframe loading~~ (Phase 13.22)
753+
- [ ] Phase 13.23.ADF — Multi-level dotted expression resolution (v1.2 approved)
690754
- [ ] A2 — LZ4 default compression (one-line + compat test, ~15-20s savings)
691-
- [ ] A3 — Batch metadata serialization (~20-30s savings)
692-
- [ ] `read_tree` recursive subframe loading (line 5172 `load_subframes=False` → `True`)
755+
- [ ] A3 — Batch metadata serialization (~50-55s savings, needs minimal-UserInfo approach)
693756
- [ ] GB tuple support for `linear_columns` (PolynomialSpec production blocker)
694757
- [ ] Technical Summary v1.6 full public API documentation (~90 methods)
695758
- [ ] P1 tests: I2_6, I4_2, I4_3 fixes
696759
- [ ] Fix `register_subframe_lazy()` bug (BUG_AliasDataFrame_20260116)
697760
- [ ] Axis title lookup for subframe columns (`Sub_dy` vs `Side.dy`)
761+
- [ ] draw() lazy=False doesn't materialize selection aliases (S1 xfail)
698762

699763
---
700764

UTILS/dfextensions/AliasDataFrame/tests/test_S5_draw_index_col_collision.py

Lines changed: 30 additions & 22 deletions
@@ -89,29 +89,37 @@ def test_S5_4_draw_values_match_subframe(self):
         """
         S5_4: values in drawn profile match manual subframe lookup.
         Not just no-crash — correctness check.
+        Uses deterministic fixture to guarantee test point exists.
         """
-        adf = _build_adf_with_index_col_subframe()
-        sf = adf.get_subframe('Stats')
-
-        # Materialize the subframe column via draw's lazy path
-        adf.materialize_aliases()  # ensure all aliases materialized
-
-        # The draw resolver creates Stats_dy_median on df_subset via merge.
-        # We verify the merge is correct by checking a few values manually.
-        # Pick sec=5, tgl=10 — look up dy_median from subframe
-        mask_main = (adf.df['sec'] == 5) & (adf.df['tgl'] == 10)
-        mask_sub = (sf.df['sec'] == 5) & (sf.df['tgl'] == 10)
-
-        if mask_main.any() and mask_sub.any():
-            expected = sf.df.loc[mask_sub, 'dy_median'].iloc[0]
-            # Use _prepare_subframe_joins to get the joined column
-            adf.add_alias('test_val', 'Stats.dy_median', dtype=np.float32)
-            adf.materialize_aliases(names=['test_val'])
-            actual = adf.df.loc[mask_main, 'test_val'].iloc[0]
-            np.testing.assert_allclose(
-                actual, expected, rtol=1e-5,
-                err_msg="S5_4: joined value doesn't match subframe lookup"
-            )
+        # Deterministic: every (sec, tgl) combination present
+        main_df = pd.DataFrame({
+            'sec': np.array([0, 1, 2, 0, 1, 2], dtype=np.int8),
+            'tgl': np.array([0, 0, 0, 1, 1, 1], dtype=np.int8),
+            'x': np.array([1, 2, 3, 4, 5, 6], dtype=np.float32),
+        })
+        sub_df = pd.DataFrame({
+            'sec': np.array([0, 1, 2, 0, 1, 2], dtype=np.int8),
+            'tgl': np.array([0, 0, 0, 1, 1, 1], dtype=np.int8),
+            'dy_median': np.array([10, 20, 30, 40, 50, 60], dtype=np.float32),
+        })
+
+        adf = AliasDataFrame(main_df)
+        sf = AliasDataFrame(sub_df)
+        adf.register_subframe('Stats', sf, index_columns=['sec', 'tgl'])
+
+        adf.add_alias('test_val', 'Stats.dy_median', dtype=np.float32)
+        adf.materialize_aliases(names=['test_val'])
+
+        # sec=1, tgl=0 → dy_median=20
+        mask = (adf.df['sec'] == 1) & (adf.df['tgl'] == 0)
+        assert mask.any(), "fixture must contain test point"
+        actual = adf.df.loc[mask, 'test_val'].iloc[0]
+        assert actual == 20.0, f"S5_4: expected 20.0, got {actual}"
+
+        # sec=2, tgl=1 → dy_median=60
+        mask2 = (adf.df['sec'] == 2) & (adf.df['tgl'] == 1)
+        actual2 = adf.df.loc[mask2, 'test_val'].iloc[0]
+        assert actual2 == 60.0, f"S5_4: expected 60.0, got {actual2}"
 
     def test_S5_5_draw_figures_index_col(self):
         """
