Commit 67d125e

miranov25 authored and committed
PHASE 13.40.DF — Cumulative histogram (cumulative=True/-1)
Adds cumulative: Union[bool, int] = False to hist(). Three values:
  True  → ascending cumulative (each bin = count ≤ right edge)
  False → regular histogram (default, byte-identical backward compat)
  -1    → survival function/CCDF (ROOT convention)

Implementation: ~40 LOC of source across 5 forwarding sites (CP1-2 recursive forwarding + the NEW faceted dispatch layer from CRR §2.3):
  Layer 1:  DFDraw.hist(cumulative=False): named param
  Layer 2:  draw_hist(cumulative=False): named param
  Layer 3:  _draw_hist_grouped(cumulative=False): named param (NEW)
  Layer 4a: ax.hist(cumulative=cumulative), ungrouped: site 1/4 (line 591)
  Layer 4b: ax.hist(cumulative=cumulative), stacked: site 2/4 (line 803, CP1-1)
  Layer 4c: ax.hist(cumulative=cumulative), overlaid with linestyle_cycle: site 3/4 (line 878, NEW)
  Layer 4d: ax.hist(cumulative=cumulative), overlaid default: site 4/4 (line 888, NEW)
  Side:     _dispatch_faceted_render(cumulative=cumulative): faceted layer (CRR §2.3)

Every layer forwards explicitly (cumulative=cumulative), NEVER via **kwargs or kwargs.get. This applies the Phase 13.39 §2.2 lesson recursively at every internal layer plus the faceted side branch (5 forwarding sites total).

Correctness guard (M5): hist_errors=True + cumulative=True (or any truthy value, including -1) → NotImplementedError. Poisson per-bin errors assume independent counts, whereas cumulative counts have correlated uncertainty; silently passing the combination through would render statistically wrong error bars on the ECDF.

CP1-3: drawer.py:3085 docstring fix; drops the stale norm='cumulative' option (the actual line was 3085, not 3002 as the v1.2 spec said, due to drift from the Phase 13.36-13.39 additions).
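The three cumulative= values can be modelled with plain NumPy. The sketch below is a hypothetical helper (cumulative_hist is not part of dfdraw) that follows matplotlib's documented rule that a number below 0 reverses the direction of accumulation; the probability=True branch assumes that "probability" means dividing by the number of in-range entries, which is how an ECDF ends at 1.0.

```python
import numpy as np


def cumulative_hist(x, bins, cumulative=False, probability=False):
    """NumPy model of hist(cumulative=...) bin heights (hypothetical helper).

    cumulative=False -> plain counts
    cumulative=True  -> ascending: entries up to each bin's right edge (CDF/ECDF)
    cumulative=-1    -> descending: entries from each bin's left edge up (survival)
    """
    counts, edges = np.histogram(x, bins=bins)
    heights = counts.astype(float)
    if probability:
        # Assumption: 'probability' normalises by the number of in-range entries.
        heights /= heights.sum()
    if cumulative:
        if cumulative < 0:
            heights = heights[::-1].cumsum()[::-1]  # survival / CCDF
        else:
            heights = heights.cumsum()              # CDF / ECDF
    return heights, edges


x = np.arange(10) + 0.5            # 0.5, 1.5, ..., 9.5
bins = np.linspace(0.0, 10.0, 6)   # 5 bins of width 2, two entries each
print(cumulative_hist(x, bins, cumulative=True)[0].tolist())  # [2.0, 4.0, 6.0, 8.0, 10.0]
print(cumulative_hist(x, bins, cumulative=-1)[0].tolist())    # [10.0, 8.0, 6.0, 4.0, 2.0]
```

Note that True == 1 in Python, so the sign test is what separates the ascending and descending cases.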
Compositions locked (10 tests):
  CH.1  monotone non-decreasing + total-N
  CH.2  norm='probability' + cumulative=True → ECDF (max=1.0)
  CH.3  cumulative=-1 + norm='probability' → survival (max=1.0)
  CH.4  cumulative=False byte-identical (backward compat)
  CH.5  group_by + cumulative + hist_norm → per-group ECDFs
  CH.6  hist_errors + cumulative → NotImplementedError (M5)
  CH.7  [x,y] vector + cumulative → both polygons saturate
  CH.8  facet_by + cumulative → per-facet cumulative (CRR §2.3 fix)
  CH.9  histtype='step' + cumulative → Polygon-safe vertex probe
  CH.10 group_by + stacked + cumulative → CP2-1 regression lock

Fix-at-code-time disclosures (CRR §2):
  §2.1: 4 ax.hist() call sites (not 3): Phase 13.37 split the overlaid path into linestyle_cycle + default branches.
  §2.2: Polygon-safe probes in ALL tests: the default histtype='stepfilled' produces a Polygon, not Rectangles (CP1-4 is broader than the spec).
  §2.3: _dispatch_faceted_render needs an explicit cumulative= forward (5th forwarding site, not in the v1.2 spec).
  §2.4: docstring line drift 3002 → 3085 (caught by the §8 protocol).

Tests: 913 → 923 (+10 §9 invariance tests: CH.1-10).

QRC v1.32 carry-forward #6 (NEW, recursive forwarding rule): 'Named param ≠ kwarg after FORWARDED_NAMES promotion applies recursively at EVERY forwarding layer including faceted/vector dispatch side branches. All N internal call sites must forward explicitly.'

Pre-existing failure (not Phase 13.40): test_vector_draw_kwarg_surface_enumeration fails on Linux Py3.12 (pandas StringDtype in _process_color; Phase 13.39 §2.5 carry-forward). Mac Py3.9.6 is unaffected.
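The failure mode behind carry-forward #6 is easy to reproduce in isolation. Once a keyword is promoted to a named parameter of a function, Python binds it to that parameter and it is no longer inside **kwargs, so any inner call that only splats **kwargs drops it without an error. The functions below are hypothetical stand-ins (fake_ax_hist plays the role of ax.hist, whose default for cumulative= is False); they are not dfdraw's code.

```python
from typing import Any, Union


def fake_ax_hist(data, **kwargs: Any) -> Union[bool, int]:
    """Stand-in for ax.hist(): report the cumulative value it actually received.

    A missing keyword falls back to False, i.e. a regular histogram.
    """
    return kwargs.get("cumulative", False)


def grouped_dropping(data, cumulative: Union[bool, int] = False, **hist_kwargs):
    # Bug pattern: cumulative is bound to the named parameter, so it is NOT
    # in **hist_kwargs, and this call never passes it on.
    return fake_ax_hist(data, **hist_kwargs)


def grouped_forwarding(data, cumulative: Union[bool, int] = False, **hist_kwargs):
    # Fixed pattern: forward the named parameter explicitly.
    return fake_ax_hist(data, cumulative=cumulative, **hist_kwargs)


def draw_hist_sketch(data, inner, cumulative: Union[bool, int] = False, **kwargs):
    # Outer layer: forwards explicitly as well, never via kwargs.get().
    return inner(data, cumulative=cumulative, **kwargs)


print(draw_hist_sketch([1, 2, 3], grouped_dropping, cumulative=True))    # False (silently dropped)
print(draw_hist_sketch([1, 2, 3], grouped_forwarding, cumulative=True))  # True
```

Because nothing raises in the buggy variant, only a test that inspects the drawn output (as CH.5, CH.8 and CH.10 do for their branches) catches a missed forwarding site.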
1 parent 46b7adf commit 67d125e

6 files changed

Lines changed: 439 additions & 10 deletions

UTILS/dfextensions/dfdraw/docs/CAPABILITY_MATRIX.md

Lines changed: 7 additions & 5 deletions
@@ -1,6 +1,6 @@
 # Capability Matrix — dfdraw
 
-**Generated:** 2026-05-21 20:47 UTC
+**Generated:** 2026-05-21 22:27 UTC
 **Phase:** 13.15.DF
 **Generator:** `scripts/generate_capability_matrix.py`
 **Sources:** `tests/feature_taxonomy.py` + `tests/test_layer_classification.py`
@@ -9,13 +9,13 @@
 
 | Status | Count | % |
 |--------|------:|--:|
-| ✅ Verified | 47 | 45% |
+| ✅ Verified | 48 | 45% |
 | ☑️ Smoke-only | 57 | 54% |
 | 🧨 Broken | 0 | 0% |
 | 📋 Planned | 1 | 1% |
-| **Total features** | **105** | |
-| **Total proof tests** | **468** | |
-| **Invariance tests** | **239** | |
+| **Total features** | **106** | |
+| **Total proof tests** | **478** | |
+| **Invariance tests** | **249** | |
 
 **Status key:**
 - ✅ Verified — has at least one invariance test (A ≡ B check)
@@ -167,6 +167,8 @@
 | | **SCATTER** | | |
 || **SCATTER.time_axis** — scatter() time_format= pre-conversion: x_data → matplotlib date numbers BEFORE ax.scatter/ax.errorbar — Phase 13.39.DF | 1 | 0 |
 || **SCATTER.scatter3d** — draw('z:y:x', type='scatter3d') → 3D point cloud via mpl_toolkits.mplot3d. Reuses Phase 13.38 _process_color() + _process_size() unchanged. color=/size= accept column names or df.eval() expressions. elev=/azim= for ax.view_init(). Stats dict locks mean_x AND mean_y AND mean_z to 1e-9 (CP1-3). Scope boundaries: group_by + scatter3d raises (CP2-1); same=True onto non-3D axes raises (CP2-2). 'y:x' (colon!=2) with type='scatter3d' raises with actionable message — Phase 13.39.DF | 8 | 0 |
+| | **HIST** | | |
+|| **HIST.cumulative** — hist() cumulative=True/-1/False — ROOT TH1::Draw('cumulative') equivalent. Three values: True (ascending CDF/ECDF), False (default, byte-identical backward compat), -1 (descending/survival, ROOT convention). matplotlib native cumulative= forwarded explicitly at 4 internal call sites (Phase 13.39 §2.2 lesson applied recursively: DFDraw.hist → draw_hist → _draw_hist_grouped → ax.hist; ALSO through _dispatch_faceted_render for facet_by composition). Composes with: norm='probability' (→ ECDF 0-1), group_by overlaid (per-group ECDFs), group_by stacked (CP2-1 regression lock for 3rd call site), facet_by (per-facet cumulative), histtype='step' (HEP-standard step ECDF). Correctness guard (M5): hist_errors+cumulative → NotImplementedError (Poisson per-bin errors are independent; cumulative counts are correlated). Vector dispatch [x,y] propagates cumulative correctly (Phase 13.16.DF FIX1 bug class lock) — Phase 13.40.DF | 10 | 0 |
 
 ---
 
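For the stacked composition (CH.10), a simple regression check is that the top layer of a stacked cumulative histogram saturates at the total number of entries across all groups, while each lower layer saturates at the running total of the groups beneath it. The NumPy model below is a hypothetical sketch of that expectation for unweighted counts, not dfdraw's implementation; with plain counts the two cumulative sums (over groups and over bins) commute, so their order does not matter here.

```python
import numpy as np


def stacked_cumulative(groups, bins):
    """Layer tops of a stacked, cumulative histogram (unweighted counts).

    Row i is the top of layer i (groups 0..i stacked), accumulated along the bins.
    """
    per_group = np.array([np.histogram(g, bins=bins)[0] for g in groups], dtype=float)
    stacked = per_group.cumsum(axis=0)   # stack groups on top of each other
    return stacked.cumsum(axis=1)        # accumulate along the bin axis


bins = np.linspace(0.0, 10.0, 6)
groups = [np.array([0.5, 2.5, 4.5]),                 # group A: 3 entries
          np.array([1.5, 3.5, 5.5, 7.5, 9.5])]       # group B: 5 entries
tops = stacked_cumulative(groups, bins)
print(tops.tolist())  # [[1.0, 2.0, 3.0, 3.0, 3.0], [2.0, 4.0, 6.0, 7.0, 8.0]]
```

If the stacked branch dropped cumulative=, the top layer would instead show per-bin stacked counts ([2, 2, 2, 1, 1] here) and never reach the total of 8.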
UTILS/dfextensions/dfdraw/drawer.py

Lines changed: 13 additions & 1 deletion
@@ -652,6 +652,8 @@ def _auto_label(self, y_expr, x_expr=None):
 'hist_errors', 'linestyle_cycle',
 # Phase 13.39.DF: time-axis formatting (pre-conversion approach)
 'time_format',
+# Phase 13.40.DF: cumulative histogram (CDF/ECDF/survival)
+'cumulative',
 )
 
 _SCATTER_FORWARDED_NAMES = (
@@ -3082,7 +3084,9 @@ def draw(
 stats : bool or list, optional
 Show statistics box. True for defaults, or list of stat names.
 norm : str, optional
-Histogram normalization: "count", "density", "probability", "cumulative".
+Histogram normalization: "count", "density", "probability".
+For cumulative distributions, use the `cumulative=True` parameter
+(Phase 13.40.DF) — NOT norm="cumulative" (raises ValueError).
 title : str, optional
 Plot title.
 ax : matplotlib Axes, optional
@@ -3273,6 +3277,8 @@ def hist(
 nan_policy: str = "filter",
 # Phase 13.39.DF: time-axis formatting (pre-conversion approach)
 time_format: Optional[str] = None,
+# Phase 13.40.DF: cumulative histogram (CDF/ECDF/survival)
+cumulative: Union[bool, int] = False,
 # Phase 13.27.DF Commit 2 FIX1 (§7b): weights as column name or
 # df.eval-able expression. Mirrors profile()'s weights= semantics.
 # If both `weights=` and `norm="probability"` are passed, the
@@ -3509,6 +3515,10 @@ def hist(
 group_by_quantiles=group_by_quantiles,
 hist_norm=hist_norm,
 min_entries=min_entries,
+# Phase 13.40.DF: cumulative must be explicit through faceted
+# dispatch (recursive QRC v1.32 #6 — every forwarding layer
+# must pass the named param explicitly). Locked by §9.CH.8.
+cumulative=cumulative,
 **kwargs
 )
 # Facet mode (legacy path, same=True ignored in facet mode)
@@ -3551,6 +3561,8 @@ def hist(
 linestyle_cycle=linestyle_cycle,
 # Phase 13.39.DF: time-axis formatting
 time_format=time_format,
+# Phase 13.40.DF: cumulative histogram (explicit forward)
+cumulative=cumulative,
 **kwargs
 )
 axes = ax
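The faceted forward added above (locked by CH.8) means each facet should show its own cumulative curve, saturating at that facet's entry count rather than at the global total. A minimal NumPy sketch of that expectation follows; per_facet_cumulative is a hypothetical helper that splits the data with boolean masks and does not reproduce _dispatch_faceted_render itself.

```python
import numpy as np


def per_facet_cumulative(values, facet_keys, bins, cumulative=True):
    """Cumulative counts computed independently for every facet value (hypothetical)."""
    values = np.asarray(values, dtype=float)
    facet_keys = np.asarray(facet_keys)
    out = {}
    for key in np.unique(facet_keys):
        counts, _ = np.histogram(values[facet_keys == key], bins=bins)
        if cumulative and cumulative < 0:
            counts = counts[::-1].cumsum()[::-1]   # survival within this facet
        elif cumulative:
            counts = counts.cumsum()               # ascending within this facet
        out[key] = counts
    return out


values = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
facets = ["a", "a", "b", "b", "b", "b"]
bins = np.linspace(0.0, 6.0, 4)                    # edges 0, 2, 4, 6
result = per_facet_cumulative(values, facets, bins)
print({str(k): v.tolist() for k, v in result.items()})  # {'a': [2, 2, 2], 'b': [0, 2, 4]}
```

Had the faceted layer dropped cumulative=, each facet would silently fall back to the default False and show plain per-bin counts.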

UTILS/dfextensions/dfdraw/plots/histogram.py

Lines changed: 49 additions & 4 deletions
@@ -240,6 +240,11 @@ def draw_hist(
 linestyle_cycle: bool = False,
 # Phase 13.39.DF: time-axis formatting (pre-conversion approach, CP1-4 auto-detect).
 time_format: Optional[str] = None,
+# Phase 13.40.DF: cumulative histogram (CDF/ECDF/survival).
+# False (default) → regular histogram (byte-identical backward compat)
+# True → ascending cumulative (each bin = count ≤ right edge)
+# -1 → descending / survival (ROOT convention)
+cumulative: Union[bool, int] = False,
 **kwargs
 ) -> Tuple[plt.Figure, plt.Axes, Dict[str, Any]]:
 """
@@ -353,6 +358,20 @@ def draw_hist(
 isinstance(x, str) and x in df.columns
 and np.issubdtype(df[x].dtype, np.datetime64)
 )
+
+# Phase 13.40.DF M5 correctness guard: hist_errors + cumulative is
+# statistically wrong. Poisson per-bin errors assume independent counts;
+# cumulative counts have correlated uncertainty. Locked by §9.CH.6.
+if hist_errors and cumulative:
+raise NotImplementedError(
+"hist_errors=True and cumulative=True cannot be composed: "
+"Poisson per-bin errors assume independent counts; cumulative "
+"counts have correlated uncertainty (each bin's error depends "
+"on all prior bins). Use cumulative=True without "
+"hist_errors=True, or use a single-bin approach for the "
+"threshold of interest."
+)
+
 if isinstance(x, str):
 x_name = x
 if _x_is_datetime:
@@ -558,6 +577,11 @@ def draw_hist(
 density=density, weights=_hist_weights,
 alpha=alpha, histtype=histtype, edgecolor=edgecolor,
 linewidth=linewidth,
+# Phase 13.40.DF CP1-2: cumulative forwarded explicitly to the
+# grouped path (covers BOTH stacked branch at ~line 803 AND the
+# 2 overlaid branches at ~847/853). Without this, all 3 grouped
+# call sites silently drop cumulative.
+cumulative=cumulative,
 **kwargs # group_by_bins/hist_norm/min_entries already consumed
 )
 stats_dict["grouped"] = True
@@ -567,7 +591,10 @@ def draw_hist(
 ax.hist(
 x_data, bins=bins, range=_used_range, density=density, weights=_hist_weights,
 color=color, alpha=alpha, histtype=histtype, edgecolor=edgecolor,
-linewidth=linewidth, label=label, **kwargs
+linewidth=linewidth, label=label,
+# Phase 13.40.DF: cumulative histogram (explicit forward — call site 1/4)
+cumulative=cumulative,
+**kwargs
 )
 # Phase 13.37.DF: Poisson error bar overlay for ungrouped path.
 # CP1-7: use edges from np.histogram() return (bins= may be int).
@@ -694,6 +721,10 @@ def _draw_hist_grouped(
 # Phase 13.37.DF: pre-computed per-row weights (forwarded from draw_hist
 # for hist_errors weighted-Poisson computation in the grouped path).
 _hist_weights_arr: Optional[np.ndarray] = None,
+# Phase 13.40.DF CP1-2: cumulative histogram. Recursive named-param
+# forwarding per QRC v1.32 #6 — DFDraw.hist → draw_hist → _draw_hist_grouped
+# → ax.hist. NEVER access via kwargs.get/**hist_kwargs.
+cumulative: Union[bool, int] = False,
 **hist_kwargs
 ) -> int:
 """Draw grouped/overlaid histograms.
@@ -802,7 +833,12 @@ def _draw_hist_grouped(
 hist_kwargs.pop('edgecolor', None)
 ax.hist(data_list, bins=bins_arg, label=labels,
 color=surviving_colors, edgecolor=_stacked_ec,
-stacked=True, **hist_kwargs)
+stacked=True,
+# Phase 13.40.DF CP1-1: cumulative forwarded explicitly (call
+# site 2/4 — stacked branch). v1.1 spec missed this site;
+# added in v1.2 per Sonet50 panel. Locked by §9.CH.10.
+cumulative=cumulative,
+**hist_kwargs)
 return len(data_list)
 else:
 # Overlaid histograms — one ax.hist call per surviving group.
@@ -847,13 +883,22 @@ def _draw_hist_grouped(
 ax.hist(group_data, bins=bins_arg,
 label=str(group), color=group_color,
 edgecolor=_ec, linestyle=_per_group_ls,
-weights=weights, **hist_kwargs)
+weights=weights,
+# Phase 13.40.DF: cumulative (call site 3/4 — overlaid
+# linestyle_cycle path. Phase 13.37 split overlaid into
+# 2 branches; both need explicit forward.)
+cumulative=cumulative,
+**hist_kwargs)
 else:
 # No cycle OR user explicit → existing flow (linestyle in **kwargs)
 ax.hist(group_data, bins=bins_arg,
 label=str(group), color=group_color,
 edgecolor=_ec,
-weights=weights, **hist_kwargs)
+weights=weights,
+# Phase 13.40.DF: cumulative (call site 4/4 — overlaid
+# default path).
+cumulative=cumulative,
+**hist_kwargs)
 
 # Phase 13.37.DF: Poisson error bar overlay (CP1-1/CP1-2 fix:
 # color=group_color follows Phase 13.36 sentinel, NOT colors[i]).
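The M5 guard rests on a standard error-propagation argument. If the per-bin counts n_i are independent Poisson variables with means λ_i, the cumulative counts C_k = n_0 + ... + n_k have Cov(C_j, C_k) = Σ_{i ≤ min(j,k)} λ_i, so the covariance matrix is dense and neighbouring points of the cumulative curve are strongly correlated; a set of independent per-point error bars cannot express that. The sketch below computes this covariance with the linear map C = L n (L lower-triangular ones); the helper name is hypothetical and the example illustrates the general argument rather than the project's own derivation. Note also that the guard's condition `hist_errors and cumulative` fires for cumulative=-1, since -1 is truthy.

```python
import numpy as np


def cumulative_count_covariance(lam):
    """Covariance of cumulative counts C_k = n_0 + ... + n_k,
    assuming independent Poisson bin counts n_i with means lam[i]."""
    lam = np.asarray(lam, dtype=float)
    lower = np.tril(np.ones((lam.size, lam.size)))  # C = lower @ n
    return lower @ np.diag(lam) @ lower.T            # Cov(C) = L diag(lam) L^T


lam = np.full(5, 10.0)                 # five bins, 10 expected entries each
cov = cumulative_count_covariance(lam)
sigma = np.sqrt(np.diag(cov))
corr = cov / np.outer(sigma, sigma)
print(cov[0, 4], cov[4, 4])            # 10.0 50.0: C_0 and C_4 share bin 0
print(round(float(corr[3, 4]), 3))     # 0.894: adjacent points are highly correlated
```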

UTILS/dfextensions/dfdraw/tests/feature_taxonomy.py

Lines changed: 18 additions & 0 deletions
@@ -1312,4 +1312,22 @@
 "test_phase_13_39_df_profile2d_timeaxis.py::TestScatter3D::test_SC3D_8_same_true_non_3d_axes_raises",
 ],
 },
+# ── Phase 13.40.DF: Cumulative Histogram ──
+{
+"id": "HIST.cumulative",
+"name": "hist() cumulative=True/-1/False — ROOT TH1::Draw('cumulative') equivalent. Three values: True (ascending CDF/ECDF), False (default, byte-identical backward compat), -1 (descending/survival, ROOT convention). matplotlib native cumulative= forwarded explicitly at 4 internal call sites (Phase 13.39 §2.2 lesson applied recursively: DFDraw.hist → draw_hist → _draw_hist_grouped → ax.hist; ALSO through _dispatch_faceted_render for facet_by composition). Composes with: norm='probability' (→ ECDF 0-1), group_by overlaid (per-group ECDFs), group_by stacked (CP2-1 regression lock for 3rd call site), facet_by (per-facet cumulative), histtype='step' (HEP-standard step ECDF). Correctness guard (M5): hist_errors+cumulative → NotImplementedError (Poisson per-bin errors are independent; cumulative counts are correlated). Vector dispatch [x,y] propagates cumulative correctly (Phase 13.16.DF FIX1 bug class lock) — Phase 13.40.DF",
+"category": "HIST",
+"tests": [
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_1_monotone_and_total_N_lock",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_2_ECDF_last_value_is_one",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_3_survival_starts_at_one",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_4_cumulative_false_backward_compat",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_5_group_by_per_group_ecdf",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_6_hist_errors_plus_cumulative_raises",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_7_vector_dispatch_propagates_cumulative",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_8_facet_by_per_facet_ecdf",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_9_histtype_step_plus_cumulative_polygon_safe",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_10_group_by_stacked_cumulative_regression_lock",
+],
+},
 ]
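The "Polygon-safe" probes (§2.2, CH.9) exist because ax.hist() returns different artist types depending on histtype: 'bar' yields a BarContainer of Rectangle patches, while 'step' and 'stepfilled' (the default noted in §2.2) yield Polygon patches whose heights live in their vertices. A probe that only reads Rectangle heights therefore finds nothing on a step-style histogram. The helper below, patch_top, is a hypothetical sketch of such a probe using public matplotlib calls (Rectangle.get_y/get_height, Polygon.get_xy); it is not the project's test code.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: no display needed
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import Polygon, Rectangle


def patch_top(patches) -> float:
    """Largest y reached by the artists returned from ax.hist() (hypothetical probe)."""
    tops = []
    for p in patches:
        if isinstance(p, Rectangle):        # histtype='bar'
            tops.append(p.get_y() + p.get_height())
        elif isinstance(p, Polygon):        # histtype='step' / 'stepfilled'
            tops.append(float(p.get_xy()[:, 1].max()))
        else:
            raise TypeError(f"unexpected patch type: {type(p).__name__}")
    return max(tops)


x = np.arange(100) + 0.5                    # 10 entries in each of 10 bins
fig, ax = plt.subplots()
n_filled, _, filled = ax.hist(x, bins=10, range=(0, 100), cumulative=True,
                              histtype="stepfilled")
n_bar, _, bars = ax.hist(x, bins=10, range=(0, 100), cumulative=True,
                         histtype="bar")
plt.close(fig)
print(type(filled[0]).__name__, patch_top(filled))  # Polygon 100.0
print(type(bars[0]).__name__, patch_top(bars))      # Rectangle 100.0
```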

UTILS/dfextensions/dfdraw/tests/test_layer_classification.py

Lines changed: 12 additions & 0 deletions
@@ -312,5 +312,17 @@
 "test_phase_13_39_df_profile2d_timeaxis.py::TestScatter3D::test_SC3D_7_group_by_raises_value_error": "invariance",
 "test_phase_13_39_df_profile2d_timeaxis.py::TestScatter3D::test_SC3D_8_same_true_non_3d_axes_raises": "invariance",
 
+# ── Phase 13.40.DF: Cumulative Histogram (10 invariance) ──
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_1_monotone_and_total_N_lock": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_2_ECDF_last_value_is_one": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_3_survival_starts_at_one": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_4_cumulative_false_backward_compat": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_5_group_by_per_group_ecdf": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_6_hist_errors_plus_cumulative_raises": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_7_vector_dispatch_propagates_cumulative": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_8_facet_by_per_facet_ecdf": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_9_histtype_step_plus_cumulative_polygon_safe": "invariance",
+"test_phase_13_40_df_cumulative_hist.py::TestCumulativeHist::test_CH_10_group_by_stacked_cumulative_regression_lock": "invariance",
+
 # Everything else defaults to "smoke"
 }
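The CH.1-CH.3 invariants (monotone curve, saturation at the total count, survival starting at the total) can be expressed as a small check on drawn bin heights. check_cumulative_invariants is a hypothetical helper, not one of the listed tests, and it assumes every entry falls inside the histogram range (entries outside the range never reach the cumulative total).

```python
import numpy as np


def check_cumulative_invariants(heights, n_total, survival=False):
    """True if heights look like a cumulative histogram of n_total in-range entries.

    Ascending (CDF): non-decreasing, last bin == n_total.
    Survival (cumulative=-1): non-increasing, first bin == n_total.
    """
    h = np.asarray(heights, dtype=float)
    steps = np.diff(h)
    if survival:
        return bool(np.all(steps <= 0) and np.isclose(h[0], n_total))
    return bool(np.all(steps >= 0) and np.isclose(h[-1], n_total))


data = np.arange(10) + 0.5
counts, _ = np.histogram(data, bins=np.linspace(0.0, 10.0, 6))
print(check_cumulative_invariants(counts.cumsum(), data.size))                    # True
print(check_cumulative_invariants(counts[::-1].cumsum()[::-1], data.size, True))  # True
print(check_cumulative_invariants(counts, data.size))                             # False
```

Passing n_total=1.0 gives the normalised ECDF/survival variants of the same checks.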
