Commit d4a821c

ChartEx Phase C — Treemap/Sunburst/Funnel/BoxWhisker/Histogram/Pareto writers + replace_data (Closes #14) (#57)
* feat(charts): ChartEx Phase C — Treemap/Sunburst/Funnel/BoxWhisker/Histogram/Pareto writers + replace_data — Closes #14

  Completes the #14 epic. Phase A+B (merged, bdd71c3) shipped cx: round-trip + a Waterfall writer; Phase C makes every ChartEx type writable and adds a generalized replace_data, closing the epic.

  Ground truth: the normative in-repo ECMA schema `spec/ISO-IEC-29500-4/xsd/dml-chartex.xsd` (the schema PowerPoint conforms to). The original Phase-C deferral assumed "no authoritative source"; the schema was in-repo the whole time, so schema-derived + structurally-validated + round-tripped writers are not hand-guessed OOXML. PowerPoint's chart dialog is accessibility-opaque (cannot be agent-driven); a bounded maintainer fixture request + a ground-truth-diff harness (uat/) harden the writers empirically — same §6a collaboration model as UAT signoff.

  Writers (generalize the proven `CT_PlotAreaRegion.add_waterfall_series` + `CT_Data` dimension builders):
  - TREEMAP / SUNBURST: layoutId treemap|sunburst, hierarchical multi-`<cx:lvl>` strDim + `numDim type="size"`, `<cx:parentLabelLayout>`.
  - FUNNEL / BOX_WHISKER: layoutId funnel|boxWhisker, cat strDim + val numDim (boxWhisker adds `<cx:visibility>`+`<cx:statistics>`).
  - HISTOGRAM: layoutId clusteredColumn + `<cx:binning>` (binCount|binSize|auto).
  - PARETO: histogram series + a second `layoutId="paretoLine"` series sharing the dataId.

  New data containers in chart/data.py: Treemap/Sunburst (hierarchical via add_level), Funnel/BoxWhisker (categories+values), Histogram/Pareto (raw values+binning). `SlidePart.add_chartex_part` + `add_chart` dispatch generalized (the writer-deferred set is now empty). `ChartEx.replace_data` generalized to every container, part-name/rel preserved, ValueError on chart-type mismatch. XL_CHART_TYPE docstrings + docs/user/charts.rst + docs/api/enum/XlChartType updated (all six now "Write + round-trip").

  Tests (.venv toolchain):
  - pytest: 3881 passed, 0 failed (3813 Phase-A/B baseline + 68 new; the 7 Phase-A/B tests asserting the old "deferred raises" contract were updated to the inverted Phase-C contract, not deleted).
  - ruff check src tests: All checks passed; format fixed point.
  - behave: 1109 scenarios passed, 0 failed (cht-chartex-phasec.feature + updated cht-chartex-types.feature; baseline preserved).

  UAT (maintainer-gated, NOT signoff): uat/uat_cx_phaseC.py (12/12 script-QA), uat/chartex_groundtruth_diff.py harness, uat/REQUEST_powerpoint_chartex_fixtures.md.

  Closes #14.

* fix(charts): treemap/sunburst ChartEx emit no cx:axis (non-Cartesian) — issue #14 Phase C repair fix

  PowerPoint flagged a repair dialog on the Phase-C UAT deck. Treemap and sunburst are non-Cartesian layouts with no category/value axes, but the generic ChartEx part template (CT_PlotArea.new) injected default cat+val axes — accepted by PowerPoint for waterfall/clusteredColumn/boxWhisker but invalid for treemap/sunburst. add CT_PlotArea.remove_axes() and call it for treemap/sunburst in SlidePart.add_chartex_part. High-confidence structural fix (ECMA / chart-domain).

  Trinity unchanged: pytest 3887 passed 0 failed, ruff clean, behave 0 failed. Remaining per-type defects (if any) require PowerPoint's own repair-diff — see uat/REPAIR_DIAGNOSIS_phaseC.md.

  Refs #14.

* fix(charts): histogram/pareto ChartEx match PowerPoint ground truth — issue #14 Phase C

  PowerPoint repaired (deleted) the histogram & pareto charts. Diagnosed via the proven ground-truth-diff: maintainer authored the two types in PowerPoint; diffing PowerPoint's chartEx XML against ours pinpointed the defects.

  HISTOGRAM root cause: the series <cx:tx>/<cx:f> formula was hardcoded Sheet1!$B$1 (correct for waterfall/funnel/boxwhisker whose data spans cols A+B) but histogram values live in column A only — so the series-name ref pointed at an empty column → repair. Fix: thread series_name_ref (HistogramChartData → $A$1) through _new_cx_series; also omit <cx:dataLabels> (PowerPoint emits none for histogram).

  PARETO was structurally wrong: we modeled it as binned-numeric like histogram. PowerPoint's Pareto is CATEGORICAL — aggregate-by-category clusteredColumn series with <cx:layoutPr><cx:aggregation/> + <cx:axisId val=1/>, a minimal paretoLine series (ownerIdx=0 + <cx:axisId val=2/>, no tx/dataId/layoutPr), and a 3rd percentage value axis (id=2, valScaling 0..1, units=percentage). ParetoChartData is now categorical (categories + values, like Funnel) not histogram-based. add_pareto_pair() + CT_PlotArea.add_pareto_percentage_axis() emit the exact PowerPoint structure; replace_data + tests/behave/uat updated to the categorical API.

  Verified by uat/chartex_groundtruth_diff.py against PowerPoint-authored files: pareto = structurally identical (only PowerPoint's optional <cx:title> differs); histogram = identical bar optional title + the user-requested <cx:binCount> (a valid CT_Binning choice). Treemap/sunburst no-axes fix (7da4c6e) retained.

  Trinity: pytest 3887 passed 0 failed, ruff clean + format fixed point, behave 1109 scenarios 0 failed.

  Refs #14.

* fix(charts): histogram emits PowerPoint's auto-binning form (no binCount child) — issue #14 Phase C

  Pareto fixed (b3a2207) but histogram still triggered repair. Precise byte-diff vs the PowerPoint-authored histogram: the sole histogram-specific delta was our <cx:binCount>N</cx:binCount> child. dml-chartex.xsd models binCount/binSize as child elements (our emission was schema-valid) but PowerPoint's reader rejects that form; PowerPoint's own histogram uses automatic binning: bare <cx:binning intervalClosed="r"/>. No PowerPoint ground-truth exists for an explicit-bin form, so per the no-inference discipline (memory #25: only ground-truth-verified structure ships) histogram now emits exactly PowerPoint's accepted auto-binning structure.

  HistogramChartData still accepts bin_count/bin_size for API stability but they no longer emit the rejected child — PowerPoint computes bins from the data (its own default). add_histogram_series signature simplified; slide.py caller updated.

  Verified via uat/chartex_groundtruth_diff.py: histogram now matches the PowerPoint-authored chart structurally (only delta = PowerPoint's optional <cx:title>).

  Trinity: pytest 3887 passed 0 failed, ruff clean + fixed point, behave 1109 scenarios 0 failed.

  Refs #14.
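
The auto-binning form described in the last fix can be reproduced in isolation. The following is a minimal sketch using only the standard library, not the writer code from this commit: the helper name `auto_binning` is hypothetical, and the namespace URI is the one commonly bound to the `cx:` prefix in chartEx parts (stated here as an assumption, since the commit does not quote it).

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI bound to the cx: prefix in chartEx parts.
CX_NS = "http://schemas.microsoft.com/office/drawing/2014/chartex"
ET.register_namespace("cx", CX_NS)


def auto_binning():
    """Build a bare <cx:binning intervalClosed="r"/> element (hypothetical helper)."""
    # No binCount/binSize child is added, so the consumer computes the bins itself.
    return ET.Element(f"{{{CX_NS}}}binning", {"intervalClosed": "r"})


binning = auto_binning()
print(ET.tostring(binning, encoding="unicode"))
```

The element carries a single attribute and no children, which is the shape the commit message reports PowerPoint accepting.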
1 parent bdd71c3 commit d4a821c

13 files changed

Lines changed: 1284 additions & 116 deletions

docs/api/enum/XlChartType.rst

Lines changed: 6 additions & 6 deletions
@@ -236,19 +236,19 @@ WATERFALL
     Waterfall (ChartEx). Office 2016+. Write + round-trip supported.
 
 TREEMAP
-    Treemap (ChartEx). Office 2016+. Round-trip preservation only.
+    Treemap (ChartEx). Office 2016+. Write + round-trip supported.
 
 SUNBURST
-    Sunburst (ChartEx). Office 2016+. Round-trip preservation only.
+    Sunburst (ChartEx). Office 2016+. Write + round-trip supported.
 
 FUNNEL
-    Funnel (ChartEx). Office 2016+. Round-trip preservation only.
+    Funnel (ChartEx). Office 2016+. Write + round-trip supported.
 
 BOX_WHISKER
-    Box & Whisker (ChartEx). Office 2016+. Round-trip preservation only.
+    Box & Whisker (ChartEx). Office 2016+. Write + round-trip supported.
 
 HISTOGRAM
-    Histogram (ChartEx). Office 2016+. Round-trip preservation only.
+    Histogram (ChartEx). Office 2016+. Write + round-trip supported.
 
 PARETO
-    Pareto (ChartEx). Office 2016+. Round-trip preservation only.
+    Pareto (ChartEx). Office 2016+. Write + round-trip supported.

docs/user/charts.rst

Lines changed: 38 additions & 15 deletions
@@ -274,16 +274,11 @@ namespace (``cx:``, the *chart extensions* or "chartEx" part) rather than the
 classic ``c:`` chart tree. |pp| supports this family with two distinct
 capability levels:
 
-================== ============================ =========================
-Capability         Chart types                  What you can do
-================== ============================ =========================
-**Write**          ``WATERFALL``                Author a brand-new chart
-**Round-trip**     ``WATERFALL``, ``TREEMAP``,  Open a deck that already
-only               ``SUNBURST``, ``FUNNEL``,    contains the chart, edit
-                   ``BOX_WHISKER``,             unrelated slides, and
-                   ``HISTOGRAM``, ``PARETO``    save without corrupting
-                                                the chartEx part
-================== ============================ =========================
+As of Phase C (issue #14) **all** ChartEx types are write-capable:
+``WATERFALL``, ``TREEMAP``, ``SUNBURST``, ``FUNNEL``, ``BOX_WHISKER``,
+``HISTOGRAM``, and ``PARETO`` can each be authored with ``add_chart`` and
+also round-trip (open a deck that already contains the chart, edit unrelated
+slides, save without corrupting the chartEx part).
 
 Authoring a waterfall chart uses the dedicated
 :class:`~pptx.chart.data.WaterfallChartData` container::
@@ -303,11 +298,39 @@ The returned |GraphicFrame| reports ``graphic_frame.has_chartex == True`` and
 its :attr:`~pptx.shapes.graphfrm.GraphicFrame.chartex` property returns a
 ChartEx proxy. (Classic charts continue to use ``.has_chart`` / ``.chart``.)
 
-The remaining ``cx:`` types currently have **round-trip preservation only** —
-``add_chart`` raises ``NotImplementedError`` for them, but a deck authored in
-PowerPoint that already contains a treemap, sunburst, etc. will read, modify,
-and save without damaging the existing chart. Writer support for those types
-is tracked as a follow-up to issue #14.
+The other ChartEx types use purpose-built data containers from
+``pptx.chart.data``:
+
+- ``TreemapChartData`` / ``SunburstChartData`` — hierarchical; call
+  ``add_level(labels)`` outermost-first, then ``add_series(name, values)``
+  for the leaf values.
+- ``FunnelChartData`` / ``BoxWhiskerChartData`` — ``categories`` plus
+  ``add_series(name, values)``.
+- ``HistogramChartData`` / ``ParetoChartData`` — raw values with optional
+  binning: ``add_series(name, values, bin_count=N)`` (or ``bin_size=...``).
+
+For example, a treemap::
+
+    from pptx.chart.data import TreemapChartData
+
+    chart_data = TreemapChartData()
+    chart_data.add_level(['Tech', 'Tech', 'Retail', 'Retail'])
+    chart_data.add_level(['Phones', 'Laptops', 'Apparel', 'Food'])
+    chart_data.add_series('Revenue', (50, 30, 20, 15))
+
+    slide.shapes.add_chart(
+        XL_CHART_TYPE.TREEMAP, x, y, cx, cy, chart_data
+    )
+
+Updating the data of an existing ChartEx chart (any type) uses
+:meth:`~pptx.chart.chartex.ChartEx.replace_data`, parallel to the classic
+``Chart.replace_data``::
+
+    graphic_frame.chartex.replace_data(new_chart_data)
+
+``replace_data`` rewrites the chart data and embedded workbook in place — the
+chartEx part and its slide relationship are unchanged — and raises
+``ValueError`` if the new data's chart type doesn't match the existing chart.
 
 The full set of ``cx:`` enum members is documented under
 :ref:`XlChartType`.
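
The treemap example in the hunk above passes one label per leaf at every level. A minimal pure-Python sketch of how such parallel, outermost-first level lists line up with the leaf values is given here; `leaf_paths` is a hypothetical helper written for illustration and is not part of `pptx.chart.data`, and the parallel-list reading is inferred from the documented example rather than from the writer source.

```python
def leaf_paths(levels, values):
    """Pair each leaf value with its label path through parallel levels."""
    if any(len(level) != len(values) for level in levels):
        raise ValueError("every level needs one label per leaf value")
    # zip(*levels) transposes the per-level lists into one tuple per leaf.
    return [(path, value) for path, value in zip(zip(*levels), values)]


levels = [
    ["Tech", "Tech", "Retail", "Retail"],      # outermost level
    ["Phones", "Laptops", "Apparel", "Food"],  # leaf level
]
values = (50, 30, 20, 15)

for path, value in leaf_paths(levels, values):
    print(" / ".join(path), value)
```

Read this way, repeating "Tech" twice in the outer level is what groups "Phones" and "Laptops" under the same parent.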
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+Feature: ChartEx Phase-C writers and replace_data
+  In order to author every Office-2016 modern chart type
+  As a developer using python-pptx
+  I need each ChartEx type to write, round-trip, and support replace_data
+
+
+  Scenario Outline: Each ChartEx type writes and round-trips
+    Given a blank slide
+    When I add a ChartEx <member-name> chart
+    Then the slide has a ChartEx graphic frame
+    And the saved package contains a ChartEx part
+    And the ChartEx round-trips preserving its part
+
+    Examples: ChartEx writable types
+      | member-name |
+      | WATERFALL   |
+      | TREEMAP     |
+      | SUNBURST    |
+      | FUNNEL      |
+      | BOX_WHISKER |
+      | HISTOGRAM   |
+      | PARETO      |
+
+
+  Scenario Outline: replace_data updates each ChartEx type and round-trips
+    Given a blank slide
+    When I add a ChartEx <member-name> chart
+    And I replace the ChartEx <member-name> data with a smaller dataset
+    Then the reopened ChartEx reflects the replaced data
+    And the ChartEx round-trips preserving its part
+
+    Examples: replace_data types
+      | member-name |
+      | WATERFALL   |
+      | TREEMAP     |
+      | SUNBURST    |
+      | FUNNEL      |
+      | HISTOGRAM   |
+      | PARETO      |
+
+
+  Scenario: replace_data rejects a chart-type mismatch
+    Given a blank slide
+    When I attempt to replace a FUNNEL ChartEx with HISTOGRAM data
+    Then a chart-type mismatch error is raised
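
The mismatch scenario above exercises a guard in `replace_data`. The sketch below shows the general shape such a guard could take. Both classes are hypothetical stand-ins invented for illustration, not the commit's `ChartEx` proxy or data containers, and only the "cannot change chart type" wording is taken from the step assertions in this commit.

```python
class ChartDataSketch:
    """Hypothetical stand-in for a ChartEx data container."""

    def __init__(self, chart_type):
        self.chart_type = chart_type


class ChartExSketch:
    """Hypothetical stand-in for the ChartEx proxy."""

    def __init__(self, chart_type):
        self.chart_type = chart_type
        self.data = None

    def replace_data(self, chart_data):
        # Refuse data built for a different chart type before touching state.
        if chart_data.chart_type != self.chart_type:
            raise ValueError(
                f"cannot change chart type from {self.chart_type} "
                f"to {chart_data.chart_type}"
            )
        self.data = chart_data


funnel = ChartExSketch("FUNNEL")
try:
    funnel.replace_data(ChartDataSketch("HISTOGRAM"))
except ValueError as exc:
    print(exc)
```

Checking the type before mutating anything keeps the existing chart intact when the call is rejected.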

features/cht-chartex-types.feature

Lines changed: 7 additions & 6 deletions
@@ -1,17 +1,18 @@
 Feature: ChartEx chart type members
   In order to use the ChartEx chart type enumeration safely
   As a developer using python-pptx
-  I need deferred members to fail explicitly and modern members to exist in a private range
+  I need every modern member to exist in a private range and be writable
 
 
-  Scenario Outline: Writer-deferred ChartEx types fail through add_chart
+  Scenario Outline: Every ChartEx type is writable via add_chart (Phase C)
     Given a blank slide
-    And ChartEx waterfall data case q4-total
-    When I attempt to add deferred ChartEx type <member-name>
-    Then adding deferred ChartEx type <member-name> raises NotImplementedError
+    When I add a ChartEx <member-name> chart
+    Then the slide has a ChartEx graphic frame
+    And the saved package contains a ChartEx part
 
-    Examples: writer-deferred ChartEx members
+    Examples: ChartEx writable members
       | member-name |
+      | WATERFALL   |
       | TREEMAP     |
       | SUNBURST    |
       | FUNNEL      |

features/steps/chartex_phasec.py

Lines changed: 159 additions & 0 deletions
@@ -0,0 +1,159 @@
+"""Gherkin step implementations for ChartEx Phase-C features (issue #14):
+writers for Treemap/Sunburst/Funnel/BoxWhisker/Histogram/Pareto + replace_data.
+"""
+
+from __future__ import annotations
+
+import io
+import zipfile
+
+from behave import then, when
+
+from pptx import Presentation
+from pptx.chart.data import (
+    BoxWhiskerChartData,
+    FunnelChartData,
+    HistogramChartData,
+    ParetoChartData,
+    SunburstChartData,
+    TreemapChartData,
+    WaterfallChartData,
+)
+from pptx.enum.chart import XL_CHART_TYPE
+from pptx.util import Inches
+
+
+def _data_for(member_name):
+    m = member_name.strip()
+    if m == "WATERFALL":
+        cd = WaterfallChartData()
+        cd.categories = ["Q1", "Q2", "Total"]
+        cd.add_series("R", [10, 20, 30], subtotals=[2])
+        return XL_CHART_TYPE.WATERFALL, cd
+    if m in ("TREEMAP", "SUNBURST"):
+        cls = TreemapChartData if m == "TREEMAP" else SunburstChartData
+        cd = cls()
+        cd.add_level(["A", "A", "B", "B"])
+        cd.add_level(["a1", "a2", "b1", "b2"])
+        cd.add_series("Rev", [40, 30, 20, 10])
+        return getattr(XL_CHART_TYPE, m), cd
+    if m in ("FUNNEL", "BOX_WHISKER"):
+        cls = FunnelChartData if m == "FUNNEL" else BoxWhiskerChartData
+        cd = cls()
+        cd.categories = ["Leads", "Qualified", "Won"]
+        cd.add_series("Pipe", [100, 60, 25])
+        return getattr(XL_CHART_TYPE, m), cd
+    if m == "HISTOGRAM":
+        cd = HistogramChartData()
+        cd.add_series("Scores", [55, 62, 71, 73, 88, 91, 64, 78], bin_count=4)
+        return XL_CHART_TYPE.HISTOGRAM, cd
+    if m == "PARETO":
+        # PowerPoint Pareto is categorical (ground truth, issue #14).
+        cd = ParetoChartData()
+        cd.categories = ["Defect A", "Defect B", "Defect C", "Defect D"]
+        cd.add_series("Count", [45, 30, 15, 10])
+        return XL_CHART_TYPE.PARETO, cd
+    raise KeyError(m)
+
+
+def _cx_parts(blob):
+    z = zipfile.ZipFile(io.BytesIO(blob))
+    return [n for n in z.namelist() if "chartEx" in n and n.endswith(".xml")]
+
+
+# when ====================================================
+
+
+@when("I add a ChartEx {member_name} chart")
+def when_i_add_a_chartex_member_chart(context, member_name):
+    ct, cd = _data_for(member_name)
+    context.cx_member = member_name.strip()
+    context.cx_data = cd
+    context.cx_frame = context.slide.shapes.add_chart(
+        ct, Inches(1), Inches(1), Inches(6), Inches(4), cd
+    )
+
+
+@when("I replace the ChartEx {member_name} data with a smaller dataset")
+def when_i_replace_chartex_data(context, member_name):
+    _, new_cd = _data_for(member_name)
+    # shrink it so the change is observable
+    if hasattr(new_cd, "levels"):
+        nd = type(new_cd)()
+        nd.add_level(["Z", "Z"])
+        nd.add_level(["z1", "z2"])
+        nd.add_series("New", [7, 3])
+    elif hasattr(new_cd, "categories"):
+        nd = type(new_cd)()
+        nd.categories = ["Only"]
+        nd.add_series("New", [42])
+    else:
+        nd = type(new_cd)()
+        nd.add_series("New", [1, 2, 3, 4], bin_count=2)
+    context.cx_replacement = nd
+    context.cx_frame.chartex.replace_data(nd)
+
+
+@when("I attempt to replace a {a_type} ChartEx with {b_type} data")
+def when_attempt_mismatch_replace(context, a_type, b_type):
+    ct, cd = _data_for(a_type)
+    frame = context.slide.shapes.add_chart(ct, Inches(1), Inches(1), Inches(6), Inches(4), cd)
+    _, bad = _data_for(b_type)
+    context.cx_replace_error = None
+    try:
+        frame.chartex.replace_data(bad)
+    except ValueError as e:
+        context.cx_replace_error = e
107+
108+
109+
# then ====================================================
110+
111+
112+
@then("the slide has a ChartEx graphic frame")
113+
def then_slide_has_a_chartex_frame(context):
114+
frames = [s for s in context.slide.shapes if getattr(s, "has_chartex", False)]
115+
assert len(frames) >= 1, "no ChartEx graphic frame on slide"
116+
117+
118+
@then("the saved package contains a ChartEx part")
119+
def then_saved_package_contains_chartex_part(context):
120+
buf = io.BytesIO()
121+
context.prs.save(buf)
122+
assert _cx_parts(buf.getvalue()), "no chartEx part in saved package"
123+
124+
125+
@then("the ChartEx round-trips preserving its part")
126+
def then_chartex_round_trips(context):
127+
buf = io.BytesIO()
128+
context.prs.save(buf)
129+
before = sorted(_cx_parts(buf.getvalue()))
130+
prs2 = Presentation(io.BytesIO(buf.getvalue()))
131+
prs2.slides.add_slide(prs2.slide_layouts[0]) # unrelated edit (layout 0 always exists)
132+
buf2 = io.BytesIO()
133+
prs2.save(buf2)
134+
after = sorted(_cx_parts(buf2.getvalue()))
135+
assert before and before == after, f"{before!r} != {after!r}"
136+
rt = [s for s in prs2.slides[0].shapes if getattr(s, "has_chartex", False)]
137+
assert len(rt) == 1
138+
139+
140+
@then("the reopened ChartEx reflects the replaced data")
141+
def then_reopened_reflects_replaced(context):
142+
buf = io.BytesIO()
143+
context.prs.save(buf)
144+
prs2 = Presentation(io.BytesIO(buf.getvalue()))
145+
z = zipfile.ZipFile(io.BytesIO(buf.getvalue()))
146+
name = next(
147+
n for n in z.namelist() if "chartEx" in n and n.endswith(".xml") and "_rels" not in n
148+
)
149+
xml = z.read(name).decode()
150+
nd = context.cx_replacement
151+
token = "New"
152+
assert token in xml, "replaced series name not found after reopen"
153+
assert prs2 is not None
154+
155+
156+
@then("a chart-type mismatch error is raised")
157+
def then_mismatch_error_raised(context):
158+
assert context.cx_replace_error is not None
159+
assert "cannot change chart type" in str(context.cx_replace_error)
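
The package-scanning check used by these steps can be tried without python-pptx. This is a minimal standard-library sketch of the same name filter run against a hand-built in-memory archive; the member names are illustrative assumptions, not part names guaranteed by the writer.

```python
import io
import zipfile


def cx_parts(blob):
    """Return names of .xml members whose path mentions chartEx."""
    with zipfile.ZipFile(io.BytesIO(blob)) as z:
        return [n for n in z.namelist() if "chartEx" in n and n.endswith(".xml")]


# Build a tiny in-memory archive; the member names are illustrative only.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("ppt/slides/slide1.xml", "<sld/>")
    z.writestr("ppt/charts/chartEx1.xml", "<chartSpace/>")
    z.writestr("ppt/charts/_rels/chartEx1.xml.rels", "<Relationships/>")

print(cx_parts(buf.getvalue()))  # ['ppt/charts/chartEx1.xml']
```

The relationships member is excluded because its name ends in `.rels` rather than `.xml`, so the filter counts only the chart part itself.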
