Skip to content

Commit 1721884

Browse files
MHoroszowskiMatthew Horoszowski
andauthored
feat(slides): add Presentation.append_from (slide-CRUD Phase 3 — cross-deck copy) (#35)
* feat(slides): add Presentation.append_from (slide-CRUD Phase 3 — cross-deck copy) Implements the cross-deck slide copy sub-feature of issue #11. After Phase 1 (in-deck CRUD: remove, move, indexed add) and Phase 2 (`Slide.duplicate`), Phase 3 closes the cross-package gap so users no longer need to fork to `python-pptx-valutico` or roll their own `copy_slide_from_external_prs` recipes. API --- - `Presentation.append_from(other_pres, slide_indexes=None) -> list[Slide]` - `slide_indexes=None` appends every slide of `other_pres` in source order. An iterable of zero-based ints appends only those slides in that order (allowing reordering). - Returns the list of newly-added `Slide` objects in insertion order. - Raises `IndexError` if any value in `slide_indexes` is out of range. Design ------ The implementation generalizes Phase 2's part-graph copy to work between two `Package` instances. A `_PortContext` instance carries the per-call dedup cache (`{source_part: target_part}` for masters, layouts, themes) so source slide-masters / layouts / themes shared by multiple appended source slides land as a single target part. Three load-bearing choices, all driven by the upstream prior-art recon and Phase 2's lessons: 1. **Port-fresh, not match-existing.** Always port a copy of source's slide-layout, slide-master, and theme into target. Layout matching by name or structural similarity is fragile — see upstream issue scanny#934 attempts. Trade-off: target deck accumulates masters when appending from many decks. Documented; preferred over fragile matching. 2. **Image / media dedup uses target package's existing SHA1 machinery.** `Package.get_or_add_image_part(BytesIO(blob))` already dedupes against target's existing image parts by SHA1 of bytes. We hand it the foreign image's blob — no new dedup code needed. 3. **Notes-master is target's, not source's.** Notes-master is a presentation-level singleton in OOXML. The ported notes-slide relates to TARGET's existing (or auto-created) notes-master via `RT.NOTES_MASTER`, with the back-rel (`RT.SLIDE`) rewired to the new slide part — addresses upstream gotcha scanny#961 generalized to the cross-package case. Recursion handling ------------------ Master and layouts form a cycle (master refs its layouts, each layout refs back to master). Broken by registering the new master in `_master_map` BEFORE walking its rels, and by relating each new layout to the master IMMEDIATELY after creation (so subsequent `next_partname` calls see it via `iter_parts` and don't allocate duplicate partnames — that bug surfaces as a zipfile "Duplicate name" warning on save and is rejected by Office). OOXML allocator quirk --------------------- python-pptx's `CT_SlideMasterIdListEntry` / `CT_SlideLayoutIdListEntry` only declare `rId` as a typed attribute, not the required `id` uint32. Existing presentations round-trip fine because lxml preserves unknown attributes verbatim; but a freshly-`_add_sldMasterId()`'d entry has no `id` and PowerPoint rejects the file. The `_add_sldMasterId_to_presentation` and `_add_sldLayoutId_to_master` helpers compute next-free id (max+1 with floor 2147483648) and set it via `el.set("id", str(next_id))` directly on the lxml element. Theme part loaded from disk is a base `Part` (binary blob), not an `XmlPart` — `_port_theme` branches on `isinstance(... XmlPart)` and either deepcopies the element or blob-copies, depending. Test coverage ------------- `tests/test_append_from.py` — 37 new tests across 6 describe classes: - `DescribePresentation_AppendFrom_API` — argument validation, raises, signature, return shape, empty `slide_indexes`, self-from-self edge case. - `DescribePresentation_AppendFrom_PerSlide` — partname uniqueness, sldId uniqueness, modification isolation, source non-mutation. - `DescribePresentation_AppendFrom_MasterPort` — master ported, within-call dedup (3 source slides on shared master → 1 ported), cross-call non-dedup, `<p:sldMasterId>` allocator, layout-tree fidelity. - `DescribePresentation_AppendFrom_ImageDedup` — cross-package SHA1 dedup, within-call dedup, round-trip preservation. - `DescribePresentation_AppendFrom_NotesSlide` — own notes part, text carried, back-rel rewired to new slide, target's existing notes-master reused. - `DescribePresentation_AppendFrom_RoundTrip` — open → append → save → reopen across all paths. - `DescribePresentation_AppendFrom_Antis` — anti-criteria: source unchanged, comments dropped, target's existing slides preserved. `features/append-from.feature` + 7 new step impls — 4 acceptance scenarios for default-all, selective, no-op-on-empty, and IndexError. Verification ------------ ``` $ python3 -m pytest tests/ -q 3162 passed in 4.32s $ python3 -m ruff format src tests 204 files left unchanged $ python3 -m ruff check src tests All checks passed! $ python3 -m behave features/ --no-color 56 features passed, 0 failed, 0 skipped 994 scenarios passed, 0 failed, 0 skipped ``` UAT script `uat_append_from.py` (untracked at repo root): builds a 3-slide source (textbox + picture + speaker notes) and a 1-slide target sharing the same PNG; runs `target.append_from(source)`; prints structural summary showing image_part_count = 2 BEFORE AND AFTER (cross-package SHA1 dedup of both the PNG and the default theme image holds), slide_part_count growing 1 → 4, master_part_count growing 1 → 2; round-trips through save/reopen with title text and speaker notes preserved. Includes a selective-subset variant exercising `slide_indexes=[2, 0]`. Out of scope (deferred per epic split): - `Presentation.sections` (`<p14:sectionLst>`) — Phase 4. - Append-mode open — Performance & Scale epic. - Comments parts — explicitly dropped (consistent with Phase 2). - Cross-deck font-table merging — fonts-in-OOXML are package-shared but font-table parts beyond default-template baselines deferred. Refs #11. * fix(slides): unify master/layout id pool in append_from to clear PowerPoint repair `Presentation.append_from` triggered PowerPoint's bad-content / repair dialog on first open of any multi-master output. Three distinct OOXML id collisions inherited from the shared default.pptx template needed scrubbing during cross-package master/layout deepcopy: 1. `<p:sldMasterId id>` and `<p:sldLayoutId id>` share one identifier pool per ECMA-376 sec 19.2.1.34 / 19.2.1.27, and PowerPoint enforces uniqueness across both. The new master2 was being allocated id 2147483649 (one above target's lone master at 2147483648) — but 2147483649 was already master1's first layout id. PowerPoint's repair output renumbered master2 to 2147483660. Root fix: build a unified pool from BOTH master and layout ids in the target, allocate master id first (consumes one slot), then renumber layout ids above it. End state matches PowerPoint's repair ordering (master2= 2147483660, master2's layouts=2147483661..2147483671). 2. `<p:sldLayoutId id>` collided across masters because deepcopy of the source master carried the source's layout ids verbatim, identical to target's existing master's layout ids when both came from the same default.pptx. Spec: unique within the layout collection; PowerPoint: unique across the presentation. 3. `<p14:creationId val>` was deterministic in default.pptx (master: 2209977519, plus 11 fixed layout vals), so two presentations from the template ended up with identical vals across their parallel masters and layouts. Refresh each val to a fresh secrets.randbits(32) on deepcopy to keep the extLst structure intact while breaking the collision. Diagnostic path: PowerPoint's repair dialog gave only the generic "PowerPoint found a problem with content" with no error-line detail. Iterated by having the maintainer save the repaired copy and diffing against the original; the rewritten `<p:sldMasterId id>` value (2147483649 → 2147483660) revealed the master/layout pool sharing. Surface: - src/pptx/parts/slide.py: new _used_master_layout_ids_in_target, _renumber_sldLayoutIds, _refresh_creation_ids; _add_sldMasterId_to _presentation accepts unified pool; _port_master and _port_layout _standalone apply scrubs at deepcopy points - tests/test_append_from.py: DescribePresentation_AppendFrom_OOXML Integrity class with 4 zip-level integrity tests (sldLayoutId uniqueness across masters, master id disjoint from layout id pool, master creationId uniqueness, layout creationId uniqueness) Refs #11. Tests: 3166 unit (+4 new) + 994 behave scenarios all passing. Lint: ruff check + ruff format clean. --------- Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
1 parent 17d1207 commit 1721884

5 files changed

Lines changed: 1397 additions & 2 deletions

File tree

features/append-from.feature

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,29 @@
1+
Feature: Presentation.append_from — cross-deck slide copy
2+
In order to assemble a deck from slides scattered across multiple .pptx files
3+
As a developer using python-pptx
4+
I need a single API call that copies slides between two presentations,
5+
porting the slide-master / layout / theme and deduping images at the target
6+
7+
8+
Scenario: Presentation.append_from(other) appends every source slide
9+
Given two seeded presentations source and target
10+
When I call target.append_from(source)
11+
Then target's slide count grew by source's slide count
12+
And target's master count grew by 1
13+
14+
15+
Scenario: Presentation.append_from(other, slide_indexes) appends selected slides
16+
Given two seeded presentations source and target
17+
When I call target.append_from(source, slide_indexes=[0, 1])
18+
Then target's slide count grew by 2
19+
20+
21+
Scenario: append_from with empty slide_indexes is a no-op
22+
Given two seeded presentations source and target
23+
When I call target.append_from(source, slide_indexes=[])
24+
Then target's slide count is unchanged
25+
26+
27+
Scenario: append_from raises IndexError for out-of-range index
28+
Given two seeded presentations source and target
29+
Then calling target.append_from(source, slide_indexes=[99]) raises IndexError

features/steps/slides.py

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -236,3 +236,77 @@ def then_duplicate_index_99_raises(context):
236236

237237
with pytest.raises(IndexError):
238238
context.slides.duplicate(context.slides[0], index=99)
239+
240+
241+
# -- append_from steps -----------------------------------------------------
242+
243+
244+
@given("two seeded presentations source and target")
245+
def given_two_seeded_presentations(context):
246+
from pptx.util import Inches
247+
248+
src = Presentation()
249+
layout = src.slide_layouts[6]
250+
for i in range(3):
251+
s = src.slides.add_slide(layout)
252+
s.shapes.add_textbox(Inches(1), Inches(1), Inches(2), Inches(1)).text_frame.text = (
253+
"src %d" % i
254+
)
255+
tgt = Presentation()
256+
tgt.slides.add_slide(tgt.slide_layouts[6])
257+
context.source_pres = src
258+
context.target_pres = tgt
259+
context.target_slides_before = len(tgt.slides)
260+
context.target_masters_before = sum(1 for _ in tgt.slide_masters)
261+
262+
263+
@when("I call target.append_from(source)")
264+
def when_target_append_from_source_all(context):
265+
context.new_slides = context.target_pres.append_from(context.source_pres)
266+
267+
268+
@when("I call target.append_from(source, slide_indexes=[0, 1])")
269+
def when_target_append_from_source_indexes_0_1(context):
270+
context.new_slides = context.target_pres.append_from(
271+
context.source_pres, slide_indexes=[0, 1]
272+
)
273+
274+
275+
@when("I call target.append_from(source, slide_indexes=[])")
276+
def when_target_append_from_source_indexes_empty(context):
277+
context.new_slides = context.target_pres.append_from(context.source_pres, slide_indexes=[])
278+
279+
280+
@then("target's slide count grew by source's slide count")
281+
def then_target_grew_by_source_slide_count(context):
282+
expected = context.target_slides_before + len(context.source_pres.slides)
283+
actual = len(context.target_pres.slides)
284+
assert actual == expected, "expected %d slides, got %d" % (expected, actual)
285+
286+
287+
@then("target's master count grew by 1")
288+
def then_target_master_count_grew_by_1(context):
289+
actual = sum(1 for _ in context.target_pres.slide_masters)
290+
assert actual == context.target_masters_before + 1, (
291+
"expected %d masters, got %d" % (context.target_masters_before + 1, actual)
292+
)
293+
294+
295+
@then("target's slide count grew by 2")
296+
def then_target_slide_count_grew_by_2(context):
297+
actual = len(context.target_pres.slides)
298+
assert actual == context.target_slides_before + 2
299+
300+
301+
@then("target's slide count is unchanged")
302+
def then_target_slide_count_unchanged(context):
303+
actual = len(context.target_pres.slides)
304+
assert actual == context.target_slides_before
305+
306+
307+
@then("calling target.append_from(source, slide_indexes=[99]) raises IndexError")
308+
def then_append_from_index_99_raises(context):
309+
import pytest
310+
311+
with pytest.raises(IndexError):
312+
context.target_pres.append_from(context.source_pres, slide_indexes=[99])

0 commit comments

Comments
 (0)