feat(slides): add Presentation.append_from (slide-CRUD Phase 3 — cross-deck copy)#35
Merged
Merged
Conversation
added 2 commits
May 7, 2026 19:21
…s-deck copy) Implements the cross-deck slide copy sub-feature of issue #11. After Phase 1 (in-deck CRUD: remove, move, indexed add) and Phase 2 (`Slide.duplicate`), Phase 3 closes the cross-package gap so users no longer need to fork to `python-pptx-valutico` or roll their own `copy_slide_from_external_prs` recipes. API --- - `Presentation.append_from(other_pres, slide_indexes=None) -> list[Slide]` - `slide_indexes=None` appends every slide of `other_pres` in source order. An iterable of zero-based ints appends only those slides in that order (allowing reordering). - Returns the list of newly-added `Slide` objects in insertion order. - Raises `IndexError` if any value in `slide_indexes` is out of range. Design ------ The implementation generalizes Phase 2's part-graph copy to work between two `Package` instances. A `_PortContext` instance carries the per-call dedup cache (`{source_part: target_part}` for masters, layouts, themes) so source slide-masters / layouts / themes shared by multiple appended source slides land as a single target part. Three load-bearing choices, all driven by the upstream prior-art recon and Phase 2's lessons: 1. **Port-fresh, not match-existing.** Always port a copy of source's slide-layout, slide-master, and theme into target. Layout matching by name or structural similarity is fragile — see upstream issue scanny#934 attempts. Trade-off: target deck accumulates masters when appending from many decks. Documented; preferred over fragile matching. 2. **Image / media dedup uses target package's existing SHA1 machinery.** `Package.get_or_add_image_part(BytesIO(blob))` already dedupes against target's existing image parts by SHA1 of bytes. We hand it the foreign image's blob — no new dedup code needed. 3. **Notes-master is target's, not source's.** Notes-master is a presentation-level singleton in OOXML. The ported notes-slide relates to TARGET's existing (or auto-created) notes-master via `RT.NOTES_MASTER`, with the back-rel (`RT.SLIDE`) rewired to the new slide part — addresses upstream gotcha scanny#961 generalized to the cross-package case. Recursion handling ------------------ Master and layouts form a cycle (master refs its layouts, each layout refs back to master). Broken by registering the new master in `_master_map` BEFORE walking its rels, and by relating each new layout to the master IMMEDIATELY after creation (so subsequent `next_partname` calls see it via `iter_parts` and don't allocate duplicate partnames — that bug surfaces as a zipfile "Duplicate name" warning on save and is rejected by Office). OOXML allocator quirk --------------------- python-pptx's `CT_SlideMasterIdListEntry` / `CT_SlideLayoutIdListEntry` only declare `rId` as a typed attribute, not the required `id` uint32. Existing presentations round-trip fine because lxml preserves unknown attributes verbatim; but a freshly-`_add_sldMasterId()`'d entry has no `id` and PowerPoint rejects the file. The `_add_sldMasterId_to_presentation` and `_add_sldLayoutId_to_master` helpers compute next-free id (max+1 with floor 2147483648) and set it via `el.set("id", str(next_id))` directly on the lxml element. Theme part loaded from disk is a base `Part` (binary blob), not an `XmlPart` — `_port_theme` branches on `isinstance(... XmlPart)` and either deepcopies the element or blob-copies, depending. Test coverage ------------- `tests/test_append_from.py` — 37 new tests across 6 describe classes: - `DescribePresentation_AppendFrom_API` — argument validation, raises, signature, return shape, empty `slide_indexes`, self-from-self edge case. - `DescribePresentation_AppendFrom_PerSlide` — partname uniqueness, sldId uniqueness, modification isolation, source non-mutation. - `DescribePresentation_AppendFrom_MasterPort` — master ported, within-call dedup (3 source slides on shared master → 1 ported), cross-call non-dedup, `<p:sldMasterId>` allocator, layout-tree fidelity. - `DescribePresentation_AppendFrom_ImageDedup` — cross-package SHA1 dedup, within-call dedup, round-trip preservation. - `DescribePresentation_AppendFrom_NotesSlide` — own notes part, text carried, back-rel rewired to new slide, target's existing notes-master reused. - `DescribePresentation_AppendFrom_RoundTrip` — open → append → save → reopen across all paths. - `DescribePresentation_AppendFrom_Antis` — anti-criteria: source unchanged, comments dropped, target's existing slides preserved. `features/append-from.feature` + 7 new step impls — 4 acceptance scenarios for default-all, selective, no-op-on-empty, and IndexError. Verification ------------ ``` $ python3 -m pytest tests/ -q 3162 passed in 4.32s $ python3 -m ruff format src tests 204 files left unchanged $ python3 -m ruff check src tests All checks passed! $ python3 -m behave features/ --no-color 56 features passed, 0 failed, 0 skipped 994 scenarios passed, 0 failed, 0 skipped ``` UAT script `uat_append_from.py` (untracked at repo root): builds a 3-slide source (textbox + picture + speaker notes) and a 1-slide target sharing the same PNG; runs `target.append_from(source)`; prints structural summary showing image_part_count = 2 BEFORE AND AFTER (cross-package SHA1 dedup of both the PNG and the default theme image holds), slide_part_count growing 1 → 4, master_part_count growing 1 → 2; round-trips through save/reopen with title text and speaker notes preserved. Includes a selective-subset variant exercising `slide_indexes=[2, 0]`. Out of scope (deferred per epic split): - `Presentation.sections` (`<p14:sectionLst>`) — Phase 4. - Append-mode open — Performance & Scale epic. - Comments parts — explicitly dropped (consistent with Phase 2). - Cross-deck font-table merging — fonts-in-OOXML are package-shared but font-table parts beyond default-template baselines deferred. Refs #11.
…rPoint repair `Presentation.append_from` triggered PowerPoint's bad-content / repair dialog on first open of any multi-master output. Three distinct OOXML id collisions inherited from the shared default.pptx template needed scrubbing during cross-package master/layout deepcopy: 1. `<p:sldMasterId id>` and `<p:sldLayoutId id>` share one identifier pool per ECMA-376 sec 19.2.1.34 / 19.2.1.27, and PowerPoint enforces uniqueness across both. The new master2 was being allocated id 2147483649 (one above target's lone master at 2147483648) — but 2147483649 was already master1's first layout id. PowerPoint's repair output renumbered master2 to 2147483660. Root fix: build a unified pool from BOTH master and layout ids in the target, allocate master id first (consumes one slot), then renumber layout ids above it. End state matches PowerPoint's repair ordering (master2= 2147483660, master2's layouts=2147483661..2147483671). 2. `<p:sldLayoutId id>` collided across masters because deepcopy of the source master carried the source's layout ids verbatim, identical to target's existing master's layout ids when both came from the same default.pptx. Spec: unique within the layout collection; PowerPoint: unique across the presentation. 3. `<p14:creationId val>` was deterministic in default.pptx (master: 2209977519, plus 11 fixed layout vals), so two presentations from the template ended up with identical vals across their parallel masters and layouts. Refresh each val to a fresh secrets.randbits(32) on deepcopy to keep the extLst structure intact while breaking the collision. Diagnostic path: PowerPoint's repair dialog gave only the generic "PowerPoint found a problem with content" with no error-line detail. Iterated by having the maintainer save the repaired copy and diffing against the original; the rewritten `<p:sldMasterId id>` value (2147483649 → 2147483660) revealed the master/layout pool sharing. Surface: - src/pptx/parts/slide.py: new _used_master_layout_ids_in_target, _renumber_sldLayoutIds, _refresh_creation_ids; _add_sldMasterId_to _presentation accepts unified pool; _port_master and _port_layout _standalone apply scrubs at deepcopy points - tests/test_append_from.py: DescribePresentation_AppendFrom_OOXML Integrity class with 4 zip-level integrity tests (sldLayoutId uniqueness across masters, master id disjoint from layout id pool, master creationId uniqueness, layout creationId uniqueness) Refs #11. Tests: 3166 unit (+4 new) + 994 behave scenarios all passing. Lint: ruff check + ruff format clean.
84267f7 to
5c7c149
Compare
MHoroszowski
added a commit
that referenced
this pull request
May 8, 2026
) (#36) Phase 4 of issue #11 (slide CRUD epic). Implements PowerPoint Sections support — read, write, and round-trip the `p:extLst/p:ext{uri=521415D9-…}/p14:sectionLst` extension that PowerPoint uses to organize slides into named groups in the slide pane. The fork's slide CRUD epic is now complete: Phase 1 (#33), Phase 2 (#34), Phase 3 (#35), and Phase 4 (this PR) collectively shipped duplicate / delete / reorder / copy-between-decks / sections across the originally listed sub-features. New public API -------------- - `Presentation.sections` → `_Sections` collection. - `_Sections` is sequence-like: `__len__`, `__iter__`, `__getitem__`, `index()`. Adds: * `add_section(name, after=None) -> Section` — append, or insert immediately after an existing section when `after` is given. * `remove(section)` — drop section; cleans up the wrapping `p14:sectionLst` / `p:ext` / `p:extLst` chain when the last section goes. - `Section` exposes: * `name` (read/write str) * `id` (read-only GUID-with-braces, e.g. `{ABC-…}`) * `slides` (tuple of |Slide|, in section order) * `add_slide(slide)` — assign or move slide into this section (a slide can belong to at most one section) * `remove_slide(slide)` — drop a slide's section assignment; slide remains in the presentation. Membership invariants --------------------- Section membership references slides by their stable `p:sldId/@id` integer (NOT by `r:id` and NOT by position), so `Slides.move(...)`, indexed `add_slide(...)`, and `Slides.remove(...)` preserve assignment without any extra plumbing. Removed slides become orphan ids in the section's `p14:sldIdLst`; `Section.slides` silently skips them on read but the XML round-trips them untouched (deliberate — matching python-pptx's "preserve foreign data" doctrine). PowerPoint compatibility ------------------------ - Empty sections emit `<p14:sldIdLst/>` so all PowerPoint versions treat them consistently (some interpret an omitted `sldIdLst` as "all unsectioned slides," which is not what we mean). - Section ids generated as `{<UPPER-CASE-GUID>}` matching PowerPoint's wire shape. - Foreign `p:extLst/p:ext` siblings (`p15:*`, modification tracking, etc.) round-trip untouched — the section ext lives alongside, not in place of. Internal additions ------------------ - New `pptx/sections.py` module hosting `Section` and `_Sections` proxy classes. - `pptx/oxml/presentation.py` extended with `CT_PresentationExtensionList`, `CT_PresentationExtension`, `CT_SectionList`, `CT_Section`, `CT_SectionSlideIdList`, `CT_SectionSlideId`, plus `SECTION_LIST_EXT_URI` constant and `CT_Presentation.{section_list, get_or_add_section_list, remove_section_list}` helpers. - `pptx/oxml/ns.py` registers the `p14` prefix (`http://schemas.microsoft.com/office/powerpoint/2010/main`). - `pptx/oxml/__init__.py` registers the new element classes. - `p:extLst` added as a successor entry on the existing ZeroOrOne `sldMasterIdLst`/`sldIdLst`/`sldSz` slots so insert ordering is correct. Tangential test-infra fix ------------------------- `tests/unitutil/cxml.py` namespace-prefix grammar widened from `Word(alphas)` to `Word(alphas, alphanums)` so test fixtures can address `p14:`, `w14:`, `o15:`, etc. Local-name grammar already supported alphanums. Test coverage ------------- - 56 new unit tests in `tests/test_sections.py` covering oxml helpers, the `Section`/`_Sections` proxies, and 8 round-trip integration tests (no-sections baseline, names + membership, GUID preservation, membership-survives-move, removal pruning, orphan preservation, empty-section, unicode/XML-special name, unsectioned-on-section-remove, sibling-ext preservation). - 5 new behave scenarios in `features/sld-sections.feature` (default empty, add, slide membership, move-preserves-membership, remove cleans up extLst). - New `uat_slide_sections.py` (untracked per repo §6) builds a 6-slide / 3-section deck, demonstrates membership preservation through a slide move, and prints a structural read-back. Verification ------------ ``` $ python3 -m pytest tests/ -q | tail -3 3222 passed in 4.33s $ ruff check src tests | tail -3 All checks passed! $ python3 -m behave features/ --no-color | tail -3 999 scenarios passed, 0 failed, 0 skipped 3000 steps passed, 0 failed, 0 skipped ``` Closes #11 Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Implements Phase 3 of the slide-CRUD epic (issue #11):
Presentation.append_from(other_pres, slide_indexes=None)for cross-deck slide copy. Builds on Phase 1 (#33) and Phase 2 (merged via #33) — the two stacked feature branches landed together.What it does
Returns a list of newly-appended
Slideobjects in target. Source presentation is unmodified.Cross-package porting machinery
_PortContext(inparts/slide.py) caches{source_part: target_part}maps for the duration of oneappend_fromcall so source slide-master / layout / theme parts shared across multiple appended slides land as a single target part rather than being duplicated.Per-class porting:
RT.NOTES_SLIDE,RT.COMMENTS,RT.COMMENT_AUTHORS(Phase 2 policy)._replicate_layout_rels; back-rel to ported master;<p:sldLayoutId>registered in target master.next_partnamecall (subtle race where unrelated parts are invisible toiter_parts, allocator hands out the same partname twice).XmlPartvs basePart(binary blob when default content-type isn't registered to a specific XML class).Package.get_or_add_image_part(BytesIO(blob))for SHA1-based dedup at the target package._duplicate_chart_part_into; OLE / binary via_duplicate_blob_part_into; unknown XML / blob falls through_port_unknown_part(deepcopy element if XmlPart, else blob copy).OOXML id-collision scrubs (load-bearing)
Two presentations built from the same default.pptx template carry identical id values across their parallel parts. Without scrubbing, the merged file triggered PowerPoint's bad-content / repair dialog on first open. Three classes of collision needed handling on deepcopy:
<p:sldMasterId id>and<p:sldLayoutId id>consume one identifier space — PowerPoint enforces uniqueness across both. New_used_master_layout_ids_in_target(pres_part)builds a unified pool;_add_sldMasterId_to_presentationand_renumber_sldLayoutIdsboth allocate from it. Master id consumed first (lowest available), then layout ids above. End state: master2=2147483660, master2's layouts=2147483661..2147483671 — matches PowerPoint's own repair-output ordering exactly.<p:sldLayoutId id>cross-master collision. Source master's deepcopy carries source's layout ids verbatim, identical to target's existing master's layout ids._renumber_sldLayoutIdsreassigns each to a fresh value above the running pool max.<p14:creationId val>on master + every layout. default.pptx ships deterministic vals (master=2209977519, plus 11 fixed layout vals)._refresh_creation_idsrewrites eachvalto a freshsecrets.randbits(32)while keeping the<p:extLst>structure intact (PowerPoint expects the extension element present).The diagnostic path that surfaced collision (1): PowerPoint's repair dialog gave only the generic "found a problem with content" with no error-line detail. Iterated by saving the repaired copy and diffing against the original — the rewritten
<p:sldMasterId id>value (2147483649 → 2147483660) revealed the master/layout pool sharing.Tests (37 new in
tests/test_append_from.py)7 describe classes covering API surface, per-slide porting, master/layout/theme port integrity, image dedup invariant, notes-slide round-trip, full save→reopen cycles, anti-criteria (source not mutated, comments dropped, target's existing slides preserved), and
OOXMLIntegrityzip-level checks (cross-master<p:sldLayoutId id>uniqueness, master id disjoint from layout id pool, master + layout<p14:creationId val>uniqueness).features/append-from.feature— 4 acceptance scenarios.Verification (CLAUDE.md §7)
UAT (CLAUDE.md §6)
uat_append_from.py(untracked at repo root) builds source + target presentations exercising textbox / picture / notes content, cross-package SHA1 image dedup, runs the full append, runs the selective-and-reordered subset variant (slide_indexes=[2, 0]), and round-trips through save→reopen. Maintainer signed off after PowerPoint rendered both_target_out.pptxand_subset_out.pptxclean (no repair dialog) — the post-fix builds.Refs #11.