Skip to content

feat(slides): add Presentation.append_from (slide-CRUD Phase 3 — cross-deck copy)#35

Merged
MHoroszowski merged 2 commits into
masterfrom
feature/slide-crud-phase3
May 7, 2026
Merged

feat(slides): add Presentation.append_from (slide-CRUD Phase 3 — cross-deck copy)#35
MHoroszowski merged 2 commits into
masterfrom
feature/slide-crud-phase3

Conversation

@MHoroszowski
Copy link
Copy Markdown
Owner

Implements Phase 3 of the slide-CRUD epic (issue #11): Presentation.append_from(other_pres, slide_indexes=None) for cross-deck slide copy. Builds on Phase 1 (#33) and Phase 2 (merged via #33) — the two stacked feature branches landed together.

What it does

from pptx import Presentation

target = Presentation("target.pptx")
source = Presentation("source.pptx")
new_slides = target.append_from(source)                        # all slides
new_slides = target.append_from(source, slide_indexes=[2, 0])  # selective + reordered
target.save("merged.pptx")

Returns a list of newly-appended Slide objects in target. Source presentation is unmodified.

Cross-package porting machinery

_PortContext (in parts/slide.py) caches {source_part: target_part} maps for the duration of one append_from call so source slide-master / layout / theme parts shared across multiple appended slides land as a single target part rather than being duplicated.

Per-class porting:

  • Slide — deepcopy element + replicate rels minus RT.NOTES_SLIDE, RT.COMMENTS, RT.COMMENT_AUTHORS (Phase 2 policy).
  • SlideLayout — deepcopy + remap rels through _replicate_layout_rels; back-rel to ported master; <p:sldLayoutId> registered in target master.
  • SlideMaster — deepcopy + walk rels; layouts pre-created and related back to master IMMEDIATELY before the next next_partname call (subtle race where unrelated parts are invisible to iter_parts, allocator hands out the same partname twice).
  • Theme — branched on XmlPart vs base Part (binary blob when default content-type isn't registered to a specific XML class).
  • Image — routed through Package.get_or_add_image_part(BytesIO(blob)) for SHA1-based dedup at the target package.
  • Notes-slide — uses target's existing notes-master (presentation singleton); back-rel rewired to the new slide.
  • Chart / OLE / unknown — chart cloned via _duplicate_chart_part_into; OLE / binary via _duplicate_blob_part_into; unknown XML / blob falls through _port_unknown_part (deepcopy element if XmlPart, else blob copy).

OOXML id-collision scrubs (load-bearing)

Two presentations built from the same default.pptx template carry identical id values across their parallel parts. Without scrubbing, the merged file triggered PowerPoint's bad-content / repair dialog on first open. Three classes of collision needed handling on deepcopy:

  1. Shared master/layout id pool (per ECMA-376 §19.2.1.34 + §19.2.1.27). <p:sldMasterId id> and <p:sldLayoutId id> consume one identifier space — PowerPoint enforces uniqueness across both. New _used_master_layout_ids_in_target(pres_part) builds a unified pool; _add_sldMasterId_to_presentation and _renumber_sldLayoutIds both allocate from it. Master id consumed first (lowest available), then layout ids above. End state: master2=2147483660, master2's layouts=2147483661..2147483671 — matches PowerPoint's own repair-output ordering exactly.
  2. <p:sldLayoutId id> cross-master collision. Source master's deepcopy carries source's layout ids verbatim, identical to target's existing master's layout ids. _renumber_sldLayoutIds reassigns each to a fresh value above the running pool max.
  3. <p14:creationId val> on master + every layout. default.pptx ships deterministic vals (master=2209977519, plus 11 fixed layout vals). _refresh_creation_ids rewrites each val to a fresh secrets.randbits(32) while keeping the <p:extLst> structure intact (PowerPoint expects the extension element present).

The diagnostic path that surfaced collision (1): PowerPoint's repair dialog gave only the generic "found a problem with content" with no error-line detail. Iterated by saving the repaired copy and diffing against the original — the rewritten <p:sldMasterId id> value (2147483649 → 2147483660) revealed the master/layout pool sharing.

Tests (37 new in tests/test_append_from.py)

7 describe classes covering API surface, per-slide porting, master/layout/theme port integrity, image dedup invariant, notes-slide round-trip, full save→reopen cycles, anti-criteria (source not mutated, comments dropped, target's existing slides preserved), and OOXMLIntegrity zip-level checks (cross-master <p:sldLayoutId id> uniqueness, master id disjoint from layout id pool, master + layout <p14:creationId val> uniqueness).

features/append-from.feature — 4 acceptance scenarios.

Verification (CLAUDE.md §7)

$ python3 -m pytest tests/ -q | tail -3
........................................................................ [ 97%]
......................................................................   [100%]
3166 passed in 4.54s

$ ruff check src tests | tail -3
All checks passed!

$ python3 -m behave features/ --no-color 2>&1 | tail -3
994 scenarios passed, 0 failed, 0 skipped
2979 steps passed, 0 failed, 0 skipped
Took 0min 1.670s

UAT (CLAUDE.md §6)

uat_append_from.py (untracked at repo root) builds source + target presentations exercising textbox / picture / notes content, cross-package SHA1 image dedup, runs the full append, runs the selective-and-reordered subset variant (slide_indexes=[2, 0]), and round-trips through save→reopen. Maintainer signed off after PowerPoint rendered both _target_out.pptx and _subset_out.pptx clean (no repair dialog) — the post-fix builds.

Refs #11.

Matthew Horoszowski added 2 commits May 7, 2026 19:21
…s-deck copy)

Implements the cross-deck slide copy sub-feature of issue #11. After
Phase 1 (in-deck CRUD: remove, move, indexed add) and Phase 2
(`Slide.duplicate`), Phase 3 closes the cross-package gap so users no
longer need to fork to `python-pptx-valutico` or roll their own
`copy_slide_from_external_prs` recipes.

API
---

- `Presentation.append_from(other_pres, slide_indexes=None) -> list[Slide]`
  - `slide_indexes=None` appends every slide of `other_pres` in source
    order. An iterable of zero-based ints appends only those slides in
    that order (allowing reordering).
  - Returns the list of newly-added `Slide` objects in insertion order.
  - Raises `IndexError` if any value in `slide_indexes` is out of range.

Design
------

The implementation generalizes Phase 2's part-graph copy to work
between two `Package` instances. A `_PortContext` instance carries the
per-call dedup cache (`{source_part: target_part}` for masters, layouts,
themes) so source slide-masters / layouts / themes shared by multiple
appended source slides land as a single target part.

Three load-bearing choices, all driven by the upstream prior-art
recon and Phase 2's lessons:

1. **Port-fresh, not match-existing.** Always port a copy of source's
   slide-layout, slide-master, and theme into target. Layout matching
   by name or structural similarity is fragile — see upstream issue
   scanny#934 attempts. Trade-off: target deck accumulates masters when
   appending from many decks. Documented; preferred over fragile
   matching.

2. **Image / media dedup uses target package's existing SHA1
   machinery.** `Package.get_or_add_image_part(BytesIO(blob))` already
   dedupes against target's existing image parts by SHA1 of bytes. We
   hand it the foreign image's blob — no new dedup code needed.

3. **Notes-master is target's, not source's.** Notes-master is a
   presentation-level singleton in OOXML. The ported notes-slide
   relates to TARGET's existing (or auto-created) notes-master via
   `RT.NOTES_MASTER`, with the back-rel (`RT.SLIDE`) rewired to the
   new slide part — addresses upstream gotcha scanny#961 generalized to the
   cross-package case.

Recursion handling
------------------

Master and layouts form a cycle (master refs its layouts, each layout
refs back to master). Broken by registering the new master in
`_master_map` BEFORE walking its rels, and by relating each new layout
to the master IMMEDIATELY after creation (so subsequent
`next_partname` calls see it via `iter_parts` and don't allocate
duplicate partnames — that bug surfaces as a zipfile "Duplicate name"
warning on save and is rejected by Office).

OOXML allocator quirk
---------------------

python-pptx's `CT_SlideMasterIdListEntry` / `CT_SlideLayoutIdListEntry`
only declare `rId` as a typed attribute, not the required `id` uint32.
Existing presentations round-trip fine because lxml preserves unknown
attributes verbatim; but a freshly-`_add_sldMasterId()`'d entry has no
`id` and PowerPoint rejects the file. The `_add_sldMasterId_to_presentation`
and `_add_sldLayoutId_to_master` helpers compute next-free id (max+1
with floor 2147483648) and set it via `el.set("id", str(next_id))`
directly on the lxml element.

Theme part loaded from disk is a base `Part` (binary blob), not an
`XmlPart` — `_port_theme` branches on `isinstance(... XmlPart)` and
either deepcopies the element or blob-copies, depending.

Test coverage
-------------

`tests/test_append_from.py` — 37 new tests across 6 describe classes:
- `DescribePresentation_AppendFrom_API` — argument validation, raises,
  signature, return shape, empty `slide_indexes`, self-from-self edge
  case.
- `DescribePresentation_AppendFrom_PerSlide` — partname uniqueness,
  sldId uniqueness, modification isolation, source non-mutation.
- `DescribePresentation_AppendFrom_MasterPort` — master ported,
  within-call dedup (3 source slides on shared master → 1 ported),
  cross-call non-dedup, `<p:sldMasterId>` allocator, layout-tree
  fidelity.
- `DescribePresentation_AppendFrom_ImageDedup` — cross-package SHA1
  dedup, within-call dedup, round-trip preservation.
- `DescribePresentation_AppendFrom_NotesSlide` — own notes part,
  text carried, back-rel rewired to new slide, target's existing
  notes-master reused.
- `DescribePresentation_AppendFrom_RoundTrip` — open → append → save
  → reopen across all paths.
- `DescribePresentation_AppendFrom_Antis` — anti-criteria: source
  unchanged, comments dropped, target's existing slides preserved.

`features/append-from.feature` + 7 new step impls — 4 acceptance
scenarios for default-all, selective, no-op-on-empty, and IndexError.

Verification
------------

```
$ python3 -m pytest tests/ -q
3162 passed in 4.32s

$ python3 -m ruff format src tests
204 files left unchanged

$ python3 -m ruff check src tests
All checks passed!

$ python3 -m behave features/ --no-color
56 features passed, 0 failed, 0 skipped
994 scenarios passed, 0 failed, 0 skipped
```

UAT script `uat_append_from.py` (untracked at repo root): builds a
3-slide source (textbox + picture + speaker notes) and a 1-slide
target sharing the same PNG; runs `target.append_from(source)`;
prints structural summary showing image_part_count = 2 BEFORE AND
AFTER (cross-package SHA1 dedup of both the PNG and the default
theme image holds), slide_part_count growing 1 → 4, master_part_count
growing 1 → 2; round-trips through save/reopen with title text and
speaker notes preserved. Includes a selective-subset variant
exercising `slide_indexes=[2, 0]`.

Out of scope (deferred per epic split):
- `Presentation.sections` (`<p14:sectionLst>`) — Phase 4.
- Append-mode open — Performance & Scale epic.
- Comments parts — explicitly dropped (consistent with Phase 2).
- Cross-deck font-table merging — fonts-in-OOXML are package-shared
  but font-table parts beyond default-template baselines deferred.

Refs #11.
…rPoint repair

`Presentation.append_from` triggered PowerPoint's bad-content / repair
dialog on first open of any multi-master output. Three distinct OOXML
id collisions inherited from the shared default.pptx template needed
scrubbing during cross-package master/layout deepcopy:

1. `<p:sldMasterId id>` and `<p:sldLayoutId id>` share one identifier
   pool per ECMA-376 sec 19.2.1.34 / 19.2.1.27, and PowerPoint enforces
   uniqueness across both. The new master2 was being allocated id
   2147483649 (one above target's lone master at 2147483648) — but
   2147483649 was already master1's first layout id. PowerPoint's
   repair output renumbered master2 to 2147483660. Root fix: build a
   unified pool from BOTH master and layout ids in the target, allocate
   master id first (consumes one slot), then renumber layout ids above
   it. End state matches PowerPoint's repair ordering (master2=
   2147483660, master2's layouts=2147483661..2147483671).

2. `<p:sldLayoutId id>` collided across masters because deepcopy of the
   source master carried the source's layout ids verbatim, identical to
   target's existing master's layout ids when both came from the same
   default.pptx. Spec: unique within the layout collection; PowerPoint:
   unique across the presentation.

3. `<p14:creationId val>` was deterministic in default.pptx (master:
   2209977519, plus 11 fixed layout vals), so two presentations from
   the template ended up with identical vals across their parallel
   masters and layouts. Refresh each val to a fresh secrets.randbits(32)
   on deepcopy to keep the extLst structure intact while breaking the
   collision.

Diagnostic path: PowerPoint's repair dialog gave only the generic
"PowerPoint found a problem with content" with no error-line detail.
Iterated by having the maintainer save the repaired copy and diffing
against the original; the rewritten `<p:sldMasterId id>` value
(2147483649 → 2147483660) revealed the master/layout pool sharing.

Surface:
- src/pptx/parts/slide.py: new _used_master_layout_ids_in_target,
  _renumber_sldLayoutIds, _refresh_creation_ids; _add_sldMasterId_to
  _presentation accepts unified pool; _port_master and _port_layout
  _standalone apply scrubs at deepcopy points
- tests/test_append_from.py: DescribePresentation_AppendFrom_OOXML
  Integrity class with 4 zip-level integrity tests (sldLayoutId
  uniqueness across masters, master id disjoint from layout id pool,
  master creationId uniqueness, layout creationId uniqueness)

Refs #11.

Tests: 3166 unit (+4 new) + 994 behave scenarios all passing.
Lint: ruff check + ruff format clean.
@MHoroszowski MHoroszowski force-pushed the feature/slide-crud-phase3 branch from 84267f7 to 5c7c149 Compare May 7, 2026 23:21
@MHoroszowski MHoroszowski merged commit 1721884 into master May 7, 2026
12 checks passed
@MHoroszowski MHoroszowski deleted the feature/slide-crud-phase3 branch May 7, 2026 23:24
MHoroszowski added a commit that referenced this pull request May 8, 2026
) (#36)

Phase 4 of issue #11 (slide CRUD epic). Implements PowerPoint Sections
support — read, write, and round-trip the
`p:extLst/p:ext{uri=521415D9-…}/p14:sectionLst` extension that
PowerPoint uses to organize slides into named groups in the slide
pane. The fork's slide CRUD epic is now complete: Phase 1 (#33),
Phase 2 (#34), Phase 3 (#35), and Phase 4 (this PR) collectively
shipped duplicate / delete / reorder / copy-between-decks / sections
across the originally listed sub-features.

New public API
--------------
- `Presentation.sections` → `_Sections` collection.
- `_Sections` is sequence-like: `__len__`, `__iter__`,
  `__getitem__`, `index()`. Adds:
  * `add_section(name, after=None) -> Section` — append, or insert
    immediately after an existing section when `after` is given.
  * `remove(section)` — drop section; cleans up the wrapping
    `p14:sectionLst` / `p:ext` / `p:extLst` chain when the last
    section goes.
- `Section` exposes:
  * `name` (read/write str)
  * `id` (read-only GUID-with-braces, e.g. `{ABC-…}`)
  * `slides` (tuple of |Slide|, in section order)
  * `add_slide(slide)` — assign or move slide into this section
    (a slide can belong to at most one section)
  * `remove_slide(slide)` — drop a slide's section assignment;
    slide remains in the presentation.

Membership invariants
---------------------
Section membership references slides by their stable
`p:sldId/@id` integer (NOT by `r:id` and NOT by position), so
`Slides.move(...)`, indexed `add_slide(...)`, and `Slides.remove(...)`
preserve assignment without any extra plumbing. Removed slides
become orphan ids in the section's `p14:sldIdLst`; `Section.slides`
silently skips them on read but the XML round-trips them untouched
(deliberate — matching python-pptx's "preserve foreign data"
doctrine).

PowerPoint compatibility
------------------------
- Empty sections emit `<p14:sldIdLst/>` so all PowerPoint versions
  treat them consistently (some interpret an omitted `sldIdLst`
  as "all unsectioned slides," which is not what we mean).
- Section ids generated as `{<UPPER-CASE-GUID>}` matching
  PowerPoint's wire shape.
- Foreign `p:extLst/p:ext` siblings (`p15:*`, modification
  tracking, etc.) round-trip untouched — the section ext lives
  alongside, not in place of.

Internal additions
------------------
- New `pptx/sections.py` module hosting `Section` and
  `_Sections` proxy classes.
- `pptx/oxml/presentation.py` extended with
  `CT_PresentationExtensionList`, `CT_PresentationExtension`,
  `CT_SectionList`, `CT_Section`, `CT_SectionSlideIdList`,
  `CT_SectionSlideId`, plus `SECTION_LIST_EXT_URI` constant and
  `CT_Presentation.{section_list, get_or_add_section_list,
  remove_section_list}` helpers.
- `pptx/oxml/ns.py` registers the `p14` prefix
  (`http://schemas.microsoft.com/office/powerpoint/2010/main`).
- `pptx/oxml/__init__.py` registers the new element classes.
- `p:extLst` added as a successor entry on the existing
  ZeroOrOne `sldMasterIdLst`/`sldIdLst`/`sldSz` slots so
  insert ordering is correct.

Tangential test-infra fix
-------------------------
`tests/unitutil/cxml.py` namespace-prefix grammar widened from
`Word(alphas)` to `Word(alphas, alphanums)` so test fixtures
can address `p14:`, `w14:`, `o15:`, etc. Local-name grammar
already supported alphanums.

Test coverage
-------------
- 56 new unit tests in `tests/test_sections.py` covering oxml
  helpers, the `Section`/`_Sections` proxies, and 8 round-trip
  integration tests (no-sections baseline, names + membership,
  GUID preservation, membership-survives-move, removal pruning,
  orphan preservation, empty-section, unicode/XML-special name,
  unsectioned-on-section-remove, sibling-ext preservation).
- 5 new behave scenarios in `features/sld-sections.feature`
  (default empty, add, slide membership, move-preserves-membership,
  remove cleans up extLst).
- New `uat_slide_sections.py` (untracked per repo §6) builds a
  6-slide / 3-section deck, demonstrates membership preservation
  through a slide move, and prints a structural read-back.

Verification
------------
```
$ python3 -m pytest tests/ -q | tail -3
3222 passed in 4.33s

$ ruff check src tests | tail -3
All checks passed!

$ python3 -m behave features/ --no-color | tail -3
999 scenarios passed, 0 failed, 0 skipped
3000 steps passed, 0 failed, 0 skipped
```

Closes #11

Co-authored-by: Matthew Horoszowski <mhoroszowski@Winton.lan>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant