Skip to content

Commit 26dc97e

Browse files
Matthew HoroszowskiMatthew Horoszowski
authored andcommitted
feat(slides): add Slide.duplicate / Slides.duplicate (slide-CRUD Phase 2)
Closes the Phase 2 sub-task of the slide-CRUD epic (#11) and the 12-year-old most-commented upstream feature request scanny#132 in this fork. API --- - `Slides.duplicate(slide, index=None)` — collection-level operation, returns a `Slide`. `index=None` defaults to `source_index + 1` (immediately after the source). Raises `ValueError` if `slide` is not a member, `IndexError` if `index` is out of range. - `Slide.duplicate(index=None)` — convenience alias delegating to `Slides.duplicate(self, index)`. Mirrors `Slide.delete()` → `Slides.remove(self)` from Phase 1. Design ------ The implementation walks the source slide's rels and decides per reltype whether to share the target part (image, media, slide-layout, slide-master, theme — package-level dedup) or deep-copy it (chart, OLE-object, embedded-package, notes-slide). For chart parts the chart XML is `copy.deepcopy`'d but the embedded xlsx workbook is *blob- copied* — the workbook is the chart's data and `<c:numCache>` values in the chart XML mirror it; deepcopy of XML would not capture the binary payload. After deepcopy, the duplicated `<p:sld>` still references the source's `rId`s. We walk every descendant element via `lxml.iter()` and substitute any attribute whose name is in the OOXML relationships namespace — catches `r:id`, `r:embed`, `r:link`, `r:pict`, `r:href` in one pass without enumerating local-names (the local-name approach silently misses `r:link` on linked images, which is the most common real-world failure mode of community recipes). Notes-slide handling addresses upstream gotcha scanny#961: a NotesSlidePart back-references its parent slide via `RT.SLIDE`. The duplicate's notes-slide MUST point at the new slide, not the source. The new helper `duplicate_notes_slide_for(src, new)` rewires the back-rel and is invoked AFTER the new slide part is registered in presentation rels. Comments parts (`RT.COMMENTS` / `RT.COMMENT_AUTHORS`) are dropped on duplicate by an explicit reltype filter — comments support is out of scope for Phase 2 of #11. A test documents the drop so the policy doesn't silently change. Test coverage ------------- - `tests/test_slide_duplicate.py` — 39 new tests across 7 describe classes covering the API surface (existence, signature, raises, index variants, alias delegation), the part-graph invariants (new unique partname / rId / sldId, layout-part shared with source, modification isolation, image-dedup invariant unchanged), the notes-slide path (own copy, back-ref rewired, isolation), the defensive XPath `r:*` check (no unmapped rIds), and round-trip through save/reopen. - `features/sld-duplicate.feature` + 7 new step impls — 4 acceptance scenarios for default-index, explicit-index, alias, and IndexError paths. Verification ------------ ``` $ python3 -m pytest tests/ -q 3125 passed in 4.02s $ python3 -m ruff format src tests 203 files left unchanged $ python3 -m ruff check src tests All checks passed! $ python3 -m behave features/ --no-color 990 scenarios passed, 0 failed, 0 skipped ``` UAT script `uat_slide_duplicate.py` (untracked at repo root per CLAUDE.md §6): builds a 3-slide deck with a textbox, a picture, and speaker notes; exercises both `Slides.duplicate` and `Slide.duplicate`; round-trips; prints structural summary (image_part_count = 2 before AND after — dedup invariant holds). Out of scope (deferred per epic split): - `Presentation.append_from` — Phase 3 (cross-deck rels). - `Presentation.sections` — Phase 4 (`<p14:sectionLst>`). Refs #11.
1 parent 29f3022 commit 26dc97e

5 files changed

Lines changed: 844 additions & 4 deletions

File tree

features/sld-duplicate.feature

Lines changed: 31 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,31 @@
1+
Feature: Slide duplicate — Slides.duplicate, Slide.duplicate
2+
In order to programmatically clone slides without lxml hacks
3+
As a developer using python-pptx
4+
I need a single API call that duplicates a slide and inserts it at a chosen position
5+
6+
7+
Scenario: Slides.duplicate(slide) inserts the copy after the source
8+
Given a Slides object containing 3 slides
9+
When I call slides.duplicate(slides[0])
10+
Then len(slides) is 4
11+
And the duplicate is at index 1
12+
And the source slide is still at index 0
13+
14+
15+
Scenario: Slides.duplicate(slide, index=N) inserts the copy at index N
16+
Given a Slides object containing 3 slides
17+
When I call slides.duplicate(slides[0], index=3)
18+
Then len(slides) is 4
19+
And the duplicate is at index 3
20+
21+
22+
Scenario: Slide.duplicate() returns a Slide with a new unique slide_id
23+
Given a Slides object containing 3 slides
24+
When I call slides[1].duplicate()
25+
Then len(slides) is 4
26+
And the duplicate slide_id is unique
27+
28+
29+
Scenario: Slides.duplicate raises IndexError for an out-of-range index
30+
Given a Slides object containing 3 slides
31+
Then calling slides.duplicate(slides[0], index=99) raises IndexError

features/steps/slides.py

Lines changed: 44 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,21 @@ def when_I_call_slide_delete(context):
7373
context.slides[1].delete()
7474

7575

76+
@when("I call slides.duplicate(slides[0])")
77+
def when_I_call_slides_duplicate_default_index(context):
78+
context.new_slide = context.slides.duplicate(context.slides[0])
79+
80+
81+
@when("I call slides.duplicate(slides[0], index=3)")
82+
def when_I_call_slides_duplicate_index_3(context):
83+
context.new_slide = context.slides.duplicate(context.slides[0], index=3)
84+
85+
86+
@when("I call slides[1].duplicate()")
87+
def when_I_call_slide_duplicate_alias(context):
88+
context.new_slide = context.slides[1].duplicate()
89+
90+
7691
# then ====================================================
7792

7893

@@ -192,3 +207,32 @@ def then_surviving_slide_order_matches_0_2(context):
192207
expected = [o[0], o[2]]
193208
actual = [s.slide_id for s in context.slides]
194209
assert actual == expected, "expected %r, got %r" % (expected, actual)
210+
211+
212+
@then("the duplicate is at index {idx:d}")
213+
def then_the_duplicate_is_at_index(context, idx):
214+
assert context.slides[idx].slide_id == context.new_slide.slide_id, (
215+
"expected duplicate at index %d, got slide_id mismatch" % idx
216+
)
217+
218+
219+
@then("the source slide is still at index 0")
220+
def then_source_slide_still_at_index_0(context):
221+
assert context.slides[0].slide_id == context.original_slide_ids[0], (
222+
"source slide moved off index 0"
223+
)
224+
225+
226+
@then("the duplicate slide_id is unique")
227+
def then_duplicate_slide_id_is_unique(context):
228+
assert context.new_slide.slide_id not in context.original_slide_ids, (
229+
"duplicate slide_id collides with an existing slide"
230+
)
231+
232+
233+
@then("calling slides.duplicate(slides[0], index=99) raises IndexError")
234+
def then_duplicate_index_99_raises(context):
235+
import pytest
236+
237+
with pytest.raises(IndexError):
238+
context.slides.duplicate(context.slides[0], index=99)

src/pptx/parts/slide.py

Lines changed: 170 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,12 +2,14 @@
22

33
from __future__ import annotations
44

5+
import copy
6+
import re
57
from typing import IO, TYPE_CHECKING, cast
68

79
from pptx.enum.shapes import PROG_ID
810
from pptx.opc.constants import CONTENT_TYPE as CT
911
from pptx.opc.constants import RELATIONSHIP_TYPE as RT
10-
from pptx.opc.package import XmlPart
12+
from pptx.opc.package import Part, XmlPart
1113
from pptx.opc.packuri import PackURI
1214
from pptx.oxml.slide import CT_NotesMaster, CT_NotesSlide, CT_Slide
1315
from pptx.oxml.theme import CT_OfficeStyleSheet
@@ -259,6 +261,173 @@ def _add_notes_slide_part(self):
259261
self.relate_to(notes_slide_part, RT.NOTES_SLIDE)
260262
return notes_slide_part
261263

264+
def duplicate(self) -> SlidePart:
265+
"""Return a new |SlidePart| that is a deep copy of this one.
266+
267+
Image, media, slide-layout, and slide-master rels are reused —
268+
the duplicate references the same package-level parts as the
269+
source. Chart, OLE-embedded, and embedded-package parts are
270+
deep-copied per duplicate. The notes-slide rel and any
271+
comments rels are NOT carried over: notes-slide rewiring is
272+
the caller's job (see |Slides.duplicate|), and comments are
273+
out of scope for Phase 2 of issue #11.
274+
"""
275+
new_partname = self._package.next_partname("/ppt/slides/slide%d.xml")
276+
new_element = copy.deepcopy(self._element)
277+
new_part = SlidePart(new_partname, CT.PML_SLIDE, self._package, new_element)
278+
279+
rId_map = _replicate_rels_for_duplicate(self, new_part)
280+
_remap_rId_attrs(new_element, rId_map)
281+
282+
return new_part
283+
284+
285+
# ---------------------------------------------------------------------------
286+
# Module-level helpers for slide / slide-private part duplication.
287+
# ---------------------------------------------------------------------------
288+
289+
_RELS_NS = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"
290+
291+
# Reltypes filtered out during slide duplication. NOTES_SLIDE is wired
292+
# explicitly by |Slides.duplicate| so the new notes-slide back-references
293+
# the new parent slide. Comments are dropped — Phase 2 scope (issue #11).
294+
_DUP_DROP_RELTYPES_SLIDE = frozenset({RT.NOTES_SLIDE, RT.COMMENTS, RT.COMMENT_AUTHORS})
295+
296+
297+
def _replicate_rels_for_duplicate(src_part: Part, new_part: Part) -> dict[str, str]:
298+
"""Mirror src_part's slide-relevant rels onto new_part.
299+
300+
Returns a `{old_rId: new_rId}` map for rId-attribute remapping.
301+
"""
302+
rId_map: dict[str, str] = {}
303+
for rId, rel in src_part.rels.items():
304+
if rel.reltype in _DUP_DROP_RELTYPES_SLIDE:
305+
continue
306+
if rel.is_external:
307+
new_rId = new_part.relate_to(rel.target_ref, rel.reltype, is_external=True)
308+
elif rel.reltype == RT.CHART:
309+
new_target = _duplicate_chart_part(cast(ChartPart, rel.target_part))
310+
new_rId = new_part.relate_to(new_target, rel.reltype)
311+
elif rel.reltype in (RT.OLE_OBJECT, RT.PACKAGE):
312+
new_target = _duplicate_blob_part(cast(Part, rel.target_part))
313+
new_rId = new_part.relate_to(new_target, rel.reltype)
314+
else:
315+
# Shared parts: image, media, video, layout, master, theme, etc.
316+
new_rId = new_part.relate_to(rel.target_part, rel.reltype)
317+
rId_map[rId] = new_rId
318+
return rId_map
319+
320+
321+
def _remap_rId_attrs(element, rId_map: dict[str, str]) -> None:
322+
"""Substitute relationships-namespace attribute values in `element`.
323+
324+
Walks every descendant element and rewrites any attribute whose name
325+
is in the OOXML relationships namespace (catches `r:id`, `r:embed`,
326+
`r:link`, `r:pict`, `r:href` in one pass).
327+
"""
328+
for el in element.iter():
329+
for attr_name in list(el.attrib):
330+
if attr_name.startswith(_RELS_NS):
331+
old = el.attrib[attr_name]
332+
if old in rId_map:
333+
el.attrib[attr_name] = rId_map[old]
334+
335+
336+
def _duplicate_chart_part(src: ChartPart) -> ChartPart:
337+
"""Return a new ChartPart cloning `src`.
338+
339+
Chart XML is deep-copied. Embedded data (e.g. an xlsx workbook
340+
reached via an `RT.PACKAGE` rel) is binary and must be blob-copied,
341+
not deep-copy-of-XML — the workbook IS the chart's data, and the
342+
`<c:numCache>` values in the chart XML mirror it.
343+
"""
344+
package = src._package
345+
new_partname = package.next_partname("/ppt/charts/chart%d.xml")
346+
new_element = copy.deepcopy(src._element)
347+
cls = type(src)
348+
new_part = cls(new_partname, src.content_type, package, new_element)
349+
rId_map: dict[str, str] = {}
350+
for rId, rel in src.rels.items():
351+
if rel.is_external:
352+
new_rId = new_part.relate_to(rel.target_ref, rel.reltype, is_external=True)
353+
elif rel.reltype == RT.PACKAGE:
354+
new_target = _duplicate_blob_part(cast(Part, rel.target_part))
355+
new_rId = new_part.relate_to(new_target, rel.reltype)
356+
else:
357+
# Theme override and other chart-private parts: share for now.
358+
# Practical impact is small; revisit if a user reports it.
359+
new_rId = new_part.relate_to(rel.target_part, rel.reltype)
360+
rId_map[rId] = new_rId
361+
_remap_rId_attrs(new_element, rId_map)
362+
return new_part
363+
364+
365+
def _duplicate_blob_part(src: Part) -> Part:
366+
"""Return a new binary |Part| cloning `src`'s blob.
367+
368+
Used for embedded packages (xlsx, docx, pptx) and OLE objects —
369+
parts whose payload is opaque bytes rather than XML.
370+
"""
371+
package = src._package
372+
cls = type(src)
373+
tmpl = getattr(cls, "partname_template", None)
374+
if tmpl is None:
375+
tmpl = _derive_partname_template(str(src.partname))
376+
new_partname = package.next_partname(tmpl)
377+
return cls(new_partname, src.content_type, package, src.blob)
378+
379+
380+
def _derive_partname_template(partname: str) -> str:
381+
"""Derive a `next_partname`-compatible template from an existing partname.
382+
383+
Replaces the trailing integer (just before the final extension) with
384+
`%d`. Falls back to inserting `%d` immediately before the extension
385+
if there is no trailing digit run.
386+
"""
387+
match = re.match(r"^(.*?)(\d+)(\.[^./]+)$", partname)
388+
if match:
389+
prefix, _, ext = match.groups()
390+
return f"{prefix}%d{ext}"
391+
# No trailing-digit pattern; insert %d before final extension.
392+
dot = partname.rfind(".")
393+
if dot < 0:
394+
return f"{partname}%d"
395+
return f"{partname[:dot]}%d{partname[dot:]}"
396+
397+
398+
def duplicate_notes_slide_for(
399+
src_slide_part: SlidePart, new_slide_part: SlidePart
400+
) -> NotesSlidePart:
401+
"""Create a fresh |NotesSlidePart| for `new_slide_part`, cloning content from src.
402+
403+
Public-to-the-module helper used by |Slides.duplicate| AFTER the new
404+
slide part is registered with the presentation rels. Wires the new
405+
notes-slide's `RT.SLIDE` back-rel to point at `new_slide_part` (NOT
406+
the source) — addresses upstream community gotcha #961 where blindly
407+
copying notes rels left the duplicate's notes pointing at the source.
408+
"""
409+
src_notes_part = cast(NotesSlidePart, src_slide_part.part_related_by(RT.NOTES_SLIDE))
410+
package = src_slide_part._package
411+
new_partname = package.next_partname("/ppt/notesSlides/notesSlide%d.xml")
412+
new_element = copy.deepcopy(src_notes_part._element)
413+
new_notes_part = NotesSlidePart(new_partname, CT.PML_NOTES_SLIDE, package, new_element)
414+
415+
rId_map: dict[str, str] = {}
416+
for rId, rel in src_notes_part.rels.items():
417+
if rel.is_external:
418+
new_rId = new_notes_part.relate_to(rel.target_ref, rel.reltype, is_external=True)
419+
elif rel.reltype == RT.SLIDE:
420+
# ---rewire back-ref to NEW slide part---
421+
new_rId = new_notes_part.relate_to(new_slide_part, RT.SLIDE)
422+
else:
423+
# NOTES_MASTER and any others: share at package level
424+
new_rId = new_notes_part.relate_to(rel.target_part, rel.reltype)
425+
rId_map[rId] = new_rId
426+
_remap_rId_attrs(new_element, rId_map)
427+
428+
new_slide_part.relate_to(new_notes_part, RT.NOTES_SLIDE)
429+
return new_notes_part
430+
262431

263432
class SlideLayoutPart(BaseSlidePart):
264433
"""Slide layout part.

src/pptx/slide.py

Lines changed: 60 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66

77
from pptx.dml.fill import FillFormat
88
from pptx.enum.shapes import PP_PLACEHOLDER
9+
from pptx.opc.constants import RELATIONSHIP_TYPE as RT
910
from pptx.shapes.shapetree import (
1011
LayoutPlaceholders,
1112
LayoutShapes,
@@ -259,6 +260,19 @@ def delete(self) -> None:
259260
prs = self.part.package.presentation_part.presentation
260261
prs.slides.remove(self)
261262

263+
def duplicate(self, index: int | None = None) -> Slide:
264+
"""Return a deep copy of this slide added to the parent presentation.
265+
266+
Convenience alias delegating to :meth:`Slides.duplicate`. The duplicate
267+
is inserted at zero-based `index`; when `index` is |None|, the
268+
duplicate sits at ``self_index + 1`` — immediately after this slide.
269+
270+
See :meth:`Slides.duplicate` for full semantics on dedup, notes-slide
271+
handling, and round-trip behavior.
272+
"""
273+
prs = self.part.package.presentation_part.presentation
274+
return prs.slides.duplicate(self, index)
275+
262276

263277
class Slides(ParentedElementProxy):
264278
"""Sequence of slides belonging to an instance of |Presentation|.
@@ -300,9 +314,9 @@ def add_slide(self, slide_layout: SlideLayout, index: int | None = None) -> Slid
300314
are not supported; pass an explicit position. Raises |IndexError| if
301315
`index` is out of range (negative, or greater than `len(self)`).
302316
303-
Companion operations: :meth:`remove`, :meth:`move`. Cross-deck copy
304-
(``Presentation.append_from``) and ``Slide.duplicate`` are tracked
305-
under issue #11 (Phase 2/3) and not yet implemented.
317+
Companion operations: :meth:`remove`, :meth:`move`,
318+
:meth:`duplicate`. Cross-deck copy (``Presentation.append_from``)
319+
is tracked under issue #11 (Phase 3) and not yet implemented.
306320
"""
307321
# ---validate index BEFORE creating the new SlidePart so a bad index
308322
# does not leak a partial part into the package---
@@ -368,6 +382,49 @@ def remove(self, slide: Slide) -> None:
368382
self._sldIdLst.remove_sldId(sldId)
369383
self.part.drop_rel(target_rId)
370384

385+
def duplicate(self, slide: Slide, index: int | None = None) -> Slide:
386+
"""Return a deep copy of `slide` added to this collection.
387+
388+
The duplicate is inserted at zero-based position `index`. When
389+
`index` is |None| (the default), the new slide is inserted at
390+
``source_index + 1`` — immediately after `slide`. ``index`` may
391+
equal ``len(self)`` to append explicitly. Negative indices are
392+
not supported.
393+
394+
Image, media, slide-layout, and slide-master parts are shared
395+
with the source via package-level dedup — duplicating a slide
396+
that contains pictures does NOT increase the deck's image-part
397+
count. Chart parts, OLE-object parts, and the notes-slide (when
398+
present) are deep-copied so edits to the duplicate don't bleed
399+
back into the source. Comments parts (if any) are dropped —
400+
deferred to a later phase of issue #11.
401+
402+
Raises |ValueError| if `slide` is not a member of this
403+
collection. Raises |IndexError| if `index` is out of range
404+
(negative or greater than `len(self)`).
405+
"""
406+
from pptx.parts.slide import duplicate_notes_slide_for
407+
408+
# ---validate membership BEFORE doing any work; raises ValueError if absent---
409+
src_idx = self.index(slide)
410+
if index is None:
411+
index = src_idx + 1
412+
if index < 0 or index > len(self._sldIdLst):
413+
raise IndexError("slide index out of range")
414+
415+
src_part = slide.part
416+
new_slide_part = src_part.duplicate()
417+
418+
# ---register new slide part with presentation; this allocates an rId---
419+
new_rId = self.part.relate_to(new_slide_part, RT.SLIDE)
420+
self._sldIdLst.insert_sldId_at(new_rId, index)
421+
422+
# ---if source had a notes-slide, give the duplicate its own---
423+
if src_part.has_notes_slide:
424+
duplicate_notes_slide_for(src_part, new_slide_part)
425+
426+
return new_slide_part.slide
427+
371428

372429
class SlideLayout(_BaseSlide):
373430
"""Slide layout object.

0 commit comments

Comments
 (0)