Skip to content

pdf to svg, pptx to pdf, and timing fixes#133

Open
rebeccamccabe wants to merge 14 commits into
mainfrom
ezra-slides
Open

pdf to svg, pptx to pdf, and timing fixes#133
rebeccamccabe wants to merge 14 commits into
mainfrom
ezra-slides

Conversation

@rebeccamccabe
Copy link
Copy Markdown
Collaborator

No description provided.

Copilot AI review requested due to automatic review settings April 27, 2026 04:59
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a small Python-based conversion toolchain to the publications workflow (PPTX→PDF and PDF→SVG) and wires it into the Calkit/DVC pipeline so the Ezra seminar slides and derived figure assets can be built reproducibly.

Changes:

  • Add py-convert Calkit environment plus dependency specs/lock for conversion utilities.
  • Introduce pptx_to_pdf.py and pdf_to_svg.py scripts and new pipeline stages to generate PDFs/SVGs for pubs/ezra_seminar_slides/.
  • Add DVC tracking/ignores for the seminar PPTX and generated SVG outputs.

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 11 comments.

Show a summary per file
File Description
requirements.txt Defines pip requirements for the new py-convert environment.
pubs/pptx_to_pdf.py Adds a CLI script to convert a slides directory from PPTX to PDF.
mdocean/plots/util/pdf_to_svg.py Adds a CLI script to convert PDFs in a folder to SVGs.
calkit.yaml Adds py-convert env, new conversion stages, and registers the generated seminar PDF as a publication.
dvc.yaml Adds DVC stages for the conversion steps.
.calkit/env-locks/py-convert.txt Locks Python dependencies for the conversion environment.
environment.yml Adds a conda environment definition for conversion dependencies.
pubs/ezra_seminar_slides/.gitignore Ignores the DVC-tracked PPTX in Git.
pubs/ezra_seminar_slides/figs/.gitignore Ignores generated SVG outputs in Git.
pubs/ezra_seminar_slides/ezra_emerging_scholar_seminar.pptx.dvc Adds DVC metadata to track the seminar PPTX artifact.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread calkit.yaml Outdated
Comment on lines +5317 to +5320
- path: pubs/renewable-energy-mdo/mdocean.synctex.gz
storage: null
storage:
- path: pubs/renewable-energy-mdo/aux/
storage: null
storage:
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These storage: keys have empty values. Elsewhere in this file, storage: null is used explicitly; leaving the value blank can be confusing and may be treated differently by some tooling/formatters. Consider reverting to storage: null (or an explicit storage value) for consistency.

Copilot uses AI. Check for mistakes.
Comment thread calkit.yaml
- path: pubs/ezra_seminar_slides/ezra_emerging_scholar_seminar.pdf
kind: presentation
title: Ezra emerging scholar seminar
description: Slides for the Cornell systems engineering Erza
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo in the publication description: "Erza" should be "Ezra".

Suggested change
description: Slides for the Cornell systems engineering Erza
description: Slides for the Cornell systems engineering Ezra

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +21
file_out = folder_out + os.path.splitext(base)[0]
with open(file_out + ".svg", "w", encoding="utf-8") as f:
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output path concatenation assumes folder_out ends with a path separator (file_out = folder_out + ...). Use os.path.join(folder_out, stem) / pathlib so the output path is correct regardless of whether folder_out ends with /.

Suggested change
file_out = folder_out + os.path.splitext(base)[0]
with open(file_out + ".svg", "w", encoding="utf-8") as f:
stem = os.path.splitext(base)[0]
file_out = os.path.join(folder_out, stem + ".svg")
with open(file_out, "w", encoding="utf-8") as f:

Copilot uses AI. Check for mistakes.
Comment on lines +26 to +27
args = sys.argv[1:]
pdf_to_svg(args[0],args[1]) No newline at end of file
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This CLI entrypoint uses args[0]/args[1] without validating argument count, which will raise IndexError on misuse. Add argument validation (and ideally an if __name__ == "__main__": guard) so importing this module doesn’t execute conversion as a side effect.

Suggested change
args = sys.argv[1:]
pdf_to_svg(args[0],args[1])
if __name__ == "__main__":
args = sys.argv[1:]
if len(args) != 2:
print(
"Usage: python mdocean/plots/util/pdf_to_svg.py <input_pdf_folder/> <output_svg_folder/>",
file=sys.stderr,
)
sys.exit(1)
pdf_to_svg(args[0], args[1])

Copilot uses AI. Check for mistakes.
Comment thread environment.yml Outdated
Comment on lines +1 to +8
name: mdocean-py-convert
channels:
- conda-forge
dependencies:
- pip
- pip:
- pptxtopdf
- pymupdf
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This conda environment file appears to be unused in the repo (the new py-convert Calkit environment points at requirements.txt and no references to environment.yml were found). Consider removing it to avoid confusion, or wire it into the pipeline/docs if it’s intended to be supported.

Copilot uses AI. Check for mistakes.
Comment thread calkit.yaml
Comment on lines +5349 to +5350
args: [pubs/ezra_seminar_slides/figs/pdf/
pubs/ezra_seminar_slides/figs/svg/]
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args is written as a flow-style YAML list but is missing a comma separator between the two paths, which will make calkit.yaml fail to parse. Use a comma-separated flow list or switch to block list syntax (- ...).

Suggested change
args: [pubs/ezra_seminar_slides/figs/pdf/
pubs/ezra_seminar_slides/figs/svg/]
args:
- pubs/ezra_seminar_slides/figs/pdf/
- pubs/ezra_seminar_slides/figs/svg/

Copilot uses AI. Check for mistakes.
Comment thread dvc.yaml Outdated
Comment on lines +6234 to +6236
- mdocean/plots/util/pdf_to_svg.py
- pubs/ezra_seminar_slides/figs/pdf/
- .calkit/env-locks/py-convert
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dvc.yaml depends on .calkit/env-locks/py-convert, but the lock file added in this PR is .calkit/env-locks/py-convert.txt. As-is, dvc repro will treat the dependency path as missing. Update this dependency to point at the actual lock file (or rename the lock file to match the expected path).

Copilot uses AI. Check for mistakes.
Comment thread dvc.yaml Outdated
Comment on lines +6246 to +6248
- pubs/ezra_seminar_slides/ezra_emerging_scholar_seminar.pptx
- pubs/ezra_seminar_slides/figs/
- .calkit/env-locks/py-convert
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dvc.yaml depends on .calkit/env-locks/py-convert, but the lock file added in this PR is .calkit/env-locks/py-convert.txt. As-is, dvc repro will treat the dependency path as missing. Update this dependency to point at the actual lock file (or rename the lock file to match the expected path).

Copilot uses AI. Check for mistakes.
Comment thread pubs/pptx_to_pdf.py
Comment on lines +6 to +7
args = sys.argv[1]
convert(args, args)
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This script assumes at least one CLI argument (sys.argv[1]) and will raise IndexError if called without it. Add a small --help/usage message and validate argument count before calling convert (ideally under an if __name__ == "__main__": guard).

Suggested change
args = sys.argv[1]
convert(args, args)
def main():
if len(sys.argv) != 2 or sys.argv[1] in ("-h", "--help"):
print("Usage: python pubs/pptx_to_pdf.py <input_path>")
return 0 if len(sys.argv) == 2 and sys.argv[1] in ("-h", "--help") else 1
args = sys.argv[1]
convert(args, args)
return 0
if __name__ == "__main__":
sys.exit(main())

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +10
def pdf_to_svg(folder_in,folder_out):
files_in = glob.glob(folder_in + '*.pdf')
for file in files_in:
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Path handling here relies on folder_in ending with a path separator (glob(folder_in + "*.pdf")). Use os.path.join(folder_in, "*.pdf") (or pathlib.Path(folder_in).glob("*.pdf")) to avoid silently producing no matches when the caller passes a path without a trailing /.

Copilot uses AI. Check for mistakes.
@review-notebook-app
Copy link
Copy Markdown

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants