pdf to svg, pptx to pdf, and timing fixes#133
Conversation
…n into ezra-slides
There was a problem hiding this comment.
Pull request overview
Adds a small Python-based conversion toolchain to the publications workflow (PPTX→PDF and PDF→SVG) and wires it into the Calkit/DVC pipeline so the Ezra seminar slides and derived figure assets can be built reproducibly.
Changes:
- Add
py-convertCalkit environment plus dependency specs/lock for conversion utilities. - Introduce
pptx_to_pdf.pyandpdf_to_svg.pyscripts and new pipeline stages to generate PDFs/SVGs forpubs/ezra_seminar_slides/. - Add DVC tracking/ignores for the seminar PPTX and generated SVG outputs.
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| requirements.txt | Defines pip requirements for the new py-convert environment. |
| pubs/pptx_to_pdf.py | Adds a CLI script to convert a slides directory from PPTX to PDF. |
| mdocean/plots/util/pdf_to_svg.py | Adds a CLI script to convert PDFs in a folder to SVGs. |
| calkit.yaml | Adds py-convert env, new conversion stages, and registers the generated seminar PDF as a publication. |
| dvc.yaml | Adds DVC stages for the conversion steps. |
| .calkit/env-locks/py-convert.txt | Locks Python dependencies for the conversion environment. |
| environment.yml | Adds a conda environment definition for conversion dependencies. |
| pubs/ezra_seminar_slides/.gitignore | Ignores the DVC-tracked PPTX in Git. |
| pubs/ezra_seminar_slides/figs/.gitignore | Ignores generated SVG outputs in Git. |
| pubs/ezra_seminar_slides/ezra_emerging_scholar_seminar.pptx.dvc | Adds DVC metadata to track the seminar PPTX artifact. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| - path: pubs/renewable-energy-mdo/mdocean.synctex.gz | ||
| storage: null | ||
| storage: | ||
| - path: pubs/renewable-energy-mdo/aux/ | ||
| storage: null | ||
| storage: |
There was a problem hiding this comment.
These storage: keys have empty values. Elsewhere in this file, storage: null is used explicitly; leaving the value blank can be confusing and may be treated differently by some tooling/formatters. Consider reverting to storage: null (or an explicit storage value) for consistency.
| - path: pubs/ezra_seminar_slides/ezra_emerging_scholar_seminar.pdf | ||
| kind: presentation | ||
| title: Ezra emerging scholar seminar | ||
| description: Slides for the Cornell systems engineering Erza |
There was a problem hiding this comment.
Typo in the publication description: "Erza" should be "Ezra".
| description: Slides for the Cornell systems engineering Erza | |
| description: Slides for the Cornell systems engineering Ezra |
| file_out = folder_out + os.path.splitext(base)[0] | ||
| with open(file_out + ".svg", "w", encoding="utf-8") as f: |
There was a problem hiding this comment.
Output path concatenation assumes folder_out ends with a path separator (file_out = folder_out + ...). Use os.path.join(folder_out, stem) / pathlib so the output path is correct regardless of whether folder_out ends with /.
| file_out = folder_out + os.path.splitext(base)[0] | |
| with open(file_out + ".svg", "w", encoding="utf-8") as f: | |
| stem = os.path.splitext(base)[0] | |
| file_out = os.path.join(folder_out, stem + ".svg") | |
| with open(file_out, "w", encoding="utf-8") as f: |
| args = sys.argv[1:] | ||
| pdf_to_svg(args[0],args[1]) No newline at end of file |
There was a problem hiding this comment.
This CLI entrypoint uses args[0]/args[1] without validating argument count, which will raise IndexError on misuse. Add argument validation (and ideally an if __name__ == "__main__": guard) so importing this module doesn’t execute conversion as a side effect.
| args = sys.argv[1:] | |
| pdf_to_svg(args[0],args[1]) | |
| if __name__ == "__main__": | |
| args = sys.argv[1:] | |
| if len(args) != 2: | |
| print( | |
| "Usage: python mdocean/plots/util/pdf_to_svg.py <input_pdf_folder/> <output_svg_folder/>", | |
| file=sys.stderr, | |
| ) | |
| sys.exit(1) | |
| pdf_to_svg(args[0], args[1]) |
| name: mdocean-py-convert | ||
| channels: | ||
| - conda-forge | ||
| dependencies: | ||
| - pip | ||
| - pip: | ||
| - pptxtopdf | ||
| - pymupdf |
There was a problem hiding this comment.
This conda environment file appears to be unused in the repo (the new py-convert Calkit environment points at requirements.txt and no references to environment.yml were found). Consider removing it to avoid confusion, or wire it into the pipeline/docs if it’s intended to be supported.
| args: [pubs/ezra_seminar_slides/figs/pdf/ | ||
| pubs/ezra_seminar_slides/figs/svg/] |
There was a problem hiding this comment.
args is written as a flow-style YAML list but is missing a comma separator between the two paths, which will make calkit.yaml fail to parse. Use a comma-separated flow list or switch to block list syntax (- ...).
| args: [pubs/ezra_seminar_slides/figs/pdf/ | |
| pubs/ezra_seminar_slides/figs/svg/] | |
| args: | |
| - pubs/ezra_seminar_slides/figs/pdf/ | |
| - pubs/ezra_seminar_slides/figs/svg/ |
| - mdocean/plots/util/pdf_to_svg.py | ||
| - pubs/ezra_seminar_slides/figs/pdf/ | ||
| - .calkit/env-locks/py-convert |
There was a problem hiding this comment.
dvc.yaml depends on .calkit/env-locks/py-convert, but the lock file added in this PR is .calkit/env-locks/py-convert.txt. As-is, dvc repro will treat the dependency path as missing. Update this dependency to point at the actual lock file (or rename the lock file to match the expected path).
| - pubs/ezra_seminar_slides/ezra_emerging_scholar_seminar.pptx | ||
| - pubs/ezra_seminar_slides/figs/ | ||
| - .calkit/env-locks/py-convert |
There was a problem hiding this comment.
dvc.yaml depends on .calkit/env-locks/py-convert, but the lock file added in this PR is .calkit/env-locks/py-convert.txt. As-is, dvc repro will treat the dependency path as missing. Update this dependency to point at the actual lock file (or rename the lock file to match the expected path).
| args = sys.argv[1] | ||
| convert(args, args) |
There was a problem hiding this comment.
This script assumes at least one CLI argument (sys.argv[1]) and will raise IndexError if called without it. Add a small --help/usage message and validate argument count before calling convert (ideally under an if __name__ == "__main__": guard).
| args = sys.argv[1] | |
| convert(args, args) | |
| def main(): | |
| if len(sys.argv) != 2 or sys.argv[1] in ("-h", "--help"): | |
| print("Usage: python pubs/pptx_to_pdf.py <input_path>") | |
| return 0 if len(sys.argv) == 2 and sys.argv[1] in ("-h", "--help") else 1 | |
| args = sys.argv[1] | |
| convert(args, args) | |
| return 0 | |
| if __name__ == "__main__": | |
| sys.exit(main()) |
| def pdf_to_svg(folder_in,folder_out): | ||
| files_in = glob.glob(folder_in + '*.pdf') | ||
| for file in files_in: |
There was a problem hiding this comment.
Path handling here relies on folder_in ending with a path separator (glob(folder_in + "*.pdf")). Use os.path.join(folder_in, "*.pdf") (or pathlib.Path(folder_in).glob("*.pdf")) to avoid silently producing no matches when the caller passes a path without a trailing /.
|
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
No description provided.