Python tools workspace for practical document and content-processing utilities.
The repository started as a PDF-focused project. It now includes conversion, extraction, organization, and publishing helpers, with a stronger focus on Markdown as the output format for downstream AI use.
The main conversion tools are:
- pptx_to_epub.py: converts PowerPoint files into structured Markdown
- pdf_to_epub.py: converts text-based PDFs into structured Markdown
Existing PDF tools still in the repo:
- organize_batch.py: AI-assisted PDF organization
- pdf_signature.py: PDF signature placement with GUI support
- watch_organizer.py: watch-mode PDF organization
pptx_to_epub.py converts .pptx files into Markdown by extracting slide text and preserving slide structure.
What it does:
- extracts text from slide titles, text boxes, and tables
- preserves slide order
- renders nested bullets as nested Markdown lists
- creates one Markdown section per slide
- supports a single PowerPoint file or an entire directory
- includes both GUI and CLI modes
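
The slide-to-section mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in pptx_to_epub.py: the `Slide` class, `slide_to_markdown`, `deck_to_markdown`, and the `## Slide N: Title` heading format are all hypothetical. With python-pptx, the indent level would presumably come from each paragraph's `level` attribute; here the slides are plain in-memory data so the sketch runs without any dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Slide:
    """Hypothetical in-memory slide: a title plus (indent_level, text) bullets."""
    title: str
    bullets: list[tuple[int, str]] = field(default_factory=list)


def slide_to_markdown(index: int, slide: Slide) -> str:
    """Render one slide as a Markdown section with nested list items."""
    title = slide.title.strip() or "Untitled"
    lines = [f"## Slide {index}: {title}", ""]
    for level, text in slide.bullets:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs
        # Two spaces per level is enough for a "- " item to nest in Markdown.
        lines.append(f"{'  ' * level}- {text}")
    return "\n".join(lines).rstrip() + "\n"


def deck_to_markdown(slides: list[Slide]) -> str:
    """Join per-slide sections in their original order."""
    return "\n".join(
        slide_to_markdown(i, s) for i, s in enumerate(slides, start=1)
    )


if __name__ == "__main__":
    deck = [
        Slide("Intro", [(0, "Goals"), (1, "Speed"), (0, "Scope")]),
        Slide("Next steps", [(0, "Review output")]),
    ]
    print(deck_to_markdown(deck))
```

Numbering sections from the slide index keeps the original order visible in the output even when a slide has no title.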
GUI:
python pptx_to_epub.py
CLI:
python pptx_to_epub.py --input "C:\path\deck.pptx" --output-dir "C:\path\markdown"
python pptx_to_epub.py --input "C:\path\slides" --output-dir "C:\path\markdown"

pdf_to_epub.py converts text-based .pdf files into Markdown by extracting page text and inferring headings, paragraphs, lists, and simple code blocks.
What it does:
- extracts readable text from PDF pages
- infers heading levels from font size and emphasis
- merges wrapped lines into paragraphs
- renders detected lists as Markdown lists
- supports a single PDF file or an entire directory
- includes both GUI and CLI modes
What it does not do:
- OCR scanned or image-only PDFs
- preserve visual PDF layout exactly
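
Heading inference and paragraph merging, as listed above, might look something like the sketch below. It is an assumption-laden illustration rather than the logic in pdf_to_epub.py: the `Line` class, `heading_levels`, and `lines_to_markdown` are hypothetical names, the most frequent font size is assumed to be body text, and emphasis (bold or italic) is ignored. The input is a list of already-extracted lines, so no PDF library is needed to run it.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    """Hypothetical extracted line: text plus its dominant font size in points."""
    text: str
    size: float


def heading_levels(lines: list[Line], max_levels: int = 3) -> dict[float, int]:
    """Map font sizes larger than the body size to heading levels 1..max_levels."""
    if not lines:
        return {}
    # Assumption: the most frequent font size belongs to body text.
    body = Counter(line.size for line in lines).most_common(1)[0][0]
    larger = sorted({l.size for l in lines if l.size > body}, reverse=True)
    return {size: lvl for lvl, size in enumerate(larger[:max_levels], start=1)}


def lines_to_markdown(lines: list[Line]) -> str:
    """Emit headings for large text and merge wrapped lines into paragraphs."""
    levels = heading_levels(lines)
    blocks: list[str] = []
    paragraph: list[str] = []

    def flush() -> None:
        # Close the paragraph being collected, if any.
        if paragraph:
            blocks.append(" ".join(paragraph))
            paragraph.clear()

    for line in lines:
        text = line.text.strip()
        if not text:
            flush()  # a blank line ends the current paragraph
        elif line.size in levels:
            flush()
            blocks.append(f"{'#' * levels[line.size]} {text}")
        else:
            paragraph.append(text)  # wrapped line joins the paragraph
    flush()
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    page = [
        Line("Chapter One", 18.0),
        Line("This is a wrapped", 10.0),
        Line("paragraph of text.", 10.0),
    ]
    print(lines_to_markdown(page))
```

Ranking sizes relative to the most common one, instead of using fixed point thresholds, lets the same heuristic work across PDFs set in different base font sizes.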
GUI:
python pdf_to_epub.py
CLI:
python pdf_to_epub.py --input "C:\path\book.pdf" --output-dir "C:\path\markdown"
python pdf_to_epub.py --input "C:\path\pdfs" --output-dir "C:\path\markdown"

git clone https://github.com/peterbamuhigire/pyPDFLibrarianSort.git
cd pyPDFLibrarianSort
pip install -r requirements.txt

Core dependencies for the Markdown converters:
- python-pptx
- pdfplumber
- pypdf
- Project Brief
- Project Summary
- Quick Start
- Getting Started
- Features Summary
- PowerPoint To EPUB Guide
- PDF To EPUB Guide
- Web Interface Guide
- PDF Signature Guide
python test_pptx_to_epub.py
python test_pdf_to_epub.py
python test_signature.py

MIT License.