pyPDFLibrarianSort

A Python workspace of practical document and content-processing utilities.

The repository started as a PDF-focused project. It now includes conversion, extraction, organization, and publishing helpers, with a stronger focus on Markdown as the output format for downstream AI use.

Current Direction

The main conversion tools are:

pptx_to_epub.py: converts PowerPoint files into structured Markdown
pdf_to_epub.py: converts text-based PDFs into structured Markdown

Existing PDF tools still in the repo:

organize_batch.py: AI-assisted PDF organization
pdf_signature.py: PDF signature placement with GUI support
watch_organizer.py: watch-mode PDF organization

Tool: PowerPoint To Markdown

pptx_to_epub.py converts .pptx files into Markdown by extracting slide text and preserving slide structure.

What it does:

extracts text from slide titles, text boxes, and tables
preserves slide order
renders nested bullets as nested Markdown lists
creates one Markdown section per slide
supports a single PowerPoint file or an entire directory
includes both GUI and CLI modes
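
As an illustration of the nested-bullet and one-section-per-slide behavior listed above, here is a minimal sketch. It is not the code in pptx_to_epub.py: the function names `render_bullets` and `render_slide`, the `## Slide N: Title` heading format, and the two-space indent are all assumptions made for the example. It takes already-extracted `(level, text)` pairs, which with python-pptx could plausibly come from each paragraph's `level` and `text` attributes.

```python
def render_bullets(paragraphs, indent="  "):
    """Render (level, text) pairs as a nested Markdown list.

    `level` is the zero-based indent depth of the bullet.
    Hypothetical helper, not the repository's implementation.
    """
    lines = []
    for level, text in paragraphs:
        text = text.strip()
        if not text:
            continue  # skip empty paragraphs so they do not become empty bullets
        lines.append(f"{indent * level}- {text}")
    return "\n".join(lines)


def render_slide(number, title, paragraphs):
    """Build one Markdown section for a slide (heading format is an assumption)."""
    heading = f"## Slide {number}: {title}" if title else f"## Slide {number}"
    body = render_bullets(paragraphs)
    return f"{heading}\n\n{body}\n" if body else f"{heading}\n"


if __name__ == "__main__":
    print(render_slide(1, "Intro", [(0, "Goals"), (1, "Speed"), (0, "Scope")]))
```

Keeping the rendering step separate from the extraction step makes it easy to test the Markdown output without opening a real .pptx file.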

GUI:

python pptx_to_epub.py

CLI:

python pptx_to_epub.py --input "C:\path\deck.pptx" --output-dir "C:\path\markdown"
python pptx_to_epub.py --input "C:\path\slides" --output-dir "C:\path\markdown"
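
The two commands above differ only in whether --input points at a file or a directory. One way such an argument could be resolved is sketched below. The helper names `collect_inputs` and `output_path_for`, the non-recursive directory scan, and the `.md` output naming are assumptions for illustration, not a description of what the script actually does.

```python
from pathlib import Path


def collect_inputs(input_path, suffix):
    """Return the files to convert for a single file or a directory.

    Hypothetical helper: a file is returned as-is when its extension matches;
    a directory is scanned (non-recursively) for matching files.
    """
    path = Path(input_path)
    if path.is_file():
        return [path] if path.suffix.lower() == suffix else []
    if path.is_dir():
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == suffix
        )
    raise FileNotFoundError(f"Input path does not exist: {path}")


def output_path_for(source, output_dir):
    """Map an input file to a Markdown file in the output directory (assumed naming)."""
    return Path(output_dir) / (Path(source).stem + ".md")
```

The same helper would work for the PDF converter by passing ".pdf" as the suffix.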

Tool: PDF To Markdown

pdf_to_epub.py converts text-based .pdf files into Markdown by extracting page text and inferring headings, paragraphs, lists, and simple code blocks.

What it does:

extracts readable text from PDF pages
infers heading levels from font size and emphasis
merges wrapped lines into paragraphs
renders detected lists as Markdown lists
supports a single PDF file or an entire directory
includes both GUI and CLI modes
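
To make the heading inference and paragraph merging above more concrete, here is a simplified sketch. It is an assumption-laden illustration rather than the logic in pdf_to_epub.py: it uses font size only (no emphasis detection), treats the size carrying the most characters as body text, and expects input as hypothetical `(text, font_size)` line tuples that an extraction library such as pdfplumber could supply.

```python
from collections import Counter


def body_font_size(lines):
    """Pick the font size that carries the most characters as the body size."""
    counts = Counter()
    for text, size in lines:
        counts[size] += len(text)
    return counts.most_common(1)[0][0]


def heading_levels(lines, max_level=3):
    """Map each font size larger than the body size to a heading level.

    The largest size becomes level 1, the next level 2, and so on,
    capped at `max_level`.
    """
    body = body_font_size(lines)
    larger = sorted({size for _, size in lines if size > body}, reverse=True)
    return {size: min(i + 1, max_level) for i, size in enumerate(larger)}


def to_markdown(lines):
    """Turn (text, font_size) lines into Markdown headings and paragraphs."""
    if not lines:
        return ""
    levels = heading_levels(lines)
    blocks, paragraph = [], []

    def flush():
        # Join the wrapped lines collected so far into one paragraph.
        if paragraph:
            blocks.append(" ".join(paragraph))
            paragraph.clear()

    for text, size in lines:
        text = text.strip()
        if not text:
            flush()  # a blank line ends the current paragraph
        elif size in levels:
            flush()
            blocks.append("#" * levels[size] + " " + text)
        else:
            paragraph.append(text)
    flush()
    return "\n\n".join(blocks)
```

A real converter would also need to handle lists, hyphenated line breaks, and bold or italic runs, which this sketch leaves out.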

What it does not do:

OCR scanned or image-only PDFs
preserve visual PDF layout exactly

GUI:

python pdf_to_epub.py

CLI:

python pdf_to_epub.py --input "C:\path\book.pdf" --output-dir "C:\path\markdown"
python pdf_to_epub.py --input "C:\path\pdfs" --output-dir "C:\path\markdown"

Installation

git clone https://github.com/peterbamuhigire/pyPDFLibrarianSort.git
cd pyPDFLibrarianSort
pip install -r requirements.txt

Core dependencies for the Markdown converters:

python-pptx
pdfplumber
pypdf

Documentation

Testing

python test_pptx_to_epub.py
python test_pdf_to_epub.py
python test_signature.py

License

MIT License.

