Commit 46a7f70

feat: New command: bundle: Bundle multiple files into one PDF
+ It replaces `merge` and `image-to-pdf`
+ Upgraded log21: CLI has changed a bit
1 parent 580ca0f commit 46a7f70

4 files changed

Lines changed: 134 additions & 120 deletions

README.md

Lines changed: 24 additions & 26 deletions
@@ -6,6 +6,10 @@ A simple python package that helps with doing simple stuff with PDFs.
 Features
 --------

++ [x] **Bundle**: Bundle multiple files into one PDF
+  + [x] PDF inputs
+  + [x] Image inputs (e.g. PNG, JPG, etc.)
+  + [ ] Markdown inputs
 + [x] **Merge PDFs**: Merge multiple PDFs into one PDF
 + [x] **Split PDFs**: Split a PDF into multiple PDFs, each containing a range of pages from
   the original PDF
@@ -17,7 +21,6 @@ Features
 + [ ] Extract images from a PDF
 + [x] **Extract text**: Export text from a PDF file and optionally save it to a text file
 + [ ] Extract links from a PDF
-+ [x] **Image to PDF**: Export one or multiple images as a PDF file

 If you want any other feature to be added, feel free to open an [issue](https://github.com/MPCodeWriter21/PDF-To-Image/issues)
 or fork the repo and make a [pull request](https://github.com/MPCodeWriter21/PDF-To-Image/pulls)
@@ -57,79 +60,74 @@ cd PDF-Helper
 uv run pdf-helper <command> [options]
 ```

-### Merge PDFs
+### Bundle PDFs

 Merge multiple PDFs into one PDF:

 ```bash
-pdf-helper merge -i <input_file_1> <input_file_2>... <input_file_n> -o <output_file>
+pdf-helper bundle <input_file_1> <input_file_2>... <input_file_n> <output_file>

 # E.g. Merge PDFs 1, 2 and 3 into a new PDF
-pdf-helper merge -i 1.pdf 2.pdf 3.pdf -o new.pdf
+pdf-helper bundle 1.pdf 2.pdf 3.pdf new.pdf
+
+# E.g. Take 1.png, 2.jpg, and 3.png and create a PDF named 123.pdf, overwriting it
+# if it already exists
+pdf-helper bundle 1.png 2.jpg 3.png 123.pdf -f
+
+# E.g. Take part1.pdf, image1.png, ending.pdf and bundle them into a PDF named final.pdf
+pdf-helper bundle part1.pdf image1.png ending.pdf final.pdf -v
 ```
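
The positional form used by `bundle` above (one or more inputs followed by a single output) can be sketched with the standard-library `argparse`. This is only an illustration: the package's real parser is not part of this diff (the commit message suggests it relies on log21), and the names `build_bundle_parser`, `inputs`, `output` and `--force` are assumptions. The README shows only the short `-f` flag, and its meaning here is inferred from the comment about an existing output file.

```python
import argparse


def build_bundle_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `pdf-helper bundle <inputs>... <output> [-f]`."""
    parser = argparse.ArgumentParser(prog="pdf-helper bundle")
    # One or more input files (PDFs or images), in the order they should appear.
    parser.add_argument("inputs", nargs="+", help="input PDF or image files")
    # The last positional argument is the output PDF.
    parser.add_argument("output", help="path of the bundled PDF")
    # Assumed meaning of -f: allow replacing an existing output file.
    parser.add_argument("-f", "--force", action="store_true",
                        help="overwrite the output file if it already exists")
    return parser


args = build_bundle_parser().parse_args(["1.png", "2.jpg", "3.png", "123.pdf", "-f"])
print(args.inputs, args.output, args.force)
```

With this layout `argparse` assigns the final positional value to `output` and everything before it to `inputs`, which matches the way the README examples are written.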

 ### Split PDFs

 Split a PDF into multiple PDFs, each containing a range of pages:

 ```bash
-pdf-helper split -i <input_file> -o <output_folder> -s <split_point_1>,<split_point_2>
+pdf-helper split <input_file> <output_folder> -s <split_point_1>,<split_point_2>

 # E.g. Split a PDF into three PDFs, one with pages 1-10, the second with pages 11-20 and
 # the third with pages 21-end
-pdf-helper split -i my-pdf.pdf -o my-split-pdfs -s 10,20
+pdf-helper split my-pdf.pdf my-split-pdfs -s 10,20

 # E.g. Split a PDF into PDFs each containing one page
-pdf-helper split -i my-pdf.pdf -o my-split-pdfs # No need to specify split points
+pdf-helper split my-pdf.pdf my-split-pdfs # No need to specify split points
 ```
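
How the split points in the examples above map to page ranges can be sketched as follows. `split_ranges` is a hypothetical helper, not the package's `split_pdf`. It assumes 1-based page numbers and that each split point is the last page of its chunk, as the `-s 10,20` example implies. The case with no split points (one page per output file) follows the second example.

```python
def split_ranges(page_count: int, split_points: list[int]) -> list[tuple[int, int]]:
    """Return inclusive 1-based (first, last) page ranges, one per output PDF."""
    if not split_points:
        # No split points: one output PDF per page.
        return [(page, page) for page in range(1, page_count + 1)]
    ranges = []
    start = 1
    for point in sorted(set(split_points)):
        # A split point must leave at least one page for the next chunk.
        if not start <= point < page_count:
            raise ValueError(f"split point {point} is out of range")
        ranges.append((start, point))
        start = point + 1
    # Whatever follows the last split point goes into the final PDF.
    ranges.append((start, page_count))
    return ranges


print(split_ranges(25, [10, 20]))  # [(1, 10), (11, 20), (21, 25)]
```

Rejecting a split point equal to the last page is a design choice of this sketch: it avoids producing an empty final file.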

 ### Export PDF pages as image files

 Export PDF pages as image files:

 ```bash
-pdf-helper to-image -i <input_file> -o <output_folder> \
+pdf-helper to-image <input_file> <output_folder> \
 -p <page_number_1>,<page_number_2>,...,<page_number_n> -s <scale_factor>

 # E.g. Export pages 1, 2, 3 and 6 from a PDF with scale factor 1
-pdf-helper to-image -i 1.pdf -o images -p 1-3,6 -s 1
+pdf-helper to-image 1.pdf images -p 1-3,6 -s 1

 # E.g. Export all pages from a PDF with scale 2
-pdf-helper to-image -i my-pdf.pdf -o my-images
+pdf-helper to-image my-pdf.pdf my-images
 ```
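
The page specification in the example above (`1-3,6` meaning pages 1, 2, 3 and 6) could be expanded along these lines. `parse_pages` is a hypothetical helper written for illustration. The package's own parsing code is not shown in this diff and may differ, for instance in how it treats duplicates or whitespace.

```python
def parse_pages(spec: str) -> list[int]:
    """Expand a spec such as "1-3,6" into sorted, unique 1-based page numbers."""
    pages: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(value) for value in part.split("-", 1))
            if first > last:
                raise ValueError(f"invalid range: {part!r}")
            pages.update(range(first, last + 1))
        else:
            pages.add(int(part))
    return sorted(pages)


print(parse_pages("1-3,6"))  # [1, 2, 3, 6]
```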

 ### Remove pages from a PDF

 Remove pages from a PDF:

 ```bash
-pdf-helper remove-pages -i <input_file> -o <output_file> -p <page_number_1>,<page_number_2>,...,<page_number_n>
+pdf-helper remove-pages <input_file> <output_file> <page_number_1>,<page_number_2>,...,<page_number_n>

 # E.g. Remove pages 1, 2, 3 and 6 from a PDF
-pdf-helper remove-pages -i 1.pdf -o new.pdf -p 1-3,6
+pdf-helper remove-pages 1.pdf new.pdf 1-3,6
 ```
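
One detail worth keeping in mind when removing several pages is the order of deletion: removing from the end keeps the remaining indices valid. The sketch below shows this on a plain list. `deletion_order` is a hypothetical helper, and nothing here claims to reflect how the package's `remove_pages` is implemented. It assumes the 1-based page numbers used in the README example.

```python
def deletion_order(pages_to_remove: list[int], page_count: int) -> list[int]:
    """Return zero-based indices to delete, highest first."""
    indices = {page - 1 for page in pages_to_remove}
    if any(index < 0 or index >= page_count for index in indices):
        raise ValueError("page number out of range")
    # Deleting from the end keeps the remaining indices valid.
    return sorted(indices, reverse=True)


# A list stands in for the document's pages.
pages = ["p1", "p2", "p3", "p4", "p5", "p6", "p7"]
for index in deletion_order([1, 2, 3, 6], len(pages)):
    del pages[index]
print(pages)  # ['p4', 'p5', 'p7']
```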

 ### Export text from a PDF

 To extract text from a PDF file and export them to text files you can do as follows:

 ```bash
-pdf-helper extract-text -i <input_file> -o <output_file_name>
+pdf-helper extract-text <input_file> -o <output_file_name>

 # E.g. Extract text from a PDF named my-pdf.pdf and save it to my-text.txt
-pdf-helper extract-text -i my-pdf.pdf -o my-text.txt
-```
-
-### Export one or multiple images as a PDF file
-
-You simply provide the script with your images, and it will create a PDF file with them:
-
-```bash
-pdf-helper image-to-pdf -i <image_1> <image_2> <image_3> ... -o <output_file>
-
-# E.g. Take 1.png, 2.jpg, and 3.png and create a PDF named 123.pdf and override
-# if already exists
-pdf-helper image-to-pdf -i 1.png 2.jpg 3.png -o 123.pdf -f
+pdf-helper extract-text my-pdf.pdf -o my-text.txt
 ```

 About

pyproject.toml

Lines changed: 4 additions & 4 deletions
@@ -1,15 +1,15 @@
 [project]
 name = "PDF-Helper"
-version = "0.1.0"
+version = "0.2.0"
 authors = [
     {name = "CodeWriter21(Mehrad Pooryoussof)", email = "CodeWriter21@gmail.com"}
 ]
 description = "A simple python script that helps with doing simple stuff with PDFs."
 readme = {file = "README.md", content-type = "text/markdown"}
 license = {text = "MIT", file = "LICENSE"}
-requires-python = ">=3.9"
+requires-python = ">=3.10"
 dependencies = [
-    "log21>=3.0.0",
+    "log21>=3.3.1",
     "pypdfium2>=4.30.0",
     "Pillow>=11.0.0"
 ]
@@ -41,7 +41,7 @@ wrap-descriptions = 88
 [tool.ruff]
 show-fixes = true
 exclude = ["migrations"]
-target-version = "py39"
+target-version = "py310"
 line-length = 88

 [tool.ruff.lint]

src/pdf_helper/__init__.py

Lines changed: 95 additions & 37 deletions
@@ -11,12 +11,70 @@
 from PIL import Image
 from pypdfium2 import PdfImage, PdfBitmap, PdfDocument

+__version__ = '0.2.0'
+
 __all__ = [
-    'merge_pdfs', 'remove_pages', 'pdf_to_image', 'extract_text', 'image_to_pdf',
-    'split_pdf', 'watermark_pdf'
+    'bundle', 'merge_pdfs', 'remove_pages', 'pdf_to_image', 'extract_text',
+    'image_to_pdf', 'split_pdf', 'watermark_pdf'
 ]


+def bundle(
+    input_files: Sequence[str | bytes | Path | os.PathLike[str] | io.BytesIO],
+    output_stream: str | Path | io.BytesIO | io.BufferedWriter
+) -> int:
+    """Bundle multiple files together.
+
+    :param input_files: List of files to bundle together. Each file can be a PDF or an
+        image. Supported image formats are those supported by Pillow.
+    :param output_stream: Output stream to write to.
+    :return: Number of pages in the bundled PDF.
+    """
+    writer = PdfDocument.new()
+    for input_file in input_files:
+        log21.info(f'Adding {input_file}...')
+        if isinstance(input_file, (str, bytes, Path, os.PathLike)):
+            if str(input_file).lower().endswith('.pdf'):
+                reader = PdfDocument(input_file)
+                writer.import_pages(reader)
+            else:
+                image = Image.open(input_file)
+                bitmap = PdfBitmap.from_pil(image)
+                pdf_image = PdfImage.new(writer)
+                pdf_image.set_bitmap(bitmap)
+                matrix = pdfium.PdfMatrix().scale(bitmap.width, bitmap.height)
+                pdf_image.set_matrix(matrix)
+                page = writer.new_page(bitmap.width, bitmap.height)
+                page.insert_obj(pdf_image)
+                page.gen_content()
+                page.close()
+                pdf_image.close()
+                bitmap.close()
+                image.close()
+        elif isinstance(input_file, io.BytesIO):
+            try:
+                reader = PdfDocument(input_file)
+                writer.import_pages(reader)
+            except Exception:
+                image = Image.open(input_file)
+                bitmap = PdfBitmap.from_pil(image)
+                pdf_image = PdfImage.new(writer)
+                pdf_image.set_bitmap(bitmap)
+                matrix = pdfium.PdfMatrix().scale(bitmap.width, bitmap.height)
+                pdf_image.set_matrix(matrix)
+                page = writer.new_page(bitmap.width, bitmap.height)
+                page.insert_obj(pdf_image)
+                page.gen_content()
+                page.close()
+                pdf_image.close()
+                bitmap.close()
+                image.close()
+        else:
+            raise ValueError(f'Unsupported input file type: {type(input_file)}')
+    writer.save(output_stream)
+    return len(writer)
+
+
 def merge_pdfs(
     input_files: Sequence[str | Path | io.TextIOWrapper],
     output_stream: str | Path | io.BytesIO | io.BufferedWriter
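
In the `io.BytesIO` branch of `bundle`, any exception raised while loading the stream as a PDF sends it to Pillow instead. An alternative, sketched below, is to look for the `%PDF-` header before choosing a loader. `looks_like_pdf` is a hypothetical helper that is not part of this commit, and it assumes the header sits at the very start of the stream, so a PDF with leading junk bytes would be missed by this check.

```python
import io


def looks_like_pdf(stream: io.BytesIO) -> bool:
    """Report whether the stream starts with the PDF header, without consuming it."""
    position = stream.tell()
    try:
        stream.seek(0)
        return stream.read(5) == b"%PDF-"
    finally:
        # Restore the caller's position so a later loader sees the stream unchanged.
        stream.seek(position)


print(looks_like_pdf(io.BytesIO(b"%PDF-1.7\n...")))      # True
print(looks_like_pdf(io.BytesIO(b"\x89PNG\r\n\x1a\n")))  # False
```

A check like this would let the two loaders be selected explicitly, so that a genuinely corrupt PDF surfaces its own error rather than a Pillow one.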
@@ -36,6 +94,41 @@ def merge_pdfs(
     return len(writer)


+def image_to_pdf(
+    input_files: Sequence[str | bytes | Path | os.PathLike[str] | io.BytesIO],
+    output_stream: str | Path | io.BytesIO | io.BufferedWriter
+) -> int:
+    """Convert images to a PDF file.
+
+    :param input_files: List of images to convert.
+    :param output_stream: Output stream to write to.
+    :return: Number of pages in the output PDF
+    """
+    writer = PdfDocument.new()
+    for input_file in input_files:
+        log21.info(f'Adding {input_file}...')
+        # Open the image file
+        image = Image.open(input_file)
+        # Create a bitmap from the image
+        bitmap = PdfBitmap.from_pil(image)
+        # Create a PdfImage object from the bitmap
+        pdf_image = PdfImage.new(writer)
+        pdf_image.set_bitmap(bitmap)
+        matrix = pdfium.PdfMatrix().scale(bitmap.width, bitmap.height)
+        pdf_image.set_matrix(matrix)
+        # Create a new page and insert the PdfImage object
+        page = writer.new_page(bitmap.width, bitmap.height)
+        page.insert_obj(pdf_image)
+        page.gen_content()
+        # Close the objects
+        page.close()
+        pdf_image.close()
+        bitmap.close()
+        image.close()
+    writer.save(output_stream)
+    return len(writer)
+
+
 def remove_pages(
     input_file: str | Path | io.BytesIO | io.TextIOWrapper,
     pages_to_remove: Collection[int],
@@ -160,41 +253,6 @@ def extract_text(
     return text


-def image_to_pdf(
-    input_files: Sequence[str | bytes | Path | os.PathLike[str] | io.BytesIO],
-    output_stream: str | Path | io.BytesIO | io.BufferedWriter
-) -> int:
-    """Convert images to a PDF file.
-
-    :param input_files: List of images to convert.
-    :param output_stream: Output stream to write to.
-    :return: Number of pages in the output PDF
-    """
-    writer = PdfDocument.new()
-    for input_file in input_files:
-        log21.info(f'Adding {input_file}...')
-        # Open the image file
-        image = Image.open(input_file)
-        # Create a bitmap from the image
-        bitmap = PdfBitmap.from_pil(image)
-        # Create a PdfImage object from the bitmap
-        pdf_image = PdfImage.new(writer)
-        pdf_image.set_bitmap(bitmap)
-        matrix = pdfium.PdfMatrix().scale(bitmap.width, bitmap.height)
-        pdf_image.set_matrix(matrix)
-        # Create a new page and insert the PdfImage object
-        page = writer.new_page(bitmap.width, bitmap.height)
-        page.insert_obj(pdf_image)
-        page.gen_content()
-        # Close the objects
-        page.close()
-        pdf_image.close()
-        bitmap.close()
-        image.close()
-    writer.save(output_stream)
-    return len(writer)
-
-
 def split_pdf(
     input_file: str | Path,
     output_directory: str | Path,
