diff --git a/docs/index.rst b/docs/index.rst index 3646bcbf..bd5f1f50 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -16,6 +16,15 @@ pdfly is a command line tool to get information about PDF documents and to manip user/subcommand-meta user/subcommand-cat user/subcommand-x2pdf + user/subcommand-extract-images + user/subcommand-2-up + user/subcommand-booklet + user/subcommand-rm + user/subcommand-pagemeta + user/subcommand-extract-text + user/subcommand-compress + user/subcommand-uncompress + user/subcommand-update-offsets .. toctree:: diff --git a/docs/user/subcommand-2-up.md b/docs/user/subcommand-2-up.md new file mode 100644 index 00000000..11bab2c1 --- /dev/null +++ b/docs/user/subcommand-2-up.md @@ -0,0 +1,32 @@ +# 2-up + +Create a booklet-style PDF from a single input. + +## Usage + +``` +$ pdfly 2-up --help + Usage: pdfly 2-up [OPTIONS] PDF OUT + + Create a booklet-style PDF from a single input. + + Pairs of two pages will be put on one page (left and right) + + usage: python 2-up.py input_file output_file + +╭─ Arguments ──────────────────────────────────────────────────────────────────╮ +│ * pdf FILE [default: None] [required] │ +│ * out PATH [default: None] [required] │ +╰──────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ────────────────────────────────────────────────────────────────────╮ +│ --help Show this message and exit. │ +╰──────────────────────────────────────────────────────────────────────────────╯ +``` + +## Examples + +Convert `document.pdf` into a booklet and write the output in `booklet.pdf`. +``` +pdfly 2-up document.pdf booklet.pdf + +``` diff --git a/docs/user/subcommand-booklet.md b/docs/user/subcommand-booklet.md new file mode 100644 index 00000000..7147401d --- /dev/null +++ b/docs/user/subcommand-booklet.md @@ -0,0 +1,44 @@ +# booklet + +Reorder and two-up PDF pages for booklet printing. 
+ +## Usage + +``` +$ pdfly booklet --help + Usage: pdfly booklet [OPTIONS] FILENAME OUTPUT + + Reorder and two-up PDF pages for booklet printing. + + If the number of pages is not a multiple of four, pages are + added until it is a multiple of four. This includes a centerfold + in the middle of the booklet and a single page on the inside + back cover. The content of those pages are from the + centerfold-file and blank-page-file files, if specified, otherwise + they are blank pages. + + Example: + pdfly booklet input.pdf output.pdf + +╭─ Arguments ──────────────────────────────────────────────────────────────────╮ +│ * filename FILE [default: None] [required] │ +│ * output FILE [default: None] [required] │ +╰──────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ────────────────────────────────────────────────────────────────────╮ +│ --blank-page-file -b FILE page added if input is odd number of pages │ +│ [default: None] │ +│ --centerfold-file -c FILE double-page added if input is missing >= 2 │ +│ pages │ +│ [default: None] │ +│ --help Show this message and exit. │ +╰──────────────────────────────────────────────────────────────────────────────╯ + +``` + +## Examples + +Convert `document.pdf` into a booklet and write the output in `booklet.pdf`. +``` +pdfly booklet document.pdf booklet.pdf + +``` diff --git a/docs/user/subcommand-compress.md b/docs/user/subcommand-compress.md new file mode 100644 index 00000000..ba2c181b --- /dev/null +++ b/docs/user/subcommand-compress.md @@ -0,0 +1,28 @@ +# compress + +Compress a PDF. + +## Usage + +``` +$ pdfly compress --help + Usage: pdfly compress [OPTIONS] PDF OUTPUT + + Compress a PDF. + +╭─ Arguments ───────────────────────────────────────────╮ +│ * pdf FILE [default: None] [required] │ +│ * output PATH [default: None] [required] │ +╰───────────────────────────────────────────────────────╯ +╭─ Options ─────────────────────────────────────────────╮ +│ --help Show this message and exit. 
│ +╰───────────────────────────────────────────────────────╯ +``` +## Examples + +Compress `document.pdf` and write the result to `document_compressed.pdf`. + +``` +pdfly compress document.pdf document_compressed.pdf + +``` diff --git a/docs/user/subcommand-extract-images.md b/docs/user/subcommand-extract-images.md new file mode 100644 index 00000000..1095b58e --- /dev/null +++ b/docs/user/subcommand-extract-images.md @@ -0,0 +1,36 @@ +# extract-images + +Extract images from a PDF file. +## Usage + +``` +$ pdfly extract-images --help + Usage: pdfly extract-images [OPTIONS] PDF + + Extract images from PDF without resampling or altering. + + Adapted from work by Sylvain Pelissier + http://stackoverflow.com/questions/2693820/extract-images-from-pdf-without-res + ampling-in-python + +╭─ Arguments ──────────────────────────────────────────────────────────────────╮ +│ * pdf FILE [default: None] [required] │ +╰──────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ────────────────────────────────────────────────────────────────────╮ +│ --help Show this message and exit. │ +╰──────────────────────────────────────────────────────────────────────────────╯ + +``` + +## Examples + +Extract the 10th page of `document.pdf` into `page.pdf`, then extract the images it contains. + +``` +pdfly cat document.pdf 9 -o page.pdf + +pdfly extract-images page.pdf + Extracted 1 images: + - 0-Im0.png + +``` \ No newline at end of file diff --git a/docs/user/subcommand-extract-text.md b/docs/user/subcommand-extract-text.md new file mode 100644 index 00000000..e39c3766 --- /dev/null +++ b/docs/user/subcommand-extract-text.md @@ -0,0 +1,31 @@ +# extract-text + +Extract text from a PDF file. +## Usage + +``` +$ pdfly extract-text --help + Usage: pdfly extract-text [OPTIONS] PDF + + Extract text from a PDF file. 
+ + +╭─ Arguments ──────────────────────────────────────────────────────────────────╮ +│ * pdf FILE [default: None] [required] │ +╰──────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ────────────────────────────────────────────────────────────────────╮ +│ --help Show this message and exit. │ +╰──────────────────────────────────────────────────────────────────────────────╯ + +``` + +## Examples + +Extract the text from the 10th page of `document.pdf`, redirecting the output into `page.txt`. + +``` +pdfly cat document.pdf 9 -o page.pdf + +pdfly extract-text page.pdf > page.txt + +``` diff --git a/docs/user/subcommand-pagemeta.md b/docs/user/subcommand-pagemeta.md new file mode 100644 index 00000000..d8f11690 --- /dev/null +++ b/docs/user/subcommand-pagemeta.md @@ -0,0 +1,57 @@ +# pagemeta + +Give details about a single page of a PDF. + +## Usage + +``` +$ pdfly pagemeta --help + Usage: pdfly pagemeta [OPTIONS] PDF PAGE_INDEX + + Give details about a single page. + + +╭─ Arguments ──────────────────────────────────────────────────────────────────╮ +│ * pdf FILE [default: None] [required] │ +│ * page_index INTEGER [default: None] [required] │ +╰──────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ────────────────────────────────────────────────────────────────────╮ +│ --output -o [json|text] output format [default: text] │ +│ --help Show this message and exit. │ +╰──────────────────────────────────────────────────────────────────────────────╯ +``` + +## Examples + +Get the metadata of the 101st page of `document.pdf` in text format. 
+``` +pdfly pagemeta document.pdf 100 + /home/user/.../document.pdf, page index 100 + + ┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓ + ┃ Attribute ┃ Value ┃ + ┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩ + │ mediabox │ (0.0, 0.0, 504.0, 661.5): with=504.0 x height=661.5 │ + │ cropbox │ (0.0, 0.0, 504.0, 661.5): with=504.0 x height=661.5 │ + │ artbox │ (0.0, 0.0, 504.0, 661.5): with=504.0 x height=661.5 │ + │ bleedbox │ (0.0, 0.0, 504.0, 661.5): with=504.0 x height=661.5 │ + │ annotations │ 8 │ + └─────────────┴─────────────────────────────────────────────────────┘ + All annotations: + 1. /Link at [232.05524, 385.79007, 343.6091, 396.29007] + 2. /Link at [157.63988, 209.99002, 243.69913, 220.49002] + 3. /Link at [72, 178.19678, 249.65918, 188.69678] + 4. /Link at [196.12769, 152.40353, 361.02328, 162.90353] + 5. /Link at [360.97717, 139.80353, 432, 150.30353] + 6. /Link at [72, 127.20352, 213.9915, 137.70352] + 7. /Link at [179.64218, 448.3905, 220.08231, 458.8905] + 8. /Link at [282.84, 347.99005, 340.83148, 358.49005] +``` + +Get the same metadata in `json` format. + +``` +pdfly pagemeta document.pdf 100 -o json + + {"mediabox":[0.0,0.0,504.0,661.5],"cropbox":[0.0,0.0,504.0,661.5],"artbox":[0.0,0.0,504.0,661.5],"bleedbox":[0.0,0.0,504.0,661.5],"annotations":19} +``` \ No newline at end of file diff --git a/docs/user/subcommand-rm.md b/docs/user/subcommand-rm.md new file mode 100644 index 00000000..ee36d4ce --- /dev/null +++ b/docs/user/subcommand-rm.md @@ -0,0 +1,73 @@ +# rm + +Remove pages from PDF files. + +## Usage + +``` +$ pdfly rm --help +Usage: pdfly rm [OPTIONS] FILENAME FN_PGRGS... + + Remove pages from PDF files. + + Page ranges refer to the previously-named file. + A file not followed by a page range means all the pages of the file. + + PAGE RANGES are like Python slices. + + Remember, page indices start with zero. + + Page range expression examples: + + : all pages. -1 last page. 
+ 22 just the 23rd page. :-1 all but the last page. + 0:3 the first three pages. -2 second-to-last page. + :3 the first three pages. -2: last two pages. + 5: from the sixth page onward. -3:-1 third & second to last. + + The third, "stride" or "step" number is also recognized. + + ::2 0 2 4 ... to the end. 3:0:-1 3 2 1 but not 0. + 1:10:2 1 3 5 7 9 2::-1 2 1 0. + ::-1 all pages in reverse order. + + Examples + pdfly rm -o output.pdf document.pdf 2:5 + + Remove pages 2 to 4 from document.pdf, producing output.pdf. + + pdfly rm document.pdf :-1 + + Removes all pages except the last one from document.pdf, modifying the original file. + + pdfly rm report.pdf :6 7: + + Remove all pages except page seven from report.pdf, + producing a single-page report.pdf. + +╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────╮ +│ * filename FILE [default: None] [required] │ +│ * fn_pgrgs FN_PGRGS... filenames and/or page ranges [default: None] [required] │ +╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────╮ +│ * --output -o PATH [default: None] [required] │ +│ --verbose --no-verbose show page ranges as they are being read [default: no-verbose] │ +│ --help Show this message and exit. │ +╰─────────────────────────────────────────────────────────────────────────────────────────────────────────╯ +``` + +## Examples + +Remove the 5th page of `document.pdf`, producing `output.pdf`. + +``` +pdfly rm -o output.pdf document.pdf 4 + +``` + +Remove the first and last page of `document.pdf`, producing `output.pdf`. 
+ +``` +pdfly rm -o output.pdf document.pdf 0 -1 + +``` \ No newline at end of file diff --git a/docs/user/subcommand-uncompress.md b/docs/user/subcommand-uncompress.md new file mode 100644 index 00000000..5acca177 --- /dev/null +++ b/docs/user/subcommand-uncompress.md @@ -0,0 +1,25 @@ +# uncompress + +Uncompress PDF content streams. +## Usage + +``` +$ pdfly uncompress --help + Module for uncompressing PDF content streams. + + ╭─ Arguments ───────────────────────────────────────────╮ + │ * pdf FILE [default: None] [required] │ + │ * output PATH [default: None] [required] │ + ╰───────────────────────────────────────────────────────╯ + ╭─ Options ─────────────────────────────────────────────╮ + │ --help Show this message and exit. │ + ╰───────────────────────────────────────────────────────╯ +``` + +## Examples + +Uncompress `document_compressed.pdf` and output `document.pdf`. + +``` +pdfly uncompress document_compressed.pdf document.pdf +``` \ No newline at end of file diff --git a/docs/user/subcommand-update-offsets.md b/docs/user/subcommand-update-offsets.md new file mode 100644 index 00000000..2b7237b5 --- /dev/null +++ b/docs/user/subcommand-update-offsets.md @@ -0,0 +1,56 @@ +# update-offsets + +Updates offsets and lengths in a simple PDF file. + +## Usage + +``` +$ pdfly update-offsets --help + Usage: pdfly update-offsets [OPTIONS] FILE_IN FILE_OUT + + Updates offsets and lengths in a simple PDF file. + + The PDF specification requires that the xref section at the end + of a PDF file has the correct offsets of the PDF's objects. + It further requires that the dictionary of a stream object + contains a /Length-entry giving the length of the encoded stream. + + When editing a PDF file using a text-editor (e.g. vim) it is + elaborate to compute or adjust these offsets and lengths. + + This command tries to compute /Length-entries of the stream dictionaries + and the offsets in the xref-section automatically. 
+ + It expects that the PDF file has ASCII encoding only. It may + use ISO-8859-1 or UTF-8 in its comments. + The current implementation incorrectly replaces CR (0x0d) by LF (0x0a) in + binary data. + It expects that there is one xref-section only. + It expects that the /Length-entries have default values containing + enough digits, e.g. /Length 000 when the stream consists of 576 bytes. + + Example: + update-offsets --verbose --encoding ISO-8859-1 issue-297.pdf + issue-297.out.pdf + +╭─ Arguments ──────────────────────────────────────────────────────────────────╮ +│ * file_in FILE [default: None] [required] │ +│ * file_out PATH [default: None] [required] │ +╰──────────────────────────────────────────────────────────────────────────────╯ +╭─ Options ────────────────────────────────────────────────────────────────────╮ +│ --encoding TEXT Encoding used to read and write the │ +│ files, e.g. UTF-8. │ +│ [default: ISO-8859-1] │ +│ --verbose --no-verbose Show progress while processing. │ +│ [default: no-verbose] │ +│ --help Show this message and exit. │ +╰──────────────────────────────────────────────────────────────────────────────╯ + +``` + +## Examples + +Update the offsets of `document.pdf` with UTF-8 encoding and write the output to `document.out.pdf`. +``` +pdfly update-offsets document.pdf --verbose --encoding UTF-8 document.out.pdf +``` \ No newline at end of file
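
To make concrete what the `update-offsets` help text means by "the correct offsets of the PDF's objects", here is a minimal Python sketch. It is not pdfly's implementation; the helper names `object_offsets` and `xref_entry` and the `sample` bytes are hypothetical and exist only for illustration. The sketch assumes a simple ASCII-only file in which every object header (`N G obj`) starts at the beginning of a line, and it formats each offset as a classic xref entry (10-digit byte offset, 5-digit generation number, `n` for in-use).

```python
import re
from typing import Dict


def object_offsets(pdf_bytes: bytes) -> Dict[int, int]:
    """Map each object number to the byte offset of its 'N G obj' header."""
    offsets = {}
    # Only headers at the start of a line are matched, so indirect
    # references such as b"2 0 R" inside a dictionary are ignored.
    for match in re.finditer(rb"^(\d+) (\d+) obj\b", pdf_bytes, re.MULTILINE):
        offsets[int(match.group(1))] = match.start()
    return offsets


def xref_entry(offset: int, generation: int = 0) -> str:
    """Format one in-use xref entry (without its end-of-line marker)."""
    return f"{offset:010d} {generation:05d} n"


# Hypothetical two-object fragment, not a complete PDF file.
sample = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)

offsets = object_offsets(sample)
for number, offset in sorted(offsets.items()):
    print(number, xref_entry(offset))
```

Editing anything before an object by hand shifts every later offset, which is why recomputing them manually is tedious and why the command automates it.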