Depending on the PDF creator/engine used, a .pdf file may include
content of little relevance for a human reader. Examples include
bitmap illustrations with a resolution greater than 300 dpi, or color
images where a grayscale illustration would be good enough. Reprinting
the .pdf as a PDF with ghostscript may address this and yield
a smaller file that is easier to store or to transfer (for instance as
an email attachment). Text-only files printed into PostScript (e.g., with
a2ps [1]) and then distilled into a PDF can equally benefit from such an
optimization.
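
As a rough illustration of such a reprint, the sketch below assembles a Ghostscript call that rewrites a PDF with the pdfwrite device and downsamples high-resolution color and grayscale bitmaps to 300 dpi. It is not taken from the script itself; the helper name `gs_reprint_cmd`, the file names, and the target of 300 dpi are assumptions for illustration. The helper only prints the command, so it can be inspected before it is run.

```bash
# Sketch only: print a Ghostscript command which reprints a PDF and
# downsamples high-resolution bitmaps to 300 dpi.
# gs_reprint_cmd is a hypothetical helper, not part of the script.
gs_reprint_cmd() {
    # $1: input PDF, $2: output PDF (file names without spaces assumed)
    printf '%s ' gs -q -dNOPAUSE -dBATCH -dSAFER \
        -sDEVICE=pdfwrite \
        -dDownsampleColorImages=true -dColorImageResolution=300 \
        -dDownsampleGrayImages=true -dGrayImageResolution=300 \
        "-sOutputFile=$2" "$1"
    printf '\n'
}

# Show the command for a hypothetical pair of files.
gs_reprint_cmd input.pdf output.pdf
```

Writing to a separate output file, as assumed here, leaves the input untouched until the result has been checked.
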
This bash script by no means claims to be the first one collecting
bits and pieces to address the issue. It rather serves as an aide-mémoire
of finds encountered earlier, and as a way to steer ghostscript in Linux
accordingly. Within reason, the snippets were joined as provided; thus,
the credit belongs to those already in the field.
For the intended use, add the executable bit to the script
(chmod +x pdf_reprint.sh) and access the functionality by an
alias. Debian's default interactive shell is bash; thus, this configuration can be
set in file /etc/bash.bashrc for any user, in file
~/.bashrc for the current user, or as a template for any (new) user's
own file ~/.bashrc in file /etc/skel/.bashrc, following the pattern of
alias pdf_rewrite="/path/to/pdf_rewrite.sh"
- Presuming the alias set for pdf_rewrite.sh is pdf_rewrite, run either one of the following commands to reprint the .pdf while retaining the color:

      pdf_rewrite --reprint input.pdf
      pdf_rewrite -r input.pdf
      pdf_rewrite --colour input.pdf
      pdf_rewrite --color input.pdf
      pdf_rewrite -c input.pdf

  If the reprint is smaller in size than the original file, the reprint replaces the original file. In addition, a brief note states the percentage of the savings achieved. If wanted, you may repeat the reprint; eventually, either the savings are insignificant in comparison to the remaining file size, or the script itself will report no change and thus retain the original file.

  The credit for the underlying approach and implementation belongs to Evan Langlois [2].
- Often, a reprint in grayscale is sufficient. Use either one of the following commands

      pdf_rewrite --gray input.pdf
      pdf_rewrite --grey input.pdf
      pdf_rewrite -g input.pdf

  to overwrite file input.pdf accordingly. The credit for this approach belongs to user slm on the Unix Stack Exchange [3].
- To process multiple .pdf files, you may consider a for loop as in

      for file in *.pdf; do
          echo "$file"
          bash ./pdf_rewrite.sh -r "$file"
      done

  Echoing each file name equally provides a brief progress report.
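
The rule that a reprint only replaces the original if it is smaller can be sketched as follows. This is a minimal illustration and not the script's own code: the helper name `replace_if_smaller` and the wording of the messages are assumptions, and the percentage note is left out for brevity.

```bash
# Sketch only: keep a reprint if it is smaller than the original.
# replace_if_smaller is a hypothetical helper, not the script's own code.
replace_if_smaller() {
    # $1: original PDF, $2: freshly written reprint
    local old_size new_size
    old_size=$(($(wc -c < "$1")))   # size in bytes
    new_size=$(($(wc -c < "$2")))
    if [ "$new_size" -lt "$old_size" ]; then
        mv -- "$2" "$1"             # the reprint takes the place of the original
        echo "replaced: $1 shrank by $((old_size - new_size)) bytes"
    else
        rm -- "$2"                  # no gain: discard the reprint
        echo "no change: $1 retained"
    fi
}
```

Called as `replace_if_smaller input.pdf reprint.pdf`, the original survives whenever the reprint offers no gain, which is why repeated runs eventually end with a report of no change.
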
Note that a reprint in grayscale may render illustrations less intelligible. When preparing a document, a service like https://colorbrewer2.org/ may guide your selection of color palettes that survive this kind of "photocopying", or help identify a palette safe for color-blind readers.
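
To judge beforehand whether the figures remain legible in gray, a grayscale copy may be written next to the original and inspected first. The sketch below prints a Ghostscript call for this purpose; the helper name `gs_gray_cmd` and the file names are assumptions, and the options shown are a commonly used combination for the pdfwrite device rather than a quote from the script.

```bash
# Sketch only: print a Ghostscript command which writes a grayscale copy.
# gs_gray_cmd is a hypothetical helper, not part of the script.
gs_gray_cmd() {
    # $1: input PDF, $2: grayscale copy (file names without spaces assumed)
    printf '%s ' gs -q -dNOPAUSE -dBATCH \
        -sDEVICE=pdfwrite \
        -sColorConversionStrategy=Gray \
        -dProcessColorModel=/DeviceGray \
        "-sOutputFile=$2" "$1"
    printf '\n'
}

# Show the command for a hypothetical preview file.
gs_gray_cmd input.pdf preview_gray.pdf
```
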
Keep a backup of the .pdf to be processed. Though the script may report problems while processing the data (or even crash, which may destroy the .pdf), it is not a PDF validator such as veraPDF [4].
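
One way to keep such a backup is sketched below. The helper name `backup_pdf` and the suffix `.bak` are assumptions chosen for illustration; the script itself is not known to create backups.

```bash
# Sketch only: copy a PDF aside before it is processed.
# backup_pdf and the ".bak" suffix are hypothetical choices.
backup_pdf() {
    # $1: PDF to protect; the copy is written next to it as "$1.bak"
    if [ -e "$1.bak" ]; then
        echo "backup $1.bak already exists, not overwritten" >&2
        return 1
    fi
    cp -p -- "$1" "$1.bak"   # -p preserves time stamps and permissions
}
```

A call like `backup_pdf input.pdf && pdf_rewrite -r input.pdf` then only starts the reprint if the copy was written.
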
Initially written for Linux Xubuntu 18.04.3 LTS and ghostscript (version 9.26), the script is known to work well for instance with Debian 13/trixie and GPL Ghostscript (version 10.05.1).
- File link2web.pdf included in the project was compiled with pdfLaTeX based on an example provided by www.texample.net. This .pdf contains a color figure and a link to an external reference. Note that the simplification into half-tones (option -g) affects the printed document; depending on the PDF viewer used, the highlighting box around the link may remain colored on screen.
- The performance of the utility was tested on a couple of recent publications in chemistry with gs (10.07.0) as currently provided in Debian 14/forky (currently branch testing). To ease replication, only open access publications were used for the benchmark.

The table below compares the file size before and after a single run of optimization with either option.

| journal | publisher | original | with -r | saved % (-r) | with -g | saved % (-g) |
|---|---|---|---|---|---|---|
| 2023ACR3654 | ACS | 2.8 MB | 1.6 MB | 42.9 | 1.6 MB | 42.9 |
| 2026ACR1414 | ACS | 9.1 MB | 1.9 MB | 79.1 | 1.8 MB | 80.2 |
| 2023CrystGrowthDes8469 | ACS | 3.7 MB | 0.7 MB | 81.1 | 0.7 MB | 81.1 |
| 2026CrystGrowthDes2939 | ACS | 8.2 MB | 2.0 MB | 75.6 | 1.8 MB | 78.0 |
| 2023CRV13291 | ACS | 25.5 MB | 3.5 MB | 86.3 | 3.3 MB | 87.1 |
| 2026CRV4375 | ACS | 12.6 MB | 2.6 MB | 79.4 | 2.5 MB | 80.2 |
| 2023JCE4728 | ACS | 3.1 MB | 1.0 MB | 67.7 | 1.0 MB | 67.7 |
| 2026JCE1723 | ACS | 3.2 MB | 1.2 MB | 62.5 | 1.2 MB | 62.5 |
| 2023JOC16719 | ACS | 9.9 MB | 2.3 MB | 76.8 | 2.0 MB | 79.8 |
| 2026JOC5520 | ACS | 5.4 MB | 2.0 MB | 63.0 | 1.8 MB | 66.7 |
| 2023OL9243 | ACS | 2.2 MB | 1.4 MB | 36.4 | 1.3 MB | 40.9 |
| 2026OL5021 | ACS | 2.5 MB | 1.4 MB | 44.0 | 1.4 MB | 44.0 |
| 2026Tetrahedron135290 | Elsevier | 2.4 MB | 2.2 MB | 8.3 | 1.4 MB | 41.7 |
| 2026Tetrahedron135286 | Elsevier | 1.1 MB | 1.0 MB | 9.1 | 0.9 MB | 18.2 |
| 2026TL156046 | Elsevier | 498 kB | 374 kB | 24.9 | 286 kB | 42.6 |
| 2026TL156057 | Elsevier | 2.1 MB | 2.0 MB | 4.8 | 0.5 MB | 76.2 |
| 2024PCCP770 | RSC | 2.3 MB | 1.3 MB | 43.5 | 0.9 MB | 60.9 |
| 2026PCCP9840 | RSC | 5.3 MB | 1.9 MB | 64.2 | 1.3 MB | 75.5 |
| 2024TheorChemAcc4 | Springer | 1.8 MB | 0.7 MB | 61.1 | 0.7 MB | 61.1 |
| 2026TheorChemAcc25 | Springer | 1.5 MB | 0.9 MB | 40.0 | 0.9 MB | 40.0 |
| 2023Synthesis3777 | Thieme | 976 kB | 949 kB | 2.8 | 540 kB | 44.7 |
| 2026Synthesis910 | Thieme | 759 kB | 438 kB | 42.3 | 436 kB | 42.6 |
| 2024ACIEe202314446 | Wiley | 2.5 MB | 1.3 MB | 48.0 | 0.8 MB | 68.0 |
| 2026ACIEe26144 | Wiley | 5.3 MB | 4.0 MB | 24.5 | 2.9 MB | 45.3 |
| 2023HCAe202300110 | Wiley | 10.4 MB | 1.2 MB | 88.5 | 1.2 MB | 88.5 |
| 2026HCA0:e00224 | Wiley | 4.1 MB | 1.3 MB | 68.3 | 1.3 MB | 68.3 |
| 2023JApplCryst1639 | Wiley | 1.1 MB | 0.7 MB | 36.4 | 0.7 MB | 36.4 |
| 2026JApplCryst291 | Wiley | 7.3 MB | 1.3 MB | 82.2 | 0.7 MB | 90.4 |
| 2026BJoc672 | Beilstein | 1.6 MB | 0.5 MB | 68.8 | 0.3 MB | 81.2 |
| 2026BJoc620 | Beilstein | 461 kB | 461 kB | 0.0 | 350 kB | 24.1 |
| 2026Molecules1499 | MDPI | 771 kB | 558 kB | 27.6 | 328 kB | 57.5 |
| 2026Molecules1495 | MDPI | 2.8 MB | 0.8 MB | 71.4 | 0.6 MB | 78.6 |
| 2026JOSS0825 | JOSS | 213 kB | 84 kB | 60.6 | 84 kB | 60.6 |
| 2026JOSS09890 | JOSS | 921 kB | 392 kB | 57.4 | 246 kB | 73.3 |
| arXiv:2605.00564v1 | arxiv | 3.1 MB | 1.4 MB | 54.8 | 0.9 MB | 71.0 |
| arXiv:2605.00149v1 | arxiv | 371 kB | 232 kB | 37.5 | 232 kB | 37.5 |
| link2web.pdf | pdflatex | 38.9 kB | 10.7 kB | 72.5 | 10.6 kB | 72.8 |
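
The savings in the table follow from the file sizes as (1 - reprint / original) × 100. A small helper to reproduce an entry might look as follows; the name `saved_percent` is an assumption for illustration, and both sizes have to be given in the same unit.

```bash
# Sketch only: percentage saved by a reprint, rounded to one decimal.
# saved_percent is a hypothetical helper, not part of the script.
saved_percent() {
    # $1: original size, $2: size of the reprint (same unit for both)
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (1 - after / before) * 100 }'
}

saved_percent 2.8 1.6    # 2023ACR3654 with -r, prints 42.9
saved_percent 498 374    # 2026TL156046 with -r, prints 24.9
```
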

By inspection with the utilities pdfinfo and exiftool, the
rewrite overwrites PDF metadata such as Producer (which can be an
entry like LaTeX with hyperref), CreationDate, and ModDate, while
others are lost for good. With metadata such as TITLE, SUBJECT,
KEYWORDS, and AUTHOR typically retained, a reference manager like
Zotero (tested with version 9.0.1)
can still collect complementary bibliographic metadata.
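
Such an inspection can be sketched as a comparison of the metadata reported for a backup of the original and for the reprint. The helper name `metadata_diff` and the file names are assumptions; the third argument selects the utility and defaults to pdfinfo, with exiftool as an alternative.

```bash
# Sketch only: compare the metadata reported for two PDF files.
# metadata_diff is a hypothetical helper, not part of the script.
metadata_diff() {
    # $1: backup of the original, $2: reprint,
    # $3: command printing the metadata (default: pdfinfo)
    local tool before after status
    tool=${3:-pdfinfo}
    before=$(mktemp) || return 2
    after=$(mktemp) || return 2
    "$tool" "$1" > "$before"
    "$tool" "$2" > "$after"
    diff "$before" "$after"      # lines with < stem from the original
    status=$?
    rm -f "$before" "$after"
    return "$status"             # 0: identical, 1: differences found
}
```

With `metadata_diff original.pdf.bak original.pdf`, entries like Producer or ModDate are expected to show up as differences, as is the file size reported by pdfinfo.
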
A conversion to grayscale is more likely to be successful if the PDF of
interest is converted directly. This seems especially the case if the
document includes ligatures like fi, fl, ae, oe, or umlauts in the
Latin script; if the (intermediate) color-retaining reprint failed to
properly define these, a subsequent reprint constrained to grayscale may
yield a gap in their place. This issue depends both on the version of ghostscript
installed and on the font / PDF engine of the PDF to be processed;
recent journal publications (like those by ACS, a member of the STIX project [5])
tend to be less frequently affected by this. This PDF reprinter has not been
tested on PDFs of documents predominantly written in scripts other
than Latin.