kcroker
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 61 additions & 51 deletions
diff --git a/README.md b/README.md
Lines changed: 12 additions & 4 deletions
diff --git a/docs/examples.man b/docs/examples.man
Lines changed: 6 additions & 2 deletions
@@ -7,159 +7,169 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## Unreleased
 
-* Restructure the code into smaller chunks
-* General maintenance work
+### Additions
+
+* Implement page ranges for `--mode`, `--dpi` and `--quality`.
+* Add a `--socr` ("streamlined" OCR) option that abbreviates `--ocr '{"language": ["eng", "grc"]}'` to `--socr eng,grc`.
+* Add a `-f` short variant for `--overwrite`.
+
+### Changes
+
+* Deprecate the short overwrite flag `-o` in favor of `-f`.
+* Warnings and errors are not logged to stderr.
+* Restructure the code into smaller chunks.
+* General maintenance work.
 
 ## 2.5.4 - 2026-04-24
 
-* Run `uv` security audit and update some dependencies
+* Run `uv` security audit and update some dependencies.
 
 ## 2.5.3 - 2026-03-25
 
-* Fix broken workflow without text layer translation
-* Shorter names for temporary directories
-* Code maintenance
+* Fix broken workflow without text layer translation.
+* Shorter names for temporary directories.
+* Code maintenance.
 
 ## 2.5.2 - 2026-03-25
 
-* Relax dependency versions
+* Relax dependency versions.
 
 ## 2.5.1 - 2026-03-14
 
-* Allow manually configuring PDF page resolution (DPI)
+* Allow manually configuring PDF page resolution (DPI).
 
 ## 2.5.0 - 2026-03-13
 
-* Account for DjVu file resolution
-* Simplify image diffing and regenerate better-quality fixtures
+* Account for DjVu file resolution.
+* Simplify image diffing and regenerate better-quality fixtures.
 
 ## 2.4.2 - 2026-02-24
 
-* Fix issue where only the main process has its logger configured
+* Fix issue where only the main process has its logger configured.
 
 ## 2.4.1 - 2026-02-24
 
-* Fix compatibility issues with the new OCRmyPDF API
-* Remove support for Python 3.10
+* Fix compatibility issues with the new OCRmyPDF API.
+* Remove support for Python 3.10.
 
 ## 2.4.0 - 2026-02-24
 
-* Migrate to `uv` from `pyenv` + `poetry`
-* Update dependencies
+* Migrate to `uv` from `pyenv` + `poetry`.
+* Update dependencies.
 
 ## 2.3.1 - 2025-10-28
 
-* Fix mixed-up email format
+* Fix mixed-up email format.
 
 ## 2.3.0 - 2025-10-28
 
-* Remove support for Python 3.9
-* Migrate to standardized `pyproject.toml`
-* Update dependencies
+* Remove support for Python 3.9.
+* Migrate to standardized `pyproject.toml`.
+* Update dependencies.
 
 ## 2.2.15 - 2025-07-02
 
-* Add support for installation via `pipx`
+* Add support for installation via `pipx`.
 
 ## 2.2.14 - 2025-05-27
 
-* Improve installation notes
-* Bump djvulibre-python version
+* Improve installation notes.
+* Bump djvulibre-python version.
 
 ## 2.2.13 - 2025-02-12
 
-* Fail-safe quality settings for non-JPEG images
+* Fail-safe quality settings for non-JPEG images.
 
 ## 2.2.12 - 2025-01-27
 
-* Update pytest_image_diff and fix newly broken tests
+* Update pytest_image_diff and fix newly broken tests.
 
 ## 2.2.11 - 2025-01-26
 
-* Update dependencies
+* Update dependencies.
 
 ## 2.2.10 - 2024-10-25
 
-* Improve interface with OCRmyPDF
-* Fix CI build
+* Improve interface with OCRmyPDF.
+* Fix CI build.
 
 ## 2.2.9 - 2024-10-25
 
-* Improve type hints
-* Update dependencies
+* Improve type hints.
+* Update dependencies.
 
 ## 2.2.8 - 2024-10-18
 
-* Support single characters in the text layer
+* Support single characters in the text layer.
 
 ## 2.2.7 - 2024-08-27
 
-* Improve tab and newline handling
+* Improve tab and newline handling.
 
 ## 2.2.6 - 2024-08-05
 
-* Fix accidental whitespace removal from text blocks
+* Fix accidental whitespace removal from text blocks.
 
 ## 2.2.5 - 2024-07-20
 
-* Re-add ability to force the image mode (RGB/Grayscale/Monochrome)
+* Re-add ability to force the image mode (RGB/Grayscale/Monochrome).
 
 ## 2.2.4 - 2024-02-24
 
-* Update dependencies
+* Update dependencies.
 
 ## 2.2.3 - 2023-12-09
 
-* Fix CI build
-* Ignore invalid UTF-8 sequences
-* Ignore unrecognized page titles in the outline (#23)
+* Fix CI build.
+* Ignore invalid UTF-8 sequences.
+* Ignore unrecognized page titles in the outline (#23).
 
 ## 2.2.2 - 2023-10-29
 
-* Update dependencies
+* Update dependencies.
 
 ## 2.2.1 - 2023-11-06
 
-* Handle invalid PDF pages
-* Fix exception in text layer processing (#20)
+* Handle invalid PDF pages.
+* Fix exception in text layer processing (#20).
 
 ## 2.2.0 - 2023-10-28
 
-* Add options for disabling the text layer and for directly running OCR
+* Add options for disabling the text layer and for directly running OCR.
 
 ## 2.1.5 - 2023-10-27
 
-* Fix inverted colors in images (#16)
+* Fix inverted colors in images (#16).
 
 ## 2.1.4 - 2023-10-06
 
-* Fix typo in logging code
+* Fix typo in logging code.
 
 ## 2.1.3 - 2023-10-06
 
-* Improve logging
+* Improve logging.
 
 ## 2.1.2 - 2023-10-02
 
-* Accidental version bump
+* Accidental version bump.
 
 ## 2.1.1 - 2023-10-02
 
-* Remove debug code
+* Remove debug code.
 
 ## 2.1.0 - 2023-10-02
 
-* Add support for OCRmyPDF
+* Add support for OCRmyPDF.
 
 ## 2.0.2 - 2023-08-03
 
-* Update some other dependencies
-* Replace `python-djvulibre` with `djvulibre-python`
+* Update some other dependencies.
+* Replace `python-djvulibre` with `djvulibre-python`.
 
 ## 2.0.1 - 2023-06-22
 
-* Minor improvements in packaging
+* Minor improvements in packaging.
 
 ## 2.0.0 - 2023-05-04
 
-* Fully rewrite
+* Fully rewrite.
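
The `--socr` changelog entry describes the new option as shorthand for the JSON form of `--ocr`. The following is a minimal sketch of that expansion, not the project's actual implementation: the helper name `expand_socr` is hypothetical, and the only assumption about behavior is the one stated in the changelog (a comma-separated language list maps to a `language` array).

```python
import json


def expand_socr(languages: str) -> str:
    """Hypothetical helper: turn 'eng,grc' into the JSON string accepted by --ocr."""
    # Split on commas and drop surrounding whitespace and empty items.
    codes = [code.strip() for code in languages.split(",") if code.strip()]
    if not codes:
        raise ValueError("expected at least one language code")
    return json.dumps({"language": codes})


print(expand_socr("eng,grc"))  # prints {"language": ["eng", "grc"]}
```

Under this reading, `--socr rus,eng,grc` and `--ocr '{"language": ["rus", "eng", "grc"]}'` would describe the same OCR configuration.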
@@ -16,13 +16,19 @@ If you have [OCRmyPDF](https://github.com/ocrmypdf/OCRmyPDF) installed, you can
 
     dpsprep -O3 input.djvu
 
-You can also skip translating the text layer (it is sometimes not translated well) and redo the OCR (rather than launching the `ocrmypdf` CLI, we use the API directly and accept options in JSON format):
+You can also skip translating the text layer (it is sometimes translated poorly) and redo the OCR. Rather than launching the `ocrmypdf` CLI, we use the API directly and accept options in JSON format via `--ocr`, or a comma-separated list of languages via `--socr`:
 
-    dpsprep --ocr '{"language": ["rus", "eng"]}' input.djvu
+    dpsprep --socr rus,eng,grc input.djvu
 
-Consult the man file ([online](https://github.com/kcroker/dpsprep/wiki/dpsprep.1)) for details; there are a lot of options.
+Sometimes the pages of scanned books are saved as color images. For PDF, saving bitonal page backgrounds as RGB images can inflate the file by an order of magnitude (see [below](#compression)). We try to infer the color mode of each page, but the result is sometimes inefficient. In such cases, we can force the color mode as follows:
 
-See the next section for different ways to run the program.
+    dpsprep --mode bitonal input.djvu start.pdf
+
+To preserve the cover page as-is, we can restrict the forced mode to a page range:
+
+    dpsprep --mode bitonal[2-end] input.djvu start.pdf
+
+For details on these and other options, as well as the allowed range syntax, consult the man file ([online](https://github.com/kcroker/dpsprep/wiki/dpsprep.1)).
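
As an illustration of the range form shown above, here is a rough sketch of how a value such as `bitonal[2-end]` could be split into a mode and a page range. It assumes only the single `value[start-end]` shape from the example, with 1-based inclusive pages and `end` meaning the last page; the real grammar is defined in the man file and may allow more. The function name `parse_ranged_value` is hypothetical.

```python
import re

# Assumed shape: VALUE or VALUE[START-STOP], where START/STOP are page numbers or "end".
_SPEC = re.compile(
    r"^(?P<value>[^\[\]]+)(?:\[(?P<start>\d+|end)-(?P<stop>\d+|end)\])?$"
)


def parse_ranged_value(spec: str, page_count: int) -> tuple[str, range]:
    """Hypothetical parser: return the option value and the pages it applies to."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"unrecognized spec: {spec!r}")
    if match["start"] is None:
        # No range given: the value applies to every page.
        return match["value"], range(1, page_count + 1)

    def resolve(token: str) -> int:
        return page_count if token == "end" else int(token)

    start, stop = resolve(match["start"]), resolve(match["stop"])
    if not 1 <= start <= stop <= page_count:
        raise ValueError(f"page range out of bounds: {spec!r}")
    return match["value"], range(start, stop + 1)


mode, pages = parse_ranged_value("bitonal[2-end]", page_count=10)
print(mode, pages.start, pages.stop - 1)  # prints bitonal 2 10
```

Note that square brackets are glob characters in most shells, and some shells (zsh with default settings, for example) reject an unmatched pattern, so quoting the argument as `'bitonal[2-end]'` is the safer way to type it.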
 
 ## Installation
 
@@ -88,6 +94,8 @@ If you want `dpsprep` to be able to use `ocrmypdf` from `pipx`'s isolated enviro
 
 ### Compression
 
+PDF files full of images cannot be compressed as efficiently as DjVu, which can lead to files hundreds of megabytes in size. Fortunately, books are often bitonal, which allows for efficient compression such as `group4` or `jbig2`. Unfortunately, in badly digitized books the scanned images may be saved as color JPEG files. This can be partially mitigated using `--mode bitonal` (possibly for only a range of pages).
+
 We perform compression in two stages:
 
 * The first one is the default compression provided by [Pillow](https://github.com/python-pillow/Pillow). For bitonal images, [the PDF generation code says](https://github.com/python-pillow/Pillow/blob/a088d54509e42e4eeed37d618b42d775c0d16ef5/src/PIL/PdfImagePlugin.py#L138C16-L138C16) that, if `libtiff` is available, `group4` compression is used.
 
@@ -12,13 +12,17 @@ Produce an output file using a large pool of workers:
 .IP
 dpsprep --pool=16 input.djvu
 .P
-Force bitonal images:
+Force all pages to be bitonal:
 .IP
 dpsprep --mode bitonal input.djvu
 .P
+Force bitonal pages but leave the cover page as-is (can be useful with badly digitized books):
+.IP
+dpsprep --mode bitonal[2-end] input.djvu
+.P
 Produce an output file by disregarding the text layer and running OCRmyPDF instead:
 .IP
-dpsprep --ocr '{"language": ["rus", "eng"]}' input.djvu
+dpsprep --socr rus,eng,grc input.djvu
 .P
 Simply disregard the text layer without OCR:
 .IP