Commit 592b1f3

README updates following Copilot suggestions.
1 parent 4b80fc7 commit 592b1f3

1 file changed

Lines changed: 4 additions & 4 deletions

README.md

@@ -32,7 +32,7 @@
 
 ## Why PyMuPDF?
 
-- **Fast** — powered by [MuPDF](https://mupdf.com/) , a best-in-class C rendering engine
+- **Fast** — powered by [MuPDF](https://mupdf.com/), a best-in-class C rendering engine
 - **Accurate** — pixel-perfect text extraction with font, color, and position metadata
 - **Versatile** — read, write, annotate, redact, merge, split, and convert documents
 - **LLM-ready** — native Markdown output via [PyMuPDF4LLM](https://pypi.org/project/pymupdf4llm/) for RAG and AI pipelines
@@ -513,13 +513,13 @@ for rect in locations:
 
 ### `get_images` shows no images but I can clearly see charts in the PDF. Why?
 
-Charts and diagrams created by tools like matplotlib, Excel, or R are typically rendered as vector graphics (PDF drawing commands), not raster images. `get_images ` only lists embedded raster image objects and will not detect vector graphics. To capture these, rasterise the entire page with `page.get_pixmap()`.
+Charts and diagrams created by tools like matplotlib, Excel, or R are typically rendered as vector graphics (PDF drawing commands), not raster images. `get_images` only lists embedded raster image objects and will not detect vector graphics. To capture these, rasterise the entire page with `page.get_pixmap()`.
 
 
 
 ### How does OCR work in PyMuPDF? Does it require a separate Tesseract installation?
 
-PyMuPDF uses Tesseract for OCR, but Tesseract's C++ code is compiled directly into MuPDF — it is not called as an external subprocess. The only external requirement is the **Tesseract language data files** (`tessdata`). Over 100 languages are supported. There is no Python-level pytesseract dependency.
+PyMuPDF uses MuPDF's built-in Tesseract-based OCR support, so there is no Python-level `pytesseract` dependency. However, PyMuPDF still needs access to the **Tesseract language data files** (`tessdata`), and automatic tessdata discovery may invoke the `tesseract` executable (for example, to list available languages) if you do not explicitly provide a tessdata path. In practice, the recommended setup is to either install Tesseract so discovery works automatically, or configure the tessdata location yourself via the `tessdata` parameter or the `TESSDATA_PREFIX` environment variable. Over 100 languages are supported.
 
 ```python
 import pymupdf
@@ -740,7 +740,7 @@ Full installation guide, API reference, cookbook, and tutorial at **[pymupdf.rea
 
 | Project | Description |
 |---|---|
-| [PyMuPDF4LLM](https://github.com/pymupdf/pymupdf4llm) | TLLM/RAG-optimised Markdown and JSON extraction |
+| [PyMuPDF4LLM](https://github.com/pymupdf/pymupdf4llm) | LLM/RAG-optimised Markdown and JSON extraction |
 | [PyMuPDF Pro](https://pymupdf.io/pro) | Adds Office and HWP document support |
 | [pymupdf-fonts](https://pypi.org/project/pymupdf-fonts/) | Extended font collection for PyMuPDF text output |
 