Commit f51769b
feat(cli): add unstructured doctor diagnostics command (#4342)
## Summary
Adds a first-class unstructured doctor command so users can verify
Python extras, optional system tools, and partitioning readiness before
hitting runtime import or tool errors.
Closes #4341
## What’s included
- Console script: `unstructured` (see `[project.scripts]` in
`pyproject.toml`).
- Module entry: `python -m unstructured` → `doctor` (and `__main__.py`).
- `unstructured doctor`: tables for environment, system tools (libmagic
smoke, tesseract, pandoc, ffmpeg, LibreOffice), and partitionable file
types with `pip install "unstructured[extra]"` hints.
- `unstructured doctor --for <type>`: e.g. `pdf`, `docx`, `image`, `audio`; exits
1 if the requested capability is not ready, 2 on unknown type.
- `unstructured doctor --file <path>`: infer type via `detect_filetype`,
same exit semantics.
- Tests: `test_unstructured/test_cli_doctor.py`.
- Release notes: version 0.22.23 and `CHANGELOG.md` entry.
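
The exit semantics listed above (1 when a capability is not ready, 2 on an unknown type) can be illustrated with a minimal sketch. This is not the code added by this commit: the `REQUIRED_TOOLS` table and the `readiness_exit_code` helper are hypothetical names, the tool requirements are illustrative placeholders, and returning 0 for "ready" is assumed from the usual CLI convention. Only `shutil.which` is a real standard-library call.

```python
import shutil
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical, illustrative table: file type -> executables it needs on PATH.
# The real command's requirements are not shown in this commit message.
REQUIRED_TOOLS: Mapping[str, Tuple[str, ...]] = {
    "docx": (),
    "image": ("tesseract",),
    "audio": ("ffmpeg",),
}


def readiness_exit_code(
    filetype: str,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> int:
    """Return 0 if ready (assumed), 1 if a tool is missing, 2 if the type is unknown."""
    tools = REQUIRED_TOOLS.get(filetype)
    if tools is None:
        return 2  # unknown type
    # shutil.which returns None when the executable is not found on PATH.
    missing = [tool for tool in tools if which(tool) is None]
    return 1 if missing else 0
```

Passing `which` as a parameter keeps the probe testable without installing any of the system tools.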
## How to verify
- `unstructured doctor`
- `unstructured doctor --for pdf`
- `unstructured doctor --file path/to/some.pdf`
- `python -m pytest test_unstructured/test_cli_doctor.py -q`
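
For scripted verification (for example in CI), the `--for` check could be wrapped as sketched below. The `check_capability` helper and the `EXIT_MEANINGS` table are hypothetical; the meanings of exit codes 1 and 2 come from the description above, while treating 0 as "ready" is an assumption.

```python
import subprocess
from typing import Callable

# 1 and 2 follow the commit message; 0 = "ready" is an assumed convention.
EXIT_MEANINGS = {0: "ready", 1: "not ready", 2: "unknown type"}


def check_capability(
    filetype: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run `unstructured doctor --for <filetype>` and describe its exit code."""
    result = run(
        ["unstructured", "doctor", "--for", filetype],
        capture_output=True,
        text=True,
    )
    return EXIT_MEANINGS.get(
        result.returncode, f"unexpected exit code {result.returncode}"
    )
```

Calling `check_capability("pdf")` requires the `unstructured` console script to be installed; the `run` parameter exists so the wrapper can be exercised with a stub.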
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Medium Risk**
> Medium risk due to introducing a new CLI entrypoint and changing CSV
parsing behavior (engine selection) plus tweaks to metrics DataFrame
writes; failures would mainly affect tooling/metrics rather than core
partitioning output.
>
> **Overview**
> Introduces a first-class `unstructured` CLI (and `python -m
unstructured`) with a `doctor` subcommand that reports environment
details, optional system tool availability (e.g.,
libmagic/tesseract/pandoc/ffmpeg/LibreOffice), and per-filetype
partitioning readiness; adds `--for` and `--file` modes with non-zero
exit codes when capabilities are missing.
>
> Also tightens pandas usage to avoid chained-assignment issues in
metrics reporting, adjusts CSV partitioning to use the Python engine
when delimiter inference is needed (`sep=None`), and fixes tests to set
environment variables as strings; adds comprehensive tests for the new
doctor command and bumps version/docs to `0.22.25`.
>
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
580e27b. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

1 parent b909cf4 commit f51769b
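
The overview mentions using the Python engine when the CSV delimiter must be inferred (`sep=None`). The pandas documentation states that only its Python engine can detect the separator, and that it does so with the standard library's `csv.Sniffer`. The sketch below shows that underlying inference step using the standard library alone; it is not the partitioning code from this commit, and `read_rows_with_inferred_delimiter` is a hypothetical helper name.

```python
import csv
import io
from typing import List


def read_rows_with_inferred_delimiter(
    text: str, candidates: str = ",;\t|"
) -> List[List[str]]:
    """Guess the delimiter from the text itself, then parse all rows with it."""
    # Sniffer.sniff inspects the sample and returns a Dialect subclass
    # whose delimiter is chosen from the candidate characters.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return list(csv.reader(io.StringIO(text), dialect))


sample = "a;b;c\n1;2;3\n4;5;6\n"
rows = read_rows_with_inferred_delimiter(sample)
```

With pandas installed, the analogous call would be `pandas.read_csv(path, sep=None, engine="python")`.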
13 files changed
Lines changed: 782 additions & 10 deletions
File tree
- test_unstructured
- partition
- pdf_image
- utils
- unstructured
- metrics
- partition