Commit 220e954
fix(deps): pin markitdown >=0.1.5 with narrow extras (closes #64)
The bare `markitdown[all]` extra pulled in ~67MB of unused deps (azure-*, pdfminer, pdfplumber, speechrecognition, youtube-transcript-api, pydub, xlrd, olefile). Because it was unpinned, it also let users land on pre-0.1.0 markitdown, where pptx chart parse errors (`#N/A`, `#DIV/0!`, or blank cells raising `ValueError` from python-pptx `CT_StrVal_NumVal_Composite.value`) propagate out and abort the whole conversion.
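A minimal sketch of the failure mode, assuming the chart cell text is parsed with a plain `float()` call (an assumption about the python-pptx internals, not verified here). `parse_cell` and `failing_cells` are hypothetical names used only for illustration; they are not python-pptx or markitdown functions.

```python
def parse_cell(text: str) -> float:
    """Hypothetical stand-in for a numeric chart-cell parser."""
    return float(text)


def failing_cells(cells):
    """Return the cell texts that cannot be parsed as numbers."""
    bad = []
    for cell in cells:
        try:
            parse_cell(cell)
        except ValueError:
            # Excel error literals and blank cells end up here.
            bad.append(cell)
    return bad


if __name__ == "__main__":
    print(failing_cells(["1.5", "#N/A", "#DIV/0!", "", "42"]))
```

Without a handler around the parse, the first bad cell would raise and, per the description above, take the whole conversion down with it.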
Switch to `markitdown[docx,pptx,xlsx,xls]>=0.1.5` so we only install Office-format extras we actually use, and we always run against a version whose `_convert_chart_to_markdown` wraps the python-pptx call in `except Exception` — the offending chart degrades to `[unsupported chart]` instead of killing the file. Also add `.xls` to `SUPPORTED_EXTENSIONS` / `_SHORT_DOC_TYPES` so the `[xls]` extra has a code path that exercises it.
Chart numeric data is still lost on bad cells (upstream limitation in python-pptx, with no fix or open PR there). A higher-fidelity pptx path can be added behind a config flag if users need it.
1 parent 91cf6d2 commit 220e954
3 files changed
Lines changed: 17 additions & 274 deletions