Commit d1f1bdf

luke-kucing, claude, and ryannikolaidis authored
chore: dep bump to resolve open CVEs (#4205)
Resolves open CVEs and bumps dependency versions. The PR grew fairly large in files changed because of the changes in unstructured ingest: I had to re-run the ingest_fixtures PR workflow and the markdown fixtures make command.

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: luke-kucing <luke-kucing@users.noreply.github.com>
1 parent d4caedf commit d1f1bdf

24 files changed

Lines changed: 277 additions & 291 deletions

CHANGELOG.md

Lines changed: 7 additions & 1 deletion
@@ -1,10 +1,16 @@
-## 0.18.31-dev2
+## 0.18.31
 
 ### Enhancements
 - Changed default DPI to 350
 - **Add token-based chunking support**: Added `max_tokens`, `new_after_n_tokens`, and `tokenizer` parameters to `chunk_by_title()` and `chunk_elements()` for chunking by token count instead of character count. Uses tiktoken for token counting. Install with `pip install "unstructured[chunking-tokens]"`. (fixes #4127)
 
 ### Fixes
+- Resolved security vulnerabilities in base system dependencies
+Bumped dependencies to address the following CVEs:
+**glibc & related (glibc, glibc-locale-posix, ld-linux, libcrypt1, posix-libc-utils, posix-libc-utils-bin)**: CVE-2026-0915, CVE-2026-0861, GHSA-5pf6-63v3-88hw, GHSA-xp56-6525-9chf
+**pyasn1**: GHSA-63vm-454h-vhhq
+**py3-setuptools** (Python 3.12/3.13): GHSA-58pv-8j8x-9vj2
+**ffmpeg (via OpenCV)**: CVE-2025-9951, CVE-2025-1594, CVE-2023-6604, CVE-2023-49502, CVE-2023-6602, CVE-2023-6605, CVE-2025-0518, CVE-2023-6601, CVE-2025-22919, CVE-2023-50010, CVE-2023-50008, CVE-2024-31582, CVE-2025-59729, CVE-2025-59730, CVE-2023-50007
 - **Fix Pandoc exitcode 97 during ODT conversion**: Try with sandbox=True first, fallback without sandbox only if `ALLOW_PANDOC_NO_SANDBOX=true` env var is set (fixes #3997)
 - **Fix `coordinates=True` causing TypeError in hi_res PDF processing**: Filter out `coordinates` and `coordinate_system` from kwargs before passing to `add_element_metadata()` to prevent conflict with explicit parameters (fixes #4126)
 - **Preserve line breaks in code blocks during chunking**: `<pre>` elements now generate `CodeSnippet` elements instead of `Text`, and chunking preserves internal whitespace for code snippets. (fixes #4095)
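
The Pandoc entry in the CHANGELOG describes a "sandbox first, opt-in fallback" pattern. The sketch below illustrates that control flow only; it is not the library's implementation. The `convert` callable and the use of `RuntimeError` to signal a failed sandboxed run are assumptions made for the example; only the `ALLOW_PANDOC_NO_SANDBOX=true` environment variable comes from the changelog.

```python
import os
from typing import Callable


def convert_with_sandbox_fallback(convert: Callable[[bool], str]) -> str:
    """Try a sandboxed conversion first; retry unsandboxed only on explicit opt-in.

    `convert` is a hypothetical callable that takes `sandbox: bool` and stands in
    for whatever actually invokes Pandoc. It is assumed to raise RuntimeError
    when the conversion fails.
    """
    try:
        return convert(True)
    except RuntimeError:
        # Fall back only when the operator has explicitly allowed it.
        if os.environ.get("ALLOW_PANDOC_NO_SANDBOX") != "true":
            raise
        return convert(False)
```

With the variable unset, a sandbox failure propagates to the caller; the unsandboxed retry happens only when the variable is set to `true`.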

requirements/base.txt

Lines changed: 5 additions & 5 deletions
@@ -83,15 +83,15 @@ olefile==0.47
     # via python-oxmsg
 orderly-set==5.5.0
     # via deepdiff
-packaging==25.0
+packaging==26.0
     # via
     #   marshmallow
     #   unstructured-client
 psutil==7.2.1
     # via -r ./base.in
-pycparser==2.23
+pycparser==3.0
     # via cffi
-pypdf==6.6.0
+pypdf==6.6.1
     # via unstructured-client
 python-dateutil==2.9.0.post0
     # via unstructured-client
@@ -103,7 +103,7 @@ python-oxmsg==0.0.2
     # via -r ./base.in
 rapidfuzz==3.14.3
     # via -r ./base.in
-regex==2025.11.3
+regex==2026.1.15
     # via nltk
 requests==2.32.5
     # via
@@ -118,7 +118,7 @@ six==1.17.0
     #   langdetect
     #   python-dateutil
     #   unstructured-client
-soupsieve==2.8.1
+soupsieve==2.8.3
     # via beautifulsoup4
 tqdm==4.67.1
     # via

requirements/dev.txt

Lines changed: 3 additions & 2 deletions
@@ -19,11 +19,12 @@ importlib-metadata==8.7.1
     # via build
 nodeenv==1.10.0
     # via pre-commit
-packaging==25.0
+packaging==26.0
     # via
     #   -c ./base.txt
     #   -c ./test.txt
     #   build
+    #   wheel
 pip-tools==7.5.2
     # via -r ./dev.in
 platformdirs==4.5.1
@@ -50,7 +51,7 @@ typing-extensions==4.15.0
     #   virtualenv
 virtualenv==20.36.1
     # via pre-commit
-wheel==0.45.1
+wheel==0.46.3
     # via pip-tools
 zipp==3.23.0
     # via importlib-metadata

extra-chunking-tokens.txt

Lines changed: 9 additions & 21 deletions
@@ -1,28 +1,16 @@
 # This file was autogenerated by uv via the following command:
-#    uv pip compile --python-version 3.10 --no-strip-extras extra-chunking-tokens.in -o extra-chunking-tokens.txt --no-emit-package pip --no-emit-package setuptools -c base.txt
+#    uv pip compile --python-version 3.10 --no-strip-extras ./extra-chunking-tokens.in -o ./extra-chunking-tokens.txt --no-emit-package pip --no-emit-package setuptools
 certifi==2026.1.4
-    # via
-    #   -c base.txt
-    #   requests
+    # via requests
 charset-normalizer==3.4.4
-    # via
-    #   -c base.txt
-    #   requests
+    # via requests
 idna==3.11
-    # via
-    #   -c base.txt
-    #   requests
-regex==2025.11.3
-    # via
-    #   -c base.txt
-    #   tiktoken
+    # via requests
+regex==2026.1.15
+    # via tiktoken
 requests==2.32.5
-    # via
-    #   -c base.txt
-    #   tiktoken
+    # via tiktoken
 tiktoken==0.12.0
-    # via -r extra-chunking-tokens.in
+    # via -r ./extra-chunking-tokens.in
 urllib3==2.6.3
-    # via
-    #   -c base.txt
-    #   requests
+    # via requests

requirements/extra-markdown.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 # This file was autogenerated by uv via the following command:
 #    uv pip compile --python-version 3.10 --no-strip-extras ./extra-markdown.in -o ./extra-markdown.txt --no-emit-package pip --no-emit-package setuptools
-markdown==3.10
+markdown==3.10.1
     # via -r ./extra-markdown.in

requirements/extra-paddleocr.txt

Lines changed: 5 additions & 5 deletions
@@ -84,17 +84,17 @@ numpy==2.2.6
     #   shapely
     #   tifffile
     #   unstructured-paddleocr
-opencv-contrib-python==4.12.0.88
+opencv-contrib-python==4.13.0.90
     # via unstructured-paddleocr
-opencv-python==4.12.0.88
+opencv-python==4.13.0.90
     # via unstructured-paddleocr
-opencv-python-headless==4.12.0.88
+opencv-python-headless==4.13.0.90
     # via
     #   albucore
     #   albumentations
 opt-einsum==3.3.0
     # via paddlepaddle
-packaging==25.0
+packaging==26.0
     # via
     #   -c ./base.txt
     #   lazy-loader
@@ -143,7 +143,7 @@ shapely==2.1.2
     # via unstructured-paddleocr
 simsimd==6.5.12
     # via albucore
-soupsieve==2.8.1
+soupsieve==2.8.3
     # via
     #   -c ./base.txt
     #   beautifulsoup4

requirements/extra-pdf-image.txt

Lines changed: 13 additions & 13 deletions
@@ -52,7 +52,7 @@ google-auth==2.47.0
     # via
     #   google-api-core
     #   google-cloud-vision
-google-cloud-vision==3.11.0
+google-cloud-vision==3.12.0
     # via -r ./extra-pdf-image.in
 googleapis-common-protos==1.72.0
     # via
@@ -124,9 +124,9 @@ onnxruntime==1.23.2
     # via
     #   -r ./extra-pdf-image.in
     #   unstructured-inference
-opencv-python==4.12.0.88
+opencv-python==4.13.0.90
     # via unstructured-inference
-packaging==25.0
+packaging==26.0
     # via
     #   -c ./base.txt
     #   accelerate
@@ -144,7 +144,7 @@ pdfminer-six==20260107
     # via
     #   -r ./extra-pdf-image.in
     #   unstructured-inference
-pi-heif==1.1.1
+pi-heif==1.2.0
     # via -r ./extra-pdf-image.in
 pikepdf==10.2.0
     # via -r ./extra-pdf-image.in
@@ -174,21 +174,21 @@ psutil==7.2.1
     # via
     #   -c ./base.txt
     #   accelerate
-pyasn1==0.6.1
+pyasn1==0.6.2
     # via
     #   pyasn1-modules
     #   rsa
 pyasn1-modules==0.4.2
     # via google-auth
 pycocotools==2.0.11
     # via effdet
-pycparser==2.23
+pycparser==3.0
     # via
     #   -c ./base.txt
     #   cffi
-pyparsing==3.3.1
+pyparsing==3.3.2
     # via matplotlib
-pypdf==6.6.0
+pypdf==6.6.1
     # via
     #   -c ./base.txt
     #   -r ./extra-pdf-image.in
@@ -199,7 +199,7 @@ python-dateutil==2.9.0.post0
     #   -c ./base.txt
     #   matplotlib
     #   pandas
-python-multipart==0.0.21
+python-multipart==0.0.22
     # via unstructured-inference
 pytz==2025.2
     # via pandas
@@ -214,7 +214,7 @@ rapidfuzz==3.14.3
     # via
     #   -c ./base.txt
     #   unstructured-inference
-regex==2025.11.3
+regex==2026.1.15
     # via
     #   -c ./base.txt
     #   transformers
@@ -249,14 +249,14 @@ tokenizers==0.21.4
     # via
     #   -c ././deps/constraints.txt
     #   transformers
-torch==2.9.1
+torch==2.10.0
     # via
     #   accelerate
     #   effdet
     #   timm
     #   torchvision
     #   unstructured-inference
-torchvision==0.24.1
+torchvision==0.25.0
     # via
     #   effdet
     #   timm
@@ -278,7 +278,7 @@ typing-extensions==4.15.0
     #   torch
 tzdata==2025.3
     # via pandas
-unstructured-inference==1.1.4
+unstructured-inference==1.1.7
     # via -r ./extra-pdf-image.in
 unstructured-pytesseract==0.3.15
     # via -r ./extra-pdf-image.in

requirements/extra-xlsx.txt

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ openpyxl==3.1.5
     # via -r ./extra-xlsx.in
 pandas==2.3.3
     # via -r ./extra-xlsx.in
-pycparser==2.23
+pycparser==3.0
     # via
     #   -c ./base.txt
     #   cffi

requirements/huggingface.txt

Lines changed: 3 additions & 3 deletions
@@ -51,7 +51,7 @@ numpy==2.2.6
     # via
     #   -c ./base.txt
     #   transformers
-packaging==25.0
+packaging==26.0
     # via
     #   -c ./base.txt
     #   huggingface-hub
@@ -60,7 +60,7 @@ pyyaml==6.0.3
     # via
     #   huggingface-hub
     #   transformers
-regex==2025.11.3
+regex==2026.1.15
     # via
     #   -c ./base.txt
     #   sacremoses
@@ -86,7 +86,7 @@ tokenizers==0.21.4
     # via
     #   -c ././deps/constraints.txt
     #   transformers
-torch==2.9.1
+torch==2.10.0
     # via -r ./huggingface.in
 tqdm==4.67.1
     # via

requirements/test.txt

Lines changed: 5 additions & 5 deletions
@@ -4,13 +4,13 @@ annotated-types==0.7.0
     # via pydantic
 autoflake==2.3.1
     # via -r ./test.in
-black==25.12.0
+black==26.1.0
     # via -r ./test.in
 click==8.3.1
     # via
     #   -c ./base.txt
     #   black
-coverage[toml]==7.13.1
+coverage[toml]==7.13.2
     # via
     #   -r ./test.in
     #   pytest-cov
@@ -45,7 +45,7 @@ mypy-extensions==1.1.0
     #   -c ./base.txt
     #   black
     #   mypy
-packaging==25.0
+packaging==26.0
     # via
     #   -c ./base.txt
     #   black
@@ -89,9 +89,9 @@ python-dateutil==2.9.0.post0
     # via
     #   -c ./base.txt
     #   freezegun
-pytokens==0.3.0
+pytokens==0.4.0
     # via black
-ruff==0.14.11
+ruff==0.14.14
     # via -r ./test.in
 semantic-version==2.10.0
     # via liccheck
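
Because the same pins (for example `packaging`, `regex`, `pycparser`) are bumped in several requirements files, it can help to collapse a diff into a list of old and new versions when reviewing. The helper below is a minimal sketch, not part of this commit: it assumes standard unified-diff text (such as the output of `git show`), where removed and added lines start with `-` and `+`, and the function name `summarize_bumps` is hypothetical.

```python
import re

# Matches removed/added pin lines such as "-packaging==25.0" or "+coverage[toml]==7.13.2".
PIN = re.compile(r"^([-+])\s*([A-Za-z0-9_.\[\]-]+)==(\S+)\s*$")


def summarize_bumps(diff_text: str) -> dict[str, tuple[str, str]]:
    """Return {package: (old_version, new_version)} for pins changed in a unified diff."""
    old: dict[str, str] = {}
    new: dict[str, str] = {}
    for line in diff_text.splitlines():
        match = PIN.match(line)
        if match:
            sign, name, version = match.groups()
            (old if sign == "-" else new)[name] = version
    # Keep only packages that appear on both sides with different versions.
    return {n: (old[n], new[n]) for n in old if n in new and old[n] != new[n]}


sample = """\
-packaging==25.0
+packaging==26.0
     # via
-pycparser==2.23
+pycparser==3.0
 tqdm==4.67.1
"""
print(summarize_bumps(sample))
# {'packaging': ('25.0', '26.0'), 'pycparser': ('2.23', '3.0')}
```

Context lines (those starting with a space) and `# via` annotations are ignored, so only actual version changes are reported.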
