Commit d9d6362: Fixes
1 parent d7d5bb3
7 files changed, 98 additions & 85 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ jobs:
 sudo apt-get update && sudo apt-get install --yes poppler-utils libreoffice
 uv sync --group test --locked
 make install-pandoc
-make install-nltk-models
+make install-nlp-models
 sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
 sudo apt-get install -y tesseract-ocr tesseract-ocr-kor
 tesseract --version

CHANGELOG.md

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 ## 0.1.2
-* Bump all packages (refresh uv.lock)
+* Bump all packages (refresh uv.lock), pulling `unstructured==0.22.12` which replaces NLTK with spaCy
+* Replace `download_nltk_packages` calls with spaCy model pre-download in Makefile, Dockerfile, and CI
 * Switch `uv sync --frozen` to `uv sync --locked` across Dockerfile, Makefile, and CI workflows
 
 ## 0.1.1

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -71,7 +71,7 @@ RUN ARCH=$(uname -m) && \
 cp /tmp/pandoc-${PANDOC_VERSION}/bin/pandoc /home/${USER}/.local/bin/ && \
 rm -rf /tmp/pandoc*
 
-RUN ${PYTHON} -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()" && \
+RUN ${PYTHON} -c "from unstructured.nlp.tokenize import _load_spacy_model; _load_spacy_model()" && \
 ${PYTHON} -c "from unstructured.partition.model_init import initialize; initialize()" && \
 ${PYTHON} -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"
7777

Makefile

Lines changed: 4 additions & 4 deletions
@@ -13,7 +13,7 @@ help: Makefile
 
 ## install-base: installs minimum requirements to run the API
 .PHONY: install-base
-install-base: install-base-packages install-nltk-models
+install-base: install-base-packages install-nlp-models
 
 ## install: installs all test and dev requirements
 .PHONY: install
@@ -27,9 +27,9 @@ install-base-packages:
 install-test:
 uv sync --group test --locked
 
-.PHONY: install-nltk-models
-install-nltk-models:
-uv run python -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()"
+.PHONY: install-nlp-models
+install-nlp-models:
+uv run python -c "from unstructured.nlp.tokenize import _load_spacy_model; _load_spacy_model()"
 
 ## lock: regenerates uv.lock
 .PHONY: lock

prepline_general/api/general.py

Lines changed: 1 addition & 1 deletion
@@ -145,7 +145,7 @@ def partition_file_via_api(
 if not request_url:
 raise HTTPException(status_code=500, detail="Parallel mode enabled but no url set!")
 
-api_key = request.headers.get("unstructured-api-key", default="")
+api_key = request.headers.get("unstructured-api-key", "")
 partition_kwargs["starting_page_number"] = (
 partition_kwargs.get("starting_page_number", 1) + page_offset
 )
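The commit does not say why the `default=` keyword was dropped. One plausible reason: the built-in `dict.get` accepts its default only positionally, while `collections.abc.Mapping.get` (which Starlette's `Headers` builds on) accepts it either way, so the positional form works with any mapping-like headers object. A minimal sketch under that assumption; `HeaderMap` and `read_api_key` are hypothetical names, not part of the API code:

```python
from collections.abc import Iterator, Mapping


class HeaderMap(Mapping[str, str]):
    """Hypothetical case-insensitive header mapping, standing in for request headers."""

    def __init__(self, raw: dict[str, str]) -> None:
        # Normalize keys to lower case so lookups ignore header-name casing.
        self._data = {key.lower(): value for key, value in raw.items()}

    def __getitem__(self, key: str) -> str:
        return self._data[key.lower()]

    def __iter__(self) -> Iterator[str]:
        return iter(self._data)

    def __len__(self) -> int:
        return len(self._data)


def read_api_key(headers: Mapping[str, str]) -> str:
    # Passing the default positionally works for dict.get and Mapping.get alike.
    return headers.get("unstructured-api-key", "")


if __name__ == "__main__":
    print(read_api_key(HeaderMap({"Unstructured-Api-Key": "secret"})))  # secret
    print(repr(read_api_key({})))  # ''
    try:
        # The built-in dict.get rejects the keyword form of the default.
        {}.get("unstructured-api-key", default="")
    except TypeError as exc:
        print(f"TypeError: {exc}")
```

On a `Mapping` subclass both spellings behave the same; the positional form is simply the one that also works on a plain `dict`.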

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ requires-python = ">=3.12"
 dependencies = [
 "unstructured[all-docs] >=0.18.31, <1.0.0",
 "fastapi >=0.128.4, <1.0.0",
+"python-multipart >=0.0.18",
 "uvicorn >=0.40.0, <1.0.0",
 "backoff >=2.2.1, <3.0.0",
 "pandas >=3.0.0, <4.0.0",

uv.lock

Lines changed: 88 additions & 77 deletions
Generated file; diff not rendered by default.
