Merged
4 changes: 2 additions & 2 deletions .github/workflows/ci.yml
@@ -73,7 +73,7 @@ jobs:
strategy:
matrix:
python-version: ["3.11", "3.12", "3.13"]
-runs-on: ubuntu-latest
+runs-on: opensource-linux-8core
needs: [setup, lint]
steps:
- uses: actions/checkout@v4
@@ -144,7 +144,7 @@ jobs:
uv-extras: "--extra pptx"
- extra: xlsx
uv-extras: "--extra xlsx"
-runs-on: ubuntu-latest
+runs-on: opensource-linux-8core
needs: [setup, lint, test_unit_no_extras]
steps:
- uses: actions/checkout@v4
2 changes: 1 addition & 1 deletion .gitignore
@@ -105,7 +105,7 @@ celerybeat.pid

# Environments
.env
-.venv
+.venv*
env/
venv/
ENV/
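
The effect of widening `.venv` to `.venv*` can be sketched with Python's `fnmatch`, which only approximates gitignore globbing for simple patterns like these. The directory names below are hypothetical examples, not paths taken from the repository.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = ".venv"
NEW_PATTERN = ".venv*"

# Hypothetical environment directory names, chosen only for illustration.
candidates = [".venv", ".venv-3.12", ".venv313", "venv"]


def matched_by(pattern: str) -> list[str]:
    """Return the candidate names that the given glob pattern matches."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


print("old:", matched_by(OLD_PATTERN))
print("new:", matched_by(NEW_PATTERN))
```

Under this approximation the old pattern matches only the exact name `.venv`, while the new one also covers suffixed variants such as per-Python-version environments.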
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
+## 0.21.7
+
+### Enhancements
+- **Bump dependencies**: Update pinned dependency versions in the lockfile.
+
## 0.21.6

### Enhancements
8 changes: 4 additions & 4 deletions pyproject.toml
@@ -42,7 +42,7 @@ dependencies = [
"tqdm>=4.67.3, <5.0.0",
"typing-extensions>=4.15.0, <5.0.0",
"unstructured-client>=0.25.9, <1.0.0",
-"wrapt>=1.0.0, <2.0.0",
+"wrapt>=2.1.1, <3.0.0",
"filelock>=3.12.0,<4.0.0",
]

@@ -122,7 +122,7 @@ huggingface = [
"sentencepiece>=0.2.0, <1.0.0",
"torch>=2.10.0, <3.0.0; platform_system != 'Windows'",
"torch>=2.10.0, <3.0.0; platform_system == 'Windows' and python_version < '3.13'",
-"transformers>=4.55.4, <5.0.0",
+"transformers>=5.2.0, <6.0.0",
]
local-inference = [
"unstructured[all-docs]",
@@ -170,7 +170,7 @@ test = [
"types-requests>=2.32.4.20260107, <3.0.0",
"types-tabulate>=0.9.0.20241207, <1.0.0",
"unstructured-pytesseract>=0.3.15, <1.0.0",
-"weaviate-client>=3.26.7, <4.0.0",
+"weaviate-client>=4.20.1, <5.0.0",
]
dev = [
"pre-commit>=4.5.1, <5.0.0",
@@ -198,7 +198,7 @@ constraint-dependencies = [
"tokenizers>=0.21",
"unstructured-client>=0.25.9",
"urllib3>=2.0.0",
-"weaviate-client>=3.26.7",
+"weaviate-client>=4.20.1",
]

[tool.pyright]
13 changes: 8 additions & 5 deletions test_unstructured/staging/test_weaviate.py
@@ -7,8 +7,7 @@
# NOTE(robinson) - allows tests that do not require the weaviate client to
# run for the docker container
with contextlib.suppress(ModuleNotFoundError):
-    from weaviate import Client
-    from weaviate.embedded import EmbeddedOptions
+    import weaviate

from unstructured.partition.json import partition_json
from unstructured.staging.weaviate import (
@@ -58,6 +57,10 @@ def test_stage_for_weaviate():
 @pytest.mark.skipif(is_in_docker, reason="Skipping this test in Docker container")
 def test_weaviate_schema_is_valid():
     unstructured_class = create_unstructured_weaviate_class()
-    schema = {"classes": [unstructured_class]}
-    client = Client(embedded_options=EmbeddedOptions())
-    client.schema.create(schema)
+    class_name = unstructured_class["class"]
+    client = weaviate.connect_to_embedded()
+    try:
+        client.collections.delete(class_name)
+        client.collections.create_from_dict(unstructured_class)

Weaviate schema format may be incorrect for create_from_dict

Medium Severity

The v3 code passed {"classes": [unstructured_class]} to schema.create(), but the v4 migration passes unstructured_class directly to create_from_dict(). If create_from_dict expects the full schema format (with a "classes" array) for v3 compatibility, the call may fail or create the collection incorrectly.

Contributor Author

No, there's nothing to fix. The Cursor bot is wrong here: it's analyzing the code change without accounting for the weaviate-client v4 API.

To be concrete, create_from_dict in weaviate-client v4 expects a single class config dict, which is exactly what create_unstructured_weaviate_class() returns:

{"class": "UnstructuredDocument", "properties": [...]}

And we have empirical proof — the test ran against a real embedded Weaviate 1.30.5 instance and passed:

test_unstructured/staging/test_weaviate.py::test_weaviate_schema_is_valid PASSED [100%]

If the format were wrong, Weaviate would have rejected it with a 422.
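
To make the two shapes in this thread concrete, here is a minimal plain-Python sketch that needs no Weaviate client. `iter_class_configs` is a hypothetical helper written for illustration only (it is not part of unstructured or weaviate-client), and the empty `properties` list is a placeholder for the real property definitions.

```python
from typing import Any, Dict, Iterator


def iter_class_configs(schema_or_class: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one class-config dict per collection.

    Accepts either the v3-style wrapper {"classes": [...]} or a single
    class config {"class": ..., "properties": [...]}.
    """
    if "classes" in schema_or_class:
        # v3-style wrapper: unwrap it and yield each class config in turn.
        yield from schema_or_class["classes"]
    elif "class" in schema_or_class:
        # Already a single class config: pass it through unchanged.
        yield schema_or_class
    else:
        raise ValueError("expected a 'classes' wrapper or a single class config")


# Placeholder config; the real properties list is elided in the thread above.
unstructured_class = {"class": "UnstructuredDocument", "properties": []}
v3_schema = {"classes": [unstructured_class]}

for config in iter_class_configs(v3_schema):
    print(config["class"])
```

Each yielded dict has the single-class shape that, per the reply above, is what the test passes to `create_from_dict`, whereas the wrapper is what the old v3 test passed to `schema.create()`.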

+    finally:
+        client.close()
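
The `try`/`finally` in the test ensures the client is closed even if collection creation raises. As a sketch of that guarantee, the example below uses `contextlib.closing` with a stand-in object; `FakeClient` is hypothetical and only mimics an object with a `close()` method, so nothing here reflects the real Weaviate client's behavior.

```python
from contextlib import closing


class FakeClient:
    """Hypothetical stand-in for a client object exposing close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def run_with_cleanup(client: FakeClient, fail: bool) -> None:
    # closing() calls client.close() on exit, just as try/finally would.
    with closing(client):
        if fail:
            raise RuntimeError("simulated failure inside the block")


ok_client = FakeClient()
run_with_cleanup(ok_client, fail=False)

failing_client = FakeClient()
try:
    run_with_cleanup(failing_client, fail=True)
except RuntimeError:
    pass

print(ok_client.closed, failing_client.closed)  # True True
```

Either form works; the explicit `try`/`finally` used in the diff keeps the cleanup visible without an extra import.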
2 changes: 1 addition & 1 deletion unstructured/__version__.py
@@ -1 +1 @@
-__version__ = "0.21.6" # pragma: no cover
+__version__ = "0.21.7" # pragma: no cover
796 changes: 426 additions & 370 deletions uv.lock

Large diffs are not rendered by default.
