feat: tokenize endpoints + Property.TextAnalyzer + StopwordPresets (Weaviate 1.37.0+)#381
Merged
Conversation
Port of python-client PR #2012, aligned with the TS client's `tokenize`
namespace design. Adds:
- `client.Tokenize().Text()...Do(ctx)` → POST /v1/tokenize
- `client.Tokenize().Property()...Do(ctx)` → POST /v1/schema/{class}/properties/{prop}/tokenize
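To make the builder-to-endpoint mapping concrete, here is a minimal, stdlib-only sketch of how a property-scoped builder might resolve its target route. `propertyTokenizer`, its `path` method and `textTokenizePath` are hypothetical names used for illustration only, not the client's actual types. The real builder also sends the request and decodes a `TokenizeResult`, which this sketch omits.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// textTokenizePath is the fixed route behind client.Tokenize().Text().
const textTokenizePath = "/v1/tokenize"

// propertyTokenizer is a hypothetical stand-in for the builder returned by
// client.Tokenize().Property(); it only records state and resolves the path.
type propertyTokenizer struct {
	className    string
	propertyName string
	text         string
}

func (b *propertyTokenizer) WithClassName(name string) *propertyTokenizer {
	b.className = name
	return b
}

func (b *propertyTokenizer) WithPropertyName(name string) *propertyTokenizer {
	b.propertyName = name
	return b
}

func (b *propertyTokenizer) WithText(text string) *propertyTokenizer {
	b.text = text
	return b
}

// path resolves POST /v1/schema/{class}/properties/{prop}/tokenize, escaping
// each segment so a name containing "/" cannot change the route.
func (b *propertyTokenizer) path() (string, error) {
	if b.className == "" || b.propertyName == "" {
		return "", errors.New("class name and property name are required")
	}
	return fmt.Sprintf("/v1/schema/%s/properties/%s/tokenize",
		url.PathEscape(b.className), url.PathEscape(b.propertyName)), nil
}

func main() {
	b := (&propertyTokenizer{}).
		WithClassName("Article").
		WithPropertyName("title").
		WithText("Hello World")
	p, err := b.path()
	if err != nil {
		panic(err)
	}
	fmt.Println("POST", textTokenizePath)
	fmt.Println("POST", p)
}
```

Escaping each segment with `url.PathEscape` keeps a class or property name from altering the route; whether the real client escapes segments this way is an assumption.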
Builder chains follow existing repo conventions (WithText, WithTokenization,
WithAnalyzerConfig, WithStopwordPresets, etc.). `AnalyzerConfig.AsciiFold` is
a nested struct pointer (nil = disabled, non-nil = enabled with optional
Ignore list) so the invalid "ignore without fold" state is unrepresentable.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
There was a problem hiding this comment.
Orca Security Scan Summary
| Status | Check | Issues by priority | |
|---|---|---|---|
| Infrastructure as Code | View in Orca | ||
| SAST | View in Orca | ||
| Secrets | View in Orca | ||
| Vulnerabilities | View in Orca |
… (Weaviate 1.37.0+)
Port of python-client PR #2006, folded into the tokenize-endpoint PR to keep the 1.37.0 schema features together.
- `models.Property.TextAnalyzer` (vendored weaviate v1.37.1): `asciiFold`, `asciiFoldIgnore`, `stopwordPreset`; round-trips via `ClassCreator`/`ClassGetter`.
- `models.InvertedIndexConfig.StopwordPresets`: named preset → word-list map; round-trips via `ClassCreator`/`ClassUpdater`.
- Client-side preflight: `ClassCreator.Do` / `ClassUpdater.Do` reject schemas that use these fields when connected to Weaviate < 1.37.0, with a typed `WeaviateClientError` that names the offending field. It lives in `weaviate/internal/`, so it is reusable but not part of the public API.
- `schema.New` now takes `*db.VersionProvider` (previously it took no extra argument) so the preflight can read the connected server version.
- go.mod: bump `github.com/weaviate/weaviate` 1.36.0 → 1.37.1 for the new model fields.
- Integration coverage under `test/schema/text_analyzer_test.go`: stopword presets round-trip, update-replaces-preset, remove-in-use rejection, combined asciifold + stopword preset, asciifold ignore round-trip, and nested property text analyzer.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
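The preflight described in this commit can be sketched as a version gate plus a recursive scan for the new fields, so a nested property's text analyzer is caught too. Everything here is a hypothetical stand-in: the trimmed `Class`/`Property` types replace `models.Class`/`models.Property`, `UnsupportedFieldError` stands in for the client's typed `WeaviateClientError`, and `versionAtLeast` ignores pre-release suffixes, a simplification the real check may not make.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Hypothetical, trimmed stand-ins for the vendored models.
type TextAnalyzer struct{ AsciiFold bool }

type Property struct {
	Name             string
	TextAnalyzer     *TextAnalyzer
	NestedProperties []*Property
}

type InvertedIndexConfig struct{ StopwordPresets map[string][]string }

type Class struct {
	Class               string
	InvertedIndexConfig *InvertedIndexConfig
	Properties          []*Property
}

// UnsupportedFieldError names the offending field, as the PR's error does.
type UnsupportedFieldError struct{ Field, ServerVersion string }

func (e *UnsupportedFieldError) Error() string {
	return fmt.Sprintf("%s requires Weaviate >= 1.37.0 (connected: %s)", e.Field, e.ServerVersion)
}

// versionAtLeast compares "major.minor.patch" numerically; a suffix such as
// "-rc.0" is dropped, so 1.37.0-rc.0 counts as 1.37.0 (a simplification).
func versionAtLeast(v string, major, minor, patch int) bool {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	want := []int{major, minor, patch}
	for i := 0; i < 3; i++ {
		n := 0
		if i < len(parts) {
			n, _ = strconv.Atoi(strings.SplitN(parts[i], "-", 2)[0])
		}
		if n != want[i] {
			return n > want[i]
		}
	}
	return true
}

// firstTextAnalyzer returns the dotted path of the first property (including
// nested ones) that sets a text analyzer, or "" if none does.
func firstTextAnalyzer(props []*Property, prefix string) string {
	for _, p := range props {
		path := prefix + p.Name
		if p.TextAnalyzer != nil {
			return path + ".textAnalyzer"
		}
		if f := firstTextAnalyzer(p.NestedProperties, path+"."); f != "" {
			return f
		}
	}
	return ""
}

// preflight rejects 1.37-only fields when the server is older than 1.37.0.
func preflight(c *Class, serverVersion string) error {
	if versionAtLeast(serverVersion, 1, 37, 0) {
		return nil
	}
	if c.InvertedIndexConfig != nil && len(c.InvertedIndexConfig.StopwordPresets) > 0 {
		return &UnsupportedFieldError{"invertedIndexConfig.stopwordPresets", serverVersion}
	}
	if f := firstTextAnalyzer(c.Properties, ""); f != "" {
		return &UnsupportedFieldError{f, serverVersion}
	}
	return nil
}

func main() {
	c := &Class{Class: "Doc", Properties: []*Property{{
		Name: "meta",
		NestedProperties: []*Property{{
			Name:         "title",
			TextAnalyzer: &TextAnalyzer{AsciiFold: true},
		}},
	}}}
	fmt.Println(preflight(c, "1.36.4"))
	fmt.Println(preflight(c, "1.37.1"))
}
```

Checking the version first means a 1.37+ server never pays for the schema walk, and reporting the offending field by its dotted path keeps the error actionable for nested properties.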
Weaviate v1.37.x declares `go 1.26` in its go.mod, which propagates to this client after `go mod tidy`. The CI workflow pinned Go 1.25, causing `go: go.mod requires go >= 1.26 (running go 1.25.9; GOTOOLCHAIN=local)` in the unit-tests, tests-deprecated, and auth-integration jobs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Match the minimum server version that the go.mod dependency already pins to, and pick up bug fixes shipped since the rc.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- text_analyzer_test.go: set `Vectorizer: "none"` on all classes. The docker-compose default is text2vec-contextionary, which rejects class names like "TestStopwordPresets1" because "stopword" isn't in its dictionary.
- docker-compose.yml: opt in to `ENABLE_EXPERIMENTAL_ALTER_SCHEMA_DROP_VECTOR_INDEX_ENDPOINT` so the existing `TestSchema_integration/DELETE_/schema/.../vectors/.../index` test keeps working after the server tag bump to 1.37.1 (the endpoint became flag-gated in 1.37.1).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
docker-compose-wcs.yml (used by auth_enabled integration runs on :8085) needs the same `ENABLE_EXPERIMENTAL_ALTER_SCHEMA_DROP_VECTOR_INDEX_ENDPOINT` flag as docker-compose.yml, because the schema delete-vector-index test runs against whichever compose file the current matrix selects.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
bevzzz approved these changes on Apr 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related Weaviate 1.37.0 ports, folded into this one PR to keep the 1.37
schema surface together:
1. Tokenize endpoints: port of python-client PR #2012.
Exposes `POST /v1/tokenize` and `POST /v1/schema/{class}/properties/{prop}/tokenize`. The public surface mirrors the TS client's `tokenize` namespace design (`weaviate/typescript-client`, `1.37/introduce-tokenization-namespace`), adapted to Go's fluent builder idiom:
- `client.Tokenize().Text().WithText(...).WithTokenization(...).Do(ctx)`
- `client.Tokenize().Property().WithClassName(...).WithPropertyName(...).WithText(...).Do(ctx)`
New package `weaviate/tokenize/` with 9 `Tokenization` constants (`Word`, `Lowercase`, `Whitespace`, `Field`, `Trigram`, `Gse`, `GseCh`, `KagomeJa`, `KagomeKr`), `AnalyzerConfig`, `StopwordConfig`, and `TokenizeResult`. `AnalyzerConfig.AsciiFold` is an `*AsciiFoldConfig` (nil = disabled, non-nil = enabled with an optional `Ignore` list), so the invalid "ignore without fold" state is unrepresentable.
2. Property.TextAnalyzer + InvertedIndexConfig.StopwordPresets: port of python-client PR #2006.
The vendored `weaviate` module is bumped 1.36.0 → 1.37.1, which lights up two new model fields that round-trip through the existing `ClassCreator`/`ClassUpdater`/`ClassGetter` builders with no new API:
- `models.Property.TextAnalyzer`: `asciiFold`, `asciiFoldIgnore`, `stopwordPreset`
- `models.InvertedIndexConfig.StopwordPresets`: named preset → word-list map
Client-side preflight in `weaviate/internal/textAnalyzerCheck.go` rejects schemas that use these fields when connected to Weaviate < 1.37.0, with a typed `WeaviateClientError` that names the offending field. `schema.New` now takes a `*db.VersionProvider` so the preflight can read the connected server version.
Out of scope
Test plan
- `go vet -mod=mod ./...` clean
- `go build -mod=mod ./...` clean
- `go test -mod=mod -count=1 ./test/tokenize/...` → passing against Weaviate 1.37.1
- `go test -mod=mod -count=1 ./test/schema/... -run TestTextAnalyzer_integration` → 6/6 passing against Weaviate 1.37.1
🤖 Generated with Claude Code