refactor: deterministic literal property targets + indexed WHERE filters #837
Draft
HexaField wants to merge 13 commits into
…lue (V1)
Part of the Channel V / Channel E separation. See ~/.sovereign/membranes/coasys/research/literal-encoding-and-sparql-pushdown-2026-06-03.md §9.1 for the audit.
When resolve_language == "literal", encode the value directly via literal_encode and return a deterministic plain URI (literal:string:X / :number: / :boolean: / :json:) instead of going through LanguageController.expression_create. Provenance for property values lives on the link reifier; the signed-envelope shape is for Channel E (expression.create) callers, not for property storage.
For all other resolve_language values (real language controllers: note, image, etc.) the existing expression_create flow is unchanged.
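The deterministic encoding described in this commit can be sketched roughly as follows. This is a minimal illustration, not the executor's `literal_encode`: the `PropValue` enum and both function names are hypothetical, the `:json:` form is omitted, and the number formatting (Rust's default `f64` display) is an assumption.

```rust
/// Hypothetical stand-in for a property value; not the executor's own type.
#[derive(Debug, Clone, PartialEq)]
pub enum PropValue {
    Str(String),
    Num(f64),
    Bool(bool),
}

/// Percent-encode every byte that is not an ASCII letter or digit,
/// using uppercase hex (the behaviour of a NON_ALPHANUMERIC encode set).
pub fn percent_encode_non_alphanumeric(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        if byte.is_ascii_alphanumeric() {
            out.push(byte as char);
        } else {
            out.push_str(&format!("%{:02X}", byte));
        }
    }
    out
}

/// Build a plain literal URI. No author, timestamp, or signature is mixed in,
/// so the same input always yields the same string.
pub fn encode_literal_sketch(value: &PropValue) -> String {
    match value {
        PropValue::Str(s) => {
            format!("literal:string:{}", percent_encode_non_alphanumeric(s))
        }
        // Assumption: numbers use Rust's default f64 formatting.
        PropValue::Num(n) => format!("literal:number:{}", n),
        PropValue::Bool(b) => format!("literal:boolean:{}", b),
    }
}
```

Because the output depends only on the value, two writes of the same value produce the same target IRI, which is what makes exact-match lookups possible.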
literal_decode used to wrap non-object primitives in a fake
{author: "<unknown>", timestamp: "<unknown>", data: value, proof: {}}
envelope. That conflated Channel V (link-property values; provenance
lives on the link reifier) with Channel E (signed expressions).
Now primitives decode to their raw JSON value and objects pass through
unchanged. Channel-E callers that legitimately want envelope shape go
through create_signed_expression, not literal_decode.
Unit tests updated to assert raw round-trip; the JSON-object test is
unchanged.
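A rough sketch of the raw round-trip this commit describes: a plain literal URI decodes straight to its primitive, with no synthetic envelope around it. The `Decoded` enum and function names are hypothetical, and the `literal:json:` branch is left out to keep the example dependency-free.

```rust
/// Hypothetical decoded value; the real decoder returns JSON values.
#[derive(Debug, Clone, PartialEq)]
pub enum Decoded {
    Str(String),
    Num(f64),
    Bool(bool),
}

/// Reverse percent-encoding. Returns None on a malformed escape or invalid UTF-8.
pub fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' {
            // `get` returns None if fewer than two characters follow the '%'.
            let hex = input.get(i + 1..i + 3)?;
            if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            out.push(u8::from_str_radix(hex, 16).ok()?);
            i += 3;
        } else {
            out.push(bytes[i]);
            i += 1;
        }
    }
    String::from_utf8(out).ok()
}

/// Decode a plain literal URI to its raw primitive value.
pub fn decode_literal_sketch(uri: &str) -> Option<Decoded> {
    let rest = uri.strip_prefix("literal:")?;
    if let Some(body) = rest.strip_prefix("string:") {
        percent_decode(body).map(Decoded::Str)
    } else if let Some(body) = rest.strip_prefix("number:") {
        body.parse::<f64>().ok().map(Decoded::Num)
    } else if let Some(body) = rest.strip_prefix("boolean:") {
        body.parse::<bool>().ok().map(Decoded::Bool)
    } else {
        // `literal:json:` and unknown kinds are out of scope for this sketch.
        None
    }
}
```

A caller that previously read `.data` off the decoded result would now use the returned value directly.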
After the Channel V refactor, Literal.fromUrl(val).get() for property values returns the plain primitive directly. The only objects it can return now are legitimate JSON objects from literal:json: payloads, which we JSON-stringify for display. Drops the .data extraction branch that used to unwrap signed-envelope literals — those no longer exist for new property writes.
After the Channel V refactor, properties with resolveLanguage: "literal" store their values as plain literal:string:X / :number: / :boolean: / :json: URIs — no signed envelope. Tests now assert the value rather than the envelope shape. Renames the long-value test's local `expression` to `literal` for consistency with the other two cases.
…ack-compat read path retained)
Restores migrate_signed_envelopes_to_plain_literals (originally added in a9e98cc, removed by 353a7ad). Walks every reifier, finds targets of the form literal:json:<envelope> whose JSON has data+author+proof, extracts .data, re-encodes as the appropriate plain literal type, and rebuilds both the direct triple and the reifier (the reifier IRI hash includes the target, so it must be recomputed). Removes the old direct triple only if no other reifiers still reference it.
Migration is idempotent (it short-circuits when migration_version() >= 3) and runs during perspective initialization right after the named-graphs migration (v2 -> v3).
Also annotates the back-compat envelope-unwrap branches in parse_literal_fn (sparql_store.rs) and parse_literal_value (model_query/utils.rs) with a note that they exist for pre-migration data; new writes use plain literal: forms.
See §9.1 / §9.6 of literal-encoding-and-sparql-pushdown-2026-06-03.md.
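The migration steps in this commit message can be approximated with an in-memory model. Everything here is hypothetical scaffolding: `Store`, `Reifier`, `reifier_iri_sketch`, and `migrate_sketch` are illustration-only names, the hash is not the real reifier hash, and envelope detection is passed in as a closure rather than implemented.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

type Triple = (String, String, String); // (source, predicate, target)

pub struct Reifier {
    pub iri: String,
    pub source: String,
    pub predicate: String,
    pub target: String,
}

pub struct Store {
    pub version: u32,
    pub triples: HashSet<Triple>,
    pub reifiers: Vec<Reifier>,
}

/// Illustrative reifier IRI: the hash covers the target, so changing the
/// target forces a new IRI. Not the executor's actual hashing scheme.
pub fn reifier_iri_sketch(source: &str, predicate: &str, target: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (source, predicate, target).hash(&mut hasher);
    format!("reifier:{:016x}", hasher.finish())
}

/// `unwrap_envelope` returns the plain literal for an envelope target,
/// or None when the target is not an envelope. Returns the rewrite count.
pub fn migrate_sketch(
    store: &mut Store,
    unwrap_envelope: impl Fn(&str) -> Option<String>,
) -> usize {
    if store.version >= 3 {
        return 0; // already migrated: idempotent short-circuit
    }
    let mut rewritten = 0;
    for i in 0..store.reifiers.len() {
        let old_target = store.reifiers[i].target.clone();
        let plain = match unwrap_envelope(old_target.as_str()) {
            Some(p) => p,
            None => continue, // plain literals are left alone
        };
        let source = store.reifiers[i].source.clone();
        let predicate = store.reifiers[i].predicate.clone();
        store.reifiers[i].iri = reifier_iri_sketch(&source, &predicate, &plain);
        store.reifiers[i].target = plain.clone();
        store.triples.insert((source.clone(), predicate.clone(), plain));
        // Drop the old direct triple only once no reifier still points at it.
        let still_referenced = store.reifiers.iter().any(|r| {
            r.source == source && r.predicate == predicate && r.target == old_target
        });
        if !still_referenced {
            store.triples.remove(&(source, predicate, old_target));
        }
        rewritten += 1;
    }
    store.version = 3;
    rewritten
}
```

Processing one reifier at a time is what gives the "still referenced" check something to do: when two reifiers share an envelope target, the old triple survives until the second one has been rewritten.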
Equality WHERE filters on `resolveLanguage: literal` properties now emit direct IRI matches against the deterministic encoding produced by `resolve_property_value` (Phase 1):
    ?source <pred> <literal:string:VAL_ENCODED> .
    ?source <pred> <literal:number:N> .
    ?source <pred> <literal:boolean:true|false> .
Previously each row was checked via `STR(fn/parse_literal(?v)) = "val"` inside a FILTER, which forced Oxigraph to scan every triple bound to the predicate. Switching to a direct IRI lets the POS index probe straight to the matching row.
The `fn/parse_literal` BIND + FILTER path is preserved for:
- `WhereCondition::Ops` (gt/lt/gte/lte/between/contains/not): typed comparison still needs the unwrapped value.
- Properties without `resolveLanguage: literal`: back-compat for raw string storage.
- Legacy envelope-form data that hasn't yet been migrated by `migrate_signed_envelopes_to_plain_literals` (Phase 2): `parse_literal` also unwraps `.data` from signed envelopes.
Where a where-value also looks like an absolute IRI (scheme + colon), a UNION fallback matches the raw-IRI form too. This covers the case where a constructor's initial value was stored as a raw URI on a property whose shape declares `resolveLanguage: literal` (enum-like state defaults).
Values are validated before injection: strings are `NON_ALPHANUMERIC`-percent-encoded (matching `literal_encode`), numbers must be finite, booleans must be `true|false`. Non-finite numeric filters short-circuit to `FILTER(false)`.
Refs: research/literal-encoding-and-sparql-pushdown-2026-06-03.md (V4)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
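A sketch of how an equality filter could be turned into a direct IRI pattern under the rules above. The `WhereValue` enum and function names are hypothetical, the predicate is assumed to be an already-validated IRI, and the IRI check is a simplified approximation of "scheme + colon", not the builder's actual test.

```rust
/// Hypothetical where-value type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum WhereValue {
    Str(String),
    Num(f64),
    Bool(bool),
}

/// Percent-encode every byte that is not an ASCII letter or digit.
fn percent_encode_non_alphanumeric(input: &str) -> String {
    input
        .bytes()
        .map(|b| {
            if b.is_ascii_alphanumeric() {
                (b as char).to_string()
            } else {
                format!("%{:02X}", b)
            }
        })
        .collect()
}

/// Simplified check: a URI scheme, a colon, a non-empty remainder, and no
/// characters that would be unsafe inside `<...>`.
fn looks_like_absolute_iri(s: &str) -> bool {
    let safe = s
        .chars()
        .all(|c| !c.is_whitespace() && !c.is_control() && !"<>\"{}|^`\\".contains(c));
    match s.split_once(':') {
        Some((scheme, rest)) => {
            safe && !rest.is_empty()
                && scheme.chars().next().map_or(false, |c| c.is_ascii_alphabetic())
                && scheme
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        None => false,
    }
}

/// Emit a triple pattern that matches the stored literal IRI directly.
pub fn equality_pattern_sketch(predicate: &str, value: &WhereValue) -> String {
    match value {
        WhereValue::Str(s) => {
            let direct = format!(
                "?source <{}> <literal:string:{}> .",
                predicate,
                percent_encode_non_alphanumeric(s)
            );
            if looks_like_absolute_iri(s) {
                // Also match values stored as a raw IRI.
                format!("{{ {} }} UNION {{ ?source <{}> <{}> . }}", direct, predicate, s)
            } else {
                direct
            }
        }
        // NaN and infinities can never equal a stored number.
        WhereValue::Num(n) if !n.is_finite() => "FILTER(false)".to_string(),
        WhereValue::Num(n) => format!("?source <{}> <literal:number:{}> .", predicate, n),
        WhereValue::Bool(b) => format!("?source <{}> <literal:boolean:{}> .", predicate, b),
    }
}
```

Percent-encoding the string before it is placed inside `<...>` is what keeps the injected value from breaking out of the IRI, since no reserved character survives the encoding.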
Mirror the V4 transformation in the projection where-clause builder (`build_projection_where_patterns`). Equality on a target-shape property declared `resolveLanguage: literal` now emits a direct IRI match against `<literal:string:X>` / `<literal:number:N>` / `<literal:boolean:b>` instead of `STR(fn/parse_literal(?v)) = "X"`.
The pred_lookup now carries `(predicate, is_literal_prop)` so the projection layer can distinguish properties whose targets are stored as literal IRIs (use POS-index probe) from those backed by raw URIs or unknown storage (keep the fn/parse_literal fallback).
String/StringArray inputs that themselves look like absolute IRIs also include the raw-IRI form via VALUES, matching V4's UNION fallback semantics.
Refs: research/literal-encoding-and-sparql-pushdown-2026-06-03.md (V5)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
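One way the `(predicate, is_literal_prop)` lookup and the VALUES fallback could fit together, as a sketch only. The function name, the `?target` / `?v` variable names, and the simplified IRI check are assumptions; returning `None` stands for "caller keeps the `fn/parse_literal` path".

```rust
use std::collections::HashMap;

fn percent_encode_non_alphanumeric(input: &str) -> String {
    input
        .bytes()
        .map(|b| {
            if b.is_ascii_alphanumeric() {
                (b as char).to_string()
            } else {
                format!("%{:02X}", b)
            }
        })
        .collect()
}

/// Simplified "scheme + colon" check that also rejects characters unsafe in `<...>`.
fn looks_like_absolute_iri(s: &str) -> bool {
    let safe = s
        .chars()
        .all(|c| !c.is_whitespace() && !c.is_control() && !"<>\"{}|^`\\".contains(c));
    match s.split_once(':') {
        Some((scheme, rest)) => {
            safe && !rest.is_empty()
                && scheme.chars().next().map_or(false, |c| c.is_ascii_alphabetic())
                && scheme
                    .chars()
                    .all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        None => false,
    }
}

/// Build a direct-match pattern for string inputs on a literal property.
/// Returns None for unknown or non-literal properties.
pub fn projection_values_sketch(
    pred_lookup: &HashMap<String, (String, bool)>,
    property: &str,
    inputs: &[&str],
) -> Option<String> {
    let (predicate, is_literal_prop) = pred_lookup.get(property)?;
    if !*is_literal_prop {
        return None;
    }
    let mut terms = Vec::new();
    for input in inputs {
        terms.push(format!(
            "<literal:string:{}>",
            percent_encode_non_alphanumeric(input)
        ));
        if looks_like_absolute_iri(input) {
            // Raw-IRI form, for values stored without the literal wrapper.
            terms.push(format!("<{}>", input));
        }
    }
    Some(format!(
        "?target <{}> ?v . VALUES ?v {{ {} }}",
        predicate,
        terms.join(" ")
    ))
}
```

Listing the candidate IRIs in VALUES keeps every alternative a concrete term, so each one can still be matched against the index rather than computed per row.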
…s (T4)
Restructure the WHERE-clause regression tests around the Phase 1/2/3 storage contract:
- Rename helper `signed_envelope_literal` -> `legacy_envelope_literal`. The envelope form models pre-migration data shape only; it is not the current write path.
- `test_signed_envelope_where_paginate_count` -> `test_legacy_envelope_migrated_then_paginate_count`. Now seeds envelope-form data, runs `migrate_signed_envelopes_to_plain_literals`, then asserts the V4 direct-IRI WHERE probe finds the migrated rows.
- `test_mixed_plain_and_signed_envelope_where` -> `test_legacy_mixed_migrated_then_contains`. Same migration-then-query pattern; mixes pre-existing plain literals with envelope rows to verify the migration is idempotent on plain data.
- Add `test_plain_literal_where_paginate_count`: parallel coverage using plain literals from the start (no migration call). Asserts the V4 POS-index path returns identical results to the legacy migrated test.
- Add `test_plain_literal_contains_works_on_fn_parse_literal_path`: exercises `WhereOps::contains`, which still routes through `fn/parse_literal`, against plain `literal:string:X` storage. Confirms the fallback path still produces correct substring semantics.
`test_resolve_projections_where_filter_via_target_shape_property` is unchanged. It already stored plain `literal:string:like_type_id123` and continues to pass because V5 emits a direct IRI probe matching the stored form.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The previous pass left a number of comments referencing internal phase labels (V1/V4/V5), a 'Channel V/E' design vocabulary, and a 2026-06-03 research doc path. Replace those with comments that describe what the code does and why a reader can't infer it from the code itself.
Compares two equivalent SPARQL queries against the same data — the indexed direct-IRI probe that the WHERE builders now emit, and the fn/parse_literal-wrapped FILTER they used to emit. Gated to release builds via cfg!(debug_assertions); scale tunable via WT_BENCH_LINKS. Indexed stays flat at ~12-40us across 1k-50k links (POS index probe). Filter scales linearly because every row materialises through the custom function.
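The difference between the two query shapes can be illustrated with a small in-memory analogy. This is not Oxigraph's storage or planner: `PosIndex` is a hypothetical sorted set ordered by (predicate, object, subject), used only to show why a direct probe touches just the matching entries while a per-row function has to visit everything.

```rust
use std::collections::BTreeSet;

/// Toy index ordered by (predicate, object, subject).
pub struct PosIndex {
    entries: BTreeSet<(String, String, String)>,
}

impl PosIndex {
    pub fn new() -> Self {
        PosIndex { entries: BTreeSet::new() }
    }

    pub fn insert(&mut self, subject: &str, predicate: &str, object: &str) {
        self.entries
            .insert((predicate.to_string(), object.to_string(), subject.to_string()));
    }

    /// Indexed probe: seek to the first (predicate, object, *) entry and read
    /// only while the prefix still matches.
    pub fn probe(&self, predicate: &str, object: &str) -> Vec<String> {
        let start = (predicate.to_string(), object.to_string(), String::new());
        self.entries
            .range(start..)
            .take_while(|(p, o, _)| p == predicate && o == object)
            .map(|(_, _, s)| s.clone())
            .collect()
    }

    /// Filter-style scan: visit every entry and call `keep` on each object
    /// that belongs to the predicate.
    pub fn scan_filter(&self, predicate: &str, keep: impl Fn(&str) -> bool) -> Vec<String> {
        self.entries
            .iter()
            .filter(|(p, o, _)| p == predicate && keep(o.as_str()))
            .map(|(_, _, s)| s.clone())
            .collect()
    }
}
```

Both methods return the same subjects for an equality test; the probe's cost follows the number of matches, while the scan's cost follows the number of stored entries.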
This was referenced Jun 4, 2026
Summary
Two coupled changes to how `resolveLanguage='literal'` property values flow through the executor.
Property writes become deterministic. Today, setting a literal-language property routes through `LanguageController.expression_create`, which signs the value into a `literal:json:<envelope>` URL containing `{author, timestamp, data, proof}`, the same author + timestamp the link reifier already carries. The signature varies per write, so writing `name = "general"` produces a different IRI every time, and exact-match SPARQL filters never hit. This PR drops the wrap: `resolve_property_value` emits a plain `literal:string:X` (or `:number:` / `:boolean:` / `:json:` for primitives) URL via `literal_encode` directly. The reifier remains the single source of truth for per-link provenance.
WHERE filters use the index. With deterministic IRIs in storage, the WHERE builders for the model query layer can emit direct IRI matches (`?source <pred> <literal:string:X>`) instead of wrapping every target in a custom `fn/parse_literal` SPARQL function call. A native POS-index probe replaces a per-row function call.
cc @lucksus @data-bot-coasys: lands on top of #803's history. Commit range `a9e98ccd..a00b595b` iterated this same territory; the `353a7ad7` revert was driven by integration tests that codified the envelope shape, which this PR restructures.
What changed
Writes (executor)
- `perspective_instance.rs::resolve_property_value`: when `resolve_language == "literal"`, encodes the value directly via `literal_encode` and returns the plain URL. Other `resolveLanguage` values (real language controllers: `note`, `image`, etc.) still go through `expression_create`; those produce addressable signed expressions by design.
- `languages/literal.rs::literal_decode`: no longer wraps primitive return values in a synthetic `{author: "<unknown>", timestamp: "<unknown>", data: …, proof: {}}` envelope. Returns the raw value. Genuine signed expressions (JSON objects) pass through unchanged.
- `core/src/perspectives/SparqlBindings.ts::parseLit`: drops the `.data` extraction branch. With property values now stored as plain literals, the `Literal.fromUrl(val).get()` result is already the value, and the only objects it returns are real JSON payloads (which we JSON-stringify for display).
Migration
- Restores `migrate_signed_envelopes_to_plain_literals` (originally landed in `a9e98ccd`, removed by `353a7ad7`). On startup it walks every reifier, finds targets shaped as `literal:json:<envelope>` whose JSON has `data` + `author` + `proof`, extracts the inner value, and rewrites both the direct triple and the reifier (the reifier IRI hash includes the target, so it must be recomputed). Idempotent: short-circuits at `migration_version >= 3`. Wired into `initialize_from_db` after the named-graphs migration.
- The back-compat envelope-unwrap branches are retained in `parse_literal_fn` and `parse_literal_value` so SPARQL queries continue to handle pre-migration data plus the small set of expressions that are themselves stored as signed envelopes (e.g. entanglement proofs in the dapp).
Indexed WHERE filters
- `model_query/sparql_builder.rs`: `is_literal_prop` equality WHERE filters emit `?source <pred> <literal:string:X_encoded>` directly. String / number / boolean / `StringArray` / `NumberArray` covered. `Ops` (gt/lt/between/contains/not) keeps `fn/parse_literal` since it needs typed comparison.
- Where a where-value also looks like an absolute IRI (`state = 'todo://ready'` on a `resolveLanguage='literal'` property, the #803 `d3da07d7` case): `{ ?source <pred> <literal:string:encoded> } UNION { ?source <pred> <raw-iri> }`. Both branches are POS-index probes; the planner picks one.
- `model_query/projection.rs::build_projection_where_patterns`: same transformation for the projection WHERE-clause builder. `pred_lookup` widened to carry `is_literal_prop`.
Tests
- `tests/js/tests/prolog-and-literals.test.ts`: three round-trip assertions flipped from `expect(literal.data).to.equal(X)` to `expect(literal).to.equal(X)` to match the new (unwrapped) `Literal.fromUrl().get()` return.
- `model_query/integration_tests.rs`: the helper that built envelope-shaped targets and the tests that asserted query behaviour against them are renamed `legacy_envelope_*` and now call `migrate_signed_envelopes_to_plain_literals` before querying (genuine migration tests, not "queries work on envelope storage" tests). Parallel `plain_literal_*` tests cover the post-migration form.
What did NOT change
- `expression.create(value, "literal")` still produces signed envelopes wrapped as `literal:json:<envelope>`.
- `ExpressionClient.get(url)`'s envelope fast-path is preserved.
- … (`create_signed_expression` for link metadata) is untouched.
Performance
Microbenchmark (`bench_indexed_iri_vs_fn_parse_literal_filter` in `model_query/integration_tests.rs`, gated to release builds, scale via `WT_BENCH_LINKS`) running the two equivalent SPARQL forms against the same `literal:string:` data on this branch:
[Results table not preserved; the one surviving column header is "`fn/parse_literal` FILTER".]
Indexed stays flat (POS-index probe: sub-linear in the result count, ~unchanged by the dataset size). FILTER scales linearly because every row materialises through the custom function. Apple Silicon, 5 runs each, warm cache; reproduce with `cargo test --release --lib bench_indexed_iri_vs_fn_parse_literal_filter -- --nocapture`.
Wind tunnel S8 comparison vs `origin/dev` was attempted, but the cached `CUSTOM_DENO_SNAPSHOT.bin` produces a V8 magic-number mismatch against fresh executor builds (`Check failed: magic_number_ == SerializedData::kMagicNumber`), causing the executor to crash on language-runtime init under the wind tunnel's `cargo build --release` path regardless of branch. That's a tunnel-infrastructure issue unrelated to this PR; the microbench above captures the relevant per-query delta directly.
Test plan
- `cargo check --tests` clean
- `cargo test --lib languages::literal`: 4 pass
- `cargo test --lib perspectives::model_query::utils`: 15 pass
- `cargo test --lib perspectives::sparql_store`: 78 pass
- `cargo test --lib perspectives::model_query`: 146 pass (incl. the renamed `legacy_envelope_migrated_then_*` + new `plain_literal_*` tests)
- `pnpm exec tsc --noEmit` (core): clean
- `pnpm exec jest src/Literal.test.ts`: 5 pass
- … `dev` vs this branch
- … (`tests/js`): full suite
Migration safety
- … `migration_version >= 3` short-circuits.
- … `parse_literal_fn` and `parse_literal_value`.
- … `.data` + `.author` + `.proof`, and the new encoded form differs from the old. Won't accidentally rewrite valid `literal:json:` payloads that aren't envelopes.
Follow-ups
- … (`"value"^^xsd:string` instead of `<literal:string:value>`) and removing `fn/parse_literal` registration entirely: tackled in #842, stacked on this PR.
- … (`ac57680b9` was confirmed empirically at ~23,000× regression). The most plausible unblock is storage-level partitioning from named graphs (#812); re-attempt once that lands and bench whether per-graph scoping flattens the planner working set. Note: SPARQL itself does not have window functions (neither 1.1 nor the W3C 1.2 draft introduces them), so the wait is on Oxigraph planner / dataset semantics, not on a spec change.
- … `.get().data` and write sites that produced envelopes via `client.expression.create(value, "literal")`: handled in coasys/flux#604.