Skip to content

Add public Cypher query REST API + close #549#817

Open
alistair3149 wants to merge 27 commits into
masterfrom
feat/public-query-api
Open

Add public Cypher query REST API + close #549#817
alistair3149 wants to merge 27 commits into
masterfrom
feat/public-query-api

Conversation

@alistair3149
Copy link
Copy Markdown
Member

@alistair3149 alistair3149 commented May 11, 2026

Fixes #549

Summary

  • Adds POST /neowiki/v0/query/cypher — public read-only Cypher endpoint with stable JSON envelope (columns, rows, truncated, resultCount, durationMs) and stable errorType strings keyed to documented HTTP statuses.
  • Funnels all three Cypher surfaces (REST, nw.query Lua, {{#cypher_raw}} parser fn) through a shared QueryService so validation, transaction timeout, result-row cap, and exception translation live in one place.
  • Closes Add protection against expensive Cypher queries #549 (no transaction timeout, no LIMIT enforcement, no rate limiting on user-supplied Cypher) by applying the safety caps in QueryService and rate-limiting at the REST surface only (parser/Lua skip pingLimiter because their output is parser-cached).

Per-surface user controls:

  • neowiki-query user right (default *) gates whether anyone can call the endpoint at all. Sysadmin lockdown: $wgGroupPermissions['*']['neowiki-query'] = false;
  • Core's apihighlimits (default bot/sysop) lifts limits to the expensive tier (300s / 50,000 rows vs default 30s / 5,000 rows).
  • $wgRateLimits['neowiki-query'] ships defaults: anon 10/min, user 60/min, bot 1000/min. Sysop/bureaucrat exempt via core noratelimit.
  • $wgNeoWikiQueryLimits overrides per-query timeout and row cap per tier.

Architecture follows ADR 19 (no NeoWiki-specific query abstraction language; per-backend honesty). The future SPARQL endpoint will sit alongside as /neowiki/v0/query/sparql with its own service.

Files

  • New: src/Application/Query/{QueryService,QueryRequest,QueryResult,QueryLimits,QueryResultNormalizer}.php + 7 typed exception classes
  • New: src/EntryPoints/REST/QueryCypherApi.php
  • New: docs/QueryApi.md (discovery flow, contract, Neo4j-side deployment recommendations)
  • Refactored: CypherQueryRunner (Lua) and CypherRawParserFunction (parser fn) into thin adapters over QueryService
  • Modified: extension.json (right, RateLimits, config, route), Neo4jQueryStore::runReadQuery() (timeout)
  • Tests: 80+ new test methods across unit, integration, and end-to-end-against-real-Neo4j layers

Markup change

{{#cypher_raw}} output wrapping is now <div class="mw-neowiki-cypher-result"><pre>...</pre></div> (was <pre><code class="json">...</code></pre>). Anyone who styled the old markup will need to update selectors.

Test plan

  • make phpunit filtered by each new test class (QueryServiceTest, QueryCypherApiTest, QueryLimitsTest, QueryResultNormalizerTest, QueryExceptionTest, Neo4jQueryStoreTimeoutTest, QueryCypherEndToEndTest) and the refactored ones (CypherQueryRunnerTest, CypherRawParserFunctionTest) — all green
  • make cs clean
  • OpenAPI spec drift test (ModuleSpecHandlerNeoWikiTest) still green; new route appears at /rest.php/specs/v0/module/-

Manual Browser Check

The parser-function markup change is visible to anyone using {{#cypher_raw}} on a wiki page.

  1. Visit a page that uses {{#cypher_raw}} (the demo wiki has examples; if unsure, edit a sandbox page and add {{#cypher_raw: MATCH (p:Page) RETURN p.name LIMIT 5}}).
  2. View source in browser devtools and confirm the rendered output is wrapped in <div class="mw-neowiki-cypher-result"><pre>...</pre></div>.
  3. Confirm the JSON inside the <pre> is pretty-printed and HTML-escaped (no <script> execution from data values).

Manual API Check

The REST endpoint can be exercised directly via curl. Replace localhost:8484 with your dev wiki URL.

# Happy path — count pages
curl -sS -X POST http://localhost:8484/rest.php/neowiki/v0/query/cypher \
  -H 'Content-Type: application/json' \
  -d '{"cypher": "MATCH (n:Page) RETURN count(n) AS pages"}'
# Expect: 200 with {"columns":["pages"],"rows":[{"pages":N}],...}

# Empty query — error contract
curl -sS -w '\nHTTP %{http_code}\n' -X POST http://localhost:8484/rest.php/neowiki/v0/query/cypher \
  -H 'Content-Type: application/json' \
  -d '{"cypher": "   "}'
# Expect: 400 with {"errorType":"emptyQuery","message":"Query is empty."}

# Write query — rejected
curl -sS -w '\nHTTP %{http_code}\n' -X POST http://localhost:8484/rest.php/neowiki/v0/query/cypher \
  -H 'Content-Type: application/json' \
  -d '{"cypher": "CREATE (n:Test) RETURN n"}'
# Expect: 422 with {"errorType":"writeQueryRejected",...}

# Cypher syntax error — clean message, correct status
curl -sS -w '\nHTTP %{http_code}\n' -X POST http://localhost:8484/rest.php/neowiki/v0/query/cypher \
  -H 'Content-Type: application/json' \
  -d '{"cypher": "MATCH (n RETURN n"}'
# Expect: 400 with {"errorType":"cypherSyntaxError","message":"Invalid input 'RETURN'..."}

# Parameters work
curl -sS -X POST http://localhost:8484/rest.php/neowiki/v0/query/cypher \
  -H 'Content-Type: application/json' \
  -d '{"cypher": "RETURN $name AS hi", "parameters": {"name": "world"}}'
# Expect: 200 with {"rows":[{"hi":"world"}],...}

# OpenAPI spec exposes the route (requires $wgRestAPIAdditionalRouteFiles[] = 'includes/Rest/specs.v0.json';)
curl -sS http://localhost:8484/rest.php/specs/v0/module/- | python3 -c 'import json,sys; print(json.load(sys.stdin)["paths"]["/neowiki/v0/query/cypher"])'
# Expect: a JSON object describing the POST endpoint

Known gaps (deliberate)

  • QueryCypherApiTest::testRateLimitedReturns429 is markTestIncompletepingLimiter is hard to trip in integration tests without lots of session setup; behavior is exercised by manual testing.
  • Truncation happens after full normalization rather than slicing the CypherList first. Bounded by maxRows so the worst case is bounded; flagged in code review as a perf optimization to revisit if real workloads show a problem.

@alistair3149 alistair3149 marked this pull request as ready for review May 12, 2026 18:23
Comment thread extension.json

"GroupPermissions": {
"*": {
"neowiki-query": true
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I picked a separate user rights for easier spam protections for wiki admins

Comment thread extension.json
"description": "Whether development UIs should be enabled",
"value": false
},
"NeoWikiQueryLimits": {
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make the query limit more granular with timeout and max rows.

@JeroenDeDauw
Copy link
Copy Markdown
Member

Has some organization/architecture issues. I'm investigating

@alistair3149 alistair3149 force-pushed the feat/public-query-api branch from c44764b to 382c35a Compare May 14, 2026 16:27
@alistair3149 alistair3149 marked this pull request as draft May 14, 2026 16:48
@alistair3149 alistair3149 marked this pull request as ready for review May 14, 2026 17:04
@alistair3149 alistair3149 requested a review from JeroenDeDauw May 14, 2026 17:04
alistair3149 and others added 21 commits May 18, 2026 16:13
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ultNormalizer

Move the converter up to the Application layer so all query surfaces (Lua,
parser function, future REST handler) can share it. Behavior is preserved;
only namespace, class name, and file location change.

Add a refining @param annotation to convertAssociative() to satisfy PHPStan
at the new path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Match the file's existing import style for Laudis types instead of an
inline FQN.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arkup

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove now-unused RuntimeException import from CypherRawParserFunction.
- Restructure QueryService validator handling: catch any Throwable from
  the validator call, then check the bool result outside the try. Avoids
  the rethrow-in-catch pattern.
- Add testValidatorExceptionBecomesBackendUnavailable in QueryServiceTest
  so the validator-throws contract has a focused unit test (was only
  exercised via CypherRawParserFunctionTest::testValidatorExceptionReturnsError).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… description

Strip alignment padding from the new RateLimits and NeoWikiQueryLimits
config blocks to match the rest of the file's single-space convention.
Reword the NeoWikiQueryLimits description so the per-tier trigger
condition is stated correctly (default = users WITHOUT apihighlimits).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Core MediaWiki grants the 'bot' group the noratelimit right by default,
so a per-bot RateLimits entry is unnecessary. The [0, 0] value also
triggers WRStatsError in the rate-limiter implementation, which had
required a setUp() override in QueryCypherApiTest. Dropping the entry
fixes both: bots remain unlimited via noratelimit, and the test setup
no longer needs the workaround.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Switch from RequestContext::getMain()->getUser() to $this->getAuthority()
  for the permission check, matching the pattern of other REST handlers in
  this extension. Resolve a User from the authority via UserFactory for
  pingLimiter and QueryLimits::forUser, which both require a full User.
- Drop the redundant `?? []` for $body['parameters'] now that PARAM_DEFAULT
  on the body field is honored by getValidatedBody().
- Add a comment on the match default arm to make its defensive intent
  explicit.
- Add testParameterMissingExceptionMapsTo400ParameterMissing for the one
  error mapping that lacked coverage.
- Update testReturns403WhenUserLacksNeowikiQueryRight to pass an explicit
  mockAnonAuthorityWithPermissions([]) via the new authority parameter on
  executeRequest, since getAuthority() now drives the gate.
- Re-sort the QueryCypherApi import in NeoWikiExtension.php into
  alphabetical order with its REST siblings.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Task 9 review caught two gaps in the new Query API doc:
- The cell normalization table omitted UnboundRelationship, which the
  normalizer handles for undirected pattern matches.
- The "Successful response" example showed Alice/Ben year values that
  contradicted the matching rows in the Step 4 walkthrough response.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cases hit the full HTTP path through QueryCypherApi against the
test container's Neo4j instance: a happy-path count query (asserting
the envelope shape and durationMs presence) and a write-query rejection
(asserting the 422 / writeQueryRejected error contract).

Skips gracefully via NeoWikiIntegrationTestCase::setUpNeo4j() if the
backend is unreachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final pre-merge review caught three real findings:

- The earlier removal of the bot RateLimits entry was based on a wrong
  premise. Core MediaWiki grants noratelimit to sysop and bureaucrat,
  NOT to bot. The bot group only has apihighlimits. Without an explicit
  RateLimits entry, bots fall through to the user bucket (60/min), which
  defeats the purpose of having a bot account. Restore the entry with a
  generous-but-bounded value (1000/min) that doesn't trigger WRStats the
  way [0,0] did. Update the docs to reflect actual behavior.
- CypherRawParserFunction was using the wfMessage() global. The Parser
  context is in scope, so use $parser->msg() per the project preference.
- QueryCypherApi::run() called QueryLimits::forUser() inside the
  QueryException catch block, but forUser() can throw a ConfigException
  (non-QueryException) if NeoWikiQueryLimits is missing. Wrap that call
  separately and map to a 500 internalError so the public error contract
  is preserved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- QueryCypherApi::mapException() now uses $e->errorType() directly for
  the error-type string, leaving the match to map exception class to
  HTTP status only. Eliminates the two-sources-of-truth maintenance
  trap noted in the final review without leaking HTTP semantics into
  the domain-layer exception hierarchy.
- Remove the five i18n keys that became dead code when CypherQueryRunner
  and CypherRawParserFunction stopped formatting their own error
  messages and started surfacing QueryService exception messages
  directly. The neowiki-cypher-raw-error-json-encode key remains since
  the parser function still uses it for the JSON-encode failure path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ption

Smoke test against a live wiki revealed that a Cypher syntax error
returned 503 backendUnavailable instead of 400 cypherSyntaxError. The
ExplainCypherQueryValidator runs an EXPLAIN against Neo4j to detect
non-keyword write operations; when the user submits invalid Cypher,
EXPLAIN itself throws a Neo4jException with the SyntaxError code. The
previous validator try/catch swept all Neo4jExceptions into
BackendUnavailableException, which both (a) hid the real failure mode
from clients and (b) returned 503 for what should be a 400 client error.

Catch Neo4jException from the validator separately and route it through
the same translateNeo4jException() the engine path uses, so syntax,
parameter-missing, and timeout errors surfaced by the validator
translate to the same typed exceptions and HTTP statuses they would on
the engine path. Non-Neo4j Throwables from the validator (real
connection failures) still become BackendUnavailableException.

Add a unit test that exercises this routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
alistair3149 and others added 6 commits May 18, 2026 16:15
Laudis's Neo4jException constructor wraps the actual server message in
a verbose template ('Neo4j errors detected. First one with code "..."
and message "..."'). Switching to getNeo4jMessage() strips that wrapper
and surfaces just the Neo4j-supplied text the user actually wants to
see — e.g. "Invalid input 'RETURN': expected a graph pattern ..." or
"Expected parameter(s): missing". Falls back to the original
getMessage() when getNeo4jMessage() is null.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ApiTest

Match the rest of the file's import style; the inline FQNs were a
holdover from the spec template that listed them as "either is fine".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline fully-qualified names are not acceptable in this project. Add a
use Throwable import at the top of the file and drop the leading
backslash on the catch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ry\Cypher

Move QueryService, QueryRequest, QueryResult, and QueryResultNormalizer
into a new Application\Query\Cypher namespace and drop their language
prefix from the class names. Rename the factory accordingly:
NeoWikiExtension::newQueryService() becomes newCypherQueryService().

Per ADR 19 each backend is its own plugin with its own native query
language. Putting Cypher-specific types in their own subnamespace makes
that grouping explicit and gives a clean parallel slot for the future
Application\Query\Sparql sibling.

Things that stay generic (no namespace move):
- QueryLimits — timeout and row cap apply to any backend.
- The QueryException hierarchy — most subclasses (EmptyQueryException,
  WriteQueryRejectedException, ParameterMissingException,
  QueryTimeoutException, BackendUnavailableException, InternalQueryException)
  are language-agnostic. CypherSyntaxException stays in the shared
  namespace because it carries its language in the class name.

git mv preserves history on the six relocated files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit moved QueryRequest into Application\Query\Cypher
but left the unqualified QueryLimits type hint in the constructor,
causing PHP to resolve it as Cypher\QueryLimits — which doesn't exist
(QueryLimits stayed at the parent namespace as a shared value object).
CI caught this as a TypeError on every test that constructed a
QueryRequest, plus PHPStan errors on QueryService accessing
$request->limits->timeoutSeconds and ->maxRows.

Add the missing use import.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final-pre-merge review caught a real bug: deriving columns via
array_keys($rowsList[0]) means an empty result reports an empty
columns array, even when the query has named RETURN aliases. The
spec contract promises "RETURN aliases in declaration order", which
should still be true for zero-row results.

SummarizedResult exposes keys() returning the protocol-level alias
list — present even when there are zero rows. Use that.

Update the test stub helper to accept an explicit keys list (defaults
to deriving from the first row, preserving existing test cases) and
add a focused test that asserts columns survive an empty result.
Rename testReturnsRowsAsListWithColumnsFromFirstRow to
testReturnsRowsAsListWithColumnsFromProtocolKeys to match the actual
behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alistair3149 alistair3149 force-pushed the feat/public-query-api branch from 68f5b63 to 666d34f Compare May 18, 2026 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add protection against expensive Cypher queries

2 participants