
feat(queryRunner): raise maxResultSize upper bound to 10000#27468

Merged
pmbrull merged 3 commits into main from feat/query-runner-max-result-size-cap
Apr 17, 2026

@pmbrull (Collaborator) commented Apr 17, 2026

fixes https://github.com/open-metadata/ai-platform/issues/555

Summary

  • Raises the QueryRunnerRequest.maxResultSize schema maximum from 1000 to 10000.
  • The default (100, set on the companion Collate config) and the minimum (1) are unchanged.
  • No code changes: enforcement in the Python workflow (query_runner_utils.validate_and_enforce_query_limit) and backend injection from QueryRunnerConfig.querySettings.maxResultSize already respect whatever integer the schema allows.
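
To illustrate what the new bound means for callers, here is a minimal, stdlib-only sketch of a request object that enforces the same range. QueryRunnerRequestSketch is a hypothetical stand-in for the generated Pydantic model, not the real class; only the bounds (minimum 1, maximum 10000) come from this PR.

```python
from dataclasses import dataclass

# Bounds from this PR: minimum unchanged at 1, maximum raised from 1000 to 10000.
MIN_RESULT_SIZE = 1
MAX_RESULT_SIZE = 10_000


@dataclass(frozen=True)
class QueryRunnerRequestSketch:
    """Hypothetical stand-in for the generated QueryRunnerRequest model."""

    maxResultSize: int

    def __post_init__(self) -> None:
        # Reject bools explicitly, since bool is a subclass of int in Python.
        if isinstance(self.maxResultSize, bool) or not isinstance(self.maxResultSize, int):
            raise TypeError("maxResultSize must be an integer")
        # Same inclusive range the JSON schema now declares.
        if not MIN_RESULT_SIZE <= self.maxResultSize <= MAX_RESULT_SIZE:
            raise ValueError(
                f"maxResultSize must be between {MIN_RESULT_SIZE} and {MAX_RESULT_SIZE}, "
                f"got {self.maxResultSize}"
            )


QueryRunnerRequestSketch(maxResultSize=10_000)  # accepted: exactly at the new ceiling
try:
    QueryRunnerRequestSketch(maxResultSize=10_001)  # rejected: one above the ceiling
except ValueError as err:
    print(err)  # prints: maxResultSize must be between 1 and 10000, got 10001
```

This mirrors the test plan's 10000 / 10001 check without depending on the generated model.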

Motivation

The previous 1000 cap forced admins to pick a conservative row ceiling for Query Runner / SQL Studio executions even when their ingestion runner could comfortably return larger result sets. Widening the maximum to 10000 gives operators a meaningful, conservative bump while keeping payload sizes predictable through the batch workflow transport (Argo response → REST body → UI/LLM).

Why this schema matters (and why Collate needs a matching PR)

The Query Runner feature has its ceiling enforced at two schema boundaries:

| Schema | Repo | Role |
|---|---|---|
| QueryRunnerConfig.querySettings.maxResultSize | Collate (entity) | Persistent admin config: stored in the DB, set via the config form, has a default |
| QueryRunnerRequest.maxResultSize (this PR) | OSS (automations) | Ephemeral workflow trigger payload: built per execution, carries the value into the Python runner, discarded afterwards |

The field appears in both because the Python runner doesn't read the Collate DB: the value has to be packed into the OSS-owned automations request envelope for the generic runner to consume it. The bounds in both schemas are defense-in-depth, so widening only one doesn't help; the injected value would still be capped by the other schema's maximum.
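
The defense-in-depth point can be made concrete: a value only works end-to-end if both the persisted admin config and the per-execution request accept it, so the usable ceiling is the smaller of the two maxima. The helper names below are hypothetical, the shared minimum of 1 is assumed for both schemas, and the "before" scenario assumes the Collate config stays capped at 1000 until its companion PR merges.

```python
MINIMUM = 1  # assumed lower bound on both schemas (unchanged per this PR)


def effective_ceiling(config_max: int, request_max: int) -> int:
    """Largest maxResultSize accepted by both the admin config and the workflow request."""
    return min(config_max, request_max)


def accepted_end_to_end(value: int, config_max: int, request_max: int) -> bool:
    # The value must be valid in the persisted config AND in the per-execution request.
    return MINIMUM <= value <= effective_ceiling(config_max, request_max)


# Assumed state with only this OSS PR merged: Collate config still capped at 1000.
print(accepted_end_to_end(5000, config_max=1000, request_max=10000))   # False
# Both PRs merged: the admin form accepts up to 10000, per the test plan.
print(accepted_end_to_end(5000, config_max=10000, request_max=10000))  # True
```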

So this OSS PR is the transport-side half. The Collate-side PR raises the admin-config ceiling and updates the help docs: https://github.com/open-metadata/openmetadata-collate/pull/3696

Both need to merge for the higher ceiling to be usable end-to-end.

Test plan

  • mvn -pl openmetadata-spec clean install regenerates the Java model with the new bound.
  • make generate regenerates the Pydantic model and QueryRunnerRequest(maxResultSize=10000) validates; QueryRunnerRequest(maxResultSize=10001) fails with a validation error.
  • Admin UI form (on Collate after the companion merge) accepts maxResultSize values up to 10000 without client-side rejection.
  • Query Runner end-to-end: configure a service with maxResultSize = 5000, execute a query without LIMIT, verify the Python workflow injects LIMIT 5000 and returns successfully.
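
The end-to-end check expects a LIMIT to be injected when the query has none. The sketch below shows one simple way that could work; it is not the actual validate_and_enforce_query_limit implementation. inject_limit is a hypothetical helper, and capping an existing, larger LIMIT is an assumed policy. It only recognizes a trailing LIMIT <n>, so OFFSET clauses, subqueries, and dialects such as TOP or FETCH FIRST are out of scope.

```python
import re

# Hypothetical pattern: a "LIMIT <n>" clause at the very end of the statement.
_TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)$", re.IGNORECASE)


def inject_limit(sql: str, max_result_size: int) -> str:
    """Sketch only: make a single SELECT return at most max_result_size rows."""
    # Trim whitespace and drop a trailing semicolon before inspecting the end.
    stripped = sql.strip().rstrip(";").rstrip()
    match = _TRAILING_LIMIT.search(stripped)
    if match is None:
        # No LIMIT present: append the configured ceiling.
        return f"{stripped} LIMIT {max_result_size}"
    if int(match.group(1)) > max_result_size:
        # Existing LIMIT exceeds the ceiling: replace it (assumed policy).
        return f"{stripped[:match.start()]}LIMIT {max_result_size}"
    # Existing LIMIT is already within the ceiling: keep the query as-is.
    return stripped


print(inject_limit("SELECT * FROM orders", 5000))  # prints: SELECT * FROM orders LIMIT 5000
```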

🤖 Generated with Claude Code

The previous 1000 cap forced admins to pick a conservative row limit for
Query Runner executions even when their ingestion runner could comfortably
return larger result sets. Raise the schema maximum to 100000 so operators
have headroom to expose more rows to SQL Studio and downstream consumers.

Enforcement and the default (100) are unchanged; the backend still injects
the value from QueryRunnerConfig.querySettings.maxResultSize on every
trigger, so this only widens the ceiling admins can opt into.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 10:13
@github-actions (bot) added the Ingestion and safe to test labels Apr 17, 2026 (safe to test: "Add this label to run secure GitHub workflows on PRs")

Copilot AI left a comment


Pull request overview

Raises the allowed upper bound for QueryRunnerRequest.maxResultSize so admins can configure larger query result limits for Query Runner / SQL Studio executions without requiring another schema change.

Changes:

  • Increased QueryRunnerRequest.maxResultSize JSON schema maximum from 1000 to 100000.

Reviewer feedback: 50k is a better trade-off between giving admins
headroom and keeping the batch workflow response from returning unwieldy
payloads. Enforcement and defaults unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pmbrull pmbrull changed the title feat(queryRunner): raise maxResultSize upper bound to 100000 feat(queryRunner): raise maxResultSize upper bound to 50000 Apr 17, 2026
Reviewer feedback: 10k is a more conservative ceiling that still gives
admins meaningful headroom over the previous 1k cap while keeping payload
sizes predictable through the batch workflow transport.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 17, 2026 10:39
@pmbrull pmbrull changed the title feat(queryRunner): raise maxResultSize upper bound to 50000 feat(queryRunner): raise maxResultSize upper bound to 10000 Apr 17, 2026
@gitar-bot (bot) commented Apr 17, 2026

Code Review ✅ Approved

Increases the maxResultSize upper bound to 10000 for the query runner. No issues found.



Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.

@github-actions

🔴 Playwright Results — 2 failure(s), 16 flaky

✅ 2985 passed · ❌ 2 failed · 🟡 16 flaky · ⏭️ 110 skipped

| Shard | Passed | Failed | Flaky | Skipped |
|---|---|---|---|---|
| 🔴 Shard 1 | 451 | 2 | 5 | 26 |
| 🟡 Shard 2 | 649 | 0 | 2 | 7 |
| 🟡 Shard 4 | 629 | 0 | 5 | 27 |
| ✅ Shard 5 | 611 | 0 | 0 | 42 |
| 🟡 Shard 6 | 645 | 0 | 4 | 8 |

Genuine Failures (failed on all attempts)

Features/DataAssetRulesDisabled.spec.ts › Verify the ApiEndpoint entity item action after rules disabled (shard 1)
Error: expect(locator).toContainText(expected) failed

Locator: getByTestId('domain-link')
Expected substring: "PW Domain b06a83b8"
Received string:    "PW Domain 775f9f62"
Timeout: 15000ms

Call log:
  - Expect "toContainText" with timeout 15000ms
  - waiting for getByTestId('domain-link')
    19 × locator resolved to <a data-testid="domain-link" href="/domain/%22PW%25domain.775f9f62%22" class="no-underline domain-link domain-link-text font-medium text-sm render-domain-lebel-style">PW Domain 775f9f62</a>
       - unexpected value "PW Domain 775f9f62"

Pages/SearchIndexApplication.spec.ts › Search Index Application (shard 1)
Error: expect(received).toEqual(expected) // deep equality

Expected: StringMatching /success|activeError/g
Received: "failed"

🟡 16 flaky test(s) (passed on retry)
  • Features/DataAssetRulesDisabled.spec.ts › Verify the Database entity item action after rules disabled (shard 1, 2 retries)
  • Features/CustomizeDetailPage.spec.ts › API Collection - customization should work (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/Customproperties-part1.spec.ts › Number (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/BulkEditEntity.spec.ts › Glossary (shard 2, 1 retry)
  • Features/ChangeSummaryBadge.spec.ts › Automated badge should appear on entity description with Automated source (shard 2, 1 retry)
  • Pages/Customproperties-part2.spec.ts › entityReferenceList shows item count, scrollable list, no expand toggle (shard 4, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for DashboardDataModel (shard 4, 1 retry)
  • Pages/DataContractsSemanticRules.spec.ts › Validate Description Rule Is_Set (shard 4, 1 retry)
  • Pages/DomainUIInteractions.spec.ts › Add expert to domain via UI (shard 4, 1 retry)
  • Pages/Entity.spec.ts › Tag Add, Update and Remove (shard 4, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Announcement create, edit & delete (shard 6, 1 retry)
  • Pages/Users.spec.ts › Permissions for table details page for Data Consumer (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

@sonarqubecloud

@pmbrull pmbrull merged commit dbf6410 into main Apr 17, 2026
81 of 85 checks passed
@pmbrull pmbrull deleted the feat/query-runner-max-result-size-cap branch April 17, 2026 13:43
pmbrull added a commit that referenced this pull request Apr 20, 2026
* feat(queryRunner): raise maxResultSize upper bound to 100000

* fix(queryRunner): lower maxResultSize ceiling from 100000 to 50000

* fix(queryRunner): further lower maxResultSize ceiling to 10000

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3 participants