Skip to content

feat: add pagination and filtering for column grid response in column#27279

Open
kartikangiras wants to merge 1 commit intoopen-metadata:mainfrom
kartikangiras:filter
Open

feat: add pagination and filtering for column grid response in column#27279
kartikangiras wants to merge 1 commit intoopen-metadata:mainfrom
kartikangiras:filter

Conversation

@kartikangiras
Copy link
Copy Markdown

@kartikangiras kartikangiras commented Apr 11, 2026

Describe your changes:

Fixes #26824

Implemented a two-phase filtering approach in ColumnRepository

  • Detect filtering requirements checks if any filters are active (metadataStatus, hasConflicts, hasMissingMetadata)

  • Removes filter parameters from aggregation requests to get full result set

  • Added 8 new private methods to ColumnRepository for filtering logic
  • Added ColumnRepositoryTest with 2 regression tests validating exact status filtering and stable pagination
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.602 s -- in org.openmetadata.service.jdbi3.ColumnRepositoryTest
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0
[INFO] 
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  01:33 min
[INFO] Finished at: 2026-04-11T17:48:03+05:30
[INFO] ------------------------------------------------------------------------
Screen.Recording.2026-04-18.at.01.35.35.mov

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes <issue-number>: <short explanation>
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: I updated the migration scripts or explained why it is not needed.

Copilot AI review requested due to automatic review settings April 11, 2026 12:40
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes incorrect filtering and unstable pagination in the Column Bulk Operations “column grid” endpoint by moving certain filters to a post-aggregation phase and introducing regression tests.

Changes:

  • Added post-aggregation filtering for metadataStatus, hasConflicts, and hasMissingMetadata, with offset-based pagination for filtered results.
  • Implemented a filtered-cursor encoding/decoding mechanism for stable pagination across filtered pages.
  • Added regression tests covering exact status filtering and stable pagination/totals.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java Adds two-phase (scan + post-filter) logic and offset-based cursor pagination for filtered grid results.
openmetadata-service/src/test/java/org/openmetadata/service/jdbi3/ColumnRepositoryTest.java Adds unit tests validating exact filtering behavior and stable pagination/totals under filtering.

@harshach
Copy link
Copy Markdown
Collaborator

@kartikangiras provide a screen recording how this functionality looks in the UI

@kartikangiras
Copy link
Copy Markdown
Author

yeah sure

@kartikangiras provide a screen recording how this functionality looks in the UI

@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot AI review requested due to automatic review settings April 16, 2026 20:37
@github-actions
Copy link
Copy Markdown
Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented Apr 16, 2026

Code Review 🚫 Blocked 1 resolved / 4 findings

Pagination and filtering added for column grid responses, though the implementation performs an inefficient full aggregator scan on every request. Manual field copying in createScanRequest also requires refactoring, and the cursor type mismatch was successfully addressed.

🚨 Performance: Full aggregator scan on every filtered page request

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:147-161

Every call to getFilteredColumnGridPage re-scans the entire dataset from Elasticsearch/OpenSearch (the do-while loop at lines 161-170 iterates until scanCursor is null), then slices out the requested page in memory. There is no caching between page requests.

This means:

  • Page 1 scans everything. Page 2 scans everything again. Page N scans everything N times total.
  • Each iteration of the loop makes a full aggregation query to the search engine, which is expensive.
  • For large column datasets (thousands of unique columns), this turns an O(pageSize) pagination operation into O(totalColumns) per page, with multiple network round-trips to ES/OS.

This is a significant regression compared to the previous approach which only filtered the current page (even though that had correctness issues with totals).

Consider one of these alternatives:

  1. Server-side cache: Cache the filtered result set keyed by filter parameters with a short TTL, and serve subsequent pages from the cache.
  2. Push filters into the aggregation query: Work with the ColumnAggregator to support these filters natively in ES/OS, avoiding the post-aggregation scan entirely.
  3. At minimum: For page 1, accept the full scan but encode the total count in the cursor so subsequent pages can at least skip recomputing totals (though you'd still need to re-scan to find the offset).
Suggested fix
// Minimal improvement: at least avoid re-scanning for
// totals on subsequent pages by encoding them in the cursor.
// Better: push filters into the aggregation query itself,
// or cache the full filtered result set with a short TTL
// keyed by (filter params hash).
⚠️ Edge Case: Cursor type mismatch silently resets to page 1

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:254-268

When a client holds a native aggregator cursor and then adds a filter (or vice versa), the cursor format won't match. decodeFilteredCursorOffset (line 254-281) silently returns offset 0, and the native aggregator path doesn't validate its cursor either. The user silently gets page 1 again with no indication that their pagination state was lost.

While the Javadoc documents this (lines 114-117), a silent reset is a poor UX — the client can't distinguish 'I'm on page 1' from 'my cursor was invalid'. Consider returning an error response (e.g., 400 Bad Request) when a cursor format mismatch is detected, so clients know to reset their pagination state.

💡 Quality: createScanRequest should copy all fields or use a builder/clone

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:198-212

The createScanRequest method (lines 198-218) manually copies ~10 fields from the original request. If new fields are added to ColumnAggregationRequest in the future, this method will silently drop them, leading to incorrect scan results. Consider using a copy constructor, builder pattern, or clone method to ensure all fields are carried over, then explicitly null out only the filter fields.

✅ 1 resolved
Edge Case: Cursor type mismatch silently resets to page 1

📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:110-116 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:217-231 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:141-142 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:146 📄 openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:152-153
The filtered path uses a custom cursor format (filteredOffset:N base64-encoded), while the non-filtered path uses the aggregator's own cursor format. If a user changes filter parameters between page requests (e.g., adds metadataStatus=COMPLETE while holding a normal aggregator cursor, or removes filters while holding a filtered cursor), the cursor silently fails to parse and resets to offset 0 / page 1 — returning duplicate results without any error indication.

decodeFilteredCursorOffset (line 217-232) catches all exceptions and returns 0. The aggregator similarly swallows parse errors. This makes debugging pagination issues very difficult.

🤖 Prompt for agents
Code Review: Pagination and filtering added for column grid responses, though the implementation performs an inefficient full aggregator scan on every request. Manual field copying in createScanRequest also requires refactoring, and the cursor type mismatch was successfully addressed.

1. 🚨 Performance: Full aggregator scan on every filtered page request
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:147-161

   Every call to `getFilteredColumnGridPage` re-scans the **entire** dataset from Elasticsearch/OpenSearch (the `do-while` loop at lines 161-170 iterates until `scanCursor` is null), then slices out the requested page in memory. There is no caching between page requests.
   
   This means:
   - Page 1 scans everything. Page 2 scans everything again. Page N scans everything N times total.
   - Each iteration of the loop makes a full aggregation query to the search engine, which is expensive.
   - For large column datasets (thousands of unique columns), this turns an O(pageSize) pagination operation into O(totalColumns) per page, with multiple network round-trips to ES/OS.
   
   This is a significant regression compared to the previous approach which only filtered the current page (even though that had correctness issues with totals).
   
   Consider one of these alternatives:
   1. **Server-side cache**: Cache the filtered result set keyed by filter parameters with a short TTL, and serve subsequent pages from the cache.
   2. **Push filters into the aggregation query**: Work with the ColumnAggregator to support these filters natively in ES/OS, avoiding the post-aggregation scan entirely.
   3. **At minimum**: For page 1, accept the full scan but encode the total count in the cursor so subsequent pages can at least skip recomputing totals (though you'd still need to re-scan to find the offset).

   Suggested fix:
   // Minimal improvement: at least avoid re-scanning for
   // totals on subsequent pages by encoding them in the cursor.
   // Better: push filters into the aggregation query itself,
   // or cache the full filtered result set with a short TTL
   // keyed by (filter params hash).

2. ⚠️ Edge Case: Cursor type mismatch silently resets to page 1
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:254-268

   When a client holds a native aggregator cursor and then adds a filter (or vice versa), the cursor format won't match. `decodeFilteredCursorOffset` (line 254-281) silently returns offset 0, and the native aggregator path doesn't validate its cursor either. The user silently gets page 1 again with no indication that their pagination state was lost.
   
   While the Javadoc documents this (lines 114-117), a silent reset is a poor UX — the client can't distinguish 'I'm on page 1' from 'my cursor was invalid'. Consider returning an error response (e.g., 400 Bad Request) when a cursor format mismatch is detected, so clients know to reset their pagination state.

3. 💡 Quality: createScanRequest should copy all fields or use a builder/clone
   Files: openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/ColumnRepository.java:198-212

   The `createScanRequest` method (lines 198-218) manually copies ~10 fields from the original request. If new fields are added to `ColumnAggregationRequest` in the future, this method will silently drop them, leading to incorrect scan results. Consider using a copy constructor, builder pattern, or clone method to ensure all fields are carried over, then explicitly null out only the filter fields.

Options

Display: compact → Showing less information.

Comment with these commands to change:

Compact
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

@kartikangiras
Copy link
Copy Markdown
Author

@PubChimps @harshach I have added the screen recording for this functionality.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorrect filter effect on Column Bulk Operations for Completeness

3 participants