Skip to content

Introduce AllGroupHeadsCollectorManager#15565

Open
gaobinlong wants to merge 11 commits into
apache:mainfrom
gaobinlong:allGroupHeads
Open

Introduce AllGroupHeadsCollectorManager#15565
gaobinlong wants to merge 11 commits into
apache:mainfrom
gaobinlong:allGroupHeads

Conversation

@gaobinlong
Copy link
Copy Markdown
Contributor

Description

AllGroupHeadsCollectorManager in grouping package did not have until now a corresponding collector manager, this pr introduces one and switches TestAllGroupHeadsCollector to use search concurrency and move away from the deprecated search(Query, Collector) method.

Relates to #12892.

Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
@github-actions github-actions Bot added this to the 11.0.0 milestone Jan 13, 2026
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions Bot added the Stale label Jan 28, 2026
@javanna
Copy link
Copy Markdown
Contributor

javanna commented Apr 13, 2026

Thanks for opening this PR @gaobinlong ! I had a quick look and my observations are similar to those I had at #15557 (comment) . We are introducing a new public collector manager that is only used in tests code, while the grouping collector code (GroupingSearch) requires some rework to rely on collector managers. I wonder if we should start with working on that, which looks though involving, or simply update the test so that it no longer relies on a collector manager.

All in all an important longer term question is how are grouping users going to move from collectors to collector managers. Lucene should probably include the building blocks for them to do so, and we need to define which those are.

@github-actions github-actions Bot removed the Stale label Apr 14, 2026
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

@github-actions github-actions Bot added the Stale label Apr 28, 2026
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
@github-actions github-actions Bot modified the milestones: 11.0.0, 10.5.0 Apr 28, 2026
@gaobinlong
Copy link
Copy Markdown
Contributor Author

@javanna, I've modified some code following your comments in the PR of introducing AllGroupsCollectorManager, this also needs your review, thanks!

Copy link
Copy Markdown
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! I am unfamiliar with the grouping module and this collector, but I did a review round anyways. Hope this helps.

Comment thread lucene/CHANGES.txt

private final GroupSelector<T> groupSelector;
protected final Sort sort;
protected final boolean fillSortValues;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this belongs in the new collector manager or in the collector instances. Could we have mixed values for the different collectors created by the same collector manager?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In collector manager, we always set fillSortValues to true when creating new collector, sort values is required when we merge across collectors.

Could we have mixed values for the different collectors created by the same collector manager?

I think no, the sort are same in all collectors.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think I would unify the two paths and always fill sort values, considering that we are working on removing the search method that takes a collector. The only path in the future will be the concurrent one after all. I think that would simplify things.

@github-actions github-actions Bot removed the Stale label Apr 29, 2026
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Signed-off-by: Binlong Gao <gbinlong@amazon.com>
Copy link
Copy Markdown
Contributor

@javanna javanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gaobinlong I left a few more comments.

s.search(new TermQuery(new Term("content", searchTerm)), allGroupHeadsCollector);
AllGroupHeadsCollectorManager<BytesRef> allGroupHeadsCollectorManager =
new AllGroupHeadsCollectorManager<>(
() -> new TermGroupSelector("group"), sortWithinGroup);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we previously were calling createRandomCollector with more randomization in the mix?

@SuppressWarnings({"unchecked", "rawtypes"})
private int compareValues(Object[] values1, Object[] values2, SortField[] sortFields) {
for (int i = 0; i < sortFields.length; i++) {
FieldComparator comparator = sortFields[i].getComparator(1, Pruning.NONE);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need a fresh new comparator per sort field, per group head? Couldn't we reuse the same instance?


@Override
public GroupHeadsResult reduce(Collection<AllGroupHeadsCollector<T>> collectors) {
Map<Object, GroupHeadWithValues> mergedHeads = new HashMap<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Map<T, GroupHeadWithValues> ?


private final GroupSelector<T> groupSelector;
protected final Sort sort;
protected final boolean fillSortValues;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think I would unify the two paths and always fill sort values, considering that we are working on removing the search method that takes a collector. The only path in the future will be the concurrent one after all. I think that would simplify things.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants