Skip to content

Fix secondary dataformat to contain lucene#5480

Merged
ahkcs merged 1 commit into
opensearch-project:mainfrom
nssuresh2007:parquet_index_fix
May 28, 2026
Merged

Fix secondary dataformat to contain lucene#5480
ahkcs merged 1 commit into
opensearch-project:mainfrom
nssuresh2007:parquet_index_fix

Conversation

@nssuresh2007

Copy link
Copy Markdown
Contributor

Description

While creating indices in analyticsIndex enabled, secondary dataformat is not specified as lucene which is failing when executing queries which span across both lucene and parquet.
Added the lucene parameter for secondary format.

Related Issues

Without this fix:

./gradlew :integ-test:integTestRemote \
      -Dtests.rest.cluster=localhost:9200 \
      -Dtests.cluster=localhost:9300 \
      -Dtests.clustername=runTask \
      -Dtests.analytics.parquet_indices=true \
      --tests "org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT"

Tests with failures:
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_simple_query_string_without_fields
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.verify_escape_in_query_string
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.verify_default_operator_in_simple_query_string
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_query_string_without_fields
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_wildcard_simple_query_string
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.verify_operator_in_match
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_multi_fields_relevance_function_and_implicit_timestamp_filter
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.verify_default_operator_in_multi_match
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.verify_flags_in_simple_query_string
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.verify_default_operator_in_query_string
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_multi_match_without_fields
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_multi_match_without_fields_with_options
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_mixed_relevance_function_and_normal_filter
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.not_pushdown_throws_exception
 - org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT.test_single_field_relevance_function_and_implicit_timestamp_filter

With the fix:

./gradlew :integ-test:integTestRemote \
      -Dtests.rest.cluster=localhost:9200 \
      -Dtests.cluster=localhost:9300 \
      -Dtests.clustername=runTask \
      -Dtests.analytics.parquet_indices=true \
      --tests "org.opensearch.sql.calcite.remote.CalciteRelevanceFunctionIT"


CalciteRelevanceFunctionIT > test_mixed_relevance_function_and_normal_filter PASSED
CalciteRelevanceFunctionIT > verify_escape_in_query_string PASSED
CalciteRelevanceFunctionIT > verify_default_operator_in_query_string PASSED
CalciteRelevanceFunctionIT > test_single_field_relevance_function_and_implicit_timestamp_filter PASSED
CalciteRelevanceFunctionIT > test_multi_match_without_fields PASSED
CalciteRelevanceFunctionIT > test_wildcard_simple_query_string PASSED
CalciteRelevanceFunctionIT > verify_flags_in_simple_query_string PASSED
CalciteRelevanceFunctionIT > test_multi_match_without_fields_with_options PASSED
CalciteRelevanceFunctionIT > test_multi_fields_relevance_function_and_implicit_timestamp_filter PASSED
CalciteRelevanceFunctionIT > not_pushdown_throws_exception PASSED
CalciteRelevanceFunctionIT > verify_operator_in_match PASSED
CalciteRelevanceFunctionIT > verify_default_operator_in_multi_match PASSED
CalciteRelevanceFunctionIT > test_simple_query_string_without_fields PASSED
CalciteRelevanceFunctionIT > test_query_string_without_fields PASSED
CalciteRelevanceFunctionIT > verify_default_operator_in_simple_query_string PASSED

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

While creating indices in analyticsIndex enabled, secondary dataformat
is not specified as lucene which is failing when executing queries which
span across both lucene and parquet.
Added the lucene parameter for secondary format.

Signed-off-by: Suresh N S <nssuresh@amazon.com>
@github-actions

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Validate secondary format compatibility

Consider validating that the secondary data format configuration is compatible with
the primary format. Adding "lucene" as a secondary format when "parquet" is primary
may cause conflicts if both formats attempt to handle the same data operations.

integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java [87]

-indexSettings.put("composite.secondary_data_formats", new org.json.JSONArray().put("lucene"));
+JSONArray secondaryFormats = new org.json.JSONArray().put("lucene");
+indexSettings.put("composite.secondary_data_formats", secondaryFormats);
+// Ensure lucene is compatible with parquet as primary format
Suggestion importance[1-10]: 5

__

Why: The suggestion asks to verify compatibility between parquet and lucene formats, which is a valid concern. However, the improved_code doesn't actually add validation logic—it only extracts the JSONArray to a variable and adds a comment. This is a moderate improvement for code clarity but doesn't address the actual validation concern raised.

Low

Comment thread integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java

@dai-chen dai-chen left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

@ahkcs ahkcs merged commit 5233ec4 into opensearch-project:main May 28, 2026
41 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants