Commit f153b99

Parquet-back raw-PUT test indices on the analytics-engine route
With -Dtests.analytics.parquet_indices=true, indices created by a raw document PUT (e.g. `PUT /test/_doc/1` in a test's init()) bypass AnalyticsIndexConfig.applyIndexCreationSettings. They inherit the composite *value*, and are therefore routed to the analytics engine by RestUnifiedQueryAction.isAnalyticsIndex, but they do not inherit the `pluggable.dataformat.enabled` flag. They are then stored as a plain-Lucene EngineBackedIndexer whose acquireReader() is unimplemented, and the query fails with `StreamException[INTERNAL] Failed to start streaming fragment`.

Apply the cluster-level composite defaults in setUpIndices() so every index, including raw-PUT ones, is stored as a parquet-backed DataFormatAwareEngine that is actually scannable by the analytics engine it routes to.

No-op unless tests.analytics.parquet_indices=true, so normal CI is unchanged.

Signed-off-by: Kai Huang <ahkcs@amazon.com>
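The mismatch described above can be pictured with a small toy model. This is only a sketch: the class, the method names, and the index-level key `pluggable.dataformat` are hypothetical stand-ins, and the real checks in RestUnifiedQueryAction.isAnalyticsIndex and in the engine selection are not shown in this commit. The model assumes that routing looks only at the data-format value, while storage also requires the `.enabled` flag.

```java
import java.util.Map;

public class RoutingMismatchSketch {
    // Hypothetical setting keys, used only for this illustration.
    static final String FORMAT_KEY = "pluggable.dataformat";
    static final String ENABLED_KEY = "pluggable.dataformat.enabled";

    // Assumed stand-in for the routing decision: only the format value is consulted.
    static boolean routedToAnalytics(Map<String, Object> settings) {
        return "composite".equals(settings.get(FORMAT_KEY));
    }

    // Assumed stand-in for the storage decision: the enabled flag is required as well.
    static boolean storedAsParquet(Map<String, Object> settings) {
        return routedToAnalytics(settings) && Boolean.TRUE.equals(settings.get(ENABLED_KEY));
    }

    // Routing and storage agree when both say "analytics" or both say "plain Lucene".
    static boolean consistent(Map<String, Object> settings) {
        return routedToAnalytics(settings) == storedAsParquet(settings);
    }

    public static void main(String[] args) {
        Map<String, Object> rawPut = Map.of(FORMAT_KEY, "composite");
        Map<String, Object> withDefaults = Map.of(FORMAT_KEY, "composite", ENABLED_KEY, true);
        System.out.println("raw PUT without cluster defaults: consistent=" + consistent(rawPut));
        System.out.println("with cluster defaults applied:    consistent=" + consistent(withDefaults));
    }
}
```

In this model the raw-PUT index is routed to the analytics engine but not stored in a format that engine can scan, which corresponds to the failure the commit message reports. Supplying the flag alongside the value removes the disagreement.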
1 parent f861d02 commit f153b99

2 files changed

Lines changed: 42 additions & 0 deletions

integ-test/src/test/java/org/opensearch/sql/legacy/SQLIntegTestCase.java

Lines changed: 6 additions & 0 deletions
@@ -66,6 +66,12 @@ public void setUpIndices() throws Exception {
       initClient();
     }
 
+    // When -Dtests.analytics.parquet_indices=true, make every index (including ones a test
+    // auto-creates via a raw document PUT, which bypasses createIndexByRestClient) parquet-backed
+    // composite, so it is stored as a DataFormatAwareEngine and is actually scannable by the
+    // analytics engine it routes to. Must run before init() creates any index.
+    TestUtils.AnalyticsIndexConfig.applyClusterSettings(client());
+
     if (shouldResetQuerySizeLimit()) {
       resetQuerySizeLimit();
     }

integ-test/src/test/java/org/opensearch/sql/legacy/TestUtils.java

Lines changed: 36 additions & 0 deletions
@@ -89,6 +89,42 @@ static void applyIndexCreationSettings(JSONObject jsonObject) {
       jsonObject.put("settings", settings);
     }
 
+    /**
+     * Apply the cluster-level composite defaults so EVERY index — including ones auto-created by a
+     * raw document {@code PUT} that bypasses {@link #applyIndexCreationSettings} (e.g. {@code PUT
+     * /test/_doc/1} in a test's {@code init()}) — is stored as a parquet-backed composite index
+     * ({@code DataFormatAwareEngine}) and is therefore scannable by the analytics engine.
+     *
+     * <p>Without this, a raw-PUT index inherits the composite <em>value</em> (so {@code
+     * RestUnifiedQueryAction#isAnalyticsIndex} routes it to the analytics engine) but not the
+     * {@code .enabled} flag, so it is stored as a plain-Lucene {@code EngineBackedIndexer} whose
+     * {@code acquireReader()} is unimplemented — surfacing at query time as {@code
+     * StreamException[INTERNAL] "Failed to start streaming fragment"}. Setting the {@code .enabled}
+     * flag (plus the parquet primary / lucene secondary formats) at the cluster level makes the
+     * stored format match the routing decision.
+     *
+     * <p>Existing system indices are created at cluster startup (before this runs) and templated
+     * system indices carry explicit settings, so the cluster default does not retroactively alter
+     * them. The static {@code cluster.pluggable.dataformat.restrict.allowlist} guard is not
+     * dynamically updateable, so it must live in the cluster's node config if a run ever needs to
+     * exempt additional index name prefixes. No-op when disabled; idempotent, so it is safe to
+     * re-apply before each test.
+     */
+    public static void applyClusterSettings(RestClient client) {
+      if (!isEnabled()) {
+        return;
+      }
+      Request request = new Request("PUT", "/_cluster/settings");
+      request.setJsonEntity(
+          "{\"persistent\":{"
+              + "\"cluster.pluggable.dataformat.enabled\":true,"
+              + "\"cluster.pluggable.dataformat\":\"composite\","
+              + "\"cluster.composite.primary_data_format\":\"parquet\","
+              + "\"cluster.composite.secondary_data_formats\":[\"lucene\"]"
+              + "}}");
+      performRequest(client, request);
+    }
+
     /**
      * Returns the {@code _bulk} refresh query string for the current index type. Parquet-backed
      * indices in the analytics-backend-lucene composite engine don't yet implement {@code
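The guard-then-send shape of applyClusterSettings can be exercised without a cluster by separating the payload from the request. The following is a minimal sketch, not the code from this commit: the class name and the payloadIfEnabled() helper are hypothetical, and reading the flag with Boolean.getBoolean is an assumption, since the body of isEnabled() is not part of this diff. The JSON body is copied from the hunk above.

```java
import java.util.Optional;

public class ClusterSettingsPayloadSketch {
    // System property named in the commit message.
    static final String FLAG = "tests.analytics.parquet_indices";

    // Assumption: the real isEnabled() is not shown in this diff; this reads the flag directly.
    static boolean isEnabled() {
        return Boolean.getBoolean(FLAG);
    }

    // Same JSON body that the diff sends to PUT /_cluster/settings.
    static String payload() {
        return "{\"persistent\":{"
            + "\"cluster.pluggable.dataformat.enabled\":true,"
            + "\"cluster.pluggable.dataformat\":\"composite\","
            + "\"cluster.composite.primary_data_format\":\"parquet\","
            + "\"cluster.composite.secondary_data_formats\":[\"lucene\"]"
            + "}}";
    }

    // Hypothetical helper: empty when the flag is off, mirroring the early return in the diff.
    static Optional<String> payloadIfEnabled() {
        return isEnabled() ? Optional.of(payload()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(payloadIfEnabled().orElse("(disabled: no request would be sent)"));
    }
}
```

Because the payload is a constant, sending it repeatedly writes the same persistent settings each time, which is consistent with the Javadoc's note that the call is idempotent. On a JDK that supports single-file source launch, running `java -Dtests.analytics.parquet_indices=true ClusterSettingsPayloadSketch.java` should print the body, and omitting the property should print the disabled message.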
