Inline merge at refresh for composite engine: produce one segment per… (opensearch-project#21831)
* Merge on refresh for composite engine with inline merge
Implements merge-on-refresh: when multiple writers flush in the same
refresh cycle, consolidate all formats into a single segment.
Signed-off-by: Bukhtawar Khan <bukhtawa@amazon.com>
sandbox/plugins/analytics-backend-datafusion/src/test/java/org/opensearch/be/datafusion/DataFusionNativeBridgeTests.java
sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/CompositeConcurrentIndexingIT.java
sandbox/plugins/composite-engine/src/internalClusterTest/java/org/opensearch/composite/CompositeMergeIT.java
97 additions & 3 deletions
@@ -252,6 +252,94 @@ public void testSortedParquetPrimaryLuceneSecondaryMerge() throws Exception {
         verifyCrossFormatConsistency(snapshot);
     }
 
+    /**
+     * Validates inline consolidation at refresh with fileMappings:
+     * When multiple writers flush in the same refresh cycle, the primary (Parquet) merges
+     * them and produces a RowIdMapping via fileMappings. The secondary (Lucene) then applies
+     * the same mapping. This test uses concurrent indexing to fill multiple writers, then
+     * a single refresh to trigger consolidation.
+     *
+     * Correctness criteria:
+     * <ol>
+     * <li>After refresh, catalog has 1 segment (consolidated) instead of N per-writer segments</li>
+     * <li>Both parquet and lucene formats are present</li>
+     * <li>Lucene __row_id__ values are sequential (RowIdMapping correctly applied)</li>
+     * <li>Cross-format consistency: parquet row data matches lucene data at same row_id</li>
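The Javadoc in the diff describes a two-step consolidation: the primary format merges the per-writer files and emits a row-id mapping, and the secondary format then applies that same mapping. The following is a minimal, self-contained sketch of that idea only, not the composite engine's code. `WriterFlush`, `mergePrimary`, `applyToSecondary`, the file names, and the choice to order merged rows by an integer sort key are all hypothetical, introduced purely for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RowIdMappingSketch {

    /** Hypothetical per-writer flush: one primary file plus the matching secondary docs, in local row-id order. */
    static final class WriterFlush {
        final String primaryFile;
        final List<Integer> sortKeys;      // primary-format rows, reduced to a sort key
        final List<String> secondaryDocs;  // secondary-format docs at the same local row ids

        WriterFlush(String primaryFile, List<Integer> sortKeys, List<String> secondaryDocs) {
            this.primaryFile = primaryFile;
            this.sortKeys = sortKeys;
            this.secondaryDocs = secondaryDocs;
        }
    }

    /**
     * Merges the primary rows of all flushes in sort-key order (an assumption of this sketch).
     * Appends the merged keys to mergedOut and returns, per source file, old row id -> new row id.
     */
    static Map<String, int[]> mergePrimary(List<WriterFlush> flushes, List<Integer> mergedOut) {
        List<int[]> entries = new ArrayList<>(); // each entry is {flushIndex, oldRowId, sortKey}
        Map<String, int[]> fileMappings = new HashMap<>();
        for (int f = 0; f < flushes.size(); f++) {
            WriterFlush flush = flushes.get(f);
            fileMappings.put(flush.primaryFile, new int[flush.sortKeys.size()]);
            for (int row = 0; row < flush.sortKeys.size(); row++) {
                entries.add(new int[] { f, row, flush.sortKeys.get(row) });
            }
        }
        entries.sort(Comparator.comparingInt((int[] e) -> e[2])); // List.sort is stable
        for (int newRowId = 0; newRowId < entries.size(); newRowId++) {
            int[] e = entries.get(newRowId);
            fileMappings.get(flushes.get(e[0]).primaryFile)[e[1]] = newRowId;
            mergedOut.add(e[2]);
        }
        return fileMappings;
    }

    /** Places every secondary doc at the new row id the primary merge assigned to its row. */
    static String[] applyToSecondary(List<WriterFlush> flushes, Map<String, int[]> fileMappings) {
        int total = 0;
        for (WriterFlush flush : flushes) {
            total += flush.secondaryDocs.size();
        }
        String[] merged = new String[total];
        for (WriterFlush flush : flushes) {
            int[] mapping = fileMappings.get(flush.primaryFile);
            for (int row = 0; row < flush.secondaryDocs.size(); row++) {
                merged[mapping[row]] = flush.secondaryDocs.get(row);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<WriterFlush> flushes = List.of(
            new WriterFlush("writer0.parquet", List.of(10, 40), List.of("doc-10", "doc-40")),
            new WriterFlush("writer1.parquet", List.of(20, 30), List.of("doc-20", "doc-30"))
        );
        List<Integer> mergedKeys = new ArrayList<>();
        Map<String, int[]> fileMappings = mergePrimary(flushes, mergedKeys);
        String[] mergedDocs = applyToSecondary(flushes, fileMappings);
        System.out.println(mergedKeys);
        System.out.println(String.join(", ", mergedDocs));
    }
}
```

In this sketch the two writers end up in one merged output whose row ids run from 0 to N-1 with no gaps, and each secondary doc sits at the same row id as its primary row. That mirrors the last two correctness criteria listed in the Javadoc, under the assumptions stated above.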