Commit d289025

review - make highlight embedded into datarows/schemas
Signed-off-by: Jialiang Liang <jiallian@amazon.com>
1 parent e2ef1b2 commit d289025

6 files changed

Lines changed: 101 additions & 160 deletions

docs/user/ppl/interfaces/endpoint.md

Lines changed: 12 additions & 16 deletions
@@ -204,7 +204,7 @@ Expected output (trimmed):

 ## Highlight

-You can add a `highlight` parameter to the PPL request body to enable search-result highlighting. This parameter follows the same semantics as the [OpenSearch highlight API](https://docs.opensearch.org/latest/search-plugins/searching-data/highlight/). When enabled, the response includes a top-level `highlights` array containing matching fragments with the specified tags. Each entry in the `highlights` array corresponds to the row at the same index in `datarows`.
+You can add a `highlight` parameter to the PPL request body to enable search-result highlighting. This parameter follows the same semantics as the [OpenSearch highlight API](https://docs.opensearch.org/latest/search-plugins/searching-data/highlight/). When enabled, the response includes a `_highlight` column in `schema` and `datarows` containing matching fragments with the specified tags. Each `_highlight` value in a datarow is an object whose keys are field names and whose values are arrays of highlight fragments for the corresponding row.

 Two formats are supported:

@@ -246,25 +246,21 @@ Expected output (trimmed):
   "schema": [
     { "name": "account_number", "type": "bigint" },
     { "name": "firstname", "type": "string" },
-    { "name": "lastname", "type": "string" }
+    { "name": "lastname", "type": "string" },
+    { "name": "_highlight", "type": "struct" }
   ],
   "datarows": [
-    [578, "Holmes", "Mcknight"],
-    [828, "Blanche", "Holmes"],
-    [1, "Amber", "Duke"]
-  ],
-  "highlights": [
-    {
+    [578, "Holmes", "Mcknight", {
       "firstname": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"],
       "firstname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"]
-    },
-    {
+    }],
+    [828, "Blanche", "Holmes", {
       "lastname": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"],
       "lastname.keyword": ["@opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@"]
-    },
-    {
+    }],
+    [1, "Amber", "Duke", {
       "address": ["880 @opensearch-dashboards-highlighted-field@Holmes@/opensearch-dashboards-highlighted-field@ Lane"]
-    }
+    }]
   ],
   "total": 3,
   "size": 3
@@ -292,8 +288,8 @@ Exceeding these limits returns an error.

 ### Notes

-- Highlighting requires a search term in the PPL statement (e.g. `source=accounts "Holmes"`). Without a search term (e.g. just `source=accounts`), the `highlights` array entries will be empty.
-- The `highlights` array in the response is parallel to `datarows` — each entry contains the highlighted fragments for the corresponding row.
+- Highlighting requires a search term in the PPL statement (e.g. `source=accounts "Holmes"`). Without a search term (e.g. just `source=accounts`), the `_highlight` values in datarows will be empty objects.
+- The `_highlight` column appears in `schema` and `datarows` as a regular column. Each `_highlight` value is an object whose keys are field names and whose values are arrays of highlight fragments.
 - In the simple array format, `["*"]` highlights all fields. Specific field names like `["firstname", "lastname"]` scope highlighting to those fields only.
 - In the object format, each key in the `fields` object is a field name or wildcard. Each value is an object of per-field highlight options. Supported per-field options: `fragment_size`, `number_of_fragments`, `type` (`plain`, `unified`, `fvh`), `pre_tags`, `post_tags`, `require_field_match`, `no_match_size`, `order`. Use `{}` for defaults. Example: `{"title": {"fragment_size": 200}, "body": {"type": "plain"}}`.
-- Highlights may include fields that are not explicitly projected in the `schema`/`datarows`. For example, using `{"*": {}}` highlights all fields that matched the search query, including fields not selected by `| fields`. In the example above, the `address` field appears in `highlights` because it contains a match ("880 Holmes Lane") even though only `account_number`, `firstname`, and `lastname` are projected.
+- Highlights may include fields that are not explicitly projected in the other columns. For example, using `{"*": {}}` highlights all fields that matched the search query, including fields not selected by `| fields`. In the example above, the `address` field appears in `_highlight` because it contains a match ("880 Holmes Lane") even though only `account_number`, `firstname`, and `lastname` are projected as separate columns.
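
Reviewer note: since the documented response now carries highlights as a `_highlight` column, a consumer has to locate that column by name in `schema` and then read the cell at the same position in each datarow. The sketch below illustrates this with plain `java.util` collections standing in for a parsed JSON response. The class `HighlightColumnReader` and its methods are hypothetical names for illustration, not part of the plugin or of any client library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical client-side helper (an assumption for illustration only).
public class HighlightColumnReader {

  /** Returns the position of the _highlight column in the schema, or -1 if absent. */
  public static int highlightColumnIndex(List<Map<String, String>> schema) {
    for (int i = 0; i < schema.size(); i++) {
      if ("_highlight".equals(schema.get(i).get("name"))) {
        return i;
      }
    }
    return -1;
  }

  /** Returns the fragments for one field of one row, or an empty list if there are none. */
  public static List<String> fragments(List<Object> row, int hlIndex, String field) {
    if (hlIndex < 0 || hlIndex >= row.size() || !(row.get(hlIndex) instanceof Map)) {
      return List.of();
    }
    Object value = ((Map<?, ?>) row.get(hlIndex)).get(field);
    if (!(value instanceof List)) {
      return List.of();
    }
    List<String> out = new ArrayList<>();
    for (Object fragment : (List<?>) value) {
      out.add(String.valueOf(fragment));
    }
    return out;
  }

  public static void main(String[] args) {
    // Schema and row modeled on the documented example response.
    List<Map<String, String>> schema =
        List.of(
            Map.of("name", "account_number", "type", "bigint"),
            Map.of("name", "firstname", "type", "string"),
            Map.of("name", "lastname", "type", "string"),
            Map.of("name", "_highlight", "type", "struct"));
    String tagged =
        "880 @opensearch-dashboards-highlighted-field@Holmes"
            + "@/opensearch-dashboards-highlighted-field@ Lane";
    List<Object> row = List.of(1, "Amber", "Duke", Map.of("address", List.of(tagged)));

    int hlIndex = highlightColumnIndex(schema);
    System.out.println("highlight column index: " + hlIndex);
    System.out.println("address fragments: " + fragments(row, hlIndex, "address"));
  }
}
```

Looking the column up by name rather than assuming it is last keeps the consumer working if the column order changes; the documented example happens to place `_highlight` last, but the diff does not state that as a guarantee.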

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalciteHighlightIT.java

Lines changed: 58 additions & 33 deletions
@@ -44,13 +44,15 @@ public void testHighlightWildcardWithSearchQuery() throws IOException {
   public void testHighlightContainsMatchingFragments() throws IOException {
     JSONObject result =
         executeQueryWithHighlight("source=" + TEST_INDEX_ACCOUNT + " \"Holmes\"", "[\"*\"]");
-    JSONArray highlights = result.getJSONArray("highlights");
-    assertTrue("highlights array should not be empty", highlights.length() > 0);
+    int hlIndex = getHighlightColumnIndex(result);
+    assertTrue("_highlight column should exist in schema", hlIndex >= 0);
+    JSONArray dataRows = result.getJSONArray("datarows");
+    assertTrue("datarows should not be empty", dataRows.length() > 0);
     // At least one highlight entry should have non-empty data
     boolean foundFragment = false;
-    for (int i = 0; i < highlights.length(); i++) {
-      JSONObject hlEntry = highlights.getJSONObject(i);
-      if (hlEntry.length() > 0) {
+    for (int i = 0; i < dataRows.length(); i++) {
+      JSONObject hlEntry = dataRows.getJSONArray(i).optJSONObject(hlIndex);
+      if (hlEntry != null && hlEntry.length() > 0) {
         foundFragment = true;
         break;
       }
@@ -69,11 +71,11 @@ public void testHighlightOsdObjectFormat() throws IOException {
     assertTrue(dataRows.length() > 0);
     assertHighlightsExist(result);
     // Verify custom tags are applied
-    JSONArray highlights = result.getJSONArray("highlights");
+    int hlIndex = getHighlightColumnIndex(result);
     boolean foundCustomTag = false;
-    for (int i = 0; i < highlights.length(); i++) {
-      String hlStr = highlights.getJSONObject(i).toString();
-      if (hlStr.contains("<b>")) {
+    for (int i = 0; i < dataRows.length(); i++) {
+      JSONObject hlEntry = dataRows.getJSONArray(i).optJSONObject(hlIndex);
+      if (hlEntry != null && hlEntry.toString().contains("<b>")) {
         foundCustomTag = true;
         break;
       }
@@ -93,11 +95,12 @@ public void testHighlightOsdObjectFormatWithDashboardsTags() throws IOException
     assertTrue(dataRows.length() > 0);
     assertHighlightsExist(result);
     // Verify dashboards tags are applied
-    JSONArray highlights = result.getJSONArray("highlights");
+    int hlIndex = getHighlightColumnIndex(result);
     boolean foundDashboardsTag = false;
-    for (int i = 0; i < highlights.length(); i++) {
-      String hlStr = highlights.getJSONObject(i).toString();
-      if (hlStr.contains("@opensearch-dashboards-highlighted-field@")) {
+    for (int i = 0; i < dataRows.length(); i++) {
+      JSONObject hlEntry = dataRows.getJSONArray(i).optJSONObject(hlIndex);
+      if (hlEntry != null
+          && hlEntry.toString().contains("@opensearch-dashboards-highlighted-field@")) {
         foundDashboardsTag = true;
         break;
       }
@@ -162,7 +165,7 @@ public void testHighlightWildcardInSearchText() throws IOException {
     JSONObject result =
         executeQueryWithHighlight("source=" + TEST_INDEX_ACCOUNT + " \"Holm*\"", "[\"*\"]");
     assertTrue("Response should contain datarows", result.has("datarows"));
-    assertTrue("Response should contain highlights array", result.has("highlights"));
+    assertHighlightsExist(result);
   }

   @Test
@@ -172,7 +175,7 @@ public void testHighlightMixedFullTextAndStructuredFilter() throws IOException {
             "source=" + TEST_INDEX_ACCOUNT + " \"Holmes\" | where age > 30 | fields firstname, age",
             "[\"*\"]");
     assertTrue("Response should contain datarows", result.has("datarows"));
-    assertTrue("Response should contain highlights array", result.has("highlights"));
+    assertHighlightsExist(result);
   }

   @Test
@@ -201,7 +204,7 @@ public void testHighlightWithHead() throws IOException {
     assertTrue("Response should contain datarows", result.has("datarows"));
     JSONArray dataRows = result.getJSONArray("datarows");
     assertTrue("Should return at most 2 rows", dataRows.length() <= 2);
-    assertTrue("Response should contain highlights array", result.has("highlights"));
+    assertHighlightsExist(result);
   }

   @Test
@@ -222,13 +225,16 @@ public void testHighlightBooleanOrSearch() throws IOException {
     assertTrue("OR search should return results", dataRows.length() > 0);
     assertHighlightsExist(result);
     // Verify highlights contain fragments for both search terms
-    JSONArray highlights = result.getJSONArray("highlights");
+    int hlIndex = getHighlightColumnIndex(result);
     boolean foundHolmes = false;
     boolean foundBond = false;
-    for (int i = 0; i < highlights.length(); i++) {
-      String hlStr = highlights.getJSONObject(i).toString();
-      if (hlStr.contains("Holmes")) foundHolmes = true;
-      if (hlStr.contains("Bond")) foundBond = true;
+    for (int i = 0; i < dataRows.length(); i++) {
+      JSONObject hlEntry = dataRows.getJSONArray(i).optJSONObject(hlIndex);
+      if (hlEntry != null) {
+        String hlStr = hlEntry.toString();
+        if (hlStr.contains("Holmes")) foundHolmes = true;
+        if (hlStr.contains("Bond")) foundBond = true;
+      }
     }
     assertTrue("Highlights should contain Holmes fragments", foundHolmes);
     assertTrue("Highlights should contain Bond fragments", foundBond);
@@ -243,13 +249,16 @@ public void testHighlightBooleanAndSearch() throws IOException {
     assertTrue("AND search should return results", dataRows.length() > 0);
     assertHighlightsExist(result);
     // Verify highlights contain fragments for both terms
-    JSONArray highlights = result.getJSONArray("highlights");
+    int hlIndex = getHighlightColumnIndex(result);
     boolean foundHolmes = false;
     boolean foundLane = false;
-    for (int i = 0; i < highlights.length(); i++) {
-      String hlStr = highlights.getJSONObject(i).toString();
-      if (hlStr.contains("Holmes")) foundHolmes = true;
-      if (hlStr.contains("Lane")) foundLane = true;
+    for (int i = 0; i < dataRows.length(); i++) {
+      JSONObject hlEntry = dataRows.getJSONArray(i).optJSONObject(hlIndex);
+      if (hlEntry != null) {
+        String hlStr = hlEntry.toString();
+        if (hlStr.contains("Holmes")) foundHolmes = true;
+        if (hlStr.contains("Lane")) foundLane = true;
+      }
     }
     assertTrue("Highlights should contain Holmes fragments", foundHolmes);
     assertTrue("Highlights should contain Lane fragments", foundLane);
@@ -282,10 +291,10 @@ public void testHighlightNoSearchQuery() throws IOException {
   }

   @Test
-  public void testWithoutHighlightNoHighlightArray() throws IOException {
-    // Without highlight parameter, highlights array should NOT appear
+  public void testWithoutHighlightNoHighlightColumn() throws IOException {
+    // Without highlight parameter, _highlight column should NOT appear in schema
     JSONObject result = executeQuery("source=" + TEST_INDEX_BANK);
-    assertFalse("Response should NOT contain highlights array", result.has("highlights"));
+    assertTrue("_highlight column should NOT be in schema", getHighlightColumnIndex(result) < 0);
   }

   /**
@@ -316,10 +325,26 @@ protected JSONObject executeQueryWithHighlight(String query, String highlightJso
     return jsonify(getResponseBody(response, true));
   }

-  /** Assert that the response contains a non-empty highlights array. */
+  /**
+   * Find the index of the _highlight column in the schema array.
+   *
+   * @return the column index, or -1 if not present
+   */
+  private int getHighlightColumnIndex(JSONObject result) {
+    JSONArray schema = result.getJSONArray("schema");
+    for (int i = 0; i < schema.length(); i++) {
+      if ("_highlight".equals(schema.getJSONObject(i).getString("name"))) {
+        return i;
+      }
+    }
+    return -1;
+  }
+
+  /** Assert that the response contains a _highlight column with non-empty highlight data. */
   private void assertHighlightsExist(JSONObject result) {
-    assertTrue("Response should contain highlights array", result.has("highlights"));
-    JSONArray highlights = result.getJSONArray("highlights");
-    assertTrue("Highlights array should not be empty", highlights.length() > 0);
+    int hlIndex = getHighlightColumnIndex(result);
+    assertTrue("Schema should contain _highlight column", hlIndex >= 0);
+    JSONArray dataRows = result.getJSONArray("datarows");
+    assertTrue("datarows should not be empty", dataRows.length() > 0);
   }
 }
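
Reviewer note: the new `assertHighlightsExist` Javadoc mentions "non-empty highlight data", while the body only checks that the column exists and that `datarows` is non-empty. If a stricter check were wanted, it might look like the sketch below. This is an assumption about one possible tightening, written against plain `java.util` collections rather than the `org.json` types the test uses, and `HighlightAssertionSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a stricter highlight check; not the commit's code.
public class HighlightAssertionSketch {

  /** True if at least one row holds a non-empty highlight object at hlIndex. */
  public static boolean hasNonEmptyHighlight(List<List<Object>> dataRows, int hlIndex) {
    if (hlIndex < 0) {
      return false; // no _highlight column in the schema
    }
    for (List<Object> row : dataRows) {
      if (hlIndex >= row.size()) {
        continue; // malformed or short row: nothing to inspect
      }
      Object cell = row.get(hlIndex);
      if (cell instanceof Map && !((Map<?, ?>) cell).isEmpty()) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // Arrays.asList is used because List.of rejects null elements.
    List<Object> noHighlight = Arrays.asList(578, "Holmes", "Mcknight", null);
    List<Object> withHighlight =
        Arrays.asList(1, "Amber", "Duke", Map.of("address", List.of("880 Holmes Lane")));
    List<List<Object>> rows = List.of(noHighlight, withHighlight);

    System.out.println(hasNonEmptyHighlight(rows, 3));
    System.out.println(hasNonEmptyHighlight(List.of(noHighlight), 3));
  }
}
```

Tests such as `testHighlightContainsMatchingFragments` already perform this per-row scan inline, so folding it into the shared helper would mainly remove duplication.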

protocol/src/main/java/org/opensearch/sql/protocol/response/QueryResult.java

Lines changed: 1 addition & 37 deletions
@@ -5,15 +5,11 @@

 package org.opensearch.sql.protocol.response;

-import static org.opensearch.sql.expression.HighlightExpression.HIGHLIGHT_FIELD;
-
 import java.util.Collection;
 import java.util.Iterator;
 import java.util.LinkedHashMap;
-import java.util.List;
 import java.util.Locale;
 import java.util.Map;
-import java.util.stream.Collectors;
 import lombok.Getter;
 import org.opensearch.sql.data.model.ExprValue;
 import org.opensearch.sql.data.model.ExprValueUtils;
@@ -75,7 +71,6 @@ public int size() {
   public Map<String, String> columnNameTypes() {
     Map<String, String> colNameTypes = new LinkedHashMap<>();
     schema.getColumns().stream()
-        .filter(column -> !HIGHLIGHT_FIELD.equals(getColumnName(column)))
         .forEach(
             column ->
                 colNameTypes.put(
@@ -90,41 +85,10 @@ public Iterator<Object[]> iterator() {
         .map(ExprValueUtils::getTupleValue)
         .map(
             tuple ->
-                tuple.entrySet().stream()
-                    .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
-                    .map(e -> e.getValue().value())
-                    .toArray(Object[]::new))
+                tuple.entrySet().stream().map(e -> e.getValue().value()).toArray(Object[]::new))
         .iterator();
   }

-  /**
-   * Extract highlight data from each result row. Each row may contain a {@code _highlight} field
-   * added by {@code OpenSearchResponse.addHighlightsToBuilder()} and preserved through projection.
-   * Returns a list parallel to datarows where each entry is either a map of field name to highlight
-   * fragments, or null if no highlight data exists for that row.
-   */
-  public List<Map<String, Object>> highlights() {
-    return exprValues.stream()
-        .map(ExprValueUtils::getTupleValue)
-        .map(
-            tuple -> {
-              ExprValue hl = tuple.get(HIGHLIGHT_FIELD);
-              if (hl == null || hl.isMissing() || hl.isNull()) {
-                return null;
-              }
-              Map<String, Object> hlMap = new LinkedHashMap<>();
-              for (Map.Entry<String, ExprValue> entry : hl.tupleValue().entrySet()) {
-                hlMap.put(
-                    entry.getKey(),
-                    entry.getValue().collectionValue().stream()
-                        .map(ExprValue::stringValue)
-                        .collect(Collectors.toList()));
-              }
-              return (Map<String, Object>) hlMap;
-            })
-        .collect(Collectors.toList());
-  }
-
   private String getColumnName(Column column) {
     return (column.getAlias() != null) ? column.getAlias() : column.getName();
   }
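
Reviewer note: the effect of dropping the `HIGHLIGHT_FIELD` filter from `iterator()` can be seen in isolation with the sketch below. It models a row tuple as a plain `LinkedHashMap<String, Object>` instead of the plugin's `ExprValue` types, and it assumes the constant's value is `"_highlight"` (consistent with the documentation change, but not shown in this diff). `RowFlatteningSketch` and its method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the row-flattening step in QueryResult.iterator().
public class RowFlatteningSketch {

  // Assumed value of HighlightExpression.HIGHLIGHT_FIELD.
  static final String HIGHLIGHT_FIELD = "_highlight";

  /** Pre-change behavior: the highlight entry is dropped from the row array. */
  public static Object[] toRowWithoutHighlight(Map<String, Object> tuple) {
    return tuple.entrySet().stream()
        .filter(e -> !HIGHLIGHT_FIELD.equals(e.getKey()))
        .map(Map.Entry::getValue)
        .toArray(Object[]::new);
  }

  /** Post-change behavior: every entry is emitted, in the map's iteration order. */
  public static Object[] toRow(Map<String, Object> tuple) {
    return tuple.values().toArray();
  }

  public static void main(String[] args) {
    Map<String, Object> tuple = new LinkedHashMap<>();
    tuple.put("account_number", 1);
    tuple.put("firstname", "Amber");
    tuple.put(HIGHLIGHT_FIELD, Map.of("address", List.of("880 Holmes Lane")));

    System.out.println(Arrays.toString(toRowWithoutHighlight(tuple)));
    System.out.println(Arrays.toString(toRow(tuple)));
  }
}
```

Because the same filter was also removed from `columnNameTypes()`, the schema and each row array stay the same length, which is what lets callers index into a row by schema position.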

protocol/src/main/java/org/opensearch/sql/protocol/response/format/SimpleJsonResponseFormatter.java

Lines changed: 0 additions & 8 deletions
@@ -6,8 +6,6 @@
 package org.opensearch.sql.protocol.response.format;

 import java.util.List;
-import java.util.Map;
-import java.util.Objects;
 import lombok.Builder;
 import lombok.Getter;
 import lombok.RequiredArgsConstructor;
@@ -57,11 +55,6 @@ public Object buildJsonObject(QueryResult response) {

     json.datarows(fetchDataRows(response));

-    List<Map<String, Object>> highlights = response.highlights();
-    if (highlights.stream().anyMatch(Objects::nonNull)) {
-      json.highlights(highlights);
-    }
-
     formatMetric.set(System.nanoTime() - formatTime);

     json.profile(QueryProfiling.current().finish());
@@ -87,7 +80,6 @@ public static class JsonResponse {
     private final List<Column> schema;

     private final Object[][] datarows;
-    private final List<Map<String, Object>> highlights;

     private long total;
     private long size;
