Skip to content

Commit 9ebf317

Browse files
committed
Add technical doc for dynamic fields
Signed-off-by: Tomoyuki Morita <moritato@amazon.com>
1 parent 4188dac commit 9ebf317

2 files changed

Lines changed: 336 additions & 1 deletion

File tree

docs/dev/dynamic-fields.md

Lines changed: 335 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,335 @@
1+
# Dynamic Fields in PPL
2+
3+
## Overview
4+
5+
Dynamic fields enable schema-on-read capabilities in PPL, allowing queries to work with fields that aren't known at query planning time.
6+
The key idea is to map fields that are directly referred in the query to static fields, and others to dynamic fields.
7+
And most of the command implementation only consider static fields, while very few commands such as join need to handle dynamic fields.
8+
9+
**Key Concepts:**
10+
- **Field Resolution**: Analyze query and determines which fields need to be mapped to static/dynamic fields.
11+
- **Static Fields**: Known fields extracted as individual columns
12+
- **Dynamic Fields**: Unknown/wildcard-matched fields stored in `_MAP` field
13+
- **`_MAP` Field**: Special field containing all unmapped fields as a map
14+
15+
## Architecture
16+
17+
### Components
18+
19+
1. **FieldResolutionVisitor** (`core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java`)
20+
- Analyzes PPL AST to determine field requirements
21+
- Propagates field requirements through query plan
22+
- Returns `FieldResolutionResult` for each node
23+
24+
2. **FieldResolutionResult** (`core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionResult.java`)
25+
- Contains regular fields (static, known fields)
26+
- Contains wildcard patterns (e.g., `prefix*`)
27+
- Provides methods to combine/modify field requirements
28+
29+
3. **CalciteRelNodeVisitor** (`core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`)
30+
- Converts AST to Calcite RelNode
31+
- Uses field resolution results to build execution plan
32+
- Handles `_MAP` field creation/modification for dynamic fields
33+
34+
4. **DynamicFieldsHelper** (`core/src/main/java/org/opensearch/sql/calcite/DynamicFieldsHelper.java`)
35+
- Utility methods for dynamic field handling
36+
- Manages `_MAP` field operations
37+
- Handles field projection and merging
38+
39+
5. **DynamicFieldsResultProcessor** (`core/src/main/java/org/opensearch/sql/calcite/utils/DynamicFieldsResultProcessor.java`)
40+
- Expands `_MAP` field into individual columns before returning results to client
41+
- Collects all dynamic field names across result rows
42+
- Converts dynamic field values to STRING type for consistency
43+
- Creates expanded schema with both static and dynamic fields
44+
45+
## How Dynamic Fields Work
46+
47+
### Field Resolution Process
48+
49+
```
50+
Query: source=logs | spath input=data | fields a, b, prefix*
51+
52+
1. FieldResolutionVisitor analyzes query:
53+
- Visits nodes bottom-up
54+
- Collects field requirements at each stage
55+
- Result: {regularFields: {a, b}, wildcard: prefix*}
56+
57+
2. CalciteRelNodeVisitor builds execution plan:
58+
- Extracts JSON: JSON_EXTRACT_ALL(data) → fullMap
59+
- Projects static fields: a, b
60+
- Creates _MAP: MAP_REMOVE(fullMap, ARRAY['a', 'b'])
61+
62+
3. Runtime execution:
63+
- Static fields become individual columns
64+
- _MAP contains all other fields as map
65+
```
66+
67+
### Field Resolution Integration
68+
69+
Field resolution works through a stack-based traversal:
70+
71+
```java
72+
// Example from FieldResolutionVisitor
73+
@Override
74+
public Node visitProject(Project node, FieldResolutionContext context) {
75+
// Get current requirements from downstream
76+
FieldResolutionResult current = context.getCurrentRequirements();
77+
78+
// Analyze project fields
79+
Set<String> projectFields = extractFields(node.getProjectList());
80+
81+
// Combine with current requirements
82+
context.pushRequirements(current.and(projectFields));
83+
84+
// Visit children with updated requirements
85+
visitChildren(node, context);
86+
87+
context.popRequirements();
88+
return node;
89+
}
90+
```
91+
92+
### `_MAP` Field Creation
93+
94+
When wildcards are present, the `_MAP` field stores unmapped attributes:
95+
96+
```java
97+
// From CalciteRelNodeVisitor.spathExtractAll()
98+
if (resolutionResult.hasWildcards()) {
99+
// Build array of static field names to remove
100+
RexNode keyArray = buildKeyArray(regularFields, context);
101+
102+
// Create _MAP = MAP_REMOVE(fullMap, keyArray)
103+
RexNode dynamicMapField = makeCall(
104+
context,
105+
BuiltinFunctionName.MAP_REMOVE,
106+
fullMap,
107+
keyArray
108+
);
109+
110+
fields.add(context.relBuilder.alias(dynamicMapField, "_MAP"));
111+
}
112+
```
113+
114+
### Result Processing and Dynamic Field Expansion
115+
116+
After query execution, the `DynamicFieldsResultProcessor` expands the `_MAP` field into individual columns before returning results to the client:
117+
118+
```java
119+
// From DynamicFieldsResultProcessor.expandDynamicFields()
120+
public static QueryResponse expandDynamicFields(QueryResponse response) {
121+
if (!hasDynamicFields(response)) {
122+
return response;
123+
}
124+
125+
// 1. Collect all dynamic field names across all result rows
126+
Map<String, ExprType> dynamicFieldTypes = getDynamicFieldTypes(response.getResults());
127+
128+
// 2. Create expanded schema: static fields + sorted dynamic fields
129+
Schema expandedSchema = createExpandedSchema(response.getSchema(), dynamicFieldTypes);
130+
131+
// 3. Expand each row: extract _MAP values into individual columns
132+
List<ExprValue> expandedRows = expandResultRows(response.getResults(), expandedSchema);
133+
134+
return new QueryResponse(expandedSchema, expandedRows, response.getCursor());
135+
}
136+
```
137+
138+
**Key Processing Steps:**
139+
140+
1. **Dynamic Field Collection**: Scans all result rows to collect unique field names from `_MAP` fields
141+
2. **Schema Expansion**: Creates new schema with static fields followed by sorted dynamic fields
142+
3. **Row Expansion**: For each row, extracts values from `_MAP` and creates individual columns
143+
4. **Type Conversion**: Converts all dynamic field values to STRING type for consistency
144+
145+
**Example:**
146+
147+
```
148+
Before expansion (internal representation):
149+
| a | b | _MAP |
150+
|-----|-----|-----------------------------------------|
151+
| v1 | v2 | {prefix_x: "val1", prefix_y: "val2"} |
152+
| v3 | v4 | {prefix_x: "val3", other: "val4"} |
153+
154+
After expansion (returned to client):
155+
| a | b | other | prefix_x | prefix_y |
156+
|-----|-----|-------|----------|----------|
157+
| v1 | v2 | NULL | val1 | val2 |
158+
| v3 | v4 | val4 | val3 | NULL |
159+
```
160+
161+
**Important Notes:**
162+
- Dynamic fields are sorted alphabetically in the output schema
163+
- Missing dynamic fields in a row are filled with NULL values
164+
- All dynamic field values are converted to STRING type
165+
- Static fields always appear before dynamic fields in the result
166+
- The `_MAP` field itself is removed from the final output
167+
168+
## Implementing Dynamic Fields for New PPL Commands
169+
170+
### Step 1: Add Field Resolution Support
171+
172+
Implement visitor method in `FieldResolutionVisitor`:
173+
174+
```java
175+
@Override
176+
public Node visitYourCommand(YourCommand node, FieldResolutionContext context) {
177+
// 1. Extract fields used by this command
178+
Set<String> commandFields = extractFieldsFromExpression(node.getField());
179+
180+
// 2. Get current requirements from downstream
181+
FieldResolutionResult current = context.getCurrentRequirements();
182+
183+
// 3. Combine requirements (OR for filter-like, AND for project-like)
184+
context.pushRequirements(current.or(commandFields));
185+
186+
// 4. Visit children
187+
visitChildren(node, context);
188+
189+
// 5. Restore previous requirements
190+
context.popRequirements();
191+
return node;
192+
}
193+
```
194+
195+
**Key Methods:**
196+
- `current.or(fields)`: Union - command needs current fields OR new fields (filters, sorts)
197+
- `current.and(fields)`: Intersection - command outputs only these fields (projections)
198+
- `current.exclude(fields)`: Remove fields - command generates these fields (eval, parse)
199+
200+
### Step 2: Handle Dynamic Fields in Calcite Visitor
201+
202+
Implement visitor method in `CalciteRelNodeVisitor`:
203+
204+
**Note**: Command implementation does not need to handle dynamic fields unless:
205+
- The command takes multiple inputs (e.g. join, append)
206+
- The command works against all the fields
207+
208+
```java
209+
@Override
210+
public RelNode visitYourCommand(YourCommand node, CalcitePlanContext context) {
211+
visitChildren(node, context);
212+
213+
// Check if dynamic fields exist
214+
if (DynamicFieldsHelper.isDynamicFieldsExists(context)) {
215+
// Option A: Reject wildcards if not supported
216+
if (hasWildcardInInput(node)) {
217+
throw new IllegalArgumentException(
218+
"Command does not support wildcard fields");
219+
}
220+
221+
// Option B: Handle dynamic fields explicitly
222+
handleDynamicFields(node, context);
223+
}
224+
225+
// Regular command implementation
226+
// ...
227+
228+
return context.relBuilder.peek();
229+
}
230+
```
231+
232+
### Step 3: Test Dynamic Fields Support
233+
234+
Add integration tests in `integ-test/`:
235+
236+
```java
237+
@Test
238+
public void testYourCommandWithWildcard() {
239+
String query = "source=logs | spath input=data | fields a, prefix* " +
240+
"| yourcommand ...";
241+
242+
JSONObject result = executeQuery(query);
243+
244+
// Verify static/dynamic fields
245+
verifyColumn(result, "a", expectedValues);
246+
}
247+
```
248+
249+
## Limitations and Considerations
250+
251+
### Current Limitations
252+
253+
1. **Type Casting**: All extracted fields are cast to VARCHAR (temporary limitation)
254+
2. **Wildcard Filtering**: `_MAP` contains ALL unmapped fields, not just wildcard matches
255+
3. **Field Ordering**: Wildcard `*` must be at end of field list
256+
257+
### Performance Considerations
258+
259+
1. **JSON Parsing**: `JSON_EXTRACT_ALL` parses entire JSON document
260+
2. **Map Operations**: `MAP_REMOVE` creates new map for each row
261+
3. **Field Access**: Accessing `_MAP` fields requires map lookup at runtime
262+
263+
### Best Practices
264+
265+
1. **Minimize Wildcards**: Use specific field names when possible
266+
2. **Early Filtering**: Apply filters before spath to reduce data volume
267+
3. **Static Fields First**: List known fields explicitly before wildcards
268+
4. **Test Coverage**: Add integration tests for wildcard scenarios
269+
270+
## Examples
271+
272+
### Example 1: Basic Wildcard Usage
273+
274+
```ppl
275+
source=logs | spath input=data | fields a, b, *
276+
```
277+
278+
**Field Resolution:**
279+
- Regular fields: `{a, b}`
280+
- Wildcard: `*`
281+
282+
**Execution Plan:**
283+
```
284+
Project(a, b, _MAP)
285+
├─ a = CAST(ITEM(map, 'a'), VARCHAR)
286+
├─ b = CAST(ITEM(map, 'b'), VARCHAR)
287+
└─ _MAP = MAP_REMOVE(map, ARRAY['a', 'b'])
288+
└─ map = JSON_EXTRACT_ALL(data)
289+
```
290+
291+
### Example 2: Multiple Commands with Wildcards
292+
293+
```ppl
294+
source=logs
295+
| spath input=data
296+
| fields a, *
297+
| where a > 10
298+
| stats count() by a
299+
```
300+
301+
**Field Resolution Flow:**
302+
1. `stats`: needs `{a}` (group-by field)
303+
2. `where`: needs `{a}` (filter field)
304+
3. `fields`: outputs `{a, *}`
305+
4. `spath`: extracts `{a}` + creates `_MAP` for `*`
306+
307+
### Example 3: Join with Dynamic Fields
308+
309+
```ppl
310+
source=logs1
311+
| spath input=data1
312+
| join id logs2
313+
```
314+
315+
**Handling:**
316+
- Both sides may have `_MAP` fields
317+
- Join merges `_MAP` fields based on overwrite mode
318+
- Static fields take precedence over dynamic fields
319+
320+
## Future Enhancements
321+
322+
### Related RFCs
323+
324+
- [RFC #4984](https://github.com/opensearch-project/sql/issues/4984): Schema-on-Read support
325+
- Step 1: Field resolution for spath (completed)
326+
- Step 2: Dynamic spath with `_MAP` field (in progress)
327+
- Step 3: Schema-on-read with static types (planned)
328+
- Step 4: ANY type support (planned)
329+
330+
## References
331+
332+
- **Field Resolution**: `core/src/main/java/org/opensearch/sql/ast/analysis/FieldResolutionVisitor.java`
333+
- **Calcite Integration**: `core/src/main/java/org/opensearch/sql/calcite/CalciteRelNodeVisitor.java`
334+
- **Dynamic Fields Helper**: `core/src/main/java/org/opensearch/sql/calcite/plan/DynamicFieldsHelper.java`
335+
- **Integration Tests**: `integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLSpathWithJoinIT.java`

docs/dev/ppl-commands.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ If you are working on contributing a new PPL command, please read this guide and
2727
- [ ] **Visitor Pattern:**
2828
- Add `visit*` in `AbstractNodeVisitor`
2929
- Overriding `visit*` in `Analyzer`, `CalciteRelNodeVisitor` and `PPLQueryDataAnonymizer`
30-
- Override `visit*` in `FieldResolutionVisitor` for `spath` command support.
30+
- Override `visit*` in `FieldResolutionVisitor` for `spath` command support (see [dynamic-fields.md](./dynamic-fields.md) for details)
3131

3232
- [ ] **Unit Tests:**
3333
- Extend `CalcitePPLAbstractTest`

0 commit comments

Comments
 (0)