Warning: This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, join the discussion on the OpenSearch forum.
The graphLookup command performs recursive graph traversal on a collection using a breadth-first search (BFS) algorithm. It finds documents matching a starting value and recursively traverses relationships between documents based on specified fields. This is useful for hierarchical data such as organizational charts, social networks, or routing graphs.
The graphLookup command performs a breadth-first search (BFS) traversal:
1. For each source document, extract the value of start.
2. Query the lookup index to find documents in which toField matches the start value.
3. Add matched documents to the result array.
4. Extract fromField values from matched documents to continue traversal.
5. Repeat steps 2-4 until no new documents are found or maxDepth is reached.
For bidirectional traversal (<->), the algorithm also follows edges in the reverse direction by additionally matching fromField values.
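The traversal steps can be modeled in a few lines of Python. This is a minimal in-memory sketch, not the OpenSearch implementation: the function name graph_lookup, its parameters, and the order in which results are returned are assumptions made for illustration. It uses the same employee records as the examples on this page.

```python
EMPLOYEES = [
    {"id": 1, "name": "Dev", "reportsTo": "Eliot"},
    {"id": 2, "name": "Eliot", "reportsTo": "Ron"},
    {"id": 3, "name": "Ron", "reportsTo": "Andrew"},
    {"id": 4, "name": "Andrew", "reportsTo": None},
    {"id": 5, "name": "Asya", "reportsTo": "Ron"},
    {"id": 6, "name": "Dan", "reportsTo": "Andrew"},
]


def graph_lookup(docs, start_values, from_field, to_field,
                 max_depth=0, bidirectional=False):
    """Hypothetical in-memory model of the BFS steps (illustration only)."""
    frontier = {v for v in start_values if v is not None}
    seen_values = set(frontier)   # values already used to query the index
    seen_docs = set()             # positions of documents already returned
    results = []
    depth = 0
    while frontier and depth <= max_depth:
        next_frontier = set()
        for position, doc in enumerate(docs):
            if position in seen_docs:
                continue
            # Step 2: match toField (and fromField when bidirectional).
            hit = doc.get(to_field) in frontier
            if bidirectional and doc.get(from_field) in frontier:
                hit = True
            if not hit:
                continue
            # Step 3: add the matched document to the result array.
            seen_docs.add(position)
            results.append(doc)
            # Step 4: collect values that continue the traversal.
            next_frontier.add(doc.get(from_field))
            if bidirectional:
                next_frontier.add(doc.get(to_field))
        next_frontier.discard(None)
        # Step 5: continue only with values not queried before.
        frontier = next_frontier - seen_values
        seen_values |= frontier
        depth += 1
    return results


hierarchy = graph_lookup(EMPLOYEES, ["Eliot"], "reportsTo", "name", max_depth=1)
print([doc["name"] for doc in hierarchy])  # ['Eliot', 'Ron']
```

Tracking visited values and documents keeps the loop from revisiting nodes when the graph contains cycles.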
The graphLookup command has the following syntax:
source = <sourceIndex> | graphLookup <lookupIndex> start=<startExpression> edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [supportArray=(true | false)] [batchMode=(true | false)] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
graphLookup can be used as the first command (without source):
graphLookup <lookupIndex> start=<startExpression> edge=<fromField><operator><toField> [maxDepth=<maxDepth>] [depthField=<depthField>] [usePIT=(true | false)] [filter=(<condition>)] as <outputField>
The following are examples of the graphLookup command syntax:
source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name as reportingHierarchy
source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name maxDepth=2 as reportingHierarchy
source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name depthField=level as reportingHierarchy
source = employees | graphLookup employees start=reportsTo edge=reportsTo<->name as connections
source = travelers | graphLookup airports start=nearestAirport edge=connects-->airport supportArray=true as reachableAirports
source = airports | graphLookup airports start=airport edge=connects-->airport supportArray=true as reachableAirports
source = employees | graphLookup employees start=reportsTo edge=reportsTo-->name filter=(status = 'active' AND age > 18) as reportingHierarchy
graphLookup employees start='Eliot' edge=reportsTo-->name as reportingHierarchy
graphLookup employees start='Eliot', 'Andrew' edge=reportsTo-->name as reportingHierarchy
graphLookup employees start='Eliot' edge=reportsTo-->name maxDepth=1 depthField=level as reportingHierarchy
The graphLookup command supports the following parameters.
| Parameter | Required/Optional | Description |
|---|---|---|
| <lookupIndex> | Required | The name of the index to perform the graph traversal on. Can be the same as the source index for self-referential graphs. |
| start=<startExpression> | Required | The starting point for the BFS traversal. The startExpression can be a field reference (e.g., start=reportsTo) from the previous pipe, a literal value (e.g., start='Eliot'), or a literal list (e.g., start='Eliot', 'Andrew'). When a field reference is used, the value of that field in each source row initiates the traversal. When literal values are used, they seed the BFS directly. The start value is matched against toField in the lookup index. |
| edge=<fromField><operator><toField> | Required | Defines the traversal path between nodes, specifying the connection fields and the direction of traversal. See Edge Sub-parameters below. |
| maxDepth=<maxDepth> | Optional | The maximum recursion depth (number of hops). Default is 0. A value of 0 returns only direct connections to the start values. A value of 1 returns the initial matches plus one additional recursive step, and so on. |
| depthField=<depthField> | Optional | The name of the field added to each traversed document to indicate its recursion depth. If not specified, no depth field is added. Depth starts at 0 for the first level of matches. |
| supportArray=(true \| false) | Optional | When true, disables early visited-node filter pushdown to OpenSearch. Default is false. Set to true when fromField or toField contains array values to ensure correct traversal behavior. See Array Field Handling for details. |
| batchMode=(true \| false) | Optional | When true, collects all start values from all source rows and performs a single unified BFS traversal. Default is false. The output changes to two arrays: [Array<sourceRows>, Array<lookupResults>]. See Batch Mode for details. |
| usePIT=(true \| false) | Optional | When true, enables Point In Time (PIT) search for the lookup index, allowing paginated retrieval of complete results without the max_result_window size limit. Default is false. See PIT Search for details. |
| filter=(<condition>) | Optional | A filter condition that restricts which lookup index documents participate in the graph traversal. Only documents matching the condition are considered as candidates during BFS. Parentheses around the condition are required. Example: filter=(status = 'active' AND age > 18). |
| as <outputField> | Required | The name of the output array field that will contain all documents discovered during the graph traversal. |
The edge parameter uses the syntax edge=<fromField><operator><toField> and consists of the following components.
| Component | Description |
|---|---|
| fromField | The field in the lookup index documents used as the source of traversal. After a document is matched, the value of this field is used to find the next set of connected documents. Supports both single values and arrays. |
| toField | The field in the lookup index documents used for matching. Documents in which toField equals the current traversal value are included in the results. |
| operator | Specifies the direction of traversal: --> performs a unidirectional traversal from fromField to toField only (for example, edge=reportsTo-->name traverses from reportsTo to name in one direction only); <-> performs a bidirectional traversal between fromField and toField (for example, edge=reportsTo<->name traverses between reportsTo and name in both directions). |
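To make the three components of the edge parameter concrete, the following sketch splits an edge expression into its parts. It is an illustration only: parse_edge and EDGE_PATTERN are hypothetical names, and the set of characters accepted in field names (word characters and dots) is an assumption, not the grammar used by the PPL parser.

```python
import re

# Assumed shape: <fromField><operator><toField>, where operator is --> or <->.
EDGE_PATTERN = re.compile(
    r"^(?P<from_field>[\w.]+)(?P<operator>-->|<->)(?P<to_field>[\w.]+)$"
)


def parse_edge(edge):
    """Split an edge expression into its components (hypothetical helper)."""
    match = EDGE_PATTERN.match(edge.strip())
    if match is None:
        raise ValueError(f"not a valid edge expression: {edge!r}")
    return {
        "from_field": match["from_field"],
        "to_field": match["to_field"],
        "bidirectional": match["operator"] == "<->",
    }


print(parse_edge("reportsTo-->name"))
print(parse_edge("reportsTo<->name"))
```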
Consider an employees index containing the following documents.
| id | name | reportsTo |
|---|---|---|
| 1 | Dev | Eliot |
| 2 | Eliot | Ron |
| 3 | Ron | Andrew |
| 4 | Andrew | null |
| 5 | Asya | Ron |
| 6 | Dan | Andrew |
The following query finds the reporting chain for each employee:
source = employees
| graphLookup employees
start=reportsTo
edge=reportsTo-->name
as reportingHierarchy
The query returns the following results:
+--------+----------+----+-----------------------------------------------+
| name | reportsTo| id | reportingHierarchy |
+--------+----------+----+-----------------------------------------------+
| Dev | Eliot | 1 | [{name:Eliot, reportsTo:Ron, id:2}] |
| Eliot | Ron | 2 | [{name:Ron, reportsTo:Andrew, id:3}] |
| Ron | Andrew | 3 | [{name:Andrew, reportsTo:null, id:4}] |
| Andrew | null | 4 | [] |
| Asya | Ron | 5 | [{name:Ron, reportsTo:Andrew, id:3}] |
| Dan | Andrew | 6 | [{name:Andrew, reportsTo:null, id:4}] |
+--------+----------+----+-----------------------------------------------+
Each element in the reportingHierarchy array is a struct containing named fields from the lookup index. For the employee named Dev, the traversal starts with reportsTo="Eliot", finds the record for Eliot, and includes it in the reportingHierarchy array.
The following query adds a depthField named level to track the number of levels each manager is from the employee:
source = employees
| graphLookup employees
start=reportsTo
edge=reportsTo-->name
depthField=level
as reportingHierarchy
The query returns the following results:
+--------+----------+----+------------------------------------------------------+
| name | reportsTo| id | reportingHierarchy |
+--------+----------+----+------------------------------------------------------+
| Dev | Eliot | 1 | [{name:Eliot, reportsTo:Ron, id:2, level:0}] |
| Eliot | Ron | 2 | [{name:Ron, reportsTo:Andrew, id:3, level:0}] |
| Ron | Andrew | 3 | [{name:Andrew, reportsTo:null, id:4, level:0}] |
| Andrew | null | 4 | [] |
| Asya | Ron | 5 | [{name:Ron, reportsTo:Andrew, id:3, level:0}] |
| Dan | Andrew | 6 | [{name:Andrew, reportsTo:null, id:4, level:0}] |
+--------+----------+----+------------------------------------------------------+
The level field is added to each struct in the result array. A value of 0 indicates the first level of matches.
The following query limits traversal to two levels using maxDepth=1 (depth 0 and 1):
source = employees
| graphLookup employees
start=reportsTo
edge=reportsTo-->name
maxDepth=1
as reportingHierarchy
The query returns the following results:
+--------+----------+----+---------------------------------------------------------------------------------+
| name | reportsTo| id | reportingHierarchy |
+--------+----------+----+---------------------------------------------------------------------------------+
| Dev | Eliot | 1 | [{name:Eliot, reportsTo:Ron, id:2}, {name:Ron, reportsTo:Andrew, id:3}] |
| Eliot | Ron | 2 | [{name:Ron, reportsTo:Andrew, id:3}, {name:Andrew, reportsTo:null, id:4}] |
| Ron | Andrew | 3 | [{name:Andrew, reportsTo:null, id:4}] |
| Andrew | null | 4 | [] |
| Asya | Ron | 5 | [{name:Ron, reportsTo:Andrew, id:3}, {name:Andrew, reportsTo:null, id:4}] |
| Dan | Andrew | 6 | [{name:Andrew, reportsTo:null, id:4}] |
+--------+----------+----+---------------------------------------------------------------------------------+
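The relationship between maxDepth and depthField in the preceding examples can be checked with a small in-memory model. The sketch below is a simplified unidirectional variant written for this page; the function name and the use of the id field to detect already-returned documents are assumptions, not the behavior of the actual command.

```python
EMPLOYEES = [
    {"id": 1, "name": "Dev", "reportsTo": "Eliot"},
    {"id": 2, "name": "Eliot", "reportsTo": "Ron"},
    {"id": 3, "name": "Ron", "reportsTo": "Andrew"},
    {"id": 4, "name": "Andrew", "reportsTo": None},
    {"id": 5, "name": "Asya", "reportsTo": "Ron"},
    {"id": 6, "name": "Dan", "reportsTo": "Andrew"},
]


def lookup_with_depth(docs, start, from_field, to_field,
                      max_depth=0, depth_field=None):
    """Hypothetical model: annotate each match with the level it was found at."""
    frontier = {start} if start is not None else set()
    visited = set(frontier)
    found_ids = set()
    results = []
    # maxDepth=0 means one round (depth 0); maxDepth=1 means two rounds.
    for depth in range(max_depth + 1):
        if not frontier:
            break
        matched = [d for d in docs
                   if d.get(to_field) in frontier and d["id"] not in found_ids]
        for doc in matched:
            found_ids.add(doc["id"])
            row = dict(doc)  # copy so the lookup data is not modified
            if depth_field:
                row[depth_field] = depth
            results.append(row)
        next_values = {d.get(from_field) for d in matched} - {None}
        frontier = next_values - visited
        visited |= frontier
    return results


rows = lookup_with_depth(EMPLOYEES, "Eliot", "reportsTo", "name",
                         max_depth=1, depth_field="level")
print([(r["name"], r["level"]) for r in rows])  # [('Eliot', 0), ('Ron', 1)]
```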
Consider an airports index containing the following documents.
| airport | connects |
|---|---|
| JFK | [BOS, ORD] |
| BOS | [JFK, PWM] |
| ORD | [JFK] |
| PWM | [BOS, LHR] |
| LHR | [PWM] |
The following query finds all airports reachable from each airport:
source = airports
| graphLookup airports
start=airport
edge=connects-->airport
as reachableAirports
The query returns the following results:
+---------+------------+-----------------------------------------------+
| airport | connects | reachableAirports |
+---------+------------+-----------------------------------------------+
| JFK | [BOS, ORD] | [{airport:JFK, connects:[BOS, ORD]}] |
| BOS | [JFK, PWM] | [{airport:BOS, connects:[JFK, PWM]}] |
| ORD | [JFK] | [{airport:ORD, connects:[JFK]}] |
| PWM | [BOS, LHR] | [{airport:PWM, connects:[BOS, LHR]}] |
| LHR | [PWM] | [{airport:LHR, connects:[PWM]}] |
+---------+------------+-----------------------------------------------+
The graphLookup command can use different source and lookup indexes.
Consider a travelers index containing the following documents.
| name | nearestAirport |
|---|---|
| Dev | JFK |
| Eliot | JFK |
| Jeff | BOS |
The following query finds reachable airports for each traveler:
source = travelers
| graphLookup airports
start=nearestAirport
edge=connects-->airport
as reachableAirports
The query returns the following results:
+-------+----------------+-----------------------------------------------+
| name | nearestAirport | reachableAirports |
+-------+----------------+-----------------------------------------------+
| Dev | JFK | [{airport:JFK, connects:[BOS, ORD]}] |
| Eliot | JFK | [{airport:JFK, connects:[BOS, ORD]}] |
| Jeff | BOS | [{airport:BOS, connects:[JFK, PWM]}] |
+-------+----------------+-----------------------------------------------+
The following query performs bidirectional traversal to find both managers and colleagues who share the same manager:
source = employees
| where name = 'Ron'
| graphLookup employees
start=reportsTo
edge=reportsTo<->name
as connections
The query returns the following results:
+------+----------+----+-----------------------------------------------------------------------------------------------------+
| name | reportsTo| id | connections |
+------+----------+----+-----------------------------------------------------------------------------------------------------+
| Ron | Andrew | 3 | [{name:Ron, reportsTo:Andrew, id:3}, {name:Andrew, reportsTo:null, id:4}, {name:Dan, reportsTo:Andrew, id:6}] |
+------+----------+----+-----------------------------------------------------------------------------------------------------+
With bidirectional traversal, Ron's connections include the following records:
- His own record (Ron reports to Andrew).
- His manager (Andrew).
- His peer (Dan, who also reports to Andrew).
When batchMode=true, the graphLookup command collects all start values from all source rows and performs a single unified BFS traversal instead of traversing each row separately.
Use batchMode=true when:
- You want to find all nodes reachable from any of the source start values.
- You need a global view of the graph connectivity from multiple starting points.
- You want to avoid duplicate traversals when multiple source rows share overlapping paths.
In batch mode, the output is a single row containing two arrays:
- All source rows collected.
- All lookup results from the unified BFS traversal.
The following query finds all reachable airports from each traveler's nearest airport:
source = travelers
| graphLookup airports
start=nearestAirport
edge=connects-->airport
batchMode=true
maxDepth=2
as reachableAirports
Standard mode (default): Each traveler is assigned a list of reachable airports:
| name | nearestAirport | reachableAirports |
|-------|----------------|--------------------------------------|
| Dev | JFK | [{airport:JFK, connects:[BOS, ORD]}] |
| Jeff | BOS | [{airport:BOS, connects:[JFK, PWM]}] |
Batch mode: All travelers and all reachable airports are combined into a single result:
| travelers | reachableAirports |
|--------------------------------------------------------------------|-------------------------------------------------------------|
| [{name:Dev, nearestAirport:JFK}, {name:Jeff, nearestAirport:BOS}] | [{airport:JFK, connects:[BOS, ORD]}, {airport:BOS, ...}] |
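The difference in output shape can be modeled as follows. This is a sketch under stated assumptions: batch_graph_lookup, bfs, and as_values are hypothetical helpers, the traversal runs over in-memory lists rather than an index, and the ordering of results is not guaranteed to match OpenSearch. The data is the travelers and airports documents shown earlier, including all three travelers.

```python
AIRPORTS = [
    {"airport": "JFK", "connects": ["BOS", "ORD"]},
    {"airport": "BOS", "connects": ["JFK", "PWM"]},
    {"airport": "ORD", "connects": ["JFK"]},
    {"airport": "PWM", "connects": ["BOS", "LHR"]},
    {"airport": "LHR", "connects": ["PWM"]},
]
TRAVELERS = [
    {"name": "Dev", "nearestAirport": "JFK"},
    {"name": "Eliot", "nearestAirport": "JFK"},
    {"name": "Jeff", "nearestAirport": "BOS"},
]


def as_values(value):
    """Normalize a scalar, list, or missing field value to a list."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


def bfs(docs, start_values, from_field, to_field, max_depth):
    frontier = set(start_values)
    visited = set(frontier)
    results = []
    for _ in range(max_depth + 1):
        if not frontier:
            break
        matched = [d for d in docs
                   if d not in results
                   and frontier.intersection(as_values(d.get(to_field)))]
        results.extend(matched)
        # Array-valued fromField: every element continues the traversal.
        next_values = {v for d in matched for v in as_values(d.get(from_field))}
        frontier = next_values - visited
        visited |= frontier
    return results


def batch_graph_lookup(source_rows, docs, start_field, from_field, to_field,
                       max_depth=0):
    """Model of batchMode=true: one BFS seeded by all rows, two output arrays."""
    start_values = {v for row in source_rows
                    for v in as_values(row.get(start_field))}
    lookup_results = bfs(docs, start_values, from_field, to_field, max_depth)
    return [list(source_rows), lookup_results]


sources, reachable = batch_graph_lookup(
    TRAVELERS, AIRPORTS, "nearestAirport", "connects", "airport", max_depth=2)
print(len(sources), [a["airport"] for a in reachable])
```

Because Dev and Eliot share JFK as a start value, the unified traversal visits JFK once instead of once per traveler.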
When the fromField or toField contains array values, set supportArray=true to ensure correct traversal behavior.
By default, each level of BFS traversal limits the number of returned documents to the max_result_window setting of the lookup index (typically, 10,000). This avoids the overhead of Point In Time (PIT) search but may return incomplete results when a single traversal level matches more documents than the limit.
When usePIT=true, this limit is removed and the lookup table uses PIT-based pagination, which ensures that all matching documents are retrieved at each traversal level. This provides complete and accurate results at the cost of additional search overhead.
Use usePIT=true when:
- The graph contains high-degree nodes for which a single traversal level may return more than max_result_window documents.
- Result completeness is more important than query performance.
- You observe incomplete or missing results with the default setting.
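The trade-off between the two retrieval modes can be illustrated with a toy model of a single traversal level. The sketch below does not call OpenSearch: fetch_level is a hypothetical function, the list of matching documents is held in memory, and slice offsets stand in for real PIT pagination, which pages through results using a PIT ID together with search_after.

```python
def fetch_level(matching_docs, max_result_window=10_000,
                use_pit=False, page_size=1_000):
    """Return the documents one BFS level would retrieve (toy model)."""
    if not use_pit:
        # Single request: anything beyond the window is silently dropped.
        return matching_docs[:max_result_window]
    collected = []
    offset = 0
    while True:
        # Each iteration models one paginated request.
        page = matching_docs[offset:offset + page_size]
        if not page:
            break
        collected.extend(page)
        offset += page_size
    return collected


level_matches = list(range(25))  # pretend 25 documents match this level
print(len(fetch_level(level_matches, max_result_window=10)))          # 10
print(len(fetch_level(level_matches, use_pit=True, page_size=10)))    # 25
```

The paginated path issues three requests for 25 documents in this model, which is the additional search overhead that complete results cost.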
The following query enables PIT search to ensure complete traversal results:
source = employees
| graphLookup employees
start=reportsTo
edge=reportsTo-->name
usePIT=true
as reportingHierarchy
The filter parameter restricts the documents in the lookup index that are considered during BFS traversal. Only documents matching the filter condition are included as candidates at each traversal level.
The following query traverses only active employees in the reporting hierarchy:
source = employees
| graphLookup employees
start=reportsTo
edge=reportsTo-->name
filter=(status = 'active')
as reportingHierarchy
The filter is applied at the OpenSearch query level, so it combines efficiently with the BFS traversal queries. At each BFS level, the query sent to OpenSearch is bool { filter: [user_filter, bfs_terms_query] }.
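As an illustration of that query shape, the following sketch builds the request body for one traversal level as a Python dictionary. The helper name build_level_query is hypothetical, and the translation of status = 'active' into a term query is an assumption; the query that the command actually generates may differ in detail.

```python
import json


def build_level_query(user_filter, to_field, frontier_values):
    """Combine the user filter with the terms query for one BFS level."""
    bfs_terms_query = {"terms": {to_field: sorted(frontier_values)}}
    return {"query": {"bool": {"filter": [user_filter, bfs_terms_query]}}}


# Assumed Query DSL equivalent of filter=(status = 'active').
user_filter = {"term": {"status": "active"}}
body = build_level_query(user_filter, "name", {"Ron", "Eliot"})
print(json.dumps(body, indent=2))
```

Both clauses sit in the filter context of the bool query, so neither contributes to relevance scoring.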
When the starting points for graph traversal are known in advance, graphLookup can be used as the first command in a pipeline without source. In this case, start accepts literal values instead of a field reference.
This is useful when:
- You want to explore the graph from specific known nodes.
- You don't need source document fields in the output.
- You want a quick lookup without creating a source query first.
Single start value:
graphLookup employees
start='Eliot'
edge=reportsTo-->name
as reportingHierarchy
The query returns a single row containing the BFS results:
+---------------------------------------------------------------+
| reportingHierarchy |
+---------------------------------------------------------------+
| [{name:Eliot, reportsTo:Ron, id:2}, {name:Ron, ...}, ...] |
+---------------------------------------------------------------+
Multiple start values:
graphLookup employees
start='Eliot', 'Andrew'
edge=reportsTo-->name
as reportingHierarchy
All literal start values are combined into a single BFS traversal. The output is a single row with all discovered nodes.
With depth tracking:
graphLookup employees
start='Eliot'
edge=reportsTo-->name
depthField=level
as reportingHierarchy
Note the following limitations of the graphLookup command:
- The source input, which provides the starting points for traversal, is limited to 100 documents to avoid performance issues.
- When usePIT=false (default), each traversal level returns up to the max_result_window of the lookup index, which may result in incomplete results. Set usePIT=true to retrieve complete results.