===========
Limitations
===========

.. rubric:: Table of contents

.. contents::
   :local:
   :depth: 2

Inconsistent Field Types Across Indices
=======================================

* If the same field has different types across indices (for example, ``field`` is a ``string`` in one index and an ``integer`` in another), PPL selects the field type from one of the indices, and which index is chosen is non-deterministic. Fields with the other types are ignored during query execution.
* For ``object`` fields, `PPL merges subfields from different indices to tolerate schema variations <https://github.com/opensearch-project/sql/issues/3625>`_.
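
To make the merge behavior concrete, the following Python sketch combines the ``properties`` sections of several index mappings. It is an illustration under stated assumptions, not the plugin's implementation: ``merge_properties`` is a hypothetical helper, object subfields are unioned recursively, and for a leaf field with conflicting types it deterministically keeps the first type it sees, whereas PPL's choice is non-deterministic.

.. code-block:: python

   from typing import Any, Dict, List

   # A mapping's "properties" section: field name -> field definition.
   Properties = Dict[str, Any]


   def merge_properties(all_props: List[Properties]) -> Properties:
       """Merge the ``properties`` of several index mappings (hypothetical helper).

       Object fields (definitions that contain "properties") are merged
       recursively, so a subfield that exists in any index is kept. For any
       other conflict, the first definition seen wins and later ones are ignored.
       """
       merged: Properties = {}
       for props in all_props:
           for name, spec in props.items():
               existing = merged.get(name)
               if existing is None:
                   merged[name] = spec
               elif "properties" in existing and "properties" in spec:
                   # Both definitions are objects: union their subfields.
                   merged[name] = {
                       **existing,
                       "properties": merge_properties(
                           [existing["properties"], spec["properties"]]
                       ),
                   }
               # Otherwise keep the existing definition; this one is ignored.
       return merged


   # Two indices that disagree on "status" and define different "user" subfields.
   index_a = {"status": {"type": "keyword"}, "user": {"properties": {"id": {"type": "long"}}}}
   index_b = {"status": {"type": "integer"}, "user": {"properties": {"name": {"type": "text"}}}}
   merged = merge_properties([index_a, index_b])

Here ``status`` keeps the ``keyword`` definition from the first index, while ``user`` exposes both ``id`` and ``name``.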

Unsupported OpenSearch Field Types
==================================

PPL does not support all `OpenSearch data types <https://docs.opensearch.org/docs/latest/field-types/supported-field-types/index/>`_ (for example, ``flattened`` and some complex ``nested`` usages). Unsupported fields are excluded from ``DESCRIBE`` and ``SOURCE`` outputs. At runtime, queries that reference unsupported fields fail with semantic or resolution errors. In projections, such fields are ignored unless they are explicitly filtered out or removed at ingestion.

The following OpenSearch field types are ignored by PPL:

+---------------------------+-------------+
| OpenSearch Data Type      | PPL Support |
+===========================+=============+
| knn_vector                | Ignored     |
+---------------------------+-------------+
| Range field types         | Ignored     |
+---------------------------+-------------+
| Object - flat_object      | Ignored     |
+---------------------------+-------------+
| Object - join             | Ignored     |
+---------------------------+-------------+
| String - Match-only text  | Ignored     |
+---------------------------+-------------+
| String - Wildcard         | Ignored     |
+---------------------------+-------------+
| String - token_count      | Ignored     |
+---------------------------+-------------+
| String - constant_keyword | Ignored     |
+---------------------------+-------------+
| Autocomplete              | Ignored     |
+---------------------------+-------------+
| Geoshape                  | Ignored     |
+---------------------------+-------------+
| Cartesian field types     | Ignored     |
+---------------------------+-------------+
| Rank field types          | Ignored     |
+---------------------------+-------------+
| Star-tree                 | Ignored     |
+---------------------------+-------------+
| Derived                   | Ignored     |
+---------------------------+-------------+
| Percolator                | Ignored     |
+---------------------------+-------------+
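
The effect on ``DESCRIBE``-style output can be sketched in Python: flatten a mapping's ``properties`` into dotted paths and drop any field whose type is ignored. The helper ``visible_fields`` and the ``IGNORED_TYPES`` set are assumptions for illustration only. The set maps some rows of the table above to OpenSearch mapping type names (for example, Autocomplete to ``completion`` and ``search_as_you_type``); Star-tree and Derived are not declared as ordinary ``properties`` types and are left out. Check the names against the OpenSearch version you run.

.. code-block:: python

   from typing import Any, Dict

   # Assumed mapping type names for some rows of the ignored-types table;
   # verify them against your OpenSearch version.
   IGNORED_TYPES = {
       "knn_vector",
       "integer_range", "long_range", "date_range", "ip_range",
       "flat_object",
       "join",
       "match_only_text",
       "wildcard",
       "token_count",
       "constant_keyword",
       "completion", "search_as_you_type",
       "geo_shape",
       "xy_point", "xy_shape",
       "rank_feature", "rank_features",
       "percolator",
   }


   def visible_fields(properties: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
       """Flatten mapping ``properties`` into {dotted.path: type}, skipping ignored types."""
       result: Dict[str, str] = {}
       for name, spec in properties.items():
           path = prefix + name
           if "properties" in spec:
               # Object or nested field: recurse into its subfields.
               result.update(visible_fields(spec["properties"], path + "."))
           elif spec.get("type") not in IGNORED_TYPES:
               result[path] = spec.get("type", "object")
       return result


   mapping = {
       "title": {"type": "text"},
       "embedding": {"type": "knn_vector", "dimension": 3},
       "user": {"properties": {"name": {"type": "keyword"}, "pattern": {"type": "wildcard"}}},
   }
   print(visible_fields(mapping))  # {'title': 'text', 'user.name': 'keyword'}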

Field Parameters
================

For a field to be queryable in PPL, the following mapping settings must be enabled:

+----------------------+--------------------------------------------------+--------------------------------------------------+
| Setting              | Description                                      | Required For                                     |
+======================+==================================================+==================================================+
| ``_source: true``    | Stores the original JSON document                | Required to fetch raw data                       |
+----------------------+--------------------------------------------------+--------------------------------------------------+
| ``index: true``      | Enables field indexing                           | Required for filtering, search, and aggregations |
+----------------------+--------------------------------------------------+--------------------------------------------------+
| ``doc_values: true`` | Enables columnar access for aggregations/sorting | Required for ``stats`` and ``sort``              |
+----------------------+--------------------------------------------------+--------------------------------------------------+
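
The table above can be read as a per-field checklist. The following minimal Python sketch reports which PPL operations a field cannot support, given that field's mapping. ``ppl_capability_gaps`` is a hypothetical helper, and it assumes that a missing ``index`` or ``doc_values`` key means the setting is enabled, which matches the OpenSearch default for most field types.

.. code-block:: python

   from typing import Any, Dict, List


   def ppl_capability_gaps(field_mapping: Dict[str, Any], source_enabled: bool = True) -> List[str]:
       """Return the PPL operations a field cannot support (hypothetical helper).

       ``field_mapping`` is one field's definition from the index mapping;
       ``source_enabled`` reflects the index-level ``_source`` setting. A missing
       ``index`` or ``doc_values`` key is treated as enabled (assumed default).
       """
       gaps: List[str] = []
       if not source_enabled:
           gaps.append("fetch raw data")
       if not field_mapping.get("index", True):
           gaps.append("filtering, search, and aggregations")
       if not field_mapping.get("doc_values", True):
           gaps.append("stats and sort")
       return gaps


   # A numeric field that is neither indexed nor stored in columnar form.
   print(ppl_capability_gaps({"type": "long", "index": False, "doc_values": False}))
   # prints ['filtering, search, and aggregations', 'stats and sort']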

Nested Field Behavior
=====================

* There are `limitations <https://github.com/opensearch-project/sql/issues/52>`_ on the supported nesting levels and query types for nested fields that still need improvement.

Multi-value Field Behavior
==========================

OpenSearch does not natively support an ARRAY data type, but any field can implicitly hold multiple values. The
SQL/PPL plugin adheres strictly to the data type semantics defined in index mappings: when parsing OpenSearch
responses, it expects data to match the declared type and does not account for data in array format. If the
``plugins.query.field_type_tolerance`` setting is enabled, the SQL/PPL plugin handles array data by returning
scalar data types, which allows basic queries (for example, ``source = tbl | where condition``). However, using
multi-value fields in expressions or functions results in exceptions. If this setting is disabled or absent, only
the first element of an array is returned, preserving the default behavior.
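
The first-element behavior described above can be illustrated with a short Python sketch that reduces each multi-value field in a document's ``_source`` to its first element. The helper names are hypothetical, this is not how the plugin parses responses, and returning ``None`` for an empty array is an assumption the text does not specify.

.. code-block:: python

   from typing import Any, Dict


   def first_value(value: Any) -> Any:
       """Reduce a possibly multi-value field to one value (hypothetical helper).

       A JSON array becomes its first element; an empty array becomes None
       (an assumption; the text does not cover empty arrays).
       Non-array values are returned unchanged.
       """
       if isinstance(value, list):
           return value[0] if value else None
       return value


   def reduce_source(source: Dict[str, Any]) -> Dict[str, Any]:
       """Apply ``first_value`` to every top-level field of a document."""
       return {name: first_value(v) for name, v in source.items()}


   doc = {"host": "web-1", "tags": ["prod", "eu"], "ports": [80, 443]}
   print(reduce_source(doc))  # {'host': 'web-1', 'tags': 'prod', 'ports': 80}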

Unsupported Functionalities in Calcite Engine
=============================================

Starting with version 3.0.0, Apache Calcite is introduced as an experimental query engine. For details, see `introduce v3 engine <../../../dev/intro-v3-engine.md>`_.
Queries that use any of the following functionalities are forwarded to the V2 query engine, so they cannot be combined with the PPL commands and functions introduced in 3.0.0 and later:

* All SQL queries
* PPL queries against non-OpenSearch data sources
* ``dedup`` with ``consecutive=true``
* Search relevant commands
* AD
* ML
* Kmeans
* The ``show datasources`` command
* Commands with the ``fetch_size`` parameter

Malformed Field Names in Object Fields
======================================

OpenSearch normally rejects field names containing problematic dot patterns (such as ``.``, ``..``, ``.a``, ``a.``, or ``a..b``). However, when an object field has ``enabled: false``, OpenSearch skips field name validation for that object's contents and allows documents with arbitrary field names to be stored.

If a document contains malformed field names inside an object field, PPL ignores those fields. Other valid fields in the document are returned normally.

**Example of affected data:**

.. code-block:: json

   {
     "log": {
       ".": "value1",
       ".a": "value2",
       "a.": "value3",
       "a..b": "value4"
     }
   }

When ``log`` is an object field with ``enabled: false``, the subfields with malformed names are ignored.

**Recommendation:** Avoid field names that contain leading dots, trailing dots, or consecutive dots, or that consist only of dots. This aligns with OpenSearch's default field naming requirements.
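
Documents can be checked against this naming rule before ingestion with a small Python sketch. It relies on one observation: splitting a name on ``.`` produces an empty segment whenever the name has a leading dot, a trailing dot, or consecutive dots, or consists only of dots (an empty name is flagged too). The helper names are hypothetical; the sketch illustrates the rule and is not how the plugin detects malformed names.

.. code-block:: python

   from typing import Any, Dict


   def is_malformed_field_name(name: str) -> bool:
       """True if ``name`` has a leading dot, a trailing dot, consecutive dots,
       or consists only of dots (an empty name is also reported as malformed).

       Splitting on "." yields an empty segment in each of these cases.
       """
       return "" in name.split(".")


   def drop_malformed_subfields(obj: Dict[str, Any]) -> Dict[str, Any]:
       """Keep only the subfields of an object whose names are well formed."""
       return {key: value for key, value in obj.items() if not is_malformed_field_name(key)}


   log = {".": "value1", ".a": "value2", "a.": "value3", "a..b": "value4", "level": "info"}
   print(drop_malformed_subfields(log))  # {'level': 'info'}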
