
Commit b47b7d6

ritvibhatt authored and asifabashar committed
Update PPL Command Documentation (opensearch-project#4562)
* remove version info
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix defaults
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update main docs and move aggregate functions
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix format
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix heading underlines
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix typos/content
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix formatting
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update rex limitations
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update underlines and bullet points
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update function files
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update index.rst with commands table
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update formatting
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update wording
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* move syntax
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update note
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix append docs
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix subbullet formatting
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix subbullet formatting
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix bullet points
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix bin format
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix bullet points
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* update index.rst
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix stats
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
* fix type in regexp_match
  Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>

---------

Signed-off-by: Ritvi Bhatt <ribhatt@amazon.com>
Signed-off-by: ritvibhatt <53196324+ritvibhatt@users.noreply.github.com>
1 parent 0263d58 commit b47b7d6

48 files changed

Lines changed: 1457 additions & 2461 deletions


docs/user/ppl/cmd/ad.rst

Lines changed: 29 additions & 28 deletions
@@ -10,41 +10,43 @@ ad (deprecated by ml command)
 
 
 Description
-============
+===========
 | The ``ad`` command applies Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed in time RCF for processing time-series data, batch RCF for processing non-time-series data.
 
 
-Fixed In Time RCF For Time-series Data Command Syntax
-=====================================================
-ad <number_of_trees> <shingle_size> <sample_size> <output_after> <time_decay> <anomaly_rate> <time_field> <date_format> <time_zone>
+Syntax
+======
 
-* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30.
-* shingle_size(integer): optional. A shingle is a consecutive sequence of the most recent records. The default value is 8.
-* sample_size(integer): optional. The sample size used by stream samplers in this forest. The default value is 256.
-* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32.
-* time_decay(double): optional. The decay factor used by stream samplers in this forest. The default value is 0.0001.
-* anomaly_rate(double): optional. The anomaly rate. The default value is 0.005.
-* time_field(string): mandatory. It specifies the time field for RCF to use as time-series data.
-* date_format(string): optional. It's used for formatting time_field field. The default formatting is "yyyy-MM-dd HH:mm:ss".
-* time_zone(string): optional. It's used for setting time zone for time_field filed. The default time zone is UTC.
-* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted.
+Fixed In Time RCF For Time-series Data
+--------------------------------------
+ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]
 
+* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
+* shingle_size: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
+* sample_size: optional. The sample size used by stream samplers in this forest. **Default:** 256.
+* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
+* time_decay: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
+* anomaly_rate: optional. The anomaly rate. **Default:** 0.005.
+* time_field: mandatory. Specifies the time field for RCF to use as time-series data.
+* date_format: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss".
+* time_zone: optional. Used for setting time zone for time_field. **Default:** "UTC".
+* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
 
-Batch RCF for Non-time-series Data Command Syntax
-=================================================
-ad <number_of_trees> <sample_size> <output_after> <training_data_size> <anomaly_score_threshold>
+Batch RCF For Non-time-series Data
+----------------------------------
+ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]
 
-* number_of_trees(integer): optional. Number of trees in the forest. The default value is 30.
-* sample_size(integer): optional. Number of random samples given to each tree from the training data set. The default value is 256.
-* output_after(integer): optional. The number of points required by stream samplers before results are returned. The default value is 32.
-* training_data_size(integer): optional. The default value is the size of your training data set.
-* anomaly_score_threshold(double): optional. The threshold of anomaly score. The default value is 1.0.
-* category_field(string): optional. It specifies the category field used to group inputs. Each category will be independently predicted.
+* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
+* sample_size: optional. Number of random samples given to each tree from the training data set. **Default:** 256.
+* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
+* training_data_size: optional. **Default:** size of your training data set.
+* anomaly_score_threshold: optional. The threshold of anomaly score. **Default:** 1.0.
+* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
 
 Example 1: Detecting events in New York City from taxi ridership data with time-series data
 ===========================================================================================
 
-The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
+This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
 
 PPL query::
 
@@ -59,7 +61,7 @@ PPL query::
 Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category
 ============================================================================================================================
 
-The example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.
+This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.
 
 PPL query::
 
@@ -76,7 +78,7 @@ PPL query::
 Example 3: Detecting events in New York City from taxi ridership data with non-time-series data
 ===============================================================================================
 
-The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.
+This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.
 
 PPL query::
 
@@ -91,7 +93,7 @@ PPL query::
 Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category
 ================================================================================================================================
 
-The example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.
+This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.
 
 PPL query::
 
@@ -108,4 +110,3 @@ PPL query::
 Limitations
 ===========
 The ``ad`` command can only work with ``plugins.calcite.enabled=false``.
-It means ``ad`` command cannot work together with new PPL commands/functions introduced in 3.0.0 and above.
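
Reviewer note: the defaults in the updated fixed in time RCF parameter list can be gathered in one place, which makes it easy to see which options a query actually needs to spell out. The Python sketch below is illustrative only. The helper `build_ad_command` is hypothetical, and the `name=value` rendering with quoted strings is an assumption, because the example queries are elided from this diff. The default values themselves are copied from the added lines above.

```python
# Defaults copied from the "Fixed In Time RCF" parameter list in the diff.
# time_field is mandatory and category_field has no documented default,
# so neither appears here.
FIXED_IN_TIME_DEFAULTS = {
    "number_of_trees": 30,
    "shingle_size": 8,
    "sample_size": 256,
    "output_after": 32,
    "time_decay": 0.0001,
    "anomaly_rate": 0.005,
    "date_format": "yyyy-MM-dd HH:mm:ss",
    "time_zone": "UTC",
}


def build_ad_command(time_field, **overrides):
    """Hypothetical helper: render an ``ad`` clause, omitting default-valued options.

    The name=value output format is an assumption, not taken from the commit.
    """
    allowed = set(FIXED_IN_TIME_DEFAULTS) | {"category_field"}
    unknown = set(overrides) - allowed
    if unknown:
        raise ValueError(f"unknown ad parameters: {sorted(unknown)}")
    if not time_field:
        raise ValueError("time_field is mandatory for fixed in time RCF")

    parts = ["ad", f"time_field='{time_field}'"]
    for name, value in overrides.items():
        if FIXED_IN_TIME_DEFAULTS.get(name) == value:
            continue  # same as the documented default, so leave it out
        rendered = f"'{value}'" if isinstance(value, str) else str(value)
        parts.append(f"{name}={rendered}")
    return " ".join(parts)
```

For example, `build_ad_command("timestamp", shingle_size=8, category_field="category")` drops `shingle_size` because 8 is the documented default and keeps only the time field and the category field.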

docs/user/ppl/cmd/append.rst

Lines changed: 8 additions & 13 deletions
@@ -1,6 +1,6 @@
-=========
+======
 append
-=========
+======
 
 .. rubric:: Table of contents
 
@@ -10,16 +10,12 @@ append
 
 
 Description
-============
-| Using ``append`` command to append the result of a sub-search and attach it as additional rows to the bottom of the input search results (The main search).
-The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
-
-Version
-=======
-3.3.0
+===========
+| The ``append`` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (The main search).
+| The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
 
 Syntax
-============
+======
 append <sub-search>
 
 * sub-search: mandatory. Executes PPL commands as a secondary search.
@@ -30,7 +26,7 @@ Limitations
 * **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with ``eval`` or using ``fields`` to select non-conflicting columns).
 
 Example 1: Append rows from a count aggregation to existing search result
-===============================================================
+=========================================================================
 
 This example appends rows from "count by gender" to "sum by gender, state".
 
@@ -50,7 +46,7 @@ PPL query::
 +----------+--------+-------+------------+
 
 Example 2: Append rows with merged column names
-====================================================================================
+===============================================
 
 This example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type.
 
@@ -68,4 +64,3 @@ PPL query::
 | 28 | F | null |
 | 101 | M | null |
 +-----+--------+-------+
-
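
Reviewer note: the row and column behaviour described for ``append`` (rows attached at the bottom, same-named columns aligned, missing columns filled with NULL, incompatible types rejected) can be modelled in a few lines. This is a minimal Python sketch of those documented semantics, not the plugin's implementation. The column names `total`, `gender`, and `state` and the main-search row in the usage note are made up for illustration; only the sub-search values 28/F and 101/M come from the Example 2 output above.

```python
def append(main_rows, sub_rows):
    """Illustrative model of append: sub-search rows are placed below the main rows."""
    all_rows = main_rows + sub_rows

    # Collect column names in first-seen order, main search first.
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    # Same-named columns must carry the same type; None (NULL) is allowed anywhere.
    seen_types = {}
    for row in all_rows:
        for name, value in row.items():
            if value is None:
                continue
            expected = seen_types.setdefault(name, type(value))
            if type(value) is not expected:
                raise TypeError(f"incompatible types for field {name!r}")

    # Columns a row does not have are filled with None (shown as null in PPL output).
    return [{name: row.get(name) for name in columns} for row in all_rows]
```

With a made-up main row `{"total": 36, "gender": "M", "state": "TN"}` and sub-search rows `{"total": 28, "gender": "F"}` and `{"total": 101, "gender": "M"}`, the appended rows come back with `state` set to `None`, which matches the `null` cells in the table above.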

docs/user/ppl/cmd/appendcol.rst

Lines changed: 7 additions & 35 deletions
@@ -11,47 +11,15 @@ appendcol
 
 Description
 ============
-| (Experimental)
-| (From 3.1.0)
-| Using ``appendcol`` command to append the result of a sub-search and attach it alongside with the input search results (The main search).
-
-Version
-=======
-3.1.0
+The ``appendcol`` command appends the result of a sub-search and attaches it alongside with the input search results (The main search).
 
 Syntax
-============
+======
 appendcol [override=<boolean>] <sub-search>
 
-* override=<boolean>: optional. Boolean field to specify should result from main-result be overwritten in the case of column name conflict.
+* override=<boolean>: optional. Boolean field to specify should result from main-result be overwritten in the case of column name conflict. **Default:** false.
 * sub-search: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input.
 
-Configuration
-=============
-This command requires Calcite enabled.
-
-Enable Calcite::
-
->> curl -H 'Content-Type: application/json' -X PUT localhost:9200/_plugins/_query/settings -d '{
-"transient" : {
-"plugins.calcite.enabled" : true
-}
-}'
-
-Result set::
-
-{
-"acknowledged": true,
-"persistent": {
-"plugins": {
-"calcite": {
-"enabled": "true"
-}
-}
-},
-"transient": {}
-}
-
 Example 1: Append a count aggregation to existing search result
 ===============================================================
 
@@ -103,6 +71,8 @@ PPL query::
 Example 3: Append multiple sub-search results
 =============================================
 
+This example shows how to chain multiple appendcol commands to add columns from different sub-searches.
+
 PPL query::
 
 PPL> source=employees | fields name, dept, age | appendcol [ stats avg(age) as avg_age ] | appendcol [ stats max(age) as max_age ];
@@ -124,6 +94,8 @@ PPL query::
 Example 4: Override case of column name conflict
 ================================================
 
+This example demonstrates the override option when column names conflict between main search and sub-search.
+
 PPL query::
 
 PPL> source=employees | stats avg(age) as agg by dept | appendcol override=true [ stats max(age) as agg by dept ];
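
Reviewer note: the ``override`` option, now documented with a default of false, decides which side wins when the main search and the sub-search produce a column with the same name. The Python sketch below models that rule only. Pairing rows by position and padding the shorter side are assumptions made for the sketch; the diff does not say how rows are matched. The sample values in the usage note are made up.

```python
from itertools import zip_longest


def appendcol(main_rows, sub_rows, override=False):
    """Illustrative model of appendcol: place sub-search columns alongside main rows.

    Assumption: rows are paired by position, and a missing partner row
    contributes no columns.
    """
    merged = []
    for main, sub in zip_longest(main_rows, sub_rows, fillvalue={}):
        row = dict(main)
        for name, value in sub.items():
            if name in row and not override:
                continue  # name conflict: keep the main-search value (the default)
            row[name] = value
        merged.append(row)
    return merged
```

Mirroring the Example 4 query, where both sides name their aggregate `agg`: with `override=False` a main row `{"dept": "sales", "agg": 30.0}` keeps its own `agg`, and with `override=True` the sub-search value replaces it.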

docs/user/ppl/cmd/appendpipe.rst

Lines changed: 1 addition & 5 deletions
@@ -11,13 +11,9 @@ appendpipe
 
 Description
 ============
-| Using ``appendpipe`` command to appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first.The subpipeline is run when the search reaches the appendpipe command.
+| The ``appendpipe`` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first.The subpipeline is run when the search reaches the appendpipe command.
 The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
 
-Version
-=======
-3.3.0
-
 Syntax
 ============
 appendpipe [<subpipeline>]
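
Reviewer note: the description's key point is that the subpipeline runs when the search reaches ``appendpipe``, so it sees the rows produced so far rather than the original source. A minimal Python sketch of that ordering follows; it treats the subpipeline as a plain function over the current rows, which is a modelling assumption and not how the plugin represents it.

```python
def appendpipe(rows, subpipeline):
    """Illustrative model of appendpipe: run the subpipeline on the current rows, then append."""
    extra = subpipeline(rows)  # evaluated here, on the results produced so far
    all_rows = rows + extra

    # Align columns by name; a column that a row lacks is filled with None (NULL).
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in all_rows]
```

For instance, passing a subpipeline that sums a made-up `age` column over the incoming rows yields the original rows followed by one extra row that holds only the total, with the other columns set to `None`.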
