Commit 6dcd025

Format PPL cmd/ docs for documentation-website.
Signed-off-by: Kyle Hounslow <kylhouns@amazon.com>
1 parent cbcdbd6 commit 6dcd025

60 files changed

Lines changed: 2213 additions & 1495 deletions

docs/user/ppl/cmd/ad.md

Lines changed: 32 additions & 24 deletions
@@ -1,34 +1,38 @@
1-
# ad (deprecated by ml command)
1+
# ad (deprecated by ml command)
22

3-
## Description
43

5-
The `ad` command applies Random Cut Forest (RCF) algorithm in the ml-commons plugin on the search result returned by a PPL command. Based on the input, the command uses two types of RCF algorithms: fixed-in-time RCF for processing time-series data, batch RCF for processing non-time-series data.
6-
## Syntax
4+
The `ad` command applies the Random Cut Forest (RCF) algorithm in the ml-commons plugin to the search results returned by a PPL command. Based on the input, the command uses one of two types of RCF algorithms: fixed-in-time RCF for processing time-series data or batch RCF for processing non-time-series data.
75

8-
## Fixed In Time RCF For Time-series Data
6+
## Syntax
97

10-
ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] \<time_field\> [date_format] [time_zone] [category_field]
11-
* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
12-
* shingle_size: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
13-
* sample_size: optional. The sample size used by stream samplers in this forest. **Default:** 256.
14-
* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
15-
* time_decay: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
16-
* anomaly_rate: optional. The anomaly rate. **Default:** 0.005.
17-
* time_field: mandatory. Specifies the time field for RCF to use as time-series data.
18-
* date_format: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss".
19-
* time_zone: optional. Used for setting time zone for time_field. **Default:** "UTC".
20-
* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
8+
The following sections describe the syntax for each RCF algorithm type.
9+
10+
## Fixed in time RCF for time-series data
11+
12+
`ad [number_of_trees] [shingle_size] [sample_size] [output_after] [time_decay] [anomaly_rate] <time_field> [date_format] [time_zone] [category_field]`
13+
* `number_of_trees`: optional. Number of trees in the forest. **Default:** 30.
14+
* `shingle_size`: optional. A shingle is a consecutive sequence of the most recent records. **Default:** 8.
15+
* `sample_size`: optional. The sample size used by stream samplers in this forest. **Default:** 256.
16+
* `output_after`: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
17+
* `time_decay`: optional. The decay factor used by stream samplers in this forest. **Default:** 0.0001.
18+
* `anomaly_rate`: optional. The anomaly rate. **Default:** 0.005.
19+
* `time_field`: mandatory. Specifies the time field for RCF to use as time-series data.
20+
* `date_format`: optional. Used for formatting time_field. **Default:** "yyyy-MM-dd HH:mm:ss".
21+
* `time_zone`: optional. Used for setting time zone for time_field. **Default:** "UTC".
22+
* `category_field`: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
2123

22-
## Batch RCF For Non-time-series Data
2324

24-
ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]
25-
* number_of_trees: optional. Number of trees in the forest. **Default:** 30.
26-
* sample_size: optional. Number of random samples given to each tree from the training data set. **Default:** 256.
27-
* output_after: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
28-
* training_data_size: optional. **Default:** size of your training data set.
29-
* anomaly_score_threshold: optional. The threshold of anomaly score. **Default:** 1.0.
30-
* category_field: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
25+
## Batch RCF for non-time-series data
26+
27+
`ad [number_of_trees] [sample_size] [output_after] [training_data_size] [anomaly_score_threshold] [category_field]`
28+
* `number_of_trees`: optional. Number of trees in the forest. **Default:** 30.
29+
* `sample_size`: optional. Number of random samples given to each tree from the training dataset. **Default:** 256.
30+
* `output_after`: optional. The number of points required by stream samplers before results are returned. **Default:** 32.
31+
* `training_data_size`: optional. **Default:** size of your training dataset.
32+
* `anomaly_score_threshold`: optional. The threshold of anomaly score. **Default:** 1.0.
33+
* `category_field`: optional. Specifies the category field used to group inputs. Each category will be independently predicted.
3134

35+
3236
## Example 1: Detecting events in New York City from taxi ridership data with time-series data
3337

3438
This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data.
@@ -51,6 +55,7 @@ fetched rows / total rows = 1/1
5155
+---------+---------------------+-------+---------------+
5256
```
5357

58+
5459
## Example 2: Detecting events in New York City from taxi ridership data with time-series data independently with each category
5560

5661
This example trains an RCF model and uses the model to detect anomalies in the time-series ridership data with multiple category values.
@@ -74,6 +79,7 @@ fetched rows / total rows = 2/2
7479
+----------+---------+---------------------+-------+---------------+
7580
```
7681

82+
7783
## Example 3: Detecting events in New York City from taxi ridership data with non-time-series data
7884

7985
This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data.
@@ -96,6 +102,7 @@ fetched rows / total rows = 1/1
96102
+---------+-------+-----------+
97103
```
98104

105+
99106
## Example 4: Detecting events in New York City from taxi ridership data with non-time-series data independently with each category
100107

101108
This example trains an RCF model and uses the model to detect anomalies in the non-time-series ridership data with multiple category values.
@@ -119,6 +126,7 @@ fetched rows / total rows = 2/2
119126
+----------+---------+-------+-----------+
120127
```
121128

129+
122130
## Limitations
123131

124132
The `ad` command can only work with `plugins.calcite.enabled=false`.

docs/user/ppl/cmd/addcoltotals.md

Lines changed: 10 additions & 9 deletions
@@ -1,21 +1,22 @@
1-
# AddColTotals
1+
# addcoltotals
22

33

4-
# Description
54

6-
The `addcoltotals` command computes the sum of each column and add a summary event at the end to show the total of each column. This command works the same way `addtotals` command works with row=false and col=true option. This is useful for creating summary reports with subtotals or grand totals. The `addcoltotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields in the field list are ignored even if its specified in field-list or in the case of no field-list specified.
5+
The `addcoltotals` command computes the sum of each column and adds a summary event at the end to show the total of each column. It works the same way as the `addtotals` command with the `row=false` and `col=true` options. This is useful for creating summary reports with subtotals or grand totals. The `addcoltotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields are ignored, whether they are named explicitly in `field-list` or picked up because no `field-list` is specified.
76

8-
# Syntax
7+
## Syntax
8+
9+
Use the following syntax:
910

1011
`addcoltotals [field-list] [label=<string>] [labelfield=<field>]`
1112

1213
- `field-list`: Optional. Comma-separated list of numeric fields to sum. If not specified, all numeric fields are summed.
1314
- `labelfield=<field>`: Optional. Name of the field in which to place the label. If the field does not exist, the command adds it and shows the label in that field on the summary event row.
1415
- `label=<string>`: Optional. Custom text shown in `labelfield` on the totals row. Default is \"Total\".
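The behavior described for `addcoltotals` can be modeled outside PPL. The following Python function is a minimal sketch, not the plugin's implementation: `add_col_totals` and `is_numeric` are hypothetical names, rows are plain dictionaries, and the rule that a label never replaces a computed total is an assumption made for illustration.

```python
from numbers import Number


def is_numeric(value):
    # bool is a Number subclass in Python, so exclude it explicitly.
    return isinstance(value, Number) and not isinstance(value, bool)


def add_col_totals(rows, field_list=None, label="Total", labelfield=None):
    """Hypothetical model: append one summary row with per-column sums."""
    # Collect column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    candidates = columns if field_list is None else field_list
    summary = {name: None for name in columns}
    for name in candidates:
        values = [row[name] for row in rows if is_numeric(row.get(name))]
        if values:  # non-numeric or unknown fields are ignored
            summary[name] = sum(values)

    out = [dict(row) for row in rows]
    if labelfield is not None:
        if labelfield not in columns:
            # A non-existing label field is added to every row.
            for row in out:
                row[labelfield] = None
        # Assumption: the label is only written where no total is stored.
        if summary.get(labelfield) is None:
            summary[labelfield] = label
    out.append(summary)
    return out
```

Calling `add_col_totals(rows, labelfield="firstname")` on rows that have `firstname` and `balance` keys returns the input rows plus one extra row whose `firstname` is the label and whose `balance` is the column sum.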
1516

16-
# Example 1: Basic Example
17+
# Example 1: Basic example
1718

18-
The example shows placing the label in an existing field.
19+
The following example PPL query shows how to use `addcoltotals` to place the label in an existing field.
1920

2021
```ppl
2122
source=accounts
@@ -38,9 +39,9 @@ fetched rows / total rows = 4/4
3839
+-----------+---------+
3940
```
4041

41-
# Example 2: Adding column totals and adding a summary event with label specified.
42+
# Example 2: Adding column totals and adding a summary event with label specified
4243

43-
The example shows adding totals after a stats command where final summary event label is \'Sum\' and row=true value was used by default when not specified. It also added new field specified by labelfield as it did not match existing field.
44+
The following example PPL query shows how to use `addcoltotals` to add totals after a `stats` command. The final summary event label is \'Sum\', and the default `row=true` value applies because it is not specified. The query also adds the new field specified by `labelfield` because it does not match an existing field.
4445

4546
```ppl
4647
source=accounts
@@ -63,7 +64,7 @@ fetched rows / total rows = 3/3
6364

6465
# Example 3: With all options
6566

66-
The example shows using addcoltotals with all options set.
67+
The following example PPL query shows how to use `addcoltotals` with all options set.
6768

6869
```ppl
6970
source=accounts

docs/user/ppl/cmd/addtotals.md

Lines changed: 10 additions & 9 deletions
@@ -1,12 +1,13 @@
1-
# AddTotals
1+
# addtotals
22

33

4-
## Description
54

6-
The `addtotals` command computes the sum of numeric fields and appends a row with the totals to the result. The command can also add row totals and add a field to store row totals. This is useful for creating summary reports with subtotals or grand totals. The `addtotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields in the field list are ignored even if it\'s specified in field-list or in the case of no field-list specified.
5+
The `addtotals` command computes the sum of numeric fields and appends a row with the totals to the result. The command can also add row totals and add a field to store row totals. This is useful for creating summary reports with subtotals or grand totals. The `addtotals` command only sums numeric fields (integers, floats, doubles). Non-numeric fields are ignored, whether they are named explicitly in `field-list` or picked up because no `field-list` is specified.
76

87
## Syntax
98

9+
Use the following syntax:
10+
1011
`addtotals [field-list] [label=<string>] [labelfield=<field>] [row=<boolean>] [col=<boolean>] [fieldname=<field>]`
1112

1213
- `field-list`: Optional. Comma-separated list of numeric fields to sum. If not specified, all numeric fields are summed.
@@ -16,9 +17,9 @@ The `addtotals` command computes the sum of numeric fields and appends a row wit
1617
- `label=<string>`: Optional. Custom text for the totals row labelfield\'s label. Default is \"Total\". This is applicable when col=true. This does not have any effect when labelfield and fieldname parameter both have same value.
1718
- `fieldname=<field>`: Optional. Calculates total of each row and add a new field to store this total. This is applicable when row=true.
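To make the `row=true` behavior concrete, here is a minimal Python sketch of per-row totals. It is an illustration only: `add_row_totals` is a hypothetical name, rows are plain dictionaries, and the default field name `"Total"` is an assumption inferred from the examples in this file rather than a confirmed default.

```python
from numbers import Number


def add_row_totals(rows, field_list=None, fieldname="Total"):
    """Hypothetical model: add a field holding the sum of each row's numeric values."""
    out = []
    for row in rows:
        names = list(row) if field_list is None else field_list
        total = sum(
            row[name]
            for name in names
            # Skip missing and non-numeric values; bool is excluded on purpose.
            if isinstance(row.get(name), Number) and not isinstance(row.get(name), bool)
        )
        # Copy the row and store the per-row total under `fieldname`.
        out.append({**row, fieldname: total})
    return out
```

Each output row keeps its original fields and gains one new field, which mirrors how `fieldname=<field>` is described above. Column totals (`col=true`) are a separate step and are not modeled here.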
1819

19-
## Example 1: Basic Example
20+
## Example 1: Basic example
2021

21-
The example shows placing the label in an existing field.
22+
The following example PPL query shows how to use `addtotals` to place the label in an existing field.
2223

2324
```ppl
2425
source=accounts
@@ -41,9 +42,9 @@ fetched rows / total rows = 4/4
4142
+-----------+---------+-------+
4243
```
4344

44-
## Example 2: Adding column totals and adding a summary event with label specified.
45+
## Example 2: Adding column totals and adding a summary event with label specified
4546

46-
The example shows adding totals after a stats command where final summary event label is \'Sum\'. It also added new field specified by labelfield as it did not match existing field.
47+
The following example PPL query shows how to use `addtotals` to add totals after a `stats` command, where the final summary event label is \'Sum\'. The query also adds the new field specified by `labelfield` because it does not match an existing field.
4748

4849
```ppl
4950
source=accounts
@@ -66,7 +67,7 @@ fetched rows / total rows = 5/5
6667
+----------------+-----------+---------+-----+-------+
6768
```
6869

69-
if row=true in above example, there will be conflict between column added for column totals and column added for row totals being same field \'Total\', in that case the output will have final event row label null instead of \'Sum\' because the column is number type and it cannot output String in number type column.
70+
If `row=true` is set in the preceding example, the column added for column totals and the column added for row totals conflict because both use the same field, \'Total\'. In that case, the final event row label in the output is null instead of \'Sum\' because the column is a number type and cannot hold a string.
7071

7172
```ppl
7273
source=accounts
@@ -91,7 +92,7 @@ fetched rows / total rows = 5/5
9192

9293
## Example 3: With all options
9394

94-
The example shows using addtotals with all options set.
95+
The following example PPL query shows how to use `addtotals` with all options set.
9596

9697
```ppl
9798
source=accounts

docs/user/ppl/cmd/append.md

Lines changed: 16 additions & 10 deletions
@@ -1,21 +1,26 @@
1-
# append
1+
# append
22

3-
## Description
43

5-
The `append` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (The main search).
4+
The `append` command appends the result of a sub-search and attaches it as additional rows to the bottom of the input search results (the main search).
5+
66
The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
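The column alignment described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the engine's implementation: `append_rows` is a hypothetical name, rows are dictionaries, `None` stands in for NULL, and the type compatibility check is not modeled.

```python
def append_rows(main_rows, sub_rows):
    """Hypothetical model: stack sub-search rows under the main rows."""
    combined = list(main_rows) + list(sub_rows)

    # Union of column names, in first-seen order.
    columns = []
    for row in combined:
        for name in row:
            if name not in columns:
                columns.append(name)

    # Columns missing from a row are filled with None (NULL).
    return [{name: row.get(name) for name in columns} for row in combined]
```

Columns that share a name end up in the same output column, while columns that exist on only one side are NULL in the rows from the other side.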
7-
## Syntax
87

9-
append \<sub-search\>
10-
* sub-search: mandatory. Executes PPL commands as a secondary search.
8+
## Syntax
9+
10+
Use the following syntax:
11+
12+
`append <sub-search>`
13+
* `sub-search`: mandatory. Executes PPL commands as a secondary search.
1114

15+
1216
## Limitations
1317

1418
* **Schema Compatibility**: When fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with `eval` or using `fields` to select non-conflicting columns).
1519

16-
## Example 1: Append rows from a count aggregation to existing search result
1720

18-
This example appends rows from "count by gender" to "sum by gender, state".
21+
## Example 1: Append rows from a count aggregation to existing search results
22+
23+
The following example appends rows from "count by gender" to "sum by gender, state".
1924

2025
```ppl
2126
source=accounts | stats sum(age) by gender, state | sort -`sum(age)` | head 5 | append [ source=accounts | stats count(age) by gender ]
@@ -37,9 +42,10 @@ fetched rows / total rows = 6/6
3742
+----------+--------+-------+------------+
3843
```
3944

40-
## Example 2: Append rows with merged column names
4145

42-
This example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type.
46+
## Example 2: Append rows with merged column names
47+
48+
The following example appends rows from "sum by gender" to "sum by gender, state" with merged column of same field name and type.
4349

4450
```ppl
4551
source=accounts | stats sum(age) as sum by gender, state | sort -sum | head 5 | append [ source=accounts | stats sum(age) as sum by gender ]

docs/user/ppl/cmd/appendcol.md

Lines changed: 16 additions & 10 deletions
@@ -1,15 +1,18 @@
1-
# appendcol
1+
# appendcol
22

3-
## Description
43

5-
The `appendcol` command appends the result of a sub-search and attaches it alongside with the input search results (The main search).
6-
## Syntax
4+
The `appendcol` command appends the result of a sub-search and attaches it alongside the input search results (the main search).
75

8-
appendcol [override=\<boolean\>] \<sub-search\>
6+
## Syntax
7+
8+
Use the following syntax:
9+
10+
`appendcol [override=<boolean>] <sub-search>`
911
* `override=<boolean>`: optional. Specifies whether results from the main search are overwritten when column names conflict. **Default:** false.
10-
* sub-search: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input.
12+
* `sub-search`: mandatory. Executes PPL commands as a secondary search. The sub-search uses the same data specified in the source clause of the main search results as its input.
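The effect of `override` can be illustrated with a small Python model. Treat it as a sketch under assumptions: `appendcol` and `ordered_columns` here are hypothetical Python functions, rows are paired by position, `None` stands in for NULL, and main rows without a matching sub-search row are assumed to keep their own values.

```python
from itertools import zip_longest


def ordered_columns(rows):
    # Column names in first-seen order.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


def appendcol(main_rows, sub_rows, override=False):
    """Hypothetical model: attach sub-search columns alongside main rows."""
    main_cols = ordered_columns(main_rows)
    sub_cols = ordered_columns(sub_rows)
    out = []
    for main, sub in zip_longest(main_rows, sub_rows):
        row = {name: (main or {}).get(name) for name in main_cols}
        for name in sub_cols:
            if name in main_cols:
                # Name conflict: the sub-search value wins only with override.
                if override and sub is not None:
                    row[name] = sub.get(name)
            else:
                row[name] = None if sub is None else sub.get(name)
        out.append(row)
    return out
```

With `override=False`, a conflicting column such as `gender` keeps the main search value. With `override=True`, the value from the sub-search row at the same position replaces it.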
1113

12-
## Example 1: Append a count aggregation to existing search result
14+
15+
## Example 1: Append a count aggregation to existing search results
1316

1417
This example appends "count by gender" to "sum by gender, state".
1518

@@ -40,7 +43,8 @@ fetched rows / total rows = 10/10
4043
+--------+-------+----------+------------+
4144
```
4245

43-
## Example 2: Append a count aggregation to existing search result with override option
46+
47+
## Example 2: Append a count aggregation to existing search results with override option
4448

4549
This example appends "count by gender" to "sum by gender, state" with override option.
4650

@@ -71,9 +75,10 @@ fetched rows / total rows = 10/10
7175
+--------+-------+----------+------------+
7276
```
7377

78+
7479
## Example 3: Append multiple sub-search results
7580

76-
This example shows how to chain multiple appendcol commands to add columns from different sub-searches.
81+
The following example PPL query shows how to chain multiple `appendcol` commands to add columns from different sub-searches.
7782

7883
```ppl
7984
source=employees
@@ -101,9 +106,10 @@ fetched rows / total rows = 9/9
101106
+------+-------------+-----+------------------+---------+
102107
```
103108

109+
104110
## Example 4: Override case of column name conflict
105111

106-
This example demonstrates the override option when column names conflict between main search and sub-search.
112+
The following example PPL query demonstrates how to use `appendcol` with the override option when column names conflict between main search and sub-search.
107113

108114
```ppl
109115
source=employees

docs/user/ppl/cmd/appendpipe.md

Lines changed: 12 additions & 7 deletions
@@ -1,15 +1,18 @@
1-
# appendpipe
1+
# appendpipe
22

3-
## Description
43

5-
The `appendpipe` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first.The subpipeline is run when the search reaches the appendpipe command.
4+
The `appendpipe` command appends the result of the subpipeline to the search results. Unlike a subsearch, the subpipeline is not run first. The subpipeline is run when the search reaches the appendpipe command.
65
The command aligns columns with the same field names and types. For different column fields between the main search and sub-search, NULL values are filled in the respective rows.
7-
## Syntax
86

9-
appendpipe [\<subpipeline\>]
10-
* subpipeline: mandatory. A list of commands that are applied to the search results from the commands that occur in the search before the `appendpipe` command.
7+
## Syntax
8+
9+
Use the following syntax:
10+
11+
`appendpipe [<subpipeline>]`
12+
* `subpipeline`: mandatory. A list of commands that are applied to the search results from the commands that occur in the search before the `appendpipe` command.
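The difference from a subsearch is that the subpipeline consumes the results produced so far. The Python sketch below models that idea only: `appendpipe` and `total_by_gender` are hypothetical functions, the subpipeline is any callable from rows to rows, and `None` stands in for NULL.

```python
def appendpipe(rows, subpipeline):
    """Hypothetical model: run subpipeline on current rows, then append its output."""
    # The subpipeline sees a copy of the rows that exist at this point.
    sub_rows = subpipeline([dict(row) for row in rows])
    combined = list(rows) + list(sub_rows)

    columns = []
    for row in combined:
        for name in row:
            if name not in columns:
                columns.append(name)
    return [{name: row.get(name) for name in columns} for row in combined]


def total_by_gender(rows):
    # Stand-in for a subpipeline that totals the "sum" field per gender.
    totals = {}
    for row in rows:
        totals[row["gender"]] = totals.get(row["gender"], 0) + row["sum"]
    return [{"total": total, "gender": gender} for gender, total in totals.items()]
```

Calling `appendpipe(rows, total_by_gender)` returns the original rows followed by one total row per gender, with NULL in the columns that the subpipeline output does not have.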
1113

12-
## Example 1: Append rows from a total count to existing search result
14+
15+
## Example 1: Append rows from a total count to existing search results
1316

1417
This example appends rows from "total by gender" to "sum by gender, state" with merged column of same field name and type.
1518

@@ -37,6 +40,7 @@ fetched rows / total rows = 6/6
3740
+------+--------+-------+-------+
3841
```
3942

43+
4044
## Example 2: Append rows with merged column names
4145

4246
This example appends rows from "count by gender" to "sum by gender, state".
@@ -65,6 +69,7 @@ fetched rows / total rows = 6/6
6569
+----------+--------+-------+
6670
```
6771

72+
6873
## Limitations
6974

7075
* **Schema Compatibility**: Same as command `append`, when fields with the same name exist between the main search and sub-search but have incompatible types, the query will fail with an error. To avoid type conflicts, ensure that fields with the same name have the same data type, or use different field names (e.g., by renaming with `eval` or using `fields` to select non-conflicting columns).
