Commit e5cbf6e

add generate_uuid() in the functions
Signed-off-by: Xun Zhang <xunzh@amazon.com>
1 parent e381f4b commit e5cbf6e

3 files changed

Lines changed: 75 additions & 2 deletions

_data-prepper/pipelines/configuration/processors/add-entries.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ You can configure the `add_entries` processor with the following options.
 | `metadata_key` | No | The key for the new metadata attribute. The argument must be a literal string key and not a JSON Pointer. Either one string key or `metadata_key` is required. |
 | `value` | No | The value of the new entry to be added, which can be used with any of the following data types: strings, Booleans, numbers, null, nested objects, and arrays. |
 | `format` | No | A format string to use as the value of the new entry, for example, `${key1}-${key2}`, where `key1` and `key2` are existing keys in the event. Required if neither `value` nor `value_expression` is specified. |
-| `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). |
+| `value_expression` | No | An expression string to use as the value of the new entry. For example, `/key` is an existing key in the event with a type of either a number, a string, or a Boolean. Expressions can also contain functions returning number/string/integer. For example, `length(/key)` will return the length of the key in the event when the key is a string. For more information about keys, see [Expression syntax]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/). For more information about functions, see [Functions]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/functions/). |
 | `add_when` | No | A [conditional expression]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/expression-syntax/), such as `/some-key == "test"'`, that will be evaluated to determine whether the processor will be run on the event. |
 | `overwrite_if_key_exists` | No | When set to `true`, the existing value is overwritten if `key` already exists in the event. The default value is `false`. |
 | `append_if_key_exists` | No | When set to `true`, the existing value will be appended if a `key` already exists in the event. An array will be created if the existing value is not an array. Default is `false`. |

_data-prepper/pipelines/functions.md

Lines changed: 2 additions & 1 deletion
@@ -8,10 +8,11 @@ has_children: true
 
 # Functions
 
-OpenSearch Data Prepper offers a range of built-in functions that can be used within expressions to perform common data preprocessing tasks, such as calculating lengths, checking for tags, retrieving metadata, searching for substrings, checking IP address ranges, and joining list elements. These functions include the following:
+OpenSearch Data Prepper offers a range of built-in functions that can be used within expressions to perform common data preprocessing tasks, such as calculating lengths, checking for tags, retrieving metadata, searching for substrings, checking IP address ranges, joining list elements, and generating unique identifiers. These functions include the following:
 
 - [`cidrContains()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/cidrcontains/)
 - [`contains()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/contains/)
+- [`generate_uuid()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/generate-uuid/)
 - [`getMetadata()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-metadata/)
 - [`getEventType()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/get-eventtype/)
 - [`hasTags()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/has-tags/)
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+---
+layout: default
+title: generate_uuid()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 11
+---
+
+# generate_uuid()
+
+The `generate_uuid()` function takes no arguments and returns a randomly generated [UUID version 4](https://www.rfc-editor.org/rfc/rfc4122) string, for example, `"550e8400-e29b-41d4-a716-446655440000"`. Each call produces a unique value using a cryptographically strong random number generator, so collision probability is negligible in practice.
+
+This function is useful when source records do not contain a natural unique identifier---for example, when running asynchronous batch inference jobs that require a stable key to match inference results back to the original records.
+
+## Usage
+
+Use `generate_uuid()` as a `value_expression` in the [`add_entries`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/processors/add-entries/) processor:
+
+```yaml
+processor:
+  - add_entries:
+      entries:
+        - key: recordId
+          value_expression: 'generate_uuid()'
+```
+{% include copy.html %}
+
+This adds a `recordId` field containing a unique UUID to every event passing through the processor.
+
+## Example
+
+The following pipeline assigns a unique `recordId` to each incoming log record before forwarding it to OpenSearch:
+
+```yaml
+uuid-demo-pipeline:
+  source:
+    http:
+      ssl: false
+
+  processor:
+    - add_entries:
+        entries:
+          - key: recordId
+            value_expression: 'generate_uuid()'
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: admin_password
+        index_type: custom
+        index: demo-index-%{yyyy.MM.dd}
+```
+{% include copy.html %}
+
+Given the following input event:
+
+```json
+{ "message": "user login", "user": "alice" }
+```
+{% include copy.html %}
+
+The document stored in OpenSearch contains the following information:
+
+```json
+{
+  "message": "user login",
+  "user": "alice",
+  "recordId": "550e8400-e29b-41d4-a716-446655440000"
+}
+```
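
Note on the example value: the new page states that `generate_uuid()` returns UUID version 4 strings. The Python sketch below is not part of this commit or of Data Prepper; it only illustrates how a downstream consumer might check that a stored `recordId` has that shape. The helper name `is_uuid4` is hypothetical, and the check assumes the function emits the canonical lowercase, hyphenated form shown in the docs.

```python
import uuid


def is_uuid4(value: str) -> bool:
    """Return True if value is a canonical RFC 4122 version 4 UUID string."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return False
    # uuid.UUID also accepts braces, missing hyphens, and uppercase hex,
    # so compare against the canonical form in addition to version/variant.
    return (
        str(parsed) == value
        and parsed.version == 4
        and parsed.variant == uuid.RFC_4122
    )


# The sample value used in the documentation added by this commit.
example = "550e8400-e29b-41d4-a716-446655440000"
print(is_uuid4(example))
print(is_uuid4(str(uuid.uuid4())))
print(is_uuid4("not-a-uuid"))
```

The version is carried by the first hex digit of the third group (`41d4`), and the variant by the first hex digit of the fourth group (`a716`), which is why the documented sample passes this check.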
