Commit 08ff56d

Add substring expression function documentation (#12094)
* Add substring expression function documentation

Add documentation for four new Data Prepper expression functions: substringAfter, substringBefore, substringAfterLast, and substringBeforeLast. These functions extract portions of a string by delimiter and were added in opensearch-project/data-prepper#6621. Update the functions index page to include the new functions.

Resolves: opensearch-project/data-prepper#6612

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>

* Doc review

Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>

---------

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>
Signed-off-by: Fanit Kolchina <kolchfa@amazon.com>
Co-authored-by: Fanit Kolchina <kolchfa@amazon.com>
1 parent ab34058 commit 08ff56d

6 files changed

Lines changed: 454 additions & 3 deletions

_data-prepper/pipelines/functions.md

Lines changed: 5 additions & 2 deletions
@@ -17,6 +17,9 @@ OpenSearch Data Prepper offers a range of built-in functions that can be used wi
 - [`hasTags()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/has-tags/)
 - [`join()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/join/)
 - [`length()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/length/)
-- [`subList()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/sublist/)
 - [`startsWith()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/startswith/)
-
+- [`subList()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/sublist/)
+- [`substringAfter()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-after/)
+- [`substringAfterLast()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-after-last/)
+- [`substringBefore()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-before/)
+- [`substringBeforeLast()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-before-last/)

_data-prepper/pipelines/sublist.md

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ layout: default
 title: subList()
 parent: Functions
 grand_parent: Pipelines
-nav_order: 35
+nav_order: 50
 ---
 
 # subList(<key>, <start_index, inclusive>, <end_index, exclusive>)
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+---
+layout: default
+title: substringAfterLast()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 70
+---
+
+# substringAfterLast()
+
+The `substringAfterLast()` function is used to extract the portion of a string that follows the last occurrence of a specified delimiter. It takes two arguments:
+
+1. The first argument is either a literal string or a JSON pointer that represents the source string.
+
+1. The second argument is the delimiter string to search for within the first argument.
+
+If the delimiter is found, the function returns the portion of the string after the last occurrence of the delimiter. If the delimiter is not found, the original string is returned. If the source resolves to `null`, the function returns `null`. If the delimiter is `null` or empty, the original string is returned.
+
+For example, to extract the file extension from the `/filepath` field containing a file path, use the `substringAfterLast()` function as follows:
+
+```
+'substringAfterLast(/filepath, ".")'
+```
+{% include copy.html %}
+
+If the `/filepath` field contains `archive.tar.gz`, the function returns `gz`.
+
+Alternatively, you can use a literal string as the first argument:
+
+```
+'substringAfterLast("one-two-three", "-")'
+```
+{% include copy.html %}
+
+The function returns `three` because it extracts the portion of the string after the last `-` character.
+
+The `substringAfterLast()` function performs a case-sensitive search.
+{: .note}
+
+## Example
+
+The following pipeline uses the `substringAfterLast()` function to extract the file name from a full file path. It adds the extracted file name as a new field called `filename`:
+
+```yaml
+substring-after-last-demo:
+  source:
+    http:
+      ssl: false
+
+  processor:
+    - add_entries:
+        entries:
+          - key: filename
+            value_expression: 'substringAfterLast(/filepath, "/")'
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: admin_password
+        index_type: custom
+        index: demo-index-%{yyyy.MM.dd}
+```
+{% include copy.html %}
+
+You can test the pipeline using the following command:
+
+```bash
+curl -sS -X POST "http://localhost:2021/log/ingest" \
+  -H "Content-Type: application/json" \
+  -d '[
+    {"filepath":"/var/log/syslog"},
+    {"filepath":"/home/user/docs/report.pdf"}
+  ]'
+```
+{% include copy.html %}
+
+The documents stored in OpenSearch contain the following information:
+
+```json
+{
+  ...
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "demo-index-2026.03.13",
+        "_id": "abc123",
+        "_score": 1,
+        "_source": {
+          "filepath": "/var/log/syslog",
+          "filename": "syslog"
+        }
+      },
+      {
+        "_index": "demo-index-2026.03.13",
+        "_id": "def456",
+        "_score": 1,
+        "_source": {
+          "filepath": "/home/user/docs/report.pdf",
+          "filename": "report.pdf"
+        }
+      }
+    ]
+  }
+}
+```
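
Reviewer aid: the delimiter rules documented for `substringAfterLast()` above can be modeled in a few lines of Python. This is only a sketch of the documented behavior, not Data Prepper's Java implementation. The name `substring_after_last` is hypothetical, and the sketch assumes the source is either a string or `None`, because the page does not say how other value types are handled.

```python
from typing import Optional


def substring_after_last(source: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Model of the documented rules for substringAfterLast() (illustrative only)."""
    if source is None:
        return None  # a null source yields null
    if not delimiter:
        return source  # a null or empty delimiter returns the original string
    index = source.rfind(delimiter)  # case-sensitive search for the last occurrence
    if index == -1:
        return source  # delimiter not found: original string
    return source[index + len(delimiter):]


# Inputs taken from the examples in the page above.
print(substring_after_last("archive.tar.gz", "."))   # gz
print(substring_after_last("one-two-three", "-"))    # three
print(substring_after_last("/var/log/syslog", "/"))  # syslog
```

The three printed values match the results stated in the page, which is a quick way to check the documented examples against the documented rules.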
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+---
+layout: default
+title: substringAfter()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 60
+---
+
+# substringAfter()
+
+The `substringAfter()` function is used to extract the portion of a string that follows the first occurrence of a specified delimiter. It takes two arguments:
+
+1. The first argument is either a literal string or a JSON pointer that represents the source string.
+
+1. The second argument is the delimiter string to search for within the first argument.
+
+If the delimiter is found, the function returns the portion of the string after the first occurrence of the delimiter. If the delimiter is not found, the original string is returned. If the source resolves to `null`, the function returns `null`. If the delimiter is `null` or empty, the original string is returned.
+
+For example, to extract the value after the first occurrence of the `=` character in a field named `header`, use the `substringAfter()` function as follows:
+
+```
+'substringAfter(/header, "=")'
+```
+{% include copy.html %}
+
+If `/header` contains `Content-Type=application/json`, the function returns `application/json`.
+
+Alternatively, you can use a literal string as the first argument:
+
+```
+'substringAfter("hello-world-foo", "-")'
+```
+{% include copy.html %}
+
+The function returns `world-foo` because it extracts the portion of the string after the first `-` character.
+
+The `substringAfter()` function performs a case-sensitive search.
+{: .note}
+
+## Example
+
+The following pipeline uses the `substringAfter()` function to extract the domain name from an email address field. It adds the extracted domain name as a new field called `domain`:
+
+```yaml
+substring-after-demo:
+  source:
+    http:
+      ssl: false
+
+  processor:
+    - add_entries:
+        entries:
+          - key: domain
+            value_expression: 'substringAfter(/email, "@")'
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: admin_password
+        index_type: custom
+        index: demo-index-%{yyyy.MM.dd}
+```
+{% include copy.html %}
+
+You can test the pipeline using the following command:
+
+```bash
+curl -sS -X POST "http://localhost:2021/log/ingest" \
+  -H "Content-Type: application/json" \
+  -d '[
+    {"email":"user@example.com"},
+    {"email":"admin@opensearch.org"}
+  ]'
+```
+{% include copy.html %}
+
+The documents stored in OpenSearch contain the following information:
+
+```json
+{
+  ...
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "demo-index-2026.03.13",
+        "_id": "abc123",
+        "_score": 1,
+        "_source": {
+          "email": "user@example.com",
+          "domain": "example.com"
+        }
+      },
+      {
+        "_index": "demo-index-2026.03.13",
+        "_id": "def456",
+        "_score": 1,
+        "_source": {
+          "email": "admin@opensearch.org",
+          "domain": "opensearch.org"
+        }
+      }
+    ]
+  }
+}
+```
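
Reviewer aid: a similar Python sketch for `substringAfter()`, which differs from `substringAfterLast()` only in splitting at the first occurrence of the delimiter. As before, this models the behavior described in the page and is not the Data Prepper implementation. The name `substring_after` is hypothetical, and only string or `None` sources are covered.

```python
from typing import Optional


def substring_after(source: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Model of the documented rules for substringAfter() (illustrative only)."""
    if source is None:
        return None  # a null source yields null
    if not delimiter:
        return source  # a null or empty delimiter returns the original string
    # str.partition splits at the FIRST occurrence; the separator slot is
    # empty when the delimiter does not occur in the source.
    _head, found, tail = source.partition(delimiter)
    return tail if found else source


# Inputs taken from the examples in the page above.
print(substring_after("Content-Type=application/json", "="))  # application/json
print(substring_after("hello-world-foo", "-"))                # world-foo
print(substring_after("user@example.com", "@"))               # example.com

# The search is case-sensitive, so a lowercase "k" does not match "Key=Value".
print(substring_after("Key=Value", "k"))                      # Key=Value
```

The last call illustrates the case-sensitivity note: because the delimiter is not found, the original string comes back unchanged.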
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
+---
+layout: default
+title: substringBeforeLast()
+parent: Functions
+grand_parent: Pipelines
+nav_order: 90
+---
+
+# substringBeforeLast()
+
+The `substringBeforeLast()` function is used to extract the portion of a string that precedes the last occurrence of a specified delimiter. It takes two arguments:
+
+1. The first argument is either a literal string or a JSON pointer that represents the source string.
+
+1. The second argument is the delimiter string to search for within the first argument.
+
+If the delimiter is found, the function returns the portion of the string before the last occurrence of the delimiter. If the delimiter is not found, the original string is returned. If the source resolves to `null`, the function returns `null`. If the delimiter is `null` or empty, the original string is returned.
+
+For example, to remove the file extension from a filename field, use the `substringBeforeLast()` function as follows:
+
+```
+'substringBeforeLast(/filename, ".")'
+```
+{% include copy.html %}
+
+If the `/filename` field contains `archive.tar.gz`, the function returns `archive.tar`.
+
+Alternatively, you can use a literal string as the first argument:
+
+```
+'substringBeforeLast("one-two-three", "-")'
+```
+{% include copy.html %}
+
+The function returns `one-two` because it extracts the portion of the string before the last `-` character.
+
+The `substringBeforeLast()` function performs a case-sensitive search.
+{: .note}
+
+## Example
+
+The following pipeline uses the `substringBeforeLast()` function to extract the directory path from a full file path. It adds the extracted directory path as a new field called `directory`:
+
+```yaml
+substring-before-last-demo:
+  source:
+    http:
+      ssl: false
+
+  processor:
+    - add_entries:
+        entries:
+          - key: directory
+            value_expression: 'substringBeforeLast(/filepath, "/")'
+
+  sink:
+    - opensearch:
+        hosts: ["https://opensearch:9200"]
+        insecure: true
+        username: admin
+        password: admin_password
+        index_type: custom
+        index: demo-index-%{yyyy.MM.dd}
+```
+{% include copy.html %}
+
+You can test the pipeline using the following command:
+
+```bash
+curl -sS -X POST "http://localhost:2021/log/ingest" \
+  -H "Content-Type: application/json" \
+  -d '[
+    {"filepath":"/var/log/syslog"},
+    {"filepath":"/home/user/docs/report.pdf"}
+  ]'
+```
+{% include copy.html %}
+
+The documents stored in OpenSearch contain the following information:
+
+```json
+{
+  ...
+  "hits": {
+    "total": {
+      "value": 2,
+      "relation": "eq"
+    },
+    "max_score": 1,
+    "hits": [
+      {
+        "_index": "demo-index-2026.03.13",
+        "_id": "abc123",
+        "_score": 1,
+        "_source": {
+          "filepath": "/var/log/syslog",
+          "directory": "/var/log"
+        }
+      },
+      {
+        "_index": "demo-index-2026.03.13",
+        "_id": "def456",
+        "_score": 1,
+        "_source": {
+          "filepath": "/home/user/docs/report.pdf",
+          "directory": "/home/user/docs"
+        }
+      }
+    ]
+  }
+}
+```
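
Reviewer aid: the `substringBeforeLast()` pipeline example above can be approximated offline with the Python sketch below. It applies a model of the documented rules to the two sample events from the `curl` command and adds a `directory` key, roughly as the `add_entries` processor is described as doing. The names `substring_before_last` and `events` are hypothetical, and the sketch makes no claim about how Data Prepper evaluates expressions internally.

```python
from typing import Optional


def substring_before_last(source: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Model of the documented rules for substringBeforeLast() (illustrative only)."""
    if source is None:
        return None  # a null source yields null
    if not delimiter:
        return source  # a null or empty delimiter returns the original string
    # str.rpartition splits at the LAST occurrence; the separator slot is
    # empty when the delimiter does not occur in the source.
    head, found, _tail = source.rpartition(delimiter)
    return head if found else source


# The two sample events sent by the curl command in the page above.
events = [
    {"filepath": "/var/log/syslog"},
    {"filepath": "/home/user/docs/report.pdf"},
]

# Mimic the add_entries step: key "directory", value from the expression.
for event in events:
    event["directory"] = substring_before_last(event["filepath"], "/")

for event in events:
    print(event)
```

The resulting `directory` values, `/var/log` and `/home/user/docs`, agree with the `_source` documents shown in the page.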
