Commit 80cd183

Add substring expression function documentation

Add documentation for four new Data Prepper expression functions: substringAfter, substringBefore, substringAfterLast, and substringBeforeLast. These functions extract portions of a string by delimiter and were added in opensearch-project/data-prepper#6621.

Update the functions index page to include the new functions.

Resolves: opensearch-project/data-prepper#6612

Signed-off-by: Nikhil Bagmar <nikhilbagmar73@gmail.com>
1 parent d9db2b6 commit 80cd183

5 files changed

Lines changed: 453 additions & 2 deletions

_data-prepper/pipelines/functions.md

Lines changed: 5 additions & 2 deletions
@@ -17,6 +17,9 @@ OpenSearch Data Prepper offers a range of built-in functions that can be used wi
 - [`hasTags()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/has-tags/)
 - [`join()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/join/)
 - [`length()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/length/)
-- [`subList()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/sublist/)
 - [`startsWith()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/startswith/)
-
+- [`subList()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/sublist/)
+- [`substringAfter()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-after/)
+- [`substringAfterLast()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-after-last/)
+- [`substringBefore()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-before/)
+- [`substringBeforeLast()`]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/substring-before-last/)
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
layout: default
title: substringAfterLast()
parent: Functions
grand_parent: Pipelines
nav_order: 55
---

# substringAfterLast()

The `substringAfterLast()` function extracts the portion of a string that follows the last occurrence of a specified delimiter. It takes two arguments:

- The first argument is either a literal string or a JSON pointer that represents the source string.

- The second argument is the delimiter string to search for within the first argument.

If the delimiter is found, the function returns everything after its last occurrence. If the delimiter is not found, or if the delimiter is `null` or empty, the function returns the original string. If the source resolves to `null`, the function returns `null`.

For example, to extract the file extension from a path field, use the `substringAfterLast()` function as follows:

```
'substringAfterLast(/filepath, ".")'
```
{% include copy.html %}

If `/filepath` contains `archive.tar.gz`, this returns `gz`.

Alternatively, you can use a literal string as the first argument:

```
'substringAfterLast("one-two-three", "-")'
```
{% include copy.html %}

This returns `three` because it extracts everything after the last `-`.

The `substringAfterLast()` function performs a case-sensitive search.
{: .note}
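
The return rules described above can be summarized in a short Python sketch. This is not Data Prepper code and makes no claim about how the function is implemented. The helper name `substring_after_last` is hypothetical, and Python's `None` stands in for `null`:

```python
from typing import Optional


def substring_after_last(source: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Illustrative stand-in for the documented substringAfterLast() rules."""
    if source is None:
        return None  # null source: the result is null
    if not delimiter:
        return source  # null or empty delimiter: original string
    index = source.rfind(delimiter)  # case-sensitive search for the last match
    if index == -1:
        return source  # delimiter not found: original string
    return source[index + len(delimiter):]


print(substring_after_last("archive.tar.gz", "."))   # gz
print(substring_after_last("one-two-three", "-"))    # three
print(substring_after_last("/var/log/syslog", "/"))  # syslog
print(substring_after_last("no-delimiter", "@"))     # no-delimiter
```

Because the search is case sensitive, a delimiter such as `"B"` does not match `b` in the source string, so the sketch returns the original string in that case.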

## Example

The following pipeline uses the `substringAfterLast()` function to extract the file name from a full file path and adds it as a new field called `filename`:

```yaml
substring-after-last-demo:
  source:
    http:
      ssl: false

  processor:
    - add_entries:
        entries:
          - key: filename
            value_expression: 'substringAfterLast(/filepath, "/")'

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: admin_password
        index_type: custom
        index: demo-index-%{yyyy.MM.dd}
```
{% include copy.html %}

You can test the pipeline using the following command:

```bash
curl -sS -X POST "http://localhost:2021/log/ingest" \
  -H "Content-Type: application/json" \
  -d '[
    {"filepath":"/var/log/syslog"},
    {"filepath":"/home/user/docs/report.pdf"}
  ]'
```
{% include copy.html %}

The documents stored in OpenSearch contain the following information:

```json
{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "demo-index-2026.03.13",
        "_id": "abc123",
        "_score": 1,
        "_source": {
          "filepath": "/var/log/syslog",
          "filename": "syslog"
        }
      },
      {
        "_index": "demo-index-2026.03.13",
        "_id": "def456",
        "_score": 1,
        "_source": {
          "filepath": "/home/user/docs/report.pdf",
          "filename": "report.pdf"
        }
      }
    ]
  }
}
```
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
layout: default
title: substringAfter()
parent: Functions
grand_parent: Pipelines
nav_order: 45
---

# substringAfter()

The `substringAfter()` function extracts the portion of a string that follows the first occurrence of a specified delimiter. It takes two arguments:

- The first argument is either a literal string or a JSON pointer that represents the source string.

- The second argument is the delimiter string to search for within the first argument.

If the delimiter is found, the function returns everything after its first occurrence. If the delimiter is not found, or if the delimiter is `null` or empty, the function returns the original string. If the source resolves to `null`, the function returns `null`.

For example, to extract the value after the first `=` in a field named `header`, use the `substringAfter()` function as follows:

```
'substringAfter(/header, "=")'
```
{% include copy.html %}

If `/header` contains `Content-Type=application/json`, this returns `application/json`.

Alternatively, you can use a literal string as the first argument:

```
'substringAfter("hello-world-foo", "-")'
```
{% include copy.html %}

This returns `world-foo` because it extracts everything after the first `-`.

The `substringAfter()` function performs a case-sensitive search.
{: .note}
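
As a rough illustration of the rules above, the following Python sketch reproduces the documented return values. It is not the Data Prepper implementation. The helper name `substring_after` is hypothetical, and Python's `None` stands in for `null`:

```python
from typing import Optional


def substring_after(source: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Illustrative stand-in for the documented substringAfter() rules."""
    if source is None:
        return None  # null source: the result is null
    if not delimiter:
        return source  # null or empty delimiter: original string
    # partition() splits at the first case-sensitive match of the delimiter.
    _head, matched, tail = source.partition(delimiter)
    return tail if matched else source  # no match: original string


print(substring_after("Content-Type=application/json", "="))  # application/json
print(substring_after("hello-world-foo", "-"))                # world-foo
print(substring_after("user@example.com", "@"))               # example.com
print(substring_after("no-at-sign", "@"))                     # no-at-sign
```

Only the first match matters here: any later occurrences of the delimiter remain part of the returned value, as in `world-foo`.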

## Example

The following pipeline uses the `substringAfter()` function to extract the domain from an email address field and adds it as a new field called `domain`:

```yaml
substring-after-demo:
  source:
    http:
      ssl: false

  processor:
    - add_entries:
        entries:
          - key: domain
            value_expression: 'substringAfter(/email, "@")'

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: admin_password
        index_type: custom
        index: demo-index-%{yyyy.MM.dd}
```
{% include copy.html %}

You can test the pipeline using the following command:

```bash
curl -sS -X POST "http://localhost:2021/log/ingest" \
  -H "Content-Type: application/json" \
  -d '[
    {"email":"user@example.com"},
    {"email":"admin@opensearch.org"}
  ]'
```
{% include copy.html %}

The documents stored in OpenSearch contain the following information:

```json
{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "demo-index-2026.03.13",
        "_id": "abc123",
        "_score": 1,
        "_source": {
          "email": "user@example.com",
          "domain": "example.com"
        }
      },
      {
        "_index": "demo-index-2026.03.13",
        "_id": "def456",
        "_score": 1,
        "_source": {
          "email": "admin@opensearch.org",
          "domain": "opensearch.org"
        }
      }
    ]
  }
}
```
Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
layout: default
title: substringBeforeLast()
parent: Functions
grand_parent: Pipelines
nav_order: 60
---

# substringBeforeLast()

The `substringBeforeLast()` function extracts the portion of a string that precedes the last occurrence of a specified delimiter. It takes two arguments:

- The first argument is either a literal string or a JSON pointer that represents the source string.

- The second argument is the delimiter string to search for within the first argument.

If the delimiter is found, the function returns everything before its last occurrence. If the delimiter is not found, or if the delimiter is `null` or empty, the function returns the original string. If the source resolves to `null`, the function returns `null`.

For example, to remove the file extension from a `filename` field, use the `substringBeforeLast()` function as follows:

```
'substringBeforeLast(/filename, ".")'
```
{% include copy.html %}

If `/filename` contains `archive.tar.gz`, this returns `archive.tar`.

Alternatively, you can use a literal string as the first argument:

```
'substringBeforeLast("one-two-three", "-")'
```
{% include copy.html %}

This returns `one-two` because it extracts everything before the last `-`.

The `substringBeforeLast()` function performs a case-sensitive search.
{: .note}
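
The rules above can be approximated in a few lines of Python. This sketch only mirrors the documented behavior and is not taken from Data Prepper. The helper name `substring_before_last` is hypothetical, and Python's `None` stands in for `null`:

```python
from typing import Optional


def substring_before_last(source: Optional[str], delimiter: Optional[str]) -> Optional[str]:
    """Illustrative stand-in for the documented substringBeforeLast() rules."""
    if source is None:
        return None  # null source: the result is null
    if not delimiter:
        return source  # null or empty delimiter: original string
    # rpartition() splits at the last case-sensitive match of the delimiter.
    head, matched, _tail = source.rpartition(delimiter)
    return head if matched else source  # no match: original string


print(substring_before_last("archive.tar.gz", "."))   # archive.tar
print(substring_before_last("one-two-three", "-"))    # one-two
print(substring_before_last("/var/log/syslog", "/"))  # /var/log
print(substring_before_last("no-delimiter", "@"))     # no-delimiter
```

Under these rules, a source string whose only delimiter is its first character, such as `/syslog` with the delimiter `/`, produces an empty string, because nothing precedes the last match.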

## Example

The following pipeline uses the `substringBeforeLast()` function to extract the directory path from a full file path and adds it as a new field called `directory`:

```yaml
substring-before-last-demo:
  source:
    http:
      ssl: false

  processor:
    - add_entries:
        entries:
          - key: directory
            value_expression: 'substringBeforeLast(/filepath, "/")'

  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: admin
        password: admin_password
        index_type: custom
        index: demo-index-%{yyyy.MM.dd}
```
{% include copy.html %}

You can test the pipeline using the following command:

```bash
curl -sS -X POST "http://localhost:2021/log/ingest" \
  -H "Content-Type: application/json" \
  -d '[
    {"filepath":"/var/log/syslog"},
    {"filepath":"/home/user/docs/report.pdf"}
  ]'
```
{% include copy.html %}

The documents stored in OpenSearch contain the following information:

```json
{
  ...
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "demo-index-2026.03.13",
        "_id": "abc123",
        "_score": 1,
        "_source": {
          "filepath": "/var/log/syslog",
          "directory": "/var/log"
        }
      },
      {
        "_index": "demo-index-2026.03.13",
        "_id": "def456",
        "_score": 1,
        "_source": {
          "filepath": "/home/user/docs/report.pdf",
          "directory": "/home/user/docs"
        }
      }
    ]
  }
}
```
