Skip to content

Provides a mechanism to get the size of the JSON representation of an event.#6635

Merged
dlvenable merged 2 commits into
opensearch-project:mainfrom
dlvenable:6278-to-json-string-function
Mar 13, 2026
Merged

Provides a mechanism to get the size of the JSON representation of an event.#6635
dlvenable merged 2 commits into
opensearch-project:mainfrom
dlvenable:6278-to-json-string-function

Conversation

@dlvenable

@dlvenable dlvenable commented Mar 12, 2026

Copy link
Copy Markdown
Member

Description

Provides a mechanism to get the size of the JSON representation of an event.

This adds a new toJsonString() function to expressions. Updates length() function to accept a direct string as input, so that it can be composed with toJsonString().

This also removes a possible optimization in add_entries. This had some code to try to infer if a function is static. It was added in #6019 along with changes for using EventKey. I'm removing this because it was failing with toJsonString(). I think it is not a major gain in performance anyway because we already cache the parse results, so evaluation should be fast. If we want this kind of analysis for static expressions it should be done in a robust way in the expression package.

Now you can add the size of an Event using the following:

    - add_entries:
        entries:
          - key: "eventSize"
            value_expression: 'length(toJsonString())'

This is of course the size at a given point in time and does not reflect the addition of the size itself.

Issues Resolved

Resolves #6278.

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

… event.

This adds a new toJsonString() function to expressions. Updates length() function to accept a direct string as input, so that it can be composed with toJsonString().

Resolves opensearch-project#6278.

Signed-off-by: David Venable <dlv@amazon.com>
… the constructor. The approach was brittle and failed for the new toJsonString function.

Signed-off-by: David Venable <dlv@amazon.com>

private boolean containsEventReference(String expression) {
return expression.contains("/") || expression.contains("getMetadata");
if (valueExpr != null && !evaluator.isValidExpressionStatement(valueExpr)) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Guessing you tested this change end to end by running data prepper? Just want to make sure we don't need to change the expression grammar to support passing functions to another function

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did test this running Data Prepper. That is how I found this issue in add_entries.

I already changed the grammer to support function composition #6626 and #6628.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't multiple functions for each event very inefficient? should we have eventSize() as a function?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, string representation of a event is probably not same as the actual amount of memory taken by the event.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkondaka ,

Also, string representation of a event is probably not same as the actual amount of memory taken by the event.

In the original GitHub issue #6278, @alamzeeshan requested the ability to get the length of the event as JSON. This is an approximation and appears to be sufficient for their needs.

The amount of memory taken is going to be much more difficult to determine and may be just as inefficient.

Isn't multiple functions for each event very inefficient?

I'm not sure what you mean by this question.

should we have eventSize() as a function?

I think we should avoid this because there is no single definition of event size. Here are some possible options:

I took this approach because it allows more flexibility. We could add a byteLength() method in the future. Or a toIonString() method. Using more primitive functions lets pipeline authors craft what they need.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean doing expression evaluation for length() and for jsonstring() would result in additional parsing overhead for every event. Isn't it better to have one function, like jsonstringlength() because it is more likely used.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that will have much overhead. It will have to add this to the stack.

If it is a problem we should optimize our parsing rather than try to create functions for all possible use-cases. Just as with other languages we should aim for the ability to compose and build from more general purpose operations and functions.

@dlvenable dlvenable merged commit 3c95779 into opensearch-project:main Mar 13, 2026
75 of 76 checks passed
@dlvenable dlvenable deleted the 6278-to-json-string-function branch March 13, 2026 16:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Provide a way to get the event size

3 participants