Provides a mechanism to get the size of the JSON representation of an event.#6635
Conversation
… event. This adds a new toJsonString() function to expressions. Updates length() function to accept a direct string as input, so that it can be composed with toJsonString(). Resolves opensearch-project#6278. Signed-off-by: David Venable <dlv@amazon.com>
… the constructor. The approach was brittle and failed for the new toJsonString function. Signed-off-by: David Venable <dlv@amazon.com>
|
|
||
| private boolean containsEventReference(String expression) { | ||
| return expression.contains("/") || expression.contains("getMetadata"); | ||
| if (valueExpr != null && !evaluator.isValidExpressionStatement(valueExpr)) { |
There was a problem hiding this comment.
Guessing you tested this change end to end by running data prepper? Just want to make sure we don't need to change the expression grammar to support passing functions to another function
There was a problem hiding this comment.
Isn't multiple functions for each event very inefficient? should we have eventSize() as a function?
There was a problem hiding this comment.
Also, string representation of a event is probably not same as the actual amount of memory taken by the event.
There was a problem hiding this comment.
Also, string representation of a event is probably not same as the actual amount of memory taken by the event.
In the original GitHub issue #6278, @alamzeeshan requested the ability to get the length of the event as JSON. This is an approximation and appears to be sufficient for their needs.
The amount of memory taken is going to be much more difficult to determine and may be just as inefficient.
Isn't multiple functions for each event very inefficient?
I'm not sure what you mean by this question.
should we have eventSize() as a function?
I think we should avoid this because there is no single definition of event size. Here are some possible options:
- The size of the event in Java memory
- The character length of the JSON representation. See Provide a way to get the event size #6278 (comment)
- The byte length of the JSON representation.
- There may be other representations.
I took this approach because it allows more flexibility. We could add a byteLength() method in the future. Or a toIonString() method. Using more primitive functions lets pipeline authors craft what they need.
There was a problem hiding this comment.
I mean doing expression evaluation for length() and for jsonstring() would result in additional parsing overhead for every event. Isn't it better to have one function, like jsonstringlength() because it is more likely used.
There was a problem hiding this comment.
I don't think that will have much overhead. It will have to add this to the stack.
If it is a problem we should optimize our parsing rather than try to create functions for all possible use-cases. Just as with other languages we should aim for the ability to compose and build from more general purpose operations and functions.
Description
Provides a mechanism to get the size of the JSON representation of an event.
This adds a new
toJsonString()function to expressions. Updateslength()function to accept a direct string as input, so that it can be composed withtoJsonString().This also removes a possible optimization in
add_entries. This had some code to try to infer if a function is static. It was added in #6019 along with changes for using EventKey. I'm removing this because it was failing withtoJsonString(). I think it is not a major gain in performance anyway because we already cache the parse results, so evaluation should be fast. If we want this kind of analysis for static expressions it should be done in a robust way in the expression package.Now you can add the size of an Event using the following:
This is of course the size at a given point in time and does not reflect the addition of the size itself.
Issues Resolved
Resolves #6278.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.