Add getEstimatedSize API to OutputCodec#5593
Conversation
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
dlvenable
left a comment
There was a problem hiding this comment.
I left some comments.
At a high-level, I think we can get a better size estimate with the refactoring I suggested because it will account for what is already there.
| * @return int size of the serialized event | ||
| * @throws IOException throws IOException when invalid input is received or not able to create wrapping | ||
| */ | ||
| int getEstimatedSize(Event event) throws IOException; |
| writer.write(getParquetRecord(event)); | ||
| } | ||
|
|
||
| private int getSizeInBytes(GenericRecord record) throws IOException { |
There was a problem hiding this comment.
I'm not quite sure that this will be accurate. But, I'm not sure there is anyway to really do this correctly in Parquet.
| * @return int size of the serialized event | ||
| * @throws IOException throws IOException when invalid input is received or not able to create wrapping | ||
| */ | ||
| int getEstimatedSize(Event event) throws IOException; |
There was a problem hiding this comment.
I think we need the SinkContext here.
| * @return int size of the serialized event | ||
| * @throws IOException throws IOException when invalid input is received or not able to create wrapping | ||
| */ | ||
| int getEstimatedSize(Event event) throws IOException; |
There was a problem hiding this comment.
Do you think we could have this as a default implementation and remove the concrete implementations for most of the sub-classes?
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(outputStream, null);
start(outputStream, event, sinkContext);
write(event);
return outputStream.toByteArray().length;
| * @return int size of the serialized event | ||
| * @throws IOException throws IOException when invalid input is received or not able to create wrapping | ||
| */ | ||
| int getEstimatedSize(Event event) throws IOException; |
There was a problem hiding this comment.
OK. If you are really ambitious, I've thought about a refactoring in the past, that could also make this more accurate.
The OutputCodec is currently designed as a single codec for all OutputStreams. However, it would make more sense for OutputCodec to be more like a factory.
public interface OutputCodec {
public interface BoundToStreamCodec {
void writeEvent(Event event);
void complete();
int getEstimatedSize();
}
codecFor(final OutputStream outputStream, Event event, final OutputCodecContext codecContext); // replaces start();
}
The callers use it like this:
BoundToStreamCodec codec = outputCodec.codecFor(outputStream, event, codecContext); // replaces .start()
...
for(...) {
codec.writeEvent(event); // it already has an OutputStream so need to pass again.
}
...
codec.complete();
Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
* Add getEstimatedSize API to OutputCodec Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> * Fixed build failure Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> * Addressed review comments Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> --------- Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
* Add getEstimatedSize API to OutputCodec Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> * Fixed build failure Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> * Addressed review comments Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> --------- Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
* Add getEstimatedSize API to OutputCodec Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> * Fixed build failure Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> * Addressed review comments Signed-off-by: Krishna Kondaka <krishkdk@amazon.com> --------- Signed-off-by: Krishna Kondaka <krishkdk@amazon.com>
Description
Add getEstimatedSize API to OutputCodec.
Returns exact or slightly more size of an event when passed through the OutputCodec.
Issues Resolved
Resolves #[Issue number to be closed when this PR is merged]
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.