Skip to content

HDDS-14004. EventNotification: capture data in the CompletedRequestInfo ledger#9364

Merged
ChenSammi merged 1 commit intoapache:HDDS-13513_Event_Notification_FeatureBranchfrom
gardenia:HDDS-14004
Apr 9, 2026
Merged

HDDS-14004. EventNotification: capture data in the CompletedRequestInfo ledger#9364
ChenSammi merged 1 commit intoapache:HDDS-13513_Event_Notification_FeatureBranchfrom
gardenia:HDDS-14004

Conversation

@gardenia
Copy link
Copy Markdown

Please describe your PR in detail:

In OzoneManagerStateMachine when certain write requests complete successfully a summary of that request will be written to a new rocksdb "ledger" table named CompletedRequestInfo.

The intent is that a consumer can use that data to generate event notifications about changes on the filesystem.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14004

How was this patch tested?

unit tests

@github-actions
Copy link
Copy Markdown

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

@github-actions github-actions Bot added the stale label Dec 23, 2025
@ivandika3 ivandika3 removed the stale label Dec 23, 2025
Copy link
Copy Markdown
Contributor

@swamirishi swamirishi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the patch @gardenia have left some comments inline

// request info "ledger" table is consistent with the processed
// raits events? e.g. OzoneManagerDoubleBuffer?

try (BatchOperation batchOperation = omMetadataManager.getStore()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need a rocksdb batch. Flush should happen as part of double buffer flush
Look at this change https://github.com/apache/ozone/pull/8779/files#r2641773095

Copy link
Copy Markdown
Author

@gardenia gardenia Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@swamirishi I have implemented a rough approach for plumbing this into double buffer flush as you suggested but I am not happy with the way I have plumbed it in and feel it needs some community feedback.

I would be happy to hear your feedback or can discuss at the next community call.

As I commented inline, I wonder if it is better to add the populating of the ledger rows into the response implementation of each request type we want to capture (e.g. OMVolumeCreateResponse::addToBatch, OMKeyCommitResponse::addToBatch, etc) which would then dispense with the need for the class I have at the moment which is a single place where the ledger rows are created.

cc: @errose28

@github-actions
Copy link
Copy Markdown

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

// (i.e. no need for any new code in this class). This
// seemed to be a bit messier to me in terms of code duplication
//
if (response instanceof HasCompletedRequestInfo) {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi / @errose28 I would be grateful for your feedback on this approach to capturing the data for the new entity class by adding it to the transaction batch here OzoneManagerDoubleBuffer.

I discussed this briefly with @errose28 on the last community call and have made some revisions informed by that discussion although it is not exactly what we discussed. However, I the comment above explains my thought process (i.e. that it could have been added to the addToDBBatch method of each relevant response implementation rather than have any code footprint in OzoneManagerDoubleBuffer but that approach was messier than I would have liked). I'd be happy to hear alternate takes

NOTE: I'm aware this is lacking unit tests but I'd like to come to some consensus on where the code is going to live first

Copy link
Copy Markdown
Contributor

@ChenSammi ChenSammi Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gardenia , I think you can just add the following in each involved request's OMClientResponse#addToDBBatch(), addToDBBatch() is well DB is updated for each request, and OzoneManagerDoubleBuffer is reponsible to flush the batch to DB, so add the CompletedRequestInfoTable update in addToDBBatch should be fine.

omMetadataManager.getCompletedRequestInfoTable().putWithBatch(
              batchOperation, completedRequestInfo.getTrxLogIndex(), completedRequestInfo);

For example, OMFileCreateResponse,

  public void addToDBBatch(OMMetadataManager omMetadataManager,
                           BatchOperation batchOperation) throws IOException {
    super.addToDBBatch(omMetadataManager, batchOperation);
    omMetadataManager.getCompletedRequestInfoTable().putWithBatch(
        batchOperation, getTrxLogIndex(), getCompletedRequestInfoArgs());
  }

updateID in each object is the TrxLogIndex.

@ChenSammi
Copy link
Copy Markdown
Contributor

Hi @gardenia , is this the second patch, and ready for review?

@gardenia
Copy link
Copy Markdown
Author

gardenia commented Mar 4, 2026

Hi @gardenia , is this the second patch, and ready for review?

@ChenSammi Yes. Thank you. The unit testing is not comprehensive but I would appreciate some feedback on the general approach (in particular of having the HasCompletedRequestInfo interface which response types we want to capture have to implement)

@ChenSammi
Copy link
Copy Markdown
Contributor

Hi @gardenia , is this the second patch, and ready for review?

@ChenSammi Yes. Thank you. The unit testing is not comprehensive but I would appreciate some feedback on the general approach (in particular of having the HasCompletedRequestInfo interface which response types we want to capture have to implement)

@gardenia , I think you can just add the following in each involved request's OMClientResponse#addToDBBatch(), addToDBBatch() is well DB is updated for each request, and OzoneManagerDoubleBuffer is reponsible to flush the batch to DB, so add the CompletedRequestInfoTable update in addToDBBatch should be fine.

omMetadataManager.getCompletedRequestInfoTable().putWithBatch(
batchOperation, completedRequestInfo.getTrxLogIndex(), completedRequestInfo);

For example, OMFileCreateResponse,

public void addToDBBatch(OMMetadataManager omMetadataManager,
BatchOperation batchOperation) throws IOException {
super.addToDBBatch(omMetadataManager, batchOperation);
omMetadataManager.getCompletedRequestInfoTable().putWithBatch(
batchOperation, getTrxLogIndex(), getCompletedRequestInfoArgs());
}

updateID in each object is the TrxLogIndex.

@gardenia
Copy link
Copy Markdown
Author

Hi @gardenia , is this the second patch, and ready for review?

@ChenSammi Yes. Thank you. The unit testing is not comprehensive but I would appreciate some feedback on the general approach (in particular of having the HasCompletedRequestInfo interface which response types we want to capture have to implement)

@gardenia , I think you can just add the following in each involved request's OMClientResponse#addToDBBatch(), addToDBBatch() is well DB is updated for each request, and OzoneManagerDoubleBuffer is reponsible to flush the batch to DB, so add the CompletedRequestInfoTable update in addToDBBatch should be fine.

omMetadataManager.getCompletedRequestInfoTable().putWithBatch( batchOperation, completedRequestInfo.getTrxLogIndex(), completedRequestInfo);

For example, OMFileCreateResponse,

public void addToDBBatch(OMMetadataManager omMetadataManager, BatchOperation batchOperation) throws IOException { super.addToDBBatch(omMetadataManager, batchOperation); omMetadataManager.getCompletedRequestInfoTable().putWithBatch( batchOperation, getTrxLogIndex(), getCompletedRequestInfoArgs()); }

updateID in each object is the TrxLogIndex.

@ChenSammi I pushed a revised version using this approach.

}

@Override
public OmCompletedRequestInfo.OperationArgs getCompletedRequestInfoArgs() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the getCompletedRequestInfoArgs in OMKeyCreateResponse, and remove the getCompletedRequestInfoArgs here, they are the same.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gardenia for updating the patch. It overall looks good to me. Just above minor issue.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can keep the getCompletedRequestInfoArgs in OMKeyCreateResponse, and remove the getCompletedRequestInfoArgs here, they are the same.

@ChenSammi thanks for this. I'm not sure you are correct on this point.

OMKeyCreateResponseWithFSO has a inheritance hierarchy of:

  • OMFileCreateResponseWithFSO
  • OMFileCreateResponse
  • OMKeyCreateResponse

Therefore it inherits the implementation of getCompletedRequestInfoArgs from OMFileCreateResponse (rather than OMKeyCreateResponse) and so to get a "NoArgs" implementation it has to explicitly override it as such.

If you disagree then of course let me know. Thx.

Copy link
Copy Markdown
Contributor

@ChenSammi ChenSammi Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gardenia, you are right. Could you check the issues reported by CI?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChenSammi I believe the static analysis issues should be resolved now

@ChenSammi ChenSammi marked this pull request as ready for review March 25, 2026 02:44
@ChenSammi
Copy link
Copy Markdown
Contributor

@gardenia , Can you check the failed TestOzoneManagerDoubleBufferWithOMResponse ? I rerun this part of CI but it still failed.

@gardenia
Copy link
Copy Markdown
Author

gardenia commented Apr 8, 2026

@gardenia , Can you check the failed TestOzoneManagerDoubleBufferWithOMResponse ? I rerun this part of CI but it still failed.

@ChenSammi apologies I missed that but I believe it is addressed now.

The issue was that OMBucketDeleteResponse was trying to use omVolumeArgs.getUpdateID() as a source of the updateID for the txn but this was not correct. To fix this it was necessary to add updateID explicitly as a parameter to OMBucketDeleteResponse.

The unit tests are all passing now. I am seeing a flaky integration test but I am not sure if this was run subsequent to my latest changes or not.

@ChenSammi
Copy link
Copy Markdown
Contributor

Thanks @gardenia, the last patch LGTM, +1.

@ChenSammi ChenSammi merged commit a62fa97 into apache:HDDS-13513_Event_Notification_FeatureBranch Apr 9, 2026
44 checks passed
@gardenia
Copy link
Copy Markdown
Author

gardenia commented Apr 9, 2026

Thanks @gardenia, the last patch LGTM, +1.

Thanks @ChenSammi . The next 2 reviews are available for review when you get a chance:

#9365
#9366

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants