
Add Collection#bulkUpdate(Map<Key, Collection<SubDocumentUpdate>> updates, ..) #288

Merged

suddendust merged 4 commits into main from pg_bulkUpdateSubDocs on Apr 14, 2026

Conversation

@suddendust
Contributor

@suddendust suddendust commented Apr 12, 2026

Description

With the older API Collection#bulkUpdateSubDocs deprecated, we had no method for updating multiple records/docs, each with its own set of updates. This change introduces such an API. Also, unlike the older, deprecated API, which supported only SET, the new API supports all five sub-doc operators.

Testing

Added integration tests

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

Documentation

This method does not guarantee atomicity across keys, which means partial updates are allowed. As per the contract, FlatPostgresCollection#bulkUpdateSubDocs is not atomic either.
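The partial-update semantics described above can be illustrated with a minimal, self-contained sketch. `applyUpdate` below is a hypothetical stand-in for the real per-document sub-document update, not the actual implementation; the point is only that a failure on one key does not roll back keys that were already updated:

```java
import java.util.*;

public class BulkUpdateSketch {
  // Applies each key's updates independently; a failure on one key does not
  // roll back keys that were already updated (no batch-level atomicity).
  static Set<String> bulkUpdate(Map<String, List<String>> updates) {
    Set<String> updatedKeys = new HashSet<>();
    for (Map.Entry<String, List<String>> entry : updates.entrySet()) {
      try {
        applyUpdate(entry.getKey(), entry.getValue()); // atomic per key
        updatedKeys.add(entry.getKey());
      } catch (RuntimeException e) {
        // Partial update: previously applied keys stay updated; this key is skipped.
      }
    }
    return updatedKeys;
  }

  // Hypothetical stand-in for the real sub-document update; fails for "bad" keys.
  static void applyUpdate(String key, List<String> ops) {
    if (key.startsWith("bad")) {
      throw new RuntimeException("update failed for " + key);
    }
  }
}
```

The returned set tells the caller which keys actually made it, which is the only signal available under this contract.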

@codecov

codecov bot commented Apr 12, 2026

Codecov Report

❌ Patch coverage is 81.81818% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.46%. Comparing base (c445cc6) to head (5b73221).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
...documentstore/postgres/FlatPostgresCollection.java 82.89% 8 Missing and 5 partials ⚠️
.../org/hypertrace/core/documentstore/Collection.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #288      +/-   ##
============================================
- Coverage     81.51%   81.46%   -0.05%     
- Complexity     1531     1532       +1     
============================================
  Files           241      241              
  Lines          7334     7387      +53     
  Branches        702      712      +10     
============================================
+ Hits           5978     6018      +40     
- Misses          915      924       +9     
- Partials        441      445       +4     
Flag Coverage Δ
integration 81.46% <81.81%> (-0.05%) ⬇️
unit 55.96% <9.09%> (-0.37%) ⬇️

Flags with carried forward coverage won't be shown.


@github-actions

github-actions bot commented Apr 12, 2026

Test Results

  121 files  +1    121 suites  +1   35s ⏱️ -1s
  812 tests +2    811 ✅ +2  1 💤 ±0  0 ❌ ±0 
1 149 runs  +2  1 148 ✅ +2  1 💤 ±0  0 ❌ ±0 

Results for commit 5b73221. ± Comparison against base commit c445cc6.

♻️ This comment has been updated with latest results.

}

try {
boolean updated = updateSingleKey(key, keyUpdates, tableName, quotedPkColumn);

@puneet-traceable puneet-traceable Apr 13, 2026


I see that this method gets a connection from the pool per query. Would it make sense to fire these on the same connection?

Contributor Author


Oh, you mean use the same connection for the entire batch? Yes, that's a good optimisation. Will change this.

Contributor Author


This is done
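The optimisation agreed on here, reusing one pooled connection for the whole batch instead of checking one out per query, can be sketched with a toy pool stub. `Pool` and its checkout counter are illustrative stand-ins, not the real `client` API:

```java
import java.util.*;

public class ConnectionReuseSketch {
  // Minimal stand-in for a connection pool; counts how many times a
  // connection is checked out.
  static class Pool {
    int checkouts = 0;
    String getPooledConnection() {
      checkouts++;
      return "conn-" + checkouts;
    }
  }

  // After the change: one checkout, reused for every key in the batch.
  // (Before, each key's query would have checked out its own connection.)
  static int updateBatchOnOneConnection(Pool pool, List<String> keys) {
    String connection = pool.getPooledConnection(); // single checkout
    for (String key : keys) {
      // Execute the per-key UPDATE on the same connection (omitted here).
    }
    return pool.checkouts;
  }
}
```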

* <p>This method supports all update operators (SET, UNSET, ADD, APPEND_TO_LIST,
* ADD_TO_LIST_IF_ABSENT, REMOVE_ALL_FROM_LIST). Updates for each individual key are applied
* atomically, but there is no atomicity guarantee across different keys - some keys may be
* updated while others fail. Any atomicity guarantees are implementation-specific.
Contributor


Any atomicity guarantees are implementation-specific.

This violates the Liskov substitution principle (the 'L' in SOLID). It has severe implications and defeats the purpose of having the document store abstraction in the first place, because implementors now have to make their client code behave differently for different underlying datastores. And, honestly, if people fail to read this code comment and expect the same client code to work perfectly fine in the other world (which is highly likely), it can lead to subtle bugs that are hard to debug and reproduce.

The older methods (like update) did in fact violate this, leading to the introduction of newer, consistent methods. Perhaps the behaviour can be restrictive (matching at least the known database with the most restrictive atomicity support), but it has to be consistent.

Contributor Author

@suddendust suddendust Apr 13, 2026


@suresh-prakash Thanks for the detailed comment, totally agree with you. I feel batch-level atomicity could be controlled by adding it to the UpdateOptions. Or this method's contract can state that batch-level atomicity is not guaranteed, while per-key update atomicity has to be guaranteed. That is the least restrictive contract that almost any implementation can satisfy. If an implementation wants to provide batch-level atomicity on top of that, even better.

Wdyt?

Contributor


batch-level atomicity is not guaranteed, while per-key update atomicity has to be guaranteed

Yes. This makes total sense.

If any implementation wants to provide batch-level atomicity guarantees, that's even better.

Definitely better in terms of performance. But that can create functional divergence, especially when one of the updates is invalid: in one case (per-key atomicity), the documents before the failing key would have been updated, while in the other (batch-level atomicity), the entire update would fail.

If we are using document-store, all implementations have to be as consistent as possible in terms of functionality. Performance would (and can) differ based on the underlying database, the indexes created, etc.

From this standpoint, I think it would make sense to have the Postgres implementation also perform key-level atomic updates (at least by default). If performance is critical, the client can opt in, as you mentioned, via UpdateOptions, with the implications explicitly known.

Basically, the design goal here is that the caller/client should be aware of the implications even if they miss these comments, and hence we prefer cross-database consistency over performance (since the performance differences are evident and inevitable).
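The two behaviours contrasted in this thread can be made concrete with a toy sketch (plain Java with illustrative names; neither method is part of the real document-store API):

```java
import java.util.*;

public class AtomicityContrast {
  // Per-key atomicity: each key commits on its own, so a failure on one
  // key leaves all other keys updated.
  static Set<String> perKey(List<String> keys, String failingKey) {
    Set<String> updated = new LinkedHashSet<>();
    for (String key : keys) {
      if (!key.equals(failingKey)) {
        updated.add(key); // this key's update succeeded and is committed
      }
    }
    return updated;
  }

  // Batch-level atomicity: one failure rolls back the whole batch.
  static Set<String> batchLevel(List<String> keys, String failingKey) {
    if (keys.contains(failingKey)) {
      return Set.of(); // entire update fails, nothing is persisted
    }
    return new LinkedHashSet<>(keys);
  }
}
```

The same batch against the same data yields different persisted state under the two contracts, which is exactly the cross-database divergence being argued against.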

Contributor Author


Makes sense, I have updated the doc.

Set<Key> updatedKeys = new HashSet<>();

try (Connection connection = client.getPooledConnection()) {
for (Map.Entry<Key, Collection<SubDocumentUpdate>> entry : updates.entrySet()) {
Contributor


Based on our conversation @ #288 (comment), it seems like this makes the whole operation (batch-level) atomic, a different variant from the Mongo implementation to come in the future.

Could you please check?

Contributor Author


Is it? For batch-level transactions, we would acquire a client.getTransactionalConnection(). Might be wrong here, but as far as I understand, it's atomic for each key, not the entire batch.

Contributor


My bad. I thought you are doing transactions here. You are right.


@suddendust suddendust merged commit 045fac2 into main Apr 14, 2026
7 checks passed
@suddendust suddendust deleted the pg_bulkUpdateSubDocs branch April 14, 2026 08:17