Add Collection#bulkUpdate(Map<Key, Collection<SubDocumentUpdate>> updates, ..)#288
suddendust merged 4 commits into main from
Conversation
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

@@ Coverage Diff @@
## main #288 +/- ##
============================================
- Coverage 81.51% 81.46% -0.05%
- Complexity 1531 1532 +1
============================================
Files 241 241
Lines 7334 7387 +53
Branches 702 712 +10
============================================
+ Hits 5978 6018 +40
- Misses 915 924 +9
- Partials 441 445 +4
Flags with carried forward coverage won't be shown.
}

try {
  boolean updated = updateSingleKey(key, keyUpdates, tableName, quotedPkColumn);
I see that this method gets a connection from the pool for every query. Would it make sense to fire these on the same connection?
Oh, you mean use the same connection for the entire batch? Yes, that's a good optimisation. Will change this.
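The optimisation agreed on here can be illustrated with a toy sketch. Everything below (the `Pool` class, the method names) is invented for illustration and is not the actual client code; it only shows why hoisting connection acquisition out of the per-key loop drops pool round-trips from one per key to one per batch:

```java
import java.util.List;

// Toy sketch: contrast acquiring a pooled connection per query vs. once per batch.
public class ConnectionReuseSketch {
    // Hypothetical stand-in for the real connection pool.
    static class Pool {
        int acquisitions = 0;
        String getPooledConnection() { acquisitions++; return "conn-" + acquisitions; }
    }

    // Before: one pool acquisition per key.
    static int updatePerQuery(Pool pool, List<String> keys) {
        for (String key : keys) {
            String conn = pool.getPooledConnection(); // acquired inside the loop
        }
        return pool.acquisitions;
    }

    // After: one pool acquisition for the whole batch.
    static int updatePerBatch(Pool pool, List<String> keys) {
        String conn = pool.getPooledConnection(); // acquired once
        for (String key : keys) {
            // fire each key's update on the same connection
        }
        return pool.acquisitions;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "k2", "k3");
        System.out.println(updatePerQuery(new Pool(), keys)); // 3
        System.out.println(updatePerBatch(new Pool(), keys)); // 1
    }
}
```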
 * <p>This method supports all update operators (SET, UNSET, ADD, APPEND_TO_LIST,
 * ADD_TO_LIST_IF_ABSENT, REMOVE_ALL_FROM_LIST). Updates for each individual key are applied
 * atomically, but there is no atomicity guarantee across different keys - some keys may be
 * updated while others fail. Any atomicity guarantees are implementation-specific.
Any atomicity guarantees are implementation-specific.
This violates the Liskov substitution principle (the 'L' in SOLID). It has severe implications and defeats the purpose of having the document-store abstraction in the first place, because implementors now have to make their client code work differently for different underlying datastores. And, honestly, if people miss this code comment and expect the same client code to work perfectly fine in the other world (which is highly likely), it can lead to subtle bugs that are hard to debug and reproduce.
The older methods (like update) have in fact violated this, leading to the introduction of newer, consistent methods. Perhaps the behaviour can be restrictive (matching at least the known databases with the most restrictive atomicity support), but it has to be consistent.
@suresh-prakash Thanks for the detailed comment, totally agree with you. I feel batch-level atomicity could be controlled by adding it to the UpdateOptions. Or, this method's contract can say that batch-level atomicity is not guaranteed, while per-key update atomicity has to be guaranteed. This is the least restrictive contract that almost any implementation can satisfy. If an implementation wants to provide batch-level atomicity guarantees, that's even better.
Wdyt?
batch-level atomicity is not guaranteed, while per-key update atomicity has to be guaranteed
Yes. This makes total sense.
If any implementation wants to provide batch-level atomicity guarantees, that's even better.
Better, definitely, in terms of performance. But that can create functional divergence, especially when one of the multiple updates is invalid. In one case (supporting per-key atomicity), the documents before that key would have been updated; in the other (supporting batch-level atomicity), the entire update would fail.
If we are using a document store, all implementations have to be as consistent as possible in terms of functionality. Performance would (and can) differ based on the underlying database used, the indexes created, etc.
From this standpoint, I think it would make sense to have the Postgres implementation also perform key-level atomic updates (at least by default). If performance is critical, the client can opt in, as you mentioned, via UpdateOptions, with the implications explicitly known.
Basically, the design goal here is that the caller/client should be aware of the implications even if they missed reading these comments, and hence we prefer cross-database consistency over performance (since the performance differences are evident and inevitable).
Makes sense, I have updated the doc.
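The divergence described above, and how an opt-in could control it, can be sketched with a toy in-memory model. The AtomicityMode flag and its values are hypothetical (the thread only discusses adding such an option to UpdateOptions); the sketch just shows how the two modes differ when one key's update is invalid:

```java
import java.util.*;

// Toy sketch of a hypothetical per-batch atomicity opt-in.
public class AtomicityModeSketch {
    enum AtomicityMode { PER_KEY, BATCH } // invented option values

    // Applies updates in iteration order; the key "bad" simulates an invalid update.
    static Map<String, Integer> apply(Map<String, Integer> docs,
                                      List<String> keys, AtomicityMode mode) {
        Map<String, Integer> result = new LinkedHashMap<>(docs);
        for (String key : keys) {
            if (key.equals("bad")) {
                if (mode == AtomicityMode.BATCH) return docs; // roll back everything
                return result; // per-key: keep updates applied before the failure
            }
            result.put(key, result.getOrDefault(key, 0) + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> docs = Map.of("a", 1, "b", 1);
        List<String> keys = List.of("a", "bad", "b");
        // Per-key atomicity: "a" was updated before the invalid key, "b" was not.
        System.out.println(apply(docs, keys, AtomicityMode.PER_KEY).get("a")); // 2
        // Batch atomicity: the invalid key rolls back the whole batch.
        System.out.println(apply(docs, keys, AtomicityMode.BATCH).get("a")); // 1
    }
}
```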
Set<Key> updatedKeys = new HashSet<>();

try (Connection connection = client.getPooledConnection()) {
  for (Map.Entry<Key, Collection<SubDocumentUpdate>> entry : updates.entrySet()) {
Based on our conversation at #288 (comment), it seems like this makes the whole operation (batch-level) atomic (a different variant than the Mongo implementation to come in the future).
Could you please check?
Is it? For batch-level transactions, we would acquire a client.getTransactionalConnection(). I might be wrong here, but as far as I understand, it's atomic for each key, not the entire batch.
My bad. I thought you are doing transactions here. You are right.
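The distinction settled here can be modeled with a toy sketch, with no real connections involved: an auto-commit-style loop (as with the pooled connection above) makes each key's update durable immediately, while a deferred-commit path (as a transactional connection would provide) discards everything on failure. All names below are invented for illustration:

```java
import java.util.*;

// Toy sketch: per-key (auto-commit) vs. batch (transactional) durability on failure.
public class ConnectionKindSketch {
    static List<String> committed = new ArrayList<>();

    static void runBatch(List<String> keys, boolean transactional) {
        committed.clear();
        List<String> pending = new ArrayList<>();
        try {
            for (String key : keys) {
                if (key.equals("bad")) throw new RuntimeException("invalid update");
                if (transactional) pending.add(key); // commit deferred to batch end
                else committed.add(key);             // auto-commit per key
            }
            committed.addAll(pending); // transaction commit point
        } catch (RuntimeException e) {
            // transactional: pending work is discarded, i.e. rolled back
        }
    }

    public static void main(String[] args) {
        List<String> keys = List.of("k1", "bad", "k2");
        runBatch(keys, false);
        System.out.println(committed); // [k1] - earlier key survives the failure
        runBatch(keys, true);
        System.out.println(committed); // [] - the whole batch is rolled back
    }
}
```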
Description
Since the older API Collection#bulkUpdateSubDocs is deprecated, we did not have any new method to update multiple records/docs, each with its own set of updates. This change introduces that API. Also, unlike the older, deprecated API, which supported only SET, the new API supports all 5 sub-doc operators.

Testing

Added integration tests.
Checklist:
Documentation
This method does not guarantee batch-level atomicity, which means partial updates are allowed.
FlatPostgresCollection#bulkUpdateSubDocs is not atomic either, as per the contract.
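For reference, a hedged usage sketch of the new method from the PR title, Collection#bulkUpdate(Map<Key, Collection<SubDocumentUpdate>> updates, ..). The Key and SubDocumentUpdate types and the bulkUpdate body below are stubs invented here so the sketch is self-contained; the real document-store signatures, operator representation, and return type may differ:

```java
import java.util.*;

// Stubbed usage sketch of the new bulkUpdate API shape.
public class BulkUpdateUsageSketch {
    // Minimal stand-ins for the real document-store types.
    record Key(String id) {}
    record SubDocumentUpdate(String path, String operator, Object value) {}

    // Stub mirroring the method's shape: returns the keys that were updated
    // (per the Set<Key> updatedKeys seen in the diff); here every key "succeeds".
    static Set<Key> bulkUpdate(Map<Key, Collection<SubDocumentUpdate>> updates) {
        return new HashSet<>(updates.keySet());
    }

    public static void main(String[] args) {
        Map<Key, Collection<SubDocumentUpdate>> updates = new LinkedHashMap<>();
        updates.put(new Key("doc-1"),
            List.of(new SubDocumentUpdate("props.name", "SET", "alice")));
        updates.put(new Key("doc-2"),
            List.of(new SubDocumentUpdate("props.tags", "ADD_TO_LIST_IF_ABSENT", "new")));
        // Each key's updates are applied atomically; the batch as a whole is not.
        System.out.println(bulkUpdate(updates).size()); // 2
    }
}
```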