optimized filtering #21
Open
nvdk wants to merge 10 commits into
Open
Conversation
this greatly reduces the size of delta messages in stacks with large data changes.
65ce8d9 to
1854db3
Compare
added 2 commits
January 23, 2025 16:19
this tries to optimize filtering and match detection in two ways: - by grouping on match pattern: this means we only do detection and filtering once for a given match pattern (takes the pattern and sendMatchesOnly into account) - by doing matching before origin filter, since the latter seems more costly
1854db3 to
34004ff
Compare
added 3 commits
January 23, 2025 17:00
this is not the prettiest of solutions, but it does achieve the same thing. bonus is it only happens once on startup
it's mentioned in the expressjs best practices (and elsewhere) that process.env should be used sparingly: | If you need to write environment-specific code, you can check the value of NODE_ENV with process.env.NODE_ENV. Be aware that checking the value of any environment variable incurs a performance penalty, and so should be done sparingly. https://expressjs.com/th/advanced/best-practice-performance.html#set-node_env-to-production it seems this is because process.env is a getter that processes the ENV array each time
nvdk
commented
Jan 23, 2025
| } else { | ||
| if (process.env["DEBUG_DELTA_SEND"] || process.env["DEBUG_DELTA_NOT_SENDING_EMPTY"]) | ||
| if (DEBUG_DELTA_SEND || DEBUG_DELTA_NOT_SENDING_EMPTY) | ||
| console.log(`Changeset empty. Not sending to ${entry.callback.method} ${entry.callback.url}`); |
Member
Author
There was a problem hiding this comment.
I kept this, but I don't think this situation can occur.
x-m-el
reviewed
Jun 19, 2025
Contributor
There was a problem hiding this comment.
Some optimizations are possible. I added comments in adjusted code below, because the suggestions are interlinked with each other, so separate comments could become confusing. Alternatively see e800173 for the commit of these changes
async function informWatchers( changeSets, res, muCallIdTrail, muSessionId ){
/* This code used to be inside the full loop, but this is not needed.
* If the entry has `sendMatchesOnly` set to true, using changedTriples can be avoided (see below).
* If `sendMatchesOnly` is set to false, the code using `changedTriples` will always need all triples
* (because there is no extra "filtering").
* so there is no need to recalculate it every time.
*/
let allInserts = [];
let allDeletes = [];
changeSets.forEach( (change) => {
allInserts = [...allInserts, ...change.insert];
allDeletes = [...allDeletes, ...change.delete];
} );
const changedTriples = [...allInserts, ...allDeletes];
// Iterate over each unique match pattern
for (const matchKey in groupedServices) {
const firstEntry = groupedServices[matchKey][0];
// can use first entry since it's part of grouping
const sendMatchesOnly = firstEntry.options.sendMatchesOnly;
let maybePatternFilteredChangesets = changeSets;
if (sendMatchesOnly) {
maybePatternFilteredChangesets = filterChangesetsOnPattern(changeSets, firstEntry);
/* Currently, fully empty change sets are send too, but I don't think
* this is desired if `sendMatchesOnly` is set to true?
* Note: can also do this filtering in `filterChangesetsOnPattern` directly
* (by having an if-case or similar around `filteredChangesets.push(clonedChangeSet);`)
*/
maybePatternFilteredChangesets = maybePatternFilteredChangesets.filter(
(change) =>
change.insert.length > 0 ||
change.delete.length > 0 || change.effectiveInsert.length > 0 ||
change.effectiveDelete.length > 0
);
}
/* `filterChangesetsOnPattern` already does all the filtering for matches,
* so if `sendMatchesOnly` is true, we can just check if there are still
* non-empty change sets left to send. If all are empty (or none are left,
* in case of filtering them away as stated above), there are no matches with the spec.
*/
const someTripleMatchedSpec = sendMatchesOnly
? maybePatternFilteredChangesets.length > 0 /* makes the assumption that maybePatternFilteredChangesets has no empty change sets. */
: changedTriples.some((triple) =>
tripleMatchesSpec(triple, firstEntry.match)
);
/* The if can be taken out of the forEach loop, to avoid the loop all together
* if no triples will be send out. Note that `DEBUG_TRIPLE_MATCHES_SPEC` should
* be moved before the `if` case then. However, this does change the debug output:
* before it would print a line per match (true/false), now it will do this
* per service group. The difference is negligible, so this change could be skipped.
*/
if( someTripleMatchedSpec ) {
const matchingServices = groupedServices[matchKey];
matchingServices.forEach( async (entry) => {
/// rest of the code
});
}
x-m-el
reviewed
Jun 19, 2025
| // for each entity | ||
| if( DEBUG_DELTA_MATCH ) | ||
| console.log(`Checking if we want to send to ${entry.callback.url}`); | ||
| const matchSpec = entry.match; |
Contributor
There was a problem hiding this comment.
unused code
Suggested change
| const matchSpec = entry.match; |
- avoid recalculating changedTriples in every loop - if sendMatchesOnly is true, a changeset might be empty. Do not send a changeset in this case - don't redo checking matches if this was already done because of `sendMatchesOnly`
Member
Author
|
I've cherry picked your commit into my PR @x-m-el , thanks for the review and improvements |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
builds on #20, but has some optimizations wrt to filtering:
options.sendMatchesOnlyinto account as a grouping key)edit: can be tested using
nvdk/mu-delta-notifier:feature-optimized-filtering. see https://github.com/lblod/app-lblod-harvester/pull/79/files for an example