
Commit 81b0d27

Limit path depth and add cache cleanup (#523)
* Limit path depth and add cache cleanup. Prevent excessive memory usage and reduce GC pressure by capping traversal path depth in data-filter at 20 levels. In the grouper worker, add a periodic cache cleanup interval (every 5 minutes), started on worker start and cleared on finish, to avoid unbounded cache growth. Free large references after delta computation by nulling event payloads so they can be garbage-collected. Also tighten memoization for `findSimilarEvent` (`max` reduced from 200 to 50, `ttl` set to 600 s) to further limit memory retained by caches.

* Prevent deep-path allocations; tune timeouts & tests. Limit path growth in the data filter so that no new arrays are created past 20 levels, reducing allocations for deeply nested objects. Import `TimeMs` and replace magic numbers: set `MEMOIZATION_TTL` to `600_000`, use `TimeMs.MINUTE` for the cache cleanup interval, and apply `MEMOIZATION_TTL` to the `memoize` decorator. Clear large delta references by setting them to `undefined` to aid GC. Add a test verifying that filtering works on objects nested more than 20 levels deep without excessive memory allocations.

* Use named constants for traversal and cache interval. Introduce `MAX_TRAVERSAL_DEPTH` in data-filter.ts and replace the hardcoded depth check (20), and add `CACHE_CLEANUP_INTERVAL_MINUTES` in index.ts for the cache cleanup `setInterval` instead of the literal 5, improving readability and making these tuning values easier to adjust.

* Move eslint ignore to MEMOIZATION_TTL. Remove the unnecessary `eslint-disable-next-line` on the `memoize` import and apply the `no-unused-vars` ignore to the `MEMOIZATION_TTL` constant instead, so the suppression targets the unused constant (decorators are not counted as usages) rather than the import.

* Suppress no-unused-vars for memoize import. Add an explanatory comment and an `/* eslint-disable-next-line no-unused-vars */` directive before the `memoize` import in workers/grouper/src/index.ts, since ESLint does not recognize decorator usage of an import.
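The memoization tuning above (`max: 50`, `ttl: 600_000` ms) bounds how much memory the cache can retain. The project's internal `lib/memoize` is not shown in this commit, so the following is only a minimal standalone sketch of what a `max`/`ttl`-bounded cache means; the class name, eviction order, and API here are assumptions for illustration, not the library's actual implementation:

```typescript
// Hypothetical sketch of a max/ttl-bounded memo cache (NOT the project's
// actual lib/memoize implementation; eviction order is assumed FIFO).
type Entry<V> = { value: V; expiresAt: number };

class BoundedMemo<V> {
  private cache = new Map<string, Entry<V>>();

  constructor(private max: number, private ttlMs: number) {}

  get(key: string, compute: () => V): V {
    const hit = this.cache.get(key);

    // Serve from cache only while the entry is still within its TTL
    if (hit && hit.expiresAt > Date.now()) {
      return hit.value;
    }

    const value = compute();

    // Evict the oldest entry once the cap is reached, so memory stays bounded
    if (this.cache.size >= this.max) {
      const oldestKey = this.cache.keys().next().value;
      if (oldestKey !== undefined) {
        this.cache.delete(oldestKey);
      }
    }

    this.cache.set(key, { value, expiresAt: Date.now() + this.ttlMs });

    return value;
  }

  get size(): number {
    return this.cache.size;
  }
}

// Mirrors the tuned decorator options: max 50 entries, 600_000 ms TTL
const memo = new BoundedMemo<number>(50, 600_000);

let calls = 0;
for (let i = 0; i < 60; i++) {
  memo.get(`key${i}`, () => ++calls);
}

console.log(memo.size); // capped at 50 despite 60 distinct keys
```

Lowering `max` from 200 to 50 trades a few extra recomputations for a four-fold smaller ceiling on retained entries.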
1 parent c4cc417 commit 81b0d27

File tree

3 files changed: +85 -3 lines changed


workers/grouper/src/data-filter.ts

Lines changed: 11 additions & 1 deletion

```diff
@@ -1,6 +1,11 @@
 import type { EventAddons, EventData } from '@hawk.so/types';
 import { unsafeFields } from '../../../lib/utils/unsafeFields';
 
+/**
+ * Maximum depth for object traversal to prevent excessive memory allocations
+ */
+const MAX_TRAVERSAL_DEPTH = 20;
+
 /**
  * Recursively iterate through object and call function on each key
  *
@@ -18,7 +23,12 @@ function forAll(obj: Record<string, unknown>, callback: (path: string[], key: st
     if (!(typeof value === 'object' && !Array.isArray(value))) {
       callback(path, key, current);
     } else {
-      visit(value, [...path, key]);
+      /**
+       * Limit path depth to prevent excessive memory allocations from deep nesting
+       * This reduces GC pressure and memory usage for deeply nested objects
+       */
+      const newPath = path.length < MAX_TRAVERSAL_DEPTH ? path.concat(key) : path;
+      visit(value, newPath);
     }
   }
 };
```
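The effect of the cap is that traversal still reaches every level, but the path array stops growing past `MAX_TRAVERSAL_DEPTH`, so deep nesting no longer allocates a fresh array per level. The real `forAll` lives in data-filter.ts; the reduced, self-contained sketch below (with an assumed callback signature and an added `null` guard) only demonstrates the path-capping behavior:

```typescript
const MAX_TRAVERSAL_DEPTH = 20;

// Reduced sketch of the capped traversal: visit every level, but stop
// allocating longer path arrays once the depth cap is reached.
function forAll(
  obj: Record<string, unknown>,
  callback: (path: string[], key: string, node: Record<string, unknown>) => void
): void {
  const visit = (current: Record<string, unknown>, path: string[]): void => {
    for (const key of Object.keys(current)) {
      const value = current[key];

      if (!(typeof value === 'object' && value !== null && !Array.isArray(value))) {
        callback(path, key, current);
      } else {
        // Past the cap, reuse the same path array instead of allocating a new one
        const newPath = path.length < MAX_TRAVERSAL_DEPTH ? path.concat(key) : path;

        visit(value as Record<string, unknown>, newPath);
      }
    }
  };

  visit(obj, []);
}

// Build an object nested 25 levels deep, with a single leaf at the bottom
let nested: Record<string, unknown> = { secret: 'leaf' };
for (let i = 0; i < 25; i++) {
  nested = { [`level${i}`]: nested };
}

let maxPathLength = 0;
let leaves = 0;
forAll(nested, (path) => {
  maxPathLength = Math.max(maxPathLength, path.length);
  leaves++;
});

console.log(maxPathLength, leaves); // path length never exceeds MAX_TRAVERSAL_DEPTH
```

Note the trade-off: beyond 20 levels the reported path is truncated, but the callback still fires for every leaf, so filtering keeps working.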

workers/grouper/src/index.ts

Lines changed: 40 additions & 2 deletions

```diff
@@ -19,19 +19,29 @@ import type { RepetitionDBScheme } from '../types/repetition';
 import { DatabaseReadWriteError, DiffCalculationError, ValidationError } from '../../../lib/workerErrors';
 import { decodeUnsafeFields, encodeUnsafeFields } from '../../../lib/utils/unsafeFields';
 import { MS_IN_SEC } from '../../../lib/utils/consts';
+import TimeMs from '../../../lib/utils/time';
 import DataFilter from './data-filter';
 import RedisHelper from './redisHelper';
 import { computeDelta } from './utils/repetitionDiff';
 import { rightTrim } from '../../../lib/utils/string';
 import { hasValue } from '../../../lib/utils/hasValue';
+
+/**
+ * eslint does not count decorators as a variable usage
+ */
 /* eslint-disable-next-line no-unused-vars */
 import { memoize } from '../../../lib/memoize';
 
 /**
  * eslint does not count decorators as a variable usage
  */
 /* eslint-disable-next-line no-unused-vars */
-const MEMOIZATION_TTL = Number(process.env.MEMOIZATION_TTL ?? 0);
+const MEMOIZATION_TTL = 600_000;
+
+/**
+ * Cache cleanup interval in minutes
+ */
+const CACHE_CLEANUP_INTERVAL_MINUTES = 5;
 
 /**
  * Error code of MongoDB key duplication error
@@ -72,6 +82,11 @@ export default class GrouperWorker extends Worker {
    */
   private redis = new RedisHelper();
 
+  /**
+   * Interval for periodic cache cleanup to prevent memory leaks from unbounded cache growth
+   */
+  private cacheCleanupInterval: NodeJS.Timeout | null = null;
+
   /**
    * Start consuming messages
    */
@@ -85,13 +100,30 @@ export default class GrouperWorker extends Worker {
 
     await this.redis.initialize();
     console.log('redis initialized');
+
+    /**
+     * Start periodic cache cleanup to prevent memory leaks from unbounded cache growth
+     * Runs every 5 minutes to clear old cache entries
+     */
+    this.cacheCleanupInterval = setInterval(() => {
+      this.clearCache();
+    }, CACHE_CLEANUP_INTERVAL_MINUTES * TimeMs.MINUTE);
+
     await super.start();
   }
 
   /**
    * Finish everything
    */
   public async finish(): Promise<void> {
+    /**
+     * Clear cache cleanup interval to prevent resource leaks
+     */
+    if (this.cacheCleanupInterval) {
+      clearInterval(this.cacheCleanupInterval);
+      this.cacheCleanupInterval = null;
+    }
+
     await super.finish();
     this.prepareCache();
     await this.eventsDb.close();
@@ -237,6 +269,12 @@ export default class GrouperWorker extends Worker {
       } as RepetitionDBScheme;
 
       repetitionId = await this.saveRepetition(task.projectId, newRepetition);
+
+      /**
+       * Clear the large event payload references to allow garbage collection
+       * This prevents memory leaks from retaining full event objects after delta is computed
+       */
+      delta = undefined;
     }
 
     /**
@@ -334,7 +372,7 @@ export default class GrouperWorker extends Worker {
    * @param projectId - where to find
    * @param title - title of the event to find similar one
    */
-  @memoize({ max: 200, ttl: MEMOIZATION_TTL, strategy: 'hash', skipCache: [undefined] })
+  @memoize({ max: 50, ttl: MEMOIZATION_TTL, strategy: 'hash', skipCache: [undefined] })
   private async findSimilarEvent(projectId: string, title: string): Promise<GroupedEventDBScheme | undefined> {
     /**
      * If no match by Levenshtein, try matching by patterns
```
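The `start()`/`finish()` pairing above matters: an interval armed in `start()` must be cleared in `finish()`, or Node keeps a live timer handle (and the closure it references) after the worker is done. The self-contained sketch below shows the same lifecycle pattern in isolation; `GrouperWorker`'s real base class and `clearCache` are not reproduced, and the class name here is invented for illustration:

```typescript
const CACHE_CLEANUP_INTERVAL_MINUTES = 5;
const MINUTE_MS = 60_000;

// Minimal sketch of the interval lifecycle used by the grouper worker:
// start() arms a periodic cleanup, finish() disarms it.
// (ReturnType<typeof setInterval> stands in for NodeJS.Timeout for portability.)
class WorkerSketch {
  private cacheCleanupInterval: ReturnType<typeof setInterval> | null = null;

  public start(): void {
    this.cacheCleanupInterval = setInterval(() => {
      this.clearCache();
    }, CACHE_CLEANUP_INTERVAL_MINUTES * MINUTE_MS);
  }

  public finish(): void {
    // Clearing and nulling the handle prevents a leaked timer
    // from keeping the process alive after shutdown
    if (this.cacheCleanupInterval) {
      clearInterval(this.cacheCleanupInterval);
      this.cacheCleanupInterval = null;
    }
  }

  public get isRunning(): boolean {
    return this.cacheCleanupInterval !== null;
  }

  private clearCache(): void {
    // Placeholder for the worker's real cache-clearing logic
  }
}

const worker = new WorkerSketch();
worker.start();
console.log(worker.isRunning); // true
worker.finish();
console.log(worker.isRunning); // false
```

Nulling the field after `clearInterval` also gives `finish()` an idempotence guard: calling it twice is harmless.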

workers/grouper/tests/data-filter.test.ts

Lines changed: 34 additions & 0 deletions

```diff
@@ -327,5 +327,39 @@ describe('GrouperWorker', () => {
     expect(event.context['secret']).toBe('[filtered]');
     expect(event.context['auth']).toBe('[filtered]');
   });
+
+  test('should handle deeply nested objects (>20 levels) without excessive memory allocations', () => {
+    // Create an object nested deeper than the cap (>20 levels)
+    let deeplyNested: any = { value: 'leaf', secret: 'should-be-filtered' };
+
+    for (let i = 0; i < 25; i++) {
+      deeplyNested = { [`level${i}`]: deeplyNested, password: `sensitive${i}` };
+    }
+
+    const event = generateEvent({
+      context: deeplyNested,
+    });
+
+    // This should not throw or cause memory issues
+    dataFilter.processEvent(event);
+
+    // Verify that filtering still works at various depths
+    expect(event.context['password']).toBe('[filtered]');
+
+    // Navigate to a mid-level and check filtering
+    let current = event.context['level24'] as any;
+
+    for (let i = 24; i > 15; i--) {
+      expect(current['password']).toBe('[filtered]');
+      current = current[`level${i - 1}`];
+    }
+
+    // At the leaf level, the secret should still be filtered
+    // (though path tracking may be capped, filtering should still work)
+    let leaf = event.context;
+
+    for (let i = 24; i >= 0; i--) {
+      leaf = leaf[`level${i}`] as any;
+    }
+
+    expect(leaf['secret']).toBe('[filtered]');
+  });
 });
 });
```
