Conversation
Add explicit shutdown() to SerialExecutor and DaskDelayedExecutor that clears tracked futures. Enhance LithopsEagerFunctionExecutor.shutdown() to clear cached _call_output on ResponseFutures before closing, preventing memory accumulation across repeated map() calls. Add parametrized tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
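A minimal sketch of what such an explicit `shutdown()` could look like (hypothetical class name, with the stdlib `Executor` base standing in for the project's executors; the real ones track futures created by `map()`):

```python
from concurrent.futures import Executor, Future

class SerialExecutorSketch(Executor):
    """Hypothetical sketch of an executor that runs work eagerly and
    tracks the futures it creates."""

    def __init__(self) -> None:
        self._futures: list[Future] = []

    def submit(self, fn, /, *args, **kwargs) -> Future:
        # Run eagerly and record the completed future; each stored
        # future keeps its result alive until the list is cleared.
        future: Future = Future()
        future.set_result(fn(*args, **kwargs))
        self._futures.append(future)
        return future

    def shutdown(self, wait: bool = True, *, cancel_futures: bool = False) -> None:
        # Drop references to tracked futures (and their cached results)
        # so they can be garbage collected between repeated map() calls.
        self._futures.clear()

ex = SerialExecutorSketch()
results = list(ex.map(lambda x: x * 2, range(3)))  # Executor.map calls submit()
ex.shutdown()
print(results, len(ex._futures))
```

Without the `clear()`, every result submitted through `map()` would stay reachable through `self._futures` for the lifetime of the executor.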
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main     #925      +/-   ##
==========================================
+ Coverage   89.23%   89.30%   +0.07%
==========================================
  Files          33       33
  Lines        2025     2039      +14
==========================================
+ Hits         1807     1821      +14
  Misses        218      218
@pytest.mark.parametrize("executor_cls", ALL_EXECUTORS)
class TestExecutorMemory:
I'm not sure either of these tests will be reliable enough; curious to hear @chuckwondo's thoughts.
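One way to make such a test deterministic rather than threshold-based is a weakref check that the executor itself is collectable after shutdown. A sketch, with a stdlib executor standing in for the project's executors and `check_executor_is_freed` as a hypothetical helper name:

```python
import gc
import weakref
from concurrent.futures import ThreadPoolExecutor

def check_executor_is_freed(executor_cls) -> bool:
    """Return True if an executor instance can be garbage collected
    after its work is done and shutdown() has been called."""
    executor = executor_cls(max_workers=2)
    list(executor.map(lambda x: x, range(10)))
    executor.shutdown(wait=True)
    ref = weakref.ref(executor)
    del executor
    gc.collect()
    return ref() is None

# In the real suite this check would be parametrized over ALL_EXECUTORS.
freed = check_executor_is_freed(ThreadPoolExecutor)
print(freed)
```

This asserts the absence of lingering strong references (the actual leak mechanism) instead of measuring process RSS, which tends to vary across environments.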
for more information, see https://pre-commit.ci
# Lithops registers self.clean as an atexit handler (executors.py __init__),
# which prevents the FunctionExecutor from ever being garbage collected.
# Unregister it so the executor can be freed after shutdown.
atexit.unregister(self.lithops_client.clean)
This is absolutely wild and deserves raising upstream
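For context, here is a minimal demonstration (with a hypothetical `Client` class standing in for the lithops FunctionExecutor) of why a bound-method atexit handler pins its object, and how unregistering frees it:

```python
import atexit
import gc
import weakref

class Client:
    """Hypothetical stand-in for a lithops FunctionExecutor."""
    def clean(self) -> None:
        pass

client = Client()
ref = weakref.ref(client)

# Registering the bound method stores a strong reference to `client`
# inside atexit's internal table, so the object can never be collected.
atexit.register(client.clean)
del client
gc.collect()
alive_while_registered = ref() is not None

# atexit.unregister compares by equality, and bound methods of the same
# object compare equal, so this removes the handler and drops the reference.
atexit.unregister(ref().clean)
gc.collect()
collected_after_unregister = ref() is None
print(alive_while_registered, collected_after_unregister)
```

The bound method holds `__self__`, so a `register(self.clean)` in `__init__` keeps every instance alive until interpreter exit unless it is explicitly unregistered.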
@pytest.mark.parametrize("executor_cls", ALL_EXECUTORS)
class TestExecutorMemory:
I'm not sure these really belong here. They seem like tests that should occur upstream. If we see memory leaks in the upstream executors, we should probably be opening bugs against the appropriate repositories, no?
I agree, see #926 for discussion of what we should do or not do to clean up
I think you are right in principle, but I would propose keeping this around, at least as an optional test, given the significant work it took to get to the bottom of this.
Yeah @chuckwondo I see these tests as hopefully-temporary, but unfortunately important.
Ok, Tom and I actually worked on an alternative approach where we change the lithops config to set … I have limited this to when the backend is …
TomNicholas left a comment:
I think this is good, but before releasing it I want to:
- confirm with @jbusecke that nothing else puzzling has come up wrt this rabbit hole,
- raise an upstream issue on lithops in case we're missing something important here.
The shutdown method was not clearing `lithops_client.futures` or freeing output memory, causing test failures on Python 3.12 and 3.13. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Temporarily point lithops dep to jbusecke/lithops@fix-join-job-manager-localhostv2 to verify the upstream fix resolves the memory growth test on Linux CI. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…to exit Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…re measuring Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
return iter(dask.compute(*delayed_tasks))

def shutdown(self, wait: bool = True, *, cancel_futures: bool = False) -> None:
    self._futures.clear()
Can we just get rid of self._futures? It's not clear why we even hold the list of futures to begin with?
Um that is actually a good question @chuckwondo. @TomNicholas do you know why this was necessary in the case of dask?
Why is this the case for any of the executors?
Ok, that was a frustrating day. The memory test seemed to have some sort of dependency on the running environment (it passed consistently on my Mac, but never on CI). I have confirmed here that the actual end-to-end workflow which triggered this PR in the first place is indeed working as expected with this fix. I guess in the end I have come to agree with @chuckwondo that this behavior should not be tested in this package's context, and I have removed the test altogether. We are still checking that the futures are all cleared up (and I am asserting that the …
Added `.shutdown()` method to custom executors to prevent unbounded memory increase in lithops.
@TomNicholas and I have been mulling over a complex native zarr ingestion job for a few days now. We were ingesting many batches of large (~1TB) native zarr stores, and saw a steady increase in memory, which indicated that 'something' was holding onto memory in between batches. This PR adds tests to catch this behavior, plus a fix for the lithops executor that resolved our problem for now.
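The fix pattern can be sketched in miniature (stdlib executor standing in for the project's executors; `process_batch` is a hypothetical name): an explicit shutdown between batches releases per-batch executor state instead of letting it accumulate.

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(batch: list) -> list:
    """Process one batch and release executor resources afterwards."""
    executor = ThreadPoolExecutor(max_workers=2)
    try:
        return list(executor.map(lambda x: x + 1, batch))
    finally:
        # Explicit shutdown releases worker threads and any tracked
        # state, so memory does not accumulate across repeated batches.
        executor.shutdown(wait=True)

results = [process_batch(list(range(3))) for _ in range(2)]
print(results)
```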
The dataset we are using is currently not public. I would like to demonstrate the base issue fully reproducibly. If anyone knows a ~1TB/300k chunks native zarr store in an anon bucket, please let me know. -> Moved this concern to https://github.com/jbusecke/virtualizarr-benchmark
Closes #926 (Lithops FunctionExecutor memory leaks: atexit handler + unbounded futures list); confirmed in https://github.com/jbusecke/virtualizarr-benchmark
- Tests added
- Tests passing
- No test coverage regression
- Full type hint coverage
- Changes are documented in docs/releases.md