
Feature Request: Per‑Decorator, Per‑Core Maximum Entry Size Limit #292

@shaypal5

Description


Feature Request: Per‑Decorator, Per‑Core Maximum Entry Size

Context

cachier supports several cache cores today (Pickle, MongoDB, Memory, SQLAlchemy, Redis) and lists multi‑core caching on its public roadmap.
While an upcoming feature (size_limit, discussed in a separate issue) will cap the total cache size, there is currently no way to prevent a single, giant return value from being stored in a given core.

Why This Matters

  1. Memory pressure – Accidentally caching a 5 GB model in the in‑process Memory core can crash the interpreter.
  2. I/O & bandwidth – Storing huge blobs in a networked core (MongoDB / Redis) is slow and expensive.
  3. Tiered caching – Once multi‑core caching lands, users will want small objects to stay in fast tiers (RAM, local disk) while large objects skip directly to slower but capacious tiers.

Desired Behaviour

  • A decorator can declare a maximum entry size (entry_size_limit) so that individual calls exceeding the limit are not cached by the core in use (internally this argument is passed on to the core).
  • Limits apply before the core writes; rejected values are returned to the caller without being cached.

Proposed API

from cachier import cachier

@cachier(entry_size_limit="200MB")
def load_dataset(name: str):
    ...
  • entry_size_limit accepts integers (bytes) or human‑readable strings ("256KB", "2GB").
  • If the decorator refuses an entry, the function still returns the value but nothing is cached; on individual decorated‑function calls where cachier__verbose was set to True, an informative message is printed.
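A minimal sketch of how human‑readable limits could be normalized to bytes. The helper name `parse_size` and the accepted unit set are assumptions for illustration, not cachier's actual internals:

```python
import re

# Hypothetical helper: convert an entry_size_limit value (int bytes or a
# human-readable string such as "256KB" or "2GB") into a byte count.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def parse_size(limit):
    """Return the limit in bytes, or None meaning "no limit"."""
    if limit is None:
        return None
    if isinstance(limit, int):
        return limit
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", limit.upper())
    if not match:
        raise ValueError(f"Unparsable size limit: {limit!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])

print(parse_size("256KB"))  # 262144
print(parse_size("2GB"))    # 2147483648
```

Normalizing once at decoration time would let every core compare plain integers on the hot path.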

Implementation Sketch

  1. Size Estimation
    • MemoryCore: use sys.getsizeof + heuristic deep‑size walker.
    • PickleCore / SQLCore: compute byte length of the serialized blob before writing.
    • MongoDB / Redis: rely on driver to abort early if payload > limit.
  2. Core API Change
    • Add optional entry_size_limit to every BaseCore.
    • On set(key, value) the core returns True (stored) or False (skipped).
  3. Decorator Logic
    • Iterate through configured cores, calling core.set().
    • Stop at first successful store; otherwise return uncached result.
  4. Backwards Compatibility
    • Default entry_size_limit=None == unlimited (today’s behaviour).
    • Existing decorators work unchanged.
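Steps 1 and 2 above could be sketched as follows for a pickle‑backed core. This is an illustrative stand‑in, not cachier's actual BaseCore API; the class name and in‑memory dict store are assumptions:

```python
import pickle

# Sketch of a core whose set() measures the serialized payload and skips
# storage when it exceeds the configured limit (step 2's True/False contract).
class SketchPickleCore:
    def __init__(self, entry_size_limit=None):  # limit in bytes; None = unlimited
        self.entry_size_limit = entry_size_limit
        self._store = {}

    def set(self, key, value):
        """Store value; return True if stored, False if skipped."""
        blob = pickle.dumps(value)  # byte length computed before writing
        if self.entry_size_limit is not None and len(blob) > self.entry_size_limit:
            return False  # entry too large; caller returns the value uncached
        self._store[key] = blob
        return True

core = SketchPickleCore(entry_size_limit=100)
print(core.set("small", [1, 2, 3]))        # True: tiny payload fits
print(core.set("big", list(range(1000))))  # False: serialized blob exceeds 100 bytes
```

The boolean return keeps the decorator logic (step 3) trivial: try cores in order, stop at the first True.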

Interaction with Multi‑Core Caching

Scenario → benefit of entry‑size limits:

  • Hot path (≤10 MB) – cached in RAM → fastest retrieval.
  • Medium objects (10 MB–200 MB) – skip RAM, land on the Pickle core (local SSD).
  • Very large objects (>200 MB) – skip local tiers, optionally fall back to a cloud storage core (S3, future).

Size gates thus keep the upper tiers lean, avoiding expensive eviction cycles.


Open Questions

  1. Should violation raise a custom exception or silently skip with a log?
  2. Is a single walk with cloudpickle.dumps() sufficient for size estimation across all cores?
  3. How should users discover the actual serialized size (debug utility)?
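On question 2, a quick illustration of why a serialization‑based walk is usually needed rather than sys.getsizeof alone (stdlib pickle is used here in place of cloudpickle for a self‑contained example):

```python
import pickle
import sys

# sys.getsizeof reports only the container's shallow size, not the objects
# it references, so it badly underestimates nested payloads.
nested = {"weights": [0.0] * 10_000}
shallow = sys.getsizeof(nested)          # size of the dict object alone
serialized = len(pickle.dumps(nested))   # full payload as a core would store it
print(shallow < serialized)  # True: the shallow size misses the 10,000 floats
```

A debug utility answering question 3 could simply expose this serialized length for a given call's return value.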

Alternatives Considered

  • Global decorator‑level max_return_size – too coarse; different cores have very different constraints.
  • Rely on size_limit total cache cap – entries may still evict everything else from the cache in one shot.

Note: Future multi-core behavior & API

This is NOT the desired behavior and API for the current feature, BUT the implementation of the current feature MUST support extension into this planned multi-core behavior.

  • A decorator can declare per‑core maximum entry sizes (entry_size_limit) so that individual calls exceeding the limit are not cached by that core.
  • Limits apply before the core writes; rejected entries fall through to the next core (once multi‑core is implemented) or simply return without caching.
    • If all cores refuse an entry, the function still returns the value but nothing is cached; on individual decorated function calls where cachier__verbose was set to True, an informative message will be printed.
from cachier import cachier, MemoryCore, PickleCore, RedisCore

@cachier(
    cores=[
        MemoryCore(entry_size_limit="10MB"),          # skip bigger objects
        PickleCore(entry_size_limit="200MB"),         # disk OK, but cap file size
        RedisCore()                                   # no limit → anything goes
    ],
)
def load_dataset(name: str):
    ...
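The fall‑through the example implies could be sketched as below. Multi‑core is not yet implemented in cachier, so the stub core, its `set()` semantics, and the helper name are all assumptions for illustration:

```python
import pickle

class StubCore:
    """Stand-in cache core with an optional byte limit, for demonstration only."""
    def __init__(self, limit=None):
        self.limit = limit   # limit in bytes; None = unlimited
        self.stored = {}

    def set(self, key, value):
        blob = pickle.dumps(value)
        if self.limit is not None and len(blob) > self.limit:
            return False  # over this tier's limit; let the next tier try
        self.stored[key] = blob
        return True

def store_in_first_fitting_core(cores, key, value):
    """Try each core in order; return True as soon as one accepts the entry."""
    for core in cores:
        if core.set(key, value):
            return True
    return False  # every core refused; caller returns the value uncached

# A ~215-byte payload skips the 50-byte tier and lands in the 5000-byte tier.
tiers = [StubCore(limit=50), StubCore(limit=5000), StubCore()]
print(store_in_first_fitting_core(tiers, "k", list(range(100))))  # True
```

Ordering cores from fastest/smallest to slowest/largest makes the size limits double as tier‑routing rules.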

Request

I propose adding entry_size_limit (per core) in the next minor release, paving the way for predictable multi‑tier caching behaviour.
