Skip to content

feat(memory): per-tenant memory isolation#5967

Open
John-Jepsen wants to merge 8 commits into
crewAIInc:mainfrom
John-Jepsen:feat/memory-tenant-isolation
Open

feat(memory): per-tenant memory isolation#5967
John-Jepsen wants to merge 8 commits into
crewAIInc:mainfrom
John-Jepsen:feat/memory-tenant-isolation

Conversation

@John-Jepsen
Copy link
Copy Markdown

@John-Jepsen John-Jepsen commented May 29, 2026

Closes #2584

This PR implements per-tenant memory isolation and contextual memory retrieval.

Summary

Closes the per-tenant memory leak in the unified Memory subsystem. Right now every user's memories pool into the same vector collection and recall() mixes them across users — a data leak, not a UX nit. This PR threads tenant_id (and optional user_id) through the entire save → store → recall path and enforces isolation at the storage layer.

Invariant: a recall() scoped to tenant A can never return a memory written by tenant B — under any ranking, embedding collision, query depth, or backend.

Backward compatible. Default tenant_id is "_default", so single-tenant deployments keep working unchanged. Existing rows on disk read back as "_default" via row-level fallbacks and an auto-add_columns step when LanceDB tables are first opened.

Design

Full design doc: design-docs/0001-per-tenant-memory-isolation.md — covers the invariant, identity propagation, partitioning tradeoffs, threat model, test contract, and rollout order.

The enforcement chokepoint is a new ScopedStorage proxy that holds three contracts:

  1. Stamp on write — every record's tenant_id is overwritten with the bound tenant; cross-tenant records raise PermissionError.
  2. Inject on read — every backend read carries the tenant_id predicate; callers cannot omit it.
  3. Verify on return — every returned row is re-checked; a foreign-tenant row raises RuntimeError loudly, never silently filtered.

The triple is defense in depth: mypy --strict catches forgotten kwargs at type-check time (Protocol-level), backends push WHERE tenant_id = ? into the vector query (data-layer), and ScopedStorage is the runtime guard when either of the first two fails.

Commits (intended as one stacked PR; happy to split if you prefer)

# Commit What it does
1 docs(memory): add design doc… DD-0001 + Mermaid diagram of the propagation path
2 feat(memory): add tenant_id and user_id fields to MemoryRecord Additive Pydantic fields. Default _default. Hermetic round-trip tests.
3 refactor(memory): require tenant_id keyword on StorageBackend protocol Required keyword-only tenant_id on every read method; LanceDB + Qdrant push the predicate into the vector query. mypy --strict enforces.
4 feat(memory): add ScopedStorage wrapper and tenant isolation tests The enforcement chokepoint + the security-contract test file.
5 feat(memory): wire ScopedStorage through Memory and Flows The load-bearing PR. Memory gains tenant_id/user_id fields and per-call kwargs on remember/recall/forget/etc; RecallFlow and EncodingFlow hold a ScopedStorage so deep recall cannot escape.
6 feat(cli): add 'crewai memory migrate' One-shot CLI: --storage-dir, --default-tenant, --from-metadata-key, --dry-run. Idempotent.
7 docs(memory): add per-tenant memory isolation guide Mintlify page + nav (15 spots) + fix the misleading scope-as-isolation example in memory.mdx.

What this is not

Need Solved by Not this PR
Survives restart SQLite + Chroma/LanceDB persistence Already shipped
Shared across crews / cross-session knowledge graph External memory provider (mem0) Out of scope
Per-user data isolation This PR

Explicitly excluded (each its own ticket): mem0 replacement, per-tenant encryption, namespace-per-tenant partitioning (Option B in the design doc), multi-tenant rate limiting, Crew.kickoff() integration (Memory API is in; Crew.kickoff(tenant_id=…) belongs to a follow-up that touches agent execution).

Test plan

  • 13 isolation-contract tests (lib/crewai/tests/memory/test_tenant_isolation.py):
    • Cross-tenant recall() returns nothing (shallow and deep)
    • PermissionError on cross-tenant save through scoped handle
    • RuntimeError on backend leak detection
    • Default-tenant backcompat
    • forget / reset / get_record are tenant-scoped
    • user_id sub-partition within tenant
  • 7 migrate-command tests (lib/cli/tests/test_memory_migrate.py): missing dir/table no-op, schema migration, per-row --from-metadata-key, idempotence, --dry-run, validation
  • Full memory suite: 136 passed, 19 skipped (Qdrant optional dep)
  • ruff check clean across all changed files
  • mypy --strict clean (the 16 errors in qdrant_edge_storage.py are pre-existing no-any-unimported from the optional qdrant_edge dep being uninstalled — present on main before this branch)

Security note

This PR fixes a data-isolation bug. Reviewers — please run the checklist on the design doc in this order:

  1. Cross-scope leaks first — any recall code path that bypasses the scope filter
  2. Deletion honors scope
  3. Default-scope fallback works for callers that pass no tenant_id
  4. Tests prove two users with near-identical content do not cross-contaminate
  5. Then backward compat, perf, style

Open questions (called out in the design doc)

  1. Is "_default" the right sentinel? Alternatives bikeshed in DD-0001.
  2. Context-manager vs kwarg form for per-call tenant binding — currently kwargs; happy to add the context-manager sugar in a follow-up.
  3. Crew.kickoff(tenant_id=...) surface — proposed for a separate PR.
  4. Telemetry scrubbing — tenant_id values must never appear in telemetry payloads. Owner of crewai/telemetry/ should weigh in.
  5. Legacy BaseRAGStorage deprecation timeline.

AI-generated PR. Per .github/CONTRIBUTING.md, the llm-generated label is required. I attempted to apply it via gh pr create --label but lack permission on this repo. Could a maintainer please apply the llm-generated label?

Summary by CodeRabbit

  • New Features

    • Per-tenant memory isolation: memory APIs now accept tenant_id (and optional user_id) to prevent cross-tenant recall/leakage.
    • New CLI: crewai memory migrate to stamp existing memories with tenant identifiers and report dry-run vs applied changes.
  • Documentation

    • Comprehensive memory-isolation guide with examples, migration steps, threat model, and API usage.

Review Change Stack

DD-0001 captures the scope, invariant, propagation path, partitioning
tradeoff, threat model, test contract, and merge order for closing the
multi-tenant memory leak in the unified Memory subsystem.

Includes the source-of-truth Mermaid diagram for the identity propagation
path under design-docs/diagrams/. The diagram is embedded in the design
doc for inline review.
Additive schema change. tenant_id defaults to "_default" (single-tenant
deployments unchanged); user_id defaults to None.

Updates the docstrings on the legacy source/private fields to point at
tenant_id for callers who were using them as an isolation hint -- those
fields were never an isolation boundary (the post-filter at
unified_memory.py:704-709 ran after retrieval).

Backends do not yet persist or filter on the new fields; that lands in
the StorageBackend protocol change in the next PR. This commit is
purely the Pydantic model + a hermetic round-trip test that pins the
defaults and the "legacy row loads as _default" backward-compat path.

Refs: design-docs/0001-per-tenant-memory-isolation.md
Add tenant_id as a required keyword-only parameter on every read method
of StorageBackend (search, delete, get_record, list_records,
get_scope_info, list_scopes, list_categories, count, reset + async
variants). The "required without default" form is deliberate: mypy
--strict turns any forgotten caller into a CI failure, which is the
static enforcement behind the isolation invariant.

save() and update() do not gain a tenant_id parameter -- the tenant
lives on the record, and a separate parameter would invite a "record
says A, param says B" mismatch.

Backend changes:
  * LanceDBStorage: tenant_id/user_id columns added to placeholder schema;
    auto-add via add_columns on opening pre-isolation tables; every WHERE
    clause includes the tenant predicate via a new _tenant_where helper;
    reset() can no longer drop the table -- it scopes to the tenant.
  * QdrantEdgeStorage: tenant_id/user_id added to payload; new
    _build_tenant_filter replaces _build_scope_filter so every FieldCondition
    chain starts from the tenant; tenant_id and user_id added to payload
    indexes; reset() now goes through delete() with the tenant predicate.

Memory and the Flows (encoding_flow, recall_flow) temporarily pass
tenant_id="_default" to every storage call so behavior is unchanged
in this PR. The proper per-call tenant resolution lands when
ScopedStorage is wired through Memory in the next PR.

Test fixtures updated to pass tenant_id="_default" on direct backend
calls. The synthetic orphan payload in
test_orphaned_shard_cleanup gains tenant_id so the tenant filter matches
the fixture (real pre-isolation orphan data is migrated by the upcoming
crewai memory migrate command).

Refs: design-docs/0001-per-tenant-memory-isolation.md
ScopedStorage wraps any StorageBackend and binds every operation to a
fixed (tenant_id, user_id) pair. It holds three contracts and they
live nowhere else:

  1. Stamp on write -- every record's tenant_id is overwritten with the
     wrapper's bound tenant before persisting; a record arriving with a
     different non-default tenant_id raises PermissionError instead of
     being silently relabeled.
  2. Inject on read -- every read forwarded to the underlying backend
     carries the tenant_id predicate; the wrapper exposes no API to omit
     it.
  3. Verify on return -- after the backend returns rows, the wrapper
     re-checks each row's tenant_id and raises RuntimeError on a
     foreign-tenant leak. Loud over silent, so a broken backend filter
     surfaces in the next test run instead of shipping quietly.

The triple contract is defense in depth around the Protocol's required
keyword arg (type-check-time enforcement) and the backend's pushed-down
WHERE/FieldCondition (data-layer enforcement).

test_tenant_isolation.py adds the security contract from the design
doc. Nine ScopedStorage tests pass today (the wrapper is the
enforcement chokepoint and works without Memory's involvement). One
backcompat test pins the '_default' tenant single-tenant path. Two
Memory-level tests are XFAIL'd with strict=True, raises=TypeError --
they will pass when PR crewAIInc#4 wires Memory's remember/recall/forget to
take tenant_id kwargs and route through ScopedStorage.

Refs: design-docs/0001-per-tenant-memory-isolation.md
Adds tenant_id and user_id to Memory and threads them through every
public method (remember, remember_many, recall, forget, update,
list_records, list_scopes, list_categories, info, tree, reset, plus
async variants). Resolution order is per-call kwarg > instance default
> "_default", so single-tenant deployments keep working unchanged.

The mechanism: a new _scoped(tenant_id, user_id) factory builds a
ScopedStorage proxy bound to the resolved tenant. Every internal
storage call routes through it; the temporary tenant_id="_default"
arguments from PR crewAIInc#2 are removed. RecallFlow and EncodingFlow are
constructed with the ScopedStorage instance, so once a Flow is
running its leaf calls cannot escape the tenant filter -- the deep
recall LLM exploration path inherits isolation by construction.

Within-tenant policy: the existing source/private post-filter in
recall() is kept (it operates AFTER ScopedStorage has already
filtered to the tenant, so it cannot leak cross-tenant rows). It is
demoted in the docstring to a within-tenant convenience; per-user
isolation should use user_id, not source/private.

Tests: the two previously-XFAIL'd Memory-level isolation tests now
pass (cross-tenant recall and forget). A new
test_instance_default_tenant_holds pins the "Memory(tenant_id='X')"
construction pattern. All 13 isolation tests green; full memory
suite 129 passed, 19 skipped (qdrant optional dep).

Refs: design-docs/0001-per-tenant-memory-isolation.md
Converts 'crewai memory' from a single command into a click group
(invoke_without_command=True preserves the bare 'crewai memory' TUI
behavior). Adds a 'migrate' subcommand for stamping pre-isolation
LanceDB rows with a tenant_id.

Options:
  --storage-dir         Memory storage directory (default: $CREWAI_STORAGE_DIR/memory)
  --default-tenant      Tenant for unstamped rows (default: '_default')
  --from-metadata-key   Optional metadata key whose value becomes tenant_id;
                        rows missing the key fall back to --default-tenant.
                        Useful when existing data carries customer/user
                        identifiers in metadata.
  --table-name          LanceDB table name (default: 'memories')
  --dry-run             Print what would change without writing. Recommended
                        for the first run against production.

The command is idempotent. Schema migration (adding the tenant_id and
user_id columns) is applied via table.add_columns; per-row updates only
fire when the row's current tenant_id differs from the resolved target.
Dry-run simulates the column-add side effect so its reported row count
matches a real run.

Refs: design-docs/0001-per-tenant-memory-isolation.md
User-facing Mintlify page at docs/en/concepts/memory-isolation.mdx
covers when to use tenant_id, the three concepts (tenant_id, user_id,
scope) and what each is/isn't a security boundary for, common patterns
(SaaS, Crew.kickoff, Flows, instance-bound), the migration command,
the threat model, FAQ, and API reference.

docs.json: adds en/concepts/memory-isolation right after
en/concepts/memory in all 15 nav sections (one per language variant
and primary nav).

docs/en/concepts/memory.mdx: fixes the "Customer support (per-customer
context)" example that previously implied scope provided isolation.
It now uses tenant_id and adds a Warning callout pointing at the new
isolation page.

Translations to ar/ko/pt-BR for the new page can land in follow-up PRs.

Refs: design-docs/0001-per-tenant-memory-isolation.md
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

This PR implements per-tenant memory isolation throughout CrewAI's memory system by requiring tenant_id parameters in the StorageBackend protocol, introducing a ScopedStorage enforcement wrapper that stamps/verifies tenant boundaries, extending MemoryRecord with tenant fields, implementing tenant-aware query filtering in LanceDB and Qdrant backends, and adding a crewai memory migrate CLI tool for legacy data backfill.

Changes

Per-Tenant Memory Isolation Implementation

Layer / File(s) Summary
Design Specification & Diagrams
design-docs/0001-per-tenant-memory-isolation.md, design-docs/diagrams/identity-propagation.mmd
Complete design spec defining the isolation invariant, identity propagation path, StorageBackend contract changes, ScopedStorage enforcement choke-point, backend push-down filtering strategy, migration mechanics (backfill CLI, schema migration, startup warning), security model, isolation test contract, and multi-PR rollout plan. Identity propagation diagram visualizes the flow from memory API through ScopedStorage enforcement to backend storage.
User-Facing Documentation
docs/en/concepts/memory-isolation.mdx, docs/en/concepts/memory.mdx, docs/docs.json
New memory isolation concept page with quick-start examples, deployment patterns, migration command reference, threat model, and API surface. Updated existing memory.mdx example to use tenant_id="acme-corp" for real isolation instead of relying on scopes, with warning that scope does not prevent cross-tenant leakage. Documentation index updated to register the new concept page.
MemoryRecord Data Model Extension
lib/crewai/src/crewai/memory/types.py, lib/crewai/tests/memory/test_memory_record_tenant_fields.py
Adds tenant_id: str = "_default" and user_id: str | None = None fields to MemoryRecord. Updated source and private field documentation to clarify they are not isolation boundaries. Test suite validates defaults, backward compatibility for legacy records, round-trip serialization, and type constraints.
StorageBackend Protocol Contract
lib/crewai/src/crewai/memory/storage/backend.py
Updates protocol to require tenant_id as keyword-only parameter on all read operations (search, delete, get_record, list_records, get_scope_info, list_scopes, list_categories, count, reset) with optional user_id for sub-tenant filtering; save/update rely on record-embedded tenant_id. Enables static type-safety and misuse prevention across all backend implementations.
ScopedStorage Enforcement Wrapper
lib/crewai/src/crewai/memory/storage/scoped_storage.py
Implements tenant-bound StorageBackend proxy with three enforcement layers: (1) stamps incoming records with bound tenant_id and rejects cross-tenant records on write, (2) injects tenant predicates into backend calls on read, (3) verifies returned records match bound tenant and loud-fails on mismatch. Isolation-aware wiring for search, delete, list, count, reset, and scope operations.
LanceDB Tenant Isolation
lib/crewai/src/crewai/memory/storage/lancedb_storage.py
Adds SQL-safe WHERE fragment builder for tenant predicates, lazy schema migration auto-adding missing tenant_id/user_id columns, tenant-stamping during serialization with backward-compat defaults on deserialization, and tenant-pinned query construction for all read operations preventing foreign-tenant rows from entering candidate pools. Reset semantics changed from cross-tenant to tenant-scoped deletion.
Qdrant Edge Tenant Isolation
lib/crewai/src/crewai/memory/storage/qdrant_edge_storage.py
Persists tenant_id and user_id in point payloads with backward-compatible _default fallback, expands index schema to include tenant fields, builds tenant-pinned Qdrant filters for all queries/deletes, implements tenant/user validation in record retrieval, and replaces filesystem-based reset with tenant-scoped deletion via filter. Deletion now resolves record_ids only within requested tenant.
Memory API Tenant Integration
lib/crewai/src/crewai/memory/unified_memory.py
Adds instance-level tenant_id and user_id fields with defaults, _scoped() helper returning ScopedStorage bound to effective tenant/user (per-call args override instance defaults). Updates remember/recall/forget/list/reset methods to accept tenant_id/user_id overrides and route through scoped storage. Extends async wrappers and encoding flows to maintain tenant isolation.
CLI Memory Migrate Command
lib/cli/src/crewai_cli/cli.py, lib/cli/src/crewai_cli/memory_migrate.py
Converts memory CLI from single command to command group. Implements crewai memory migrate subcommand with options for storage directory, default tenant, optional metadata-key-based tenant source, table name, and dry-run mode. Idempotent schema migration (adds tenant columns) and per-row stamping with JSON metadata parsing fallback.
Migration Test Suite
lib/cli/tests/test_memory_migrate.py
Hermetic pytest suite validating migration behavior: no-op on missing directory/table, schema expansion, per-row stamping with metadata-key derivation, idempotency across repeated runs, dry-run mode verification, and empty-tenant input rejection.
Isolation Contract Test Suite
lib/crewai/tests/memory/test_tenant_isolation.py
Defines and verifies per-tenant isolation contract via ScopedStorage direct tests (stamp-on-write, cross-tenant rejection, search/get_record isolation, delete/reset scoping, backend leak detection, user_id filtering) and end-to-end Memory tests (backward compatibility with _default tenant, cross-tenant recall/forget scoping, instance-default-tenant binding).
Backend Test Updates
lib/crewai/tests/memory/test_qdrant_edge_storage.py, lib/crewai/tests/memory/test_unified_memory.py
Updates existing test suites to pass explicit tenant_id="_default" to all storage operations (search, count, delete, get_record, list_scopes, get_scope_info, list_records, list_categories, reset), reflecting the new tenant-scoped API contract. Orphaned shard tests now include tenant_id in synthetic payloads.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Suggested labels

size/XL

Poem

🐰 In burrows safe, each tenant keeps its thread,

Stamped and scoped so memories stay in bed.
ScopedStorage hops in, guard at the gate,
Stamps, filters, checks — no cross-tenant fate.
A rabbit cheers: isolation, well-made!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 53.07% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat(memory): per-tenant memory isolation' clearly summarizes the main change: introducing per-tenant isolation to the memory subsystem. It is specific, concise, and directly reflects the primary objective documented in the PR summary.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/crewai/src/crewai/memory/storage/lancedb_storage.py (1)

506-540: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Fix LanceDBStorage.delete() to honor record_ids when combined with older_than/categories/metadata_filter (match Qdrant semantics)

  • With record_ids + older_than and no categories/metadata_filter, LanceDB takes the record_ids-only branch and ignores age (lines 508-514).
  • With record_ids + (categories/metadata_filter) it scans and deletes by those predicates, but never intersects with record_ids (lines 515-540).
  • Qdrant gates the record_ids-only path with not (categories or metadata_filter or older_than) and applies allowed_ids inside the scan branch.
🐛 Proposed fix to match Qdrant semantics
-            if record_ids and not (categories or metadata_filter):
+            if record_ids and not (categories or metadata_filter or older_than):
                 before = int(self._table.count_rows())
                 ids_expr = ", ".join(f"'{_sql_quote(rid)}'" for rid in record_ids)
                 self._do_write(
                     "delete", f"({tenant_clause}) AND id IN ({ids_expr})"
                 )
                 return before - int(self._table.count_rows())
-            if categories or metadata_filter:
+            if categories or metadata_filter or older_than:
                 rows = self._scan_rows(
                     scope_prefix, tenant_id=tenant_id, user_id=user_id
                 )
+                allowed_ids = set(record_ids) if record_ids else None
                 to_delete: list[str] = []
                 for row in rows:
                     record = self._row_to_record(row)
+                    if allowed_ids is not None and record.id not in allowed_ids:
+                        continue
                     if categories and not any(
                         c in record.categories for c in categories
                     ):
                         continue
                     if metadata_filter and not all(
                         record.metadata.get(k) == v for k, v in metadata_filter.items()
                     ):
                         continue
                     if older_than and record.created_at >= older_than:
                         continue
                     to_delete.append(record.id)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/memory/storage/lancedb_storage.py` around lines 506 -
540, LanceDBStorage.delete currently skips `older_than` when `record_ids` are
provided and, when combined with `categories`/`metadata_filter`, doesn’t
intersect scan results with the provided `record_ids`; update the guard in the
fast-path to `if record_ids and not (categories or metadata_filter or
older_than)` so the ID-only delete only runs when no age/category/metadata
constraints exist, and in the scanning branch (where `_scan_rows`,
`_row_to_record`, and building `to_delete` are used) apply an `allowed_ids =
set(record_ids)` check so you only append record.id to `to_delete` if it is in
`allowed_ids` (i.e., intersect scan results with `record_ids`), leaving
`_do_write("delete", ...)` and the count math (`_table.count_rows()`) unchanged.
🧹 Nitpick comments (3)
lib/crewai/src/crewai/memory/unified_memory.py (1)

256-257: 💤 Low value

Note: user_id=None cannot override instance default.

The resolution logic user_id if user_id is not None else self.user_id means a caller with an instance-level user_id set cannot explicitly request "all users within this tenant" on a per-call basis by passing user_id=None. They would need a separate Memory instance without a default user_id.

This is likely acceptable for the intended use case (instance defaults are sticky), but worth documenting if callers may need to query across all users within a tenant while also having a user-specific default.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/crewai/src/crewai/memory/unified_memory.py` around lines 256 - 257, The
current resolution assigns resolved_user with "user_id if user_id is not None
else self.user_id", which prevents callers from explicitly passing None to mean
“all users” when an instance-level default exists; to fix, add a sentinel (e.g.
UNSET) as the default for the user_id parameter and change the resolution logic
to: if user_id is UNSET then use self.user_id, elif user_id is None then treat
as explicit “no user”/all-users, else use the provided user_id; update the
resolved_user assignment and any callers and docstrings to reference the new
UNSET sentinel and the clarified behavior.
design-docs/0001-per-tenant-memory-isolation.md (2)

355-356: ⚡ Quick win

Consider limiting the startup warning row scan.

The startup warning that detects unstamped rows could become expensive on large deployments with millions of records. Consider adding a note about limiting the scan to a sample (e.g., first 1000 rows) or making it opt-in via environment variable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-docs/0001-per-tenant-memory-isolation.md` around lines 355 - 356, The
startup warning in Memory.model_post_init currently scans for unstamped rows and
can be expensive at scale; change it to either limit the scan to a configurable
sample (e.g., check only the first N rows or a random N rows) or make the full
scan opt-in via an environment variable (e.g., MEMORY_STARTUP_SCAN=true) and
expose a sample size config (e.g., MEMORY_STARTUP_SAMPLE_SIZE) so large
deployments default to a cheap/sample check while operators can opt into or
increase the scan via env vars; update Memory.model_post_init to read these
settings and implement the sampled/opt-in logic and ensure the warning message
reports when a sample was used.

304-309: 💤 Low value

Consider specifying the _quote escaping strategy.

The _quote helper is referenced but the exact escaping mechanism isn't defined. Consider adding a note specifying whether this uses parameterized queries (preferred) or manual SQL escaping, and which characters must be escaped.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@design-docs/0001-per-tenant-memory-isolation.md` around lines 304 - 309,
Clarify and document the escaping strategy used by the _quote helper referenced
in the where clause (where = f"tenant_id = {_quote(tenant_id)}" and optional
user_id) — update the design doc to state that DB parameterized queries are the
preferred approach and either replace/annotate _quote to use parameter binding
from the DB client (e.g., prepared statements) or explicitly list the
manual-escaping rules and characters escaped (quotes, backslashes, NULL, control
chars) if parameterization cannot be used; include a short note referencing the
_quote helper name and the tenant_id/user_id usage so readers know which code to
change or implement.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@lib/cli/src/crewai_cli/cli.py`:
- Around line 415-418: The "without key" count uses summary['rows_to_stamp']
which can be smaller than summary['rows_with_metadata_key'] on reruns; change
the calculation to derive the count from scanned rows instead by replacing the
expression that prints summary['rows_to_stamp'] -
summary['rows_with_metadata_key'] with summary['rows_scanned'] -
summary['rows_with_metadata_key']; update the block guarded by from_metadata_key
(the click.echo lines) so it prints the corrected value and ensure summary
contains 'rows_scanned' when building this output.

In `@lib/cli/src/crewai_cli/memory_migrate.py`:
- Around line 125-126: The current code materializes up to 10M rows with
table.search().limit(10_000_000).to_list(), which loads all columns (including
large vectors) into memory and silently truncates scans; change this to a
batched scanner that requests only the needed columns ("id", "tenant_id",
"metadata_str") and iterates in pages (e.g., using
table.search().select(...).limit(batch_size) with an offset/cursor or streaming
API), processing and stamping each batch and incrementing
summary["rows_scanned"] per batch so you never load the full table into memory
and never fetch the vector column.

In `@lib/crewai/tests/memory/test_qdrant_edge_storage.py`:
- Around line 57-62: Extend test_save_search to exercise tenant isolation:
create two records with the same embedding (use _rec to build them) but
different tenant ids (e.g., "alice" and "bob"), call storage.save([...]) for
both tenants, then run storage.search(...) with tenant_id set to each tenant and
assert results only contain that tenant's record; also exercise
storage.delete(...) with a tenant-scoped delete and verify the other tenant's
data remains. Use the existing symbols test_save_search, _rec, storage.save,
storage.search, and storage.delete to implement these checks.

In `@lib/crewai/tests/memory/test_tenant_isolation.py`:
- Around line 258-333: Add a deep-recall isolation test to TestMemoryIsolation
that uses Memory and its recall(...) with depth="deep" so the
RecallFlow/LLM-driven branch is exercised; create two records for tenants
"alice" and "bob" with identical embeddings (use mock_embedder to return same
vectors), call recall("something", tenant_id="alice", depth="deep") and
recall(... tenant_id="bob", depth="deep"), and assert results are tenant-scoped
(all h.record.tenant_id matches expected and no cross-tenant secrets appear) to
cover the LLM-driven path alongside the existing shallow tests.
- Around line 46-58: The mock_embedder fixture (mock_embedder -> embed ->
m.side_effect) uses abs(hash(t)) % 1000 which buckets inputs into 1000
collisions; replace that with a deterministic digest-based mapping (e.g.,
compute a SHA256/MD5 digest of t.encode(), then convert bytes into normalized
floats) to produce the 4-element embedding vector so different texts map to
highly distinct embeddings while remaining deterministic for tests.

---

Outside diff comments:
In `@lib/crewai/src/crewai/memory/storage/lancedb_storage.py`:
- Around line 506-540: LanceDBStorage.delete currently skips `older_than` when
`record_ids` are provided and, when combined with
`categories`/`metadata_filter`, doesn’t intersect scan results with the provided
`record_ids`; update the guard in the fast-path to `if record_ids and not
(categories or metadata_filter or older_than)` so the ID-only delete only runs
when no age/category/metadata constraints exist, and in the scanning branch
(where `_scan_rows`, `_row_to_record`, and building `to_delete` are used) apply
an `allowed_ids = set(record_ids)` check so you only append record.id to
`to_delete` if it is in `allowed_ids` (i.e., intersect scan results with
`record_ids`), leaving `_do_write("delete", ...)` and the count math
(`_table.count_rows()`) unchanged.

---

Nitpick comments:
In `@design-docs/0001-per-tenant-memory-isolation.md`:
- Around line 355-356: The startup warning in Memory.model_post_init currently
scans for unstamped rows and can be expensive at scale; change it to either
limit the scan to a configurable sample (e.g., check only the first N rows or a
random N rows) or make the full scan opt-in via an environment variable (e.g.,
MEMORY_STARTUP_SCAN=true) and expose a sample size config (e.g.,
MEMORY_STARTUP_SAMPLE_SIZE) so large deployments default to a cheap/sample check
while operators can opt into or increase the scan via env vars; update
Memory.model_post_init to read these settings and implement the sampled/opt-in
logic and ensure the warning message reports when a sample was used.
- Around line 304-309: Clarify and document the escaping strategy used by the
_quote helper referenced in the where clause (where = f"tenant_id =
{_quote(tenant_id)}" and optional user_id) — update the design doc to state that
DB parameterized queries are the preferred approach and either replace/annotate
_quote to use parameter binding from the DB client (e.g., prepared statements)
or explicitly list the manual-escaping rules and characters escaped (quotes,
backslashes, NULL, control chars) if parameterization cannot be used; include a
short note referencing the _quote helper name and the tenant_id/user_id usage so
readers know which code to change or implement.

In `@lib/crewai/src/crewai/memory/unified_memory.py`:
- Around line 256-257: The current resolution assigns resolved_user with
"user_id if user_id is not None else self.user_id", which prevents callers from
explicitly passing None to mean “all users” when an instance-level default
exists; to fix, add a sentinel (e.g. UNSET) as the default for the user_id
parameter and change the resolution logic to: if user_id is UNSET then use
self.user_id, elif user_id is None then treat as explicit “no user”/all-users,
else use the provided user_id; update the resolved_user assignment and any
callers and docstrings to reference the new UNSET sentinel and the clarified
behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 5ea368e1-6a3f-4e8e-a42f-5c27e6e2c70e

📥 Commits

Reviewing files that changed from the base of the PR and between fca21b1 and a236207.

📒 Files selected for processing (18)
  • design-docs/0001-per-tenant-memory-isolation.md
  • design-docs/diagrams/identity-propagation.mmd
  • docs/docs.json
  • docs/en/concepts/memory-isolation.mdx
  • docs/en/concepts/memory.mdx
  • lib/cli/src/crewai_cli/cli.py
  • lib/cli/src/crewai_cli/memory_migrate.py
  • lib/cli/tests/test_memory_migrate.py
  • lib/crewai/src/crewai/memory/storage/backend.py
  • lib/crewai/src/crewai/memory/storage/lancedb_storage.py
  • lib/crewai/src/crewai/memory/storage/qdrant_edge_storage.py
  • lib/crewai/src/crewai/memory/storage/scoped_storage.py
  • lib/crewai/src/crewai/memory/types.py
  • lib/crewai/src/crewai/memory/unified_memory.py
  • lib/crewai/tests/memory/test_memory_record_tenant_fields.py
  • lib/crewai/tests/memory/test_qdrant_edge_storage.py
  • lib/crewai/tests/memory/test_tenant_isolation.py
  • lib/crewai/tests/memory/test_unified_memory.py

Comment thread lib/cli/src/crewai_cli/cli.py Outdated
Comment thread lib/cli/src/crewai_cli/memory_migrate.py Outdated
Comment thread lib/crewai/tests/memory/test_qdrant_edge_storage.py
Comment thread lib/crewai/tests/memory/test_tenant_isolation.py
Comment thread lib/crewai/tests/memory/test_tenant_isolation.py
Copy link
Copy Markdown
Author

@John-Jepsen John-Jepsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gtg

Major:
  - LanceDBStorage.delete(): record_ids + older_than took the fast path
    and silently dropped older_than; record_ids + categories/metadata_filter
    scanned by the predicates but never intersected with record_ids.
    Now matches Qdrant semantics: fast path is gated on no other predicates,
    scan branch intersects with allowed_ids = set(record_ids). Two new
    regression tests pin both branches.
  - crewai memory migrate now streams rows in paginated batches instead of
    materializing up to 10M rows in one call. Only id, tenant_id, and
    metadata_str are selected -- the heavy vector column is never read.
    Pre-isolation tables (no tenant_id column) adapt the select set so
    dry-run works against unmigrated schemas. New test fixes the page size
    and confirms a multi-page table is scanned end-to-end.
  - Adds tenant-isolation tests to test_qdrant_edge_storage so when the
    optional qdrant_edge dep is installed the Qdrant backend's isolation
    contract is verified, not just LanceDB's.
  - Adds test_deep_recall_honors_tenant exercising the depth='deep'
    RecallFlow path with two tenants and colliding embeddings.

Minor:
  - CLI "without key" count now derives from rows_scanned instead of
    rows_to_stamp, so it stays correct on reruns where rows_to_stamp shrinks
    to zero as the data converges.
  - mock_embedder in test_tenant_isolation now uses a SHA-256 digest instead
    of abs(hash(t)) % 1000, removing the 1000-bucket collision risk that
    could let two different texts produce identical vectors and make
    isolation assertions pass vacuously.

Skipped (intentional):
  - Design-doc nits about startup-warning scan limits and _quote escaping
    strategy -- the warning is documented as a one-line log, not a load-
    bearing scan, and the escaping helper is documented inline in the
    LanceDB implementation.
  - The user_id sentinel suggestion -- per-call user_id=None meaning
    "instance default" is the intended SaaS pattern; callers who need
    "all users within tenant" construct a Memory without a default user_id.

Test counts: 140 passed, 21 skipped (was 136/19; the deltas are the two
new Qdrant isolation tests, the new deep-recall test, and the two new
LanceDB delete-intersection tests).
@John-Jepsen
Copy link
Copy Markdown
Author

Thanks @coderabbitai. Addressed in 90073fb:

Fixed (5 of 5 actionable):

  • 🟠 LanceDBStorage.delete()record_ids + older_than no longer drops older_than; record_ids + categories/metadata_filter now intersects with record_ids. Two new regression tests pin both branches (test_lancedb_delete_record_ids_intersects_with_other_filters, test_lancedb_delete_record_ids_intersects_with_categories).
  • 🟠 crewai memory migrate — replaced the 10M-row materialization with a paginated select() scan that fetches only id, tenant_id, metadata_str. The vector column is never read. Pre-isolation tables (no tenant_id column yet) adapt the select set so dry-run still works against unmigrated schemas. New test_migrate_pagination_streams_past_page_size uses a 2-row page size to confirm a 7-row fixture is fully scanned.
  • 🟠 Qdrant tenant-isolation tests — added test_search_isolates_tenants_with_colliding_embeddings and test_delete_is_tenant_scoped to test_qdrant_edge_storage.py so when the optional qdrant_edge dep is installed the same isolation contract from test_tenant_isolation.py is verified for the Qdrant backend.
  • 🟠 Deep-recall isolation test — added test_deep_recall_honors_tenant that uses depth="deep" so the RecallFlow path actually runs. Two tenants, SHA-256-derived embeddings, asserts no cross-tenant leakage.
  • 🟡 CLI "without key" count — now derives from rows_scanned - rows_with_metadata_key so reruns don't print a negative number once rows_to_stamp converges to 0.
  • 🟡 mock_embedder — replaced abs(hash(t)) % 1000 with a SHA-256 digest so different texts cannot collide and make isolation assertions pass vacuously.

Skipped (with reason):

  • 💤 user_id UNSET sentinel — the per-call user_id=None meaning "fall back to instance default" is the intended SaaS pattern. Callers who need "query across all users within this tenant" construct a separate Memory without a default user_id. The complexity of an UNSET sentinel didn't seem worth it for this case; happy to revisit if maintainers disagree.
  • 💤 Design-doc startup-warning scan limit and _quote escaping nits — the warning is a one-line log, not a load-bearing scan, and the escaping is documented inline in lancedb_storage.py (_sql_quote doubles single quotes, matching SQLite/LanceDB literal rules). Bandit's S608 rule catches any raw f-string usage in CI.

Test counts: 140 passed, 21 skipped (was 136/19). Deltas: 2 LanceDB delete tests, 2 Qdrant isolation tests (skipped without the optional dep), 1 deep-recall test, 1 migrate-pagination test.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 29, 2026

Tip

For best results, initiate chat on the files or code changes.

@John-Jepsen — excellent follow-through. The fixes are exactly what was needed:

  • The paginated select() scan for crewai memory migrate is a much safer approach — no full-table materialization and the dry-run path adapts to pre-isolation schemas cleanly.
  • The SHA-256-derived embeddings in mock_embedder close the vacuous-pass hole properly; that was the most subtle test correctness issue.
  • test_deep_recall_honors_tenant exercising the actual RecallFlow path is the right end-to-end pin for the deep-recall invariant.
  • The record_ids intersection fix with the two regression tests pins both logical branches, which is the right level of coverage for that delete logic.
  • The "without key" count derivation fix (rows_scanned - rows_with_metadata_key) is the correct idempotent formula.

The rationale for skipping the user_id UNSET sentinel is sound — the "construct a second Memory without a default user_id" pattern keeps the API surface minimal and avoids a sentinel value leaking into downstream code. Happy to revisit if maintainers want cross-user recall as a first-class primitive later.

The _quote / startup-warning nits are fine to leave with the inline comment + Bandit S608 guard in CI.

🐇✨


If you found this review helpful, would you consider giving us a shout-out on X?

Thank you for using CodeRabbit!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Memory Distinguished by Custom Key​​

1 participant