fix: add user_ids kwarg to LoCoMo / MemBench / MemSim / PersonaMem load_documents#14
Open
King-Brownie wants to merge 1 commit into
Conversation
…uments The base class Dataset.load_documents() declares ``user_ids: set[str] | None = None`` as part of its abstract contract. The runner at ``src/memory_bench/runner.py:128`` always passes ``user_ids=query_user_ids`` when ``dataset.isolation_unit is not None AND query_limit is not None``, but 4 of 7 concrete dataset overrides forgot to include the parameter in their signature: - LoComoDataset (isolation_unit='conversation') — actively fires ``TypeError: LoComoDataset.load_documents() got an unexpected keyword argument 'user_ids'`` on ``omb run --dataset locomo --memory bm25 --split locomo10 --query-limit N``. - MemBenchDataset, MemSimDataset, PersonaMemDataset (isolation_unit=None) — latent signature-contract violations that would surface if upstream ever sets their isolation_unit. This change: - LoComoDataset: add ``user_ids`` to the signature AND apply the filter ``if user_ids is not None and sample_id not in user_ids: continue`` — matches the reference pattern in BEAMDataset / LongMemEvalDataset / LifeBenchDataset (which all filter by their own per-doc id and were already correct). - MemBench / MemSim / PersonaMem: signature-only fixes; the runner never passes ``user_ids`` to them today (their isolation_unit is ``None``), so adding filter logic without a trigger path would be premature. The signature parity matches the base-class abstract method. Live-validated downstream by re-running ``omb run --dataset locomo --memory bm25 --split locomo10 --query-limit 3`` — the TypeError is gone; the run progresses to answer generation.
|
@ParisT97 is attempting to deploy a commit to the Vectorize Team on Vercel. A member of the Team first needs to authorize it. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes the upstream
TypeError: LoComoDataset.load_documents() got an unexpected keyword argument 'user_ids'raised by the standardomb run --dataset locomo --memory bm25 --split locomo10 --query-limit Ncommand at commit45fa380.Companion to the issue filed alongside this PR.
Root cause
Dataset.load_documents()(base.py#L81-L92) declaresuser_ids: set[str] | None = Nonein its abstract signature. The runner at runner.py#L128 passesuser_ids=todataset.load_documents()whenever the dataset has anisolation_unitAND aquery_limitis set.4 of 7 concrete dataset overrides forgot to include the kwarg:
user_idsin override before this PR?isolation_unitLoComoDataset'conversation'MemBenchDatasetNoneMemSimDatasetNonePersonaMemDatasetNoneLongMemEvalDataset'question'LifeBenchDataset'user'BEAMDataset'conversation'Changes
user_idsto signature and apply the filterif user_ids is not None and sample_id not in user_ids: continue. Matches the existing pattern inBEAMDataset/LongMemEvalDataset/LifeBenchDataset(which filter by their per-doc identifier).isolation_unit=Nonemeans the runner never passes themuser_idstoday, so adding filter logic without a trigger path would be premature. Signature parity with the base class is restored.Total diff: +6 lines across 4 files.
Test plan
omb run --dataset locomo --memory bm25 --split locomo10 --query-limit 3no longer raisesTypeErrorand progresses to answer-generation.Happy to address any review feedback.