
the current framework is mostly deterministic #11

@yanan1116


Hi authors, I have a concern about the framework.

In the current framework, the retrieval path is mostly deterministic.

For each task in the test split, given that task's query, the following parts are deterministic:

  1. First-stage similarity retrieval
  • given the query text
  • the embedding is deterministic
  • similarity ranking over stored query embeddings is deterministic
  2. Expansion from query to memory IDs
  • once the top query groups are chosen
  • the associated memory IDs are looked up by rule
  • that step is deterministic
  3. Second-stage scoring
  • given candidate memories
  • similarity and Q are already fixed
  • hybrid scoring and sorting are deterministic
  4. Final selection
  • vanilla top-k selection is deterministic
  • tri-channel bucket selection is also deterministic

So overall, the retrieval process is largely deterministic, and I suspect that this determinism is one of the reasons exploration is weak.
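To make the four stages above concrete, here is a minimal sketch of how I understand the pipeline. It is not the repository's code: the function name `retrieve`, the data layout, the parameters `k1`, `k2`, and `weight`, and the linear mix of similarity and Q are all my own assumptions for illustration. The point is only that nothing in it draws a random number, so the same query always returns the same memories.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, stored_queries, query_to_memories, q_values,
             k1=2, k2=2, weight=0.5):
    """Hypothetical two-stage retrieval; every step is a pure function."""
    # Stage 1: rank stored queries by similarity (ties broken by ID).
    sims = {qid: cosine(query_vec, vec) for qid, vec in stored_queries.items()}
    top_queries = sorted(sims, key=lambda qid: (-sims[qid], qid))[:k1]

    # Stage 2: expand the chosen query groups to memory IDs by lookup.
    mem_sim = {}
    for qid in top_queries:
        for mid in query_to_memories[qid]:
            mem_sim[mid] = max(mem_sim.get(mid, -1.0), sims[qid])

    # Stage 3: hybrid score (assumed here to be a linear mix of similarity and Q).
    score = {mid: (1 - weight) * s + weight * q_values[mid]
             for mid, s in mem_sim.items()}

    # Stage 4: plain top-k selection.
    return sorted(score, key=lambda mid: (-score[mid], mid))[:k2]

# Toy data, invented for this example.
stored_queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0], "q3": [0.7, 0.7]}
query_to_memories = {"q1": ["m1", "m2"], "q2": ["m3"], "q3": ["m2", "m4"]}
q_values = {"m1": 0.9, "m2": 0.1, "m3": 0.5, "m4": 0.4}
query = [1.0, 0.1]

first = retrieve(query, stored_queries, query_to_memories, q_values)
second = retrieve(query, stored_queries, query_to_memories, q_values)
print(first, first == second)
```

In this toy setup `m3` is never even a candidate for this query, because its query group does not survive stage 1, and repeating the call cannot change that.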

Also, in the YAML file configs/rl_alf_config.yaml, epsilon is set to 0, which means there is no exploration, so the whole process is deterministic.
I also could not find any discussion of how to choose a good value for epsilon.

(The LLM temperature is also set to 0.)
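To illustrate the epsilon point, here is a small sketch of what epsilon-greedy selection over scored candidates could look like. I do not know how the repository actually consumes epsilon, so the function `epsilon_greedy_select` and its arguments are hypothetical. What it shows is that with epsilon = 0 the random branch is never taken and the selection collapses to plain top-k.

```python
import random

def epsilon_greedy_select(scores, k, epsilon, rng):
    """Pick k memories: each slot is random with probability epsilon, else greedy.

    `scores` maps memory ID -> hybrid score (hypothetical layout).
    """
    # Pool is kept sorted best-first, ties broken by ID for reproducibility.
    pool = sorted(scores, key=lambda mid: (-scores[mid], mid))
    chosen = []
    while pool and len(chosen) < k:
        if rng.random() < epsilon:
            pick = rng.choice(pool)   # explore: any remaining candidate
        else:
            pick = pool[0]            # exploit: best remaining candidate
        chosen.append(pick)
        pool.remove(pick)
    return chosen

scores = {"m1": 0.95, "m2": 0.55, "m3": 0.30, "m4": 0.59}

# epsilon = 0: identical to top-k, whatever the seed.
greedy_runs = {tuple(epsilon_greedy_select(scores, 2, 0.0, random.Random(s)))
               for s in range(20)}

# epsilon = 1: every slot is random, so different seeds reach different memories.
explored = set()
for s in range(50):
    explored.update(epsilon_greedy_select(scores, 2, 1.0, random.Random(s)))

print(greedy_runs)
print(sorted(explored))
```

Any value in between trades off the two behaviours, which is why some guidance on choosing epsilon would be useful.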

This means the method does not really explore in the RL sense: exploration over the memory bank is weak, and credit coverage is weak as well.

In other words, the process is very path-dependent:

  • Epoch 1 creates the initial memory bank.
  • From Epoch 2 onward, retrieval is already conditioned on what Epoch 1 produced.
  • Since retrieval is mostly greedy, early-selected memories keep getting reused and updated.
  • That gives the first epoch disproportionate influence on the later trajectory.

That means later epochs are not independent rounds of learning. They are mostly refinements of an already biased bank state.

As a result, early good memories get amplified, early bad or noisy memories can get locked in, and large parts of the bank remain under-tested.
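The lock-in effect can be reproduced in a toy model. This is only a sketch under my own assumptions, not the framework's update rule: each memory has a fixed success probability, Q starts at 0, one memory is picked per step, and Q is updated with a simple running average (`lr` is a made-up learning rate). With epsilon = 0 the first memory selected is the only one ever tested, even though better ones exist.

```python
import random

def run_epochs(success_prob, epsilon, steps, seed, lr=0.5):
    """Toy bandit over a memory bank; returns (set of tried indices, final Q)."""
    rng = random.Random(seed)
    q = [0.0] * len(success_prob)
    tried = set()
    for _ in range(steps):
        if rng.random() < epsilon:
            m = rng.randrange(len(q))                             # explore
        else:
            m = max(range(len(q)), key=lambda i: (q[i], -i))      # greedy, lowest index on ties
        reward = 1.0 if rng.random() < success_prob[m] else 0.0
        q[m] += lr * (reward - q[m])                              # running-average update
        tried.add(m)
    return tried, q

# Memory 0 is mediocre; memory 1 is the best but is never tried under greedy.
success_prob = [0.2, 0.9, 0.5, 0.7, 0.4]

greedy_tried, _ = run_epochs(success_prob, epsilon=0.0, steps=200, seed=0)
explore_tried, _ = run_epochs(success_prob, epsilon=0.2, steps=200, seed=0)

print("greedy coverage:", len(greedy_tried), "of", len(success_prob))
print("epsilon=0.2 coverage:", len(explore_tried), "of", len(success_prob))
```

In this model the greedy run never leaves memory 0, because its Q can never drop below the untouched zeros of the other memories. The real system differs in many ways, but the same mechanism is what I am worried about.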

Could you please clarify or share your thoughts on this point?
