Hi authors, here is a concern of mine.
In the current framework, the retrieval path is mostly deterministic.
For each task evaluated on the test split, given the task's query, the following stages are deterministic:
- First-stage similarity retrieval
  - given the query text, the embedding is deterministic
  - similarity ranking over stored query embeddings is deterministic
- Expansion from query to memory IDs
  - once the top query groups are chosen, the associated memory IDs are looked up by rule, so that step is deterministic
- Second-stage scoring
  - given candidate memories, similarity and Q are already fixed, so hybrid scoring and sorting are deterministic
- Final selection
  - vanilla top-k selection is deterministic
  - tri-channel bucket selection is also deterministic
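To make the concern concrete, here is a minimal sketch of the retrieval path as I read it. All names (`retrieve`, `hybrid` weighting via `alpha`, the group-to-memory lookup table) are my own hypothetical stand-ins, not the paper's actual code; the point is only that every step is a pure function of its inputs:

```python
import numpy as np

def retrieve(query_emb, stored_query_embs, group_to_mem_ids, mem_embs, q_values,
             alpha=0.5, top_groups=3, k=5):
    """Hypothetical sketch of the retrieval path; every step is deterministic."""
    # First stage: similarity ranking over stored query embeddings
    sims = stored_query_embs @ query_emb
    top = np.argsort(-sims)[:top_groups]
    # Expansion: rule-based lookup of memory IDs for the chosen query groups
    cand = sorted({m for g in top for m in group_to_mem_ids[g]})
    # Second stage: hybrid score = alpha * similarity + (1 - alpha) * Q
    scores = {m: alpha * float(mem_embs[m] @ query_emb) + (1 - alpha) * q_values[m]
              for m in cand}
    # Final selection: vanilla top-k over the fixed scores
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Calling this twice with the same inputs always returns the same memory IDs, which is exactly the determinism described above.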
So overall, the retrieval process is largely deterministic, and I suspect this determinism is one reason exploration is weak.
Moreover, in the YAML file configs/rl_alf_config.yaml, epsilon is set to 0, which means no exploration at all, so the overall process is deterministic. I also could not find any discussion of how to set epsilon optimally. (For the LLM, the temperature is also 0.)
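For reference, even a simple epsilon-greedy wrapper over the final selection would break this determinism. The sketch below is my own illustration of the standard technique, not the authors' method; `ranked_ids` is assumed to be the deterministic top-k ordering from the scoring stage:

```python
import random

def epsilon_greedy_select(ranked_ids, candidate_ids, k=5, epsilon=0.1, rng=None):
    """Epsilon-greedy sketch: with prob. epsilon per slot, swap the greedy
    pick for a random unselected candidate, so under-tested memories get tried."""
    rng = rng or random.Random()
    chosen = list(ranked_ids[:k])
    pool = [m for m in candidate_ids if m not in chosen]
    for i in range(len(chosen)):
        if pool and rng.random() < epsilon:
            # explore: replace the greedy choice with a random candidate
            j = rng.randrange(len(pool))
            chosen[i], pool[j] = pool[j], chosen[i]
    return chosen
```

With epsilon = 0 this reduces exactly to the deterministic top-k, which matches the config as shipped.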
This means the work is not an exploration method in the RL sense: exploration over the memory bank is weak, and so is credit coverage.
In other words, the process is highly path-dependent:
- Epoch 1 creates the initial memory bank.
- From Epoch 2 onward, retrieval is already conditioned on what Epoch 1 produced.
- Since retrieval is mostly greedy, early-selected memories keep getting reused and updated.
- That gives the first epoch disproportionate influence on the later trajectory.
That means later epochs are not independent rounds of learning; they are mostly refinements of an already biased bank state.
As a result, early good memories get amplified, early bad or noisy memories can get locked in, and large parts of the bank remain under-tested.
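The lock-in dynamic above can be seen in a toy simulation (entirely my own construction, with made-up scores and update sizes): under purely greedy selection with reuse-and-update, the memory that happens to score highest after epoch 1 absorbs every subsequent selection, and the rest of the bank is never tested.

```python
import random

def simulate(n_mem=20, epochs=200, seed=0):
    """Toy model: greedy selection plus reinforcement of the chosen memory."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(n_mem)]  # noisy initial memory scores (epoch 1)
    counts = [0] * n_mem                      # how often each memory is selected
    for _ in range(epochs):
        m = max(range(n_mem), key=lambda i: q[i])  # greedy: always pick the top memory
        counts[m] += 1
        q[m] += 0.01  # reuse-and-update further reinforces the chosen memory
    return counts

counts = simulate()
# One early winner takes all selections; every other memory stays at zero.
```

Of course the real system is richer than this, but the qualitative concern is the same: without stochasticity, the first epoch's bank state largely fixes the trajectory.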
Could you please give more clarification or your thoughts on this point?