[BUG] Evals config_set silently renames api_config to "pipeline" and accepts unknown keys that are never consumed #245

@kamran-rapidfireAI

Summary — verified end-to-end

In evals-mode hyperparameter sweeps, RFRandomSearch / RFGridSearch config_set handling has two surprising behaviors that compound:

  1. The user-facing key api_config (used in all RAG tutorial examples) is silently renamed to "pipeline" in the output of get_runs(). Code that reads run["api_config"] after get_runs() returns will raise KeyError, or fall back to None if it uses run.get("api_config").
  2. Arbitrary unrecognized keys (e.g. "prompt_id": List([1, 2, 3])) are accepted by get_runs() and propagated into the sampled run dict, but no downstream framework code reads them. The keys never reach preprocess_fn's batch, never appear in MLflow params, are stripped from the DB JSON config, and dropped on clone. Net effect: the user thinks they're sweeping an axis, but the framework treats every sampled value identically.

There is no warning, no debug log, no error on either path.
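
Until the framework warns on its own, a user-side guard can catch the second behavior before a sweep starts. This is a minimal sketch, not rapidfireai code: the allow-list KNOWN_KEYS and both helper names are hypothetical, and the real set of consumed keys would have to come from the framework itself.

```python
import warnings

# Hypothetical allow-list for illustration only; this is NOT the set of keys
# rapidfireai actually consumes.
KNOWN_KEYS = {"api_config", "pipeline", "batch_size"}


def find_unknown_keys(config_set, known_keys=KNOWN_KEYS):
    """Return the config_set keys that are not in the allow-list, sorted."""
    return sorted(key for key in config_set if key not in known_keys)


def check_config_set(config_set, known_keys=KNOWN_KEYS):
    """Warn once per unknown key so a no-op sweep axis is visible up front."""
    unknown = find_unknown_keys(config_set, known_keys)
    for key in unknown:
        warnings.warn(
            f"config_set key {key!r} is not recognized and may never be consumed",
            stacklevel=2,
        )
    return unknown


# Plain lists stand in for rapidfireai's List(...) wrapper here.
config_set = {
    "api_config": ["cfg_a", "cfg_b"],
    "batch_size": 4,
    "my_prompt_id": ["v1", "v2", "v3"],
}
print(check_config_set(config_set))  # ['my_prompt_id']
```

Calling check_config_set(config_set) before building RFRandomSearch or RFGridSearch would at least surface the unconsumed axis as a warning instead of a silent no-op.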

Verified end-to-end via a working notebook that uses rapidfireai's normal public API (Experiment, RFAPIModelConfig, RFRandomSearch) with real OpenAI gpt-4o-mini calls. No source-poking, no monkey-patching. Notebook: https://gist.github.com/kamran-rapidfireAI/13189883f3d1c9aa99012d8f92b68bd4

End-to-end repro — all three claims confirmed

Claim 1: api_config is silently renamed to pipeline

from rapidfireai.automl import List, RFRandomSearch

# cfg_a, cfg_b: two API model configs built earlier in the notebook (definitions omitted here)
config_set = {
    "api_config":   List([cfg_a, cfg_b]),
    "batch_size":   4,
    "my_prompt_id": List(["v1", "v2", "v3"]),   # custom key, never consumed
}
runs = RFRandomSearch(config_set, num_runs=3).get_runs(seed=42)
print(list(runs[0].keys()))
# => ['pipeline', 'batch_size', 'my_prompt_id']
# Note: 'api_config' is gone (renamed to 'pipeline'); 'my_prompt_id' survives.
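
As a stopgap on the caller side, run dicts can be read through a small accessor that tolerates either key name. This is a sketch under the assumption that a sampled run is a plain dict, as the output above suggests; get_api_config is a hypothetical helper, not part of rapidfireai.

```python
def get_api_config(run):
    """Fetch the API config from a sampled run dict under either key name."""
    for key in ("api_config", "pipeline"):
        if key in run:
            return run[key]
    raise KeyError("run has neither 'api_config' nor 'pipeline'")


# Stand-in for one element of get_runs() output, shaped like the keys printed above.
run = {"pipeline": "cfg_a", "batch_size": 4, "my_prompt_id": "v3"}
print(get_api_config(run))  # cfg_a
```

Raising explicitly when both keys are absent keeps the failure loud, which is the opposite of the current behavior.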

Claim 2: my_prompt_id is sampled but never reaches preprocess_fn

preprocess_fn was instrumented to write its received batch.keys() to a file. Captured output:

preprocess_fn received batch keys: ['query', 'query_id']
  my_prompt_id in batch? False

The sampled value of my_prompt_id for this run was 'v3', but preprocess_fn never sees it.
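
For anyone who wants to reproduce the capture, the instrumentation can be sketched as a decorator. This is not the exact code from the notebook; it assumes preprocess_fn receives a dict-like batch as its first positional argument, and log_batch_keys and the log path are hypothetical names.

```python
import functools
import json
import os
import tempfile


def log_batch_keys(path):
    """Decorator factory: append the sorted keys of each received batch to `path`."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(batch, *args, **kwargs):
            # One JSON list per call, so the log can be parsed line by line.
            with open(path, "a", encoding="utf-8") as fh:
                fh.write(json.dumps(sorted(batch.keys())) + "\n")
            return fn(batch, *args, **kwargs)
        return wrapper
    return decorator


log_path = os.path.join(tempfile.mkdtemp(), "batch_keys.log")


@log_batch_keys(log_path)
def preprocess_fn(batch):
    # Placeholder body; a real preprocess_fn would build prompts from the batch.
    return batch


preprocess_fn({"query": ["q1"], "query_id": [0]})
with open(log_path, encoding="utf-8") as fh:
    seen = json.loads(fh.readline())
print("my_prompt_id" in seen)  # False
```

Writing to a file rather than printing is deliberate: if preprocess_fn runs in a separate worker process, its stdout may not reach the notebook.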

Claim 3: my_prompt_id is NOT logged to MLflow

SQL query against ~/rapidfireai/db/rapidfire_mlflow.db for the run:

key             | value
----------------+-------------------------
model           | bug245_gpt4omini_exp
rag_k           | 2
rag_search_type | similarity

No my_prompt_id row. A reviewer cannot tell from MLflow that the axis was even declared.
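
The same check can be scripted so that declared sweep keys are compared against what was logged. The sketch below runs against an in-memory SQLite database; it assumes the table layout of MLflow's SQL tracking store (a params table with key, value and run_uuid columns), and the run id "run-1" and the helper missing_params are hypothetical.

```python
import sqlite3


def missing_params(conn, run_id, declared_keys):
    """Return the declared keys that have no row in `params` for the given run."""
    rows = conn.execute(
        "SELECT key FROM params WHERE run_uuid = ?", (run_id,)
    ).fetchall()
    logged = {key for (key,) in rows}
    return sorted(set(declared_keys) - logged)


# In-memory stand-in for the tracking database, filled with the three rows
# shown above. The run id "run-1" is made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE params (key TEXT, value TEXT, run_uuid TEXT)")
conn.executemany(
    "INSERT INTO params VALUES (?, ?, ?)",
    [
        ("model", "bug245_gpt4omini_exp", "run-1"),
        ("rag_k", "2", "run-1"),
        ("rag_search_type", "similarity", "run-1"),
    ],
)
print(missing_params(conn, "run-1", ["rag_k", "my_prompt_id"]))  # ['my_prompt_id']
```

Pointing sqlite3.connect at the real database file instead of ":memory:" should give the same comparison for an actual run, provided the schema assumption holds.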

Why it matters

We hit this trying to sweep prompt templates via "prompt_id": List([...]). The MLflow UI showed runs with identical params (because prompt_id was never logged) and metrics differing only at the noise floor (because preprocess_fn always got the same default template). It took several hours to discover that the axis was a no-op.

The silent rename of the documented key api_config to "pipeline" in get_runs() output is a foot-gun. Code that does for run in get_runs(): cfg = run["api_config"] (which is exactly what the example notebooks' key naming suggests) fails with a KeyError, and nothing in the logs explains why the key is missing.

Environment

  • rapidfireai: main (HEAD 91d94de); same behavior on 0.15.2 PyPI
  • Python 3.12, Linux
  • Setup used in the verification notebook: experiment_name bug245_repro (auto-suffixed to _2), model gpt-4o-mini, 3 questions, 1 shard
