
fix(coders): catch doubled-prefix hallucinations in edit-block headers#5112

Open
ekrembasari wants to merge 1 commit into Aider-AI:main from ekrembasari:fix/edit-block-doubled-prefix-path

@ekrembasari

Summary

Catch doubled-prefix hallucinations like .claude/.claude/foo.json (when the chat file is .claude/foo.json) by extending the existing prepended-bogus-dir guard with progressive suffix-stripping against chat_files / valid_fnames. Closes #5111.

Why

Observed in production dispatch runs across several small editor models (groq/llama-3.3-70b-versatile, openrouter/openai/gpt-oss-120b, gemini-2.5-flash-lite). The existing basename-only guard in WholeFileCoder and the basename + fuzzy-match guards in find_filename miss multi-segment doubled prefixes. The result is a 0-byte file at the canonical path, with the content written to the doubled path instead.
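A minimal illustration of the failure mode, assuming the header is joined onto the repository root without further validation. The `/repo` root, the variable names, and the simplified basename check are hypothetical stand-ins; the sketch uses `pathlib.PurePosixPath` rather than aider's own `abs_root_path`:

```python
from pathlib import PurePosixPath

# Hypothetical values: the file added to the chat, and the doubled header
# a small editor model emitted for it.
root = PurePosixPath("/repo")
chat_files = [".claude/foo.json"]
llm_header = ".claude/.claude/foo.json"

# Simplified stand-ins for the existing guards: neither an exact match nor
# a "basename is itself a chat file" check recovers the doubled header.
exact_hit = llm_header in chat_files
basename_hit = PurePosixPath(llm_header).name in chat_files

# Joining the header onto the root keeps the bogus prefix verbatim.
written_to = root / llm_header
canonical = root / chat_files[0]

print(exact_hit, basename_hit)  # False False
print(written_to)               # /repo/.claude/.claude/foo.json
print(canonical)                # /repo/.claude/foo.json
```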

Changes

  • aider/coders/wholefile_coder.py: after basename guard, try progressively shorter suffixes against chat_files.
  • aider/coders/editblock_coder.py:find_filename: same pattern, between basename and fuzzy match (catches cases where SequenceMatcher ratio falls below the 0.8 cutoff).
  • 2 tests covering both code paths, including a mutation-tested case where fuzzy match would otherwise miss (ratio 0.778 < 0.8).
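Progressive suffix-stripping, as described in the changes above, can be sketched as a small standalone helper. `strip_to_known_suffix` is a hypothetical name, not the function in this PR, and the sketch assumes forward-slash relative paths:

```python
from pathlib import PurePosixPath
from typing import Iterable, Optional


def strip_to_known_suffix(fname: str, known: Iterable[str]) -> Optional[str]:
    """Return the longest proper suffix of `fname` that is a known file.

    Hypothetical helper illustrating progressive suffix-stripping; it is a
    sketch, not the code merged in this PR.
    """
    known_set = set(known)
    parts = PurePosixPath(fname).parts
    # Drop one leading segment at a time: a/b/c -> b/c -> c.
    for start in range(1, len(parts)):
        candidate = "/".join(parts[start:])
        if candidate in known_set:
            return candidate
    # No suffix is a known file: leave the decision to later fallbacks.
    return None


print(strip_to_known_suffix(".claude/.claude/foo.json", [".claude/foo.json"]))
# .claude/foo.json
```

Trying the longest suffix first keeps the resolution deterministic: if both `.claude/foo.json` and a root-level `foo.json` are in the chat, the more specific `.claude/foo.json` wins.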

Risk

Low. The change is narrowly scoped:

  • Only triggers when exact match and basename match both fail.
  • Only resolves when a deterministic suffix is itself in chat_files / valid_fnames.
  • No behaviour change on any path that currently resolves correctly.
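The ordering these guarantees rely on can be sketched as one resolver. `resolve_header` is hypothetical, the basename step is simplified, and the fuzzy step assumes behaviour like `difflib.get_close_matches` with the 0.8 cutoff mentioned in this PR; aider's actual `find_filename` may differ in detail:

```python
import difflib
from pathlib import PurePosixPath
from typing import Optional, Sequence


def resolve_header(fname: str, valid_fnames: Sequence[str]) -> Optional[str]:
    """Hypothetical resolver showing where suffix-stripping slots in."""
    # 1. Exact match: headers that already resolve are returned untouched.
    if fname in valid_fnames:
        return fname
    # 2. Basename match: the model emitted only "foo.py" for "src/foo.py".
    for vfn in valid_fnames:
        if fname == PurePosixPath(vfn).name:
            return vfn
    # 3. New step: strip leading segments until a suffix is a known file.
    parts = PurePosixPath(fname).parts
    for start in range(1, len(parts)):
        candidate = "/".join(parts[start:])
        if candidate in valid_fnames:
            return candidate
    # 4. Fuzzy fallback, only reached when every deterministic step fails.
    close = difflib.get_close_matches(fname, valid_fnames, n=1, cutoff=0.8)
    return close[0] if close else None
```

Because steps 1 and 2 run first and return early, any header that resolved before the change still resolves the same way; step 3 only returns a path that is itself in `valid_fnames`.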

Test plan

  • pytest tests/basic/test_wholefile.py tests/basic/test_editblock.py — 38 passed (was 36 before)
  • Mutation test: confirmed both new tests fail without the fix (return doubled path verbatim)

Small editor models occasionally emit edit-block filename headers with
the chat-file's own prefix duplicated -- e.g. ".claude/.claude/foo.json"
when the chat file is ".claude/foo.json". The existing prepended-bogus-dir
guard in WholeFileCoder.get_edits and find_filename only covers the case
where the LLM emits just the basename; multi-segment doubled prefixes
fall through, abs_root_path concatenates blindly, and the file lands at
a doubled path while the canonical path stays empty.

Extend both code paths with progressive suffix-stripping against
chat_files / valid_fnames. The change is narrowly scoped: it only triggers
when exact and basename matches both fail, and only resolves when a
deterministic suffix is itself in the chat-files list.

Tests cover:
- WholeFileCoder: chat_files retains a "subdir/" prefix (two files in
  distinct subdirs so find_common_root resolves to the tempdir root);
  LLM emits doubled "subdir/subdir/sample.txt"; canonical path edited,
  doubled path never created.
- find_filename: doubled prefix where SequenceMatcher ratio falls below
  the 0.8 fuzzy-match cutoff (0.778 for "sub/dir/sub/dir/foo.py" vs
  "sub/dir/foo.py"), so neither basename nor fuzzy match recovers it.
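The 0.778 figure can be checked with Python's standard `difflib`, assuming the fuzzy step scores candidates with `SequenceMatcher.ratio()` and a 0.8 cutoff as `difflib.get_close_matches` does:

```python
import difflib

doubled = "sub/dir/sub/dir/foo.py"   # 22 characters
canonical = "sub/dir/foo.py"         # 14 characters

# ratio() = 2 * M / T, where M is the number of matched characters and T the
# combined length. All 14 characters of `canonical` match a tail of `doubled`.
ratio = difflib.SequenceMatcher(None, doubled, canonical).ratio()
print(round(ratio, 3))  # 0.778  (2 * 14 / (22 + 14) = 28 / 36)

# With a 0.8 cutoff, the canonical path is never offered as a match.
print(difflib.get_close_matches(doubled, [canonical], n=1, cutoff=0.8))  # []
```

The longer the doubled prefix, the lower the ratio, so fuzzy matching degrades exactly on the multi-segment cases this PR targets.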
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ekrem Başarı seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.


Development

Successfully merging this pull request may close these issues.

Edit-block path doubling: small models hallucinate prefix-doubled headers; existing prepended-dir guard misses multi-segment cases