
store_new_application lacks location normalization, causing duplicate components #72

@anticomputer

Description


The `store_new_application` MCP tool in `repo_context.py` performs an upsert
via `filter_by(repo=repo, location=location)`, but the `location` field is
passed through as-is from the LLM agent. When the agent calls the tool
multiple times for the same root-level component with slightly different
location strings (e.g. `.`, `..`, `...`), each call creates a new row
because the exact-match check treats them as distinct.

There is also no `UNIQUE` constraint on `(repo, location)` in the
`Application` model, so the database does not prevent the duplicates.
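A minimal sketch of the failure mode, using a plain dict keyed by `(repo, location)` in place of the real table (the names and upsert shape here are illustrative, not the actual `repo_context.py` implementation):

```python
# Illustrative stand-in for the upsert: the key is an exact match on
# (repo, location), with no normalization of the location string.
rows = {}

def store_new_application(repo: str, location: str, notes: str) -> None:
    key = (repo, location)  # exact-match key, as in filter_by(repo=..., location=...)
    rows[key] = notes       # upsert: insert new row or overwrite existing

# The agent refers to the same root-level component three different ways.
for loc in (".", "..", "..."):
    store_new_application("demo-repo", loc, "Single-component Go web application")

print(len(rows))  # 3 rows for one component
```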

This causes downstream amplification in taskflows that use
`repeat_prompt: true` with `async: true` (e.g. `classify_application_local`):
`get_components` returns all rows, and the taskflow spawns a parallel agent
for each duplicate.

Observed on a single-file Go application (one component), where
identify_applications stored 3 rows:

| id | location | notes (truncated)                        |
|----|----------|------------------------------------------|
| 1  | `.`      | Single-component Go web application ...  |
| 2  | `..`     | Single-component Go web application ...  |
| 3  | `...`    | Single-component Go web application ...  |

This resulted in classify_application_local running 3 identical analyses
in parallel instead of 1.

Possible fixes:

  • Normalize `location` before the upsert (strip trailing dots/slashes, canonicalize `.` variants)
  • Add a `UNIQUE(repo, location)` constraint to the `Application` table
  • Both
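A sketch of the normalization half of the fix, as a pure stdlib helper. The function name and the rule that dot-only strings collapse to the repo root are assumptions for illustration, not the actual codebase:

```python
import posixpath

def normalize_location(location: str) -> str:
    """Canonicalize a repo-relative location string before the upsert.

    Hypothetical helper: collapses the dot-only variants the agent emits
    for the repo root ('.', '..', '...') to '.', strips trailing slashes,
    and normalizes internal './' segments.
    """
    loc = location.strip().rstrip("/")
    # Treat empty or dot-only strings as the canonical root marker.
    if loc == "" or set(loc) == {"."}:
        return "."
    # posixpath.normpath collapses './src', 'src/./api', etc.
    return posixpath.normpath(loc)
```

Calling this on `location` before `filter_by(repo=repo, location=location)` would make the three variants from the example hit the same row. Pairing it with a `UniqueConstraint` on `(repo, location)` in the `Application` model would additionally enforce deduplication at the database layer, catching any variants the normalizer misses.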

Labels: bug
