The store_new_application MCP tool in repo_context.py performs an upsert
via filter_by(repo=repo, location=location), but the location field is
passed through as-is from the LLM agent. When the agent calls the tool
multiple times for the same root-level component with slightly different
location strings (e.g. ".", "..", "..."), each call creates a new row,
because the exact-match lookup finds no existing row for the new string.
There is also no UNIQUE constraint on (repo, location) in the Application
model, so the database does not prevent this.
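A minimal reproduction of this failure mode follows. It is a sketch that uses the standard-library sqlite3 module rather than the project's actual ORM model. The table layout, the store_application helper, and the repo name "example/repo" are hypothetical stand-ins for the Application model and store_new_application. The lookup keys on the raw location string, and nothing in the schema stops the three variants from becoming three rows.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical stand-in for the Application table: note there is
    # no UNIQUE(repo, location) constraint.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE application ("
        "id INTEGER PRIMARY KEY, repo TEXT NOT NULL, "
        "location TEXT NOT NULL, notes TEXT)"
    )
    return conn


def store_application(conn: sqlite3.Connection, repo: str, location: str, notes: str) -> None:
    # Mirrors an exact-match upsert: look up (repo, location) verbatim,
    # update the row if found, otherwise insert a new one.
    row = conn.execute(
        "SELECT id FROM application WHERE repo = ? AND location = ?",
        (repo, location),
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE application SET notes = ? WHERE id = ?", (notes, row[0]))
    else:
        conn.execute(
            "INSERT INTO application (repo, location, notes) VALUES (?, ?, ?)",
            (repo, location, notes),
        )


conn = make_db()
for loc in (".", "..", "..."):
    store_application(conn, "example/repo", loc, "Single-component Go web application")

rows = conn.execute("SELECT id, location FROM application ORDER BY id").fetchall()
print(rows)  # [(1, '.'), (2, '..'), (3, '...')]
```

A UNIQUE(repo, location) constraint on its own would not merge these rows. ".", ".." and "..." are distinct strings, so the constraint would only reject a repeat of an identical string.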
This causes downstream amplification in taskflows that use
repeat_prompt: true with async: true (e.g. classify_application_local):
get_components returns every row, and the taskflow spawns one parallel
agent per row, duplicates included.
Observed on a single-file Go application (one component), where
identify_applications stored 3 rows:
id | location | notes (truncated)
1 | . | Single-component Go web application ...
2 | .. | Single-component Go web application ...
3 | ... | Single-component Go web application ...
This resulted in classify_application_local running 3 identical analyses
in parallel instead of 1.
Possible fixes:
- Normalize location before the upsert (strip trailing dots/slashes, canonicalize "." variants)
- Add a UNIQUE(repo, location) constraint to the Application table
- Both
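Here is a sketch of the "Both" option, again using sqlite3 as a stand-in for the real model. In the SQLAlchemy model, the constraint would presumably be a UniqueConstraint("repo", "location") entry in __table_args__.

The helper normalize_location is hypothetical. It assumes that locations are always repo-relative and never legitimately point above the repository root. Under that assumption, any path segment made only of dots (".", "..", "...") is treated as noise. With the constraint in place, the store becomes a single INSERT ... ON CONFLICT ... DO UPDATE statement, which requires SQLite 3.24 or later.

```python
import sqlite3


def normalize_location(location: str) -> str:
    # Hypothetical helper: canonicalize a repo-relative location.
    # Assumption: segments consisting only of dots are LLM noise rather
    # than parent-directory references, so they are dropped entirely.
    loc = location.strip().replace("\\", "/")
    parts = [p for p in loc.split("/") if p and set(p) != {"."}]
    # An empty result means the repository root.
    return "/".join(parts) if parts else "."


def make_db() -> sqlite3.Connection:
    # Same hypothetical table as before, now with UNIQUE(repo, location).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE application ("
        "id INTEGER PRIMARY KEY, repo TEXT NOT NULL, "
        "location TEXT NOT NULL, notes TEXT, "
        "UNIQUE (repo, location))"
    )
    return conn


def store_application(conn: sqlite3.Connection, repo: str, location: str, notes: str) -> None:
    # Normalize first, then let the database enforce one row per
    # (repo, location); a conflicting insert updates the existing row.
    conn.execute(
        "INSERT INTO application (repo, location, notes) VALUES (?, ?, ?) "
        "ON CONFLICT (repo, location) DO UPDATE SET notes = excluded.notes",
        (repo, normalize_location(location), notes),
    )


conn = make_db()
for loc in (".", "..", "...", "./"):
    store_application(conn, "example/repo", loc, "Single-component Go web application")

rows = conn.execute("SELECT id, location FROM application").fetchall()
print(rows)  # [(1, '.')]
```

Dropping dot-only segments has a side effect. A genuine path such as src/../lib collapses to src/lib rather than lib. If real parent references can occur, a safer choice is posixpath.normpath combined with a separate rule for the dot-only strings. That rule is still needed, because normpath leaves "..." unchanged.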