perf(integrate): narrow project-scope local discovery to .apm/ to avoid full-tree walks (closes #1507)#1554
Conversation
…ithub/ (closes #1507) Local-scope integrate phase used to call discover_primitives(project_root) for the synthetic _local package, which walks the entire project tree via os.walk because LOCAL_PRIMITIVE_PATTERNS contains generic patterns like '**/*.instructions.md'. On a 50k+ file monorepo with only 2 instructions + 1 prompt under .apm/, this caused a 13-minute hang. Mirrors the existing $HOME narrowing (#830): when install_path == project_root (i.e. the synthetic _local PackageInfo built in services.integrate_local_content), restrict discover_primitives to .apm/ and .github/, which are the only supported locations for local primitives per LOCAL_PRIMITIVE_PATTERNS. The link_resolver.package_root sentinel for in-package asset rewriting (#1147) is preserved as the project root for the local case so asset links inside .apm/ are still resolved against the project tree. Tests: - Mock-based assertion that scan_root is .apm/ (and .github/ when present), never project_root. - Spy on os.walk to confirm noise subtrees are not traversed. - Mutation-break gate: removing the narrowing branch fails 4/4 new tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Fixes a performance bug (#1507) where apm install's integrate phase walks the entire project tree during local primitive discovery, causing multi-minute hangs in large monorepos. Extends the existing $HOME narrowing (#830) so that when install_path == project_root (the synthetic _local package), discovery is scoped to .apm/ and .github/ only.
Changes:
- Refactor
BaseIntegrator.init_link_resolverto support multiplescan_rootsand add a_is_root_local_packagehelper that narrows discovery for the project-scope_localpackage. - Preserve
link_resolver.package_root(asset rewriting from #1147) for installed deps and the project-scope local case; continue to skip it only for the$HOMEuser-scope case. - Add
TestInitLinkResolverLocalScoping(4 new tests, including a real-walk regression spy) and updatetest_uses_install_path_when_not_hometo reflect the real-package contract (apm_modules/owner/repo).
Show a summary per file
| File | Description |
|---|---|
| src/apm_cli/integration/base_integrator.py | Adds project-scope narrowing to .apm//.github/ and reworks the package_root assignment logic. |
| tests/unit/integration/test_base_integrator.py | Adds 4 unit tests covering narrowing, missing dirs, no-walk, and a real os.walk spy regression test. |
| CHANGELOG.md | Adds Unreleased Fixed entry referencing #1507. |
Copilot's findings
- Files reviewed: 3/3 changed files
- Comments generated: 0
APM Review Panel:
|
| Persona | B | R | N | Takeaway |
|---|---|---|---|---|
| Python Architect | 0 | 0 | 2 | Clean perf fix extending existing narrowing pattern; no architectural concerns. |
| CLI Logging Expert | 0 | 0 | 1 | No CLI output regression; optional verbose trace could make narrowed discovery easier to debug. |
| DevX UX Expert | 0 | 0 | 0 | No CLI surface, help text, error wording, or first-run flow changes; internal performance fix aligns with documented primitive locations. |
| Supply Chain Security Expert | 0 | 0 | 0 | No supply-chain security impact; read-only discovery scope is narrowed and auth/download/integrity paths are untouched. |
| OSS Growth Hacker | 0 | 0 | 2 | Strong user-visible performance story; changelog exists, with optional polish to foreground the 13-minute to sub-second outcome. |
| Doc Writer | 0 | 0 | 2 | CHANGELOG-only docs surface is accurate and well-formed; broader docs drift is not required for this scoped fix. |
| Test Coverage Expert | 0 | 1 | 0 | New narrowing tests pass; add one targeted assertion for package_root preservation if folding feedback. |
| Performance Expert | 0 | 1 | 1 | Dominant O(repo_size) local walk is eliminated; one follow-up is to document the supported .apm/.github scope if tree-wide generic patterns were ever relied on. |
B = blocking-severity findings, R = recommended, N = nits.
Counts are signal strength, not gates. The maintainer ships.
Top 3 follow-ups
- [Test Coverage Expert] Add assertion that
link_resolver.package_root == project_rootafter narrowed local discovery -- Guards the asset-link resolution invariant the PR explicitly promises to preserve; outcome:missing on a devx surface means a silent regression could ship undetected. - [Performance Expert] Document that project-local primitives are discovered only under
.apm/and.github/-- Users with generic patterns outside these dirs will silently lose discovery; making the scope explicit in docs prevents support burden. - [OSS Growth Hacker] Rewrite changelog bullet to lead with user outcome (minutes to sub-second) before implementation details -- Improves release-note scanability and amplifies the performance story for adopters evaluating APM.
Architecture
classDiagram
direction LR
class BaseIntegrator {
<<Template Method>>
+init_link_resolver(package_info, project_root)
+resolve_links(content, source, target)
+cleanup_empty_parents()
+sync_remove_files()
-_is_root_local_package(package_info, project_root)$ bool
}
class PromptIntegrator {
+integrate_package_primitives()
}
class InstructionIntegrator {
+integrate_package_primitives()
}
class UnifiedLinkResolver {
+register_contexts(primitives)
+package_root Path
+resolve_links_for_installation()
}
class discover_primitives {
<<Pure>>
+discover_primitives(base_dir) list
}
class PackageInfo {
<<ValueObject>>
+install_path Path
+name str
}
PromptIntegrator --|> BaseIntegrator
InstructionIntegrator --|> BaseIntegrator
BaseIntegrator *-- UnifiedLinkResolver : owns
BaseIntegrator ..> discover_primitives : calls
BaseIntegrator ..> PackageInfo : reads
note for BaseIntegrator "Template Method: subclasses override integrate_package_primitives(); link resolution is shared base logic."
class BaseIntegrator:::touched
classDef touched fill:#fff3b0,stroke:#d47600
flowchart TD
A["apm install CLI entry"] --> B["services.integrate_local_content()"]
B --> C["builds synthetic _local PackageInfo install_path = project_root"]
C --> D["PromptIntegrator.integrate_package_primitives()"]
D --> E["BaseIntegrator.init_link_resolver(pkg, project_root)"]
E --> F{"_is_root_local_package? install_path.resolve() == project_root.resolve()"}
F -->|Yes| G["candidates = .apm/, .github/; filter by .is_dir()"]
F -->|No| H{"install_path == Path.home()?"}
H -->|Yes| I["scan_roots = [~/.apm/]"]
H -->|No| J["scan_roots = [install_path]"]
G --> K["for root in scan_roots: discover_primitives(root)"]
I --> K
J --> K
K --> L["os.walk(root) bounded to .apm/ or .github/"]
L --> M["link_resolver.register_contexts(primitives)"]
M --> N{"install_path != Path.home()?"}
N -->|Yes| O["link_resolver.package_root = install_path"]
N -->|No| P["skip package_root escape guard"]
Recommendation
Merge now. The fix is correct, well-tested, and eliminates a real pain point reported by a community user. Fold the package_root assertion in-PR if convenient (one line); otherwise open a fast-follow issue for it alongside the docs-scope clarification. Neither prevents value delivery.
Full per-persona findings
Python Architect
- [nit]
$HOMEbranch does not filter by.is_dir()unlike the new project-root branch atsrc/apm_cli/integration/base_integrator.py:563
Pre-existing behavior, but the new project-root branch filters candidates with.is_dir()while the$HOMEbranch unconditionally scans~/.apm.
Suggested: Consider using the same.is_dir()guard for~/.apmfor symmetry. - [nit] Compound
package_rootcondition could use a clarifying comment atsrc/apm_cli/integration/base_integrator.py:588
The condition encodes settingpackage_rootfor all cases except$HOME. It is correct but dense.
Suggested: Consider adding a one-line comment naming the invariant.
CLI Logging Expert
- [nit] Narrowed discovery decision is invisible to verbose mode at
src/apm_cli/integration/base_integrator.py:558
Averbose_detailline after computingscan_rootswould help agents and maintainers understand why primitives outside.apm/.githubwere not scanned. This is additive only; no user-facing regression.
Suggested: Optionally log the computed local discoveryscan_rootsatverbose_detaillevel.
DevX UX Expert
No findings.
Supply Chain Security Expert
No findings.
OSS Growth Hacker
- [nit] Changelog hook could lead with the user outcome at
CHANGELOG.md:45
The bullet is dense and maintainer-focused. For a dramatic perf win, leading with the user result would improve release-note scanability.
Suggested: Start with "apm installon large monorepos no longer spends minutes in integrate" and move implementation details to a parenthetical. - [nit] Reporter credit could distinguish report and fix authors at
CHANGELOG.md:45
The entry credits the reporter, which is valuable, but could also clarify the fix author to reinforce contribution norms.
Suggested: Optionally say "(reported by @ioannispoulios, fixed by @danielmeppiel)" if that matches changelog convention.
Auth Expert -- inactive
PR touches only base_integrator.py discovery-walk narrowing, CHANGELOG.md, and integration tests; no auth, token, credential, or host-resolution surfaces are affected.
Doc Writer
- [nit] Changelog bullet is long for scanability at
CHANGELOG.md:45
The single bullet packs symptom, fix, scope, cross-references, and reporter credit into one long sentence. It is accurate, but a two-clause split would be easier to scan.
Suggested: Optional: split into a change clause and an impact clause while keeping issue and reporter references. - [nit] Troubleshooting docs could mention old-version large-monorepo integrate hangs at
docs/src/content/docs/troubleshooting/install-failures.md
The install failures page covers network hangs, not slow integrate on large monorepos. This PR is correctly scoped, but a follow-up doc note could help users stuck on older versions.
Suggested: Defer a short follow-up docs note rather than expanding this PR.
Test Coverage Expert
- [recommended] No test asserts
link_resolver.package_root == project_rootafter narrowed local discovery attests/unit/integration/test_base_integrator.py:171
The PR changespackage_rootassignment and explicitly promises the local case preserves project-root asset rewriting. The new tests provescan_rootsare narrowed, but grep ofTestInitLinkResolverLocalScopingforpackage_rootfound no assertion. A regression here could break asset links inside.apmwithout tripping the new tests.
Suggested: Addassert bi.link_resolver.package_root == tmp_pathtotest_narrows_to_apm_and_github_when_install_path_is_project_root.
Proof (missing at):tests/unit/integration/test_base_integrator.py::test_package_root_preserved_as_project_root_after_narrow-- proves: Asset links inside.apmresolve against the project tree after narrowed discovery. [devx]
assert bi.link_resolver.package_root == tmp_path
Performance Expert
- [recommended] Tree-wide generic
LOCAL_PRIMITIVE_PATTERNSare now unreachable for synthetic_localdiscovery atsrc/apm_cli/integration/base_integrator.py:556
Narrowing to.apm/and.github/is the right performance trade-off, but users who relied on generic**/*.instructions.mdfiles elsewhere in the project tree would silently lose discovery. The PR says.apm/.githubare the supported locations, so this is a scope/documentation follow-up rather than a correctness objection.
Suggested: Make the changelog/docs explicit that project-local primitives are discovered under.apm/and.github/. - [nit]
Path.resolve()adds tiny syscall overhead per package atsrc/apm_cli/integration/base_integrator.py:544
The overhead is negligible compared with the avoided full-tree walk, but plainPathequality could be enough if inputs are already canonical.
Suggested: Leave as-is unless future profiling shows this in the hot path.
This panel is advisory. It does not block merge. Re-apply the panel-review label after addressing feedback to re-run.
APM panel recommendationHeadline: Clean performance fix: local-discovery narrowing removes the large-repo install hang without changing the CLI surface. Recommendation: ship_now - No remaining panel findings. The scoped fix preserves installed-package discovery, preserves the project-root package_root invariant for narrowed local discovery, and adds regression coverage for the original full-tree walk, HOME edge cases, and string install_path normalization. Why: All active panelists converged with no remaining findings. The architecture review confirmed the install_path normalization and package_root behavior; security review confirmed the narrower discovery boundary; DevX review confirmed the user benefit without command-shape changes; docs and growth review confirmed the changelog is outcome-first; test coverage confirmed the regression traps. Local validation passed: Principle alignment
Architecture view classDiagram
class BaseIntegrator {
+link_resolver: UnifiedLinkResolver | None
+init_link_resolver(package_info, project_root) None
+resolve_links(content, source, target) tuple
+_is_root_local_package(package_info, project_root)$ bool
}
class UnifiedLinkResolver {
+package_root: Path
+register_contexts(primitives)
+resolve_links_for_installation(content, source_file, target_file) str
}
class PackageInfo {
+install_path: Path | str
}
BaseIntegrator --> UnifiedLinkResolver : creates and registers contexts
BaseIntegrator ..> PackageInfo : normalizes install_path
flowchart TD
A[package_info.install_path] --> B[Path-normalized install_path]
B --> C{resolves to HOME?}
C -->|yes| D[scan ~/.apm only when it is a directory]
C -->|no| E{resolves to project_root?}
E -->|yes| F[scan project .apm and .github only]
E -->|no| G[scan install_path]
D --> H[discover_primitives per scan root]
F --> H
G --> H
H --> I[register contexts in UnifiedLinkResolver]
F --> J[package_root remains project_root]
G --> K[package_root remains dependency install_path]
sequenceDiagram
participant BI as BaseIntegrator
participant PI as package_info
participant Disc as discover_primitives
participant LR as UnifiedLinkResolver
BI->>PI: read install_path
BI->>BI: normalize Path and choose scan_roots
loop each scan_root
BI->>Disc: discover_primitives(root)
Disc-->>BI: primitives
BI->>LR: register_contexts(primitives)
end
BI->>LR: set package_root when safe
Panel signals
Recommended follow-ups Full findings Folded by shepherd-driver
Copilot signals
Deferred
|
Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: danielmeppiel <danielmeppiel@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Symptom
On a 50k+ file monorepo with only 2 instructions and 1 prompt under
.apm/,apm installhangs for ~13 minutes during the integrate phase before completing. Reported by @ioannispoulios in #1507.Trace
src/apm_cli/install/services.py:465-477builds a synthetic_localPackageInfowithinstall_path = project_rootand callsintegrate_package_primitives().src/apm_cli/integration/prompt_integrator.py:234invokesself.init_link_resolver(package_info, project_root).src/apm_cli/integration/base_integrator.py:531-553(pre-fix) setsscan_root = package_info.install_pathand only narrows toscan_root/.apmwhenscan_root == Path.home()(issue User-scope install (--global) unexpectedly enters local.apmintegration and can spend a long time scanning$HOME#830). For project-scope local content,scan_rootstays at the project root.src/apm_cli/primitives/discovery.py:590walksbase_dirwithos.walk. BecauseLOCAL_PRIMITIVE_PATTERNScontains generic patterns like**/*.instructions.md, the walk traverses the entire repo even when no local primitive lives outside.apm/. Result: O(repo size) work proportional to the monorepo, not to APM content.Fix
Extend the existing
$HOMEnarrowing to also apply wheninstall_path == project_root(i.e. the synthetic_localPackageInfo). For that case, scan only.apm/and.github/-- the two supported locations for local primitives perLOCAL_PRIMITIVE_PATTERNS. Real installed packages (apm_modules/<owner>/<repo>/...) keep scanning theirinstall_pathdirectly because theirinstall_pathdiffers fromproject_root.The
link_resolver.package_rootsentinel for in-package asset rewriting (#1147) is preserved as the project root for the local case, so asset links inside.apm/continue to resolve against the project tree.Tests (TDD + mutation-break gate)
tests/unit/integration/test_base_integrator.py::TestInitLinkResolverLocalScoping:test_narrows_to_apm_and_github_when_install_path_is_project_root-- mocksdiscover_primitives, asserts it is called with.apm/and.github/and never withproject_rootitself.test_skips_missing_directories-- only.apm/exists, so only.apm/is scanned.test_no_apm_or_github_means_no_walk-- no scannable dirs,discover_primitivesis never called.test_real_walk_does_not_traverse_noise_subtree-- end-to-end with a real walk; spies onos.walkto assert nonoise/...directory is visited even though the noise subtree contains a file matching the generic**/*.instructions.mdpattern.Mutation-break gate: deleting the narrowing branch causes all four new tests to fail.
The pre-existing
test_uses_install_path_when_not_homewas updated to useinstall_path = tmp_path / "apm_modules" / "owner" / "repo"so it reflects the real-package contract (install_path != project_root) it was always meant to exercise.How to test on a synthetic large-tree fixture
Before this PR: minutes. After: sub-second integrate phase.
Validation evidence
uv run --extra dev pytest tests/unit/integration/ tests/unit/install/ tests/unit/test_local_content_install.py -q-> 2959 passed.ruff check,ruff format --check,pylint R0801,lint-auth-signals.sh) -> all silent / 10.00.Closes #1507.