From cbd6e003dbb130f2d226672a55716cabc76cdca2 Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 25 May 2026 15:12:24 +0300 Subject: [PATCH 1/8] fix(artifactory): deterministic boundary probe for nested GitLab paths Mirrors the native GitLab resolver pattern -- but uses HEAD on Artifactory archive URLs as the existence signal, so no separate metadata API is needed. Covers both routing modes: Mode 1 -- explicit FQDN deps (host/artifactory/key/owner/repo/...). Mode 2 -- bare shorthand under PROXY_REGISTRY_URL+PROXY_REGISTRY_ONLY. The resolver walks candidate (owner, repo, virtual_path) splits shallow-first and locks in the first one whose archive responds 2xx-3xx. For unambiguous paths it returns the parse-time ref unchanged. When every candidate is rejected it raises -- distinguishing "missing repo" from auth (401/403) errors so users get an actionable hint instead of silently anchoring on a wrong guess. ``allow_redirects=False`` on the HEAD call keeps the Bearer token from leaking cross-host. Removes the parse-time marker-segment heuristic (_VIRTUAL_PATH_ROOT_SEGMENTS / _ARTIFACTORY_VIRTUAL_MARKERS / _ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS) that previously decided the boundary based on hard-coded directory names like ``skills/`` or ``prompts/``. Parse-time now defaults to all-as-repo with a structural file-extension rule on the last segment; the install-time resolver is the authoritative source of the boundary. Also fixes a pre-existing URL-roundtrip bug where ``to_github_url`` -> ``parse`` folded the ``artifactory/`` prefix into ``repo_url``, causing the downloader to build double-prefixed archive URLs and 404. ``_validate_url_repo_path`` now strips the prefix before returning the bare ``owner/repo`` slug. --- CHANGELOG.md | 10 + .../content/docs/enterprise/registry-proxy.md | 127 +++++ src/apm_cli/commands/install.py | 6 +- src/apm_cli/deps/artifactory_orchestrator.py | 11 +- src/apm_cli/install/artifactory_resolver.py | 275 ++++++++++ src/apm_cli/install/package_resolution.py | 29 +- src/apm_cli/models/dependency/reference.py | 84 ++- src/apm_cli/utils/github_host.py | 100 +++- tests/unit/test_artifactory_support.py | 482 ++++++++++++++++++ 9 files changed, 1103 insertions(+), 21 deletions(-) create mode 100644 src/apm_cli/install/artifactory_resolver.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 1110f2352..e2985904c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,8 +7,18 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added + +- Deterministic Artifactory boundary probe: at install time, `_resolve_artifactory_boundary` HEAD-probes the candidate archive URLs and rebuilds the dependency reference at the proxy-verified split. Covers both explicit-FQDN (`host/artifactory/key/owner/repo/...`) and bare-shorthand-under-proxy deps. Distinguishes "missing repo" from auth (401/403) errors; uses `allow_redirects=False` so the bearer token can't leak cross-host. Mirrors the native GitLab probing pattern but without a separate metadata API. + +### Changed + +- Parse-time Artifactory boundary detection no longer uses directory-marker heuristics (`skills/`, `prompts/`, `agents/`, `collections/`, `instructions/`). The install-time resolver is authoritative for the (owner, repo, virtual_path) split; parse-time defaults to a simple all-as-repo / structural-file-extension rule. The `_VIRTUAL_PATH_ROOT_SEGMENTS`, `_ARTIFACTORY_VIRTUAL_MARKERS`, and `_ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS` constants are removed. + ### Fixed +- `apm install` through a registry proxy now supports nested-group repos (3+ path segments, e.g. `group/subgroup/project`). Previously every trailing segment past `owner/repo` was treated as an in-repo virtual sub-path, so the downloader requested the wrong archive URL and the install failed with HTTP 404 from the proxy. Behavior is gated on `PROXY_REGISTRY_ONLY` so direct (non-proxy) installs keep the legacy two-segment shape. Affects `parse_artifactory_path`, `build_artifactory_archive_url`, `_detect_virtual_package`, the shorthand resolvers in `DependencyReference`, and `ArtifactoryOrchestrator._split_owner_repo` / `download_subdirectory`. +- URL-form Artifactory deps no longer round-trip with the `artifactory/` prefix folded into `repo_url`. The duplicated prefix caused the downloader to construct double-prefixed archive URLs (`/artifactory/key/artifactory/key/owner/repo/...`) and 404. `_validate_url_repo_path` now strips the Artifactory VCS prefix before returning the bare `owner/repo` slug; the prefix is still recovered separately via `_extract_artifactory_prefix`. - Copilot, Codex, Cursor, Claude, Windsurf, OpenCode, and Gemini adapters handle MCP v0.1 `runtimeArguments`/`packageArguments` with `variables` (no `type` key), matching the VS Code fix from #1444. (#1461, closes #1452, thanks @sergio-sisternes-epam) ## [0.14.2] - 2026-05-22 diff --git a/docs/src/content/docs/enterprise/registry-proxy.md b/docs/src/content/docs/enterprise/registry-proxy.md index c8da3f50b..69c0366fb 100644 --- a/docs/src/content/docs/enterprise/registry-proxy.md +++ b/docs/src/content/docs/enterprise/registry-proxy.md @@ -95,6 +95,129 @@ rewrites `apm.lock.yaml`. | `apm marketplace` (`marketplace.json` fetch) | Yes; falls back to GitHub Contents API unless `PROXY_REGISTRY_ONLY=1` | | Policy file fetch (`apm-policy.yml`) | No -- uses the GitHub API directly | +### Nested-group repos (GitLab subgroups behind the proxy) + +GitHub uses a fixed `owner/repo` shape, but GitLab projects can sit at any +subgroup depth (e.g. `group/subgroup/project`). When +`PROXY_REGISTRY_ONLY=1` is set, APM treats path segments past the second +as part of the repo slug; the real boundary between repo path and +in-repo virtual sub-path is then settled at install time by the same +deterministic boundary probe used for explicit FQDN deps (see +[Explicit Artifactory FQDN](#explicit-artifactory-fqdn-deterministic-boundary-probe) +below): + +```yaml +# apm.yml -- 3-segment GitLab project behind a registry proxy +dependencies: + apm: + - group/subgroup/project#main # resolves to the full nested path + - group/sub-a/sub-b/project#v1.2.0 # arbitrary depth supported +``` + +Virtual sub-paths under nested-group repos work via the probe: parse +defaults to all-as-repo, then the install-time resolver HEAD-probes +candidate splits against the proxy and rebuilds the dependency +reference at the first split whose archive responds 2xx-3xx: + +```yaml +dependencies: + apm: + # The probe walks shallow-first and lands on the real boundary -- + # ``group/subgroup/project`` is the repo, ``skills/`` is the + # virtual sub-path -- no marker-segment heuristic involved. + - group/subgroup/project/skills/ + # Files ending in ``.prompt.md`` / ``.instructions.md`` / + # ``.chatmode.md`` / ``.agent.md`` are structurally a virtual file + # at parse time; the probe still confirms which directory the file + # sits under is part of the repo path. + - group/subgroup/project/.prompt.md +``` + +Probe authentication matches the URL being probed: bare-shorthand deps +(Mode 2) use the proxy's own bearer token from `PROXY_REGISTRY_TOKEN`, +while explicit-FQDN deps (Mode 1) use the per-host auth resolver -- in +both cases the audience matches the probed URL, never the upstream Git +host. + +#### Trade-off: lockfile env-dependence + +The fold-into-repo behavior is gated on `PROXY_REGISTRY_ONLY` to keep the +legacy two-segment shape for direct installs. Consequence: the same +shorthand parses differently with vs. without the env set. For a team +that always runs through the proxy, this is invisible. For a mixed CI +fleet, expect lockfile drift if some agents have the env and others +don't -- pin the env in the same place you pin Python and APM versions. + +#### Configuring the upstream remote (GitLab) + +When the proxy fronts a private GitLab instance, the proxy itself must +authenticate upstream -- the client (APM) does not need a token if the +proxy is configured to accept anonymous reads on its API. + +In the Artifactory UI, for the remote pointing at GitLab: + +| Field | Value | +|---|---| +| URL | `https://` (no path prefix) | +| Repository Path Prefix | *blank* (any value gets prepended to every upstream request) | +| Username | empty *or* the GitLab username | +| Password / Token | the raw GitLab PAT value -- no `PRIVATE-TOKEN:` prefix | +| Token Authentication | enable when the password is a GitLab PAT | +| VCS Provider | `GitLab` | + +The PAT must carry **`read_repository`** scope -- `read_api` alone does +not permit `/-/archive/` downloads. Verify directly against GitLab +before saving on the remote: + +```bash +curl -sI -H "PRIVATE-TOKEN: $PAT" \ + "https://///-/archive//-.zip" \ + | head -3 +# Want: HTTP/1.1 200 OK + Content-Type: application/zip +``` + +#### Default branch gotcha + +APM defaults to `main` when no ref is provided. GitLab projects whose +default branch is still `master` will return HTTP 404 for every archive +URL APM tries. Pin the ref in `apm.yml` (`#master`) when the +project hasn't been renamed. + +#### Explicit Artifactory FQDN: deterministic boundary probe + +When a dep is written with the full proxy URL -- +`/artifactory///[/]` -- parse time gives a +simple `owner / first-segment / rest-as-virtual` split. The real +boundary is settled at install time by an authoritative resolver that +mirrors APM's native GitLab probing pattern, without a separate metadata +API: + +1. Enumerate every plausible `(owner, repo, virtual_path)` split + shallow-first. +2. `HEAD` each candidate's archive URL on the proxy (no follow on + redirects, so the bearer token can't leak cross-host). +3. The first candidate that responds 2xx-3xx wins; the dependency + reference is rebuilt at that boundary and persisted to `apm.yml` as + a structured `git:` + `path:` entry. + +If every candidate is rejected the resolver raises -- there is no +silent fallback to the parse-time guess: + +| Result | Behaviour | +|---|---| +| Single candidate (e.g. `host/artifactory/key/owner/repo`) | Parse-time ref returned unchanged; no HEAD probe issued. | +| All candidates `4xx` (excluding 401/403) | `ValueError: ... did not resolve to a reachable repository archive` | +| All candidates `401`/`403` | `ValueError: ... authentication problem, not a missing repo` -- check the token's read scope. | + +To opt out of probing -- e.g. when the proxy is offline at install time +or when you want a deterministic byte-for-byte string -- use the +explicit `//` boundary marker, which short-circuits the resolver to a +single candidate: + +```text +/artifactory////// +``` + When a surface is not proxy-routed and `PROXY_REGISTRY_ONLY=1`, APM aborts rather than silently fetching direct. @@ -141,6 +264,10 @@ and `apm cache clean`. | `git clone` hangs through the proxy | `HTTPS_PROXY` not set in the env that runs `git` | Export it in the shell that invokes `apm install`; CI secrets often miss this | | `DeprecationWarning: ARTIFACTORY_BASE_URL is deprecated` | Legacy env names | Rename to `PROXY_REGISTRY_*` | | Plaintext-token warning on proxy startup | Token sent over `http://` | Use `https://`, or set `PROXY_REGISTRY_ALLOW_HTTP=1` if the link is internal-only | +| `Invalid zip archive` with a body that starts `` and is ~17KB | Upstream returned a sign-in page; proxy cached the HTML | Configure upstream credentials on the registry remote, purge the cache, then refetch | +| 3-segment dep (`group/sub/project`) fails with HTTP 404 from the proxy | APM treated `project` as a virtual sub-path | Set `PROXY_REGISTRY_ONLY=1`; see [Nested-group repos](#nested-group-repos-gitlab-subgroups-behind-the-proxy) | +| HTTP 404 on every ref of an existing GitLab project | Default branch is `master`, APM defaults to `main` | Pin the ref: `#master` in `apm.yml` | +| Upstream URL in `X-Artifactory-Origin-Remote-Path` has a duplicated group name | The remote's "Repository Path Prefix" is prepending a segment that's also in the request | Clear the prefix field on the remote | For fully disconnected CI (no proxy reach at all), build a bundle on a connected host with `apm pack` and restore offline. See diff --git a/src/apm_cli/commands/install.py b/src/apm_cli/commands/install.py index f81f828b5..b6ee6a403 100644 --- a/src/apm_cli/commands/install.py +++ b/src/apm_cli/commands/install.py @@ -12,6 +12,7 @@ import click +from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary from apm_cli.install.errors import ( AuthenticationError, DirectDependencyError, @@ -436,11 +437,12 @@ def warning_handler(msg): # Canonicalize input try: - dep_ref, direct_gitlab_virtual_resolved = resolve_parsed_dependency_reference( + dep_ref, direct_virtual_resolved = resolve_parsed_dependency_reference( package, marketplace_dep_ref, dependency_reference_cls=DependencyReference, try_resolve_gitlab_direct_shorthand=_try_resolve_gitlab_direct_shorthand, + resolve_artifactory_boundary=_resolve_artifactory_boundary, auth_resolver=auth_resolver, verbose=bool(logger and logger.verbose), ) @@ -458,7 +460,7 @@ def warning_handler(msg): _seen.add(_s) _normalized.append(_s) dep_ref.skill_subset = _normalized - if marketplace_dep_ref is not None or direct_gitlab_virtual_resolved: + if marketplace_dep_ref is not None or direct_virtual_resolved: _apm_yml_entries[canonical] = dependency_reference_to_yaml_entry(dep_ref) except ValueError as e: reason = str(e) diff --git a/src/apm_cli/deps/artifactory_orchestrator.py b/src/apm_cli/deps/artifactory_orchestrator.py index cd7f0b616..ab07ed59d 100644 --- a/src/apm_cli/deps/artifactory_orchestrator.py +++ b/src/apm_cli/deps/artifactory_orchestrator.py @@ -175,12 +175,15 @@ def _resolve_host_prefix( @staticmethod def _split_owner_repo(dep_ref: DependencyReference) -> tuple[str, str]: repo_parts = dep_ref.repo_url.split("/") - if len(repo_parts) < 2 or not repo_parts[0] or not repo_parts[1]: + if len(repo_parts) < 2 or not all(repo_parts): raise ValueError( f"Invalid Artifactory repo reference '{dep_ref.repo_url}': " "expected 'owner/repo' format" ) - return repo_parts[0], repo_parts[1] + # Owner is the top-level namespace; the remainder of the path is the + # project slug. For GitLab projects behind an Artifactory VCS proxy + # the slug can include subgroups (e.g. ``group/subgroup/project``). + return repo_parts[0], "/".join(repo_parts[1:]) @staticmethod def _progress(progress_obj, progress_task_id, *, completed: int, total: int = 100) -> None: @@ -257,7 +260,9 @@ def download_subdirectory( subdir_path = dep_ref.virtual_path repo_parts = dep_ref.repo_url.split("/") owner = repo_parts[0] - repo = repo_parts[1] if len(repo_parts) > 1 else repo_parts[0] + # Preserve subgroup nesting (GitLab via proxy) by folding everything + # past the owner into the repo slug. + repo = "/".join(repo_parts[1:]) if len(repo_parts) > 1 else repo_parts[0] host, prefix, scheme = proxy_info self._progress(progress_obj, progress_task_id, completed=10) diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py new file mode 100644 index 000000000..ae31769c4 --- /dev/null +++ b/src/apm_cli/install/artifactory_resolver.py @@ -0,0 +1,275 @@ +"""Authoritative Artifactory boundary resolution for install-time validation. + +Native GitLab support (see :mod:`apm_cli.install.gitlab_resolver`) probes the +GitLab API to find the real repository boundary inside a nested subgroup path. +For deps routed through a JFrog Artifactory VCS proxy the proxy itself serves +archive ZIPs at deterministic URLs, so the archive *is* the existence signal: +``HEAD`` the candidate archive URL, treat ``2xx-3xx`` as existence proof. + +The resolver covers both routing modes uniformly: + +* **Mode 1** -- explicit FQDN deps (``host/artifactory/key/owner/repo/...``). + Host and prefix come from the dependency reference itself. +* **Mode 2** -- bare shorthand (``owner/repo/...``) routed through the proxy + by ``PROXY_REGISTRY_URL`` + ``PROXY_REGISTRY_ONLY=1``. Host and prefix come + from the registry config. The rebuilt ref stays bare shorthand form (no + ``host``/``artifactory_prefix`` set) so identity and lockfile shape are + preserved. + +The resolver is **deterministic**: + +* For unambiguous paths (a single plausible owner/repo split), it returns the + parse-time dependency reference unchanged. +* For ambiguous paths it walks candidate splits shallow-first and locks in the + first one whose archive responds 2xx-3xx. No "best-effort fallback" -- if + every candidate is rejected by the proxy, the function raises and the + install pipeline surfaces a clear error (and distinguishes "missing repo" + from "auth problem"). + +Drift from this contract has historically masked broken deps by silently +falling back to a guess. Keep the resolver definite. +""" + +from __future__ import annotations + +import dataclasses + +import requests + +from apm_cli.core.auth import AuthResolver +from apm_cli.models.apm_package import DependencyReference +from apm_cli.utils.github_host import ( + build_artifactory_archive_url, + iter_artifactory_boundary_candidates, + sanitize_token_url_in_message, +) + +_ARTIFACTORY_BOUNDARY_UNRESOLVED = ( + "Artifactory host/path did not resolve to a reachable repository archive. " + "Verify the proxy URL, ref, and that the upstream project exists; the " + "``//`` notation can mark the repo/virtual boundary explicitly when the " + "proxy is unavailable." +) + +_ARTIFACTORY_BOUNDARY_AUTH = ( + "Artifactory proxy rejected the probe for every candidate boundary " + "(401/403). This is an authentication problem, not a missing repo: " + "check that the configured token has read access to the proxy." +) + + +class _CandidateStatus: + """Probe outcome for one candidate boundary.""" + + EXISTS = "exists" # at least one URL shape returned 2xx + MISSING = "missing" # every URL shape returned 4xx other than 401/403 + AUTH = "auth" # only 401/403 seen -- cannot tell if the repo exists + + +def _candidate_archive_status( + host: str, + prefix: str, + owner: str, + repo: str, + ref: str, + headers: dict[str, str], + verify, + timeout: int = 15, +) -> str: + """Classify one candidate's existence by HEAD-probing every archive URL shape. + + Returns one of :class:`_CandidateStatus` constants. Distinguishing + ``AUTH`` from ``MISSING`` is deliberate: a misconfigured token should + surface a different error than a wrong owner/repo split. + """ + urls = build_artifactory_archive_url(host, prefix, owner, repo, ref) + saw_auth = False + for url in urls: + try: + # ``allow_redirects=False`` keeps the Bearer token from leaking to + # any host the proxy might redirect us to. 3xx is still existence + # proof -- the server confirmed the resource by issuing a redirect. + r = requests.head( + url, headers=headers, timeout=timeout, verify=verify, allow_redirects=False + ) + except requests.RequestException: + continue + if 200 <= r.status_code < 400: + return _CandidateStatus.EXISTS + if r.status_code in (401, 403): + saw_auth = True + return _CandidateStatus.AUTH if saw_auth else _CandidateStatus.MISSING + + +def _proxy_probe_headers( + dep_ref: DependencyReference, + auth_resolver, + is_mode_1: bool, +) -> dict[str, str]: + """Build the HEAD-probe headers for the right authentication audience. + + * Mode 1 (explicit FQDN): the dep's host *is* the proxy host, so the + per-host auth resolver already returns the right token. + * Mode 2 (bare shorthand under the proxy): the dep's logical host is + typically ``github.com`` and the auth resolver would hand back a + GitHub token -- wrong audience. Use the proxy's own bearer token + from :class:`RegistryConfig` instead. + """ + headers: dict[str, str] = {"User-Agent": "apm-cli"} + if is_mode_1: + ctx = auth_resolver.resolve_for_dep(dep_ref) + if ctx and getattr(ctx, "token", None): + headers["Authorization"] = f"Bearer {ctx.token}" + return headers + # Mode 2: proxy-scoped token only. + from ..deps.registry_proxy import RegistryConfig + + cfg = RegistryConfig.from_env() + if cfg is not None: + headers.update(cfg.get_headers()) + return headers + + +def _proxy_routing_target( + dep_ref: DependencyReference, +) -> tuple[str, str, bool] | None: + """Return ``(host, prefix, is_mode_1)`` if *dep_ref* routes through the proxy. + + * Mode 1 (explicit FQDN): host and prefix come from the dependency ref. + * Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``): host and prefix + come from the registry-proxy config. + + Returns ``None`` when no proxy routing applies (regular GitHub/ADO deps) + or when host/prefix are not real strings (e.g. mocked dependency refs + in unit tests). + """ + if dep_ref.is_artifactory(): + host = dep_ref.host + prefix = dep_ref.artifactory_prefix + if isinstance(host, str) and isinstance(prefix, str) and host and prefix: + return (host, prefix, True) + return None + from ..deps.artifactory_orchestrator import ArtifactoryRouter + + if not ArtifactoryRouter.should_use_proxy(dep_ref): + return None + cfg = ArtifactoryRouter.parse_proxy_config() + if cfg is None: + return None + host, prefix, _scheme = cfg + if isinstance(host, str) and isinstance(prefix, str) and host and prefix: + return (host, prefix, False) + return None + + +def _resolve_artifactory_boundary( + package: str, + auth_resolver, + verbose: bool = False, + *, + dep_ref: DependencyReference | None = None, +) -> DependencyReference: + """Definitively resolve the (owner, repo, virtual_path) boundary on the proxy. + + Returns the rebuilt :class:`DependencyReference` with the proxy-verified + boundary, or *dep_ref* unchanged when there is nothing to disambiguate + (single candidate, or the dep doesn't route through the proxy at all). + Raises ``ValueError`` if every candidate is rejected by the proxy. + """ + if auth_resolver is None: + auth_resolver = AuthResolver() + + if dep_ref is None: + dep_ref = DependencyReference.parse(package) + + target = _proxy_routing_target(dep_ref) + if target is None: + # Not routed through the proxy -- nothing for this resolver to do. + return dep_ref + host, prefix, is_mode_1 = target + + # Strip any inlined ``user:pass@host`` credentials before echoing the + # package string back in error messages. Deferred until after the + # routing-target check so non-proxy deps short-circuit before touching + # the host (callers occasionally pass mocked dep_refs whose ``host`` is + # not a real string). + safe_package = sanitize_token_url_in_message(package, host=host) + + prefix_segs = prefix.split("/") + tail = dep_ref.repo_url or "" + if dep_ref.virtual_path: + tail = f"{tail}/{dep_ref.virtual_path}" + tail_segs = [s for s in tail.split("/") if s] + path_segments = [*prefix_segs, *tail_segs] + + candidates = list(iter_artifactory_boundary_candidates(path_segments)) + if not candidates: + raise ValueError( + f"Artifactory dep '{safe_package}' has no plausible owner/repo split" + ) + if len(candidates) == 1: + # Single candidate -- the parse-time dep_ref is already definitive. + return dep_ref + + headers = _proxy_probe_headers(dep_ref, auth_resolver, is_mode_1) + verify = True + + ref = dep_ref.reference or "main" + + all_auth = True + for cand_prefix, cand_owner, cand_repo, cand_virtual in candidates: + if verbose: + path_suffix = f" [path: {cand_virtual}]" if cand_virtual else "" + print( + f" artifactory-resolve: probing {host}/{cand_prefix}/{cand_owner}" + f"/{cand_repo}#{ref}{path_suffix}" + ) + status = _candidate_archive_status( + host, cand_prefix, cand_owner, cand_repo, ref, headers, verify + ) + if status == _CandidateStatus.EXISTS: + # If the probe confirms the parse-time boundary, return the + # original ref so the install pipeline doesn't re-serialize + # a structurally unchanged dep as a different shape. + if ( + dep_ref.repo_url == f"{cand_owner}/{cand_repo}" + and (dep_ref.virtual_path or None) == cand_virtual + ): + return dep_ref + return _rebuild_dep_ref(dep_ref, host, cand_prefix, cand_owner, cand_repo, cand_virtual, is_mode_1) + if status == _CandidateStatus.MISSING: + all_auth = False + + # Every candidate was rejected. Auth-only rejection deserves a different + # error than "the repo doesn't exist" so the user gets an actionable hint. + if all_auth: + raise ValueError(f"{_ARTIFACTORY_BOUNDARY_AUTH} (package: {safe_package})") + raise ValueError(f"{_ARTIFACTORY_BOUNDARY_UNRESOLVED} (package: {safe_package})") + + +def _rebuild_dep_ref( + dep_ref: DependencyReference, + host: str, + prefix: str, + owner: str, + repo: str, + virtual_path: str | None, + is_mode_1: bool, +) -> DependencyReference: + """Rebuild *dep_ref* at the probed boundary, preserving non-boundary fields. + + Mode 1 keeps the FQDN+prefix on the rebuilt ref (it was there to begin + with). Mode 2 keeps the rebuilt ref as bare shorthand so identity and + lockfile shape stay stable -- the proxy is still routed via env, not via + embedded host/prefix. + """ + if is_mode_1: + return DependencyReference.from_artifactory_boundary_probe( + host, prefix, owner, repo, virtual_path, dep_ref.reference + ) + return dataclasses.replace( + dep_ref, + repo_url=f"{owner}/{repo}", + virtual_path=virtual_path, + is_virtual=bool(virtual_path), + ) diff --git a/src/apm_cli/install/package_resolution.py b/src/apm_cli/install/package_resolution.py index 791e184e3..260c2bd21 100644 --- a/src/apm_cli/install/package_resolution.py +++ b/src/apm_cli/install/package_resolution.py @@ -40,14 +40,20 @@ def resolve_parsed_dependency_reference( try_resolve_gitlab_direct_shorthand: Callable[..., Any], auth_resolver: Any, verbose: bool, + resolve_artifactory_boundary: Callable[..., Any] | None = None, ) -> tuple[Any, bool]: """Parse or probe *package* into a ``DependencyReference``. - Returns ``(dep_ref, direct_gitlab_virtual_resolved)`` where the second flag - is True when GitLab direct shorthand probing produced a virtual path entry. + Returns ``(dep_ref, direct_virtual_resolved)`` where the second flag is + True when a probe (GitLab shorthand or Artifactory boundary) rebuilt the + dep ref, so the install pipeline persists it as a structured entry. + + For Artifactory deps the optional ``resolve_artifactory_boundary`` is + authoritative: it returns the proxy-verified boundary or raises -- there + is no silent fallback to the parse-time guess. Raises: - ValueError: When GitLab shorthand probing is required but fails to resolve. + ValueError: When GitLab or Artifactory probing fails to resolve. """ dep_ref = ( marketplace_dep_ref @@ -66,8 +72,21 @@ def resolve_parsed_dependency_reference( if resolved is None: raise ValueError(_GITLAB_DIRECT_SHORTHAND_UNRESOLVED) dep_ref = resolved - direct_gitlab_virtual_resolved = bool(dep_ref.is_virtual and dep_ref.virtual_path) - return dep_ref, direct_gitlab_virtual_resolved + direct_virtual_resolved = bool(dep_ref.is_virtual and dep_ref.virtual_path) + return dep_ref, direct_virtual_resolved + if marketplace_dep_ref is None and resolve_artifactory_boundary is not None: + # The resolver decides its own applicability -- it short-circuits for + # deps that don't route through the Artifactory proxy. When it rebuilds + # the dep_ref, the canonical shorthand can't round-trip the verified + # boundary, so persist as a structured ``git:`` + ``path:`` entry. + resolved = resolve_artifactory_boundary( + package, + auth_resolver, + verbose=verbose, + dep_ref=dep_ref, + ) + if resolved is not dep_ref: + return resolved, True return dep_ref, False diff --git a/src/apm_cli/models/dependency/reference.py b/src/apm_cli/models/dependency/reference.py index 6b2829980..9ed5dd193 100644 --- a/src/apm_cli/models/dependency/reference.py +++ b/src/apm_cli/models/dependency/reference.py @@ -760,6 +760,26 @@ def from_gitlab_shorthand_probe( is_virtual=True, ) + @classmethod + def from_artifactory_boundary_probe( + cls, + host: str, + prefix: str, + owner: str, + repo: str, + virtual_path: str | None, + reference: str | None, + ) -> "DependencyReference": + """Build a dependency ref for a resolved Artifactory boundary probe.""" + return cls( + repo_url=f"{owner}/{repo}", + host=host, + reference=reference, + virtual_path=virtual_path, + is_virtual=bool(virtual_path), + artifactory_prefix=prefix, + ) + @classmethod def _gitlab_shorthand_repo_segment_count( cls, @@ -804,6 +824,39 @@ def _gitlab_shorthand_repo_segment_count( return n + @classmethod + def _bare_shorthand_repo_segment_count(cls, path_segments: list[str]) -> int: + """Return how many leading segments belong to the repo path for bare shorthand. + + For ``owner/repo[/...]`` shorthand without an FQDN, the default is 2 + segments (GitHub convention). When registry-only mode is active, the + proxy may be fronting a host that allows nested namespaces (GitLab + subgroups) -- parse defaults to **all-as-repo** so the deterministic + boundary probe in :mod:`apm_cli.install.artifactory_resolver` can + rebuild the dependency reference at the proxy-verified split. + + The only parse-time inference kept is **structural**: a path whose + last segment ends in a virtual file extension + (``.prompt.md``/``.instructions.md``/``.chatmode.md``/``.agent.md``) + is by shape a virtual file dep -- the file is the last segment and + the repo is everything before it. This is not a directory-marker + heuristic; the file extension is the type. The shallower boundary + (when the file lives under a known directory like ``prompts/``) is + settled by the probe, not by a marker list. + """ + n = len(path_segments) + if n < 3: + return 2 + + from ...deps.registry_proxy import is_enforce_only + + if not is_enforce_only(): + return 2 + + if any(path_segments[-1].endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS): + return n - 1 + return n + @classmethod def _detect_virtual_package(cls, dependency_str: str): """Detect whether *dependency_str* refers to a virtual package. @@ -890,7 +943,13 @@ def _detect_virtual_package(cls, dependency_str: str): else: min_base_segments = len(path_segments) else: - min_base_segments = 2 + # Bare shorthand (no FQDN). Default GitHub-style: owner/repo plus + # any tail is treated as a virtual sub-path. But when registry-only + # mode is active, the proxy may be fronting a GitLab instance where + # the project lives at an arbitrary subgroup depth -- fold non-marker + # segments into the repo path instead of mis-classifying them as + # virtual sub-paths (see issue: nested GitLab subgroup support). + min_base_segments = cls._bare_shorthand_repo_segment_count(path_segments) min_virtual_segments = min_base_segments + 1 @@ -1065,6 +1124,17 @@ def _resolve_virtual_shorthand_repo(cls, repo_url, validated_host, virtual_path= "Invalid Azure DevOps virtual package format: expected at least org/project/repo/path" ) repo_url = "/".join(parts[:3]) + elif validated_host is None and virtual_path: + # Bare shorthand under registry-only mode may carry a nested + # repo path (GitLab subgroup via proxy). Trust the boundary + # already chosen by ``_bare_shorthand_repo_segment_count`` -- + # everything before the virtual tail belongs to the repo. + vparts = [p for p in virtual_path.split("/") if p] + tail = len(vparts) + if tail > 0 and len(parts) > tail + 1: + repo_url = "/".join(parts[: len(parts) - tail]) + else: + repo_url = "/".join(parts[:2]) else: repo_url = "/".join(parts[:2]) @@ -1113,6 +1183,10 @@ def _resolve_shorthand_to_parsed_url(cls, repo_url, host): user_repo = "/".join(parts[:3]) elif host and not is_github_hostname(host) and not is_azure_devops_hostname(host): user_repo = "/".join(parts) + elif len(parts) >= 3 and cls._bare_shorthand_repo_segment_count(parts) > 2: + # Registry-only mode allows nested-group repo paths + # (GitLab via proxy). Keep the full multi-segment path. + user_repo = "/".join(parts[: cls._bare_shorthand_repo_segment_count(parts)]) else: user_repo = "/".join(parts[:2]) else: @@ -1245,6 +1319,14 @@ def _validate_url_repo_path(cls, parsed_url) -> tuple[str, str | None]: raise ValueError( f"Invalid repository path: expected at least 'user/repo', got '{path}'" ) + # Strip the Artifactory VCS prefix so ``repo_url`` is the bare + # ``owner/repo`` -- otherwise URL round-trip through + # ``to_github_url`` -> ``parse`` would carry the prefix in the + # repo_url and the orchestrator would double-prefix download URLs. + # The prefix itself is recovered separately via + # :meth:`_extract_artifactory_prefix`. + if is_artifactory_path(path_parts): + path_parts = path_parts[2:] for pp in path_parts: if any(pp.endswith(ext) for ext in cls.VIRTUAL_FILE_EXTENSIONS): raise ValueError( diff --git a/src/apm_cli/utils/github_host.py b/src/apm_cli/utils/github_host.py index 3031115de..5e515f040 100644 --- a/src/apm_cli/utils/github_host.py +++ b/src/apm_cli/utils/github_host.py @@ -610,25 +610,101 @@ def is_artifactory_path(path_segments: list) -> bool: return len(path_segments) >= 4 and path_segments[0].lower() == "artifactory" +def iter_artifactory_boundary_candidates(path_segments: list, shape_filter=None): + """Yield ``(prefix, owner, repo, virtual_path)`` candidates shallow-first. + + Mirrors :meth:`DependencyReference.iter_gitlab_direct_shorthand_boundary_candidates`: + enumerate every plausible (owner, repo) split and let the caller probe each + one against the Artifactory proxy. The probe (HEAD on the archive URL) + decides the real boundary; this iterator only proposes candidates. + + If *shape_filter* is provided, candidates whose ``virtual_path`` fails the + filter are skipped. The candidate with no virtual path (``k == n``) is + always yielded as the all-as-repo fallback so callers that need a + deterministic answer (no probing) can pick it. + + The ``//`` empty-segment notation explicitly marks the repo / virtual + boundary and short-circuits the iterator to a single candidate. + + Returns nothing for non-Artifactory paths. + """ + if not is_artifactory_path(path_segments): + return + repo_key = path_segments[1] + prefix = f"artifactory/{repo_key}" + remaining = path_segments[2:] + if not remaining: + return + owner = remaining[0] + after_owner = remaining[1:] + n = len(after_owner) + if n == 0: + return + + if "" in after_owner: + empty_idx = after_owner.index("") + repo_parts = after_owner[:empty_idx] + suffix_parts = [s for s in after_owner[empty_idx + 1 :] if s] + if repo_parts: + yield ( + prefix, + owner, + "/".join(repo_parts), + "/".join(suffix_parts) if suffix_parts else None, + ) + return + + for k in range(1, n + 1): + repo = "/".join(after_owner[:k]) + suffix_parts = after_owner[k:] + suffix = "/".join(suffix_parts) if suffix_parts else None + if suffix is not None and shape_filter is not None and not shape_filter(suffix): + continue + yield (prefix, owner, repo, suffix) + + def parse_artifactory_path(path_segments: list) -> tuple: - """Parse Artifactory path into (prefix, owner, repo, virtual_path). + """Parse Artifactory path into ``(prefix, owner, repo, virtual_path)``. - Input: ['artifactory', 'github', 'microsoft', 'apm-sample-package'] - Output: ('artifactory/github', 'microsoft', 'apm-sample-package', None) + Parse-time output is intentionally simple and unambiguous: ``owner`` is the + first segment after ``artifactory/{key}``, ``repo`` is the next segment, + and any further segments become ``virtual_path``. The authoritative + boundary -- needed for nested GitLab subgroup paths behind the Artifactory + proxy -- is determined by :func:`apm_cli.install.artifactory_resolver.\ +_resolve_artifactory_boundary`, which probes archive URLs and rebuilds the + dependency reference at the verified boundary. - Input: ['artifactory', 'github', 'owner', 'repo', 'skills', 'review'] - Output: ('artifactory/github', 'owner', 'repo', 'skills/review') + The ``//`` notation (empty segment) is honored as an explicit, deterministic + boundary marker so users can opt out of probing. Returns None if not a valid Artifactory path. """ if not is_artifactory_path(path_segments): return None repo_key = path_segments[1] - remaining = path_segments[2:] prefix = f"artifactory/{repo_key}" + remaining = path_segments[2:] + if not remaining: + return None owner = remaining[0] - repo = remaining[1] - virtual_path = "/".join(remaining[2:]) if len(remaining) > 2 else None + after_owner = remaining[1:] + if not after_owner: + return None + + if "" in after_owner: + empty_idx = after_owner.index("") + repo_parts = after_owner[:empty_idx] + suffix_parts = [s for s in after_owner[empty_idx + 1 :] if s] + if repo_parts: + return ( + prefix, + owner, + "/".join(repo_parts), + "/".join(suffix_parts) if suffix_parts else None, + ) + + repo = after_owner[0] + virtual_path = "/".join(after_owner[1:]) if len(after_owner) > 1 else None return (prefix, owner, repo, virtual_path) @@ -661,11 +737,15 @@ def build_artifactory_archive_url( Tuple of URLs to try in order """ base = f"{scheme}://{host}/{prefix}/{owner}/{repo}" + # GitLab archive filenames use only the project basename, even when the + # project sits inside a subgroup (e.g. ``group/sub/pkg`` becomes + # ``pkg-{ref}.zip``). ``rsplit`` keeps the flat case unchanged. + repo_basename = repo.rsplit("/", 1)[-1] return ( # GitHub-style: /archive/refs/heads/{ref}.zip f"{base}/archive/refs/heads/{ref}.zip", - # GitLab-style: /-/archive/{ref}/{repo}-{ref}.zip - f"{base}/-/archive/{ref}/{repo}-{ref}.zip", + # GitLab-style: /-/archive/{ref}/{basename}-{ref}.zip + f"{base}/-/archive/{ref}/{repo_basename}-{ref}.zip", # GitHub-style tags fallback f"{base}/archive/refs/tags/{ref}.zip", # codeload.github.com-style: /zip/refs/heads/{ref} diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 0f3e14611..8fad78874 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -107,6 +107,122 @@ def test_different_repo_key(self): result = parse_artifactory_path(["artifactory", "my-proxy", "team", "project"]) assert result[0] == "artifactory/my-proxy" + def test_nested_gitlab_subgroup_simple_parse(self): + """Parse is intentionally simple/shallow; the install-time resolver is authoritative. + + For ``apm/acme-group/shared-modules/pkg-utils`` parse picks + ``owner=acme-group, repo=shared-modules`` and folds the rest into + ``virtual_path``. ``_resolve_artifactory_boundary`` then HEAD-probes + archive URLs and rebuilds the ref at the proxy-verified boundary. + """ + result = parse_artifactory_path( + ["artifactory", "apm", "acme-group", "shared-modules", "pkg-utils"] + ) + assert result == ("artifactory/apm", "acme-group", "shared-modules", "pkg-utils") + + def test_nested_subgroup_with_marker_simple_parse(self): + """Marker-like segments are no longer parse-time signals; the resolver decides.""" + result = parse_artifactory_path( + ["artifactory", "apm", "group", "subgroup", "repo", "skills", "foo"] + ) + assert result == ("artifactory/apm", "group", "subgroup", "repo/skills/foo") + + def test_nested_subgroup_with_virtual_file_extension(self): + """Virtual file extensions are no longer parse-time boundary signals either.""" + result = parse_artifactory_path( + ["artifactory", "apm", "group", "subgroup", "repo", "rules.prompt.md"] + ) + assert result == ("artifactory/apm", "group", "subgroup", "repo/rules.prompt.md") + + def test_double_slash_subdirectory_notation(self): + """The ``//subdir`` notation explicitly marks the repo/virtual boundary.""" + # Mirrors ``art.example.com/artifactory/github/owner/repo//subdir``. + result = parse_artifactory_path( + ["artifactory", "github", "owner", "repo", "", "subdir"] + ) + assert result[0] == "artifactory/github" + assert result[1] == "owner" + assert result[2] == "repo" + assert result[3] == "subdir" + + +class TestIterArtifactoryBoundaryCandidates: + """Test the candidate iterator that backs the install-time probe.""" + + def test_flat_path_yields_single_candidate(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "github", "owner", "repo"] + ) + ) + assert candidates == [("artifactory/github", "owner", "repo", None)] + + def test_nested_path_yields_shallow_to_deep(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "apm", "group", "sub", "repo"] + ) + ) + # Shallowest (k=1) yielded first; deepest (all-as-repo) last. + assert candidates == [ + ("artifactory/apm", "group", "sub", "repo"), + ("artifactory/apm", "group", "sub/repo", None), + ] + + def test_shape_filter_accepts_matching_suffixes(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + # Filter only accepts suffixes ending in ``.prompt.md``. + def only_prompt_md(v): + return v.endswith(".prompt.md") + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "apm", "g", "a", "b", "rules.prompt.md"], + shape_filter=only_prompt_md, + ) + ) + # after_owner = [a, b, rules.prompt.md]: + # k=1 suffix='b/rules.prompt.md' -> ends in .prompt.md, accepted. + # k=2 suffix='rules.prompt.md' -> ends in .prompt.md, accepted. + # k=3 suffix=None -> filter bypassed (no-suffix fallback). + assert ("artifactory/apm", "g", "a", "b/rules.prompt.md") in candidates + assert ("artifactory/apm", "g", "a/b", "rules.prompt.md") in candidates + assert ("artifactory/apm", "g", "a/b/rules.prompt.md", None) in candidates + + def test_shape_filter_actually_rejects(self): + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + # Reject every non-empty suffix. + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "apm", "g", "a", "b"], + shape_filter=lambda _v: False, + ) + ) + # All k rebuild at depth.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/acme-group/shared-modules/pkg-utils" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + ok_url = "/artifactory/apm/acme-group/shared-modules/pkg-utils/" + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=self._fake_head(ok_url), + ): + resolved = _resolve_artifactory_boundary(package, auth, verbose=False) + + assert resolved.host == "art.example.com" + assert resolved.artifactory_prefix == "artifactory/apm" + assert resolved.repo_url == "acme-group/shared-modules/pkg-utils" + assert resolved.virtual_path is None + assert resolved.is_artifactory() + + def test_resolver_picks_shallower_boundary_when_subpath_is_virtual(self): + """Shallow boundary 200s -> lock in shallow + virtual_path.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/group/repo/skills/foo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + ok_url = "/artifactory/apm/group/repo/" + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + # Only the shallowest archive URL hits 200; deeper paths 404. + resp.status_code = ( + 200 + if ok_url in url and "/group/repo/skills" not in url + else 404 + ) + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=_head, + ): + resolved = _resolve_artifactory_boundary(package, auth, verbose=False) + + assert resolved.repo_url == "group/repo" + assert resolved.virtual_path == "skills/foo" + assert resolved.is_virtual + + def test_resolver_no_op_for_non_proxy_routed_dep(self): + """Non-Artifactory deps without proxy mode are a no-op (unchanged ref). + + The resolver covers both routing modes (Mode 1 FQDN and Mode 2 bare + shorthand under ``PROXY_REGISTRY_ONLY``); deps that don't route through + the proxy at all (e.g. a plain ``github.com`` dep with no proxy env) + should pass through unchanged so the resolver can be called blindly + from the install pipeline. + """ + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + auth = Mock() + # Clear any proxy env to make sure Mode 2 doesn't kick in here either. + with patch.dict(os.environ, {}, clear=True): + dep = DependencyReference.parse("github.com/owner/repo") + resolved = _resolve_artifactory_boundary( + "github.com/owner/repo", auth, verbose=False, dep_ref=dep + ) + assert resolved is dep + + def test_resolver_raises_when_no_candidate_matches(self): + """All-404 must raise -- deterministic failure, no silent fallback.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + def _all_404(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = 404 + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404 + ): + with pytest.raises(ValueError, match="did not resolve"): + _resolve_artifactory_boundary(package, auth, verbose=False) + + def test_resolver_distinguishes_auth_from_missing(self): + """All-401 must raise an auth-specific error, not a "not found" one. + + A misconfigured token returning 401 for every probe must surface as an + auth problem so the user knows to fix credentials, not the dep path. + """ + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token="bad") + + def _all_401(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = 401 + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_401 + ): + with pytest.raises(ValueError, match="authentication problem"): + _resolve_artifactory_boundary(package, auth, verbose=False) + + def test_resolver_returns_unchanged_for_unambiguous_path(self): + """A 4-segment path has only one candidate -- return the parse-time ref unchanged.""" + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "art.example.com/artifactory/github/owner/repo" + dep_ref = DependencyReference.parse(package) + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + with patch( + "apm_cli.install.artifactory_resolver.requests.head" + ) as mock_head: + resolved = _resolve_artifactory_boundary( + package, auth, verbose=False, dep_ref=dep_ref + ) + + assert resolved is dep_ref + mock_head.assert_not_called() + + def test_resolver_handles_mode_2_bare_shorthand(self): + """Bare shorthand under ``PROXY_REGISTRY_ONLY`` is probed via the registry config. + + The rebuilt ref stays bare (no ``host`` / ``artifactory_prefix`` set); + the proxy is still routed via env, not embedded in the dep ref. + """ + from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary + + package = "group/sub/repo/skills/foo" + with patch.dict( + os.environ, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, + clear=True, + ): + dep = DependencyReference.parse(package) + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + # Only the shallow boundary (group/sub/repo) exists. + ok_marker = "/artifactory/apm/group/sub/repo/" + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = ( + 200 + if ok_marker in url and "/group/sub/repo/skills" not in url + else 404 + ) + return resp + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", side_effect=_head + ): + resolved = _resolve_artifactory_boundary( + package, auth, verbose=False, dep_ref=dep + ) + + assert resolved.repo_url == "group/sub/repo" + assert resolved.virtual_path == "skills/foo" + assert resolved.is_virtual + # Bare shorthand form preserved -- proxy routing is via env, not + # embedded as artifactory_prefix on the ref. + assert resolved.artifactory_prefix is None + assert resolved.is_artifactory() is False + + +# ── ArtifactoryOrchestrator: multi-segment repo support ── + + +class TestArtifactoryOrchestratorNestedRepo: + """``ArtifactoryOrchestrator`` keeps nested-group repo paths intact.""" + + def test_split_owner_repo_two_segments(self): + """Owner/repo split is unchanged for flat 2-segment paths.""" + from apm_cli.deps.artifactory_orchestrator import ArtifactoryOrchestrator + + dep = DependencyReference.parse( + "art.example.com/artifactory/github/owner/repo" + ) + owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) + assert owner == "owner" + assert repo == "repo" + + def test_split_owner_repo_nested_subgroup_after_probe(self): + """After a probe-rebuilt ref, subgroup segments are folded into ``repo``.""" + from apm_cli.deps.artifactory_orchestrator import ArtifactoryOrchestrator + + # Simulate the post-probe dep_ref directly (avoids needing a real proxy). + dep = DependencyReference.from_artifactory_boundary_probe( + host="art.example.com", + prefix="artifactory/apm", + owner="acme-group", + repo="shared-modules/pkg-utils", + virtual_path=None, + reference=None, + ) + owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) + assert owner == "acme-group" + assert repo == "shared-modules/pkg-utils" + # ── token_manager.py: Artifactory token support ── @@ -790,6 +1269,9 @@ def test_no_corporate_values_in_source(self): src_dir / "models" / "dependency.py", src_dir / "commands" / "install.py", src_dir / "core" / "token_manager.py", + src_dir / "install" / "artifactory_resolver.py", + src_dir / "install" / "package_resolution.py", + src_dir / "models" / "dependency" / "reference.py", ] forbidden = ["checkpoint", "chkp"] for py_file in target_files: From 385af9ead49edefd2149ad245288c86a76f2c929 Mon Sep 17 00:00:00 2001 From: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Date: Mon, 25 May 2026 16:47:36 +0300 Subject: [PATCH 2/8] fix(artifactory): address Copilot review feedback on PR #1472 - parse_artifactory_path: reject ``owner//virtual`` (empty segment before the explicit boundary) instead of falling through and returning an empty ``repo`` slug. - _proxy_routing_target: thread the proxy ``scheme`` (https/http) through the resolver into ``build_artifactory_archive_url`` so installs that intentionally route through an ``http://`` proxy (under ``PROXY_REGISTRY_ALLOW_HTTP=1``) probe over the same transport. - _CandidateStatus.EXISTS: update inline comment to say ``2xx or 3xx`` (redirect-inclusive), matching the actual classification in ``_candidate_archive_status``. - CHANGELOG: split the parse-time-behavior note into explicit-FQDN vs bare-shorthand modes (the all-as-repo default applies only under ``PROXY_REGISTRY_ONLY``; Mode 1 stays shallow). - package_resolution: clarify the ``direct_virtual_resolved`` docstring -- GitLab path gates on ``is_virtual``, Artifactory path gates on probe rebuild; both signal "persist as structured ``git:`` + ``path:`` entry". - Add two regression tests for the empty-segment edge case (parser + iterator). --- CHANGELOG.md | 6 ++++- src/apm_cli/install/artifactory_resolver.py | 28 ++++++++++++--------- src/apm_cli/install/package_resolution.py | 13 ++++++++-- src/apm_cli/utils/github_host.py | 18 +++++++------ tests/unit/test_artifactory_support.py | 24 ++++++++++++++++++ 5 files changed, 67 insertions(+), 22 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index e2985904c..1d79f4a95 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -13,7 +13,11 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Changed -- Parse-time Artifactory boundary detection no longer uses directory-marker heuristics (`skills/`, `prompts/`, `agents/`, `collections/`, `instructions/`). The install-time resolver is authoritative for the (owner, repo, virtual_path) split; parse-time defaults to a simple all-as-repo / structural-file-extension rule. The `_VIRTUAL_PATH_ROOT_SEGMENTS`, `_ARTIFACTORY_VIRTUAL_MARKERS`, and `_ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS` constants are removed. +- Parse-time Artifactory boundary detection no longer uses directory-marker heuristics (`skills/`, `prompts/`, `agents/`, `collections/`, `instructions/`). The install-time resolver is authoritative for the (owner, repo, virtual_path) split. Parse-time defaults differ by mode: + - **Explicit FQDN** (`host/artifactory/key/owner/repo/...`): `parse_artifactory_path` stays intentionally shallow -- `owner` = first segment after the prefix, `repo` = next segment, remainder = `virtual_path`. + - **Bare shorthand under `PROXY_REGISTRY_ONLY`**: `_bare_shorthand_repo_segment_count` defaults to all-as-repo with a structural file-extension rule on the last segment (a path ending in `.prompt.md`/`.instructions.md`/`.chatmode.md`/`.agent.md` is by shape a virtual file; everything before it is the repo). + + Both modes converge at install time: `_resolve_artifactory_boundary` HEAD-probes candidate splits and rebuilds the dependency reference at the proxy-verified split. The `_VIRTUAL_PATH_ROOT_SEGMENTS`, `_ARTIFACTORY_VIRTUAL_MARKERS`, and `_ARTIFACTORY_VIRTUAL_FILE_EXTENSIONS` constants are removed. ### Fixed diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py index ae31769c4..85620bd82 100644 --- a/src/apm_cli/install/artifactory_resolver.py +++ b/src/apm_cli/install/artifactory_resolver.py @@ -61,7 +61,7 @@ class _CandidateStatus: """Probe outcome for one candidate boundary.""" - EXISTS = "exists" # at least one URL shape returned 2xx + EXISTS = "exists" # at least one URL shape returned 2xx or 3xx (existence proof, redirect inclusive) MISSING = "missing" # every URL shape returned 4xx other than 401/403 AUTH = "auth" # only 401/403 seen -- cannot tell if the repo exists @@ -75,6 +75,7 @@ def _candidate_archive_status( headers: dict[str, str], verify, timeout: int = 15, + scheme: str = "https", ) -> str: """Classify one candidate's existence by HEAD-probing every archive URL shape. @@ -82,7 +83,7 @@ def _candidate_archive_status( ``AUTH`` from ``MISSING`` is deliberate: a misconfigured token should surface a different error than a wrong owner/repo split. """ - urls = build_artifactory_archive_url(host, prefix, owner, repo, ref) + urls = build_artifactory_archive_url(host, prefix, owner, repo, ref, scheme=scheme) saw_auth = False for url in urls: try: @@ -132,12 +133,15 @@ def _proxy_probe_headers( def _proxy_routing_target( dep_ref: DependencyReference, -) -> tuple[str, str, bool] | None: - """Return ``(host, prefix, is_mode_1)`` if *dep_ref* routes through the proxy. +) -> tuple[str, str, str, bool] | None: + """Return ``(host, prefix, scheme, is_mode_1)`` if *dep_ref* routes through the proxy. - * Mode 1 (explicit FQDN): host and prefix come from the dependency ref. - * Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``): host and prefix - come from the registry-proxy config. + * Mode 1 (explicit FQDN): host and prefix come from the dependency ref; + scheme is always ``https`` (Mode 1 deps reject ``http://`` upstream). + * Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``): host, prefix, + and scheme come from the registry-proxy config, so installs that + intentionally route through an ``http://`` proxy (isolated networks + using ``PROXY_REGISTRY_ALLOW_HTTP=1``) probe over the same transport. Returns ``None`` when no proxy routing applies (regular GitHub/ADO deps) or when host/prefix are not real strings (e.g. mocked dependency refs @@ -147,7 +151,7 @@ def _proxy_routing_target( host = dep_ref.host prefix = dep_ref.artifactory_prefix if isinstance(host, str) and isinstance(prefix, str) and host and prefix: - return (host, prefix, True) + return (host, prefix, "https", True) return None from ..deps.artifactory_orchestrator import ArtifactoryRouter @@ -156,9 +160,9 @@ def _proxy_routing_target( cfg = ArtifactoryRouter.parse_proxy_config() if cfg is None: return None - host, prefix, _scheme = cfg + host, prefix, scheme = cfg if isinstance(host, str) and isinstance(prefix, str) and host and prefix: - return (host, prefix, False) + return (host, prefix, scheme or "https", False) return None @@ -186,7 +190,7 @@ def _resolve_artifactory_boundary( if target is None: # Not routed through the proxy -- nothing for this resolver to do. return dep_ref - host, prefix, is_mode_1 = target + host, prefix, scheme, is_mode_1 = target # Strip any inlined ``user:pass@host`` credentials before echoing the # package string back in error messages. Deferred until after the @@ -225,7 +229,7 @@ def _resolve_artifactory_boundary( f"/{cand_repo}#{ref}{path_suffix}" ) status = _candidate_archive_status( - host, cand_prefix, cand_owner, cand_repo, ref, headers, verify + host, cand_prefix, cand_owner, cand_repo, ref, headers, verify, scheme=scheme ) if status == _CandidateStatus.EXISTS: # If the probe confirms the parse-time boundary, return the diff --git a/src/apm_cli/install/package_resolution.py b/src/apm_cli/install/package_resolution.py index 260c2bd21..84b2119ee 100644 --- a/src/apm_cli/install/package_resolution.py +++ b/src/apm_cli/install/package_resolution.py @@ -45,8 +45,17 @@ def resolve_parsed_dependency_reference( """Parse or probe *package* into a ``DependencyReference``. Returns ``(dep_ref, direct_virtual_resolved)`` where the second flag is - True when a probe (GitLab shorthand or Artifactory boundary) rebuilt the - dep ref, so the install pipeline persists it as a structured entry. + True when the dep should be persisted as a structured ``git:`` + ``path:`` + entry in ``apm.yml`` (the canonical shorthand cannot round-trip the probed + boundary). The two probe paths gate this flag differently: + + * **GitLab shorthand** -- True only when the resolved ref is a virtual + package (``is_virtual and virtual_path``); a probe that lands on a bare + repo with no virtual path stays in canonical shorthand form. + * **Artifactory boundary** -- True whenever the probe rebuilt the ref + (parse-time guess differed from the proxy-verified split); a probe that + merely confirms the parse-time boundary keeps the original ref so + apm.yml stays in its existing shape. For Artifactory deps the optional ``resolve_artifactory_boundary`` is authoritative: it returns the proxy-verified boundary or raises -- there diff --git a/src/apm_cli/utils/github_host.py b/src/apm_cli/utils/github_host.py index 5e515f040..c41fed8c2 100644 --- a/src/apm_cli/utils/github_host.py +++ b/src/apm_cli/utils/github_host.py @@ -695,13 +695,17 @@ def parse_artifactory_path(path_segments: list) -> tuple: empty_idx = after_owner.index("") repo_parts = after_owner[:empty_idx] suffix_parts = [s for s in after_owner[empty_idx + 1 :] if s] - if repo_parts: - return ( - prefix, - owner, - "/".join(repo_parts), - "/".join(suffix_parts) if suffix_parts else None, - ) + if not repo_parts: + # ``owner//virtual`` has no segments before the explicit boundary, + # so there is no repo to install -- reject as invalid rather than + # falling through and returning ``repo=''``. + return None + return ( + prefix, + owner, + "/".join(repo_parts), + "/".join(suffix_parts) if suffix_parts else None, + ) repo = after_owner[0] virtual_path = "/".join(after_owner[1:]) if len(after_owner) > 1 else None diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 8fad78874..7a548e111 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -145,6 +145,30 @@ def test_double_slash_subdirectory_notation(self): assert result[2] == "repo" assert result[3] == "subdir" + def test_empty_segment_before_boundary_returns_none(self): + """``owner//virtual`` has no repo before the explicit boundary -> invalid. + + Regression guard: previously fell through to ``repo=''``, producing a + malformed dep ref that broke downstream URL construction. + """ + assert ( + parse_artifactory_path( + ["artifactory", "github", "owner", "", "subdir"] + ) + is None + ) + + def test_iterator_yields_nothing_for_empty_segment_before_boundary(self): + """Same edge case at the iterator layer -- no candidate is yielded.""" + from apm_cli.utils.github_host import iter_artifactory_boundary_candidates + + candidates = list( + iter_artifactory_boundary_candidates( + ["artifactory", "github", "owner", "", "subdir"] + ) + ) + assert candidates == [] + class TestIterArtifactoryBoundaryCandidates: """Test the candidate iterator that backs the install-time probe.""" From 4ae18cab25f01a27227209dec4a0241ca3bc0f93 Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 18:18:55 +0200 Subject: [PATCH 3/8] test(artifactory): add regression traps for boundary resolver Adds focused regression-trap tests for the new install-pipeline boundary resolver (#1472) covering the contracts a future refactor could silently break: - allow_redirects=False on every HEAD probe (token-leak guard against proxy-issued cross-host redirects) - Mode 2 Authorization header is sourced from RegistryConfig (proxy bearer), never from AuthResolver (which returns the github.com PAT and would leak it to the proxy host) - 403 (like 401) classifies as AUTH; mixed 401/non-auth-4xx demotes the candidate set to MISSING so the user-facing error stays accurate (avoids misreporting a missing repo as auth failure) - 429 and other non-{401,403} 4xx are MISSING, not AUTH (guards against accidental broadening of the AUTH set) - _split_owner_repo subgroup folding for 3+ segment GitLab boundaries (group/sub1/sub2/repo correctly folds to owner=group, repo=sub1/sub2/repo) Both mutation-break gates were exercised: flipping allow_redirects to True and rewiring Mode 2 to AuthResolver each made the corresponding test fail; restoring the source code made them pass. Also applies ruff format pass to bring three files in line with the canonical formatter (pre-existing drift on the PR branch that would have failed CI lint check). Co-authored-by: chkp-roniz Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- src/apm_cli/install/artifactory_resolver.py | 12 +- .../deps/test_artifactory_orchestrator.py | 43 ++++ .../unit/install/test_artifactory_resolver.py | 208 ++++++++++++++++++ tests/unit/test_artifactory_support.py | 81 +++---- 4 files changed, 288 insertions(+), 56 deletions(-) create mode 100644 tests/unit/install/test_artifactory_resolver.py diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py index 85620bd82..ed7c60f73 100644 --- a/src/apm_cli/install/artifactory_resolver.py +++ b/src/apm_cli/install/artifactory_resolver.py @@ -61,7 +61,9 @@ class _CandidateStatus: """Probe outcome for one candidate boundary.""" - EXISTS = "exists" # at least one URL shape returned 2xx or 3xx (existence proof, redirect inclusive) + EXISTS = ( + "exists" # at least one URL shape returned 2xx or 3xx (existence proof, redirect inclusive) + ) MISSING = "missing" # every URL shape returned 4xx other than 401/403 AUTH = "auth" # only 401/403 seen -- cannot tell if the repo exists @@ -208,9 +210,7 @@ def _resolve_artifactory_boundary( candidates = list(iter_artifactory_boundary_candidates(path_segments)) if not candidates: - raise ValueError( - f"Artifactory dep '{safe_package}' has no plausible owner/repo split" - ) + raise ValueError(f"Artifactory dep '{safe_package}' has no plausible owner/repo split") if len(candidates) == 1: # Single candidate -- the parse-time dep_ref is already definitive. return dep_ref @@ -240,7 +240,9 @@ def _resolve_artifactory_boundary( and (dep_ref.virtual_path or None) == cand_virtual ): return dep_ref - return _rebuild_dep_ref(dep_ref, host, cand_prefix, cand_owner, cand_repo, cand_virtual, is_mode_1) + return _rebuild_dep_ref( + dep_ref, host, cand_prefix, cand_owner, cand_repo, cand_virtual, is_mode_1 + ) if status == _CandidateStatus.MISSING: all_auth = False diff --git a/tests/unit/deps/test_artifactory_orchestrator.py b/tests/unit/deps/test_artifactory_orchestrator.py index 99e7954b5..a07a93328 100644 --- a/tests/unit/deps/test_artifactory_orchestrator.py +++ b/tests/unit/deps/test_artifactory_orchestrator.py @@ -246,3 +246,46 @@ def test_archive_runtime_error_propagates(self, tmp_path): ) with pytest.raises(RuntimeError, match=r"404"): orch.download_package(dep, tmp_path / "pkg") + + +# --------------------------------------------------------------------------- +# ArtifactoryOrchestrator._split_owner_repo subgroup folding +# --------------------------------------------------------------------------- + + +class TestSplitOwnerRepoSubgroupFolding: + """``_split_owner_repo`` must fold every segment past the owner into ``repo``. + + Regression trap for nested GitLab subgroup paths behind an Artifactory + proxy (#1498): a 4+ segment ``group/sub1/sub2/repo`` boundary -- as + rebuilt by the boundary resolver -- must produce ``owner=group`` and + ``repo=sub1/sub2/repo``. Truncating to two segments here would silently + install the wrong upstream project. + """ + + @pytest.mark.parametrize( + "repo_url,expected_owner,expected_repo", + [ + ("group/repo", "group", "repo"), + ("group/sub/repo", "group", "sub/repo"), + ("group/sub1/sub2/repo", "group", "sub1/sub2/repo"), + ("group/a/b/c/d/repo", "group", "a/b/c/d/repo"), + ], + ) + def test_subgroup_folding(self, repo_url, expected_owner, expected_repo): + dep = DependencyReference.from_artifactory_boundary_probe( + host="art.example.com", + prefix="artifactory/apm", + owner=repo_url.split("/", 1)[0], + repo=repo_url.split("/", 1)[1], + virtual_path=None, + reference=None, + ) + owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) + assert (owner, repo) == (expected_owner, expected_repo) + + def test_single_segment_rejected(self): + """A bare repo with no owner segment is malformed and must raise.""" + bad = types.SimpleNamespace(repo_url="onlyrepo") + with pytest.raises(ValueError, match=r"expected 'owner/repo' format"): + ArtifactoryOrchestrator._split_owner_repo(bad) diff --git a/tests/unit/install/test_artifactory_resolver.py b/tests/unit/install/test_artifactory_resolver.py new file mode 100644 index 000000000..f2f45b61a --- /dev/null +++ b/tests/unit/install/test_artifactory_resolver.py @@ -0,0 +1,208 @@ +"""Regression-trap tests for ``apm_cli.install.artifactory_resolver``. + +This file extends the broader Artifactory-boundary coverage in +``tests/unit/test_artifactory_support.py`` with focused regression traps for +the security and audience-separation contracts that a refactor could +silently break: + +* ``allow_redirects=False`` on every HEAD probe (token-leak guard against + proxy-issued cross-host redirects). +* Mode 2 ``Authorization`` header is sourced from + :class:`apm_cli.deps.registry_proxy.RegistryConfig` (the proxy bearer), + NOT from :class:`apm_cli.core.auth.AuthResolver` (which would hand back + the github.com PAT and leak it to the proxy host). +* 403 -- like 401 -- is classified as ``AUTH``; mixed 401/404 demotes the + candidate set to ``MISSING`` so the user-facing error stays accurate. + +These are paired with mutation-break gates in the originating shepherd +session: each test was confirmed to FAIL when the contract under test was +intentionally removed in the source. +""" + +from __future__ import annotations + +import os +from unittest.mock import MagicMock, Mock, patch + +import pytest + +from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary +from apm_cli.models.apm_package import DependencyReference + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + + +def _make_head_recorder(status_code: int = 404): + """Build a fake ``requests.head`` that records every call. + + Returns ``(recorder_list, fake_head)``. Each entry in ``recorder_list`` + is the keyword-argument dict the resolver passed to ``requests.head``. + """ + recorded: list[dict] = [] + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + recorded.append( + { + "url": url, + "headers": dict(headers or {}), + "timeout": timeout, + "verify": verify, + "allow_redirects": allow_redirects, + } + ) + resp = MagicMock() + resp.status_code = status_code + return resp + + return recorded, _head + + +# --------------------------------------------------------------------------- +# allow_redirects=False mutation-break guard +# --------------------------------------------------------------------------- + + +class TestRedirectLeakGuard: + """``allow_redirects=False`` must hold for every probe. + + Flipping it to ``True`` would let the proxy redirect the resolver to a + different host while the Bearer token rides along -- a cross-host token + leak. This is the single highest-severity invariant in the module. + """ + + def test_every_probe_disables_redirects(self): + """Probe every candidate of an ambiguous Mode 1 dep; assert each call + was made with ``allow_redirects=False``.""" + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + recorded, fake_head = _make_head_recorder(status_code=404) + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=fake_head): + with pytest.raises(ValueError): + _resolve_artifactory_boundary(package, auth, verbose=False) + + # Sanity: the resolver actually probed something (otherwise the + # assertion below would pass vacuously and the mutation-break gate + # would be useless). + assert recorded, "expected at least one HEAD probe" + for call in recorded: + assert call["allow_redirects"] is False, ( + "allow_redirects must be False on every probe to prevent " + "cross-host Bearer-token leakage via proxy-issued redirects" + ) + + +# --------------------------------------------------------------------------- +# Mode 2 token-audience guard +# --------------------------------------------------------------------------- + + +class TestModeTwoTokenAudience: + """Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``) must source the + probe ``Authorization`` header from :class:`RegistryConfig.from_env`, + NOT from :class:`AuthResolver` (which would return the github.com PAT + and leak it to the proxy host). + """ + + def test_authorization_header_is_proxy_bearer_not_github_pat(self): + """Both a github.com PAT (via AuthResolver) and a proxy bearer (via + ``PROXY_REGISTRY_TOKEN``) are set. The HEAD probe must carry the + proxy bearer, never the PAT. + """ + package = "group/sub/repo" + with patch.dict( + os.environ, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + "PROXY_REGISTRY_TOKEN": "proxy-bearer-PROXY", + }, + clear=True, + ): + dep = DependencyReference.parse(package) + + # Wire AuthResolver to hand back a github.com PAT. If the + # resolver wrongly consults it, this token will end up in the + # outgoing Authorization header. + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token="github-pat-LEAK") + + recorded, fake_head = _make_head_recorder(status_code=404) + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=fake_head, + ): + with pytest.raises(ValueError): + _resolve_artifactory_boundary(package, auth, verbose=False, dep_ref=dep) + + assert recorded, "expected at least one HEAD probe" + for call in recorded: + auth_header = call["headers"].get("Authorization", "") + assert "github-pat-LEAK" not in auth_header, ( + "Mode 2 must not carry the github.com PAT to the proxy host" + ) + assert auth_header == "Bearer proxy-bearer-PROXY", ( + f"Mode 2 Authorization must come from RegistryConfig (got {auth_header!r})" + ) + # And the github PAT path must never have been consulted. + auth.resolve_for_dep.assert_not_called() + + +# --------------------------------------------------------------------------- +# 401 vs 403 vs other-4xx error discrimination +# --------------------------------------------------------------------------- + + +class TestErrorDiscrimination: + """401/403 -> AUTH; any non-auth 4xx demotes the candidate set to MISSING.""" + + def _run_with_status_map(self, status_for_url): + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token="t") + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + resp.status_code = status_for_url(url) + return resp + + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_head): + with pytest.raises(ValueError) as excinfo: + _resolve_artifactory_boundary(package, auth, verbose=False) + return str(excinfo.value) + + def test_all_403_raises_auth_specific_error(self): + """403 must classify as AUTH, same as 401.""" + msg = self._run_with_status_map(lambda _url: 403) + assert "authentication problem" in msg + + def test_mixed_401_and_404_demotes_to_unresolved(self): + """If any candidate returns a non-auth 4xx, the set is MISSING -- the + error must be the unresolved-boundary one, not the auth-specific one. + Otherwise a real 404 on one of several candidates would be misreported + as an auth failure and the user would chase the wrong fix. + """ + # First call (shallow candidate) returns 401, all others 404. + state = {"calls": 0} + + def status_for(url: str) -> int: + state["calls"] += 1 + return 401 if state["calls"] == 1 else 404 + + msg = self._run_with_status_map(status_for) + assert "did not resolve" in msg + assert "authentication problem" not in msg + + def test_429_is_not_auth(self): + """A non-auth status code (e.g. 429 Too Many Requests) returned on + the HEAD probe must not be classified as AUTH; the response code + only means ``MISSING`` because the resolver could not get an + existence proof. Guards against accidental broadening of the + AUTH set beyond {401, 403}. + """ + msg = self._run_with_status_map(lambda _url: 429) + assert "did not resolve" in msg + assert "authentication problem" not in msg diff --git a/tests/unit/test_artifactory_support.py b/tests/unit/test_artifactory_support.py index 7a548e111..c96c7c024 100644 --- a/tests/unit/test_artifactory_support.py +++ b/tests/unit/test_artifactory_support.py @@ -137,9 +137,7 @@ def test_nested_subgroup_with_virtual_file_extension(self): def test_double_slash_subdirectory_notation(self): """The ``//subdir`` notation explicitly marks the repo/virtual boundary.""" # Mirrors ``art.example.com/artifactory/github/owner/repo//subdir``. - result = parse_artifactory_path( - ["artifactory", "github", "owner", "repo", "", "subdir"] - ) + result = parse_artifactory_path(["artifactory", "github", "owner", "repo", "", "subdir"]) assert result[0] == "artifactory/github" assert result[1] == "owner" assert result[2] == "repo" @@ -151,21 +149,14 @@ def test_empty_segment_before_boundary_returns_none(self): Regression guard: previously fell through to ``repo=''``, producing a malformed dep ref that broke downstream URL construction. """ - assert ( - parse_artifactory_path( - ["artifactory", "github", "owner", "", "subdir"] - ) - is None - ) + assert parse_artifactory_path(["artifactory", "github", "owner", "", "subdir"]) is None def test_iterator_yields_nothing_for_empty_segment_before_boundary(self): """Same edge case at the iterator layer -- no candidate is yielded.""" from apm_cli.utils.github_host import iter_artifactory_boundary_candidates candidates = list( - iter_artifactory_boundary_candidates( - ["artifactory", "github", "owner", "", "subdir"] - ) + iter_artifactory_boundary_candidates(["artifactory", "github", "owner", "", "subdir"]) ) assert candidates == [] @@ -177,9 +168,7 @@ def test_flat_path_yields_single_candidate(self): from apm_cli.utils.github_host import iter_artifactory_boundary_candidates candidates = list( - iter_artifactory_boundary_candidates( - ["artifactory", "github", "owner", "repo"] - ) + iter_artifactory_boundary_candidates(["artifactory", "github", "owner", "repo"]) ) assert candidates == [("artifactory/github", "owner", "repo", None)] @@ -187,9 +176,7 @@ def test_nested_path_yields_shallow_to_deep(self): from apm_cli.utils.github_host import iter_artifactory_boundary_candidates candidates = list( - iter_artifactory_boundary_candidates( - ["artifactory", "apm", "group", "sub", "repo"] - ) + iter_artifactory_boundary_candidates(["artifactory", "apm", "group", "sub", "repo"]) ) # Shallowest (k=1) yielded first; deepest (all-as-repo) last. assert candidates == [ @@ -478,7 +465,10 @@ def test_three_segment_shorthand_kept_whole(self): """``group/subgroup/repo`` is a nested-group project, not a subdir.""" with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("acme-group/shared-modules/pkg-utils") @@ -499,7 +489,10 @@ def test_three_segment_with_marker_segment_defers_to_resolver(self): """ with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("group/sub/repo/skills/cve-patcher") @@ -518,7 +511,10 @@ def test_three_segment_with_virtual_file_ext_still_virtual(self): """ with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("group/sub/repo/rules.prompt.md") @@ -530,7 +526,10 @@ def test_two_segment_shorthand_unchanged(self): """Two-segment shorthand keeps the GitHub ``owner/repo`` shape.""" with patch.dict( os.environ, - {"PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", "PROXY_REGISTRY_ONLY": "1"}, + { + "PROXY_REGISTRY_URL": "https://art.example.com/artifactory/apm", + "PROXY_REGISTRY_ONLY": "1", + }, clear=False, ): dep = DependencyReference.parse("microsoft/apm-sample-package") @@ -611,11 +610,7 @@ def test_resolver_picks_shallower_boundary_when_subpath_is_virtual(self): def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): resp = MagicMock() # Only the shallowest archive URL hits 200; deeper paths 404. - resp.status_code = ( - 200 - if ok_url in url and "/group/repo/skills" not in url - else 404 - ) + resp.status_code = 200 if ok_url in url and "/group/repo/skills" not in url else 404 return resp with patch( @@ -661,9 +656,7 @@ def _all_404(url, headers=None, timeout=None, verify=None, allow_redirects=None) resp.status_code = 404 return resp - with patch( - "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404 - ): + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404): with pytest.raises(ValueError, match="did not resolve"): _resolve_artifactory_boundary(package, auth, verbose=False) @@ -684,9 +677,7 @@ def _all_401(url, headers=None, timeout=None, verify=None, allow_redirects=None) resp.status_code = 401 return resp - with patch( - "apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_401 - ): + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_401): with pytest.raises(ValueError, match="authentication problem"): _resolve_artifactory_boundary(package, auth, verbose=False) @@ -699,12 +690,8 @@ def test_resolver_returns_unchanged_for_unambiguous_path(self): auth = Mock() auth.resolve_for_dep.return_value = Mock(token=None) - with patch( - "apm_cli.install.artifactory_resolver.requests.head" - ) as mock_head: - resolved = _resolve_artifactory_boundary( - package, auth, verbose=False, dep_ref=dep_ref - ) + with patch("apm_cli.install.artifactory_resolver.requests.head") as mock_head: + resolved = _resolve_artifactory_boundary(package, auth, verbose=False, dep_ref=dep_ref) assert resolved is dep_ref mock_head.assert_not_called() @@ -736,18 +723,12 @@ def test_resolver_handles_mode_2_bare_shorthand(self): def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): resp = MagicMock() resp.status_code = ( - 200 - if ok_marker in url and "/group/sub/repo/skills" not in url - else 404 + 200 if ok_marker in url and "/group/sub/repo/skills" not in url else 404 ) return resp - with patch( - "apm_cli.install.artifactory_resolver.requests.head", side_effect=_head - ): - resolved = _resolve_artifactory_boundary( - package, auth, verbose=False, dep_ref=dep - ) + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_head): + resolved = _resolve_artifactory_boundary(package, auth, verbose=False, dep_ref=dep) assert resolved.repo_url == "group/sub/repo" assert resolved.virtual_path == "skills/foo" @@ -768,9 +749,7 @@ def test_split_owner_repo_two_segments(self): """Owner/repo split is unchanged for flat 2-segment paths.""" from apm_cli.deps.artifactory_orchestrator import ArtifactoryOrchestrator - dep = DependencyReference.parse( - "art.example.com/artifactory/github/owner/repo" - ) + dep = DependencyReference.parse("art.example.com/artifactory/github/owner/repo") owner, repo = ArtifactoryOrchestrator._split_owner_repo(dep) assert owner == "owner" assert repo == "repo" From 301e0a92ff4589e29f3ca3197163c4248407eff1 Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 21:28:42 +0200 Subject: [PATCH 4/8] fix(artifactory): route resolver verbose output through CommandLogger + cite // bypass on inconclusive Folds two panel follow-ups on PR #1472: - Verbose probe output now flows through CommandLogger.verbose_detail() when a logger is available, so it honors the shared rich/no-rich console path and respects -v/--verbose self-gating like every other install-time detail emitter. Falls back to print() for callers that pass verbose=True without threading a logger (legacy unit tests). - The INCONCLUSIVE error message (transport error on every candidate URL) now cites the // boundary marker, matching the wording already present on the UNRESOLVED message. Users who hit a flaky proxy now learn about the explicit-boundary escape hatch from the message they actually saw, not just the parallel one. addresses CEO follow-ups FU-3 and FU-4 Co-authored-by: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- src/apm_cli/commands/install.py | 1 + src/apm_cli/install/artifactory_resolver.py | 16 +++++++++++++--- src/apm_cli/install/package_resolution.py | 2 ++ 3 files changed, 16 insertions(+), 3 deletions(-) diff --git a/src/apm_cli/commands/install.py b/src/apm_cli/commands/install.py index 676a55f03..aebb763b8 100644 --- a/src/apm_cli/commands/install.py +++ b/src/apm_cli/commands/install.py @@ -470,6 +470,7 @@ def warning_handler(msg): resolve_artifactory_boundary=_resolve_artifactory_boundary, auth_resolver=auth_resolver, verbose=bool(logger and logger.verbose), + logger=logger, ) canonical = dep_ref.to_canonical() identity = dep_ref.get_identity() diff --git a/src/apm_cli/install/artifactory_resolver.py b/src/apm_cli/install/artifactory_resolver.py index af245831b..c87ee223e 100644 --- a/src/apm_cli/install/artifactory_resolver.py +++ b/src/apm_cli/install/artifactory_resolver.py @@ -188,6 +188,7 @@ def _resolve_artifactory_boundary( verbose: bool = False, *, dep_ref: DependencyReference | None = None, + logger=None, ) -> DependencyReference: """Definitively resolve the (owner, repo, virtual_path) boundary on the proxy. @@ -239,10 +240,18 @@ def _resolve_artifactory_boundary( for cand_prefix, cand_owner, cand_repo, cand_virtual in candidates: if verbose: path_suffix = f" [path: {cand_virtual}]" if cand_virtual else "" - print( + probe_msg = ( f" artifactory-resolve: probing {host}/{cand_prefix}/{cand_owner}" f"/{cand_repo}#{ref}{path_suffix}" ) + # Route through CommandLogger when available so verbose output + # honors the shared console/theme path (verbose_detail self-gates + # on logger.verbose). Fall back to print for callers that pass + # verbose=True without a logger (e.g. legacy unit tests). + if logger is not None and hasattr(logger, "verbose_detail"): + logger.verbose_detail(probe_msg) + else: + print(probe_msg) status, exc = _candidate_archive_status( host, cand_prefix, cand_owner, cand_repo, ref, headers, verify, scheme=scheme ) @@ -275,8 +284,9 @@ def _resolve_artifactory_boundary( raise ValueError( f"Artifactory boundary probe could not reach the proxy for any " f"candidate (last error: {last_inconclusive_exc}). " - f"Verify network reachability and TLS trust to {host}. " - f"(package: {safe_package})" + f"Verify network reachability and TLS trust to {host}; the ``//`` " + f"notation can mark the repo/virtual boundary explicitly when the " + f"proxy is unavailable. (package: {safe_package})" ) if all_auth: raise ValueError(f"{_ARTIFACTORY_BOUNDARY_AUTH} (package: {safe_package})") diff --git a/src/apm_cli/install/package_resolution.py b/src/apm_cli/install/package_resolution.py index 84b2119ee..e7ae2ef25 100644 --- a/src/apm_cli/install/package_resolution.py +++ b/src/apm_cli/install/package_resolution.py @@ -41,6 +41,7 @@ def resolve_parsed_dependency_reference( auth_resolver: Any, verbose: bool, resolve_artifactory_boundary: Callable[..., Any] | None = None, + logger: Any = None, ) -> tuple[Any, bool]: """Parse or probe *package* into a ``DependencyReference``. @@ -93,6 +94,7 @@ def resolve_parsed_dependency_reference( auth_resolver, verbose=verbose, dep_ref=dep_ref, + logger=logger, ) if resolved is not dep_ref: return resolved, True From d359a0a0b8af80947008ad4d4ba7c51035804a6d Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 21:29:00 +0200 Subject: [PATCH 5/8] docs(artifactory): note proxy-probe lifts nested-shorthand ambiguity Folds the doc-drift follow-up from the apm-review-panel CEO on PR #1472. Both pages claimed that nested-group GitLab repos require the object form because shorthand is ambiguous on >2-segment paths. That claim is still true for direct (non-proxy) GitLab but is no longer true for deps that route through a registry proxy: this PR adds an install-time boundary probe that HEAD-walks candidate splits against the proxy and deterministically resolves the repo/virtual-path boundary, so nested shorthand works without the object form. - packages/apm-guide/.apm/skills/apm-usage/dependencies.md: add the registry-proxy exception sentence to the existing nested-group note. - docs/src/content/docs/reference/manifest-schema.md: scope the object-form REQUIRED clause to direct nested-group access and link the registry-proxy guide for the proxy-routed exception. addresses CEO follow-up FU-2 Co-authored-by: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- docs/src/content/docs/reference/manifest-schema.md | 2 +- packages/apm-guide/.apm/skills/apm-usage/dependencies.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/src/content/docs/reference/manifest-schema.md b/docs/src/content/docs/reference/manifest-schema.md index 91e84af64..66adbf143 100644 --- a/docs/src/content/docs/reference/manifest-schema.md +++ b/docs/src/content/docs/reference/manifest-schema.md @@ -358,7 +358,7 @@ dependencies: #### 4.1.2. Object Form -REQUIRED when the shorthand is ambiguous (e.g. nested-group repos with virtual paths). +REQUIRED when the shorthand is ambiguous (e.g. direct nested-group repos with virtual paths). NOT required for nested-group deps that route through a registry proxy (explicit `host/artifactory//...` FQDN, or bare shorthand under `PROXY_REGISTRY_URL` + `PROXY_REGISTRY_ONLY=1`): the install-time boundary probe HEAD-walks candidate splits against the proxy and locks in the first one whose archive responds. See [Registry proxy guide](../../enterprise/registry-proxy/#nested-group-repos-gitlab-subgroups-behind-the-proxy). | Field | Type | Required | Pattern / Constraint | Description | |---|---|---|---|---| diff --git a/packages/apm-guide/.apm/skills/apm-usage/dependencies.md b/packages/apm-guide/.apm/skills/apm-usage/dependencies.md index 11ad4f2d0..fd52a12b4 100644 --- a/packages/apm-guide/.apm/skills/apm-usage/dependencies.md +++ b/packages/apm-guide/.apm/skills/apm-usage/dependencies.md @@ -133,7 +133,7 @@ Virtual packages reference a subset of a repository. Classification is by extension only. A path like `owner/repo/collections/security` (no extension) is a Subdirectory; the actual shape -- APM package (incl. dep-only `apm.yml` with no `.apm/`), skill bundle, or plugin -- is resolved at fetch time by probing for `apm.yml`. -**Gitea and Gogs (self-hosted or vendor-hosted):** virtual packages resolve via the host's `/{owner}/{repo}/raw/{ref}/{path}` URL first, then fall back to the Contents API (v1 native, v3 Gogs-compat). GitLab nested-group repos (`group/subgroup/repo`) require the object form (`git: `, `path: `) -- shorthand is ambiguous on >2-segment paths. +**Gitea and Gogs (self-hosted or vendor-hosted):** virtual packages resolve via the host's `/{owner}/{repo}/raw/{ref}/{path}` URL first, then fall back to the Contents API (v1 native, v3 Gogs-compat). Direct GitLab nested-group repos (`group/subgroup/repo`) require the object form (`git: `, `path: `) -- shorthand is ambiguous on >2-segment paths. **Exception:** when the dep routes through a registry proxy (explicit `host/artifactory//...` FQDN, or bare shorthand under `PROXY_REGISTRY_URL` + `PROXY_REGISTRY_ONLY=1`), the install-time boundary probe HEAD-walks the candidate splits against the proxy and locks in the first one whose archive responds, so nested-group shorthand works without the object form (#1472). > **Removed (#1094):** the legacy `.collection.yml` / `.collection.yaml` virtual-package form is no longer supported. Convert any `.collection.yml` to an `apm.yml` with a `dependencies:` section, then reference the resulting subdirectory as a regular subdirectory virtual package. From d2041cde2f370a8683ae81bdf478685a20566c51 Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 21:29:16 +0200 Subject: [PATCH 6/8] docs(changelog): rewrite #1472 fixed bullet in user voice Folds the CEO follow-up to make the Fixed entry release-notes-ready. The previous wording named internal symbols (parse_artifactory_path, build_artifactory_archive_url, _detect_virtual_package, ArtifactoryOrchestrator ._split_owner_repo) -- accurate but unreadable for a release-notes reader. New wording: leads with what the user can now do (apm install against a proxy works for nested-group repos), names the concrete invocation shapes (explicit FQDN and bare shorthand under PROXY_REGISTRY_ONLY), explains the auth-vs-missing distinction and the no-redirect token-leak guard at a level release-notes readers can act on, and surfaces the // escape hatch. addresses CEO follow-up FU-5 Co-authored-by: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- CHANGELOG.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 84afc46cc..4237f1791 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -24,7 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Fixed -- `apm install` through a registry proxy now supports nested-group repos (3+ path segments, e.g. `group/subgroup/project`). Previously every trailing segment past `owner/repo` was treated as an in-repo virtual sub-path, so the downloader requested the wrong archive URL and the install failed with HTTP 404 from the proxy. Behavior is gated on `PROXY_REGISTRY_ONLY` so direct (non-proxy) installs keep the legacy two-segment shape. Affects `parse_artifactory_path`, `build_artifactory_archive_url`, `_detect_virtual_package`, the shorthand resolvers in `DependencyReference`, and `ArtifactoryOrchestrator._split_owner_repo` / `download_subdirectory`. (#1472) +- `apm install` against a registry proxy now works for GitLab nested-group repos (3+ path segments, e.g. `group/subgroup/project`). Previously the proxy resolver guessed `owner/repo` from the first two path segments and treated the rest as an in-repo virtual sub-path, so the downloader requested the wrong archive URL and the install failed with HTTP 404. The new install-time probe HEAD-walks candidate splits against the proxy and locks in the first one whose archive responds, so nested shorthand (`apm install /artifactory////` or the bare-shorthand form under `PROXY_REGISTRY_URL` + `PROXY_REGISTRY_ONLY=1`) just works. The probe distinguishes auth (401/403) from missing-repo (4xx) so a misconfigured token surfaces as an auth problem instead of a "missing repo", and runs with `allow_redirects=False` so the bearer token cannot follow a redirect off the proxy host. When the proxy is unreachable, the `//` notation can mark the repo/virtual boundary explicitly as an escape hatch. (#1472) - URL-form Artifactory deps no longer round-trip with the `artifactory/` prefix folded into `repo_url`. The duplicated prefix caused the downloader to construct double-prefixed archive URLs (`/artifactory/key/artifactory/key/owner/repo/...`) and 404. `_validate_url_repo_path` now strips the Artifactory VCS prefix before returning the bare `owner/repo` slug; the prefix is still recovered separately via `_extract_artifactory_prefix`. (#1472) - `apm install --update` now re-resolves direct git-source semver dependencies. Previously, when the dependency's install path already existed on disk, the BFS resolver short-circuited and `--update` was a silent no-op for git-semver refs; the lockfile kept the previously-resolved tag. - `policy.dependencies.require_pinned_constraint: true` no longer misclassifies the npm- and cargo-style explicit-equality form `=1.2.3` as `BARE_BRANCH`. Both `1.2.3` and `=1.2.3` are now recognized as pinned constraints; the pip-style `==1.2.3` form is still rejected (not part of node-semver). Follow-up to #1494 / #1505. From 987f2a55976e68c4268b8b1e17ea6648ab17b88b Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 21:32:51 +0200 Subject: [PATCH 7/8] test(artifactory): pipeline integration trap for probe -> rebuild -> yaml entry Folds the CEO follow-up calling for a full-pipeline regression trap on PR #1472. The existing unit tests cover the resolver in isolation; this file traps the cross-module chain that ships into apm.yml / lockfile: resolve_parsed_dependency_reference -> _resolve_artifactory_boundary (HEAD probe, mocked) -> _rebuild_dep_ref (proxy-verified boundary) -> dependency_reference_to_yaml_entry (structured git+path entry) Five scenarios: - Mode 1 (explicit FQDN), nested-path: probe locks in owner+repo split, pipeline marks the dep for structured-entry persistence, the yaml entry embeds the proxy host + artifactory/ prefix + virtual path. - Mode 1, unambiguous two-segment path: single-candidate short-circuit is preserved (no HEAD-probe traffic, apm.yml stays in original shape). - Mode 2 (bare shorthand under PROXY_REGISTRY_ONLY), nested-path: probe locks in the boundary but rebuilt ref stays bare (no artifactory_prefix embedded; host stays github.com), so lockfile shape stays env-portable. - All-404 across candidates: raises UNRESOLVED with the // bypass hint. - All-transport-error across candidates: raises INCONCLUSIVE with the // bypass hint AND surfaces 'could not reach the proxy' so the user knows the failure mode is reachability, not a missing repo. Mutation-break gate (run locally; not in the test file): removing the new // hint from the INCONCLUSIVE message makes the 'test_transport_errors_raise_inconclusive_with_bypass_hint' assertion fail (Regex pattern did not match), confirming the test is a real trap and not a logic-replay assertion. addresses CEO follow-up FU-1 Co-authored-by: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../test_artifactory_probe_pipeline_e2e.py | 245 ++++++++++++++++++ 1 file changed, 245 insertions(+) create mode 100644 tests/integration/test_artifactory_probe_pipeline_e2e.py diff --git a/tests/integration/test_artifactory_probe_pipeline_e2e.py b/tests/integration/test_artifactory_probe_pipeline_e2e.py new file mode 100644 index 000000000..d1996c6cd --- /dev/null +++ b/tests/integration/test_artifactory_probe_pipeline_e2e.py @@ -0,0 +1,245 @@ +"""Integration coverage for the Artifactory boundary-probe install pipeline. + +End-to-end-ish trap for PR #1472. The unit suites in +``tests/unit/install/test_artifactory_resolver.py`` and +``tests/unit/test_artifactory_support.py`` already exercise the resolver in +isolation. This file traps the FULL install-side pipeline: + + resolve_parsed_dependency_reference + -> _resolve_artifactory_boundary (HEAD probe, mocked) + -> _rebuild_dep_ref (proxy-verified boundary) + -> dependency_reference_to_yaml_entry + (structured git+path entry shape for apm.yml / lockfile) + +The HTTP layer is the only thing mocked; everything else is real code on +the same call chain ``apm install`` would take. This is the regression +trap the apm-review-panel CEO asked for as a follow-up to #1472: a single +test that detects silent breakage anywhere in the probe-to-lockfile chain +(parse -> probe -> rebuild -> serialize) instead of only at the resolver +boundary. +""" + +from __future__ import annotations + +from unittest.mock import MagicMock, Mock, patch + +import pytest + +from apm_cli.install.artifactory_resolver import _resolve_artifactory_boundary +from apm_cli.install.package_resolution import ( + dependency_reference_to_yaml_entry, + resolve_parsed_dependency_reference, +) +from apm_cli.models.apm_package import DependencyReference + + +def _scripted_head(boundary_owner: str, boundary_repo: str): + """Build a fake ``requests.head`` that returns 200 only for one boundary. + + Every other candidate URL gets a 404. The boundary is identified by the + ``owner/repo`` slug appearing in the probed URL path, which is exactly + how the resolver tells candidates apart. + """ + + def _head(url, headers=None, timeout=None, verify=None, allow_redirects=None): + resp = MagicMock() + marker = f"/{boundary_owner}/{boundary_repo}/" + resp.status_code = 200 if marker in url else 404 + return resp + + return _head + + +class TestArtifactoryProbePipelineMode1: + """Mode 1 (explicit FQDN) end-to-end: parse -> probe -> rebuild -> yaml entry. + + A nested-group GitLab path under the Artifactory proxy must: + + 1. Parse to a shallow parse-time guess (per the new structural rule). + 2. Probe HEAD-walks candidate splits. + 3. Lock in the candidate whose archive responds 2xx. + 4. Surface as a structured ``git:`` + ``path:`` entry in apm.yml. + """ + + def test_nested_path_resolves_and_serializes_to_structured_yaml(self): + """``group/sub/repo/skills/sec`` under an Artifactory key locks in + the probed boundary AND emits a structured apm.yml entry whose + ``git:`` URL embeds the proxy prefix and whose ``path:`` is the + in-repo virtual sub-path.""" + package = "art.example.com/artifactory/apm/group/sub/repo/skills/sec" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + scripted = _scripted_head(boundary_owner="sub", boundary_repo="repo") + + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=scripted): + dep_ref, direct_virtual_resolved = resolve_parsed_dependency_reference( + package, + marketplace_dep_ref=None, + dependency_reference_cls=DependencyReference, + try_resolve_gitlab_direct_shorthand=lambda *a, **k: None, + auth_resolver=auth, + verbose=False, + resolve_artifactory_boundary=_resolve_artifactory_boundary, + ) + + # The probe rebuilt the dep_ref at the proxy-verified split, so the + # pipeline marks it for structured-entry persistence. + assert direct_virtual_resolved is True + # Boundary lock-in: owner is always the first post-prefix segment + # (group); repo grows shallow-first until a candidate responds. The + # scripted probe answers for owner='group' + repo='sub/repo', so the + # boundary lands there and the in-repo virtual path is 'skills/sec'. + assert dep_ref.repo_url == "group/sub/repo" + assert dep_ref.virtual_path == "skills/sec" + + # End-to-end serialization: the yaml entry the install pipeline will + # write into apm.yml is the structured form (not bare shorthand). + entry = dependency_reference_to_yaml_entry(dep_ref) + assert "git" in entry + assert "path" in entry + # The git URL must keep the proxy host + the artifactory/ prefix + # so a future install reproduces the same routing. + assert "art.example.com" in entry["git"] + assert "artifactory/apm" in entry["git"] + assert "group/sub/repo" in entry["git"] + assert entry["path"] == "skills/sec" + + def test_unambiguous_path_skips_serialization_overhead(self): + """A two-segment (unambiguous) FQDN dep returns ``direct_virtual_ + resolved=False``: the parse-time dep_ref is already definitive, so + apm.yml stays in its existing shape.""" + package = "art.example.com/artifactory/apm/owner/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + # No HEAD calls are expected (single-candidate short-circuit), but + # patch anyway so an accidental probe surfaces as a clear failure. + called = [] + + def _head(url, **kwargs): + called.append(url) + resp = MagicMock() + resp.status_code = 200 + return resp + + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_head): + dep_ref, direct_virtual_resolved = resolve_parsed_dependency_reference( + package, + marketplace_dep_ref=None, + dependency_reference_cls=DependencyReference, + try_resolve_gitlab_direct_shorthand=lambda *a, **k: None, + auth_resolver=auth, + verbose=False, + resolve_artifactory_boundary=_resolve_artifactory_boundary, + ) + + assert direct_virtual_resolved is False + assert dep_ref.repo_url == "owner/repo" + assert called == [], "unambiguous Mode 1 dep should not HEAD-probe" + + +class TestArtifactoryProbePipelineMode2: + """Mode 2 (bare shorthand under ``PROXY_REGISTRY_ONLY``) end-to-end.""" + + def test_bare_shorthand_resolves_and_keeps_bare_shape(self, monkeypatch): + """Bare nested shorthand under the proxy locks in the boundary BUT + keeps the rebuilt ref bare (no host/artifactory_prefix) so identity + and lockfile shape stay env-driven, not embedded in the entry.""" + monkeypatch.setenv("PROXY_REGISTRY_URL", "https://art.example.com/artifactory/apm") + monkeypatch.setenv("PROXY_REGISTRY_ONLY", "1") + + package = "group/sub/repo/skills/sec" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + scripted = _scripted_head(boundary_owner="sub", boundary_repo="repo") + + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=scripted): + dep_ref, direct_virtual_resolved = resolve_parsed_dependency_reference( + package, + marketplace_dep_ref=None, + dependency_reference_cls=DependencyReference, + try_resolve_gitlab_direct_shorthand=lambda *a, **k: None, + auth_resolver=auth, + verbose=False, + resolve_artifactory_boundary=_resolve_artifactory_boundary, + ) + + assert direct_virtual_resolved is True + assert dep_ref.repo_url == "group/sub/repo" + assert dep_ref.virtual_path == "skills/sec" + # Mode 2 ref stays bare shorthand: no artifactory_prefix embedded on + # the rebuilt dep_ref. The proxy routing remains env-driven + # (PROXY_REGISTRY_URL + PROXY_REGISTRY_ONLY) so the lockfile stays + # portable across proxy URLs and host stays github.com (the dep's + # logical home), not the proxy host. + assert not getattr(dep_ref, "artifactory_prefix", None) + assert dep_ref.host == "github.com" + + +class TestArtifactoryProbePipelineErrorSurfacing: + """When the probe rejects every candidate, the pipeline raises a + ValueError that names the boundary trouble in user-facing terms. + + install.py catches ValueError, routes it through the per-package + invalid-outcomes ledger, and continues with the next package. This test + exercises the resolver-raise path (and confirms the message wording is + actionable, NOT silent-fallback-to-a-guess). + """ + + def test_all_404_raises_unresolved_with_bypass_hint(self): + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + def _all_404(url, **kwargs): + resp = MagicMock() + resp.status_code = 404 + return resp + + with patch("apm_cli.install.artifactory_resolver.requests.head", side_effect=_all_404): + with pytest.raises(ValueError, match=r"//"): + resolve_parsed_dependency_reference( + package, + marketplace_dep_ref=None, + dependency_reference_cls=DependencyReference, + try_resolve_gitlab_direct_shorthand=lambda *a, **k: None, + auth_resolver=auth, + verbose=False, + resolve_artifactory_boundary=_resolve_artifactory_boundary, + ) + + def test_transport_errors_raise_inconclusive_with_bypass_hint(self): + """Every candidate raises a transport error (network unreachable / + DNS / TLS): the resolver fails closed (does NOT silently lock in a + guess) AND the error message names the ``//`` bypass marker so the + user has an escape hatch when the proxy is unavailable.""" + import requests as _requests + + package = "art.example.com/artifactory/apm/group/sub/repo" + auth = Mock() + auth.resolve_for_dep.return_value = Mock(token=None) + + def _all_transport_err(url, **kwargs): + raise _requests.ConnectionError("simulated DNS failure") + + with patch( + "apm_cli.install.artifactory_resolver.requests.head", + side_effect=_all_transport_err, + ): + with pytest.raises(ValueError, match=r"//") as excinfo: + resolve_parsed_dependency_reference( + package, + marketplace_dep_ref=None, + dependency_reference_cls=DependencyReference, + try_resolve_gitlab_direct_shorthand=lambda *a, **k: None, + auth_resolver=auth, + verbose=False, + resolve_artifactory_boundary=_resolve_artifactory_boundary, + ) + # The INCONCLUSIVE branch is the one this PR added the `//` hint to; + # confirm we are on that branch (and not the UNRESOLVED branch which + # had the hint pre-PR). + msg = str(excinfo.value) + assert "could not reach the proxy" in msg From c9d85e8382ddd4c90aafe7db06e9aa5b1e8b8d60 Mon Sep 17 00:00:00 2001 From: Copilot <223556219+Copilot@users.noreply.github.com> Date: Wed, 27 May 2026 21:40:09 +0200 Subject: [PATCH 8/8] test(artifactory): assert yaml entry URL via urllib.parse, not substring CodeQL flagged the substring assertions on entry['git'] (py/incomplete-url-substring-sanitization, high severity, alert #161 on PR #1472) and they also violated the repo's test convention: URL assertions in tests/** must use urllib.parse, never substring (.github/instructions/ tests.instructions.md). Switch to urlparse() + exact-match on hostname and ordered path segments. Same coverage (proxy host + artifactory/ prefix + probed owner/repo + virtual path), stricter assertion (a spoofed host containing 'art.example.com' as a substring would no longer pass), and CodeQL goes quiet. Co-authored-by: chkp-roniz <35386615+chkp-roniz@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../test_artifactory_probe_pipeline_e2e.py | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/tests/integration/test_artifactory_probe_pipeline_e2e.py b/tests/integration/test_artifactory_probe_pipeline_e2e.py index d1996c6cd..639226f5a 100644 --- a/tests/integration/test_artifactory_probe_pipeline_e2e.py +++ b/tests/integration/test_artifactory_probe_pipeline_e2e.py @@ -22,6 +22,7 @@ from __future__ import annotations from unittest.mock import MagicMock, Mock, patch +from urllib.parse import urlparse import pytest @@ -99,10 +100,17 @@ def test_nested_path_resolves_and_serializes_to_structured_yaml(self): assert "git" in entry assert "path" in entry # The git URL must keep the proxy host + the artifactory/ prefix - # so a future install reproduces the same routing. - assert "art.example.com" in entry["git"] - assert "artifactory/apm" in entry["git"] - assert "group/sub/repo" in entry["git"] + # + the probed owner/repo so a future install reproduces the same + # routing. Parse the URL into components and assert on each part by + # exact match (per the repo's test convention -- URL assertions use + # urllib.parse, never substring), so CodeQL's incomplete-URL-substring + # heuristic stays quiet and the assertion can't be satisfied by a + # spoofed lookalike in the host position. + parsed = urlparse(entry["git"]) + assert parsed.hostname == "art.example.com" + # Path is ordered: /artifactory///... + path_parts = [seg for seg in parsed.path.split("/") if seg] + assert path_parts[:5] == ["artifactory", "apm", "group", "sub", "repo"] assert entry["path"] == "skills/sec" def test_unambiguous_path_skips_serialization_overhead(self):