Skip to content

fix: submodule support — reinstate clone, fix pull, add hasSubmodules flag#1550

Open
rpapani wants to merge 5 commits intomainfrom
cm-client-bugs
Open

fix: submodule support — reinstate clone, fix pull, add hasSubmodules flag#1550
rpapani wants to merge 5 commits intomainfrom
cm-client-bugs

Conversation

@rpapani
Copy link
Copy Markdown
Contributor

@rpapani rpapani commented Apr 21, 2026

Summary

Full git-submodule support for Cloud Manager standard repos, plus the schema flag that downstream consumers need.

Three commits on this branch:

  1. fix: clone functionality to include submodules (#1413) — cherry-pick of a5c00169. Originally merged via PR fix: clone functionality to include submodules #1413, reverted in Revert "fix: clone functionality to include submodules (#1413)" #1549 because the Lambda git-layer was missing the submodule helper binaries. Now that the layer has been rebuilt (spacecat-git-layer:4 with git-submodule and git-submodule--helper), this can safely go back in. Also carries the per-operation git timeout bump that was part of the original fix: clone functionality to include submodules #1413: GIT_OPERATION_TIMEOUT_MS defaults to 600_000 ms (10 min, up from 120s) and is now overridable via env var — recursive clones traverse multiple submodule repos so the old 2-minute budget was too tight.

  2. fix: propagate --recurse-submodules to incremental pull — PR fix: clone functionality to include submodules #1413 only patched clone. The incremental-update path uses pull and was still skipping submodules. With this change, pull passes --recurse-submodules and the existing org-scoped extraheader ({origin}/{orgName}/) covers the submodule URLs on the same customer org, so no extra auth plumbing is needed.

  3. feat: add hasSubmodules field to site.code — optional boolean on site.code. The import worker sets it after each clone; downstream autofix / suggestion generation reads it to decide whether to parse .gitmodules and route patches across parent + submodule repos. Additive, unset by default.

Files changed

File Change
packages/spacecat-shared-cloud-manager-client/src/index.js clone (reinstated) and pull (new) both use --recurse-submodules; GIT_OPERATION_TIMEOUT_MS default raised 120s → 600s, now env-overridable
packages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.js Assertions that both pull test cases include --recurse-submodules
packages/spacecat-shared-data-access/src/models/site/site.schema.js hasSubmodules: { type: 'boolean', required: false } under code

Why submodule semantics matter here

  • Submodules are pinned to commit SHAs in the parent's tree (gitlinks, mode 160000). .gitmodules only carries URL + branch metadata; it is not a pointer.
  • One git ls-remote on the parent captures every level of nesting — the parent's SHA is a hash of its tree, which includes all gitlinks recursively. No need to ls-remote each submodule.
  • A push into a submodule repo does not update the parent. Autofix for files inside a submodule must also bump the parent's gitlink or the customer's CI builds from the stale pin. This is captured in SITES-43089.

Test plan

  • cm-client unit tests pass — 69 tests, 100% coverage
  • data-access unit tests pass — 1890 tests, 100% coverage
  • Manual end-to-end clone with --recurse-submodules against test repo RaviASOtesting-p158103-uk74111 (with linked submodule ravi-aso-submodule-test) using the updated git Lambda layer :4 — verified working
  • Downstream consumer verification via adobe/spacecat-import-worker#724

Related

🤖 Generated with Claude Code

dmaurya929 and others added 3 commits April 20, 2026 15:55
Please ensure your pull request adheres to the following guidelines:
- [ ] make sure to link the related issues in this description
- [ ] when merging / squashing, make sure the fixed issue references are
visible in the commits, for easy compilation of release notes

## Related Issues


Thanks for contributing!
PR #1413 added --recurse-submodules to clone but missed pull, so
incremental updates of repos with submodules (e.g. Okta's
okta-digital-aem-sites) did not fetch submodule changes. Now pull
uses the same org-scoped extraheader auth as clone — submodules on
the same customer org inherit credentials and are updated to the
SHA pinned in the parent's tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an optional boolean `hasSubmodules` on `site.code` so the
import worker can record whether the cloned repo contains a
.gitmodules file. Downstream consumers (autofix, suggestion
generation) use this flag to decide whether to parse .gitmodules
and route patches/PRs across parent + submodule repos.

Additive, unset by default — no behaviour change for existing sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rpapani rpapani requested a review from dmaurya929 April 21, 2026 06:33
Replaces the flat boolean with a nested metadata object that carries
the full submodule URL list and an external-vs-internal classification.
Consumers that only need presence check `metadata?.submodules?.urls?.length`;
those that need the URL list or external flag read the same object.

Shape:
  code.metadata = {
    submodules: {
      external: boolean,     // true if any URL points off the parent's host
      urls: string[]         // verbatim from .gitmodules, creds stripped
    }
  }
  // `submodules` is omitted when the repo has no .gitmodules

The importer always overwrites `metadata` on each import so a re-import
that finds no submodules clears stale entries from an earlier import.

Adds CodeMetadata + SubmodulesMetadata TypeScript interfaces and
matching unit tests. Adds a JSDoc doc-comment on the `code` attribute
summarising the contract.

No migration needed — hasSubmodules was unreleased (shipped on this
branch but not yet merged).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

This PR will trigger a minor release when merged.

Background: CM BYOG repos are proxied by the CM repo service at
`{cmRepoUrl}/api/program/{pid}/repository/{numericId}.git`. When a
customer's .gitmodules uses relative URLs (`../foo`) — which is the
common case — git resolves them against the parent's clone URL and
produces `.../repository/foo`. The proxy rejects those because it only
serves numeric repository IDs, breaking submodule clone end-to-end.

Fix: a new `cmProgramRepos` map on `site.code.metadata.submodules`
that maps each CM repo's real name (last path segment of its
`url`/`proxyUrl`, NOT the `repo` display field) to its numeric id.
The cm-client reads this at clone/pull time to rewrite name-based
submodule URLs in .git/config to the numeric-id form before running
`git submodule update`. Populated at onboarding via a single call to
`GET /api/program/{pid}/repositories`; the importer doesn't have CM
Management API access so it can't derive the map itself.

cm-client changes:
  - Export `isBYOG(repoType)` helper (anything !== STANDARD).
  - Split clone() by type:
    * STANDARD: `clone --recurse-submodules` + (on ref switch) sync +
      update --init --recursive. Unchanged behaviour.
    * BYOG: `clone --no-recurse-submodules` + checkout ref + new
      `#resolveByogSubmodules` helper that runs submodule init,
      rewrites URLs via cmProgramRepos, then submodule update
      --recursive with the CM repo service extraheaders re-applied.
  - Same split for pull(): STANDARD keeps `pull --recurse-submodules`,
    BYOG pulls the parent only and then runs the same rewrite pass to
    pick up new submodules introduced by the pulled commits.
  - `#resolveByogSubmodules` is idempotent and explicitly never calls
    `git submodule sync` — sync would copy .gitmodules URLs back into
    .git/config and undo the rewrite. Heavily commented.
  - Graceful fallbacks: no .gitmodules (skip), no map (warn, let the
    subsequent update fail naturally — parent clone is still usable),
    submodule name not in map (leave URL alone).

Schema / types:
  - `SubmodulesMetadata.cmProgramRepos?: Record<string, string>` with
    a thorough TS doc comment explaining the lifecycle, who writes it,
    and why standard repos don't need it.
  - site.schema.js JSDoc distinguishes importer-written fields from
    onboarding-written fields.

Testing: validated end-to-end against eni-plenitude (program 123722,
parent repo 233725, 12 relative submodules on Azure DevOps BYOG) —
all submodules cloned at their pinned SHAs via the CM proxy. Unit
tests: 79 passing, 100% coverage (up from 70).

TODO follow-ups tracked elsewhere:
  - Ask CM to support name-based routes on the repo proxy (would
    remove the need for this map entirely).
  - Consider exposing customer-native URLs + stored PAT so we could
    bypass the proxy for BYOG altogether.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants