fix: submodule support — reinstate clone, fix pull, add hasSubmodules flag#1550
Open
fix: submodule support — reinstate clone, fix pull, add hasSubmodules flag#1550
Conversation
Please ensure your pull request adheres to the following guidelines: - [ ] make sure to link the related issues in this description - [ ] when merging / squashing, make sure the fixed issue references are visible in the commits, for easy compilation of release notes ## Related Issues Thanks for contributing!
PR #1413 added --recurse-submodules to clone but missed pull, so incremental updates of repos with submodules (e.g. Okta's okta-digital-aem-sites) did not fetch submodule changes. Now pull uses the same org-scoped extraheader auth as clone — submodules on the same customer org inherit credentials and are updated to the SHA pinned in the parent's tree. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an optional boolean `hasSubmodules` on `site.code` so the import worker can record whether the cloned repo contains a .gitmodules file. Downstream consumers (autofix, suggestion generation) use this flag to decide whether to parse .gitmodules and route patches/PRs across parent + submodule repos. Additive, unset by default — no behaviour change for existing sites. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dmaurya929
approved these changes
Apr 21, 2026
Replaces the flat boolean with a nested metadata object that carries
the full submodule URL list and an external-vs-internal classification.
Consumers that only need presence check `metadata?.submodules?.urls?.length`;
those that need the URL list or external flag read the same object.
Shape:
code.metadata = {
submodules: {
external: boolean, // true if any URL points off the parent's host
urls: string[] // verbatim from .gitmodules, creds stripped
}
}
// `submodules` is omitted when the repo has no .gitmodules
The importer always overwrites `metadata` on each import so a re-import
that finds no submodules clears stale entries from an earlier import.
Adds CodeMetadata + SubmodulesMetadata TypeScript interfaces and
matching unit tests. Adds a JSDoc doc-comment on the `code` attribute
summarising the contract.
No migration needed — hasSubmodules was unreleased (shipped on this
branch but not yet merged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
This PR will trigger a minor release when merged. |
Background: CM BYOG repos are proxied by the CM repo service at
`{cmRepoUrl}/api/program/{pid}/repository/{numericId}.git`. When a
customer's .gitmodules uses relative URLs (`../foo`) — which is the
common case — git resolves them against the parent's clone URL and
produces `.../repository/foo`. The proxy rejects those because it only
serves numeric repository IDs, breaking submodule clone end-to-end.
Fix: a new `cmProgramRepos` map on `site.code.metadata.submodules`
that maps each CM repo's real name (last path segment of its
`url`/`proxyUrl`, NOT the `repo` display field) to its numeric id.
The cm-client reads this at clone/pull time to rewrite name-based
submodule URLs in .git/config to the numeric-id form before running
`git submodule update`. Populated at onboarding via a single call to
`GET /api/program/{pid}/repositories`; the importer doesn't have CM
Management API access so it can't derive the map itself.
cm-client changes:
- Export `isBYOG(repoType)` helper (anything !== STANDARD).
- Split clone() by type:
* STANDARD: `clone --recurse-submodules` + (on ref switch) sync +
update --init --recursive. Unchanged behaviour.
* BYOG: `clone --no-recurse-submodules` + checkout ref + new
`#resolveByogSubmodules` helper that runs submodule init,
rewrites URLs via cmProgramRepos, then submodule update
--recursive with the CM repo service extraheaders re-applied.
- Same split for pull(): STANDARD keeps `pull --recurse-submodules`,
BYOG pulls the parent only and then runs the same rewrite pass to
pick up new submodules introduced by the pulled commits.
- `#resolveByogSubmodules` is idempotent and explicitly never calls
`git submodule sync` — sync would copy .gitmodules URLs back into
.git/config and undo the rewrite. Heavily commented.
- Graceful fallbacks: no .gitmodules (skip), no map (warn, let the
subsequent update fail naturally — parent clone is still usable),
submodule name not in map (leave URL alone).
Schema / types:
- `SubmodulesMetadata.cmProgramRepos?: Record<string, string>` with
a thorough TS doc comment explaining the lifecycle, who writes it,
and why standard repos don't need it.
- site.schema.js JSDoc distinguishes importer-written fields from
onboarding-written fields.
Testing: validated end-to-end against eni-plenitude (program 123722,
parent repo 233725, 12 relative submodules on Azure DevOps BYOG) —
all submodules cloned at their pinned SHAs via the CM proxy. Unit
tests: 79 passing, 100% coverage (up from 70).
TODO follow-ups tracked elsewhere:
- Ask CM to support name-based routes on the repo proxy (would
remove the need for this map entirely).
- Consider exposing customer-native URLs + stored PAT so we could
bypass the proxy for BYOG altogether.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Full git-submodule support for Cloud Manager standard repos, plus the schema flag that downstream consumers need.
Three commits on this branch:
fix: clone functionality to include submodules (#1413)— cherry-pick ofa5c00169. Originally merged via PR fix: clone functionality to include submodules #1413, reverted in Revert "fix: clone functionality to include submodules (#1413)" #1549 because the Lambdagit-layerwas missing the submodule helper binaries. Now that the layer has been rebuilt (spacecat-git-layer:4withgit-submoduleandgit-submodule--helper), this can safely go back in. Also carries the per-operation git timeout bump that was part of the original fix: clone functionality to include submodules #1413:GIT_OPERATION_TIMEOUT_MSdefaults to600_000ms (10 min, up from 120s) and is now overridable via env var — recursive clones traverse multiple submodule repos so the old 2-minute budget was too tight.fix: propagate --recurse-submodules to incremental pull— PR fix: clone functionality to include submodules #1413 only patched clone. The incremental-update path usespulland was still skipping submodules. With this change,pullpasses--recurse-submodulesand the existing org-scoped extraheader ({origin}/{orgName}/) covers the submodule URLs on the same customer org, so no extra auth plumbing is needed.feat: add hasSubmodules field to site.code— optional boolean onsite.code. The import worker sets it after each clone; downstream autofix / suggestion generation reads it to decide whether to parse.gitmodulesand route patches across parent + submodule repos. Additive, unset by default.Files changed
packages/spacecat-shared-cloud-manager-client/src/index.jsclone(reinstated) andpull(new) both use--recurse-submodules;GIT_OPERATION_TIMEOUT_MSdefault raised 120s → 600s, now env-overridablepackages/spacecat-shared-cloud-manager-client/test/cloud-manager-client.test.js--recurse-submodulespackages/spacecat-shared-data-access/src/models/site/site.schema.jshasSubmodules: { type: 'boolean', required: false }undercodeWhy submodule semantics matter here
160000)..gitmodulesonly carries URL + branch metadata; it is not a pointer.git ls-remoteon the parent captures every level of nesting — the parent's SHA is a hash of its tree, which includes all gitlinks recursively. No need to ls-remote each submodule.Test plan
--recurse-submodulesagainst test repoRaviASOtesting-p158103-uk74111(with linked submoduleravi-aso-submodule-test) using the updated git Lambda layer:4— verified workingRelated
hasSubmodulesschema field added heregit-submodule+git-submodule--helperintospacecat-git-layer:4, closing the gap that caused the original fix: clone functionality to include submodules #1413 to be reverted in Revert "fix: clone functionality to include submodules (#1413)" #1549🤖 Generated with Claude Code