Skip to content

feat: add AMD SEV-SNP attestation support for TEE providers#719

Merged
nomadicrogue merged 4 commits into
testfrom
dev
Apr 29, 2026
Merged

feat: add AMD SEV-SNP attestation support for TEE providers#719
nomadicrogue merged 4 commits into
testfrom
dev

Conversation

@alex-sandrk
Copy link
Copy Markdown
Collaborator

No description provided.

alex-sandrk and others added 2 commits April 28, 2026 18:02
Route SEV quotes to the dedicated portal endpoint, normalize
space-separated report_data from the SEV portal, and implement
full SEV-SNP workload verification (GCTX launch digest computation
against the SecretVM SEV artifact registry).
Route SEV quotes to the dedicated portal endpoint, normalize
space-separated report_data from the SEV portal, and implement full
SEV-SNP workload verification (GCTX launch digest computation against
the SecretVM SEV artifact registry).
@ether-btc
Copy link
Copy Markdown

🔍 Code Review — PR #719: AMD SEV-SNP Attestation Support

Reviewers: tech-lead + software-engineer (dual-verdict)
Verdict: REQUEST CHANGES 🔴


Critical Bugs — Must Fix Before Merge

1. VMType assigned wrong field — 3 locations 🔴

File: proxy-router/internal/attestation/sev_workload.go — lines 937, 953, 966

// CURRENT (BUGGY) — all 3 return statements:
TemplateName: family.TemplateName,  // ✅ correct
VMType:       family.TemplateName,  // ❌ should be family.VMType
Env:          family.VMType,       // ✅ correct

Compare with the existing TDX implementation (workload_verifier.go:61):

TemplateName: entry.TemplateName,  // ✅ correct
VMType:       entry.VMType,       // ✅ correct — uses VMType, not TemplateName

Impact: VMType is populated with the template name string (e.g. "small") instead of the actual VM type (e.g. "prod"). Any downstream code that uses VMType for routing, policy, or display will receive semantically wrong data.

Fix: Change VMType: family.TemplateNameVMType: family.VMType at all 3 locations.


2. verifyQuote passes wrong URL to VerifyQuote 🔴

File: proxy-router/internal/attestation/verifier.go — line 1138

portalTarget := v.portalURL
if !IsTdxQuote(quote) {
    portalTarget = deriveSEVPortalURL(v.portalURL)  // portalTarget = SEV URL ✅
}
v.log.Infof("sending quote to ... (%s)", portalTarget) // logs SEV URL ✅
result, err := VerifyQuote(ctx, v.portalClient, v.portalURL, quote)  // ← passes BASE URL ❌

portalTarget is correctly computed and logged, but VerifyQuote receives v.portalURL (the base URL).

Note: VerifyQuote itself also checks IsTdxQuote and derives the SEV URL internally, so runtime routing is actually correct — but the code is misleading and the log is wrong. If the internal routing logic ever diverges from deriveSEVPortalURL, this becomes a silent bug.

Fix: Change line 1138 to VerifyQuote(ctx, v.portalClient, portalTarget, quote) and remove the redundant internal routing check (or keep it as a safety net but log what actually happened).


3. SEV registry nil-guard is a one-point-in-time check 🟠

File: proxy-router/internal/attestation/workload_verifier.go — line 1203

if sevRegistry != nil && sevRegistry.IsLoaded() {
    return VerifySevWorkload(sevRegistry, cpuQuoteData, dockerComposeYaml, log)
}

The nil-guard is correct for the current call path. However, VerifySevWorkload calls registry.AllEntries() with no secondary nil check inside the function. If VerifySevWorkload is ever called directly in a future refactor, it panics. The guard is a fragile one-point-in-time check, not an explicit contract.

Fix: Add if registry == nil { return WorkloadResult{Status: WorkloadSkipped} } inside VerifySevWorkload itself, making nil-safety a property of the function rather than its caller.


Medium Issues — Should Fix Before Merge

4. SEV endpoint registered in tests but never exercised 🟠

/api/quote-parse-sev is registered in backend_verifier_test.go:161-162 using the same portalHandler as the TDX endpoint — returns a TDX ParseQuoteResponse. The SEV quote parsing path has zero test coverage.

5. Silent WorkloadSkipped when GitHub unreachable is a security risk 🟠

If sev_registry.go fails to fetch from GitHub at startup, IsLoaded() returns false forever. Every SEV attestation then silently returns WorkloadSkippedunverified workloads are accepted by default with no operator alert. This is fail-open behavior in a security-critical attestation path.

Fix: Either fail closed (reject SEV quotes when registry unavailable) or add an explicit WorkloadUnverified status that forces operator action.

6. Hardcoded SEV registry URL with no fallback 🟠

DefaultSevRegistryURL hardcoded to scrtlabs/secretvm-verify GitHub. Same pattern that caused TDX incidents. If the repo moves or becomes unreachable, SEV attestation is permanently broken.

7. Missing unit tests for core SEV functions 🟠

No dedicated tests for: VerifySevWorkload, SevArtifactRegistry.FindByVMType, CalcSevMeasurement edge cases (zero cmdline, page-boundary GPAs), ParseSevFamilyID null-padding.

8. TestVerifyWorkload_NonTdxQuote — misleading assertion 🟡

Status changed from WorkloadNotAuthenticWorkloadSkipped. Test name reflects old behavior; the assertion name is now semantically misleading.


What's Good ✅

  • Dual-registry architecture (TDX + SEV) is sound and well-separated
  • IsTdxQuote routing is correct
  • WorkloadSkipped correctly differentiates "unavailable" from "rejected"
  • Space-stripping fix in VerifyCPUGPUBinding and VerifyTLSBinding is correct and overdue
  • GCTX calculation in sev_gctx.go matches AMD SNP spec Section 8.17.2 Table 67 — hard crypto math verified correct
  • SevArtifactRegistry uses proper sync.RWMutex for thread-safe reads/writes
  • All HTTP tests use httptest.NewServerno external network dependencies in tests

Summary

Severity Count Blocking?
🔴 Critical 2 Yes
🟠 Medium 5 Yes (before merge)
🟡 Minor 2 No

Recommendation: REQUEST CHANGES. The architecture and crypto math are solid — these are implementation slips in 3 specific locations, all fixable with targeted patches. The test coverage gaps for the SEV path are also a risk for a security-critical feature.


Reviewed by: tech-lead + software-engineer (dual-verdict, 2/2 agree on all critical findings)

@nomadicrogue nomadicrogue merged commit fed5422 into test Apr 29, 2026
17 checks passed
nomadicrogue added a commit that referenced this pull request May 22, 2026
## Summary

Docs-only promotion to `main` — same two commits validated on `test`
(#728), cherry-picked onto current `main`:

- Mintlify MDX site, `AGENTS.md`, `.cursor/rules/morpheus.mdc`
- Unified `docs.yml` validate + OIDC deploy pipeline (Pagefind,
llms.txt)

On merge, the **Docs** workflow deploys to https://nodedocs.mor.org
(`main` environment variables).

**Not a full `test` → `main` merge.** `test` also carries TEE/SEV-SNP
commits (#719#723) that are intentionally excluded here.

## Prerequisites

- [x] Nonprod validated at https://nodedocs.dev.mor.org
- [x] CloudFront URL rewrite applied (`08-nodedocs-mor-org` dev + prd)
- [x] `main` GitHub environment variables configured (`NODEDOCS_*`)

## Test plan

- [ ] Merge; confirm **Docs** workflow completes validate + deploy
- [ ] Verify https://nodedocs.mor.org loads and subpage navigation works
(e.g. `/inference-api/overview`)
- [ ] Confirm `/llms.txt` is present with prod URLs
- [ ] Confirm main CI-CD pipeline does **not** run for this merge


Made with [Cursor](https://cursor.com)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants