
Commit 5ed13a7

Merge upstream v7.0.0: Full TEE Capability — Phase 2 backend LLM attestation
Brings in upstream's v7.0.0 release (MorpheusAIs#712):

- Phase 2 provider-side backend LLM attestation (TDX quote, TLS pinning, RTMR3 workload replay, CPU-GPU nonce binding, NVIDIA NRAS GPU attestation, per-prompt fast verify)
- Single on-chain "tee" tag drives both hops; local isTee flag retired
- request_id propagation across inference/attestation log paths
- Per-entry Badger activity keys for session storage GC reclaim
- ECS deploy / CI-CD wait-timing hardening, docs rewrite, swagger updates
- Major version bump to 7

Conflict resolved in proxy-router/internal/blockchainapi/service.go: OpenSession keeps the fork's nil-guard on authConfig (for mobile SDK use) alongside upstream's new log := s.requestLog(ctx) binding, and our execution-reverted retry loop was switched to use the request-scoped logger.

Verified: proxy-router builds cleanly and our touched packages (chatstorage, storages, proxyapi, mobile) pass their unit tests. The remaining pre-existing failures (attestation fixture/network tests, TestRating, vet warnings) are inherited unchanged from upstream v7.0.0.

Made-with: Cursor
2 parents faa6580 + 4c42883 commit 5ed13a7

47 files changed

Lines changed: 4426 additions & 411 deletions


.ai-docs/TEE_Attestation_Architecture.md

Lines changed: 257 additions & 81 deletions
Large diffs are not rendered by default.

.ai-docs/TEE_CICD_Supply_Chain_Hardening.md

Lines changed: 77 additions & 36 deletions
````diff
@@ -1,9 +1,11 @@
 # CI/CD Supply-Chain Hardening for Morpheus Docker Images
 
-**Last updated:** 2026-03-11
-**First successful run (Phase 1a — signing):** [#22920492249](https://github.com/MorpheusAIs/Morpheus-Lumerin-Node/actions/runs/22920492249)
+**Last updated:** 2026-04-22
+**First successful run (Phase 1a — signing):** [#22920492249](https://github.com/MorpheusAIs/Morpheus-Lumerin-Node/actions/runs/22920492249)
 **First end-to-end run (Phase 1b — deploy + verify):** [#22969993910](https://github.com/MorpheusAIs/Morpheus-Lumerin-Node/actions/runs/22969993910)
 
+> **v7.0.0 release status.** The CI/CD hardening described here is the foundation that every downstream trust check depends on. Both **Phase 1c** (consumer-side proxy-router verification of the P-Node) and **Phase 2** (P-Node verifies its own backend LLM) have shipped on top of it — see [`TEE_Attestation_Architecture.md`](TEE_Attestation_Architecture.md) §7.4 and §7.7 for the code-level flow. The CI/CD pipeline itself remains unchanged from Phase 1b in this release; v7 is the *consumer* of the artifacts this pipeline produces.
+
 ---
 
 ## Why This Matters
````
````diff
@@ -197,38 +199,59 @@ This shows all attached artifacts — signature, attestation, and SBOM — in a
 
 ---
 
-## What This Enables — The Full Loop
+## What This Enables — The Full Loop (as of v7.0.0)
 
-This CI/CD hardening is the **foundation layer** for the full TEE attestation loop. As of Phase 1b, the pipeline is fully automated end-to-end:
+This CI/CD hardening is the **foundation layer** for the full TEE attestation loop. As of v7.0.0, the loop is complete end-to-end — both consumer-side Phase 1 and P-Node-side Phase 2 are shipped:
 
 ```
 ┌──────────────────────────────────────────────────────────────────────┐
-CI/CD Pipeline (done)
+│ CI/CD Pipeline (Phase 1a + 1b — DONE)
 │ │
 │ Source Code ──► Build ──► Sign ──► Compute RTMR3 ──► Publish GHCR │
 │ │ │ │ │ │
-│ │ cosign sig RTMR3 in ├── image
-│ │ (Sigstore) manifest ├── SBOM
-│ │ └── manifest
-│ ▼
-│ Deploy to SecretVM ──► Verify live RTMR3 matches
-│ (secretvm-cli) (polls attestation quote)
+│ │ cosign sig RTMR3 in ├── image │
+│ │ (Sigstore) manifest ├── SBOM │
+│ │ └── manifest │
+│ ▼ │
+│ Deploy to SecretVM ──► Verify live RTMR3 matches │
+│ (secretvm-cli) (polls attestation quote) │
 └──────────────────────────────────────────────────────────────────────┘
 
 
-┌──────────────────────────────┐
-│ Consumer Verification │
-│ (proxy-router code, next) │
-│ │
-│ 1. Detect "tee" model tag │
-│ 2. Fetch manifest from GHCR │
-│ 3. Fetch quote from :29343 │
-│ 4. Compare RTMR3 │
-│ 5. If match → session │
-│ If fail → reject │
-└──────────────────────────────┘
+┌──────────────────────────────────────────────────────────────────────┐
+│ Phase 1c — Consumer verifies P-Node (DONE in v6.0.0) │
+│ │
+│ C-Node (v6.0.0+) session open + every prompt: │
+│ 1. IsTeeModel(on-chain tags) == true │
+│ 2. cosign fetch signed manifest from GHCR │
+│ 3. GET provider :29343/cpu → raw TDX quote │
+│ 4. POST to TEE_PORTAL_URL → quote is genuine │
+│ 5. Compare RTMR3 against manifest golden value │
+│ 6. reportData[0:32] == SHA-256(peer TLS cert) → anti-MITM │
+│ 7. Cache snapshot; fast-verify on every prompt │
+│ (attestation/verifier.go; PR #686, #689, #699) │
+└──────────────────────────────────────────────────────────────────────┘
+
+
+┌──────────────────────────────────────────────────────────────────────┐
+│ Phase 2 — P-Node verifies its Backend LLM (DONE in v7.0.0) │
+│ │
+│ P-Node (-tee image, v7.0.0+) startup + every prompt: │
+│ 1. GET backend :29343/cpu → backend TDX quote (portal-verified) │
+│ 2. TLS pinning via reportData[0:32] │
+│ 3. Artifact-registry lookup for MRTD + RTMR0-2 │
+│ 4. Replay RTMR3 from backend docker-compose.yaml (SHA-384 chain) │
+│ 5. GET backend :29343/gpu → CPU-GPU binding via reportData[32:64] │
+│ 6. NVIDIA NRAS v4 attestation of GPU evidence │
+│ 7. Fast-verify on every prompt; PinnedHTTPClient for inference │
+│ 8. State exposed at GET /v1/models/attestation │
+│ (attestation/backend_verifier.go, workload_verifier.go, │
+│ nras_verifier.go, artifacts_registry.go; PR #699, #700, #708-#709) │
+└──────────────────────────────────────────────────────────────────────┘
 ```
 
+**Why v6+ consumers are forward-compatible with v7+ providers:** Phase 2 runs **entirely inside the P-Node** — the consumer never talks to the backend LLM and never sees the backend's attestation quote. The consumer trusts Phase 2 transitively because it has already attested (via Phase 1) that the P-Node is running the exact `-tee` binary that enforces Phase 2. No client-side upgrade is required to get Phase 2 guarantees.
+
 **How each artifact feeds the trust chain:**
 
 1. **Image signing** → Consumers can verify a provider is running an official image, not a modified fork
````
````diff
@@ -255,28 +278,46 @@ This CI/CD hardening is the **foundation layer** for the full TEE attestation lo
 
 ## Current Status and Next Steps
 
-### Completed (Phase 1a + 1b)
+### Completed (Phase 1a + 1b — CI/CD)
 
 | Step | Description | Status |
 |---|---|---|
-| **Cosign signing + SBOM** | Keyless signing, digest capture, SPDX SBOM for TEE image | **Done** |
-| **TEE attestation manifest** | Signed JSON with digests, hashes, baked env, build provenance | **Done** |
-| **RTMR3 computation** | Computed in CI/CD from deployed compose + SecretVM rootfs; embedded in signed manifest | **Done** |
-| **Auto-deploy to SecretVM** | `Deploy-SecretVM-Test` job deploys digest-pinned compose to test VM via `secretvm-cli` | **Done** |
-| **Post-deploy verification** | Polls live VM attestation, extracts RTMR3 from raw TDX quote, compares against CI-computed value | **Done** |
+| **Cosign signing + SBOM** | Keyless signing, digest capture, SPDX SBOM for TEE image | **Done** (v6.0.0) |
+| **TEE attestation manifest** | Signed JSON with digests, hashes, baked env, build provenance | **Done** (v6.0.0) |
+| **RTMR3 computation** | Computed in CI/CD from deployed compose + SecretVM rootfs; embedded in signed manifest | **Done** (v6.0.0) |
+| **Auto-deploy to SecretVM** | `Deploy-SecretVM-Test` job deploys digest-pinned compose to test VM via `secretvm-cli` | **Done** (v6.0.0) |
+| **Post-deploy verification** | Polls live VM attestation, extracts RTMR3 from raw TDX quote, compares against CI-computed value | **Done** (v6.0.0) |
+| **ECS deploy timing hardening** | Retry + stabilization-timeout improvements so post-deploy healthchecks don't race ECS | **Done** (PR #694/#695, #701) |
 
-### Remaining (Developer Work — Proxy-Router Code)
+### Completed (Phase 1c — Consumer verifies P-Node, v6.0.0 → v6.2.x)
 
 | Step | Description | Status |
 |---|---|---|
-| **`IsTEEModel()` helper** | Detect `"tee"` tag on blockchain-registered models | TODO |
-| **Consumer-side verification** | Fetch attestation from `:29343`, verify RTMR3 against signed manifest before opening session | TODO |
-| **Consumer UI TEE badge** | Visual indicator for TEE-verified models | TODO |
+| **`IsTeeModel()` helper** | Detect `"tee"` tag on blockchain-registered models; drives both hops of the trust chain | **Done** — PR #708, #709 (consolidated as sole TEE switch) |
+| **Consumer-side verification** | Fetch attestation from `:29343`, verify quote via SecretAI portal, compare RTMR3 against signed manifest, pin TLS cert — all before opening session | **Done** (`attestation/verifier.go`) |
+| **Per-prompt fast-verify** | Re-fetch quote, compare hash + TLS fingerprint on every forwarded prompt | **Done** — PR #686, #689 |
+| **Consumer UI TEE badge** | Visual indicator for TEE-verified models + session status | **Done** |
 
-### Lower Priority (CI/CD)
+### Completed (Phase 2a — P-Node verifies its Backend LLM, v7.0.0)
 
 | Step | Description | Status |
 |---|---|---|
-| **Full RTMR0-2** | Integrate `reproduce-mr` for firmware/kernel layers (blocked on ACPI templates) | TODO |
-| **AMD SEV measurement** | Integrate `sev-snp-measure` for AMD platform | TODO |
-| **CVE scanning** | Trivy/Grype scan as advisory step, then gating | TODO |
+| **`BackendVerifier.AttestBackend`** | Startup full attestation: portal-verified CPU quote, TLS binding, workload RTMR3 replay, CPU-GPU nonce binding, NRAS | **Done** — PR #699 |
+| **`FastVerifyBackend`** | Per-prompt hot-path re-check with hash + TLS fingerprint; no TTL | **Done** — PR #699 |
+| **`ArtifactRegistry`** | Auto-refreshed SecretVM TDX artifact CSV for MRTD + RTMR0-2 lookup | **Done** — PR #699 |
+| **`NrasVerifier`** | NVIDIA NRAS v4 API integration for GPU attestation | **Done** — PR #699 |
+| **`PinnedHTTPClient`** | Onward inference rejects any TLS cert whose SHA-256 differs from attested fingerprint | **Done** — PR #699 |
+| **`GET /v1/models/attestation`** | Per-model attestation state endpoint for monitoring and forensics | **Done** — PR #699 |
+| **New env vars** | `TEE_PORTAL_URL`, `TEE_IMAGE_REPO`, `ARTIFACT_REGISTRY_URL`, `ARTIFACT_REGISTRY_REFRESH_INTERVAL` | **Done** — PR #699 |
+
+### Remaining (Lower Priority / Future)
+
+| Area | Step | Status |
+|---|---|---|
+| CI/CD | Full RTMR0-2 *recomputation* in CI (today we verify RTMR0-2 by artifact-registry lookup, which is sufficient) | TODO — blocked on ACPI templates |
+| CI/CD | AMD SEV-SNP measurement via `sev-snp-measure` | TODO — TDX-only today |
+| CI/CD | CVE scanning (Trivy/Grype) — advisory then gating | TODO |
+| Proxy-router | Verifiable per-message signing using SecretVM TEE-bound key | Deferred to Phase 2b |
+| Proxy-router | Local in-process quote verification (remove `quote-parse` dependency on SCRT Labs) | Deferred to Phase 2b |
+| Proxy-router | Co-located proxy-router + LLM in a single TDX VM (collapses both hops into one RTMR3) | Deferred to Phase 2b |
+| Proxy-router | NRAS alternatives for non-NVIDIA GPU vendors | Deferred to Phase 2b |
````
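The Phase 1c "Per-prompt fast-verify" row and the `FastVerifyBackend` row both describe a cheap hot-path check: re-fetch the quote, then compare a hash plus the TLS fingerprint against what the last full attestation accepted, with no TTL. A minimal Go sketch of that snapshot pattern follows. The names (`snapshot`, `fastVerifier`, `store`, `fastVerify`) are hypothetical, SHA-256 as the quote hash is an assumption, and what the caller does on a mismatch (reject the prompt or rerun full verification) is left open.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"sync"
)

// snapshot records what the last full attestation accepted (hypothetical type).
type snapshot struct {
	quoteHash [32]byte // SHA-256 of the raw quote that passed full verification
	tlsFP     [32]byte // SHA-256 of the attested peer certificate (DER)
}

// fastVerifier caches one snapshot. There is no TTL: the snapshot stays
// valid until a per-prompt check detects a change.
type fastVerifier struct {
	mu   sync.RWMutex
	snap *snapshot
}

var (
	errNoSnapshot   = errors.New("no attestation snapshot: run full verification first")
	errQuoteChanged = errors.New("quote hash differs from attested snapshot")
	errTLSChanged   = errors.New("TLS fingerprint differs from attested snapshot")
)

// store is called once the full (slow) verification has succeeded.
func (f *fastVerifier) store(rawQuote, certDER []byte) {
	s := &snapshot{quoteHash: sha256.Sum256(rawQuote), tlsFP: sha256.Sum256(certDER)}
	f.mu.Lock()
	f.snap = s
	f.mu.Unlock()
}

// fastVerify is the per-prompt check: hash the freshly fetched quote and the
// current peer certificate and compare both against the cached snapshot.
func (f *fastVerifier) fastVerify(rawQuote, certDER []byte) error {
	f.mu.RLock()
	s := f.snap
	f.mu.RUnlock()
	if s == nil {
		return errNoSnapshot
	}
	if sha256.Sum256(rawQuote) != s.quoteHash {
		return errQuoteChanged
	}
	if sha256.Sum256(certDER) != s.tlsFP {
		return errTLSChanged
	}
	return nil
}
```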
