
Commit 49c0b02

feat: add firmware reference values workflow for bare metal attestation
Add comprehensive tooling and documentation for collecting and managing firmware reference values (TDX/SNP measurements) used in bare metal attestation policies.

**New documentation:**
- docs/firmware-reference-values.md: Complete workflow guide covering:
  - Architecture of TDX (mr_td, rtmr_1/2, xfam) and SNP measurements
  - SHA-256 vs SHA-384 algorithm clarification (different layers, both correct)
  - Step-by-step collection using veritas tool
  - Multi-OCP-version support via merged arrays
  - Known veritas gaps (TCB versions, SNP policy bits, image measurements)
  - Security considerations and policy trade-offs

**New script:**
- scripts/collect-firmware-refvals.sh: Automated wrapper that:
  - Extracts measurements from veritas JSON output
  - Transforms to KBS/RVPS expected format (arrays of hex strings)
  - Merges with existing Vault values to support multi-version
  - Pushes to secret/data/hub/firmwareReferenceValues

**Integration:**
- Makefile: Add `make push-firmware-refvals REFVALS_FILE=<path>` target
- values-secret.yaml.template: Document firmwareReferenceValues structure

This is PR 2A of Wave 2 (firmware hardening). The actual attestation policy enforcement and ESO integration come in subsequent PRs.

Part of the bare metal attestation hardening roadmap.
1 parent e69084c commit 49c0b02

4 files changed

Lines changed: 479 additions & 0 deletions


Makefile

Lines changed: 10 additions & 0 deletions
@@ -3,3 +3,13 @@
33
# You can add custom targets above or below the include line
44

55
include Makefile-common
6+
7+
##@ Firmware Reference Values
8+
.PHONY: push-firmware-refvals
9+
push-firmware-refvals: ## Push firmware reference values to Vault (REFVALS_FILE=<path>)
10+
@if [ -z "$(REFVALS_FILE)" ]; then \
11+
echo "Error: REFVALS_FILE not specified" >&2; \
12+
echo "Usage: make push-firmware-refvals REFVALS_FILE=./refvals.json" >&2; \
13+
exit 1; \
14+
fi
15+
@scripts/collect-firmware-refvals.sh $(REFVALS_FILE)

docs/firmware-reference-values.md

Lines changed: 300 additions & 0 deletions
@@ -0,0 +1,300 @@
1+
# Firmware Reference Values for Bare Metal Attestation
2+
3+
## Overview
4+
5+
Firmware reference values provide cryptographic measurements of the trusted computing base (TCB) for Intel TDX and AMD SEV-SNP confidential VMs running on bare metal. These values enable attestation policies to verify that workloads are running on known-good firmware with the expected security configuration.
6+
7+
Without firmware reference values, attestation only verifies the `init_data` (runtime configuration hash). With firmware values, the Key Broker Service (KBS) can enforce:
8+
9+
- **Hardware integrity**: Verify firmware measurements (MRTD, RTMRs, launch measurement)
10+
- **TCB version**: Ensure minimum firmware/microcode versions
11+
- **Security configuration**: Enforce debug-disabled mode
12+
13+
## Architecture
14+
15+
### Intel TDX Measurements
16+
17+
- **mr_td** (SHA-384): Initial contents of the TD (Trust Domain) - firmware + initial page tables
18+
- **rtmr_1** (SHA-384): Guest firmware + bootloader measurements
19+
- **rtmr_2** (SHA-384): Kernel + initrd measurements
20+
- **xfam** (hex): Extended feature mask - CPU features available to the TD
21+
22+
### AMD SEV-SNP Measurements
23+
24+
- **snp_launch_measurement** (SHA-384): Hash of initial guest memory contents + VMSA
25+
- **debug flag**: Policy bit indicating whether debug is allowed
26+
27+
### Hash Algorithm Clarification
28+
29+
Different layers use different hash algorithms - this is **correct and expected**:
30+
31+
| Layer | Algorithm | Why |
32+
|-------|-----------|-----|
33+
| init_data (OSC TOML) | SHA-256 | CoCo initdata spec, extends into vTPM PCR8 |
34+
| TDX firmware (mr_td, rtmr_*) | SHA-384 | Intel TDX architecture requirement |
35+
| SNP firmware (launch_measurement) | SHA-384 | AMD SEV-SNP architecture requirement |
36+
| Azure vTPM PCRs | SHA-256 | TPM 2.0 default bank for virtual TPMs |
37+
38+
The attestation policy verifies each independently - there is no conflict.
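
As a rough illustration of the table above, the following Python sketch maps each layer to its algorithm and derives the hex string length you should expect for a digest from that layer. The layer keys (`init_data`, `tdx_firmware`, and so on) are labels invented for this example, not identifiers used by KBS or veritas.

```python
import hashlib

# Illustrative labels for the rows of the table above (not KBS identifiers).
LAYER_ALGORITHMS = {
    "init_data": "sha256",
    "tdx_firmware": "sha384",
    "snp_firmware": "sha384",
    "azure_vtpm_pcr": "sha256",
}


def expected_hex_length(layer: str) -> int:
    """Return the number of hex characters in a digest for the given layer."""
    algorithm = LAYER_ALGORITHMS[layer]
    # digest_size is in bytes; each byte is written as two hex characters.
    return hashlib.new(algorithm).digest_size * 2


if __name__ == "__main__":
    for layer in LAYER_ALGORITHMS:
        print(layer, expected_hex_length(layer))
```

A digest whose length does not match its layer most likely came from the wrong layer or was truncated during copy and paste.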
39+
40+
## Prerequisites
41+
42+
### 1. Veritas Tool
43+
44+
Veritas is a Python tool for collecting reference values from confidential VMs. Install via pip:
45+
46+
```bash
47+
pip install veritas-collectd
48+
```
49+
50+
**Version requirement**: 0.2.0 or later
51+
52+
### 2. Bare Metal Cluster Access
53+
54+
You need:
55+
- A running bare metal cluster with Intel TDX or AMD SEV-SNP hardware
56+
- KataConfig deployed and in Ready state
57+
- At least one kata pod successfully running (proves TEE is functional)
58+
59+
### 3. Vault Access
60+
61+
You need write access to the Vault instance at `secret/data/hub/firmwareReferenceValues`.
62+
63+
If using the pattern's default Vault setup:
64+
```bash
65+
# Get Vault root token from cluster
66+
oc get secret -n vault vault-init -o jsonpath='{.data.root_token}' | base64 -d
67+
```
68+
69+
## Workflow
70+
71+
### Step 1: Collect Reference Values from Kata Pod
72+
73+
Run a kata pod on the bare metal cluster and use veritas to extract firmware measurements:
74+
75+
```bash
76+
# Create a test pod with kata-remote runtime
77+
oc apply -f - <<EOF
78+
apiVersion: v1
79+
kind: Pod
80+
metadata:
81+
name: firmware-collector
82+
namespace: default
83+
spec:
84+
runtimeClassName: kata-remote
85+
containers:
86+
- name: busybox
87+
image: quay.io/quay/busybox:latest
88+
command: ["sleep", "3600"]
89+
EOF
90+
91+
# Wait for pod to be Running
92+
oc wait --for=condition=Ready pod/firmware-collector -n default --timeout=300s
93+
94+
# Exec into the pod and run veritas
95+
oc exec -it firmware-collector -n default -- sh
96+
97+
# Inside the pod (the container image must provide veritas; a stock busybox image does not):
98+
veritas collect --output /tmp/refvals.json
99+
cat /tmp/refvals.json
100+
exit
101+
102+
# Copy the reference values out
103+
oc cp default/firmware-collector:/tmp/refvals.json ./refvals-$(oc get nodes -o jsonpath='{.items[0].status.nodeInfo.osImage}' | tr ' ' '-').json
104+
```
105+
106+
### Step 2: Transform to Vault Format
107+
108+
Veritas output format differs from what KBS expects. Use the provided script:
109+
110+
```bash
111+
# In the coco-pattern repository root (pass an explicit path if the glob matches more than one file):
112+
make push-firmware-refvals REFVALS_FILE=./refvals-*.json
113+
```
114+
115+
This script:
116+
1. Extracts firmware measurements from veritas JSON
117+
2. Converts to the KBS/RVPS expected format (arrays of hex strings)
118+
3. Merges with any values already in Vault and pushes the result to `secret/data/hub/firmwareReferenceValues`
119+
120+
### Step 3: Vault Secret Format
121+
122+
The script creates a secret with this structure:
123+
124+
```json
125+
{
126+
"mr_td": ["a1b2c3d4..."],
127+
"rtmr_1": ["e5f6a7b8..."],
128+
"rtmr_2": ["c9d0e1f2..."],
129+
"snp_launch_measurement": ["f3e4d5c6..."],
130+
"xfam": ["e742060000000000"]
131+
}
132+
```
133+
134+
**Key points:**
135+
- Each field is an **array** of strings (supports multiple valid values)
136+
- Hash values are lowercase hex strings (SHA-384 = 96 hex chars)
137+
- Empty arrays `[]` mean "not available" - attestation will skip that check
138+
- Missing keys are treated the same as empty arrays
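
The key points above can be checked mechanically before pushing to Vault. The following is a minimal Python sketch, not part of `collect-firmware-refvals.sh`; the function name `validate_refvals` and the choice of which fields must be 96 characters long are assumptions based on the structure shown in this section.

```python
import re

# Lowercase hex only, at least one character.
HEX_RE = re.compile(r"[0-9a-f]+")

# Fields documented above as SHA-384 digests (96 hex characters).
SHA384_FIELDS = ("mr_td", "rtmr_1", "rtmr_2", "snp_launch_measurement")


def validate_refvals(secret: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for key, values in secret.items():
        if not isinstance(values, list):
            problems.append(f"{key}: expected an array, got {type(values).__name__}")
            continue
        for value in values:
            if not isinstance(value, str) or not HEX_RE.fullmatch(value):
                problems.append(f"{key}: {value!r} is not a lowercase hex string")
            elif key in SHA384_FIELDS and len(value) != 96:
                problems.append(f"{key}: expected 96 hex chars, got {len(value)}")
    return problems
```

Empty arrays pass validation, which matches the "not available" meaning described above.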
139+
140+
### Step 4: Verify Upload
141+
142+
```bash
143+
# Check the secret was written (the KV v2 CLI path omits the data/ segment used in the API path)
144+
vault kv get secret/hub/firmwareReferenceValues
145+
146+
# Expected output:
147+
# ====== Data ======
148+
# Key Value
149+
# --- -----
150+
# mr_td ["a1b2c3d4..."]
151+
# rtmr_1 ["e5f6a7b8..."]
152+
# ...
153+
```
154+
155+
### Step 5: Trigger KBS Sync
156+
157+
The trustee-chart creates an ExternalSecret that pulls from this Vault path. Force a sync:
158+
159+
```bash
160+
# On the cluster with KBS deployed:
161+
oc delete externalsecret firmware-refvals-eso -n trustee-operator-system
162+
163+
# Wait for it to recreate (ArgoCD sync-wave or manual re-apply)
164+
# Verify the secret exists:
165+
oc get secret firmware-reference-values -n trustee-operator-system
166+
```
167+
168+
The RVPS will automatically reload reference values from the `rvps-reference-values` ConfigMap.
169+
170+
## Multi-OCP-Version Support
171+
172+
Different OpenShift versions may have different firmware measurements due to kernel/initrd changes. To support multiple versions:
173+
174+
1. **Collect from each version:**
175+
```bash
176+
# OCP 4.18 cluster
177+
veritas collect --output refvals-ocp-4.18.json
178+
179+
# OCP 4.19 cluster
180+
veritas collect --output refvals-ocp-4.19.json
181+
```
182+
183+
2. **Merge the arrays:**
184+
```json
185+
{
186+
"mr_td": ["<4.18-value>", "<4.19-value>"],
187+
"rtmr_2": ["<4.18-kernel>", "<4.19-kernel>"]
188+
}
189+
```
190+
191+
3. **Push merged values to Vault:**
192+
```bash
193+
# kv put replaces the whole secret, so list every key you want to keep
vault kv put secret/hub/firmwareReferenceValues \
194+
mr_td='["val1","val2"]' \
195+
rtmr_1='["val1","val2"]' \
196+
rtmr_2='["val1","val2"]'
197+
```
198+
199+
The attestation policy uses `in` checks - a pod passes if its measurement matches **any** value in the array.
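
The merge step and the `in` check can be sketched in a few lines of Python. This is an illustration only: `merge_refvals` and `measurement_allowed` are hypothetical names, and the real merge is performed by `scripts/collect-firmware-refvals.sh`, whose implementation is not shown here.

```python
def merge_refvals(existing: dict, new: dict) -> dict:
    """Union the arrays per key, keeping first-seen order and dropping duplicates."""
    merged = {}
    for key in sorted(set(existing) | set(new)):
        combined = []
        for value in existing.get(key, []) + new.get(key, []):
            if value not in combined:
                combined.append(value)
        merged[key] = combined
    return merged


def measurement_allowed(refvals: dict, key: str, measured: str) -> bool:
    """Model only the `in` check: the measurement must equal one listed value.

    This does not model the skip-on-empty behaviour described in Step 3.
    """
    return measured in refvals.get(key, [])
```

Deduplicating on merge keeps the arrays from growing when the same cluster version is collected twice.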
200+
201+
## Known Limitations (Veritas Gaps)
202+
203+
As of veritas 0.2.0, the following are **not** collected and must be added manually if needed:
204+
205+
### 1. TCB Version Numbers
206+
207+
Veritas does not extract minimum required TCB levels (e.g., SNP microcode version). To enforce:
208+
209+
```json
210+
{
211+
"tcb_bootloader_min": "3",
212+
"tcb_snp_min": "20",
213+
"tcb_microcode_min": "115"
214+
}
215+
```
216+
217+
Then update the attestation policy to check:
218+
```rego
219+
input.snp.report.reported_tcb.bootloader >= to_number(tcb_bootloader_min)
220+
```
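
Because the minimums above are stored as strings, they need a numeric conversion before comparison; as strings, `"115"` sorts before `"20"`. The Python sketch below shows the intended logic. The mapping from `tcb_*_min` keys to the reported field names `bootloader`, `snp`, and `microcode` is an assumption for illustration and should be checked against the actual attestation claims.

```python
# Assumed mapping from reference-value keys to reported TCB field names.
MIN_KEY_TO_FIELD = {
    "tcb_bootloader_min": "bootloader",
    "tcb_snp_min": "snp",
    "tcb_microcode_min": "microcode",
}


def meets_tcb_minimums(reported: dict, minimums: dict) -> bool:
    """Return True only if every configured minimum is met numerically."""
    for min_key, minimum in minimums.items():
        field = MIN_KEY_TO_FIELD[min_key]
        value = reported.get(field)
        if value is None:
            # A missing component cannot prove it meets the minimum.
            return False
        if int(value) < int(minimum):
            return False
    return True
```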
221+
222+
### 2. SNP Policy Bits
223+
224+
The SNP guest policy contains multiple flags (smt_allowed, migrate_ma, debug, etc.). Veritas reports the full policy word but does not break it into individual enforcement rules.
225+
226+
To enforce specific policy bits, add to attestation policy:
227+
```rego
228+
input.snp.report.policy.smt_allowed == false
229+
input.snp.report.policy.debug == false
230+
```
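
If you only have the raw policy word that veritas reports, the individual flags can be recovered by bit masking. The sketch below assumes the guest policy layout from the AMD SEV-SNP firmware ABI specification (ABI minor in bits 7:0, ABI major in bits 15:8, SMT at bit 16, MIGRATE_MA at bit 18, DEBUG at bit 19, SINGLE_SOCKET at bit 20); verify the positions against the specification revision for your firmware before relying on them.

```python
# Bit positions assumed from the AMD SEV-SNP ABI guest policy layout.
SNP_POLICY_BITS = {
    "smt_allowed": 16,
    "migrate_ma": 18,
    "debug": 19,
    "single_socket": 20,
}


def decode_snp_policy(policy: int) -> dict:
    """Split a raw SNP guest policy word into named flags and ABI version."""
    flags = {name: bool((policy >> bit) & 1) for name, bit in SNP_POLICY_BITS.items()}
    flags["abi_minor"] = policy & 0xFF
    flags["abi_major"] = (policy >> 8) & 0xFF
    return flags


if __name__ == "__main__":
    # 0x30000 sets bit 16 (SMT) and the reserved bit 17.
    print(decode_snp_policy(0x30000))
```

Decoding the word this way makes it easier to see which flag a failing policy rule is reacting to.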
231+
232+
### 3. Container Image Measurements
233+
234+
Veritas does not measure the application container image digest. Image policy enforcement is handled separately via:
235+
- Confidential Data Hub (CDH) pulling image from KBS
236+
- Kyverno policies validating image signatures (cosign, Notary)
237+
238+
## Troubleshooting
239+
240+
### Veritas collection fails
241+
242+
**Symptom:** `veritas collect` returns empty or errors
243+
244+
**Check:**
245+
1. Pod is using `kata-remote` RuntimeClass
246+
2. Pod is actually running on bare metal (not Azure peer-pods)
247+
3. TEE device exists inside the guest: `ls /dev/tdx_guest` (TDX) or `ls /dev/sev-guest` (SNP)
248+
249+
### KBS attestation still passes without firmware values
250+
251+
**Expected behavior:** The attestation policy has backwards-compatible fallback rules. If no firmware reference values are in RVPS, the policy only checks `init_data`.
252+
253+
To **enforce** firmware, remove the fallback rules from `attestation-policy.yaml`:
254+
```rego
255+
# Remove these "hardware := 2 if count(query_reference_value(...)) == 0" rules
256+
```
257+
258+
### Hash mismatch after cluster upgrade
259+
260+
**Cause:** Kernel/firmware updated, changing rtmr_2 or mr_td
261+
262+
**Fix:** Re-collect firmware values from upgraded cluster, merge into Vault arrays
263+
264+
## Security Considerations
265+
266+
### Firmware Reference Values Are Sensitive
267+
268+
These values reveal the exact firmware/kernel configuration of your confidential cluster. Treat them as **confidential**:
269+
270+
- Store in Vault with ACLs restricting read access
271+
- Do not commit to public Git repositories
272+
- Rotate if disclosed (re-image nodes with different firmware if possible)
273+
274+
### Attestation Policy Trade-offs
275+
276+
Strict firmware enforcement provides stronger security but reduces operational flexibility:
277+
278+
| Policy | Security | Flexibility |
279+
|--------|----------|-------------|
280+
| init_data only | Medium | High - easy upgrades |
281+
| init_data + firmware | High | Low - every kernel update requires reference value refresh |
282+
| init_data + firmware + TCB min | Highest | Lowest - blocks old firmware entirely |
283+
284+
Choose the level appropriate for your threat model.
285+
286+
### Debug Mode
287+
288+
The attestation policy enforces `debug == false` for both TDX and SNP. Debug mode allows:
289+
- Memory inspection via hypervisor
290+
- Single-stepping the guest
291+
- Extracting secrets from guest memory
292+
293+
**Production workloads must run with debug disabled.** If attestation fails due to debug mode, do not disable the check - fix the KataConfig to disable debug.
294+
295+
## References
296+
297+
- [Veritas Documentation](https://github.com/confidential-containers/veritas)
298+
- [Intel TDX Attestation Spec](https://www.intel.com/content/www/us/en/developer/articles/technical/intel-trust-domain-extensions.html)
299+
- [AMD SEV-SNP Attestation Spec](https://www.amd.com/system/files/TechDocs/56860.pdf)
300+
- [CoCo Attestation Architecture](https://github.com/confidential-containers/attestation-service)
