
Commit ca82b82

plan(super-domain-rbac-v1): §13 refinements — compose onto PolicyRewriter, A+B+C federation, merkle hard-lock, anonymized researcher
Five same-session refinements folded in additively (§1-§12 architecture unchanged):

(1) Enforcement composes onto shipped lance-graph-callcenter::policy PolicyRewriter chain + PolicyKind taxonomy (RowFilter / ColumnMask / RowEncryption / DifferentialPrivacy / Audit). The 4-stage authorize() maps 1:1 onto PolicyKind variants — no parallel enforcement path. ~30% Tier A LOC reduction.

(2) Cross-tenant federation upgraded: A (PureWall) + B (KAnonymity) + C (EncryptedViewAggregate) all accepted. Option C lifted from 2027+ R&D track to viable now via LanceDB transparent encrypted views — the engine scans/filters/aggregates over encrypted columns without decrypting full rows.

(3) Audit chain integrity built-in via shipped MerkleRoot::from_fingerprint + ClamPath from graph/spo/merkle.rs. AuditEntry carries merkle_root + clam_path + super_domain_salt; HIPAA reviewers detect post-hoc tampering because the merkle would not validate.

(4) Hard-lock requirement formalized: Healthcare ↔ OSINT (and 3 other pairs) get 3 layers of cryptographic defense — predicate-time rejection + per-super-domain merkle salt + super-domain-scoped HKDF key derivation. Patient history and OSINT cannot be jointly queried under any role; a leaked row decrypts only with both tenant DEK AND super-domain context.

(5) researcher role hardened to anonymized-projection-only: PermissionSet::READ only, no WRITE/EXPORT/REDACT_LIFT, k-anonymity floor (k≥5 default; per-super-domain override for rare-condition Healthcare research), DP noise auto-injected on aggregates via PolicyKind::DifferentialPrivacy.

New deliverables: D-SDR-13 (per-SD merkle salt + HKDF), D-SDR-14 (updated AuditEntry + tamper-detect replay), D-SDR-15 (DP for researcher), D-SDR-16 (EncryptedViewAggregate federation), D-SDR-17 (hard-lock partner matrix enforcement).

Resolved open questions: audit format choice + cross-tenant federation.

New open questions: hard-lock partner matrix completeness + per-SD DP epsilon defaults + merkle salt rotation cadence + per-SD k-anonymity floor overrides.

INTEGRATION_PLANS.md correction line appended per APPEND-ONLY governance.
1 parent 5b3ba39 commit ca82b82

2 files changed

Lines changed: 182 additions & 0 deletions


.claude/board/INTEGRATION_PLANS.md

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@
**Cross-ref:** `palantir-parity-cascade-v2.md` (this spec adds the enforcement surface), `lance-graph-ontology-v5.md` (this spec sits above v5; v5 unchanged), `GLUE_LAYER_OGIT_TO_OWL_SPEC.md` (source for OWL property characteristics bitfield).
**Open questions:** Foundry ObjectType cross-walk targets, Wikidata QID mappings, audit format choice (JSON Lines / CloudEvents / OTel), DEK rotation cadence, escalation UX, HPO/MONDO multi-member confirmation, slot 0xFF schema-only convention.

**Correction (2026-05-13):** §13 refinements added (same session). (a) Enforcement composes onto shipped `lance-graph-callcenter::policy::PolicyRewriter` chain + `PolicyKind` taxonomy (RowFilter/ColumnMask/RowEncryption/DifferentialPrivacy/Audit) rather than introducing parallel path — ~30% Tier A LOC reduction. (b) Cross-tenant federation upgraded to A+B+C all accepted; Option C (`EncryptedViewAggregate`) viable now via LanceDB transparent encrypted views, not 2027+ R&D. (c) Audit chain integrity built-in via `MerkleRoot::from_fingerprint` + `ClamPath` from `graph/spo/merkle.rs` (the merkle/DN-path mixing already shipped). (d) Hard-lock requirement formalized: Healthcare ↔ OSINT (and 3 other pairs) get 3 layers of defense — predicate + per-super-domain merkle salt + super-domain-scoped HKDF key derivation. (e) `researcher` role hardened to anonymized-projection-only with k-anonymity floor + DP noise injection on aggregates. New deliverables D-SDR-13..17 added. Open questions on audit format + cross-tenant federation RESOLVED; new open questions on hard-lock partner matrix + per-super-domain DP epsilon + merkle salt rotation cadence.

---

## v1 — LF Integration Mapping (authored 2026-04-25)

.claude/plans/super-domain-rbac-tenancy-v1.md

Lines changed: 180 additions & 0 deletions
@@ -577,3 +577,183 @@ A is the conservative HIPAA-defensible default. B is the practical compromise fo
## 12 — One-line summary

> 4-level hierarchy (meta-anchor → super domain → OGIT basin → slot), 6 bytes per row (4-byte tenant + 2-byte OWL identity), inline per-family codebook with label+schema+verbs, single masked predicate enforces tenant + super-domain + role + slot in one DataFusion vector pass. Foundry parity at the enforcement surface, sub-microsecond hot path.

---

## 13 — Refinements (2026-05-13, same session)

Post-draft user feedback surfaced four substrate facts and two requirement upgrades. All folded in as additive corrections — the §3 DTOs, §8 deliverables, and §12 summary remain valid; this section makes the underlying compositor explicit and tightens the federation + hard-lock policy.

### 13.1 The compositor is already shipped: `lance-graph-callcenter::policy`

The 4-stage `UnifiedBridge::authorize()` (§3.9) is **not** a new enforcement layer — it composes against `lance-graph-callcenter/src/policy.rs`'s shipped `PolicyRewriter` trait + `PolicyKind` taxonomy:

```rust
pub enum PolicyKind {
    RowFilter,           // tenant + super-domain + basin bitmask predicates
    ColumnMask,          // per-slot RedactionMask (Null/Constant/Hash/Truncate)
    RowEncryption,       // per-tenant DEK at LanceDB column level
    DifferentialPrivacy, // k-anonymity aggregate noise (federation Option B)
    Audit,               // side-channel emission per access
}
```

Each of the 4 stages, plus audit emission, maps onto existing `PolicyKind` variants:

| `authorize()` stage | `PolicyKind` |
|---|---|
| 1. Chinese wall (tenant) | `RowFilter` (existing `RlsRewriter` as ancestor) |
| 2. Super-domain | `RowFilter` (additional predicate, same rewriter chain) |
| 3. Role group | `ColumnMask` (drives slot-level visibility per `RedactionMode`) |
| 4. Slot redaction | `ColumnMask` + `RowEncryption` (when `clearance_floor ≥ Confidential`) |
| Audit emission | `Audit` (composed last) |

**Consequence:** Tier A deliverables (D-SDR-1..5) **wire onto the existing `PolicyRewriter` chain** rather than introducing a parallel enforcement path. ~30% LOC reduction on Tier A. The DataFusion `OptimizerRule` machinery already handles the predicate-vector composition described in §3.10.
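
The stage table above can be sketched as a small lookup. This is an illustration only: `AuthorizeStage`, `policy_kinds_for`, and `plan_chain` are hypothetical names that do not appear in the spec, `PolicyKind` is redeclared so the block compiles alone, and the boolean `confidential_or_above` stands in for the `clearance_floor ≥ Confidential` condition.

```rust
/// Redeclared from the excerpt above so this sketch compiles on its own.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PolicyKind {
    RowFilter,
    ColumnMask,
    RowEncryption,
    DifferentialPrivacy,
    Audit,
}

/// Hypothetical names for the four `authorize()` stages plus audit emission.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuthorizeStage {
    ChineseWall,
    SuperDomain,
    RoleGroup,
    SlotRedaction,
    AuditEmission,
}

/// Policy kinds one stage contributes, following the table in §13.1.
/// `confidential_or_above` is an assumed stand-in for the clearance check.
pub fn policy_kinds_for(stage: AuthorizeStage, confidential_or_above: bool) -> Vec<PolicyKind> {
    match stage {
        AuthorizeStage::ChineseWall | AuthorizeStage::SuperDomain => vec![PolicyKind::RowFilter],
        AuthorizeStage::RoleGroup => vec![PolicyKind::ColumnMask],
        AuthorizeStage::SlotRedaction if confidential_or_above => {
            vec![PolicyKind::ColumnMask, PolicyKind::RowEncryption]
        }
        AuthorizeStage::SlotRedaction => vec![PolicyKind::ColumnMask],
        AuthorizeStage::AuditEmission => vec![PolicyKind::Audit],
    }
}

/// Flattens all stages, in order, into one chain with audit composed last.
pub fn plan_chain(confidential_or_above: bool) -> Vec<PolicyKind> {
    const STAGES: [AuthorizeStage; 5] = [
        AuthorizeStage::ChineseWall,
        AuthorizeStage::SuperDomain,
        AuthorizeStage::RoleGroup,
        AuthorizeStage::SlotRedaction,
        AuthorizeStage::AuditEmission,
    ];
    STAGES
        .iter()
        .flat_map(|&stage| policy_kinds_for(stage, confidential_or_above))
        .collect()
}
```

Calling `plan_chain(true)` yields a chain that includes `RowEncryption` and still ends in `Audit`. How the shipped `PolicyRewriter` chain actually orders or deduplicates rewriters is not shown in this section, so the flattening here is an assumption.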

### 13.2 LanceDB transparent encryption upgrades Option C from R&D to viable

Earlier framing of cross-tenant federation (§6) classified Option C (homomorphic-encryption aggregate) as a 2027+ R&D track. **Correction:** LanceDB ships **transparent encrypted views** at the column level — the engine scans/filters/aggregates over encrypted columns without decrypting full rows, with key access gated by tenant DEK. This is the substrate Option C needs without bespoke FHE primitives.

**Updated federation policy:**

```rust
#[repr(u8)]
pub enum FederationPolicy {
    PureWall = 0,               // default — no cross-tenant queries
    KAnonymityAggregate = 1,    // Phase 2 — k ≥ 5 via PolicyKind::DifferentialPrivacy
    EncryptedViewAggregate = 2, // Phase 2-3 — LanceDB transparent encrypted view + per-tenant DEK
                                // (was Option C, now viable; not slow)
}
```

**Tier E (D-SDR-12) scope expands:** ship A+B together as Phase 2; add Phase 3 EncryptedViewAggregate path that lifts the k-anonymity threshold for tenants whose data column is encrypted at rest with their own DEK (the engine aggregates over ciphertext when the operation is sum/count/avg with bounded sensitivity).
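
A minimal sketch of how the three federation policies might gate a cross-tenant aggregate. `AggregateOp`, `FederationDecision`, and `decide` are hypothetical names, not part of the spec. The fallback for operations other than sum/count/avg under `EncryptedViewAggregate` is an assumption: the spec only says the ciphertext path applies to sum/count/avg.

```rust
/// Redeclared from the excerpt above, with derives added for the sketch.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FederationPolicy {
    PureWall = 0,
    KAnonymityAggregate = 1,
    EncryptedViewAggregate = 2,
}

/// Hypothetical aggregate operations; only Sum/Count/Avg are named in §13.2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AggregateOp {
    Sum,
    Count,
    Avg,
    Max,
}

/// Hypothetical outcome of the federation gate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FederationDecision {
    Deny,
    AllowWithKAnonymity,
    AllowOverCiphertext,
}

/// Decides whether a cross-tenant aggregate may run and on which path.
pub fn decide(
    policy: FederationPolicy,
    op: AggregateOp,
    contributing_rows: usize,
    k: usize,
) -> FederationDecision {
    let meets_floor = contributing_rows >= k;
    match policy {
        // Default: no cross-tenant queries at all.
        FederationPolicy::PureWall => FederationDecision::Deny,
        FederationPolicy::KAnonymityAggregate if meets_floor => {
            FederationDecision::AllowWithKAnonymity
        }
        FederationPolicy::KAnonymityAggregate => FederationDecision::Deny,
        FederationPolicy::EncryptedViewAggregate => match op {
            // Bounded-sensitivity ops aggregate over ciphertext; the k floor is lifted.
            AggregateOp::Sum | AggregateOp::Count | AggregateOp::Avg => {
                FederationDecision::AllowOverCiphertext
            }
            // Assumption: any other op falls back to the k-anonymity rule.
            AggregateOp::Max if meets_floor => FederationDecision::AllowWithKAnonymity,
            AggregateOp::Max => FederationDecision::Deny,
        },
    }
}
```

For example, `decide(FederationPolicy::EncryptedViewAggregate, AggregateOp::Count, 2, 5)` is allowed over ciphertext even though only two rows contribute, which is the threshold lift the Tier E paragraph describes.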

### 13.3 Merkle + ClamPath integration: audit chain + hard-lock attestation

`crates/lance-graph/src/graph/spo/merkle.rs` ships:

```rust
pub struct MerkleRoot(pub u64); // XOR-fold hash of fingerprint content
impl MerkleRoot {
    pub fn from_fingerprint(fp: &Fingerprint) -> Self { /* ... */ }
}

pub struct ClamPath {  // hierarchical DN address
    pub path: String,  // "agent:test:node"
    pub depth: u32,
}
```

**The merkle/DN-path mixing the user remembered is here** — `MerkleRoot` stamps content, `ClamPath` carries the hierarchical address (the same shape as `DnPath` from `lance-graph-callcenter::dn_path`).

**Wire into the spec:**

- **Audit chain integrity:** every `AuditEntry` (Tier D, D-SDR-10) carries the `MerkleRoot` of the row at access time. Each later access records its own root, so the audit log holds one root per access and HIPAA reviewers can detect post-hoc tampering: a row altered after the fact no longer validates against the root recorded for it.
- **Hard-lock attestation (§13.4):** the cryptographic separation between Healthcare and OSINT super domains is attested by **distinct merkle root salts per super domain**. A row whose merkle root validates against the OSINT salt cannot validate against the Healthcare salt, so a leaked row is provably mis-routed at integrity-check time even if the predicate filter is misconfigured.

**Updated `AuditEntry` shape** (Tier D refinement):

```rust
pub struct AuditEntry {
    pub tenant: TenantId,
    pub super_domain: SuperDomain,
    pub actor_role: &'static str,
    pub owl: OwlIdentity,
    pub op: u8,                  // PermissionSet bit
    pub merkle_root: MerkleRoot, // NEW: fingerprint at access time
    pub clam_path: ClamPath,     // NEW: hierarchical DN address
    pub timestamp: u64,
    pub super_domain_salt: u64,  // NEW: per-super-domain merkle salt
}
```
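
The salted-root idea can be illustrated with a toy fold. This is a sketch under stated assumptions: the body of the shipped `MerkleRoot::from_fingerprint` is not shown in this document, so `salted_root`, `AuditRecord`, `record_access`, and `verify_on_replay` are hypothetical, and the fingerprint is modelled as a plain slice of `u64` words. A rotate-and-XOR fold is not collision resistant, so it demonstrates the check rather than a production-grade integrity hash.

```rust
/// Stand-in for the shipped `MerkleRoot` newtype.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MerkleRoot(pub u64);

/// Folds fingerprint words into a root, seeded with the super-domain salt.
/// Rotation makes the fold order-sensitive; the real algorithm may differ.
pub fn salted_root(fingerprint_words: &[u64], super_domain_salt: u64) -> MerkleRoot {
    let folded = fingerprint_words
        .iter()
        .fold(super_domain_salt, |acc, &word| acc.rotate_left(7) ^ word);
    MerkleRoot(folded)
}

/// Hypothetical slice of `AuditEntry`: only the integrity-related fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AuditRecord {
    pub merkle_root: MerkleRoot,
    pub super_domain_salt: u64,
}

/// Records the row's salted root at access time.
pub fn record_access(fingerprint_words: &[u64], super_domain_salt: u64) -> AuditRecord {
    AuditRecord {
        merkle_root: salted_root(fingerprint_words, super_domain_salt),
        super_domain_salt,
    }
}

/// On replay, recomputes the root from the row as it is now and compares it
/// with the recorded one. A mismatch signals tampering or a mis-routed row.
pub fn verify_on_replay(record: &AuditRecord, current_words: &[u64]) -> bool {
    salted_root(current_words, record.super_domain_salt) == record.merkle_root
}
```

Because each fold step is invertible in the accumulator, two different salts always produce different roots for the same row, which is the property the hard-lock attestation bullet relies on.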

### 13.4 Hard-lock requirement: Healthcare ↔ OSINT crypto barrier

**HIPAA compliance and clinical staff trust require a guarantee stronger than predicate filtering between patient history and OSINT.** The user's framing: "doctors will want to know that patient history and OSINT are hard lock."

**Updated DTO:**

```rust
pub struct SuperDomainEntry {
    // ... fields as in §3.4 ...
    pub merkle_salt: u64,                           // NEW: per-super-domain integrity salt
    pub hard_lock_partners: &'static [SuperDomain], // NEW: explicit cryptographic separation
}
```

**Per-super-domain hard-lock matrix (initial):**

| Super domain | `hard_lock_partners` (cannot share rows or be queried jointly under any role) |
|---|---|
| `Healthcare` | `[OSINT]` |
| `OSINT` | `[Healthcare]` |
| `WorkOrderBilling` | `[OSINT]` (financial confidentiality) |
| `Science` (when ITAR-tagged) | `[OSINT]` (export control vs intel) |

**Enforcement mechanism (3 layers of defense):**

1. **Predicate-time:** `authorize()` rejects any query whose `super_domain_target` is in the actor's `hard_lock_partners` list, even if the actor has the source super domain authorized.
2. **Integrity-time:** each super domain has a distinct `merkle_salt`. A misconfigured query that bypasses (1) cannot validate cross-domain merkle roots.
3. **Encryption-time:** rows in a hard-locked super domain are encrypted with super-domain-scoped key derivation (per-tenant DEK × per-super-domain HKDF info string). A leaked row decrypts only with both the tenant DEK *and* the super-domain context — neither alone suffices.
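
Layer 1 (predicate-time rejection) can be sketched as a static table plus a check. The names `partners`, `check_hard_lock`, and `RbacError::HardLockViolation` are hypothetical, and the ITAR condition on `Science` is modelled as a plain boolean. The check is applied in both directions, which is an assumption: the matrix as written lists `OSINT` under `WorkOrderBilling` but not the reverse.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SuperDomain {
    Healthcare,
    Osint,
    WorkOrderBilling,
    Science,
}

/// Hypothetical error variant for a rejected cross-domain query.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RbacError {
    HardLockViolation { actor: SuperDomain, target: SuperDomain },
}

/// Static hard-lock matrix from §13.4. `science_is_itar` is an assumed flag
/// for the "when ITAR-tagged" condition on `Science`.
fn partners(domain: SuperDomain, science_is_itar: bool) -> &'static [SuperDomain] {
    match domain {
        SuperDomain::Healthcare => &[SuperDomain::Osint],
        SuperDomain::Osint => &[SuperDomain::Healthcare],
        SuperDomain::WorkOrderBilling => &[SuperDomain::Osint],
        SuperDomain::Science if science_is_itar => &[SuperDomain::Osint],
        SuperDomain::Science => &[],
    }
}

/// Rejects a query when actor and target super domains are hard-locked.
/// Checking both directions is a design choice of this sketch.
pub fn check_hard_lock(
    actor: SuperDomain,
    target: SuperDomain,
    science_is_itar: bool,
) -> Result<(), RbacError> {
    let locked = partners(actor, science_is_itar).contains(&target)
        || partners(target, science_is_itar).contains(&actor);
    if locked {
        Err(RbacError::HardLockViolation { actor, target })
    } else {
        Ok(())
    }
}
```

With the symmetric check, an OSINT actor targeting `WorkOrderBilling` is rejected even though the `OSINT` row of the matrix does not list it. Layers 2 and 3 (salted roots and HKDF-scoped keys) are not covered by this sketch.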

**Sales narrative refresh:**

> "Patient history and OSINT are hard-locked. Three layers of defense — predicate, merkle salt, key derivation. A clinician's bridge cannot construct a query that joins patient records with intel; the optimizer rejects it, the merkle would not validate, and the encryption keys won't combine. HIPAA reviewers see a cryptographically attested separation, not a policy promise."

### 13.5 Research role: anonymized projection only

**The `researcher` role from §4.3 is upgraded to a hard requirement, not a configuration knob.** Per user: "research using anonymized."

**Updated `researcher` role definition:**

```rust
RoleGroup {
    role_name: "researcher",
    permissions: PermissionSet(PermissionSet::READ), // no WRITE, no EXPORT, no REDACT_LIFT
    clearance_floor: ClearanceLevel(1),              // Restricted; never elevated
    audit_required: true,                            // every access logged with k-anonymity check
    redaction_mask: FieldRedactionMask {
        readable_slots: BIT_SET_DEIDENTIFIED_ONLY,   // only de-identified slots visible
        writable_slots: BitSet256([0; 4]),           // empty — researchers never write
        redacted_slots: BIT_SET_DIRECT_IDENTIFIERS,  // name, SSN, DOB, MRN, address — all hashed
    },
},
```

**Composes with `PolicyKind::DifferentialPrivacy`:** when the researcher role queries an aggregate, the optimizer chain auto-injects DP noise per the differential-privacy parameter `ε` configured at the super-domain level (`SuperDomainEntry.dp_epsilon: f32`, NEW field).

**Three additive constraints for the researcher role:**

1. **Field-level:** direct identifiers always hashed (k-anonymity-style pseudonymization).
2. **Row-level:** queries that match fewer than k rows (k = 5 by default) error out with `RbacError::KAnonymityViolation` rather than returning thin slices.
3. **Aggregate-level:** when the federated-aggregation gate (§13.2) is enabled, cross-tenant aggregates pass through the encrypted-view path — researcher never sees any tenant's raw values, even pseudonymized.
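
The row-level floor and the DP noise step can be sketched together for a count query. This is an illustration under assumptions: `researcher_count` and `laplace_noise` are hypothetical names, the Laplace mechanism is one common way to realize ε-differential privacy and is not confirmed as the spec's choice, and the uniform draw `u` is passed in by the caller so the block needs no random-number crate. The spec declares `dp_epsilon` as `f32`; `f64` is used here for simplicity.

```rust
/// Error variant named in §13.5; the payload fields are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub enum RbacError {
    KAnonymityViolation { matched_rows: usize, k: usize },
}

/// Laplace noise by inverse-CDF sampling.
/// `u` must be a uniform draw from the open interval (-0.5, 0.5).
/// The noise scale is `sensitivity / epsilon`.
pub fn laplace_noise(u: f64, sensitivity: f64, epsilon: f64) -> f64 {
    let scale = sensitivity / epsilon;
    -scale * u.signum() * (1.0 - 2.0 * u.abs()).ln()
}

/// Returns a noised row count for the researcher role, or an error when the
/// query matches fewer than `k` rows.
pub fn researcher_count(
    matched_rows: usize,
    k: usize,
    epsilon: f64,
    u: f64,
) -> Result<f64, RbacError> {
    if matched_rows < k {
        return Err(RbacError::KAnonymityViolation { matched_rows, k });
    }
    // Adding or removing one row changes a count by at most 1, so sensitivity is 1.
    Ok(matched_rows as f64 + laplace_noise(u, 1.0, epsilon))
}
```

A smaller `epsilon` gives a larger noise scale, which is why the per-super-domain defaults are listed as an open question for statistician review.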

### 13.6 Net architecture diff vs §1-§12

| Aspect | §1-§12 baseline | §13 refinement |
|---|---|---|
| Enforcement mechanism | New 4-stage `authorize()` | Composes onto shipped `PolicyRewriter` chain in `lance-graph-callcenter::policy` |
| Federation | A+B accepted, C deferred to 2027+ | A+B+C all accepted; C uses LanceDB transparent encrypted view |
| Audit format | TBD (open question) | `AuditEntry` carries `MerkleRoot + ClamPath + super_domain_salt`; tamper-detection built in |
| Cross-domain leakage | Predicate filter + per-tenant DEK | + per-super-domain merkle salt + super-domain-scoped key derivation = 3 layers |
| Researcher role | Optional configuration | Hard requirement: anonymized projection + k-anonymity floor + DP noise on aggregates |
| Hard-lock pairs | Not specified | Healthcare ↔ OSINT, WorkOrderBilling ↔ OSINT, Science(ITAR) ↔ OSINT |

### 13.7 Updated open questions (§10 carry-over + new)

- ~~**Audit format choice**~~ — RESOLVED in §13.3: `AuditEntry` shape with merkle + ClamPath + salt. JSON Lines for serialization, OTel optional bridge.
- ~~**Cross-tenant federation**~~ — RESOLVED in §13.2: A+B+C all accepted.
- **Hard-lock partner matrix** — confirm the initial 4 matrix rows in §13.4 (3 distinct pairs, per §13.6) are correct; any additional pairs (e.g., Genetics ↔ OSINT?) before locking.
- **Per-super-domain DP epsilon defaults** — `dp_epsilon` per super domain (Healthcare = 1.0? OSINT = 0.1?) needs statistician-level review.
- **Merkle salt rotation** — quarterly per-super-domain salt rotation for audit-chain unforgeability; aligns with DEK rotation cadence.
- **K-anonymity floor for `researcher`** — k=5 default; per-super-domain override needed (Healthcare typically k=10 for rare-condition research).

### 13.8 Tier additions

- **D-SDR-13** — `merkle_salt` field on `SuperDomainEntry` + per-super-domain HKDF context derivation in `TenantContext::encryption_key`. ~80 LOC + 4 integration tests covering hard-lock crypto barrier.
- **D-SDR-14** — `AuditEntry` updated schema (merkle + ClamPath + salt) + `JsonLinesAuditSink` impl that includes integrity verification on replay. ~120 LOC + 6 tests including post-hoc tamper detection.
- **D-SDR-15** — `PolicyKind::DifferentialPrivacy` activation for `researcher` role: aggregate-only enforcement + ε-bounded noise injection + k-anonymity floor check. ~150 LOC + 5 tests.
- **D-SDR-16** — `EncryptedViewAggregate` federation policy: LanceDB transparent encrypted view bridge for cross-tenant aggregate. ~200 LOC + 4 integration tests against an actual encrypted column.
- **D-SDR-17** — Hard-lock partner matrix as static table + predicate-time enforcement in `authorize()`. ~60 LOC + 4 tests covering each documented pair.

**Status:** Refinements are additive to the §1-§12 architecture. No prior DTO removed; all existing fields stay. Merkle/audit/hard-lock weave through the existing 4-stage flow as policy-rewriter composition rather than parallel paths.
