Skip to content

Commit 121e4f5

Browse files
Address code review feedback: Cast support, doc fix, and refactor
- Add SqlExpr::Cast handling in sql_ast_to_df_expr with basic type mapping (Utf8, Int16/32/64, Float32/64, Boolean); unknown targets return a clear error - Fix stale doc comment on ObligationEffects.row_filters: cross-policy semantics are AND not OR (matches the bug fix in the last commit) - Make FunctionArguments::Subquery return an explicit error instead of silently producing empty args - Remove redundant use StringArray in two test functions (already in scope) - Extract visibility_matches to crate::policy_match module as single source of truth (used by both engine/mod.rs and hooks/policy.rs)
1 parent cd52042 commit 121e4f5

5 files changed

Lines changed: 265 additions & 29 deletions

File tree

ideas.md

Lines changed: 116 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,115 @@
8787

8888
> **See also:** DS-14 (schema-level deny), DS-15 (table-level deny) in `permission_stories.md`
8989
90+
### Validate `deny` + `column_mask` Combination
91+
92+
- **Problem**: The codebase silently ignores `column_mask` obligations on deny-effect policies. The `PolicyHook` only processes `column_mask` from permit policies, so if a user creates a deny policy with a `column_mask` obligation, it has no effect — the column is not masked.
93+
- **Recommendation**: Add validation in both API and UI to prevent this invalid combination:
94+
- **API**: Reject policy creation/update if `effect: "deny"` and `obligation_type: "column_mask"` are both present. Return a clear validation error.
95+
- **UI**: When "deny" effect is selected, hide `column_mask` from the available obligation types in the policy creation form. Show a tooltip or help text explaining why (e.g., "Column masking is not supported on deny policies").
96+
97+
### Conditional Column Masking
98+
99+
- **Use case**: Mask sensitive columns only when certain user attributes match a condition. For example:
100+
- Mask `salary` when `user.team != 'finance'`
101+
- Mask `ssn` when `user.role != 'admin'`
102+
- Mask `customer_email` when `user.region != user.customer_region`
103+
- **Proposed syntax**:
104+
```json
105+
{
106+
"schema": "hr",
107+
"table": "employees",
108+
"column": "salary",
109+
"mask_expression": "'***'",
110+
"condition": "user.team != 'finance'"
111+
}
112+
```
113+
- **Behavior**: If the condition evaluates to true, apply the mask. If false, return the original column value.
114+
- **Implementation**: Extend `ColumnMaskDef` with an optional `condition` field. At query time, evaluate the condition (similar to how `{user.*}` variables are substituted) and only apply the mask expression if true.
115+
- **Alternative**: Could also support "mask else original" semantics where a different mask is applied when condition is false, but the simple conditional masking covers most use cases.
116+
117+
### Conditional Obligations (All Types)
118+
119+
Should every obligation type support conditional application? This would allow policies that activate only under certain conditions:
120+
121+
| Obligation Type | Conditional Use Case | Complexity |
122+
|-----------------|---------------------|------------|
123+
| `row_filter` | Filter rows only for non-admin users | Low - filter_expression IS the condition |
124+
| `column_mask` | Mask sensitive data for non-permissioned users | Medium - proposed above |
125+
| `column_access deny` | Hide columns from non-admin users | Medium |
126+
| `object_access deny` | Hide schemas/tables from certain teams | Medium |
127+
128+
**Option A: Per-obligation condition field (recommended)**
129+
130+
Each obligation type gets an optional `condition` field:
131+
132+
```json
133+
// Row filter - condition is redundant, filter_expression IS the condition
134+
{
135+
"obligation_type": "row_filter",
136+
"condition": "user.role != 'admin'", // redundant, but consistent
137+
"definition": { "schema": "orders", "table": "*", "filter_expression": "tenant_id = {user.tenant}" }
138+
}
139+
140+
// Column mask - condition adds fine-grained control
141+
{
142+
"obligation_type": "column_mask",
143+
"condition": "user.role != 'admin'",
144+
"definition": { "schema": "hr", "table": "employees", "column": "ssn", "mask_expression": "'***-**-****'" }
145+
}
146+
147+
// Column access deny - hide columns conditionally
148+
{
149+
"obligation_type": "column_access",
150+
"condition": "user.clearance_level < 5",
151+
"definition": { "schema": "secret", "table": "files", "action": "deny", "columns": ["content"] }
152+
}
153+
154+
// Object access deny - hide tables conditionally
155+
{
156+
"obligation_type": "object_access",
157+
"condition": "user.team != 'executive'",
158+
"definition": { "schema": "analytics", "action": "deny" }
159+
}
160+
```
161+
162+
**Option B: Condition IN the obligation definition**
163+
164+
Embed the condition inside the definition object rather than as a top-level field. More compact but less consistent across types.
165+
166+
**Option C: Split into two policies**
167+
168+
Current workaround: Create separate policies for each condition. For example:
169+
- Policy 1: `row_filter` for regular users
170+
- Policy 2: No obligations for admins (implicit allow)
171+
172+
This works but creates policy explosion when combining multiple conditions.
173+
174+
**Recommendation**: Go with **Option A** - add optional `condition` field to all obligation types. This provides:
175+
- Consistency across all obligation types
176+
- Clear semantics: obligation only applies when condition is true
177+
- Future-proof: easy to extend with more complex condition expressions
178+
- Backward compatible: condition is optional, existing policies work unchanged
179+
180+
**Condition expression syntax**:
181+
- Reuse same expression parser as `filter_expression` / `mask_expression`
182+
- Available variables: `{user.*}` substitutions (tenant, username, id, role, team, etc.)
183+
- Operators: `=`, `!=`, `<`, `>`, `<=`, `>=`, `AND`, `OR`, `NOT`, `IN`
184+
- Examples: `user.role = 'admin'`, `user.team NOT IN ('sales', 'marketing')`, `user.clearance_level >= 3`
185+
186+
**Priority when multiple conditions match**:
187+
- If multiple policies have conditions that all evaluate to true, apply all obligations (AND semantics, same as now)
188+
- If a condition evaluates to false, that obligation is skipped
189+
- Order: evaluate all conditions first, then apply matching obligations
190+
191+
**Implementation plan**:
192+
1. Add `condition` field to `Obligation` struct (optional, nullable)
193+
2. Add condition evaluation helper (reuses existing `{user.*}` substitution logic)
194+
3. Update `ObligationEffects::collect()` to check condition before including each obligation
195+
4. Update tests to cover conditional obligations
196+
197+
> **See also:** Related to DM-03 (mask vs. hide decision), DM-04 (canary rollout for testing policies on subset of users)
198+
90199
## UI/UX Improvements
91200

92201
### User Name, Datasource Name, Policy Name Validation
@@ -240,6 +349,13 @@ Given complexity of new policy system (interaction with DataFusion and PostgreSQ
240349
- 2026-03-08: Row filter policy interaction bug - when two separate row filter policies are enabled (e.g., tenant filter on tenant='foo' AND state filter on state!='WY'), the result contains more rows than either policy alone. Both tenant 'foo' rows AND non-WY state rows appear, rather than rows satisfying BOTH conditions.
241350
- Sometimes SQL queries take long time and cause UI to hang - need performance testing, may be missing indexes
242351

352+
### Git Commit Hook Improvements
353+
354+
- Current: pre-commit hook runs `cargo fmt`, `cargo clippy`, `npm run typecheck`, `npm run test:run` on ALL changes (staged + unstaged)
355+
- Problem: Uncommitted unstaged changes cause false failures - hook fails because of code in files you haven't committed yet
356+
- Proposed: Modify hook to only check staged changes using `git diff --staged` or `git diff --cached`
357+
- This preserves ability to have WIP changes without them interfering with the commit process
358+
243359
## Frontend Architecture Guideline: Future-Proofing UI
244360

245361
### 1. Decouple Logic from Presentation

proxy/src/engine/mod.rs

Lines changed: 3 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -638,28 +638,7 @@ async fn create_session_context_from_catalog(
638638
}
639639

640640
// ---------- visibility matching ----------
641-
642-
/// Check whether an obligation's schema/table matches a given (df_alias, table_name) pair.
643-
/// Uses the same semantics as `PolicyHook::matches_schema_table()` to ensure consistency.
644-
fn visibility_matches(
645-
obl_schema: &str,
646-
obl_table: &str,
647-
df_alias: &str,
648-
table: &str,
649-
df_to_upstream: &HashMap<String, String>,
650-
) -> bool {
651-
if obl_table != "*" && obl_table != table {
652-
return false;
653-
}
654-
if obl_schema == "*" {
655-
return true;
656-
}
657-
let upstream = df_to_upstream
658-
.get(df_alias)
659-
.map(|s| s.as_str())
660-
.unwrap_or(df_alias);
661-
obl_schema == upstream
662-
}
641+
// Delegated to crate::policy_match::matches_schema_table (single source of truth).
663642

664643
// ---------- engine cache ----------
665644

@@ -940,7 +919,7 @@ impl EngineCache {
940919
if let Some((obl_schema, obl_table)) = schema_table {
941920
for (df_alias, vs) in &catalog.schemas {
942921
for table_name in vs.tables.keys() {
943-
if visibility_matches(
922+
if crate::policy_match::matches_schema_table(
944923
obl_schema,
945924
obl_table,
946925
df_alias,
@@ -965,7 +944,7 @@ impl EngineCache {
965944
{
966945
for (df_alias, vs) in &catalog.schemas {
967946
for table_name in vs.tables.keys() {
968-
if visibility_matches(
947+
if crate::policy_match::matches_schema_table(
969948
obl_schema,
970949
obl_table,
971950
df_alias,

proxy/src/hooks/policy.rs

Lines changed: 38 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -135,7 +135,7 @@ fn mangle_vars(template: &str, vars: &UserVars) -> (String, Vec<(String, String)
135135

136136
/// Convert a sqlparser AST expression to a DataFusion Expr.
137137
/// Handles: identifiers (column refs or placeholder vars), literals, binary ops,
138-
/// IS NULL, IS NOT NULL, NOT, BETWEEN, LIKE, IN LIST, and scalar functions.
138+
/// IS NULL, IS NOT NULL, NOT, BETWEEN, LIKE, IN LIST, CAST, and scalar functions.
139139
///
140140
/// Pass `Some(ctx)` as `registry` to enable full scalar function lookup (required for
141141
/// column mask expressions). Pass `None` for row filter expressions where only
@@ -289,7 +289,12 @@ fn sql_ast_to_df_expr(
289289
))),
290290
})
291291
.collect::<datafusion::error::Result<Vec<_>>>()?,
292-
_ => vec![],
292+
FunctionArguments::None => vec![],
293+
other => {
294+
return Err(datafusion::error::DataFusionError::Plan(format!(
295+
"Unsupported function arguments in mask/filter expression: {other:?}"
296+
)));
297+
}
293298
};
294299

295300
if let Some(reg) = registry {
@@ -312,6 +317,36 @@ fn sql_ast_to_df_expr(
312317
}
313318
}
314319
}
320+
SqlExpr::Cast {
321+
expr, data_type, ..
322+
} => {
323+
use datafusion::arrow::datatypes::DataType as ArrowType;
324+
use datafusion::sql::sqlparser::ast::DataType as SqlDataType;
325+
let inner = sql_ast_to_df_expr(expr, var_values, registry)?;
326+
let arrow_type = match data_type {
327+
SqlDataType::Varchar(_)
328+
| SqlDataType::Text
329+
| SqlDataType::Char(_)
330+
| SqlDataType::String(_) => ArrowType::Utf8,
331+
SqlDataType::SmallInt(_) => ArrowType::Int16,
332+
SqlDataType::Integer(_) | SqlDataType::Int(_) => ArrowType::Int32,
333+
SqlDataType::BigInt(_) => ArrowType::Int64,
334+
SqlDataType::Float(_) | SqlDataType::Float4 | SqlDataType::Real => {
335+
ArrowType::Float32
336+
}
337+
SqlDataType::Double(_)
338+
| SqlDataType::DoublePrecision
339+
| SqlDataType::Float8
340+
| SqlDataType::Float64 => ArrowType::Float64,
341+
SqlDataType::Boolean => ArrowType::Boolean,
342+
other => {
343+
return Err(datafusion::error::DataFusionError::Plan(format!(
344+
"Unsupported CAST target type in mask/filter expression: {other:?}"
345+
)));
346+
}
347+
};
348+
Ok(datafusion::logical_expr::cast(inner, arrow_type))
349+
}
315350
other => Err(datafusion::error::DataFusionError::Plan(format!(
316351
"Unsupported expression type in filter: {other:?}"
317352
))),
@@ -704,7 +739,7 @@ impl PolicyError {
704739

705740
/// Collected effects from all policies — separates "what to apply" from "how to apply it".
706741
struct ObligationEffects {
707-
/// Combined row filter per (df_schema, table): AND within a policy, OR across policies.
742+
/// Combined row filter per (df_schema, table): AND within a policy, AND across policies.
708743
row_filters: HashMap<(String, String), datafusion::logical_expr::Expr>,
709744
/// Column mask expressions keyed by column name. First (highest priority) wins.
710745
column_masks: HashMap<String, datafusion::logical_expr::Expr>,
@@ -2179,7 +2214,6 @@ mod tests {
21792214
"ssn should be present (masked, not denied): {cols:?}"
21802215
);
21812216
// Verify all SSN values are the mask value, not original data.
2182-
use datafusion::arrow::array::StringArray;
21832217
let ssn_idx = batches[0].schema().index_of("ssn").unwrap();
21842218
let ssn_array = batches[0]
21852219
.column(ssn_idx)
@@ -2220,7 +2254,6 @@ mod tests {
22202254
assert!(had_effects);
22212255
let batches = exec_plan(&ctx, result_plan).await;
22222256
assert_eq!(total_rows(&batches), 3, "Only acme rows expected");
2223-
use datafusion::arrow::array::StringArray;
22242257
let ssn_idx = batches[0].schema().index_of("ssn").unwrap();
22252258
let ssn_array = batches[0]
22262259
.column(ssn_idx)

proxy/src/lib.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,4 +10,5 @@ pub mod engine;
1010
pub mod entity;
1111
pub mod handler;
1212
pub mod hooks;
13+
pub mod policy_match;
1314
pub mod server;

proxy/src/policy_match.rs

Lines changed: 107 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,107 @@
1+
use std::collections::HashMap;
2+
3+
/// Check whether an obligation's (schema, table) pattern matches a DataFusion table scan.
4+
///
5+
/// Supports `"*"` as a wildcard for either field.
6+
/// `df_schema` is the DataFusion schema alias; `df_to_upstream` maps it to the upstream name.
7+
///
8+
/// This is the single source of truth shared by `PolicyHook` and the engine's
9+
/// `compute_user_visibility`. Any changes to matching semantics must be made here.
10+
pub fn matches_schema_table(
11+
obl_schema: &str,
12+
obl_table: &str,
13+
df_schema: &str,
14+
table: &str,
15+
df_to_upstream: &HashMap<String, String>,
16+
) -> bool {
17+
if obl_table != "*" && obl_table != table {
18+
return false;
19+
}
20+
if obl_schema == "*" {
21+
return true;
22+
}
23+
let upstream_schema = df_to_upstream
24+
.get(df_schema)
25+
.map(|s| s.as_str())
26+
.unwrap_or(df_schema);
27+
obl_schema == upstream_schema
28+
}
29+
30+
#[cfg(test)]
31+
mod tests {
32+
use super::*;
33+
34+
#[test]
35+
fn test_exact_match() {
36+
let map = HashMap::new();
37+
assert!(matches_schema_table(
38+
"public", "orders", "public", "orders", &map
39+
));
40+
}
41+
42+
#[test]
43+
fn test_wrong_table() {
44+
let map = HashMap::new();
45+
assert!(!matches_schema_table(
46+
"public", "orders", "public", "users", &map
47+
));
48+
}
49+
50+
#[test]
51+
fn test_schema_wildcard() {
52+
let map = HashMap::new();
53+
assert!(matches_schema_table(
54+
"*",
55+
"orders",
56+
"any_schema",
57+
"orders",
58+
&map
59+
));
60+
}
61+
62+
#[test]
63+
fn test_table_wildcard() {
64+
let map = HashMap::new();
65+
assert!(matches_schema_table(
66+
"public", "*", "public", "anything", &map
67+
));
68+
}
69+
70+
#[test]
71+
fn test_both_wildcards() {
72+
let map = HashMap::new();
73+
assert!(matches_schema_table("*", "*", "any", "anything", &map));
74+
}
75+
76+
#[test]
77+
fn test_alias_resolved() {
78+
let mut map = HashMap::new();
79+
map.insert("sales".to_string(), "public".to_string());
80+
assert!(matches_schema_table(
81+
"public", "orders", "sales", "orders", &map
82+
));
83+
}
84+
85+
#[test]
86+
fn test_alias_no_match() {
87+
let mut map = HashMap::new();
88+
map.insert("sales".to_string(), "public".to_string());
89+
assert!(!matches_schema_table(
90+
"private", "orders", "sales", "orders", &map
91+
));
92+
}
93+
94+
#[test]
95+
fn test_wrong_schema_no_alias() {
96+
let map = HashMap::new();
97+
assert!(!matches_schema_table(
98+
"public", "orders", "private", "orders", &map
99+
));
100+
}
101+
102+
#[test]
103+
fn test_empty_schema_matches_itself() {
104+
let map = HashMap::new();
105+
assert!(matches_schema_table("", "orders", "", "orders", &map));
106+
}
107+
}

0 commit comments

Comments
 (0)