
Commit 77e042a

arunpamulapati, shdzhang, kiran-anand, and shreelshah12 authored
Release/0.7.0 core (#311)
* Fix typos in configs and notebook docstrings
  - Fix GOV-20 check name: "Unity Sharing" → "Unity Catalog"
  - Fix "metasores" → "metastores" in GOV-20 logic description
  - Fix "cofiguration" → "configuration" in 5 manual check descriptions (DP-4, IA-1, IA-2, IA-3, INFO-7)
  - Fix "configruation" → "configuration" in initialize.py and sat_checks_config.py docstrings

* Fix "recepient" → "recipient" typo in delta sharing code
  - Rename functions: get_sharing_recepients_list, get_sharing_recepient, get_sharing_recepient_permissions → recipient spelling
  - Rename intermediate table: delta_sharing_recepients_list → delta_sharing_recipients_list (safe: table is in intermediate schema which is dropped after each SAT run)
  - Update all callers in workspace_bootstrap, workspace_analysis, and tests

* Fix typos in security checks CSV and workspace analysis notebook
  - configs/security_best_practices.csv: fix "custer"→"cluster" (GOV-9) and "diplayName"→"displayName" (INFO-6)
  - notebooks/Includes/workspace_analysis.py: fix "uc_metasore*" → "uc_metastore*" in local variable and inner function names

* Revert non-typo changes swept into previous commit
  Restores CHANGELOG.md and configs/sat_dasf_mapping.csv to their release/0.7.0_core state, and removes the unintended backup file. This branch should only contain the two typo fixes:
  - security_best_practices.csv: custer→cluster, diplayName→displayName
  - workspace_analysis.py: uc_metasore*→uc_metastore*

* Fix 2 CSV typos cleanly (no extra changes)
  Restores security_best_practices.csv to its pre-ac66c3d state and re-applies only the 2 intended typo fixes:
  - GOV-9: 'custer' → 'cluster' in recommendation field
  - INFO-6: 'diplayName' → 'displayName' in logic field

* Bump SDK to 0.1.41 and rebuild wheel
  - Increment __version__ in setup.py: 0.1.40 → 0.1.41
  - Replace lib/dbl_sat_sdk-0.1.40-*.whl with newly built 0.1.41 wheel
  - Update SDK_VERSION in install_sat_sdk.py to match

* feat(SFE-4539): add UC schema/table/column comments for Genie
  Adds apply_schema_comments() to common.py with data-verified descriptions for all 12 tables and ~110 columns in the security_analysis schema. Called from initialize.py after load_sat_dasf_mapping() so comments are applied on every SAT run.
  - Covers 9 SAT core tables + 3 BrickHound tables (BH wrapped in try/except)
  - Single quotes escaped as '' in SQL literals
  - Idempotent: safe to re-run on existing deployments

* docs: add CLAUDE.md to repo and schema comment/validation rules
  - Add CLAUDE.md to version control (force-add, was previously gitignored)
  - Add Schema Comment Sync rule: keep apply_schema_comments() in sync whenever tables/columns are added, removed, or renamed
  - Add security_best_practices.csv uniqueness validation rule: check that id and check_id are both unique before every CSV commit
  - Add Pre-Commit Typo Check rule using codespell
  - Update security_best_practices table/column comments to explicitly document that id and check_id are unique identifiers

* chore: allow CLAUDE.md to be tracked by git
  Remove CLAUDE.md from .gitignore so it is versioned alongside the codebase.

* refactor(SFE-4539): co-locate table/column comments with table creation
  Move all UC Genie column/table comments from the monolithic apply_schema_comments() into the function that creates each table, so documentation lives next to the DDL that defines it.
  - Each create_*() in common.py now calls _set_table_comment + _set_column_comments immediately after its CREATE TABLE DDL
  - readBestPracticesConfigsFile() and load_sat_dasf_mapping() set comments inside their saveAsTable blocks
  - apply_schema_comments() deleted; callers in initialize.py removed
  - Schema-level COMMENT ON SCHEMA moved into create_schema()
  - BrickHound: get_vertex_schema() and get_edge_schema() in schema.py get inline COMMENT clauses on every column
  - New GraphSchema.get_metadata_schema() DDL with inline column comments called in permission_analysis_data_collection.py before the first brickhound_collection_metadata saveAsTable, fixing the timing bug where BrickHound table comments were silently skipped at init time

* fix(SFE-4539): guard GraphSchema call when brickhound not installed
  When brickhound is not installed the try/except ImportError block leaves GraphSchema undefined, causing a NameError at the get_metadata_schema() call added in the SFE-4539 refactor. Fix: set GraphSchema = None in the except branch and guard the spark.sql() call with `if GraphSchema is not None`. When brickhound IS installed the UC column comments are applied as before. When it is not installed the metadata table is still created via saveAsTable without column comments, restoring the original works-with-or-without contract.

* feat(SFE-4539): add UC table/column comments for brickhound tables
  Replace unreliable GraphSchema DDL approach with explicit ALTER TABLE ALTER COLUMN COMMENT calls after each saveAsTable, mirroring the SAT pattern used in common.py. Comments now apply regardless of whether brickhound is installed and work on pre-existing tables. Tables covered: brickhound_vertices (14 cols), brickhound_edges (8 cols), brickhound_collection_metadata (10 cols).

* feat(SFE-4548): add INFO-42 Git repository allowlist check
  - Add check id=113, check_id=INFO-42 to security_best_practices.csv
  - Implement enableProjectsAllowList rule in workspace_settings.py
  - Add DASF-52 mapping to sat_dasf_mapping.csv
  - Append Phase 5 (bugs) and Phase 6 (new check backlog) to sat_checks_audit.md

* feat(SFE-4548): include Git URL allowlist entries in INFO-42 check details
  When enableProjectsAllowList passes, also return the projectsAllowList comma-separated URL prefixes in additional details so reviewers can verify the allowlist is properly scoped. Add projectsAllowList to ws_keymap and expand the SQL/rule function to fetch both keys. Rebuild wheel 0.1.41.

* feat(SFE-4548): add /add-sat-check Claude skill and expand checks audit doc
  - Add .claude/commands/add-sat-check.md: project-level skill that guides end-to-end implementation of a new SAT security check (CSV → SDK → notebook check block → DASF mapping → validations)
  - Expand docs/sat_checks_audit.md with INFO-42 additional-details section and Phase 7 planned checks (NS-12, IA-10, GOV-44, NS-13)

* fix: correct typo 'respones' -> 'responses' in notebook header comments

* chore: remove sat_checks_audit.md from branch

* feat(SFE-4549): remove 24 unrelated checks and self-assessment functionality
  - Remove 24 checks from security_best_practices.csv: DP-4, GOV-1/6/7/8/9/13/23/24/26, IA-1/2/3/7, INFO-1/2/4/7/12/13/14/17, NS-1/2
  - Remove corresponding DASF mapping entries and self_assessment_checks.yaml
  - Delete self-assessment notebook (Setup/9) and its two functions in sat_checks_config.py
  - Remove schema fields: object_storage_encrypted, vpc_peering_done, table_access_control_enabled, sso_enabled, scim_enabled from account_workspaces DDL, CSV schema, drivers, setup notebooks, and config utilities
  - Remove all check implementations from workspace_analysis.py and workspace_settings.py

* feat(SFE-4549): remove legacy Databricks SQL API EOL endpoints and bump SDK to 0.1.42
  Remove deprecated /api/2.0/sql/alerts, /sql/queries, /sql/config/warehouses, /preview/sql/permissions, and /preview/sql/data_sources usages. Dead-code bootstrap calls for dbsql_workspaceconfig and dbsql_alerts are dropped from workspace_bootstrap.py. configure_alerts_template.py now uses the warehouse ID directly as data_source_id. SDK version bumped 0.1.41 -> 0.1.42 with rebuilt wheel.

* feat(SFE-4549): remove alerts feature and drop alert column from schema
  - Delete notebooks/Setup/6. configure_alerts_template.py (deprecated SQL API endpoints at EOL)
  - Remove dangling references to notebooks 6 and 9 from security_analysis_initializer.py
  - Drop alert column from configs/security_best_practices.csv (all 55 rows had alert=0)
  - Remove alert from security_best_practices Delta table schema in common.py (schema_list, DDL, select, column comments)
  - Remove alert widget and UPDATE SQL field from sat_checks_config.py

* fix(SFE-4549): NS-9 — treat all DRY_RUN modes and unknown enforcement as violations
  Only ENFORCED passes. Selective dry-run (non-empty product filter) now returns DRY_RUN_SELECTIVE violation instead of passing. Unknown/missing enforcement_mode now returns UNKNOWN_ENFORCEMENT_MODE violation instead of passing.

* fix: escape single quotes in schema comment helpers to prevent SQL parse errors
  _set_table_comment and _set_column_comments now escape single quotes in comment strings before interpolating into SQL. Also removed embedded single-quoted examples from the additional_details column comment that triggered the error.
* fix: correct SQL single-quote escaping in comment helpers and insertIntoInfoTable
  - _set_table_comment/_set_column_comments: use standard SQL '' escaping instead of backslash escaping (which is unreliable in Spark SQL)
  - insertIntoInfoTable: escape name and category before SQL interpolation (was already escaping jsonstr but not the other string fields)

* fix: remove pre-escaped single quotes from schema comment strings
  Comment strings passed to _set_table_comment/_set_column_comments were using SQL-style '' escaping manually, which the helper then doubled again to '''', causing PARSE_SYNTAX_ERROR. Replaced all ''word'' patterns with plain text. Affected: account_info.category, account_workspaces table comment, sat_dasf_mapping table and dasf_control_id column comments.

* removing manual config text from dashboard

* widget name

* chore: bump sat_version to 0.7.0 in initialize.py

---------

Co-authored-by: shdzhang <39942190+shdzhang@users.noreply.github.com>
Co-authored-by: Kiran Anand <16294307+kiran-anand@users.noreply.github.com>
Co-authored-by: Shreel Shah <shreelshah12@gmail.com>
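The last two quote-escaping fixes in the commit message describe a double-escaping problem: strings that were already escaped by hand were escaped again by the helper. The sketch below reproduces that effect in isolation. The function names `escape_sql_literal` and `comment_on_table_sql` are hypothetical stand-ins, not the repository's `_set_table_comment`, whose body is not shown in this commit page.

```python
def escape_sql_literal(text: str) -> str:
    """Double each single quote so the text can sit inside '...' in SQL."""
    return text.replace("'", "''")


def comment_on_table_sql(table: str, comment: str) -> str:
    """Build a COMMENT ON TABLE statement with the comment escaped once."""
    return f"COMMENT ON TABLE {table} IS '{escape_sql_literal(comment)}'"


# Plain text in, escaped exactly once on the way out.
plain = "the workspace's category"
print(comment_on_table_sql("account_info", plain))

# A caller that pre-escapes by hand ends up with four quotes after the helper runs.
pre_escaped = "the workspace''s category"
print(escape_sql_literal(pre_escaped))
```

The design point matches the fix: callers pass plain text and escaping happens in one place only.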
1 parent 95d4324 commit 77e042a

32 files changed

Lines changed: 1285 additions & 1591 deletions

.claude/commands/add-sat-check.md

Lines changed: 222 additions & 0 deletions
@@ -0,0 +1,222 @@
A new security check row has been added to `configs/security_best_practices.csv`.
Implement it end-to-end by following these steps in order.

---

## Step 1 — Read the new check definition

Find the newly added row (the one with the highest `id` value) in
`configs/security_best_practices.csv`. Extract:
- `id` (integer), `check_id` (e.g. INFO-42), `category`, `check` (display name)
- `evaluation_value`, `severity`, `aws`, `azure`, `gcp`, `enable`
- `logic` field — this describes the data source (workspace-conf key, Settings V2 type, or table name)
- `api` field — the API endpoint being called
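
The lookup in Step 1 can be sketched as follows, assuming the CSV has a header row and that every `id` parses as an integer. The helper name `newest_check` is hypothetical, and the sample rows are invented placeholders that carry only a few of the real columns.

```python
import csv
import io


def newest_check(csv_text: str) -> dict:
    """Return the row with the highest integer `id` from CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("CSV contains no data rows")
    return max(rows, key=lambda row: int(row["id"]))


# Invented sample; only id=113 / INFO-42 comes from the commit message.
SAMPLE = """id,check_id,category,check,logic,api
40,EXAMPLE-1,example category,example older check,example logic,example api
113,INFO-42,example category,example newest check,example logic,example api
"""

row = newest_check(SAMPLE)
print(row["id"], row["check_id"], row["logic"], row["api"])
```

Comparing with `int(row["id"])` matters because a plain string comparison would rank `"40"` above `"113"`.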

---

## Step 2 — Determine the implementation file and data source type

Choose based on the `logic`/`api` fields:

| Data source | Implementation file |
|---|---|
| `workspacesettings` table (workspace-conf key via `GET /preview/workspace-conf?keys=...`) | `notebooks/Includes/workspace_settings.py` |
| Dedicated Settings V2 table (`automatic_cluster_update`, `compliance_security_profile`, etc.) | `notebooks/Includes/workspace_analysis.py` |
| Data table (clusters, jobs, tokens, secretscopes, etc.) | `notebooks/Includes/workspace_analysis.py` |

Note: `workspace_settings.py` uses variable `id`; `workspace_analysis.py` uses `check_id`.
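
The table and the note can be read as a small lookup. In the sketch below the three source-type labels (`workspace_conf`, `settings_v2`, `data_table`) and the helper `target_for` are hypothetical names chosen for illustration; only the file paths and the `id` / `check_id` convention come from the text above.

```python
from typing import Tuple

# Hypothetical labels for the three data source types in the table above.
IMPLEMENTATION_FILES = {
    "workspace_conf": "notebooks/Includes/workspace_settings.py",
    "settings_v2": "notebooks/Includes/workspace_analysis.py",
    "data_table": "notebooks/Includes/workspace_analysis.py",
}


def target_for(source_type: str) -> Tuple[str, str]:
    """Return (implementation file, name of the check variable used there)."""
    path = IMPLEMENTATION_FILES[source_type]
    variable = "id" if path.endswith("workspace_settings.py") else "check_id"
    return path, variable


print(target_for("workspace_conf"))
print(target_for("settings_v2"))
```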

---

## Step 3 — Verify data collection; add to SDK if missing

### For workspace-conf keys (workspace_settings.py path):
Check if the key exists in `ws_keymap` in
`src/securityanalysistoolproject/clientpkgs/ws_settings_client.py`.

If the key is missing, add it to `ws_keymap`:
```python
{"name": "<key_name>", "defn": "<description of what this setting controls>"},
```
Insert near related keys. Then rebuild the wheel (see Step 6).
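
A minimal sketch of the membership check, assuming `ws_keymap` is a list of dicts with `name` and `defn` keys as the entry format above suggests. The list contents and the helpers `has_key` and `ensure_key` are invented for illustration; the real list lives in `ws_settings_client.py`.

```python
# Invented stand-in for ws_keymap with placeholder descriptions.
ws_keymap = [
    {"name": "enforceUserIsolation", "defn": "placeholder description"},
    {"name": "enableProjectsAllowList", "defn": "placeholder description"},
]


def has_key(keymap, key_name):
    """True if an entry with this name is already collected."""
    return any(entry["name"] == key_name for entry in keymap)


def ensure_key(keymap, key_name, defn):
    """Append the entry only if it is absent; return True when it was added."""
    if has_key(keymap, key_name):
        return False
    keymap.append({"name": key_name, "defn": defn})
    return True


print(has_key(ws_keymap, "enableProjectsAllowList"))
print(has_key(ws_keymap, "projectsAllowList"))
```

This sketch appends at the end of the list for simplicity, whereas the step asks for the new entry to sit near related keys in the source file.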

### For Settings V2 types (workspace_analysis.py path):
Check if a getter method exists in `ws_settings_client.py` (workspace-level) or
`accounts_settings.py` (account-level), AND a corresponding `bootstrap(...)` call exists in
`workspace_bootstrap.py` or `accounts_bootstrap.py`.

If missing, add both the method and the bootstrap call. Then rebuild the wheel.

### For data tables (workspace_analysis.py path):
Check if the table is already bootstrapped in `workspace_bootstrap.py` or
`accounts_bootstrap.py`. If already present, no SDK change is needed.

---

## Step 4 — Implement the check block

Insert a new `# COMMAND ----------` block **before** the final timing/exit block at the end of
the target file.

### Pattern A — workspace_settings.py, boolean check where TRUE = good (PASS)
(Mirror: `enforceUserIsolation` id=40, `enableEnforceImdsV2` id=43, `enableProjectsAllowList` id=113)

```python
# COMMAND ----------

id = '<id>'  # <check name>
enabled, sbp_rec = getSecurityBestPracticeRecord(id, cloud_type)

def <camelCaseFunctionName>(df):
    # Default to 'false' so a missing row or a null value scores as a violation
    value = 'false'
    defn = {'defn' : ''}
    for row in df.collect():
        value = row.value if row.value else 'false'
        defn = {'defn' : row.defn.replace("'", '')}
    if value == 'true':
        return (id, 0, defn)
    else:
        return (id, 1, defn)

if enabled:
    tbl_name = 'workspacesettings' + '_' + workspace_id
    sql = f'''
        SELECT * FROM {tbl_name}
        WHERE name="<workspace_conf_key>"
    '''
    sqlctrl(workspace_id, sql, <camelCaseFunctionName>)
```
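
To see how Pattern A scores its input without a Spark session, the rule function can be exercised against plain tuples. This is a sketch only: `Row` is a `namedtuple` standing in for a Spark row, `score_true_is_good` is a hypothetical name, and the check id is passed as a parameter instead of being read from the notebook-level `id` variable.

```python
from collections import namedtuple

# Stand-in for a Spark Row exposing the `value` and `defn` columns Pattern A reads.
Row = namedtuple("Row", ["name", "value", "defn"])


def score_true_is_good(check_id, rows):
    """Return (check_id, score, details) where score 0 = pass and 1 = violation."""
    value = "false"
    defn = {"defn": ""}
    for row in rows:
        value = row.value if row.value else "false"
        defn = {"defn": row.defn.replace("'", "")}
    return (check_id, 0 if value == "true" else 1, defn)


print(score_true_is_good("113", [Row("enableProjectsAllowList", "true", "User's allowlist")]))
print(score_true_is_good("113", [Row("enableProjectsAllowList", None, "User's allowlist")]))
print(score_true_is_good("113", []))
```

A null value and an empty result both fall back to `'false'`, so an uncollected setting is reported as a violation and not as a pass.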

### Pattern B — workspace_settings.py, boolean check where TRUE = bad (VIOLATION)
(Mirror: `enableDeprecatedGlobalInitScripts` id=63, `enableDeprecatedClusterNamedInitScripts` id=65)

Same as Pattern A but swap the return scores:
```python
    if value == 'true':
        return (id, 1, defn)  # true = violation
    else:
        return (id, 0, defn)  # false = pass
```

### Pattern C — workspace_analysis.py, Settings V2 table check
(Mirror: `enhanced_security_monitoring` id=109, `automatic_cluster_update` id=107)

```python
# COMMAND ----------

check_id = '<id>'  # <check name>
enabled, sbp_rec = getSecurityBestPracticeRecord(check_id, cloud_type)

def <snake_case_function_name>(df):
    # Any row returned by the query means the setting is enabled
    if df is not None and not isEmpty(df):
        return (check_id, 0, {'<setting_name>': 'True'})
    else:
        return (check_id, 1, {'<setting_name>': 'False'})

if enabled:
    tbl_name = '<settings_table_name>' + '_' + workspace_id
    sql = f'''
        SELECT * FROM {tbl_name}
        WHERE <condition_column> = true
    '''
    sqlctrl(workspace_id, sql, <snake_case_function_name>)
```

### Pattern D — workspace_analysis.py, data table check (count-based violation)
(Mirror: NS-1 clusters SSH keys, IA-4 tokens with no lifetime)

```python
# COMMAND ----------

check_id = '<id>'  # <check name>
enabled, sbp_rec = getSecurityBestPracticeRecord(check_id, cloud_type)

def <snake_case_function_name>(df):
    if df is not None and not isEmpty(df):
        # Each returned row is one violating object, keyed by its id
        violations = {}
        for row in df.collect():
            violations[row['<id_column>']] = row['<name_column>']
        return (check_id, len(violations), violations)
    else:
        return (check_id, 0, {})

if enabled:
    tbl_name = '<table_name>' + '_' + workspace_id
    sql = f'''
        SELECT <id_column>, <name_column>
        FROM {tbl_name}
        WHERE <violation_condition>
    '''
    sqlctrl(workspace_id, sql, <snake_case_function_name>)
```
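
The count-based scoring in Pattern D can be tried on plain dictionaries. In this sketch `count_violations` is a hypothetical name, and the `cluster_id` / `cluster_name` columns and their values are invented; rows are ordinary dicts standing in for Spark rows.

```python
def count_violations(check_id, rows, id_column, name_column):
    """Return (check_id, number of distinct violating ids, {id: name})."""
    violations = {}
    for row in rows or []:
        violations[row[id_column]] = row[name_column]
    return (check_id, len(violations), violations)


# Invented rows; the duplicate id shows that the score counts distinct ids.
rows = [
    {"cluster_id": "c1", "cluster_name": "etl"},
    {"cluster_id": "c2", "cluster_name": "adhoc"},
    {"cluster_id": "c1", "cluster_name": "etl"},
]
print(count_violations("<id>", rows, "cluster_id", "cluster_name"))
print(count_violations("<id>", [], "cluster_id", "cluster_name"))
```

Because the details are stored in a dict keyed by the id column, repeated ids collapse into one entry, so the score is the number of distinct objects and not the number of rows.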

---

## Step 5 — Add DASF mapping

Append to `configs/sat_dasf_mapping.csv` if the check maps to a DASF control:
```
<id>,DASF-XX:<control name>,
```
If no clear DASF mapping applies, skip this step.
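
One way to produce a row in that shape is the standard `csv` writer, which also quotes a control name that contains a comma. This sketch assumes the trailing comma in the template denotes an empty third column; the helper name `dasf_mapping_line` and the control text are placeholders.

```python
import csv
import io


def dasf_mapping_line(row_id, dasf_control):
    """Format one mapping row: id, control, and an empty trailing field."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow([row_id, dasf_control, ""])
    return buffer.getvalue()


print(dasf_mapping_line(113, "DASF-XX:Example control"), end="")
print(dasf_mapping_line(113, "DASF-XX:Example, with comma"), end="")
```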

---

## Step 6 — Rebuild wheel (only if SDK was changed in Step 3)

```bash
cd src/securityanalysistoolproject
python setup.py sdist bdist_wheel
cp dist/dbl_sat_sdk-*.whl ../../lib/
cd ../..
```

Note: The wheel version in `setup.py` should already reflect the current version. Only bump
the patch version (e.g. 0.1.41 → 0.1.42) if explicitly asked.
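
For reference, a patch bump on a three-part version string can be sketched like this. `bump_patch` is a hypothetical helper, not part of the SDK, and it assumes a plain `major.minor.patch` string with no suffixes.

```python
def bump_patch(version: str) -> str:
    """Increment the last component of a major.minor.patch version string."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


print(bump_patch("0.1.41"))
```

The commit message notes that the same version also appears in `SDK_VERSION` in `install_sat_sdk.py` and in the wheel filename under `lib/`, so a bump has to be applied consistently in all three places.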

---

## Step 7 — Run validations (always)

```bash
# CSV uniqueness — must pass before committing
python3 - <<'EOF'
import csv, sys
ids, check_ids = {}, {}
errors = []
with open("configs/security_best_practices.csv") as f:
    for i, row in enumerate(csv.DictReader(f), start=2):
        if row["id"] in ids:
            errors.append(f" Duplicate id={row['id']} on rows {ids[row['id']]} and {i}")
        else:
            ids[row["id"]] = i
        if row["check_id"] in check_ids:
            errors.append(f" Duplicate check_id={row['check_id']} on rows {check_ids[row['check_id']]} and {i}")
        else:
            check_ids[row["check_id"]] = i
if errors:
    print("VIOLATIONS — fix before committing:"); print("\n".join(errors)); sys.exit(1)
else:
    print(f"OK — {len(ids)} rows, all unique")
EOF

# Codespell — report pre-existing typos separately from new ones; do not commit if new ones found
codespell configs/security_best_practices.csv configs/sat_dasf_mapping.csv \
  notebooks/Includes/workspace_settings.py notebooks/Includes/workspace_analysis.py \
  src/securityanalysistoolproject/clientpkgs/ws_settings_client.py 2>/dev/null || true
```

---

## Step 8 — Summarise

Report what was done in this format:

**Files changed:**
- List each file and the specific change (e.g. "added id=113 row", "added enableProjectsAllowList to ws_keymap")

**Wheel rebuilt:** yes/no

**Validations:** CSV OK (N rows, all unique) | codespell: N findings (pre-existing / new)

Ask the user if they want to commit before running `git commit`.

.gitignore

Lines changed: 0 additions & 2 deletions
@@ -11,7 +11,6 @@
 /configs/security_best_practices_user.csv
 .databricks/
 .vscode/settings.json
-CLAUDE.md

 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -194,5 +193,4 @@ notebooks/.ipynb_checkpoints/
 **/terraform.tfstate*

 # Project documentation and planning files
-CLAUDE.md
 docs/*Implementation_Plan.html
