
Commit aa02b32

marco-ferraro-db, arunpamulapati, mwitvliet, shreelshah12, dleiva04 authored
Doc release/0.6.0 (#301)
* feat(sdk): add job_runs_client and bump version to 0.1.38 Add JobRunsClient module for fetching job runs via the Databricks API. Update SDK version from 0.0.124 to 0.1.38 to match published wheel. * feat: add accounts_console support for staging, gov cloud, and DoD environments Add configurable accounts_console URL in initialize.py for non-standard Databricks environments (staging, GovCloud, DoD). When provided, dbclient.py uses it directly for account API calls, and the workspace config notebook derives workspace URLs from it. Supports: - Staging: accounts.staging.cloud.databricks.com - GovCloud: accounts.cloud.databricks.us - DoD: accounts-dod.cloud.databricks.mil * fix: use venv and auto-detect Python 3.9+ in install.sh - Auto-detect Python 3.9+ from available interpreters - Create virtual environment to avoid PEP 668 restrictions on newer Python - Fix shebang from #/bin/bash to #!/bin/bash - Use consistent Python command throughout the script * security: mark client_secret as sensitive in terraform variables Add sensitive = true to client_secret variable in AWS, Azure, and GCP terraform configurations to prevent secrets from being displayed in plan/apply output and logs per Terraform best practices. * fix(secrets): fix column name and add TruffleHog download error handling Fix two bugs in TruffleHog secret scanning: 1. Fix incorrect column name in get_current_run_id() - changed from 'run_time' to 'check_time' to match run_number_table schema 2. Add comprehensive error handling for TruffleHog download failures with clear user guidance to allowlist required domains The column name mismatch caused the function to fall back to timestamp-based run IDs. Download failures now provide actionable messages directing users to contact IT/Security teams. * fix(secrets): fix IDENTITY column insert error in get_current_run_id The runID column in run_number_table is GENERATED ALWAYS AS IDENTITY, which means it auto-increments and cannot accept explicit values. 
Changed get_current_run_id() to: - Only insert check_time (not runID) - Let database auto-generate runID via IDENTITY column - Retrieve generated runID by querying max value This matches the pattern used in insertNewBatchRun() from common.py and fixes the error: "DELTA_IDENTITY_COLUMNS_EXPLICIT_INSERT_NOT_SUPPORTED" * Update 1. list_account_workspaces_to_conf_file.py Removed staging URL examples * Update initialize.py removed staging references * feat(secrets): rename table and add clusters secret scanning schema (Stages 1-2) Stage 1: Rename secret_scan_results to notebooks_secret_scan_results - Renamed function in common.py: create_notebooks_secret_scan_results_table() - Updated 5 references in trufflehog_scan.py (function calls, INSERTs, SELECTs) - Updated 8 references in SAT_Dashboard_definition.json (all 4 datasets) - Updated documentation in usage.mdx Stage 2: Add clusters_secret_scan_results table - Created create_clusters_secret_scan_results_table() in common.py - Added initialization call in initialize.py - Schema includes: cluster_id, cluster_name, config_field, config_key - Same partitioning strategy (scan_date) as notebooks table This lays the foundation for scanning spark_env_vars in cluster configurations. Next: Implement cluster scanning logic and orchestration. 
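The IDENTITY-column fix for get_current_run_id() described above (insert only check_time, let the database generate runID, then read back the max value) can be sketched as follows. This is a minimal illustration, not the SAT implementation: it uses SQLite's AUTOINCREMENT as a stand-in for a Delta GENERATED ALWAYS AS IDENTITY column, and the connection handling is an assumption. Unlike Delta, SQLite would accept an explicit runID, so the sketch only shows the insert-then-query pattern.

```python
import sqlite3
from datetime import datetime, timezone


def get_current_run_id(conn):
    """Insert only check_time and read back the generated runID."""
    now = datetime.now(timezone.utc).isoformat()
    # runID is deliberately omitted: the database assigns it.
    conn.execute("INSERT INTO run_number_table (check_time) VALUES (?)", (now,))
    # Retrieve the generated value by querying the max runID.
    row = conn.execute("SELECT MAX(runID) FROM run_number_table").fetchone()
    return row[0]


# SQLite stand-in for the Delta table (assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE run_number_table ("
    "runID INTEGER PRIMARY KEY AUTOINCREMENT, check_time TEXT NOT NULL)"
)
first_run_id = get_current_run_id(conn)
second_run_id = get_current_run_id(conn)
```

Reading MAX(runID) after the insert is only safe when a single writer produces run IDs at a time; concurrent writers would need a different retrieval strategy.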
* feat(secrets): implement cluster secrets scanning and orchestration (Stages 3-4)

Stage 3: Create cluster_secrets_scan.py (650+ lines)
- Full cluster configuration secret scanning implementation
- Scans spark_env_vars using TruffleHog dual-scan approach
- Pattern based on trufflehog_scan.py for consistency
- Functions: get_all_clusters(), extract_spark_env_vars(), serialize_env_vars_to_file()
- TruffleHog scanning: scan_cluster_config_for_secrets(), process_trufflehog_output()
- Database: insert_cluster_secret_scan_results(), insert_no_secrets_tracking_row()
- Main workflow: main_cluster_scanning_workflow()
- Stores results in clusters_secret_scan_results table with cluster_id, config_field, config_key

Stage 4: Orchestrator integration in security_analysis_secrets_scanner.py
- Added generate_shared_run_id(): Creates shared run_id for both scans
- Added processClusterScan(): Orchestrates cluster scanning per workspace
- Modified processTruffleHogScan(): Now accepts run_id parameter
- Modified runTruffleHogScanForAllWorkspaces(): Runs both notebook + cluster scans
  * Generates shared run_id per workspace
  * Calls both scans sequentially
  * Graceful error handling (one scan failure doesn't block the other)

Both notebook and cluster scans now share run_id for correlation. Next: Update dashboard with UNION queries to show both sources.
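The Stage 4 orchestration above (one shared run_id per workspace, scans run sequentially, one failure not blocking the other) might look roughly like this. It is a hedged sketch: generate_shared_run_id() is named in the commit, but its body here (a UUID) is an assumption, and run_scans_for_workspace and the two scan callables are hypothetical stand-ins for the real notebook runs.

```python
import uuid


def generate_shared_run_id():
    # Assumption: any unique value works; the real function may differ.
    return uuid.uuid4().hex


def run_scans_for_workspace(workspace_id, scans):
    """Run every scan with the same run_id; isolate failures per scan."""
    run_id = generate_shared_run_id()
    results = {}
    for name, scan in scans.items():
        try:
            output = scan(workspace_id, run_id)
            results[name] = {"status": "ok", "run_id": run_id, "output": output}
        except Exception as exc:
            # A failing scan is recorded but does not stop the next one.
            results[name] = {"status": "failed", "run_id": run_id, "error": str(exc)}
    return results


def failing_notebook_scan(workspace_id, run_id):
    # Hypothetical scan that always fails, to exercise the error path.
    raise RuntimeError("notebook export failed")


def cluster_scan(workspace_id, run_id):
    # Hypothetical scan that succeeds and reports no findings.
    return {"clusters_scanned": 2, "secrets_found": 0}


results = run_scans_for_workspace(
    "ws-1", {"notebook": failing_notebook_scan, "cluster": cluster_scan}
)
```

Because both entries carry the same run_id, findings from either source can later be joined on that value.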
* feat(dashboard): add cluster secret scanning to dashboard queries Update all 4 secret scanner datasets to combine notebook and cluster findings: - Secret Scanner Metadata: Add clusters_with_secrets metric - Secret Scanner Workspaces: UNION both tables for workspace listing - Secret Scanner Details By Workspace: Add source_type, config_field, config_key columns - Secret Scanner Metadata By Workspace: Add clusters_with_secrets metric All queries now use UNION ALL to combine: - notebooks_secret_scan_results (notebooks) - clusters_secret_scan_results (clusters with spark_env_vars) Shared run_id enables correlation between notebook and cluster scans. Dashboard now provides unified view of all secrets across both sources. * fix(dashboard): resolve column mismatch in Secret Scanner Metadata UNION query Fixed NUM_COLUMNS_MISMATCH error by explicitly selecting columns instead of using s.* in combined_results CTE. Both notebooks and clusters now select same 9 columns (source_type, object_id, workspace_id, detector_name, secret_sha256, verified, secrets_found, run_id, scan_time) for proper UNION. * Edit to help claude with the setup * fix(dashboard): update table widget to use unified column schema Update "Secret Scanner Details By Workspace" table widget to support both notebooks and clusters: - Rename notebook_name → object_name, notebook_path → object_path - Add new columns: source_type, config_field, config_key - Reorder columns: source_type(0), object_name(1), object_path(2), config_field(3), config_key(4), detection_type(5), secret_hash(6), scan_time_est(7) Table now displays both notebook and cluster secret findings in unified view. 
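The NUM_COLUMNS_MISMATCH fix described above (select the same explicit columns on both sides instead of `s.*`) can be reproduced in miniature. This sketch uses SQLite rather than Databricks SQL, and the table shapes are reduced, hypothetical versions of the real schemas; only the table names and the source_type/object_id aliases come from the commit text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE notebooks_secret_scan_results (
        notebook_id TEXT, workspace_id TEXT, detector_name TEXT, run_id TEXT);
    CREATE TABLE clusters_secret_scan_results (
        cluster_id TEXT, workspace_id TEXT, detector_name TEXT,
        config_field TEXT, config_key TEXT, run_id TEXT);
    INSERT INTO notebooks_secret_scan_results
        VALUES ('nb-1', 'ws-1', 'AWS', 'run-1');
    INSERT INTO clusters_secret_scan_results
        VALUES ('cl-1', 'ws-1', 'GitHub', 'spark_env_vars', 'TOKEN', 'run-1');
    """
)

# SELECT * on tables with different widths cannot be combined.
try:
    conn.execute(
        "SELECT * FROM notebooks_secret_scan_results "
        "UNION ALL SELECT * FROM clusters_secret_scan_results"
    )
    star_union_failed = False
except sqlite3.OperationalError:
    star_union_failed = True

# Explicit, identically shaped column lists make the UNION ALL valid.
EXPLICIT_UNION = """
SELECT 'notebook' AS source_type, notebook_id AS object_id,
       workspace_id, detector_name, run_id
FROM notebooks_secret_scan_results
UNION ALL
SELECT 'cluster' AS source_type, cluster_id AS object_id,
       workspace_id, detector_name, run_id
FROM clusters_secret_scan_results
ORDER BY source_type
"""
rows = conn.execute(EXPLICIT_UNION).fetchall()
```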
* refactor(cluster-scan): align setup with trufflehog_scan pattern Standardize cluster_secrets_scan.py setup to match trufflehog_scan.py: - Add Step 1: Install Dependencies and Setup TruffleHog (%sh cell) - Remove duplicate imports and logging configuration - Remove duplicate db_client initialization - Use existing loggr from common setup instead of creating new logger - Update comments to reflect TruffleHog installation in Step 1 - Streamline Step 2: Configuration and Authentication - Keep Step 3: Configuration Setup with Config class Both scanners now follow identical setup pattern for consistency. * fix(cluster-scan): make TruffleHog installation idempotent Add check for existing TruffleHog binary before attempting installation. Since the orchestrator runs notebook scanner first (which installs TruffleHog), cluster scanner can reuse the existing binary instead of reinstalling. Changes: - Check if /tmp/trufflehog exists before installation - If exists, skip installation and reuse existing binary - If not exists, proceed with normal installation - Add success message after setup verification This prevents installation conflicts when cluster scanner runs after notebook scanner. * debug(cluster-scan): add comprehensive logging to diagnose missing env vars Add detailed logging to understand why clusters with spark_env_vars are being skipped: 1. Log available keys in cluster config for debugging 2. Check alternative field names (spark_env_variables, environment_variables, env_vars) 3. Print visible status for each cluster processed: - ⚠️ Failed to get config - ⏭️ No environment variables found (with reason) - ✅ Found N environment variables 4. 
Show first cluster config keys structure for debugging This will help identify: - Correct field name for environment variables in API response - Which clusters are being skipped and why - If specific clusters (like "Arun's Personal Compute Cluster") are processed * fix(sdk): return full cluster config from get_cluster_info Critical bug fix: get_cluster_info() was returning empty list instead of full config. Before: - Called .get('satelements', []) on API response - Always returned empty list [] - spark_env_vars and other config fields were never available After: - Returns full API response from /clusters/get - Includes all fields: spark_env_vars, spark_conf, custom_tags, etc. - Enables cluster secret scanning to access environment variables This fixes the issue where cluster scanner found 62 clusters but scanned 0. All clusters appeared to have no spark_env_vars because the API response was being discarded. * chore(sdk): bump version to 0.0.125 and rebuild wheel Build new wheel with clusters_client.get_cluster_info() fix. Changes: - Bump version from 0.0.124 to 0.0.125 - Rebuild wheel with fixed get_cluster_info() method - Add wheel to notebooks/Includes/ for deployment This wheel contains the critical fix that enables cluster secret scanning by returning the full cluster configuration including spark_env_vars. * fix(sdk): update install script to use SDK version 0.0.125 Update install_sat_sdk to install the new SDK version 0.0.125 which contains the critical get_cluster_info() fix for cluster secret scanning. Changes: - Update SDK_VERSION from 0.1.38 to 0.0.125 - Install from local wheel file in notebooks/Includes/ - Use --find-links to locate the wheel file This ensures cluster scanner can access full cluster configurations including spark_env_vars field. * fix(cluster-scan): use direct API calls instead of SDK Replace ClustersClient SDK calls with direct API calls to avoid SDK bugs. 
Changes: - Remove dependency on ClustersClient from clientpkgs - Use db_client.get() for direct API calls (same pattern as notebook scanner) - get_all_clusters(): Direct call to /clusters/list endpoint - get_cluster_config(): Direct call to /clusters/get endpoint - Returns full cluster configuration including spark_env_vars This bypasses the broken SDK completely and makes the cluster scanner independent of SDK wheel updates. Uses same reliable pattern as trufflehog_scan.py for notebook retrieval. * debug(cluster-scan): enhance logging to diagnose spark_env_vars detection Add comprehensive INFO-level logging to understand why clusters aren't being scanned: Changes: - Show config keys for first 3 clusters (not just first) - Check and log spark_env_vars field presence for EVERY cluster - Log spark_env_vars value type when field exists - Special debug output for test cluster (containing "Arun" or "Personal") - Show full spark_env_vars value and keys for test cluster - Change logger.debug to logger.info for env var extraction This will help diagnose: - Whether spark_env_vars field exists in cluster configs - What type spark_env_vars is (dict, None, empty dict) - Why test cluster with env vars might not be detected - If API response structure is different than expected * Pointing to correct SDK * fix(cluster-scan): preserve debug info in notebook exit output The dbutils.notebook.exit() at the end was clearing all debug output. Now debug information is captured in the return dict and survives the exit. 
Debug info includes: - sample_cluster_configs: Config keys from first 3 clusters - test_cluster_found: Whether test cluster (Arun/Personal) was found - test_cluster_has_spark_env_vars: Whether test cluster has the field - test_cluster_spark_env_vars_value: First 200 chars of the value - test_cluster_config_keys: All config keys for test cluster - clusters_with_spark_env_vars_field: Count of clusters with the field This allows us to diagnose why clusters aren't being scanned even after notebook.exit() clears the cell output. * fix(cluster-scan): extract cluster config from 'satelements' wrapper in API response Critical bug fix: db_client.get() returns response in format: { 'satelements': <actual_cluster_config>, 'http_status_code': 200 } Previously we were using the wrapper dict as the cluster config, which only had keys ['satelements', 'http_status_code'] instead of the actual cluster fields like 'cluster_id', 'spark_version', 'spark_env_vars', etc. Now we properly extract the cluster config from response['satelements']. This was discovered via debug output showing: "Cluster #1 config keys: ['satelements', 'http_status_code']" This fix will allow spark_env_vars to be detected and scanned. * fix(cluster-scan): handle satelements as list with single item The db_client.get() response format for /clusters/get is: { 'satelements': [<cluster_config>], # List with ONE element 'http_status_code': 200 } Previous fix assumed satelements was a dict, but it's actually a list containing a single dict element. Now we: 1. Check if satelements is a list 2. Extract the first (and only) element: satelements[0] 3. Return that as the cluster config This fixes the error: 'list' object has no attribute 'keys' * fix(cluster-scan): move variable extraction before try block Fixed: local variable 'cluster_id' referenced before assignment The error occurred because cluster_id and other variables were defined inside the try block. 
If an exception occurred during table creation (before variable assignment), the except block would try to use cluster_id which hadn't been defined yet. Solution: Move variable extraction to the very beginning of the function, before the try block. Now these variables are always defined, even if an exception occurs early. * feat(cluster-scan): add detailed summary output without clearing display Replaced dbutils.notebook.exit() with comprehensive summary display to preserve all scan output and debug information. Changes: - Remove notebook.exit() call that was clearing all output - Add detailed final summary with statistics - Include debug information in output - Add recommendations based on scan results - Follow same pattern as trufflehog_scan.py (no exit at end) The orchestrator doesn't use the return value anyway, so this allows users to see the full scan progress and results without having output cleared by exit(). Summary displays: - Total clusters found and scanned - Secrets detected count - Debug info (clusters with spark_env_vars, test cluster status) - Recommended actions based on findings - Best practices for secure configurations * feat(cluster-scan): add cleanup and file inspection section Add cleanup section matching trufflehog_scan.py pattern to show: - Temporary cluster config files in /tmp/clusters/ - TruffleHog binary and config files - Cleanup information and file lifecycle This provides visibility into what files were created during the scan and confirms they will be cleaned up when cluster terminates. 
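The two `satelements` fixes described earlier in this message (the response is a wrapper dict, and `satelements` is a list holding a single cluster config) suggest a defensive unwrapping helper along these lines. The function name extract_cluster_config is hypothetical; the wrapper shape is taken from the commit text, and tolerating a bare dict is an extra assumption.

```python
def extract_cluster_config(response):
    """Return the cluster config dict from a wrapped response, or None."""
    if not isinstance(response, dict):
        return None
    elements = response.get("satelements")
    if isinstance(elements, list):
        # Expected shape: a list with one dict element.
        if elements and isinstance(elements[0], dict):
            return elements[0]
        return None
    if isinstance(elements, dict):
        # Tolerate a bare dict as well (assumption, not confirmed by the source).
        return elements
    return None


wrapped = {
    "satelements": [{"cluster_id": "c-1", "spark_env_vars": {"TOKEN": "x"}}],
    "http_status_code": 200,
}
config = extract_cluster_config(wrapped)
empty = extract_cluster_config({"satelements": [], "http_status_code": 200})
```

Returning None for an empty or malformed wrapper lets the caller skip the cluster instead of failing with "'list' object has no attribute 'keys'".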
* debug(cluster-scan): add comprehensive logging for database insertion Add detailed INFO-level logging to diagnose why secrets aren't being inserted into clusters_secret_scan_results table: Changes: - Add logging before/after insert_cluster_secret_scan_results call - Add try/except around insertion with detailed error logging - Add logging inside insert function for each step: - Table creation - Processing each secret - SQL statement execution - Log full traceback on insertion errors - Show progress indicators in output This will help identify: - Whether table creation succeeds - Which secrets are being processed - Whether SQL execution succeeds - What specific error occurs during insertion * refactor: rename trufflehog_scan.py to notebook_secret_scan.py and add idempotent TruffleHog installation Rename: - trufflehog_scan.py → notebook_secret_scan.py for clarity Changes to notebook_secret_scan.py: - Add idempotent check for TruffleHog installation - Skip installation if /tmp/trufflehog already exists - Prevents installation conflicts when running multiple scanners Changes to cluster_secrets_scan.py: - Add directory creation verification with write test - Ensure /tmp/clusters/ is writable before scanning starts - Add defensive directory creation in serialize_env_vars_to_file() - Update comment to reference notebook_secret_scan.py Changes to orchestrator: - Update notebook path from trufflehog_scan to notebook_secret_scan This ensures both scanners can run in sequence without conflicts. * fix(notebook-scanner): respect run_id from orchestrator and remove check=True bug CRITICAL BUG FIX: Notebook scanner was ignoring run_id passed from orchestrator, breaking correlation with cluster scanner. Changes: 1. Respect run_id from orchestrator - Check json_.get("run_id") first - Use passed run_id if provided (shared with cluster scan) - Fall back to generate_run_id() for standalone execution - Now matches cluster_secrets_scan.py pattern 2. 
Fix subprocess.run() check=True bug - Remove check=True parameter from TruffleHog scans - Was incorrectly raising exceptions on non-zero exit codes - Comment said "Don't raise exception" but check=True does the opposite - Now properly handles non-zero exit codes Before: - Notebook scanner always generated its own run_id - Notebook and cluster scans had different run_ids - No correlation possible between findings After: - Both scanners use shared run_id from orchestrator - Enables correlation of notebook + cluster findings in same run - Maintains backward compatibility for standalone execution This fixes the fundamental issue preventing cross-scanner correlation. * refactor(cluster-scan): remove debug code and personal cluster references Clean up cluster_secrets_scan.py for production: Removed: - Test cluster detection (Arun/Personal cluster references) - debug_info structure and all related tracking - Sample cluster config logging (first 3 clusters) - Excessive database insertion logging - Verbose print statements for each step Simplified: - Logging for clusters without env vars (debug level only) - Database insertion (removed per-secret logging) - Progress messages (cleaner output) - Final summary (removed debug info section) Result: - 90 lines removed - Production-ready code - Essential logging preserved - No personal/test-specific code Ready for final test and PR. * fix(cluster-scan): remove incorrect import statement for Databricks notebook environment Fix import error: No module named 'common' In Databricks notebooks, functions from files loaded via %run are available directly in the namespace, not as importable modules. Changed: - Remove: from common import create_clusters_secret_scan_results_table - Direct call: create_clusters_secret_scan_results_table() This matches the pattern used in notebook_secret_scan.py and fixes the database insertion error. 
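The check=True fix described above comes down to how subprocess.run() treats non-zero exit codes: with check=True it raises CalledProcessError, without it the caller inspects returncode. A minimal sketch, using the Python interpreter as a stand-in for the TruffleHog binary; the run_scanner helper and the exit code 3 are illustrative assumptions, not TruffleHog behavior.

```python
import subprocess
import sys


def run_scanner(cmd, timeout=60):
    """Run a command and return (returncode, stdout, stderr) without raising
    on non-zero exit codes."""
    # No check=True here: a non-zero exit code is returned, not raised.
    completed = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return completed.returncode, completed.stdout, completed.stderr


# Stand-in command that prints a line and exits non-zero.
fake_scan = [sys.executable, "-c", "import sys; print('finding'); sys.exit(3)"]
exit_code, stdout, stderr = run_scanner(fake_scan)
```

The caller can then decide which exit codes count as errors and which merely signal that the scanner reported something.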
* security: fix SQL injection vulnerability in secret scanners

CRITICAL SECURITY FIX: Escape single quotes in all SQL string values to prevent SQL injection attacks.

Issue:
- Cluster/notebook names with apostrophes (e.g., "Arun Pamulapati's Cluster") were breaking SQL queries
- SQL syntax error: VALUES ('...', 'Arun Pamulapati's...', ...) would terminate string at first apostrophe
- Could be exploited for SQL injection if malicious names used

Fix:
- Escape all string values by replacing ' with '' (SQL standard)
- Applied to both cluster_secrets_scan.py and notebook_secret_scan.py
- Escaped values:
  * workspace_id, cluster_id, cluster_name, notebook_id, notebook_path, notebook_name
  * config_field, config_key, detector_name, secret_sha256, source_file

Before:
VALUES ('{workspace_id}', '{cluster_name}', ...)
-> FAILS with names containing apostrophes

After:
workspace_id_escaped = workspace_id.replace("'", "''")
VALUES ('{workspace_id_escaped}', '{cluster_name_escaped}', ...)
-> SAFE: "Arun Pamulapati's Cluster" becomes "Arun Pamulapati''s Cluster"

This prevents both SQL syntax errors and SQL injection attacks.
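The quote-doubling fix above can be checked end to end with a small round trip. This is a sketch under assumptions: escape_sql_string is a hypothetical helper name, and SQLite stands in for the Databricks SQL engine. Where the execution API supports bound parameters, those are generally the more robust choice; the sketch shows both.

```python
import sqlite3


def escape_sql_string(value):
    """Double single quotes so the value is safe inside '...' literals."""
    return str(value).replace("'", "''")


cluster_name = "Arun Pamulapati's Cluster"
escaped_name = escape_sql_string(cluster_name)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (workspace_id TEXT, cluster_name TEXT)")

# String-built statement, as in the commit: works once the value is escaped.
conn.execute(f"INSERT INTO results VALUES ('ws-1', '{escaped_name}')")

# Alternative with bound parameters: no manual escaping needed.
conn.execute("INSERT INTO results VALUES (?, ?)", ("ws-2", cluster_name))

stored = [row[0] for row in conn.execute(
    "SELECT cluster_name FROM results ORDER BY workspace_id"
)]
```

Both rows store the original name with a single apostrophe, which confirms that the doubling is consumed by the SQL parser and does not alter the data.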
* chore: code cleanup and formatting fixes - Fix whitespace indentation in workspace_bootstrap.py - Clean up comment formatting in clusters_client.py - Update SDK version in setup.py * chore: add CLAUDE.md to gitignore and update clusters_client docstring - Add CLAUDE.md to .gitignore to exclude from version control - Update get_cluster_info() docstring to reflect actual usage - Clean up trailing whitespace in comments * fix: add newline at end of clusters_client.py * revert: restore clusters_client.py to original main branch state * feat: add serverless egress control security checks (SFE-3862) Add 6 new Network Security checks to assess serverless compute network security posture and prevent data exfiltration risks: - NS-9 (Check 111): Serverless workspaces have network policies configured - NS-10 (Check 112): Network policies use restricted access mode - NS-11 (Check 113): Network policies are enforced (not dry-run) - NS-12 (Check 114): Network policies have destination allow-lists - NS-13 (Check 115): Serverless SQL warehouses have network policy coverage - INFO-18 (Check 116): Network policy default vs custom assignment tracking Changes: - configs/security_best_practices.csv: Added 6 new check definitions (rows 111-116) - notebooks/Includes/workspace_analysis.py: Implemented check logic for all 6 checks - docs/: Added comprehensive implementation plan documentation (HTML format) Implementation Details: - Leverages existing account_networkpolicies data collection - Cloud support: AWS Enterprise, Azure Premium (GCP not yet available) - Checks evaluate: access mode, enforcement mode, allow-lists, workspace coverage - Results written to security_checks table for dashboard visualization Severity Levels: - High (NS-9, NS-10, NS-13): Critical security controls - Medium (NS-11, NS-12): Important configuration validation - Low (INFO-18): Informational tracking References: - Databricks Serverless Egress Controls documentation - Network Policies REST API: GET 
/accounts/{account_id}/network-policies * fix: enable GCP support for serverless egress control checks Updated all 6 serverless egress control checks to support GCP: - Changed GCP flag from 0 to 1 for checks 111-116 - Added GCP documentation URLs for all checks - GCP support: Enterprise tier (Public Preview) GCP serverless egress control is available in Public Preview on Enterprise tier, supporting the same features as AWS and Azure: - Network policy management at account level - Restricted/Full access modes - Enforcement/Dry-run modes - FQDN and storage destination allow-lists Documentation: - https://docs.databricks.com/gcp/en/security/network/serverless-network-security/network-policies - https://docs.databricks.com/gcp/en/security/network/serverless-network-security/manage-network-policies Updated checks: - NS-9 (111): Serverless workspaces have network policies - NS-10 (112): Network policies use restricted access mode - NS-11 (113): Network policies are enforced - NS-12 (114): Network policies have destination allow-lists - NS-13 (115): Serverless SQL warehouses have policy coverage - INFO-18 (116): Network policy assignment tracking * fix: resolve schema mismatches in serverless egress control checks (SFE-3862) This commit fixes all 6 serverless egress control checks (NS-9 through INFO-18, check IDs 111-116) that were failing due to schema mismatches between expected flat fields and actual nested JSON structure from the API. 
Changes: - Added get_workspace_network_configuration() method to AccountsSettings SDK - Updated accounts_bootstrap.py to collect workspace network configs for all workspaces - Fixed NS-10 (112): Updated SQL to use egress.network_access.restriction_mode - Fixed NS-11 (113): Updated SQL to use egress.network_access.policy_enforcement.enforcement_mode - Fixed NS-12 (114): Updated SQL for nested fields, set allowed_fqdns to NULL - Fixed NS-9 (111): Changed table name to dbsql_workspaceconfig_* and JOIN on workspace_network_config - Fixed NS-13 (115): Changed table name to dbsql_warehouselistv2_* and JOIN on workspace_network_config - Fixed INFO-18 (116): Use network_policy_id as name, detect default by ID pattern - Bumped SDK version from 0.0.124 to 0.0.125 Root cause: API returns nested egress.network_access.* structure, but queries expected flat fields. Also, workspace-to-policy mappings needed separate API call. Testing: All SQL queries validated against confirmed table schema from DESCRIBE TABLE. * fix: add fallback logic for NS-9 and NS-13 when workspace network config tables missing Added table existence checks for workspace_network_config tables in NS-9 and NS-13. If the tables don't exist (because accounts_bootstrap hasn't run with new SDK yet), the checks fall back to a simpler approach: - Check if account has ANY network policies configured - If serverless is enabled but NO policies exist at account level -> FAIL This prevents TABLE_OR_VIEW_NOT_FOUND errors while still providing useful checks. Once workspace_network_config tables are collected, checks will automatically use the more accurate workspace-specific policy assignments. 
Fallback logic: - NS-9: Checks serverless workspaces against account-level policies - NS-13: Checks serverless SQL warehouses against account-level policies * fix: correct API endpoint for workspace network configuration Fixed the get_workspace_network_configuration() method to use the correct API endpoint path as documented in the Workspace Network Option API. Changed: - From: /accounts/{account_id}/workspace-network-configuration/{workspace_id} - To: /accounts/{account_id}/workspaces/{workspace_id}/network This fixes the 503 TEMPORARILY_UNAVAILABLE errors when collecting workspace network configurations during accounts_bootstrap. Response format: {"network_policy_id": "...", "workspace_id": ...} Bumped SDK version from 0.0.125 to 0.0.126 * fix: wrap workspace network config response in list for bootstrap compatibility The get_workspace_network_configuration() method now wraps the single config object in a list to match the expected format for the bootstrap() function. API returns: {"network_policy_id": "...", "workspace_id": ...} Method returns: [{"network_policy_id": "...", "workspace_id": ...}] This fixes the "DELTA_EMPTY_DATA: Data doesn't have any columns" error when creating workspace_network_config tables during accounts_bootstrap. The bootstrap() function expects a list of items (line 26 iterates with "for ifld in lst"), so single objects must be wrapped in a list. Bumped SDK version from 0.0.126 to 0.0.127 * fix: access network_policy_id from satelements array in NS-9 and NS-13 The workspace network configuration API returns data in a satelements array structure. Updated SQL queries to access satelements[0].network_policy_id instead of flat network_policy_id field. Fixes field not found error in NS-9 and NS-13 checks. * fix: rename INFO-18 to INFO-19 to avoid conflict with existing check Check 116 (Network policy default vs custom assignment) was using INFO-18, which conflicts with check 62 (Delta Sharing permissions). 
Renamed to INFO-19 to resolve the conflict. * refactor: replace 6 educational checks with 1 comprehensive network policy check (SFE-3862) Replace NS-9 through INFO-19 (checks 111-116) with single comprehensive check that follows decision tree: 1. Check workspace has policy assigned 2. Validate policy uses RESTRICTED_ACCESS mode 3. Validate policy is ENFORCED or selective DRY_RUN 4. Flag violations with specific types for actionable remediation Per team feedback, educational checks replaced with actionable security assessment. Changes: - configs/security_best_practices.csv: Removed 6 checks (111-116), added 1 comprehensive check (111) - notebooks/Includes/workspace_analysis.py: Replaced ~257 lines (6 checks) with ~178 lines (1 check) * fix: handle missing dry_run_mode_product_filter and use insertIntoControlTable (SFE-3862) Two critical fixes for comprehensive network policy check: 1. SQL Error Fix: Split policy details query to avoid accessing dry_run_mode_product_filter field that doesn't exist in ENFORCED policies. Now fetches this field only when enforcement_mode is DRY_RUN. 2. Function Call Fix: Replace undefined wrapperUpdates() with insertIntoControlTable() which is the correct function for writing check results to security_checks table. Errors fixed: - [FIELD_NOT_FOUND] No such struct field dry_run_mode_product_filter - name 'wrapperUpdates' is not defined * refactor: use sqlctrl framework pattern for network policy check (SFE-3862) Refactor comprehensive network policy check to follow standard framework pattern used throughout workspace_analysis.py: 1. Function now takes DataFrame as input (not manual parameters) 2. Use sqlctrl(workspace_id, sql, function) pattern like all other checks 3. SQL query joins workspace_network_config + account_networkpolicies tables 4. CASE statement conditionally fetches dry_run_mode_product_filter only for DRY_RUN policies 5. 
Function evaluates DataFrame and returns (check_id, score, additional_details) Benefits: - Consistent with existing codebase patterns (see enhanced_security_monitoring) - Cleaner code: ~110 lines vs ~187 lines - Better error handling via sqlctrl's AnalysisException catching - Automatic insertIntoControlTable via sqlctrl framework * fix: update schema to match actual account_networkpolicies structure (SFE-3862) Update comprehensive network policy check to match actual schema which does not include dry_run_mode_product_filter field. Simplify decision logic: Schema fields available: - egress.network_access.restriction_mode (FULL_ACCESS | RESTRICTED_ACCESS) - egress.network_access.policy_enforcement.enforcement_mode (ENFORCED | DRY_RUN) - egress.network_access.allowed_storage_destinations (array) Simplified decision tree: 1. No policy assigned → VIOLATION (NO_POLICY_ASSIGNED) 2. FULL_ACCESS mode → VIOLATION (FULL_ACCESS_MODE) 3. RESTRICTED_ACCESS + ENFORCED → PASS 4. RESTRICTED_ACCESS + DRY_RUN → VIOLATION (DRY_RUN_MODE) Removed non-existent dry_run_mode_product_filter field from SQL query and logic. * feat: capture dry_run_mode_product_filter with explicit schema (SFE-3862) Fix schema inference issue preventing dry_run_mode_product_filter from being captured. Spark's automatic schema inference was missing this field because it's absent in ENFORCED policies. Use explicit schema definition to guarantee all fields are captured. Changes: 1. notebooks/Utils/accounts_bootstrap.py: - Replace generic bootstrap() with explicit schema for network policies - Define complete schema including dry_run_mode_product_filter array field 2. 
notebooks/Includes/workspace_analysis.py: - Update SQL query to select dry_run_mode_product_filter field - Restore DRY_RUN logic to check empty vs non-empty product filter - Update docstring to reflect complete decision tree Expected behavior after fix: - Dryrun-some (non-empty filter) → PASS (selective dry-run acceptable) - DryrunAll (empty filter) → FAIL (all products in dry-run) - NoEgress (ENFORCED + RESTRICTED) → PASS - default-policy (FULL_ACCESS) → FAIL Implements decision tree: 1. Has policy assigned? → NO = VIOLATION 2. FULL_ACCESS mode? → YES = VIOLATION 3. ENFORCED? → YES = PASS 4. DRY_RUN with empty filter? → YES = VIOLATION 5. DRY_RUN with non-empty filter? → YES = PASS * fix: escape single quotes in JSON data for SQL INSERT statements (SFE-3862) - Add single quote escaping in insertIntoControlTable() and insertIntoInfoTable() - Fixes SQL syntax error when policy names contain single quotes (e.g., 'default-policy') - Applies to all checks, particularly NS-9 network policy violations - Uses SQL standard escaping: ' -> '' - Preserves data integrity while preventing SQL injection issues * docs: add guidelines for communicating git status and commit info - Add new 'Communicating Changes' section to CLAUDE.md - Require Claude to explicitly state git status when summarizing changes - Include commit hash, branch name, and files changed in summaries - Provides example format for consistency * chore: remove CLAUDE.md and implementation plan HTML from version control - Remove CLAUDE.md from git tracking (local development file) - Remove docs/SFE-3862_Serverless_Egress_Control_Implementation_Plan.html - Add both to .gitignore to prevent future accidental commits - Files remain locally but won't be tracked in git * revert: remove terraform sensitive variable changes from branch - Restore terraform/aws/variables.tf to main branch state - Restore terraform/azure/variables.tf to main branch state - Restore terraform/gcp/variables.tf to main branch state - Removes 
'sensitive = true' from client_secret variables - These changes came from release/0.6.0 merge but should not be in this feature branch * build: add SAT SDK v0.0.127 wheel file for distribution - Add dbl_sat_sdk-0.0.127-py3-none-any.whl to version control - Supports deployment and testing of serverless egress control checks - Force-added despite dist/ being in .gitignore for distribution purposes * fix: restore terraform variables to release/0.6.0 state - Restore terraform/aws/variables.tf with sensitive = true - Restore terraform/azure/variables.tf with sensitive = true - Restore terraform/gcp/variables.tf with sensitive = true - Matches release/0.6.0 branch configuration * fix: remove redundant enum value variants in network policy checks - Remove 'FULLACCESSMODE' from restriction_mode check (line 1565) - Remove 'DRYRUN' from enforcement_mode check (line 1580) - API only returns 'FULL_ACCESS' and 'DRY_RUN' per Databricks SDK - Aligns with official Databricks SDK enum definitions * Fixed the import issue which was re-organized and now core package is not available * feat: add account-level disable legacy features security check (GOV-37) Implements check ID 111 to validate that the disable_legacy_features account setting is enabled. This ensures new workspaces cannot access DBFS root, Hive Metastore, no-isolation shared clusters, or pre-13.3 LTS runtimes. Changes: - Add get_disablelegacyfeatures() method to AccountsSettings client - Add bootstrap call to create account_disable_legacy_features table - Implement security check logic in workspace_analysis.py - Add check configuration to security_best_practices.csv (severity: High) - Bump SDK version to 0.0.125 The check passes when disable_legacy_features.value == true and fails when disabled, missing, or API returns an error.
* fix: remove commas from check 111 recommendation to prevent CSV parsing error The recommendation field contained commas which were being interpreted as CSV delimiters, causing column misalignment and IntegerType parsing errors. Simplified recommendation text from: "...prevent new workspaces from accessing DBFS root, Hive Metastore, no-isolation shared clusters, and pre-13.3 LTS runtimes" To: "...prevent new workspaces from accessing legacy ungoverned features" Detailed feature list is still available in the documentation URLs. * fix: restore detailed recommendation for check 111 with proper CSV quoting Replace simplified recommendation text with the original detailed version, properly quoted to handle commas within the CSV field. The recommendation now explicitly lists the legacy features that will be disabled: DBFS root, Hive Metastore, no-isolation shared clusters, and pre-13.3 LTS runtimes. This provides better clarity for security practitioners while maintaining correct CSV parsing with pandas. * fix: extract satelements in get_disablelegacyfeatures() to match CSP/ESM pattern Fixes the "cannot find field 'disable_legacy_features' in SQL" error by making the disable legacy features method consistent with other account settings methods. Changes: - Update get_disablelegacyfeatures() to extract satelements array like CSP/ESM - This creates a clean table schema: disable_legacy_features, etag, setting_name - Simplify workspace_analysis.py check logic to use straightforward SQL query - Bump SDK version to 0.0.126 After deploying this SDK and re-running accounts_bootstrap, the table schema will be flat instead of nested, making the security check work correctly. Root cause: SatDBClient.get() wraps account responses in satelements array, but the previous implementation returned the full wrapped response instead of extracting the array contents. 
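As a minimal sketch of the CSV quoting fix for check 111 described in the commits above: a field that contains commas survives a round trip only when it is quoted. The column names and the row below are hypothetical, and the sketch uses Python's standard `csv` module rather than the pandas path the commit mentions.

```python
import csv
import io

# Hypothetical columns and row, loosely modeled on the check 111 entry.
FIELDS = ["id", "severity", "recommendation"]
ROW = {
    "id": 111,
    "severity": "High",
    "recommendation": (
        "Enable disable_legacy_features to prevent new workspaces from accessing "
        "DBFS root, Hive Metastore, no-isolation shared clusters, and pre-13.3 LTS runtimes"
    ),
}


def write_row(row):
    """Serialize one row; QUOTE_MINIMAL quotes only fields that need it."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


def read_rows(text):
    """Parse CSV text back into a list of dicts (all values become strings)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = write_row(ROW)
parsed = read_rows(csv_text)

# Splitting on commas without honoring quotes produces extra columns.
naive_columns = csv_text.splitlines()[1].split(",")
```

Here `parsed[0]["recommendation"]` matches the original text, while `naive_columns` has more entries than `FIELDS`, which is consistent with the column misalignment the earlier commit describes.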
* feat: integrate BrickHound permissions analysis into SAT Integrates BrickHound graph-based permissions analysis tool into SAT as a complementary security capability. BrickHound provides interactive permissions analysis, privilege escalation detection, and compliance reporting. Key Changes: - Add BrickHound notebooks (7 analysis notebooks + SDK installer) - Add BrickHound Python SDK (separate from SAT SDK to avoid conflicts) - Add web application (Gradio UI for interactive analysis) - Add Terraform job definition (weekly data collection schedule) - Update documentation with integration guides Integration Architecture: - Credentials: Reuses SAT's sat_scope (no separate secrets needed) - Storage: Same Unity Catalog schema as SAT (analysis_schema_name) - Tables: Namespaced with brickhound_ prefix (vertices, edges, metadata) - Schedule: Weekly (Sunday 2 AM) vs SAT's Mon/Wed/Fri - SDK: Kept separate to avoid dependency conflicts Files Added: - notebooks/brickhound/ (8 notebooks including install_brickhound_sdk.py) - src/brickhound/ (Complete Python SDK) - app/brickhound/ (Web UI and configuration) - terraform/common/brickhound_job.tf (Job definition) - docs/BRICKHOUND_INTEGRATION.md (Integration guide) - docs/brickhound_*.md (Reference documentation) Files Modified: - README.md: Added BrickHound section - CLAUDE.md: Added integration details and architecture Benefits: - Unified credential management via sat_scope - Complementary analysis (SAT=config, BrickHound=permissions) - Single deployment with SAT - Shared Unity Catalog schema * refactor: reorganize BrickHound notebook structure and improve project organization Reorganized BrickHound notebooks to follow SAT naming conventions and improved overall project structure based on comprehensive analysis. 
Notebook Changes: - Moved: notebooks/brickhound/01_data_collection.py → notebooks/permission_analysis_data_collection.py - Follows SAT pattern (same level as security_analysis_driver.py) - Renamed to match SAT convention: <tool>_<function>_<descriptor>.py - Renumbered: BrickHound analysis notebooks (02-05 → 01-04) - Removed numbering gap for cleaner sequence - 01_principal_resource_analysis.py - 02_escalation_paths.py - 03_impersonation_analysis.py - 04_advanced_reports.py Documentation Updates: - Updated all notebook references in docs and READMEs - Updated Terraform job definition with new path - Added comprehensive .gitignore - Added PROJECT_STRUCTURE_IMPROVEMENTS.md with analysis and recommendations - Added databricks_mcp_tools.md reference Benefits: - Consistent notebook naming between SAT and BrickHound - Clearer project organization - Sequential numbering without gaps - Comprehensive improvement roadmap for future development Files Modified: - 5 notebook renames (Git tracking as R) - 10 documentation/config updates - 1 new comprehensive .gitignore - 2 new documentation files * Fixed the path for config * fix: handle backticks in catalog/schema names for comparison checks - Strip backticks only for display and existence checks - Keep backticks in CATALOG/SCHEMA variables for SQL queries - Follows SAT pattern of using backticks to handle special chars * fix: use COLLECTION_METADATA_TABLE variable in all table references - Replace hardcoded table references in 00_analysis_common.py - Update flask_app.py to use brickhound_ prefix for table names - Ensures consistent use of namespaced table names across all components * fix: update BrickHound SDK installation path in 00_analysis_common - Change from hardcoded /Workspace/Applications/BrickHound - Use relative path ../../src/brickhound from notebooks/brickhound/ - This ensures SecurityAnalyzer loads correctly in SAT integration * fix: update file references and error messages across notebooks - Update error message 
in 00_analysis_common to reference correct SDK path - Fix notebook prerequisites to reference /notebooks/permission_analysis_data_collection.py - Remove outdated references to 01_data_collection.py (renamed file) - Ensure all documentation points to correct paths in SAT integration * fix: make README.md optional in brickhound setup.py - setup.py was failing when README.md doesn't exist - Make README.md read optional with fallback description - Fixes pip install -e failure in notebooks * fix: include principal_name and principal_email in standalone function output - get_resource_access() standalone function now returns same columns as SecurityAnalyzer - Adds principal_name and principal_email alongside combined principal column - Fixes grouping query error when SecurityAnalyzer is not available * revert: restore original BrickHound standalone function logic This reverts changes from commit 9acdf1e. The standalone get_resource_access() function is intentionally simplified as a fallback when the SDK is not available. It returns fewer columns and less complete results than the SecurityAnalyzer implementation. The grouping queries in the notebook are designed to work with the SecurityAnalyzer path (which returns principal_name and principal_email), not the standalone function path. With the SDK now loading correctly (via fixes to installation path), the SecurityAnalyzer should always be used, and this standalone function serves only as a basic fallback. This aligns with the original BrickHound design where the three components (App, SDK, Notebooks) intentionally have different implementations with varying completeness. * fix: remove grouping query from standalone function path in Resource Access The standalone get_resource_access() function returns a combined 'principal' column, not separate 'principal_name' and 'principal_email' columns. The grouping query expecting those columns is only designed for the SecurityAnalyzer path. 
This restores the original BrickHound design: - SecurityAnalyzer path: Full grouping and deduplication - Standalone function path: Simplified results without grouping The else branch now simply displays the standalone function results as-is, matching the original BrickHound behavior. * feat: add comprehensive SDK loading diagnostics and status indicators - Enhanced SDK installation with path verification and verbose output - Added prominent SDK loading verification showing module location - Created dedicated SDK status verification cell (highly visible) - Added SDK status checks to all analysis notebooks (01, 02, 03, 04) - Removed --quiet flag from SDK pip install to show installation details - Show clear warnings when fallback mode is active - Document feature differences between SDK and fallback functions This ensures users can immediately tell if the SecurityAnalyzer SDK is loaded or if notebooks are using simplified fallback functions. Addresses issue where SDK loading status was not visible and users unknowingly used incomplete fallback analysis. Key improvements: - Phase 1: SDK installation diagnostics with path verification - Phase 2: Enhanced import verification with troubleshooting steps - Phase 3: Post-initialization verification cell (can't miss it) - Phase 5: Analysis notebooks show SDK status at startup Related to: Integration of BrickHound into SAT * fix: move MAGIC commands outside Python control flow IndentationError occurred because # MAGIC commands cannot be nested inside Python if/else blocks in Databricks notebooks. Restructured to: - Show path diagnostics first - Display warning if path doesn't exist - Run pip install commands unconditionally at end The pip install will fail gracefully if path doesn't exist, which is acceptable since we show the warning and have fallback functions. 
* fix: correct sys.path for SDK imports in Databricks Repos The editable install (pip install -e) doesn't work reliably in Databricks Repos due to how the file system is mounted. The fallback sys.path manipulation was adding the wrong directory. Changed from: sys.path.insert(0, os.path.dirname(os.getcwd())) # Adds: /Workspace/Repos/.../notebooks (wrong!) To: repo_root = os.path.dirname(os.path.dirname(os.getcwd())) src_path = os.path.join(repo_root, 'src') sys.path.insert(0, src_path) # Adds: /Workspace/Repos/.../src (correct!) This allows Python to find the brickhound package and its submodules (brickhound.graph.analyzer, etc.) even when editable install fails. The path calculation works from: cwd: /Workspace/Repos/.../notebooks/brickhound -> dirname: /Workspace/Repos/.../notebooks -> dirname: /Workspace/Repos/... (repo root) -> join with 'src': /Workspace/Repos/.../src * chore: remove redundant SDK installation notebook The install_brickhound_sdk.py notebook is no longer needed because: - 00_analysis_common.py now handles SDK installation with diagnostics - All analysis notebooks auto-run 00_analysis_common via %run - install_brickhound_sdk had outdated imports (GraphAnalyzer instead of SecurityAnalyzer) - Consolidating to single installation mechanism reduces confusion SDK installation now happens automatically when running any analysis notebook: - notebooks/brickhound/01_principal_resource_analysis.py - notebooks/brickhound/02_escalation_paths.py - notebooks/brickhound/03_impersonation_analysis.py - notebooks/brickhound/04_advanced_reports.py All notebooks run '%run ./00_analysis_common' which: - Installs SDK with path verification - Shows prominent SDK status indicators - Handles fallback sys.path configuration - Loads configuration and data snapshots * fix: use correct Databricks SDK Secrets API in Flask app The Flask app was incorrectly trying to use dbutils.secrets.get() which is only available in Databricks notebooks, not Flask applications. 
This caused the app to silently fall back to hardcoded defaults (main.security_analysis) instead of reading SAT configuration from sat_scope secrets. Changes: 1. Configuration reading (lines 9-79): - Changed from: client.dbutils.secrets.get(scope, key) - Changed to: client.secrets.get_secret(scope, key).value - Added proper logging with logger.info/warning/error - Added validation for secret format (handles backticks) - Three-tier fallback: env vars → SAT secrets → defaults 2. Warehouse ID reading (lines 104-115): - Same fix: workspace_client.secrets.get_secret() - Added error logging instead of silent exception catch 3. Added /api/config endpoint (lines 4623-4631): - New debug endpoint to show current configuration - Returns catalog, schema, and table names - Easier troubleshooting than checking logs Root cause: dbutils is NOT available in Databricks Apps/Flask context Correct API: WorkspaceClient.secrets.get_secret(scope, key) Testing: - App should now read catalog/schema from sat_scope - Logs should show: [CONFIG] Read from sat_scope: CATALOG=arunuc, SCHEMA=security_analysis - Visit /api/config to verify configuration - App should query correct tables: arunuc.security_analysis.brickhound_* Fixes: App defaulting to main.security_analysis instead of SAT configuration * fix: decode base64-encoded warehouse ID from Databricks SDK The Databricks Python SDK (WorkspaceClient.secrets.get_secret()) returns secret values as base64 encoded, unlike the notebook API (dbutils.secrets.get()) which returns plain text. 
Error before fix: [AUTH] Using SQL Warehouse from sat_scope: NzgyMjI4ZDc1YmY2M2U1Yw== Error: NzgyMjI4ZDc1YmY2M2U1Yw== is not a valid endpoint id After fix: [AUTH] Using SQL Warehouse from sat_scope: 782228d75bf63e5c Queries execute successfully Changes: - Added base64 decoding when reading sql-warehouse-id from sat_scope - Decode: base64.b64decode(secret_response.value).decode('utf-8') - Graceful fallback if decode fails (uses value as-is) - Added explanatory comments about SDK behavior Root cause: - SAT stores warehouse ID as plain text (dabs/sat/config.py:144) - SAT notebooks use dbutils.secrets.get() → returns plain text - Flask app uses WorkspaceClient.secrets.get_secret() → returns base64 encoded - This is documented Databricks SDK behavior (see secrets_client.py:26-38) Note: analysis_schema_name does NOT need decoding - it's working correctly. Fixes: Invalid endpoint id error when app reads warehouse ID from sat_scope * refactor: use environment variables for app configuration - Set BRICKHOUND_CATALOG, BRICKHOUND_SCHEMA, WAREHOUSE_ID in app.yaml - Remove base64 decoding logic (no longer needed with env vars) - Simplify configuration reading to just use os.getenv() - Values set to SAT configuration: arunuc.security_analysis, warehouse 782228d75bf63e5c Fixes issue where Databricks SDK returns base64-encoded secrets. Environment variables bypass this complexity entirely. * fix: move env variables to top level in app.yaml Environment variables must be at top level, not nested under config. This matches Databricks Apps documentation format. Also added debug logging to verify environment variable values. * fix: correct port mismatch in app.yaml App runs on port 8000 but health check was configured for 8501. This caused health check to fail and app to become unavailable. Changed: - PORT env variable: 8501 → 8000 - health_check port: 8501 → 8000 * feat(brickhound): change data collection schedule from weekly to daily - Updated schedule from '0 0 2 ? 
* Sun' (weekly) to '0 0 2 * * ?' (daily) - Daily collection provides fresher permissions data for analysis - Part of BrickHound DABS/Terraform integration plan * feat(dabs): add BrickHound configuration prompts to installer - Added BrickHound prompts to dabs/sat/config.py form(): * Enable BrickHound deployment (default: true) * Deploy web app (default: true if BrickHound enabled) * Collection schedule (default: daily at 2 AM ET) - Updated dabs/main.py install() to include BrickHound config - Config passed to DABS templates for conditional deployment Part of BrickHound DABS/Terraform integration plan * feat(dabs): add BrickHound job template and app deployment - Created brickhound_job.yml.tmpl for DABS job deployment * Conditional deployment based on enable_brickhound flag * Supports serverless and classic cluster modes * Configurable schedule (default: daily at 2 AM ET) * 3-worker cluster with 4-hour timeout - Updated setup.sh to copy app directory for deployment - App directory now included in DABS bundle for app deployment Part of BrickHound DABS/Terraform integration plan (Phase 1 complete) * feat(terraform): add BrickHound app deployment resource - Created terraform/common/brickhound_app.tf * Databricks App resource for Flask web application * Conditional deployment via deploy_brickhound_app variable * Auto-configured environment variables from Terraform * Catalog, schema, and warehouse ID templated * Small compute size, port 8000, /health endpoint * Outputs: app URL and app ID - Added deploy_brickhound_app variable (default: true) - App dependencies: Git repo and data collection job Part of BrickHound DABS/Terraform integration plan (Phase 2 complete) * feat(terraform): add automated permission grants for BrickHound app - Created terraform/common/brickhound_grants.tf * Conditional grants based on deploy_brickhound_app variable * Grants USE_SCHEMA permission to app service principal * Grants SELECT permission on all tables in schema * Alternative table-specific 
grants provided (commented) - Grants automatically applied when app is deployed - No manual GRANT statements needed Part of BrickHound DABS/Terraform integration plan (Phase 3 complete) * docs: add comprehensive BrickHound user guide - Architecture and data collection overview - Web app usage guide with examples - Detailed notebook usage for all 4 analysis notebooks - Result interpretation (permission sources, privilege levels, risk scoring) - Comprehensive troubleshooting section - Configuration reference (env vars, secrets, schedules) - Best practices for security operations - Security considerations and compliance mappings * fix(terraform): resolve BrickHound app deployment and Unity Catalog permissions This commit addresses the BrickHound app deployment process and fixes a critical Unity Catalog permissions issue that prevented the app from accessing data. Changes: 1. Databricks Provider Upgrade - Upgrade provider from 1.11.1 to ~> 1.88.0 (AWS, Azure, GCP) - Enables support for newer Databricks features 2. BrickHound App Terraform Resources - Remove databricks_app resource (incompatible schema) - Delete terraform/common/brickhound_app.tf - Delete terraform/common/brickhound_grants.tf - databricks_app resource only supports permissions config, not app deployment - Remove deploy_brickhound_app variable from common module 3. Secret Scope Configuration - Make secret_scope_name variable configurable with default "sat_scope" - Add examples to template.tfvars for secret_scope_name and sqlw_id - Allow customers to override default secret scope name 4. BrickHound App Deployment Documentation - Add comprehensive manual deployment instructions to all Terraform docs - Include both CLI and UI deployment methods - Add detailed UI configuration walkthrough with step-by-step instructions - Document required app resources, API scopes, and compute settings 5. 
Critical Unity Catalog Permissions Fix - Add USE CATALOG grant to documentation (REQUIRED) - Unity Catalog hierarchical permissions require: - USE CATALOG on catalog - USE SCHEMA on schema - SELECT on tables - Without USE CATALOG, app gets 0 rows even with SELECT permissions - This fixed issue where app showed no data despite successful collection Files Modified: - terraform/aws/provider.tf, terraform/azure/provider.tf, terraform/gcp/provider.tf - terraform/aws/variables.tf, terraform/azure/variables.tf, terraform/gcp/variables.tf - terraform/aws/template.tfvars, terraform/azure/template.tfvars, terraform/gcp/template.tfvars - terraform/common/variables.tf - terraform/aws/TERRAFORM_AWS.md - terraform/azure/TERRAFORM_Azure.md - terraform/gcp/TERRAFORM_GCP.md Files Deleted: - terraform/common/brickhound_app.tf - terraform/common/brickhound_grants.tf Tested on AWS with successful data collection (3,708 vertices, 5,354 edges) and verified app functionality after USE CATALOG grant applied. * fix(dabs): add BrickHound variables to template schema Add missing BrickHound-related variables to databricks_template_schema.json: - enable_brickhound: boolean to enable/disable BrickHound deployment - deploy_brickhound_app: boolean to deploy the web app - brickhound_schedule: cron expression for collection schedule This fixes the error: "variable 'enable_brickhound' not defined" The DABS installer was collecting these values from the user but the template schema didn't declare them, causing bundle init to fail when rendering brickhound_job.yml.tmpl. * refactor(dabs): improve Permissions Analysis user experience Remove "BrickHound" from all user-facing terminology and simplify installation prompts for better user experience. Changes: 1. 
Terminology Updates - Replace "BrickHound Permissions Analysis" with "Permissions Analysis" - Replace "BrickHound Web App" with "Permissions Analysis Web App" - Update job name from "BrickHound Permissions Analysis - Data Collection" to "SAT Permissions Analysis - Data Collection" - Update application tags from "SAT-BrickHound" to "SAT-PermissionsAnalysis" 2. Simplified Installation - Remove schedule configuration prompts during installation - Always use default schedule (Daily at 2 AM ET) - Reduces installer questions from 4 to 2 for Permissions Analysis 3. Files Updated - dabs/sat/config.py: Removed schedule prompts, updated message text - dabs/main.py: Simplified schedule handling, always use default - dabs/dabs_template/databricks_template_schema.json: Updated descriptions - dabs/dabs_template/template/tmp/resources/brickhound_job.yml.tmpl: Updated job name and tags Installer now shows: - "Deploy Permissions Analysis? (Y/n)" - "Deploy Permissions Analysis Web App? (Y/n)" Instead of: - "Deploy BrickHound Permissions Analysis? (Y/n)" - "Deploy BrickHound Web App? (Y/n)" - "Use default collection schedule (Daily at 2 AM ET)? (Y/n)" - "Enter custom cron schedule" * fix(dabs): fix notebooks not syncing + remove misleading app prompt This fixes DABS notebooks not being synced to workspace due to Databricks CLI breaking change in sync behavior between versions. Root Cause: - release/0.6.0 DABS worked (user confirmed) - Newer Databricks CLI changed default sync behavior - Old: synced all files automatically - New: requires explicit sync.include configuration - notebooks/ stopped syncing while configs/ and dashboards/ continued Changes: 1. Add Explicit Sync Configuration (databricks.yml.tmpl) - Add sync.include section to force notebooks sync - Include: notebooks/**/*.py, configs/**/*, dashboards/**/*, app/**/* - This ensures all directories sync with newer CLI versions 2. 
Update Notebook Paths to Absolute (all 4 job templates) - Change from: "../notebooks/security_analysis_driver.py" - Change to: "/Applications/SAT/notebooks/security_analysis_driver.py" - Makes paths explicit and avoids relative path resolution issues - Updated: sat_driver_job.yml.tmpl, sat_initiliazer_job.yml.tmpl, sat_secrets_scanner_job.yml.tmpl, brickhound_job.yml.tmpl 3. Remove Misleading Web App Deployment Prompt - Remove deploy_brickhound_app from config.py (installer prompt) - Remove deploy_brickhound_app from main.py (config variable) - Remove deploy_brickhound_app from databricks_template_schema.json - This variable was collected but never used (no deployment logic) - Web app deployment must be done manually per Terraform docs Files Modified: - dabs/dabs_template/template/tmp/databricks.yml.tmpl - dabs/dabs_template/template/tmp/resources/sat_driver_job.yml.tmpl - dabs/dabs_template/template/tmp/resources/sat_initiliazer_job.yml.tmpl - dabs/dabs_template/template/tmp/resources/sat_secrets_scanner_job.yml.tmpl - dabs/dabs_template/template/tmp/resources/brickhound_job.yml.tmpl - dabs/sat/config.py - dabs/main.py - dabs/dabs_template/databricks_template_schema.json Expected Result After Fix: - Notebooks sync to /Applications/SAT/notebooks/ - All 4 jobs run successfully (no FileNotFoundError) - Installer only asks: "Deploy Permissions Analysis? (Y/n)" - No misleading web app deployment prompt Tested: Will be tested by user after deployment * fix(dabs): correct notebook paths to include /files/ subdirectory DABS syncs files to {root_path}/files/ automatically, so notebook paths must include the /files/ subdirectory. Also removed .py extensions as workspace notebook paths don't use file extensions. 
Fixed paths in all 4 job templates: - sat_driver_job.yml.tmpl - sat_initiliazer_job.yml.tmpl - sat_secrets_scanner_job.yml.tmpl - brickhound_job.yml.tmpl Old: /Applications/SAT/notebooks/security_analysis_driver.py New: /Workspace/Applications/SAT/files/notebooks/security_analysis_driver This fixes the "Unable to access notebook" errors that occur when jobs run. * docs: clarify Permissions Analysis web app requires manual deployment The Permissions Analysis web app is NOT automatically deployed by DABS or Terraform and must be deployed manually using databricks apps create. Updated documentation: - docs/BRICKHOUND_INTEGRATION.md: Added warning and detailed manual deployment steps - docs/brickhound_user_guide.md: Removed Terraform output reference, added deployment section This prevents confusion about app deployment expectations during SAT installation. * chore: update SAT version to 0.6.0 and simplify BrickHound job config - Updated SAT version from 0.5.0 to 0.6.0 in notebooks/Utils/initialize.py - Removed notification settings from terraform/common/brickhound_job.tf (pause_status, email_notifications, notification_settings) This aligns with the 0.6.0 release which includes BrickHound integration. * chore: remove CLAUDE.md from git tracking Add CLAUDE.md to .gitignore and remove from repository. The file remains locally for AI assistant context but is no longer tracked in version control. * chore: remove PROJECT_STRUCTURE_IMPROVEMENTS.md from git tracking Add PROJECT_STRUCTURE_IMPROVEMENTS.md to .gitignore and remove from repository. The file remains locally for reference but is no longer tracked in version control. 
* chore: remove BrickHound documentation from git tracking Add BrickHound-related documentation files to .gitignore and remove from repository: - docs/BRICKHOUND_INTEGRATION.md - docs/brickhound_PERMISSIONS.md - docs/brickhound_README.md - docs/brickhound_user_guide.md - docs/databricks_mcp_tools.md Files remain locally for reference but are no longer tracked in version control. * feat: align account API access logic with dbclient.py and add User-Agent header - Add helper functions get_domain_from_url() and parse_cloud_type() to support government clouds (e.g., .com.br, .mil) - Update account host determination to use domain-aware URL construction - Remove redundant account host logic by reusing stored value - Add User-Agent header "databricks-sat/0.1.0" to all API calls - Fix Azure workspace URL construction to use deployment_name Aligns with reference implementation in src/securityanalysistoolproject/core/dbclient.py * chore: remove test/development files from app/brickhound - Remove app.py (redundant entry point, not used by app.yaml) - Remove flask_app.py (alternative implementation for testing) - Remove minimal_app.py (minimal Flask test app) - Remove simple_flask.py (simple rendering test) - Remove test_app.py (diagnostic test app) - Update app.yaml to use placeholder values instead of hardcoded config Only production files remain: - working_app.py (main app called by app.yaml) - app.yaml (deployment configuration) - databricks.yml (databricks configuration) - README.md (documentation) - requirements.txt (dependencies) * refactor: rename working_app.py to app.py and fix debug mode - Rename working_app.py to app.py for cleaner naming - Fix debug mode to use FLASK_DEBUG environment variable (default: False) - Update app.yaml to reference app.py instead of working_app.py - Add security comment warning about debug mode in production Security improvement: Debug mode now defaults to False, preventing exposure of sensitive information in production. 
Can be enabled for local development with FLASK_DEBUG=true environment variable. * fix: generalize network policy check from serverless to all workspaces - Update check NS-9 (ID 111) to apply to all workspaces, not just serverless - Change "Serverless workspaces" to "Workspaces" in security_best_practices.csv - Update corresponding comment in workspace_analysis.py This aligns the check with the actual implementation which evaluates network policies for all workspaces in the account. * fix: remove hardcoded configuration values and add validation Security and portability improvements: - Remove hardcoded warehouse ID (82fc4fdc7d3edb9f) - Remove hardcoded default values for CATALOG and SCHEMA - Add validation to ensure required environment variables are set - Raise clear errors with actionable messages if configuration is missing - Update app.yaml comments to clarify configuration requirements - Fix missing closing bracket in WAREHOUSE_ID placeholder This ensures the app fails fast with clear error messages if not properly configured, rather than silently using hardcoded values that won't work in other environments. 
Changes: - app.py: Add validation for BRICKHOUND_CATALOG, BRICKHOUND_SCHEMA, WAREHOUSE_ID - app.yaml: Improve configuration comments and fix placeholder syntax * Feature/terraform additional sat config variables (#277) * add var.job_compute_num_workers * add var.job_schedule_timezone_id * add variable driver and secret scanner cron expressions * add config variables sql warehouse * fix cron_expression validation * Add var.provisioner_name to prevent Terraform plan diffs when switching between local users and CI service principal * updating security scanning and contributing.md * Enhance job scheduling configuration in Terraform and Python files * feat: centralize SAT SDK wheel in lib/ directory with workspace file installation - Create lib/ directory at project root for centralized wheel storage - Add dbl_sat_sdk-0.1.40-py3-none-any.whl to lib/ (forced add despite gitignore) - Update install_sat_sdk.py to use Databricks workspace file installation syntax - Upgrade SDK version from 0.1.38 to 0.1.40 in install_sat_sdk.py - Implement dynamic path construction to find wheel from any notebook location - Remove legacy commented code (DBFS FileStore approach) - Add fallback path for production deployments Benefits: - Single source of truth for SDK wheel file - Portable across users and workspace locations - Works from all notebook depths (root, subdirs, nested subdirs) - No changes needed to 23+ notebooks that call install_sat_…
1 parent 6978257 commit aa02b32

19 files changed

Lines changed: 914 additions & 249 deletions
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
---
2+
sidebar_position: 4
3+
---
4+
import useBaseUrl from '@docusaurus/useBaseUrl';
5+
import Admonition from '@theme/Admonition';
6+
7+
# Cluster Configuration Secrets Scanning
8+
9+
<Admonition type="info" title="Version 0.6.0 Feature">
10+
SAT's secret scanning capability, previously limited to notebooks, now extends to cluster configurations.
11+
</Admonition>
12+
13+
## Overview
14+
15+
Developers sometimes hardcode credentials in cluster environment variables during testing and forget to remove them. These secrets can persist for months, accessible to anyone who can view cluster configurations. SAT's Cluster Configuration Secrets Scanning helps you identify and remediate these security risks.
16+
17+
## What It Does
18+
19+
SAT scans `spark_env_vars` in cluster configurations for hardcoded credentials using TruffleHog with 800+ detector patterns, including:
20+
21+
- **AWS credentials**: Access keys, secret keys, session tokens
22+
- **Azure secrets**: Service principal keys, storage account keys
23+
- **GCP credentials**: Service account keys, API keys
24+
- **GitHub tokens**: Personal access tokens, OAuth tokens
25+
- **API keys**: Stripe, Slack, SendGrid, and other service API keys
26+
- **Database passwords**: Connection strings with embedded credentials
27+
- **Private keys**: SSH keys, RSA keys, and other cryptographic material
28+
- **Databricks-specific tokens**: DKEA, DAPI, DOSE tokens
29+
30+
## Why It Matters
31+
32+
Hardcoded credentials in cluster configurations pose significant security risks:
33+
34+
- **Persistent Exposure**: Secrets remain in cluster configs until explicitly removed
35+
- **Broad Access**: Anyone with cluster view permissions can see these secrets
36+
- **Compliance Violations**: Hardcoded secrets violate security best practices and compliance requirements
37+
- **Attack Surface**: Exposed credentials can be used to access cloud resources, databases, and external services
38+
39+
## How It Works
40+
41+
1. **Data Collection**: SAT scans all cluster configurations in your workspaces
42+
2. **Pattern Detection**: TruffleHog analyzes `spark_env_vars` using 800+ detector patterns
43+
3. **Correlation**: Results are correlated with notebook scan findings for comprehensive secret exposure analysis
44+
4. **Reporting**: Findings are displayed in the SAT Dashboard and stored in the `clusters_secret_scan_results` table
45+
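The pattern-detection step above can be pictured with a minimal sketch. It is not SAT's or TruffleHog's implementation: the two regular expressions, the `scan_spark_env_vars` function, and the shape of the returned findings are assumptions chosen for illustration, and TruffleHog's real detectors are far more extensive.

```python
import re

# Illustrative patterns only (assumed for this sketch); they stand in for
# TruffleHog's much larger detector set.
ILLUSTRATIVE_PATTERNS = {
    "AWS Access Key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Databricks PAT": re.compile(r"\bdapi[0-9a-f]{32}\b"),
}


def scan_spark_env_vars(cluster: dict) -> list:
    """Return one finding per (environment variable, detector) match.

    `cluster` is assumed to be a dict with an optional `spark_env_vars`
    mapping, as in the cluster configuration JSON.
    """
    findings = []
    for name, value in cluster.get("spark_env_vars", {}).items():
        text = str(value)
        # A reference to Databricks Secrets is not a hardcoded credential.
        if text.startswith("{{secrets/"):
            continue
        for detector, pattern in ILLUSTRATIVE_PATTERNS.items():
            if pattern.search(text):
                # Record where the secret was found, never the secret itself.
                findings.append(
                    {
                        "cluster": cluster.get("cluster_name"),
                        "env_var": name,
                        "detector": detector,
                    }
                )
    return findings
```

The sketch deliberately stores only the variable name and detector type so that the findings can be reported without copying the credential into another location.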
46+
## Viewing Results
47+
48+
Cluster secret scan results are available in multiple places:
49+
50+
### SAT Dashboard
51+
52+
The Secret Scanner Dashboard displays cluster secret findings alongside notebook findings, providing a comprehensive view of secret exposure across your environment.
53+
54+
<div className='bg-gray-100 p-4 rounded-lg mb-6'>
55+
<img src={useBaseUrl('/img/secret_scanner_dashboard.png')} alt="Secret Scanner Dashboard" />
56+
</div>
57+
58+
### Database Tables
59+
60+
Results are stored in the `clusters_secret_scan_results` table within your SAT schema:
61+
62+
- **Cluster ID**: Identifier of the cluster with exposed secrets
63+
- **Cluster Name**: Human-readable cluster name
64+
- **Environment Variable**: The variable name containing the secret
65+
- **Detector**: Type of secret detected (e.g., "AWS Secret Key")
66+
- **Severity**: Risk level (HIGH, MEDIUM, LOW)
67+
- **Workspace**: Workspace where the cluster was found
68+
- **Scan Timestamp**: When the scan was performed
69+
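Once rows from the results table are loaded, a triage order can help focus remediation. The sketch below is hedged: the snake_case keys (`severity`, `cluster_name`) are hypothetical stand-ins for the fields listed above, and the actual column names in `clusters_secret_scan_results` may differ.

```python
# Lower rank sorts first; unknown severities sort last.
SEVERITY_ORDER = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}


def prioritize(findings: list) -> list:
    """Sort findings by severity (HIGH first), then by cluster name."""
    return sorted(
        findings,
        key=lambda f: (
            SEVERITY_ORDER.get(f["severity"], len(SEVERITY_ORDER)),
            f["cluster_name"],
        ),
    )
```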
70+
## Example Finding
71+
72+
```
73+
Cluster: dev-team-cluster
74+
Environment Variable: AWS_SECRET_ACCESS_KEY
75+
Detector: AWS Secret Key
76+
Severity: HIGH
77+
Recommendation: Move to Databricks Secrets
78+
```
79+
80+
## Remediation
81+
82+
<Admonition type="warning" title="Remediation Required">
83+
High-severity findings should be remediated immediately by removing hardcoded secrets and moving them to Databricks Secrets.
84+
</Admonition>
85+
86+
### Recommended Approach
87+
88+
1. **Remove Hardcoded Secrets**: Delete the secret from the cluster's `spark_env_vars` configuration
89+
2. **Use Databricks Secrets**: Store credentials in Databricks Secrets (Databricks-backed or customer-managed)
90+
3. **Reference Secrets**: Update cluster configuration to reference secrets using `{{secrets/scope/key}}` syntax
91+
4. **Verify Access**: Ensure the cluster has appropriate permissions to access the secret scope
92+
93+
### Example: Migrating to Databricks Secrets
94+
95+
**Before (Insecure):**
96+
```json
97+
{
98+
"spark_env_vars": {
99+
"AWS_SECRET_ACCESS_KEY": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
100+
}
101+
}
102+
```
103+
104+
**After (Secure):**
105+
```json
106+
{
107+
"spark_env_vars": {
108+
"AWS_SECRET_ACCESS_KEY": "{{secrets/aws-credentials/secret-key}}"
109+
}
110+
}
111+
```
112+
113+
## Configuration
114+
115+
Secret scanning behavior can be configured via the `trufflehog_detectors.yaml` file located in the `configs` folder:
116+
117+
- **days_back**: Controls how far back to search for modified clusters
118+
- Set to `1` to scan clusters modified in the last day (default)
119+
- Set to `7` to scan clusters modified in the last week
120+
- Set to `0` to scan **ALL clusters** in the workspace (no time filter)
121+
- **page_size**: Number of clusters to process per API page (default: 50)
122+
- **custom detectors**: Add custom regex patterns to detect organization-specific secret formats
123+
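The interaction between `days_back` and `page_size` can be sketched as follows. This illustrates the semantics described above rather than SAT's code: the `last_modified` field and both function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def select_clusters(clusters: list, days_back: int, now: datetime = None) -> list:
    """Keep clusters modified within the last `days_back` days.

    A value of 0 disables the time filter, so every cluster is returned.
    Each cluster is assumed to carry a timezone-aware `last_modified` value.
    """
    if days_back == 0:
        return list(clusters)
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back)
    return [c for c in clusters if c["last_modified"] >= cutoff]


def paginate(items: list, page_size: int = 50):
    """Yield successive pages of at most `page_size` items."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]
```

With `days_back` set to `0` in a large workspace, the page size determines how many batches the scan processes, so a full scan can take noticeably longer than the default one-day window.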
124+
### Firewall Allowlist Requirements
125+
126+
If your environment restricts outbound traffic, allowlist the GitHub domains listed in this section so that secret scanning can download TruffleHog and access required resources.
127+
128+
<Admonition type="warning" title="Firewall Configuration">
129+
If you encounter connection issues during secret scanning, check that your firewall allows outbound connections to the domains listed in this section.
130+
</Admonition>
131+
132+
Add the following domains to your firewall allowlist:
133+
134+
- `raw.githubusercontent.com`
135+
- `github.com`
136+
- `token.actions.githubusercontent.com`
137+
- `release-assets.githubusercontent.com`
138+
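Before running a scan, outbound reachability of these domains can be checked from a notebook or cluster. The sketch below is a hedged example, not part of SAT: `check_domains` is a hypothetical helper, and a successful TCP connection on port 443 does not guarantee that a TLS-inspecting proxy will allow the actual download.

```python
import socket

# Domains from the allowlist above.
REQUIRED_DOMAINS = [
    "raw.githubusercontent.com",
    "github.com",
    "token.actions.githubusercontent.com",
    "release-assets.githubusercontent.com",
]


def check_domains(domains=REQUIRED_DOMAINS, port=443, timeout=5.0,
                  connect=socket.create_connection):
    """Return {domain: True/False} for a TCP connection attempt to each domain.

    `connect` is injectable so the check can be exercised without network access.
    """
    results = {}
    for domain in domains:
        try:
            conn = connect((domain, port), timeout=timeout)
            conn.close()
            results[domain] = True
        except OSError:
            # Covers DNS failures, refused connections, and timeouts.
            results[domain] = False
    return results
```

Any domain reported as `False` is a candidate to raise with the team that manages the firewall.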
139+
## Best Practices
140+
141+
<Admonition type="tip" title="Regular Scanning">
142+
Run cluster secret scans regularly (weekly or bi-weekly) to catch newly introduced secrets before they become security risks.
143+
</Admonition>
144+
145+
<Admonition type="tip" title="Automated Workflows">
146+
Consider implementing automated workflows to alert security teams when high-severity secrets are detected in cluster configurations.
147+
</Admonition>
148+
149+
<Admonition type="tip" title="Developer Training">
150+
Educate developers on using Databricks Secrets instead of hardcoding credentials. Include this in onboarding and security training.
151+
</Admonition>
152+
153+
<Admonition type="warning" title="False Positives">
154+
Some patterns may trigger false positives. Review findings carefully and adjust detector patterns if needed.
155+
</Admonition>
156+
157+
## Integration with Notebook Scanning
158+
159+
Cluster Configuration Secrets Scanning complements SAT's existing notebook secret scanning:
160+
161+
- **Comprehensive Coverage**: Identifies secrets in both notebooks and cluster configurations
162+
- **Correlated Analysis**: Correlates findings to identify patterns of secret exposure
163+
- **Unified Reporting**: All secret findings are displayed in a single dashboard
164+
165+
## Learn More
166+
167+
- [Usage Guide](/docs/usage) - Instructions on running SAT workflows and viewing results
168+
- [Permissions Analysis](../permissions-analysis) - Understand who can access cluster configurations
169+
- [General Dashboard](../general-dashboard) - View comprehensive security findings
Lines changed: 169 additions & 0 deletions
@@ -0,0 +1,169 @@
1+
---
2+
sidebar_position: 2
3+
---
4+
import useBaseUrl from '@docusaurus/useBaseUrl';
5+
import Admonition from '@theme/Admonition';
6+
7+
# Executive Dashboard
8+
9+
The **Executive Dashboard** is a streamlined, high-level view designed for stakeholders and executives who need a quick overview of the security posture without getting into technical details.
10+
11+
<div className='bg-gray-100 p-4 rounded-lg mb-6'>
12+
<img src={useBaseUrl('/img/executive_dashboard.png')} alt="Executive Dashboard" />
13+
</div>
14+
15+
## Overview
16+
17+
SAT provides an **Executive Dashboard** alongside the comprehensive SAT dashboard. It consolidates key findings from the complete dashboard and presents them in an easy-to-digest format for leadership and business stakeholders.
18+
19+
<Admonition type="tip" title="Databricks One Integration">
20+
You can use **Databricks One** to access and share the SAT dashboards across your organization, making it easier to distribute security insights to stakeholders.
21+
</Admonition>
22+
23+
## What Leadership Can Do
24+
25+
The Executive Dashboard enables leadership to:
26+
27+
- **Quickly understand** the overall security health of Databricks deployments
28+
- **Identify critical security issues** that require immediate attention
29+
- **Track security improvements** over time
30+
- **Make informed decisions** about security investments and priorities
31+
32+
This executive-level view complements the detailed technical dashboard, ensuring that both technical teams and business stakeholders have access to the security insights they need.
33+
34+
## Key Metrics
35+
36+
The Executive Dashboard focuses on high-level metrics and trends:
37+
38+
### Security Health Score
39+
40+
A consolidated view of overall security posture across all workspaces, typically displayed as:
41+
- Overall security score or rating
42+
- Trend indicators (improving, stable, declining)
43+
- Comparison across workspaces or time periods
44+
45+
### Critical Issues Summary
46+
47+
High-level summary of critical security findings:
48+
- Count of high-severity issues
49+
- Count of medium-severity issues
50+
- Count of low-severity issues
51+
- Trend analysis showing improvement or regression
52+
53+
### Workspace Overview
54+
55+
Summary information about monitored workspaces:
56+
- Number of workspaces analyzed
57+
- Workspace health status
58+
- Geographic distribution
59+
- Cloud provider breakdown
60+
61+
### Compliance Status
62+
63+
High-level compliance indicators:
64+
- Overall compliance score
65+
- Key compliance categories status
66+
- Audit readiness indicators
67+
68+
## Benefits for Different Audiences
69+
70+
### For Executives
71+
72+
- **Time-efficient**: Get security insights in minutes, not hours
73+
- **Actionable**: Clear priorities and recommendations
74+
- **Trend-focused**: Understand security posture over time
75+
- **Risk-aware**: Identify critical issues requiring attention
76+
77+
### For Security Leadership
78+
79+
- **Strategic view**: Understand security posture across the organization
80+
- **Resource planning**: Identify areas needing investment
81+
- **Compliance tracking**: Monitor compliance status and trends
82+
- **Stakeholder communication**: Share security status with business leaders
83+
84+
### For Business Stakeholders
85+
86+
- **Non-technical**: Easy to understand without deep technical knowledge
87+
- **Business-aligned**: Focus on business impact and risk
88+
- **Transparent**: Clear visibility into security posture
89+
- **Confidence-building**: Demonstrate proactive security management
90+
91+
## Dashboard Configuration
92+
93+
The Executive Dashboard follows the same configuration process as the General Dashboard:
94+
95+
1. Open the dashboard and click the **"Share"** button in the top right.
96+
2. Click on the cogwheel icon and select **"Assign new owner"**. Choose the new owner of the dashboard.
97+
3. Click on the **"Published"** icon next to the name of the dashboard, and switch to the **"Draft"** version. Click on the **"Publish"** button.
98+
- Choose from one of the two credential options:
99+
- **Embed credentials (default):** All viewers run queries using the owner's credentials and compute.
100+
- **Don't embed credentials:** Each viewer must have access to the workspace and associated data to view the dashboard.
101+
> We recommend the **Don't embed credentials** option for more secure access control.
102+
4. The dashboard can be shared with other team members by clicking the **"Share"** button from the **"Published"** mode.
103+
104+
<Admonition type="tip" title="Sharing Configuration">
105+
Consider creating different sharing groups for executives vs. technical teams. Executives may only need view access, while technical teams may need access to drill down into details.
106+
</Admonition>
107+
108+
## When to Use
109+
110+
### Regular Reviews
111+
112+
Use the Executive Dashboard for:
113+
- **Weekly security briefings** with leadership
114+
- **Monthly security reviews** with stakeholders
115+
- **Quarterly business reviews** including security posture
116+
- **Board presentations** on security status
117+
118+
### Decision Making
119+
120+
The Executive Dashboard supports:
121+
- **Budget planning** for security improvements
122+
- **Resource allocation** for security initiatives
123+
- **Risk assessment** for business decisions
124+
- **Compliance reporting** to auditors and regulators
125+
126+
### Communication
127+
128+
Share the Executive Dashboard to:
129+
- **Demonstrate progress** on security initiatives
130+
- **Highlight achievements** in security improvements
131+
- **Raise awareness** of security priorities
132+
- **Build confidence** in security posture
133+
134+
## Relationship to General Dashboard
135+
136+
The Executive Dashboard and General Dashboard work together:
137+
138+
| **Executive Dashboard** | **General Dashboard** |
139+
|-------------------------|----------------------|
140+
| High-level metrics | Detailed findings |
141+
| Business-focused | Technical-focused |
142+
| Strategic view | Operational view |
143+
| Quick overview | Deep dive analysis |
144+
| For leadership | For technical teams |
145+
146+
<Admonition type="info" title="Dashboard Relationship">
147+
Both dashboards use the same underlying data, ensuring consistency. The Executive Dashboard provides the "what" and "why," while the General Dashboard provides the "how" and "where."
148+
</Admonition>
149+
150+
## Best Practices
151+
152+
<Admonition type="tip" title="Regular Updates">
153+
Ensure the Executive Dashboard is updated regularly (weekly or bi-weekly) to provide current security insights to leadership.
154+
</Admonition>
155+
156+
<Admonition type="tip" title="Customization">
157+
Consider customizing the Executive Dashboard to highlight metrics most relevant to your organization's security priorities and compliance requirements.
158+
</Admonition>
159+
160+
<Admonition type="tip" title="Dashboard Usage">
161+
Use both dashboards together: Executive Dashboard for high-level status, General Dashboard for detailed remediation.
162+
</Admonition>
163+
164+
## Learn More
165+
166+
- [General Dashboard](../general-dashboard) - Detailed technical dashboard
167+
- [Permissions Analysis](../permissions-analysis) - Graph-based access analysis
168+
- [Cluster Secrets Scanning](../cluster-secrets-scanning) - Detect exposed credentials
169+
- [Usage Guide](/docs/usage) - Instructions on running SAT workflows and accessing dashboards
