Improve Data Drifter: app refactor, live controls, DRY notebooks #1
Open
artsheiko wants to merge 5 commits into databricks-solutions:main from
Analysis notebooks saved intermediate results as Spark temp views, which
don't persist across DAB job tasks running in separate Spark contexts.
Replace all 18 temp views with persisted Delta tables using
write.mode("overwrite").saveAsTable() with fully-qualified names.
Also fixes:
- Missing config_file_path variable in databricks.yml
- NameError on df_with_weather in 04_race_summary
- Hardcoded warehouse ID replaced with ${var.sql_warehouse_id}
- deploy.sh --profile now uses $DATABRICKS_CONFIG_PROFILE env var
- Reduce real_time_duration_seconds from 300 to 120
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
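The temp-view replacement described in the commit above can be sketched roughly as follows. This is an illustration, not the notebooks' actual code: the helper names `qualified_name` and `persist_result` and the catalog, schema, and table names in the usage note are hypothetical. Only the `write.mode("overwrite").saveAsTable()` call is taken from the commit message. The helper accepts any Spark DataFrame; on Databricks, where Delta is the default table format, the result is a Delta table that later job tasks can read.

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Build a fully-qualified three-part name: catalog.schema.table."""
    for part in (catalog, schema, table):
        if not part or "." in part:
            raise ValueError(f"invalid name part: {part!r}")
    return f"{catalog}.{schema}.{table}"


def persist_result(df, catalog: str, schema: str, table: str) -> str:
    """Persist an intermediate result as a table instead of a temp view.

    A temp view (df.createOrReplaceTempView(...)) exists only in the Spark
    session that created it, so a later job task running in a separate
    Spark context cannot see it. A saved table can be read from any task.
    """
    name = qualified_name(catalog, schema, table)
    # "overwrite" replaces the previous run's output, so reruns are idempotent.
    df.write.mode("overwrite").saveAsTable(name)
    return name
```

A downstream task would then read the result with `spark.table("main.regatta.race_summary")` (hypothetical name) rather than referencing a view.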
Let the --profile flag determine the workspace instead of hardcoding it in databricks.yml, so deploy.sh works across different environments.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Document service principal setup in Prerequisites
- Add --profile flag to all CLI examples
- Document DATABRICKS_CONFIG_PROFILE and DATABRICKS_TARGET env vars
- Note that analysis results are persisted as Delta tables
- Fix architecture diagram fleet size (configurable, default=6)
- Add "no module named toml" troubleshooting entry
- Fix app name in manual examples (data-drifter-regatta-v3)
- Fix typos in main.py banner ("possbile" → "possible", "Zeorbus" → "Zerobus")
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dened deploy
Major improvements to the Data Drifter Regatta demo:
**App (Streamlit)**
- Split monolithic app.py (1384 lines) into 3 modules: app.py, db_connection.py, components.py
- Add partial refresh via @st.fragment: only the dashboard re-renders, not the whole page
- Center map on fleet centroid with auto-zoom based on boat spread
- Add configurable refresh interval slider (5-60s, default 30s)
- Add "Start New Race" button that dynamically discovers and truncates all schema tables
- Add race speed controls (0.5x, 1x, 2x, 4x) via shared control table
- Show effective speed under buttons with base × multiplier breakdown
- Display both telemetry and weather table names in Data Source sidebar
- Grant schema-level SELECT/MODIFY for app service principal
**Telemetry Generator (main.py)**
- Add speed control: polls race_control table every 5s, adjusts emission interval
- Replace exit(1) with retry logic (3 retries, exponential backoff)
- Fix typos in ASCII banner
**Notebooks (Analysis)**
- Extract get_schema_prefix() and categorize_wind() to race_utils.py (DRY)
- Replace hardcoded wind thresholds with configurable values from config.toml [analysis]
- Replace hardcoded consistency thresholds with config-driven values
- All 4 notebooks updated to use shared utilities
**Infrastructure**
- Add race_control table to databricks.yml create_tables job
- Add [analysis] section to config.toml for configurable thresholds
- Harden deploy.sh: prerequisite validation, increased polling timeout
- Grant schema-level permissions instead of per-table
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
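The retry behaviour named in the Telemetry Generator changes (3 retries with exponential backoff in place of exit(1)) might look roughly like the sketch below. The function name `with_retries`, the 1-second base delay, and the injectable `sleep` parameter are assumptions made for illustration; the commit states only the retry count and the backoff strategy, not how main.py implements them.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    max_retries: int = 3,          # retry count from the commit message
    base_delay: float = 1.0,       # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on failure with exponentially growing delays."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error to the caller
            # Delay doubles on each retry: base, 2 * base, 4 * base, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real waiting. A production version would likely catch only transient error types rather than every `Exception`.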
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Major improvements to the Data Drifter Regatta demo covering the Streamlit app, telemetry generator, analysis notebooks, and deployment infrastructure.
App (Streamlit)
- Split app.py (1,384 lines) into 3 modules: app.py (layout), db_connection.py (SQL), components.py (map/leaderboard/stats)
- Partial refresh: @st.fragment(run_every=...) re-renders only the dashboard, not the whole page
- "Start New Race" button: discovers the schema's tables via SHOW TABLES and truncates them
- Race speed controls: shared race_control table; generator polls and adjusts emission speed in real time
- Permissions: GRANT SELECT, MODIFY ON SCHEMA instead of per-table grants
Telemetry Generator (main.py)
- Polls race_control table every 5s, adjusts sleep interval by multiplier
Analysis Notebooks
- Extracted get_schema_prefix() and categorize_wind() to race_utils.py
- Thresholds read from config.toml [analysis] section instead of hardcoded
Infrastructure
- race_control table: new Delta table + notebook for speed control signaling between app and generator
- config.toml: added [analysis] section with configurable thresholds
- deploy.sh: prerequisite validation (databricks CLI, jq, python3, TOML parser), increased polling timeout from 60s to 120s
- databricks.yml: added create_control_table task to setup job
- grant_permissions.py: schema-level grants for all current and future tables
Test plan
- ./deploy.sh: verify tables created, app deployed, permissions granted
- python3 main.py --client-id ... --client-secret ...: verify telemetry streams
- databricks bundle run sailboat_analysis: verify all 4 notebooks complete with persisted tables
🤖 Generated with Claude Code
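The speed control described under Telemetry Generator (sleep interval adjusted by a multiplier) can be illustrated with a small sketch. It assumes that the multiplier divides the base interval, so a 2x race halves the wait between emissions; the PR does not spell out the formula. The names `ALLOWED_MULTIPLIERS` and `effective_interval` are hypothetical, and the allowed values are the speeds listed for the app's buttons (0.5x, 1x, 2x, 4x).

```python
# Speeds offered by the app's race speed buttons, per the PR description.
ALLOWED_MULTIPLIERS = (0.5, 1.0, 2.0, 4.0)


def effective_interval(base_interval_s: float, multiplier: float) -> float:
    """Return the sleep time between emissions for a given speed multiplier.

    Assumption: a higher multiplier means a faster race, so the base
    interval is divided by the multiplier.
    """
    if base_interval_s <= 0:
        raise ValueError("base_interval_s must be positive")
    if multiplier not in ALLOWED_MULTIPLIERS:
        raise ValueError(f"unsupported multiplier: {multiplier}")
    return base_interval_s / multiplier
```

In a generator loop, the multiplier would be re-read from the race_control table on each 5s poll and the result passed to `time.sleep()`; rejecting unknown values keeps a bad row in the control table from stalling or flooding the stream.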