
Improve Data Drifter: app refactor, live controls, DRY notebooks #1

Open
artsheiko wants to merge 5 commits into databricks-solutions:main from artsheiko:fix/sailboat-analysis-temp-views-to-tables

artsheiko commented Mar 10, 2026

Summary

Major improvements to the Data Drifter Regatta demo covering the Streamlit app, telemetry generator, analysis notebooks, and deployment infrastructure.

App (Streamlit)

  • Modular architecture — Split monolithic app.py (1,384 lines) into 3 modules: app.py (layout), db_connection.py (SQL), components.py (map/leaderboard/stats)
  • Partial refresh — @st.fragment(run_every=...) re-renders only the dashboard, not the whole page
  • Live map centering — Map follows the fleet centroid with auto-zoom based on boat spread
  • Configurable refresh — Slider in sidebar (5–60s, default 30s)
  • Start New Race — Dynamically discovers all tables via SHOW TABLES and truncates them
  • Race speed controls — 0.5x / 1x / 2x / 4x buttons write to a shared race_control table; generator polls and adjusts emission speed in real time
  • Schema-level permissions — GRANT SELECT, MODIFY ON SCHEMA instead of per-table grants
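
To make the "live map centering" item concrete, here is a minimal sketch of one way to derive a center and zoom level from boat positions. It is not taken from components.py: the function name `fleet_view`, the 1.5 padding factor, and the zoom bounds are all assumptions for illustration. It also uses a plain arithmetic mean, which would misbehave for a fleet straddling the antimeridian.

```python
import math


def fleet_view(positions, min_zoom=2, max_zoom=14, padding=1.5):
    """Return (center_lat, center_lon, zoom) for a list of (lat, lon) pairs.

    Hypothetical helper: the name, the padding factor and the zoom bounds
    are illustrative assumptions, not values from the PR.
    """
    if not positions:
        raise ValueError("need at least one boat position")

    lats = [lat for lat, _ in positions]
    lons = [lon for _, lon in positions]

    # Fleet centroid as the arithmetic mean of the coordinates.
    center_lat = sum(lats) / len(lats)
    center_lon = sum(lons) / len(lons)

    # Spread: the larger of the latitude and longitude extents, in degrees.
    spread = max(max(lats) - min(lats), max(lons) - min(lons))
    if spread == 0:
        # A single boat (or a stacked fleet): zoom in as far as allowed.
        return center_lat, center_lon, max_zoom

    # At zoom z, one 256-pixel web-map tile spans 360 / 2**z degrees of
    # longitude, so pick the largest z whose tile still covers the padded spread.
    zoom = math.floor(math.log2(360 / (spread * padding)))
    return center_lat, center_lon, max(min_zoom, min(max_zoom, zoom))
```

Calling `fleet_view([(10.0, 20.0), (12.0, 24.0)])` centers the map midway between the two boats and zooms out far enough to keep both in view. A tighter fleet yields a higher zoom.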

Telemetry Generator (main.py)

  • Speed control — Polls race_control table every 5s, adjusts sleep interval by multiplier
  • Typo fixes — "possbile" → "possible", "Zeorbus" → "Zerobus" in banner
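
The polling behaviour described above could look roughly like the sketch below. It assumes that a higher multiplier means a proportionally shorter sleep, which the PR does not state explicitly. `SpeedController` and the `read_multiplier` callable are hypothetical names; in the real generator the read would be a query against the race_control table.

```python
import time

ALLOWED_SPEEDS = (0.5, 1.0, 2.0, 4.0)  # the four buttons listed in the PR


class SpeedController:
    """Caches the speed multiplier and re-reads it at most every poll_every_s.

    Hypothetical class: read_multiplier stands in for whatever lookup the
    generator performs against the race_control table.
    """

    def __init__(self, read_multiplier, poll_every_s=5.0, clock=time.monotonic):
        self._read = read_multiplier
        self._poll_every_s = poll_every_s
        self._clock = clock
        self._last_poll = None
        self._multiplier = 1.0

    def multiplier(self):
        now = self._clock()
        if self._last_poll is None or now - self._last_poll >= self._poll_every_s:
            self._last_poll = now
            try:
                value = float(self._read())
            except Exception:
                value = self._multiplier  # keep the last known speed if the read fails
            if value in ALLOWED_SPEEDS:
                self._multiplier = value  # ignore values outside the button set
        return self._multiplier

    def sleep_interval(self, base_interval_s):
        # Assumption: 2x speed means half the sleep between emissions.
        return base_interval_s / self.multiplier()
```

Caching the value between polls keeps the emission loop from issuing one SQL query per record, and it matches the "~5s" reaction time mentioned in the test plan.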

Analysis Notebooks

  • DRY utilities — Extracted get_schema_prefix() and categorize_wind() to race_utils.py
  • Configurable thresholds — Wind categories (8/15 kt) and consistency ratings (0.1/0.2/0.3 CV) read from config.toml [analysis] section instead of hardcoded
  • All 4 notebooks updated to use shared functions
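
As an illustration of threshold-driven categorization, the sketch below uses the default values quoted above (8 and 15 kt, and 0.1/0.2/0.3 CV). It is not the code in race_utils.py: the function names, the category labels, and the use of the population standard deviation are assumptions. The real helpers probably operate on Spark columns rather than Python floats.

```python
import statistics


def wind_category(speed_kt, light_max=8.0, moderate_max=15.0):
    """Bucket a wind speed in knots; the labels are illustrative assumptions."""
    if speed_kt < light_max:
        return "light"
    if speed_kt < moderate_max:
        return "moderate"
    return "strong"


def consistency_rating(speeds, thresholds=(0.1, 0.2, 0.3)):
    """Rate a boat by the coefficient of variation (CV) of its speeds.

    CV = standard deviation / mean. The four labels are hypothetical;
    the PR only names the three cut-off values.
    """
    mean = statistics.fmean(speeds)
    if mean == 0:
        raise ValueError("CV is undefined for a mean speed of zero")
    cv = statistics.pstdev(speeds) / mean

    labels = ("excellent", "good", "fair", "erratic")
    for limit, label in zip(thresholds, labels):
        if cv < limit:
            return label
    return labels[-1]
```

Passing the thresholds as parameters is what lets the notebooks feed in the values read from the [analysis] section instead of repeating literals in four places.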

Infrastructure

  • race_control table — New Delta table + notebook for speed control signaling between app and generator
  • config.toml — Added [analysis] section with configurable thresholds
  • deploy.sh — Prerequisite validation (databricks CLI, jq, python3, TOML parser), increased polling timeout from 60s to 120s
  • databricks.yml — Added create_control_table task to setup job
  • grant_permissions.py — Schema-level grants for all current and future tables
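
For reference, a schema-level grant of the kind described here can be assembled as below. This is a sketch and not the contents of grant_permissions.py: the helper names and the example catalog, schema, and principal are made up. Only the `GRANT SELECT, MODIFY ON SCHEMA` form comes from the PR description.

```python
def quote_ident(name):
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def schema_grant(catalog, schema, principal, privileges=("SELECT", "MODIFY")):
    """Build one GRANT statement covering a whole schema.

    Hypothetical helper; the caller is expected to run the returned SQL
    through whatever SQL client the deployment already uses.
    """
    target = f"{quote_ident(catalog)}.{quote_ident(schema)}"
    return f"GRANT {', '.join(privileges)} ON SCHEMA {target} TO {quote_ident(principal)}"


# Example with made-up names:
statement = schema_grant("main", "regatta", "app-service-principal")
```

One statement per schema replaces one statement per table, and tables created later in the schema need no additional grant.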

Test plan

  • Run ./deploy.sh — verify tables created, app deployed, permissions granted
  • Run python3 main.py --client-id ... --client-secret ... — verify telemetry streams
  • Open app — verify map centers on boats, auto-refreshes every 30s
  • Click speed buttons (0.5x, 2x, 4x) — verify generator logs speed change within ~5s
  • Click "Start New Race" — verify all tables truncated, speed reset to 1x
  • Run databricks bundle run sailboat_analysis — verify all 4 notebooks complete with persisted tables

🤖 Generated with Claude Code

artemsheiko and others added 4 commits March 10, 2026 03:20
Analysis notebooks saved intermediate results as Spark temp views, which
don't persist across DAB job tasks running in separate Spark contexts.
Replace all 18 temp views with persisted Delta tables using
write.mode("overwrite").saveAsTable() with fully-qualified names.
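
A minimal sketch of the replacement pattern follows, assuming a three-part catalog.schema.table name. `persist` and `qualified_name` are hypothetical helper names and do not appear in the PR. The only Spark calls used are the `write.mode("overwrite").saveAsTable()` chain quoted above.

```python
def qualified_name(catalog, schema, table):
    """Join the three parts into a fully-qualified table name."""
    parts = (catalog, schema, table)
    if not all(parts):
        raise ValueError("catalog, schema and table must all be non-empty")
    return ".".join(parts)


def persist(df, catalog, schema, table):
    """Write a DataFrame as a Delta table instead of a temp view.

    df is expected to be a Spark DataFrame. A temp view lives only in the
    Spark session that created it, whereas a saved table can be read by a
    later job task running in a different context.
    """
    name = qualified_name(catalog, schema, table)
    df.write.mode("overwrite").saveAsTable(name)
    return name
```

A downstream notebook would then read the result with `spark.table(name)` rather than relying on a view registered by an earlier task.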

Also fixes:
- Missing config_file_path variable in databricks.yml
- NameError on df_with_weather in 04_race_summary
- Hardcoded warehouse ID replaced with ${var.sql_warehouse_id}
- deploy.sh --profile now uses $DATABRICKS_CONFIG_PROFILE env var
- Reduce real_time_duration_seconds from 300 to 120

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Let the --profile flag determine the workspace instead of hardcoding
it in databricks.yml, so deploy.sh works across different environments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Document service principal setup in Prerequisites
- Add --profile flag to all CLI examples
- Document DATABRICKS_CONFIG_PROFILE and DATABRICKS_TARGET env vars
- Note that analysis results are persisted as Delta tables
- Fix architecture diagram fleet size (configurable, default=6)
- Add "no module named toml" troubleshooting entry
- Fix app name in manual examples (data-drifter-regatta-v3)
- Fix typos in main.py banner ("possbile" → "possible", "Zeorbus" → "Zerobus")

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dened deploy

Major improvements to the Data Drifter Regatta demo:

**App (Streamlit)**
- Split monolithic app.py (1384 lines) into 3 modules: app.py, db_connection.py, components.py
- Add partial refresh via @st.fragment — only dashboard re-renders, not the whole page
- Center map on fleet centroid with auto-zoom based on boat spread
- Add configurable refresh interval slider (5-60s, default 30s)
- Add "Start New Race" button — dynamically discovers and truncates all schema tables
- Add race speed controls (0.5x, 1x, 2x, 4x) via shared control table
- Show effective speed under buttons with base × multiplier breakdown
- Display both telemetry and weather table names in Data Source sidebar
- Grant schema-level SELECT/MODIFY for app service principal
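
The "Start New Race" discovery step might be sketched as follows. It assumes the rows returned by `SHOW TABLES` have been converted to dictionaries with `tableName` and `isTemporary` keys. The function name and that row shape are assumptions for illustration, not code from the app.

```python
def truncate_statements(catalog, schema, show_tables_rows):
    """Turn SHOW TABLES output into one TRUNCATE TABLE statement per table.

    Hypothetical helper: show_tables_rows is assumed to be an iterable of
    dicts with "tableName" and "isTemporary" keys.
    """
    statements = []
    for row in show_tables_rows:
        if row.get("isTemporary"):
            continue  # temp views are session-local and cannot be truncated
        statements.append(
            f"TRUNCATE TABLE `{catalog}`.`{schema}`.`{row['tableName']}`"
        )
    return statements
```

Discovering tables at click time means a table added to the schema later is reset as well, without touching the app code.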

**Telemetry Generator (main.py)**
- Add speed control: polls race_control table every 5s, adjusts emission interval
- Replace exit(1) with retry logic (3 retries, exponential backoff)
- Fix typos in ASCII banner
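
The retry item above can be illustrated with a small wrapper. This sketch assumes "3 retries" means three additional attempts after the first failure and that the delay doubles each time starting from one second. The function name and the base delay are not taken from main.py.

```python
import time


def with_retries(operation, retries=3, base_delay_s=1.0, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on any exception.

    Hypothetical helper: waits base_delay_s, then 2x, then 4x between
    attempts, and re-raises the last error once the retries are used up.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the failure to the caller
            sleep(base_delay_s * 2 ** attempt)
            attempt += 1
```

Compared with calling exit(1), a transient connection error now costs a few seconds of delay instead of ending the telemetry stream.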

**Notebooks (Analysis)**
- Extract get_schema_prefix() and categorize_wind() to race_utils.py (DRY)
- Replace hardcoded wind thresholds with configurable values from config.toml [analysis]
- Replace hardcoded consistency thresholds with config-driven values
- All 4 notebooks updated to use shared utilities

**Infrastructure**
- Add race_control table to databricks.yml create_tables job
- Add [analysis] section to config.toml for configurable thresholds
- Harden deploy.sh: prerequisite validation, increased polling timeout
- Grant schema-level permissions instead of per-table

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
artsheiko changed the title from "Fix sailboat analysis job and remove hardcoded values" to "Improve Data Drifter: app refactor, live controls, DRY notebooks" Mar 10, 2026
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>