App MVP #1090

Merged
mwojtyczka merged 52 commits into main from feature/app-refactor
Apr 30, 2026

Conversation

@laurencewells
Contributor

@laurencewells laurencewells commented Mar 20, 2026

Changes

Linked issues

This pull request introduces major improvements to the DQX App's documentation and deployment configuration, focusing on clarifying the architecture, authentication model, and deployment steps, as well as enhancing the Databricks Asset Bundle (databricks.yml) to provision all required resources automatically. The changes also update development dependencies and provide a comprehensive product overview in a new CLAUDE.md file.

Key changes include:

1. Documentation and Product Context

  • Added CLAUDE.md with a detailed product overview, architecture diagram, personas, user journeys, and internal storage design for the DQX App. This helps new contributors and users understand the app's purpose, scope, and design decisions.
  • Expanded and reorganized README.md to clarify the authentication model (OBO vs. SP), async job pattern for Spark operations, deployment steps, troubleshooting, and removed outdated configuration steps. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10]

2. Deployment Automation and Resource Provisioning

  • Enhanced databricks.yml to:
    • Define variables for catalog and schema names, and centralize app environment configuration.
    • Automatically provision required resources: SQL warehouse, Delta schema, and Databricks Job for async Spark tasks (profiler and dry-run), and wire their IDs into the app config. [1] [2]
    • Grant necessary permissions and set up API scopes for workspace and catalog access.

3. Dependency Updates

  • Replaced databricks-connect with databricks-sql-connector[pyarrow] in pyproject.toml, reflecting the new architecture where Spark operations are offloaded to a Databricks Job rather than running in-process.

4. Development and Troubleshooting

  • Updated documentation to cover the new async job pattern, environment variables, and troubleshooting steps for profiler/dry-run operations. [1] [2]

These changes make the app easier to deploy, maintain, and understand, and set a strong foundation for future development.


References:
[1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]

Resolves #..

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

@github-actions
Contributor

github-actions Bot commented Mar 20, 2026

✅ 722/722 passed, 1 flaky, 39 skipped, 5h20m41s total

Flaky tests:

  • 🤪 test_e2e_workflow_for_patterns_exclude_patterns (4m13.413s)

Running from acceptance #4537

@codecov

codecov Bot commented Mar 20, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.11%. Comparing base (c83bbe7) to head (c0f094b).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1090      +/-   ##
==========================================
+ Coverage   91.86%   92.11%   +0.25%     
==========================================
  Files         101      101              
  Lines        9546     9546              
==========================================
+ Hits         8769     8793      +24     
+ Misses        777      753      -24     
Flag Coverage Δ
anomaly 53.54% <ø> (+30.68%) ⬆️
anomaly-serverless 53.55% <ø> (+30.63%) ⬆️
integration 50.36% <ø> (+0.84%) ⬆️
integration-serverless 50.61% <ø> (+0.24%) ⬆️
unit 55.88% <ø> (ø)

@laurencewells
Contributor Author

A few outstanding issues to sort out:
  • The App needs to grant users CREATE TABLE permission on the temp schema.
  • One more scope niggle in LLM rule generation for live deployment.

mwojtyczka and others added 11 commits April 24, 2026 22:51
…r, and streamline docs

- Allow VIEWER role to access run history (dryrun.py)
- Fix tmp schema creation to use SP credentials instead of user OBO token,
  preventing PERMISSION_DENIED for non-admin users on dry run
- Fix sql_expression checks treating col_name '*' as a literal column,
  which disabled all tables in the target table picker
- Add job retry config (max_retries: 1) to task runner in databricks.yml
- Grant USE CATALOG to account users at app startup
- Streamline README.md, DEVELOPMENT.md, DEPLOYMENT.md to remove duplication
- Update DEPLOYMENT.md with SP/app principal permission grant instructions
- Regenerate yarn.lock to remove internal npm proxy URLs
- Fix TestViewService unit tests for module-level _tmp_schema_ready flag
- Add public reset_tmp_schema_ready() to view_service to avoid
  protected-access and import-outside-toplevel pylint violations
- Downgrade orval from 8.0.0 to 7.21.0 to fix CI build failure
  where -i and -c flags conflict in orval 8.0.0
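
The `_tmp_schema_ready` flag and the public `reset_tmp_schema_ready()` hook mentioned in the commit message above can be pictured roughly as follows. This is only a sketch, not the code in `view_service`: `ensure_tmp_schema` and its `create_schema` callback are hypothetical names, and the flag is modelled as a `threading.Event` rather than a plain boolean so that no `global` statement is needed.

```python
import threading
from typing import Callable

# Module-level flag: set once the temporary schema is known to exist.
# Assumption: modelled as an Event here; the real module may use a plain bool.
_tmp_schema_ready = threading.Event()


def ensure_tmp_schema(create_schema: Callable[[], None]) -> None:
    """Create the tmp schema at most once per process (hypothetical helper).

    `create_schema` stands in for whatever issues the DDL. Per the commit
    message, the real call runs with SP credentials rather than the user's
    OBO token. The DDL is assumed to be idempotent, so a rare double call
    under concurrency would be harmless.
    """
    if _tmp_schema_ready.is_set():
        return
    create_schema()
    _tmp_schema_ready.set()


def reset_tmp_schema_ready() -> None:
    """Public reset hook so tests do not have to touch the protected name."""
    _tmp_schema_ready.clear()
```

A unit test can then call `reset_tmp_schema_ready()` in its setup instead of importing and mutating `_tmp_schema_ready` directly, which is what triggers the pylint protected-access warning.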
Comment thread app/app.yml Outdated
Comment thread app/src/databricks_labs_dqx_app/backend/migrations/__init__.py
Comment thread app/tasks/src/dqx_task_runner/runner.py
Comment thread app/databricks.yml
Comment thread app/src/databricks_labs_dqx_app/backend/common/authorization.py
Comment thread app/src/databricks_labs_dqx_app/backend/sql_executor.py
Comment thread app/src/databricks_labs_dqx_app/backend/cache.py Outdated
Comment thread app/src/databricks_labs_dqx_app/backend/cache.py
Comment thread app/src/databricks_labs_dqx_app/backend/spa_static.py Outdated
mwojtyczka and others added 6 commits April 29, 2026 12:34
…mprove robustness

- Fix app permissions in databricks.yml: use account users group for
  org-wide CAN_USE access; add comments explaining CREATE_TABLE grant
  and user_api_scopes baseline vs all-apis requirement
- Add run_id validation in task runner to prevent SQL identifier injection
- Make migrations idempotent: ADD COLUMN IF NOT EXISTS, split multi-statement
  DDL, catch already-applied errors (COLUMN_ALREADY_EXISTS, liquid clustering)
- Use escape_sql_string for migration history INSERT consistency
- Fix cache MISS sentinel so None return values are cached correctly,
  preventing repeated upstream calls for missing-data lookups
- Fix SPA static file fallback: use asset extension allowlist instead of
  dot-in-segment heuristic to avoid false 404s on routes like /rules/v1.2
- Replace deprecated regex argument with pattern in quarantine export Query param
- Add post_deploy_grants.sh script and update DEPLOYMENT.md
- Update unit tests for cache sentinel and multi-statement migration counts
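
Two of the fixes listed above, the cache MISS sentinel and the `run_id` validation, are easier to follow with a small example. The sketch below is illustrative only: `SimpleCache`, `get_or_load`, `validate_run_id`, and the allowed `run_id` character set are assumptions, not the contents of `cache.py` or `runner.py`.

```python
import re
from typing import Any, Callable, Dict

# Unique marker meaning "key is not in the cache". It is distinct from a
# cached None, which is a legitimate value for missing-data lookups.
_MISS = object()


class SimpleCache:
    """Hypothetical in-memory cache illustrating the sentinel pattern."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        value = self._store.get(key, _MISS)
        # Comparing against None here would treat a cached None as a miss
        # and call the loader again on every lookup.
        if value is _MISS:
            value = loader()
            self._store[key] = value  # None is stored like any other value
        return value


# Assumed format: letters, digits, and underscores, at most 64 characters.
_RUN_ID_RE = re.compile(r"[A-Za-z0-9_]{1,64}")


def validate_run_id(run_id: str) -> str:
    """Reject values that could not safely be embedded in a SQL identifier."""
    if not _RUN_ID_RE.fullmatch(run_id):
        raise ValueError(f"invalid run_id: {run_id!r}")
    return run_id
```

With the sentinel in place, a loader that returns `None` is called once per key; an allowlist check such as `validate_run_id` runs before the value is interpolated into any table or view name.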
Contributor

@mwojtyczka mwojtyczka left a comment

LGTM

Labels

Approved to Merge (when PR is reviewed and approved; to be merged once all tests pass)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: No-Spark dependency for the DQX App (Reduce API scopes)

3 participants