48 changes: 24 additions & 24 deletions databricks_job_executor/README.md
Original file line number Diff line number Diff line change
@@ -16,7 +16,7 @@ A Streamlit application for executing and monitoring Databricks migration jobs.
- Python 3.8+
- Streamlit
- Databricks workspace access
- Databricks personal access token
- Databricks service principal with OAuth M2M credentials (client ID and client secret)

### Installation

@@ -28,14 +28,16 @@ pip install -r requirements.txt
2. Set environment variables:
```bash
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-personal-access-token"
export DATABRICKS_CLIENT_ID="your-client-id"
export DATABRICKS_CLIENT_SECRET="your-client-secret"
export DATABRICKS_JOB_ID="123456" # Optional: specific job ID to run
```

Or create a `.env` file:
```
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-personal-access-token
DATABRICKS_CLIENT_ID=your-client-id
DATABRICKS_CLIENT_SECRET=your-client-secret
DATABRICKS_JOB_ID=123456
```

@@ -86,10 +88,15 @@ This application can be deployed to Databricks using Databricks Asset Bundles.

The application requires the following environment variables:

- **DATABRICKS_HOST** (required): Your Databricks workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
- **DATABRICKS_TOKEN** (required): Your Databricks personal access token
- **DATABRICKS_HOST** (required for local): Your Databricks workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
- **DATABRICKS_CLIENT_ID** (required for local): Your service principal client ID
- **DATABRICKS_CLIENT_SECRET** (required for local): Your service principal client secret
- **DATABRICKS_JOB_ID** (required): The specific job ID to run

**Authentication Methods:**
- **Local Development**: Uses OAuth M2M (service principal) with `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`
- **Databricks Runtime**: Automatically uses built-in authentication (no credentials needed)
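
As a rough illustration of what OAuth M2M involves, the sketch below builds (but does not send) a client-credentials token request. It assumes the workspace-level token endpoint `/oidc/v1/token` and the `all-apis` scope described in the Databricks OAuth M2M documentation. The helper name `build_token_request` is hypothetical and is not part of this application, which may well delegate this step to the Databricks SDK.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (without sending) an OAuth client-credentials token request.

    Hypothetical helper for illustration only; not part of the application.
    """
    # Assumption: workspace-level token endpoint for OAuth M2M.
    url = host.rstrip("/") + "/oidc/v1/token"

    # Form-encoded body for the client-credentials grant.
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode("ascii")

    # The service principal's client ID and secret go in an HTTP Basic header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")

    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` should, under the assumptions above, yield a JSON body containing a short-lived access token. Building the request separately from sending it keeps the secret handling easy to inspect and test.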

These credentials are read from environment variables at startup. The connection status is displayed in the sidebar.
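
A minimal sketch of the startup check described above, assuming the variable names listed in this section. The function `missing_variables` and the constant `REQUIRED_LOCAL` are hypothetical names used for illustration; the application's own check lives in its Streamlit render helpers and may differ.

```python
import os
from typing import List, Mapping, Optional

# Variables needed for local development, per the list above.
REQUIRED_LOCAL = (
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
    "DATABRICKS_JOB_ID",
)


def missing_variables(
    env: Optional[Mapping[str, str]] = None, is_runtime: bool = False
) -> List[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    # On Databricks the runtime handles authentication, so only the job ID is needed.
    required = ("DATABRICKS_JOB_ID",) if is_runtime else REQUIRED_LOCAL
    return [name for name in required if not env.get(name, "").strip()]
```

Calling `missing_variables()` with no arguments inspects the real process environment; passing a plain dict makes the check easy to exercise in tests.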

## Usage
@@ -104,7 +111,7 @@ These credentials are read from environment variables at startup. The connection

## Security Note

Never commit your `DATABRICKS_TOKEN` to version control. Always use environment variables or secure credential management systems.
Never commit your `DATABRICKS_CLIENT_SECRET` to version control. Always use environment variables or secure credential management systems (e.g., Databricks Secrets).

### Setting Environment Variables and Secrets on Databricks

@@ -126,33 +133,26 @@ When deploying and running the Streamlit app on Databricks, you can configure th
# MY_CUSTOM_VAR: "value"
```

2. **Databricks Widgets (for `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, `DATABRICKS_JOB_ID`)**:
When you launch a Databricks App, you can pass parameters as widgets. The Streamlit app is configured to read `databricks_host`, `databricks_token`, and `databricks_job_id` from these widgets if they are present.
2. **Databricks App Configuration**:
When the app is deployed to Databricks, the Databricks runtime's built-in authentication is used automatically, so no explicit credentials (client ID or client secret) are needed.

To set widgets when launching the app:
* Go to your Databricks workspace.
* Navigate to "Apps" (or the equivalent section where deployed apps are listed).
* Select your deployed app (e.g., `databricks-job-executor-streamlit`).
* Click "Launch" or "Run App".
* In the launch dialog, you may find options to set parameters. If not directly available, you might need to configure them in the `databricks.yml` or rely on secrets.
* `databricks_host`: `https://your-workspace.cloud.databricks.com`
* `databricks_token`: `dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` (your personal access token)
* `databricks_job_id`: `123456` (the ID of the job you want to execute)
For local development configuration, you can optionally use Databricks Widgets to pass `databricks_host`, `databricks_client_id`, `databricks_client_secret`, and `databricks_job_id` if needed.

3. **Databricks Secrets (for `DATABRICKS_TOKEN`)**:
For enhanced security, it is recommended to store your `DATABRICKS_TOKEN` in Databricks Secrets. The application will attempt to retrieve the token from a secret scope if it's not provided via environment variables or widgets.
3. **Databricks Secrets (for Local Development)**:
For enhanced security during local development, you can store your OAuth credentials in Databricks Secrets and retrieve them programmatically.

To set up Databricks Secrets:
* **Create a Secret Scope**:
```bash
databricks secrets create-scope --scope databricks-token-scope
databricks secrets create-scope --scope oauth-credentials
```
(You might need to configure ACLs for this scope to allow users/groups to read it.)
* **Put the Secret**:
* **Put the Secrets**:
```bash
databricks secrets put --scope databricks-token-scope --key databricks-token-key
databricks secrets put --scope oauth-credentials --key client-id
databricks secrets put --scope oauth-credentials --key client-secret
```
When prompted, paste your Databricks personal access token.
When prompted, enter your service principal credentials.

The application will then automatically attempt to retrieve the token using `dbutils.secrets.get("databricks-token-scope", "databricks-token-key")` when running in the Databricks environment.
**Note**: When running on Databricks as an app, the runtime automatically handles authentication, so explicit credential storage is not required.

Original file line number Diff line number Diff line change
@@ -24,7 +24,8 @@ def configure_page(bundle_environment: str = 'dev'):
def initialize_config_state(db_env: dict):
    """Initialize configuration state from environment variables or Databricks environment."""
    st.session_state.databricks_host = db_env.get('host', '')
    st.session_state.databricks_token = db_env.get('token', '')
    st.session_state.databricks_client_id = db_env.get('client_id', '')
    st.session_state.databricks_client_secret = db_env.get('client_secret', '')
    st.session_state.bundle_environment = db_env.get('bundle_environment', 'dev')

    job_id_str = os.getenv('DATABRICKS_JOB_ID')  # Still allow .env override
188 changes: 126 additions & 62 deletions databricks_job_executor/streamlit_app/components/ui/renders.py
Original file line number Diff line number Diff line change
@@ -7,59 +7,107 @@
from streamlit_app.utils.databricks_env import validate_connection


def _get_session_config():
    """Extract connection configuration from session state."""
    return {
        'host': st.session_state.get('databricks_host', ''),
        'client_id': st.session_state.get('databricks_client_id', ''),
        'client_secret': st.session_state.get('databricks_client_secret', ''),
        'is_runtime': st.session_state.get('databricks_env', {}).get('is_databricks_runtime', False),
        'job_id': st.session_state.get('databricks_job_id'),
    }


def _render_connection_status_runtime(job_id):
    """Render connection status for Databricks runtime environment."""
    is_valid, error_msg = validate_connection()
    if is_valid:
        st.success("✅ Connected to Databricks")
        st.info("**Environment:** Databricks Runtime")
        if job_id:
            st.info(f"**Job ID:**\n`{job_id}`")
        else:
            st.warning("⚠️ No Job ID configured")
    else:
        st.error("❌ Connection Failed")
        st.error(f"**Error:** {error_msg}")


def _render_connection_status_local(host, client_id, client_secret, job_id):
    """Render connection status for local development environment."""
    if host and client_id and client_secret:
        is_valid, error_msg = validate_connection(host, client_id, client_secret)
        if is_valid:
            st.success("✅ Connected to Databricks")
            st.info(f"**Workspace:**\n{host}")
            if job_id:
                st.info(f"**Job ID:**\n`{job_id}`")
            else:
                st.warning("⚠️ No Job ID configured")
        else:
            st.error("❌ Connection Failed")
            st.error(f"**Error:** {error_msg}")
    else:
        st.warning("⚠️ Configuration Missing")
        missing = []
        if not host:
            missing.append("`DATABRICKS_HOST`")
        if not client_id:
            missing.append("`DATABRICKS_CLIENT_ID`")
        if not client_secret:
            missing.append("`DATABRICKS_CLIENT_SECRET`")
        if not job_id:
            missing.append("`DATABRICKS_JOB_ID`")
        st.markdown("Please set the following environment variables:\n- " + "\n- ".join(missing))


def _render_about_section(is_runtime):
    """Render the About section in the sidebar."""
    st.markdown("### ℹ️ About")
    st.markdown("""
    **Data Migration Accelerator**

    This tool helps you:
    - Execute the configured migration job
    - Monitor job runs and progress
    - View job logs and diagnostics
    - Cancel running jobs if needed
    """)

    if is_runtime:
        st.markdown("""
        **Deployed in Databricks Runtime**
        - Authentication: Automatic
        - Configure `DATABRICKS_JOB_ID` to set default job
        """)
    else:
        st.markdown("""
        **Local Development Configuration:**
        - `DATABRICKS_HOST` - Workspace URL
        - `DATABRICKS_CLIENT_ID` - Service principal client ID
        - `DATABRICKS_CLIENT_SECRET` - Service principal client secret
        - `DATABRICKS_JOB_ID` - Job ID to run
        """)


def render_sidebar():
    """Render the sidebar with connection status."""
    config = _get_session_config()

    with st.sidebar:
        st.markdown("## ⚙️ Configuration")

        st.markdown("### Connection Status")

        host = st.session_state.get('databricks_host', '')
        token = st.session_state.get('databricks_token', '')

        job_id = st.session_state.get('databricks_job_id')

        if host and token:
            is_valid, error_msg = validate_connection(host, token)
            if is_valid:
                st.success("✅ Connected to Databricks")
                st.info(f"**Workspace:**\n{host}")
                if job_id:
                    st.info(f"**Job ID:**\n`{job_id}`")
                else:
                    st.warning("⚠️ No Job ID configured")
            else:
                st.error("❌ Connection Failed")
                st.error(f"**Error:** {error_msg}")
        if config['is_runtime']:
            _render_connection_status_runtime(config['job_id'])
        else:
            st.warning("⚠️ Configuration Missing")
            missing = []
            if not host:
                missing.append("`DATABRICKS_HOST`")
            if not token:
                missing.append("`DATABRICKS_TOKEN`")
            if not job_id:
                missing.append("`DATABRICKS_JOB_ID`")
            st.markdown(f"Please set the following environment variables:\n- " + "\n- ".join(missing))
            _render_connection_status_local(
                config['host'], config['client_id'],
                config['client_secret'], config['job_id']
            )

        st.divider()

        st.markdown("### ℹ️ About")
        st.markdown("""
        **Data Migration Accelerator**

        This tool helps you:
        - Execute the configured migration job
        - Monitor job runs and progress
        - View job logs and diagnostics
        - Cancel running jobs if needed

        **Configuration:**
        Set via environment variables:
        - `DATABRICKS_HOST` - Workspace URL
        - `DATABRICKS_TOKEN` - Access token
        - `DATABRICKS_JOB_ID` - Job ID to run
        """)
        _render_about_section(config['is_runtime'])


def render_header():
@@ -82,35 +130,52 @@ def render_header():
""", unsafe_allow_html=True)


def render_main_content():
    """Render the main content area of the application."""
    render_sidebar()
    render_header()

    host = st.session_state.get('databricks_host', '')
    token = st.session_state.get('databricks_token', '')
def _check_connection_and_render_errors(config) -> bool:
    """Check connection and render appropriate error messages. Returns True if connected."""
    if config['is_runtime']:
        is_valid, error_msg = validate_connection()
        if not is_valid:
            st.error("❌ **Connection Failed**")
            st.error(f"Unable to connect to Databricks: {error_msg}")
            return False
        return True

    if not host or not token:
    if not all([config['host'], config['client_id'], config['client_secret']]):
        st.error("⚠️ **Configuration Required**")
        st.markdown("""
        Please set the following environment variables before running the application:
        Please set the following environment variables:

        - `DATABRICKS_HOST` - Your Databricks workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
        - `DATABRICKS_TOKEN` - Your Databricks personal access token
        - `DATABRICKS_HOST` - Your Databricks workspace URL
        - `DATABRICKS_CLIENT_ID` - Your service principal client ID
        - `DATABRICKS_CLIENT_SECRET` - Your service principal client secret

        You can set these in your environment or in a `.env` file.
        """)
        return
        return False

    is_valid, error_msg = validate_connection(host, token)
    is_valid, error_msg = validate_connection(
        config['host'], config['client_id'], config['client_secret']
    )
    if not is_valid:
        st.error(f"❌ **Connection Failed**")
        st.error("❌ **Connection Failed**")
        st.error(f"Unable to connect to Databricks: {error_msg}")
        st.info("Please check your `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables.")
        st.info("Please verify your environment variables are correct.")
        return False

    return True


def render_main_content():
    """Render the main content area of the application."""
    render_sidebar()
    render_header()

    config = _get_session_config()

    if not _check_connection_and_render_errors(config):
        return

    job_interface = JobInterface()
    job_interface.render()
    JobInterface().render()


def render_footer():
@@ -122,4 +187,3 @@ def render_footer():
"</div>",
unsafe_allow_html=True
)
