Commit d8efd40

Merge pull request #34 from thisisqubika/DC-319
Switch from access token to client_id and client_secret implemented
2 parents 7f64952 + 4b40a77 commit d8efd40

9 files changed

Lines changed: 278 additions & 231 deletions

databricks_job_executor/README.md

Lines changed: 24 additions & 24 deletions
@@ -16,7 +16,7 @@ A Streamlit application for executing and monitoring Databricks migration jobs.
 - Python 3.8+
 - Streamlit
 - Databricks workspace access
-- Databricks personal access token
+- Databricks service principal with OAuth M2M credentials (client ID and client secret)

 ### Installation

@@ -28,14 +28,16 @@ pip install -r requirements.txt
 2. Set environment variables:
 ```bash
 export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
-export DATABRICKS_TOKEN="your-personal-access-token"
+export DATABRICKS_CLIENT_ID="your-client-id"
+export DATABRICKS_CLIENT_SECRET="your-client-secret"
 export DATABRICKS_JOB_ID="123456" # Optional: specific job ID to run
 ```

 Or create a `.env` file:
 ```
 DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
-DATABRICKS_TOKEN=your-personal-access-token
+DATABRICKS_CLIENT_ID=your-client-id
+DATABRICKS_CLIENT_SECRET=your-client-secret
 DATABRICKS_JOB_ID=123456
 ```

@@ -86,10 +88,15 @@ This application can be deployed to Databricks using Databricks Asset Bundles.

 The application requires the following environment variables:

-- **DATABRICKS_HOST** (required): Your Databricks workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
-- **DATABRICKS_TOKEN** (required): Your Databricks personal access token
+- **DATABRICKS_HOST** (required for local): Your Databricks workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
+- **DATABRICKS_CLIENT_ID** (required for local): Your service principal client ID
+- **DATABRICKS_CLIENT_SECRET** (required for local): Your service principal client secret
 - **DATABRICKS_JOB_ID** (required): The specific job ID to run

+**Authentication Methods:**
+- **Local Development**: Uses OAuth M2M (service principal) with `DATABRICKS_CLIENT_ID` and `DATABRICKS_CLIENT_SECRET`
+- **Databricks Runtime**: Automatically uses built-in authentication (no credentials needed)
+
 These credentials are read from environment variables at startup. The connection status is displayed in the sidebar.

 ## Usage
@@ -104,7 +111,7 @@ These credentials are read from environment variables at startup. The connection

 ## Security Note

-Never commit your `DATABRICKS_TOKEN` to version control. Always use environment variables or secure credential management systems.
+Never commit your `DATABRICKS_CLIENT_SECRET` to version control. Always use environment variables or secure credential management systems (e.g., Databricks Secrets).

 ### Setting Environment Variables and Secrets on Databricks

@@ -126,33 +133,26 @@ When deploying and running the Streamlit app on Databricks, you can configure th
 # MY_CUSTOM_VAR: "value"
 ```

-2. **Databricks Widgets (for `DATABRICKS_HOST`, `DATABRICKS_TOKEN`, `DATABRICKS_JOB_ID`)**:
-When you launch a Databricks App, you can pass parameters as widgets. The Streamlit app is configured to read `databricks_host`, `databricks_token`, and `databricks_job_id` from these widgets if they are present.
+2. **Databricks App Configuration**:
+When deploying to Databricks as an app, authentication is handled automatically using the Databricks runtime's built-in authentication. No explicit credentials (client ID/secret) are needed when running on Databricks.

-To set widgets when launching the app:
-* Go to your Databricks workspace.
-* Navigate to "Apps" (or the equivalent section where deployed apps are listed).
-* Select your deployed app (e.g., `databricks-job-executor-streamlit`).
-* Click "Launch" or "Run App".
-* In the launch dialog, you may find options to set parameters. If not directly available, you might need to configure them in the `databricks.yml` or rely on secrets.
-* `databricks_host`: `https://your-workspace.cloud.databricks.com`
-* `databricks_token`: `dapixxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx` (your personal access token)
-* `databricks_job_id`: `123456` (the ID of the job you want to execute)
+For local development configuration, you can optionally use Databricks Widgets to pass `databricks_host`, `databricks_client_id`, `databricks_client_secret`, and `databricks_job_id` if needed.

-3. **Databricks Secrets (for `DATABRICKS_TOKEN`)**:
-For enhanced security, it is recommended to store your `DATABRICKS_TOKEN` in Databricks Secrets. The application will attempt to retrieve the token from a secret scope if it's not provided via environment variables or widgets.
+3. **Databricks Secrets (for Local Development)**:
+For enhanced security during local development, you can store your OAuth credentials in Databricks Secrets and retrieve them programmatically.

 To set up Databricks Secrets:
 * **Create a Secret Scope**:
 ```bash
-databricks secrets create-scope --scope databricks-token-scope
+databricks secrets create-scope --scope oauth-credentials
 ```
 (You might need to configure ACLs for this scope to allow users/groups to read it.)
-* **Put the Secret**:
+* **Put the Secrets**:
 ```bash
-databricks secrets put --scope databricks-token-scope --key databricks-token-key
+databricks secrets put --scope oauth-credentials --key client-id
+databricks secrets put --scope oauth-credentials --key client-secret
 ```
-When prompted, paste your Databricks personal access token.
+When prompted, enter your service principal credentials.

-The application will then automatically attempt to retrieve the token using `dbutils.secrets.get("databricks-token-scope", "databricks-token-key")` when running in the Databricks environment.
+**Note**: When running on Databricks as an app, the runtime automatically handles authentication, so explicit credential storage is not required.
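
The README changes above describe two authentication paths: OAuth M2M credentials from environment variables for local development, and built-in authentication on Databricks. A minimal sketch of that decision is shown here. The helper name `resolve_auth` and the `is_runtime` flag are hypothetical and stand in for however the app detects its environment; the detection code is not part of the hunks shown.

```python
from typing import Mapping

# Variables the README marks as "required for local".
REQUIRED_LOCAL_VARS = (
    "DATABRICKS_HOST",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
)


def resolve_auth(env: Mapping[str, str], is_runtime: bool) -> dict:
    """Pick connection settings for the current environment (hypothetical helper)."""
    if is_runtime:
        # On Databricks no explicit credentials are needed.
        return {"mode": "runtime"}

    # Locally, every OAuth M2M variable must be present and non-empty.
    missing = [name for name in REQUIRED_LOCAL_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))

    return {
        "mode": "oauth-m2m",
        "host": env["DATABRICKS_HOST"],
        "client_id": env["DATABRICKS_CLIENT_ID"],
        "client_secret": env["DATABRICKS_CLIENT_SECRET"],
    }
```

Locally this could be called as `resolve_auth(os.environ, is_runtime=False)`. If the app builds its client with the Databricks SDK for Python (an assumption; the client construction is not shown in this diff), the `host`, `client_id` and `client_secret` values map onto the keyword arguments of the same names accepted by `WorkspaceClient`.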

databricks_job_executor/streamlit_app/components/ui/initializers.py

Lines changed: 2 additions & 1 deletion
@@ -24,7 +24,8 @@ def configure_page(bundle_environment: str = 'dev'):
 def initialize_config_state(db_env: dict):
     """Initialize configuration state from environment variables or Databricks environment."""
     st.session_state.databricks_host = db_env.get('host', '')
-    st.session_state.databricks_token = db_env.get('token', '')
+    st.session_state.databricks_client_id = db_env.get('client_id', '')
+    st.session_state.databricks_client_secret = db_env.get('client_secret', '')
     st.session_state.bundle_environment = db_env.get('bundle_environment', 'dev')

     job_id_str = os.getenv('DATABRICKS_JOB_ID')  # Still allow .env override
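
The hunk ends where `DATABRICKS_JOB_ID` is read as a string; what happens to `job_id_str` afterwards is outside the diff. As a hedged illustration only, one way to turn such a value into a usable job ID is sketched below. `parse_job_id` is a hypothetical name and not the project's actual function.

```python
import os
from typing import Optional


def parse_job_id(raw: Optional[str]) -> Optional[int]:
    """Convert an environment string to a positive job ID, or None if unusable."""
    if raw is None or not raw.strip():
        return None
    try:
        job_id = int(raw.strip())
    except ValueError:
        # Non-numeric values are treated as "not configured".
        return None
    return job_id if job_id > 0 else None


# Example: read the same variable the hunk reads.
job_id = parse_job_id(os.getenv("DATABRICKS_JOB_ID"))
```

Returning `None` rather than raising keeps the "No Job ID configured" warning path in the sidebar reachable when the variable is unset or malformed.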

databricks_job_executor/streamlit_app/components/ui/renders.py

Lines changed: 126 additions & 62 deletions
@@ -7,59 +7,107 @@
 from streamlit_app.utils.databricks_env import validate_connection


+def _get_session_config():
+    """Extract connection configuration from session state."""
+    return {
+        'host': st.session_state.get('databricks_host', ''),
+        'client_id': st.session_state.get('databricks_client_id', ''),
+        'client_secret': st.session_state.get('databricks_client_secret', ''),
+        'is_runtime': st.session_state.get('databricks_env', {}).get('is_databricks_runtime', False),
+        'job_id': st.session_state.get('databricks_job_id'),
+    }
+
+
+def _render_connection_status_runtime(job_id):
+    """Render connection status for Databricks runtime environment."""
+    is_valid, error_msg = validate_connection()
+    if is_valid:
+        st.success("✅ Connected to Databricks")
+        st.info("**Environment:** Databricks Runtime")
+        if job_id:
+            st.info(f"**Job ID:**\n`{job_id}`")
+        else:
+            st.warning("⚠️ No Job ID configured")
+    else:
+        st.error("❌ Connection Failed")
+        st.error(f"**Error:** {error_msg}")
+
+
+def _render_connection_status_local(host, client_id, client_secret, job_id):
+    """Render connection status for local development environment."""
+    if host and client_id and client_secret:
+        is_valid, error_msg = validate_connection(host, client_id, client_secret)
+        if is_valid:
+            st.success("✅ Connected to Databricks")
+            st.info(f"**Workspace:**\n{host}")
+            if job_id:
+                st.info(f"**Job ID:**\n`{job_id}`")
+            else:
+                st.warning("⚠️ No Job ID configured")
+        else:
+            st.error("❌ Connection Failed")
+            st.error(f"**Error:** {error_msg}")
+    else:
+        st.warning("⚠️ Configuration Missing")
+        missing = []
+        if not host:
+            missing.append("`DATABRICKS_HOST`")
+        if not client_id:
+            missing.append("`DATABRICKS_CLIENT_ID`")
+        if not client_secret:
+            missing.append("`DATABRICKS_CLIENT_SECRET`")
+        if not job_id:
+            missing.append("`DATABRICKS_JOB_ID`")
+        st.markdown("Please set the following environment variables:\n- " + "\n- ".join(missing))
+
+
+def _render_about_section(is_runtime):
+    """Render the About section in the sidebar."""
+    st.markdown("### ℹ️ About")
+    st.markdown("""
+    **Data Migration Accelerator**
+
+    This tool helps you:
+    - Execute the configured migration job
+    - Monitor job runs and progress
+    - View job logs and diagnostics
+    - Cancel running jobs if needed
+    """)
+
+    if is_runtime:
+        st.markdown("""
+        **Deployed in Databricks Runtime**
+        - Authentication: Automatic
+        - Configure `DATABRICKS_JOB_ID` to set default job
+        """)
+    else:
+        st.markdown("""
+        **Local Development Configuration:**
+        - `DATABRICKS_HOST` - Workspace URL
+        - `DATABRICKS_CLIENT_ID` - Service principal client ID
+        - `DATABRICKS_CLIENT_SECRET` - Service principal client secret
+        - `DATABRICKS_JOB_ID` - Job ID to run
+        """)
+
+
 def render_sidebar():
     """Render the sidebar with connection status."""
+    config = _get_session_config()
+
     with st.sidebar:
         st.markdown("## ⚙️ Configuration")
-
         st.markdown("### Connection Status")

-        host = st.session_state.get('databricks_host', '')
-        token = st.session_state.get('databricks_token', '')
-
-        job_id = st.session_state.get('databricks_job_id')
-
-        if host and token:
-            is_valid, error_msg = validate_connection(host, token)
-            if is_valid:
-                st.success("✅ Connected to Databricks")
-                st.info(f"**Workspace:**\n{host}")
-                if job_id:
-                    st.info(f"**Job ID:**\n`{job_id}`")
-                else:
-                    st.warning("⚠️ No Job ID configured")
-            else:
-                st.error("❌ Connection Failed")
-                st.error(f"**Error:** {error_msg}")
+        if config['is_runtime']:
+            _render_connection_status_runtime(config['job_id'])
         else:
-            st.warning("⚠️ Configuration Missing")
-            missing = []
-            if not host:
-                missing.append("`DATABRICKS_HOST`")
-            if not token:
-                missing.append("`DATABRICKS_TOKEN`")
-            if not job_id:
-                missing.append("`DATABRICKS_JOB_ID`")
-            st.markdown(f"Please set the following environment variables:\n- " + "\n- ".join(missing))
+            _render_connection_status_local(
+                config['host'], config['client_id'],
+                config['client_secret'], config['job_id']
+            )

         st.divider()
-
-        st.markdown("### ℹ️ About")
-        st.markdown("""
-        **Data Migration Accelerator**
-
-        This tool helps you:
-        - Execute the configured migration job
-        - Monitor job runs and progress
-        - View job logs and diagnostics
-        - Cancel running jobs if needed
-
-        **Configuration:**
-        Set via environment variables:
-        - `DATABRICKS_HOST` - Workspace URL
-        - `DATABRICKS_TOKEN` - Access token
-        - `DATABRICKS_JOB_ID` - Job ID to run
-        """)
+        _render_about_section(config['is_runtime'])


 def render_header():
@@ -82,35 +130,52 @@ def render_header():
     """, unsafe_allow_html=True)


-def render_main_content():
-    """Render the main content area of the application."""
-    render_sidebar()
-    render_header()
-
-    host = st.session_state.get('databricks_host', '')
-    token = st.session_state.get('databricks_token', '')
+def _check_connection_and_render_errors(config) -> bool:
+    """Check connection and render appropriate error messages. Returns True if connected."""
+    if config['is_runtime']:
+        is_valid, error_msg = validate_connection()
+        if not is_valid:
+            st.error("❌ **Connection Failed**")
+            st.error(f"Unable to connect to Databricks: {error_msg}")
+            return False
+        return True

-    if not host or not token:
+    if not all([config['host'], config['client_id'], config['client_secret']]):
         st.error("⚠️ **Configuration Required**")
         st.markdown("""
-        Please set the following environment variables before running the application:
+        Please set the following environment variables:

-        - `DATABRICKS_HOST` - Your Databricks workspace URL (e.g., `https://your-workspace.cloud.databricks.com`)
-        - `DATABRICKS_TOKEN` - Your Databricks personal access token
+        - `DATABRICKS_HOST` - Your Databricks workspace URL
+        - `DATABRICKS_CLIENT_ID` - Your service principal client ID
+        - `DATABRICKS_CLIENT_SECRET` - Your service principal client secret

         You can set these in your environment or in a `.env` file.
         """)
-        return
+        return False

-    is_valid, error_msg = validate_connection(host, token)
+    is_valid, error_msg = validate_connection(
+        config['host'], config['client_id'], config['client_secret']
+    )
     if not is_valid:
-        st.error(f"❌ **Connection Failed**")
+        st.error("❌ **Connection Failed**")
         st.error(f"Unable to connect to Databricks: {error_msg}")
-        st.info("Please check your `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables.")
+        st.info("Please verify your environment variables are correct.")
+        return False
+
+    return True
+
+
+def render_main_content():
+    """Render the main content area of the application."""
+    render_sidebar()
+    render_header()
+
+    config = _get_session_config()
+
+    if not _check_connection_and_render_errors(config):
         return

-    job_interface = JobInterface()
-    job_interface.render()
+    JobInterface().render()


 def render_footer():
@@ -122,4 +187,3 @@ def render_footer():
         "</div>",
         unsafe_allow_html=True
     )
-
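
After this change, the check for missing local settings appears in two places in `renders.py`: the sidebar builds a list of missing names, and the main content tests the same three values with `all([...])`. As a sketch only, a single pure helper could serve both call sites. `missing_variables` is a hypothetical name and is not part of the commit; the config keys follow `_get_session_config()` above.

```python
from typing import List, Mapping

# Maps the keys returned by _get_session_config() to their environment variables.
_LOCAL_REQUIRED = {
    "host": "DATABRICKS_HOST",
    "client_id": "DATABRICKS_CLIENT_ID",
    "client_secret": "DATABRICKS_CLIENT_SECRET",
}


def missing_variables(config: Mapping[str, object], include_job_id: bool = False) -> List[str]:
    """List the environment variables whose config values are empty or absent."""
    names = [env for key, env in _LOCAL_REQUIRED.items() if not config.get(key)]
    # The sidebar also reports the job ID; the main-content check does not.
    if include_job_id and not config.get("job_id"):
        names.append("DATABRICKS_JOB_ID")
    return names
```

With such a helper, the sidebar could render `missing_variables(config, include_job_id=True)` and the main content could treat an empty result from `missing_variables(config)` as "credentials present". Because it has no Streamlit dependency, it can be unit tested without a running app.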
