diff --git a/features/src/llm-context/skills/DASHBOARD_BUILDER.md b/features/src/llm-context/skills/DASHBOARD_BUILDER.md
index 9c5f19f1..fe0cd9ec 100644
--- a/features/src/llm-context/skills/DASHBOARD_BUILDER.md
+++ b/features/src/llm-context/skills/DASHBOARD_BUILDER.md
@@ -2,7 +2,7 @@
 
 **Build interactive web apps, dashboards, and visualizations that run on a port in Workbench.**
 
-> **Triggers:** 
+> **Triggers:**
 > - "Create a dashboard", "visualize data", "build charts"
 > - "Run a Flask/Streamlit/FastAPI app"
 > - "Display data in the browser", "interactive UI"
@@ -10,7 +10,7 @@
 
 ---
 
-## 🌐 Workbench Proxy & Web Apps Best Practices
+## Workbench Proxy & Web Apps Best Practices
 
 ### Proxy URL Format
 
@@ -19,106 +19,65 @@ All web apps in Workbench are accessed via:
 https://workbench.verily.com/app/[APP_UUID]/proxy/[PORT]/[PATH]
 ```
 
-### ⚠️ How to Get the App UUID (CRITICAL)
+### How to Get the App UUID (CRITICAL)
 
 **You MUST automatically get the app UUID - NEVER ask the user for it.**
 
+**Option A (preferred):** Use MCP tools — no shell needed, no permission prompt:
+```
+mcp__wb__app_get_url(appId=)
+```
+To find the app resource name, use `mcp__wb__workspace_list_resources` and filter for
+EC2 instances (`AWS_EC2_INSTANCE`) or GCE instances owned by the current user.
+
+**Option B:** Use the CLI:
 ```bash
-# Run this command and use the output:
 wb app list --format=json | jq -r '.[] | select(.status == "RUNNING") | .id' | head -1
 ```
+Note: this can intermittently return 401 on some workspaces. If it fails, fall back to Option A.
 
-**⚡ LLM INSTRUCTION:** When constructing dashboard/proxy URLs:
-1. First run the command above to get the running app UUID
+**LLM INSTRUCTION:** When constructing dashboard/proxy URLs:
+1. Use one of the methods above to get the running app UUID
 2. Use that actual UUID in the URL you provide
 3. Do NOT use placeholders like `[APP_UUID]` in your final response
 4. Do NOT ask the user to find/replace the UUID themselves
 
-### ✅ Correct URL Examples
+### Correct URL Examples
 
 ```
 https://workbench.verily.com/app/abc123-def456-789/proxy/8080/
 https://workbench.verily.com/app/abc123-def456-789/proxy/8501/index.html
-https://workbench.verily.com/app/abc123-def456-789/proxy/8000/dashboard.html
 ```
 
-### ❌ WRONG URL Formats (These WILL fail)
+### WRONG URL Formats (These WILL fail)
 
 ```
-https://abc123-def456.workbench-app.verily.com/ ← WRONG: "Bad Request" error
-https://workbench-app.verily.com/abc123-def456/ ← WRONG: Invalid domain
-http://localhost:8080/ ← WRONG: Not accessible externally
-https://abc123-def456/workbench.verily.com/ ← WRONG: Reversed format
-file:///home/jupyter/dashboard.html ← WRONG: JavaScript blocked
+https://abc123-def456.workbench-app.verily.com/ <- WRONG: "Bad Request" error
+http://localhost:8080/ <- WRONG: Not accessible externally
+file:///home/jupyter/dashboard.html <- WRONG: JavaScript blocked
 ```
 
-### ⚠️ Common Issue: JavaScript API Calls Failing
+### Common Issue: JavaScript API Calls Failing
 
-**Problem:** JavaScript using absolute paths fails through Workbench proxy
+**Problem:** JavaScript using absolute paths fails through Workbench proxy.
 
-**Symptoms:**
-- Dashboard loads but shows no data
-- Charts remain empty with "-" placeholders
-- Browser console shows 404 errors for API calls
-- Flask/server logs show requests for `/` but NOT `/api/*` endpoints
-
-### ✅ Solution: Use Relative Paths (TESTED & CONFIRMED)
-
-**Always use relative paths (no leading `/`) for fetch/AJAX calls:**
+Note: This rule applies to **JavaScript `fetch()` calls only**. Flask/FastAPI route
+decorators still require a leading slash (e.g., `@app.route('/api/data')`).
 
 ```javascript
-// ✅ CORRECT - relative paths work through proxy
+// CORRECT - relative paths work through proxy
 fetch('api/metadata')
 fetch('api/data?filter=value')
 
-// ❌ WRONG - absolute paths fail
-fetch('/api/metadata')
+// WRONG - absolute paths fail through proxy
+fetch('/api/metadata')
 fetch('/api/data?filter=value')
 ```
 
-### Why Absolute Paths Fail
-
+**Why:**
 ```
-User visits: https://workbench.verily.com/app/UUID/proxy/8080/
-
-Absolute path: fetch('/api/data')
-  → Browser resolves to: https://workbench.verily.com/api/data ❌ (404!)
-
-Relative path: fetch('api/data')
-  → Browser resolves to: https://workbench.verily.com/app/UUID/proxy/8080/api/data ✅
-```
-
-### Alternative: Embed Data in HTML (For Static Dashboards)
-
-If you don't need dynamic filtering, embed data directly in the template:
-
-**Python (Flask):**
-```python
-@app.route('/')
-def index():
-    data = get_data_from_bigquery()
-    return render_template('dashboard.html', data_json=json.dumps(data))
-```
-
-**HTML Template:**
-```html
-
+Absolute: fetch('/api/data') -> https://workbench.verily.com/api/data (404)
+Relative: fetch('api/data') -> https://workbench.verily.com/app/UUID/proxy/8080/api/data (OK)
 ```
 
-**When to use:** Static dashboards, large datasets that don't change, or when filters can be client-side only.
-
-### Testing Checklist
-
-Before deploying any web app:
-
-- [ ] **Relative paths** - All `fetch()` calls use `'api/...'` not `'/api/...'`
-- [ ] **Test locally** - `curl http://localhost:PORT/api/endpoint` returns data
-- [ ] **Server logs** - Verify API requests arrive: `tail -f server.log`
-- [ ] **Browser DevTools** - Network tab shows 200 status for API calls
-- [ ] **App UUID obtained** - Not using placeholder `[APP_UUID]`
-
 ---
 
 ## Workflow
@@ -126,41 +85,37 @@ Before deploying any web app:
 ### Step 1: Understand Requirements
 
 Ask the user:
-1. **Data source?** BigQuery table, CSV in bucket, or local file?
+1. **Data source?** Aurora database, S3 file (CSV, Parquet), BigQuery, or local file?
 2. **Visualizations?** Charts (bar, line, scatter), tables, filters?
 3. **Interactivity?** Static display or dynamic filtering?
 
 ### Step 2: Auto-Detect Environment
 
-**Always run these commands first:**
+Get the app UUID using MCP tools (see "How to Get the App UUID" above).
+**Prefer MCP tools over `wb app list`** to avoid permission prompts.
 
-```bash
-# Get app UUID (REQUIRED for final URL)
-APP_UUID=$(wb app list --format=json | jq -r '.[] | select(.status == "RUNNING") | .id' | head -1)
-echo "App UUID: $APP_UUID"
+### Step 3: Check Dependencies
 
-# Verify Python
-python3 --version
-
-# Check working directory
-pwd
-```
-
-### Step 3: Install Dependencies
+The following packages are **pre-installed** in the Workbench Jupyter+LLM image:
+`fastapi`, `uvicorn`, `flask`, `flask-cors`, `plotly`, `pandas`, `boto3`, `psycopg2-binary`
 
+**Do NOT run `pip install` unless a specific import fails.** To verify:
 ```bash
-pip install flask flask-cors pandas plotly google-cloud-bigquery db-dtypes
+python3 -c "import flask; import fastapi; import plotly; print('OK')"
 ```
+Only install if the check above fails.
 
-> **Note:** `db-dtypes` is required for BigQuery to properly convert data types for pandas.
+> **Note (GCP/BigQuery):** If using BigQuery with pandas, also install `db-dtypes` — it is
+> required for proper data type conversion and causes cryptic errors if missing:
+> `pip install --no-cache-dir db-dtypes`
 
 ### Step 4: Create Dashboard Structure
 
 ```
 dashboard/
-├── app.py                 # Flask server
+├── app.py                 # Flask/FastAPI server
 ├── templates/
-│   └── index.html         # Dashboard HTML
+│   └── index.html         # Dashboard HTML with Plotly.js
 └── static/
     └── style.css          # Optional styling
 ```
@@ -169,32 +124,117 @@ dashboard/
 
 ## Working Templates
 
-### Template 1: Simple BigQuery Dashboard
+### Template 1: Aurora PostgreSQL Dashboard (AWS)
+
+Aurora in Workbench uses **IAM database authentication** — you cannot connect with a
+static password. The correct flow is:
+
+1. Get temporary AWS credentials via `wb resource credentials`
+2. Generate an IAM auth token via boto3 (token is valid for 15 minutes)
+3. Connect with `sslmode='require'` — **SSL is mandatory**
+
+**Preferred: Use MCP tools for data queries** to avoid the IAM auth complexity entirely:
+```
+mcp__wb__aurora_query(resourceName="my-db", query="SELECT * FROM table LIMIT 100")
+mcp__wb__aurora_list_tables(resourceName="my-db")
+mcp__wb__aurora_describe_table(resourceName="my-db", tableName="my_table")
+```
+
+Query via MCP, embed results in the template, and serve with Flask/FastAPI.
+This avoids IAM auth in the app code entirely.
+
+**If live database queries are needed in the app:**
+
+```python
+import json, subprocess, boto3, psycopg2, os
+
+def get_aurora_connection(resource_id, username):
+    result = subprocess.run(
+        ['wb', 'resource', 'credentials',
+         f'--id={resource_id}', '--scope=READ_ONLY', '--format=json'],
+        capture_output=True, text=True, check=True
+    )
+    creds = json.loads(result.stdout)
+
+    conn_str = os.environ.get(f'WORKBENCH_{resource_id.replace("-", "_")}', '')
+    host_part, _, dbname = conn_str.partition('/')
+    host, _, port = host_part.partition(':')
+    port = int(port) if port else 5432
+
+    session = boto3.Session(
+        aws_access_key_id=creds['AccessKeyId'],
+        aws_secret_access_key=creds['SecretAccessKey'],
+        aws_session_token=creds['SessionToken'],
+        region_name='us-west-2'
+    )
+    token = session.client('rds').generate_db_auth_token(
+        DBHostname=host, Port=port, DBUsername=username, Region='us-west-2'
+    )
+    return psycopg2.connect(
+        host=host, port=port, database=dbname,
+        user=username, password=token,
+        sslmode='require'
+    )
+```
+
+### Template 2: S3 Data Dashboard (AWS)
 
-**app.py:**
 ```python
 from flask import Flask, render_template, jsonify
 from flask_cors import CORS
-from google.cloud import bigquery
+import pandas as pd
+import boto3
 import os
 
 app = Flask(__name__)
 CORS(app)
 
-# Cache for data
+_data_cache = None
+
+def get_data_from_s3():
+    global _data_cache
+    if _data_cache is not None:
+        return _data_cache
+    bucket = os.environ.get('WORKBENCH_my_bucket', 'your-bucket-name')
+    s3 = boto3.client('s3')
+    obj = s3.get_object(Bucket=bucket, Key='path/to/data.csv')
+    df = pd.read_csv(obj['Body'])
+    _data_cache = df.to_dict(orient='records')
+    return _data_cache
+
+@app.route('/')
+def index():
+    return render_template('index.html')
+
+@app.route('/api/data')
+def get_data():
+    try:
+        return jsonify(get_data_from_s3())
+    except Exception as e:
+        return jsonify({"error": str(e)}), 500
+
+if __name__ == '__main__':
+    app.run(host='0.0.0.0', port=8080, debug=False, threaded=True)
+```
+
+### Template 3: BigQuery Dashboard (GCP)
+
+```python
+from flask import Flask, render_template, jsonify, request
+from flask_cors import CORS
+from google.cloud import bigquery
+
+app = Flask(__name__)
+CORS(app)
+
 _data_cache = None
 
 def get_bigquery_data():
     global _data_cache
     if _data_cache is not None:
         return _data_cache
-    
     client = bigquery.Client()
-    query = """
-    SELECT *
-    FROM `YOUR_PROJECT.YOUR_DATASET.YOUR_TABLE`
-    LIMIT 1000
-    """
+    query = "SELECT * FROM `project.dataset.table` LIMIT 1000"
     df = client.query(query).to_dataframe()
     _data_cache = df.to_dict(orient='records')
     return _data_cache
@@ -203,33 +243,58 @@ def get_bigquery_data():
 def index():
     return render_template('index.html')
 
-@app.route('api/data') # NO leading slash!
+@app.route('/api/data')
 def get_data():
     try:
         data = get_bigquery_data()
+        column = request.args.get('filter_column')
+        value = request.args.get('filter_value')
+        if column and value:
+            data = [row for row in data if str(row.get(column, '')) == value]
         return jsonify(data)
     except Exception as e:
         return jsonify({"error": str(e)}), 500
 
-@app.route('api/metadata')
+@app.route('/api/metadata')
 def get_metadata():
     try:
         data = get_bigquery_data()
         if data:
-            return jsonify({
-                "columns": list(data[0].keys()),
-                "row_count": len(data)
-            })
+            return jsonify({"columns": list(data[0].keys()), "row_count": len(data)})
         return jsonify({"columns": [], "row_count": 0})
     except Exception as e:
         return jsonify({"error": str(e)}), 500
 
 if __name__ == '__main__':
-    # CRITICAL: host='0.0.0.0' required for Workbench proxy access
     app.run(host='0.0.0.0', port=8080, debug=False, threaded=True)
 ```
 
-**templates/index.html:**
+> **Note:** Requires `google-cloud-bigquery` and `db-dtypes`. Install with:
+> `pip install --no-cache-dir google-cloud-bigquery db-dtypes`
+
+### Alternative: Embed Data in HTML (For Static Dashboards)
+
+Query data via MCP or Python, then embed directly in the template. No API calls needed.
+
+```python
+import json
+@app.route('/')
+def index():
+    data = get_data()
+    return render_template('dashboard.html', data_json=json.dumps(data))
+```
+
+```html
+
+```
+
+### Dashboard Frontend Template (index.html)
+
+Use this with any backend template above. All `fetch()` calls use **relative paths** (no leading `/`).
+
 ```html
@@ -247,7 +312,7 @@ if __name__ == '__main__':
-        📊 Data Dashboard
+        Data Dashboard
         Dataset Info
         Loading metadata...
@@ -263,12 +328,9 @@ if __name__ == '__main__':