---
hide:
  - navigation
---

<div class="qs" markdown>

# Quickstart

## <span class="num">1</span> Install the AI Dev Kit <span class="dur">2 min</span>

You need **uv**, the **Databricks CLI** (authenticated), and an AI coding assistant.

=== "Mac / Linux"

    ```bash
    bash <(curl -sL https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.sh)
    ```

=== "Windows"

    ```powershell
    irm https://raw.githubusercontent.com/databricks-solutions/ai-dev-kit/main/install.ps1 | iex
    ```

Follow the interactive prompts, then open your AI assistant from the same directory.

!!! success "What you'll see"
    Your project now has `.claude/skills/` (27 Databricks skills) and `.claude/mcp.json` (MCP server config). Your AI assistant can talk to Databricks.

<div class="step-screenshot">
<img src="../assets/step1-install.png" alt="Install output">
</div>

---

## <span class="num">2</span> Explore your data <span class="dur">5 min</span>

Discover what's in your workspace. Open your AI assistant and paste:

!!! example "Prompt"
    ```
    What catalogs and schemas are available in my Databricks workspace?
    Show me the tables in each schema with their column names and row counts.
    ```

Now pick a table and profile it:

!!! example "Prompt"
    ```
    Profile the table `main.default.my_table`. Show total rows, null counts
    per column, value distributions for categoricals (top 10), and
    min/max/mean for numerics. Include 5 sample rows.
    ```

!!! tip
    Replace `main.default.my_table` with a real table. If your workspace is empty, the next step generates sample data.

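The profile the prompt asks for reduces to a handful of aggregations: row count, nulls per column, top values for categoricals, min/max/mean for numerics. A minimal sketch in plain Python, with toy in-memory rows standing in for `main.default.my_table` (the assistant runs the real thing as SQL against your workspace):

```python
from collections import Counter
from statistics import mean

# toy rows standing in for a real table (hypothetical data)
rows = [
    {"segment": "retail", "amount": 120.0},
    {"segment": "retail", "amount": None},
    {"segment": "wholesale", "amount": 80.0},
]

def profile(rows):
    report = {"total_rows": len(rows)}
    for col in rows[0]:
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        stats = {"nulls": len(values) - len(non_null)}
        if non_null and isinstance(non_null[0], (int, float)):
            # numeric column: summary statistics
            stats.update(min=min(non_null), max=max(non_null), mean=mean(non_null))
        else:
            # categorical column: top-10 value distribution
            stats["top_values"] = Counter(non_null).most_common(10)
        report[col] = stats
    return report

print(profile(rows))
```

The assistant produces the same shape of report, just computed by the warehouse instead of in memory.
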
<div class="step-screenshot">
<img src="../assets/step2-explore.png" alt="Table exploration results">
</div>

---

## <span class="num">3</span> Generate sample data <span class="dur">5 min</span>

If you need data for the rest of this quickstart:

!!! example "Prompt"
    ```
    Generate a realistic e-commerce dataset in my workspace:
    - main.quickstart.customers — 10,000 rows (name, email, signup_date, segment)
    - main.quickstart.orders — 50,000 rows (order_id, customer_id, order_date, total_amount, status)
    - main.quickstart.products — 500 rows (product_id, name, category, price)

    Use realistic distributions, not uniform random. Make sure foreign keys are valid.
    ```

!!! success "What you'll see"
    Three tables created in Unity Catalog with realistic distributions, immediately queryable.

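Two details in that prompt do the heavy lifting: non-uniform distributions and valid foreign keys. In miniature (plain Python with hypothetical columns; the assistant writes the real generator against Unity Catalog):

```python
import random
from datetime import date, timedelta

random.seed(7)  # deterministic toy data
SEGMENTS = ["consumer", "smb", "enterprise"]

customers = [
    {"customer_id": i,
     # weighted, not uniform: most customers are consumers
     "segment": random.choices(SEGMENTS, weights=[70, 20, 10])[0],
     "signup_date": date(2024, 1, 1) + timedelta(days=random.randint(0, 364))}
    for i in range(100)
]

orders = [
    {"order_id": i,
     # foreign key drawn from existing customers, so joins always work
     "customer_id": random.choice(customers)["customer_id"],
     # lognormal amounts: many small orders, a long tail of large ones
     "total_amount": round(random.lognormvariate(4.0, 0.8), 2)}
    for i in range(500)
]

print(len(customers), "customers,", len(orders), "orders")
```

Uniform random data tends to make every chart flat; skewed distributions are what make the dashboards in step 5 look plausible.
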
Try asking a question:

!!! example "Prompt"
    ```
    What are the top 10 product categories by total revenue?
    Break it down by month for the last 6 months.
    ```

---

## <span class="num">4</span> Build a data pipeline <span class="dur">15 min</span>

Create a production-ready Spark Declarative Pipeline with the medallion architecture.

!!! example "Prompt"
    ```
    Create a new Spark Declarative Pipeline using Databricks Asset Bundles:

    - Python (not SQL)
    - Medallion architecture: bronze → silver → gold
    - Serverless compute
    - Target: main.quickstart schema

    Bronze: ingest from orders and customers tables
    Silver: clean nulls, join orders with customers, add order_year/month columns
    Gold: materialized views for monthly_revenue (by month + segment)
    and customer_lifetime_value

    Initialize with `databricks pipelines init`, then deploy and run it.
    ```

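It can help to see what the three layers actually compute before running the pipeline. Stripped of Spark entirely, the bronze → silver → gold flow described above is roughly this (toy rows are hypothetical; the generated project expresses the same steps as declarative pipeline tables):

```python
from collections import defaultdict

# bronze: raw records as ingested (hypothetical sample rows)
bronze_orders = [
    {"order_id": 1, "customer_id": 10, "order_date": "2024-05-02", "total_amount": 120.0},
    {"order_id": 2, "customer_id": 10, "order_date": "2024-06-11", "total_amount": None},
    {"order_id": 3, "customer_id": 11, "order_date": "2024-06-20", "total_amount": 80.0},
]
bronze_customers = [
    {"customer_id": 10, "segment": "smb"},
    {"customer_id": 11, "segment": "enterprise"},
]

def silver(orders, customers):
    """Clean nulls, join orders with customers, add order_year/month."""
    seg = {c["customer_id"]: c["segment"] for c in customers}
    out = []
    for o in orders:
        if o["total_amount"] is None:  # drop rows with null amounts
            continue
        year, month, _ = o["order_date"].split("-")
        out.append({**o, "segment": seg[o["customer_id"]],
                    "order_year": int(year), "order_month": int(month)})
    return out

def gold_monthly_revenue(silver_rows):
    """Aggregate revenue by (year, month, segment)."""
    agg = defaultdict(float)
    for r in silver_rows:
        agg[(r["order_year"], r["order_month"], r["segment"])] += r["total_amount"]
    return dict(agg)

print(gold_monthly_revenue(silver(bronze_orders, bronze_customers)))
```

In the real pipeline each function becomes a declarative table and Spark handles incremental processing; the transformation logic is the same.
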
!!! success "What you'll see"
    The assistant scaffolds a DAB project with bronze/silver/gold Python files, deploys it, triggers a run, and shows pipeline status as each table processes.

!!! info "How skills help"
    The assistant loaded the `databricks-spark-declarative-pipelines` and `databricks-bundles` skills, which taught it correct SDP patterns, serverless defaults, and Asset Bundle structure. Without these skills, it would guess — and often get it wrong.

<div class="step-screenshot">
<img src="../assets/step4-pipeline.png" alt="Pipeline status">
</div>

---

## <span class="num">5</span> Create a dashboard <span class="dur">10 min</span>

Build an AI/BI dashboard from the gold tables your pipeline just created.

!!! example "Prompt"
    ```
    Create an AI/BI dashboard called "Quickstart: Sales Overview" using main.quickstart:

    1. Counter: total revenue
    2. Counter: total orders
    3. Line chart: monthly revenue trend (last 12 months)
    4. Bar chart: revenue by customer segment
    5. Table: top 20 customers by lifetime value
    6. Date range filter on all charts

    Test all SQL queries before deploying.
    ```

!!! success "What you'll see"
    The assistant follows the mandatory validation workflow: get schemas → write SQL → **test every query** → build dashboard JSON → deploy, then returns a URL to the live dashboard.

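"Test every query" is the step that prevents broken widgets. The idea in miniature, with `sqlite3` standing in for the SQL warehouse (the skill validates against Databricks itself; the table and widget queries below are made up):

```python
import sqlite3

# stand-in warehouse: run each widget's SQL before building the dashboard
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly_revenue (month TEXT, segment TEXT, revenue REAL)")
conn.executemany("INSERT INTO monthly_revenue VALUES (?, ?, ?)",
                 [("2024-05", "smb", 120.0), ("2024-06", "smb", 90.0)])

widget_queries = {
    "total_revenue": "SELECT SUM(revenue) FROM monthly_revenue",
    "trend": "SELECT month, SUM(revenue) FROM monthly_revenue "
             "GROUP BY month ORDER BY month",
}

def validate(queries):
    """Execute every query; collect failures instead of deploying blind."""
    failures = {}
    for name, sql in queries.items():
        try:
            conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            failures[name] = str(exc)
    return failures

print(validate(widget_queries))  # empty dict means every widget query ran
```

A query that fails here would have surfaced as an "Invalid widget definition" error only after deployment.
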
Iterate by asking:

!!! example "Prompt"
    ```
    Update the dashboard: change the monthly chart to a stacked area by segment,
    and add a second page "Customers" with a scatter plot of order frequency
    vs average order value.
    ```

!!! info "Why validation matters"
    The `databricks-aibi-dashboards` skill enforces SQL testing before deployment. Without it, widgets show "Invalid widget definition" errors. Skills encode hard-won best practices.

<div class="step-screenshot">
<img src="../assets/step5-dashboard.png" alt="Dashboard preview">
</div>

---

## <span class="num">6</span> Deploy an AI agent <span class="dur">15 min</span>

Create a Knowledge Assistant — a RAG-based agent that answers questions from documents.

!!! example "Prompt"
    ```
    Create a Knowledge Assistant called "Quickstart FAQ Bot":

    1. Generate 20 sample FAQ documents (pricing, features, returns, shipping, support)
    2. Upload to the UC volume main.quickstart.faq_docs
    3. Create a Vector Search endpoint and index for the documents
    4. Create the Knowledge Assistant using Foundation Model APIs
    5. Deploy to a serving endpoint
    6. System prompt: "Answer questions based only on the FAQ documents. If unsure, say so."

    Test it with: "What is your return policy?" and "How much does enterprise cost?"
    ```

!!! success "What you'll see"
    The assistant creates the knowledge base, vector index, and agent, deploys it, then runs test queries showing responses with source attribution.

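At answer time, the Vector Search index does one job: rank documents by similarity to the question so the model answers from the best matches. A toy illustration with word counts in place of real embeddings (documents and scoring here are illustrative only, not how the index is implemented):

```python
import math
from collections import Counter

def embed(text):
    # toy bag-of-words "embedding"; real RAG uses dense vectors
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "returns": "you can return items within 30 days for a full refund",
    "shipping": "standard shipping takes 5 business days worldwide",
    "pricing": "enterprise pricing is custom contact sales for a quote",
}

def retrieve(question, k=2):
    """Rank FAQ docs by similarity to the question, keep the top k."""
    q = embed(question)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(docs[d])), reverse=True)
    return ranked[:k]

print(retrieve("how do I return items"))
```

The retrieved passages are what the system prompt confines the model to, which is why answers come back with source attribution.
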
<div class="step-screenshot">
<img src="../assets/step6-agent.png" alt="Agent test results">
</div>

**Alternative** — for SQL-based data Q&A, try a Genie Space instead:

!!! example "Prompt"
    ```
    Create a Genie Space called "Sales Genie" that lets users ask natural
    language questions about the quickstart tables (orders, customers, products).
    Add sample questions and curation instructions.
    ```

---

## <span class="num">7</span> Build a full-stack app <span class="dur">10 min</span>

Bring everything together in a Databricks App.

!!! example "Prompt"
    ```
    Create a Databricks App called "quickstart-explorer" with FastAPI + React (APX pattern):

    - Page 1 "Explorer": catalog/schema browser + SQL query editor with results table
    - Page 2 "Dashboard": line chart of monthly_revenue from the gold table,
      filterable by segment, auto-refreshes every 30s
    - Page 3 "Chat": chat interface connected to the FAQ Bot serving endpoint,
      with streaming responses and source document cards

    Set up app.yaml with SQL warehouse and serving endpoint resources,
    then deploy to Databricks Apps.
    ```

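For orientation, `app.yaml` is the app's runtime configuration: the start command plus environment variables wired to the attached Databricks resources. A rough sketch of what the generated file might contain (field values here are assumptions; the file the assistant writes is authoritative):

```yaml
# sketch only: resource names and env keys below are assumptions
command: ["uvicorn", "server.app:app"]
env:
  - name: DATABRICKS_WAREHOUSE_ID
    valueFrom: sql-warehouse        # SQL warehouse resource attached to the app
  - name: SERVING_ENDPOINT_NAME
    valueFrom: serving-endpoint     # the FAQ Bot endpoint from step 6
```

Resources declared this way are injected with the app's permissions, so the backend never needs hard-coded credentials.
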
!!! success "What you'll see"
    The assistant scaffolds a complete project (FastAPI backend + React frontend), configures app.yaml with resource permissions, deploys it, and returns the live app URL.

<div class="step-screenshot">
<img src="../assets/step7-app.png" alt="App preview">
</div>

---

<div class="completion-banner" markdown>

## You're done.

**Data exploration** → **Pipeline** → **Dashboard** → **AI Agent** → **Full-stack App** — all through conversation.

The AI Dev Kit has [27 skills](reference/skills.md) and [50+ MCP tools](reference/mcp-tools.md). Just ask your assistant to build something — skills activate automatically.

!!! example "Ideas to try next"
    ```
    Create a scheduled Databricks job that runs my pipeline every hour
    and sends a Slack notification on failure.
    ```

    ```
    Set up MLflow evaluation for my FAQ Bot. Create 10 test questions and
    measure correctness, retrieval relevance, and faithfulness.
    ```

    ```
    Add a Lakebase PostgreSQL database to my app for storing user preferences
    and query history.
    ```

</div>

</div>