55 Agent Skills that teach your AI coding assistant to build fully governed Databricks data products — from a schema CSV to production AI agents.
Built on the open SKILL.md format — works with Cursor, Claude Code, Windsurf, Copilot, and Codex.
A comprehensive skills-based framework for building production-grade Databricks Lakehouse solutions. Drop a schema CSV, point your AI coding assistant at the skills, and go from raw tables to a fully governed data product with dimensional models, quality pipelines, semantic interfaces, observability, ML, and GenAI agents.
| Component | Description | Count |
|---|---|---|
| Agent Skills | Structured knowledge packages for AI-assisted Databricks development | 55 skills across 12 domains |
| AGENTS.md | Universal entry point with routing table and common skills index | 1 file |
| Context Files | Customer schema CSV (the starting input for the pipeline) | 1 file |
- Databricks workspace with Unity Catalog enabled
- AI coding assistant with file reference support (Cursor, Claude Code, Windsurf, Copilot, Codex, etc.)
- Databricks CLI v0.200+ installed and configured
- Git for version control
- Clone the repository:

  ```bash
  git clone https://github.com/databricks-solutions/vibe-coding-workshop-template.git
  cd vibe-coding-workshop-template
  ```

- Place your schema CSV in the `data_product_accelerator/context/` directory
- Open in your AI coding assistant and start with the first prompt:

  ```
  I have a customer schema at @data_product_accelerator/context/Wanderbricks_Schema.csv.
  Please design the Gold layer using @data_product_accelerator/skills/gold/00-gold-layer-design/SKILL.md
  ```

  Note: Most IDEs support `@` for file references. If yours doesn't, ask the agent to "read the file at data_product_accelerator/skills/gold/00-gold-layer-design/SKILL.md" instead.

- Follow the QUICKSTART guide for all 9 stages
- Explore the visual guides (optional):
  - Interactive Skill Navigation — animated walkthrough
  - Skill Hierarchy Tree — visual organization map
The framework uses a skills-first architecture: a single AGENTS.md entry point routes the AI assistant to 55 Agent Skills organized by domain. Each skill contains production-tested patterns, reference documentation, executable scripts, and starter templates. Skills use the open SKILL.md format — portable across any AI coding assistant.
One prompt per stage. One new agent conversation per stage.
```
data_product_accelerator/context/*.csv
  → Gold Design (1)    — dimensional model, ERDs, YAML schemas
  → Bronze (2)         — source tables + test data
  → Silver (3)         — DLT pipelines + data quality
  → Gold Impl (4)      — tables, merges, constraints
  → Planning (5)       — phase plans + manifest contracts
  → Semantic (6)       — Metric Views, TVFs, Genie Spaces
  → Observability (7)  — monitors, dashboards, alerts
  → ML (8)             — experiments, training, inference
  → GenAI Agents (9)   — agents, evaluation, deployment
```
Design the target Gold dimensional model first (from the customer's schema CSV), then build the data layers (Bronze → Silver) to feed it. This is the opposite of the traditional bottom-up approach.
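Stage 3 (Silver) generates DLT pipelines whose quality rules follow an expectation-plus-quarantine pattern. The sketch below is illustrative only — the `bronze_bookings`/`silver_bookings` tables and the rules are hypothetical placeholders, not the framework's generated code:

```python
# Illustrative DLT expectation + quarantine pattern (hypothetical tables and rules).
import dlt
from pyspark.sql import functions as F

RULES = {
    "valid_booking_id": "booking_id IS NOT NULL",
    "non_negative_amount": "amount >= 0",
}
# A row is quarantined when it fails at least one rule.
QUARANTINE_CONDITION = "NOT (" + " AND ".join(RULES.values()) + ")"

@dlt.table(name="silver_bookings", comment="Bookings passing all quality rules")
@dlt.expect_all_or_drop(RULES)
def silver_bookings():
    return dlt.read_stream("bronze_bookings")

@dlt.table(name="silver_bookings_quarantine", comment="Bookings failing at least one rule")
def silver_bookings_quarantine():
    return dlt.read_stream("bronze_bookings").where(F.expr(QUARANTINE_CONDITION))
```

Valid rows flow to the Silver table; failing rows are retained in the quarantine table for diagnosis instead of being silently dropped.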
This module lives inside a multi-module repository. The framework (skills, docs, context) is in data_product_accelerator/, while generated artifacts are created at the repository root.
```
repo-root/
│
├── data_product_accelerator/           # Framework module
│   ├── AGENTS.md                       # Universal entry point (routing table + common skills)
│   ├── QUICKSTART.md                   # One-prompt-per-stage guide
│   ├── README.md                       # This file
│   │
│   ├── skills/                         # 55 Agent Skills (open SKILL.md format)
│   │   ├── admin/                      # Skill creation, auditing, docs (4)
│   │   ├── bronze/                     # Bronze layer + Faker data (2)
│   │   ├── common/                     # Cross-cutting shared skills (8)
│   │   ├── exploration/                # Ad-hoc notebooks (1)
│   │   ├── genai-agents/               # GenAI agent patterns (10)
│   │   ├── gold/                       # Gold design + implementation (14)
│   │   ├── ml/                         # MLflow pipelines (1)
│   │   ├── monitoring/                 # Monitors, dashboards, alerts (5)
│   │   ├── planning/                   # Project planning (1)
│   │   ├── semantic-layer/             # Metric Views, TVFs, Genie (5)
│   │   ├── silver/                     # DLT pipelines, DQ rules (3)
│   │   └── skill-navigator/            # Master routing system (1)
│   │
│   ├── context/
│   │   └── Wanderbricks_Schema.csv     # Customer schema input
│   │
│   └── docs/                           # Framework documentation
│       └── framework-design/           # Complete design documentation
│
├── gold_layer_design/                  # GENERATED: dimensional model YAML schemas and ERDs
├── src/                                # GENERATED: notebooks and scripts (Bronze, Silver, Gold, etc.)
├── plans/                              # GENERATED: phase plans and YAML manifest contracts
├── resources/                          # GENERATED: DAB job and pipeline YAML
└── databricks.yml                      # GENERATED: Asset Bundle root configuration
```
Note: Generated artifacts (`gold_layer_design/`, `src/`, `plans/`, `resources/`, `databricks.yml`) are created by skills during pipeline execution; they do not exist until you run the first stages. Other modules may coexist at the repository root.
Skills follow an orchestrator/worker pattern: orchestrators (prefixed `00-`) manage end-to-end workflows, while workers (prefixed `01-`, `02-`, etc.) handle specific patterns.
| Domain | Orchestrator | Workers | Focus |
|---|---|---|---|
| Gold (Design) | `00-gold-layer-design` | 7 | ERDs, YAML schemas, dimensional modeling |
| Bronze | `00-bronze-layer-setup` | 1 | Table DDLs, Faker data, source copy |
| Silver | `00-silver-layer-setup` | 2 | DLT expectations, DQX diagnostics |
| Gold (Impl) | `01-gold-layer-setup` | 5 | MERGE scripts, FK constraints |
| Planning | `00-project-planning` | 0 | Phase plans, YAML manifest contracts |
| Semantic | `00-semantic-layer-setup` | 4 | Metric Views, TVFs, Genie Spaces, export/import API |
| Monitoring | `00-observability-setup` | 4 | Monitors, dashboards, SQL alerts |
| ML | `00-ml-pipeline-setup` | 0 | MLflow, Feature Store, inference |
| GenAI | `00-course-orchestrator` | 9 | GenAI course routing, Track A/B/C agents, evaluation, deployment, monitoring |
| Exploration | `00-adhoc-exploration-notebooks` | 0 | Ad-hoc analysis notebooks |
| Common | — | 8 | Asset Bundles, naming, constraints, imports |
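The Gold (Impl) workers in the table above generate MERGE scripts that upsert Silver data into Gold dimensions and facts. A minimal sketch of that upsert pattern using the Delta Lake Python API — the catalog, table, and key names below are placeholders, not the framework's generated output:

```python
# Illustrative Gold dimension upsert (catalog/table/key names are placeholders).
from delta.tables import DeltaTable

updates = spark.table("main.silver.customers")  # `spark` is provided in Databricks notebooks

(
    DeltaTable.forName(spark, "main.gold.dim_customer")
    .alias("t")
    .merge(updates.alias("s"), "t.customer_id = s.customer_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```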
After using this framework:
- Complete Medallion Architecture — Bronze, Silver, and Gold tables with governance
- Data Quality — DLT expectations with quarantine patterns, stored in Unity Catalog
- Dimensional Model — Fact and dimension tables with PK/FK constraints
- Semantic Layer — Metric Views and TVFs for Genie natural language queries (see the TVF sketch after this list)
- Observability — Lakehouse Monitors, AI/BI Dashboards, and SQL Alerts
- ML Pipelines — MLflow experiments, model training, and batch inference
- GenAI Agents — ResponsesAgent with multi-agent Genie orchestration
- Governance — PII tags, data classification, rich comments, full lineage
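The semantic layer deliverable includes table-valued functions (TVFs) that give Genie governed, parameterized entry points into the Gold tables. A rough sketch of one such function, created from a notebook — the function, table, and column names are hypothetical:

```python
# Hypothetical semantic-layer TVF; all object and column names are illustrative.
spark.sql("""
CREATE OR REPLACE FUNCTION main.gold.revenue_by_store(start_date DATE, end_date DATE)
RETURNS TABLE (store_id STRING, total_revenue DOUBLE)
COMMENT 'Revenue per store over a date range, exposed to Genie for NL queries'
RETURN
  SELECT store_id, SUM(revenue) AS total_revenue
  FROM main.gold.fact_sales
  WHERE sale_date BETWEEN start_date AND end_date
  GROUP BY store_id
""")
```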
| Approach | Time | Savings |
|---|---|---|
| Using Framework (core) | 17-28 hours | — |
| Using Framework (full stack) | 31-56 hours | — |
| From Scratch | 80-120 hours | — |
| Savings | — | 4-6x faster |
- Databricks Unity Catalog — Governance, lineage, and data classification
- Delta Lake — ACID transactions, time travel, and Change Data Feed
- Delta Live Tables (DLT) — Streaming pipelines with data quality expectations
- Databricks Asset Bundles — Infrastructure as Code for all resources
- Metric Views — Semantic metadata layer for AI/BI and Genie
- Genie Spaces — Natural language query interface
- Lakehouse Monitoring — Data profiling with custom business metrics
- MLflow — ML experiment tracking, model registry, and GenAI agents
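For the MLflow item above, a bare-bones example of the experiment tracking the ML stage builds on — the experiment path, model, and data are placeholders:

```python
# Bare-bones MLflow experiment tracking (experiment path and data are placeholders).
import mlflow
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X = np.random.rand(200, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + np.random.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

mlflow.set_experiment("/Shared/demand_forecast_demo")  # hypothetical experiment path

with mlflow.start_run(run_name="baseline"):
    model = RandomForestRegressor(n_estimators=100).fit(X_train, y_train)
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("r2_test", model.score(X_test, y_test))
    mlflow.sklearn.log_model(model, artifact_path="model")
```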
| Document | Purpose | Type |
|---|---|---|
| AGENTS.md | Universal entry point for any AI coding assistant | Quick Reference |
| QUICKSTART.md | One-prompt-per-stage guide (start here) | Tutorial |
| Framework Design | ||
| Framework Index | Complete architecture and design documentation | Design Docs |
| Parallel Execution Guide | Run independent stages concurrently (30-40% time savings) | Advanced |
| Visual Guides | ||
| Interactive Skill Navigation | Animated walkthrough (open in browser) | Interactive |
| Skill Hierarchy Tree | Visual skill organization map | Interactive |
| Orchestrator Walkthroughs | ||
| Gold Design Orchestrator | Progressive disclosure for dimensional modeling | Deep Dive |
| Silver Orchestrator | Context-aware DLT pipeline generation | Deep Dive |
| Gold Pipeline Orchestrator | YAML-to-implementation patterns | Deep Dive |
| Semantic Layer Orchestrator | Manifest-driven semantic layer setup | Deep Dive |
| Genie Optimization Agent | Long-running optimization loop navigation | Deep Dive |
| Skill System | ||
| Skill Navigator | Master skill routing system | Reference |
These patterns work for any industry:
| Original | Healthcare | Finance | Manufacturing |
|---|---|---|---|
| store | hospital | branch | facility |
| product | medication | financial_product | component |
| transaction | encounter | transaction | production_run |
| revenue | reimbursement | interest_income | output_value |
- One prompt per stage — Each orchestrator skill handles the full workflow
- New conversation per stage — Start fresh to keep context clean
- Start with Gold Design — Design the target model before building Bronze/Silver
- Let the skill do the thinking — The orchestrator loads its workers and common skills automatically
- Test with Faker — Generate synthetic data before connecting real sources (see the sketch after these tips)
- Deploy to dev first — Use `databricks bundle deploy -t dev`
- If something fails — The autonomous operations skill handles troubleshooting
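On the Faker tip: a minimal sketch of generating synthetic rows and landing them in a Bronze table — the schema, table name, and row count are hypothetical, not the Bronze skill's actual output:

```python
# Hypothetical Faker-based seed data for a Bronze table (schema is illustrative).
from faker import Faker

fake = Faker()
rows = [
    {
        "customer_id": fake.uuid4(),
        "name": fake.name(),
        "email": fake.email(),
        "country": fake.country_code(),
        "signup_date": fake.date_between(start_date="-2y", end_date="today"),
    }
    for _ in range(1000)
]

# `spark` is available in Databricks notebooks; land the rows in Bronze.
spark.createDataFrame(rows).write.mode("overwrite").saveAsTable("main.bronze.customers_raw")
```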
- Databricks Documentation
- Unity Catalog
- Delta Lake
- DLT Expectations
- Metric Views
- Lakehouse Monitoring
- MLflow
- See the Skill Navigator for routing to the right skill
- Run `@data_product_accelerator/skills/admin/self-improvement/SKILL.md` to capture learnings from errors
- Run `@data_product_accelerator/skills/admin/skill-freshness-audit/SKILL.md` to verify skills are current
This repository is intended for educational and development purposes. Please review and customize the patterns for your specific use case and compliance requirements.
```bash
# Clone the repository
git clone https://github.com/databricks-solutions/vibe-coding-workshop-template.git

# Open in your AI coding assistant
cd vibe-coding-workshop-template

# Follow the QUICKSTART guide
# → QUICKSTART.md
```

- Core platform (stages 1-7): 17-28 hours
- Full stack with ML and GenAI (stages 1-9): 31-56 hours