
Commit 94d2eab

easel and claude committed
Add demo script, screencast, acceptance test, and narrated walkthrough

- examples/demo.py: 11-section end-to-end demo with assertions (acceptance test)
- examples/scene.py: per-scene dispatcher for screencast recording
- examples/screencast.sh: director script with narration and pacing
- examples/narrate.sh: TTS pipeline (piper) with timestamp-aligned audio
- examples/Dockerfile.demo: Docker image with PySpark 4.0 for portable demo
- examples/demo.tape: VHS tape file for GIF/MP4 recording
- examples/tablespec-demo.gif: animated GIF for README
- examples/tablespec-demo.mp4: silent MP4
- examples/tablespec-demo-narrated.mp4: MP4 with British female voice narration
- examples/tablespec-demo.cast: asciinema recording (scrollback + copy-paste)
- tests/integration/test_demo.py: pytest wrapper (exits non-zero on failure)
- Makefile: add test-demo target
- README.md: add demo section with embedded GIF

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 74bdb48 commit 94d2eab

14 files changed

Lines changed: 2482 additions & 0 deletions

Makefile

Lines changed: 3 additions & 0 deletions
@@ -47,6 +47,9 @@ test-unit: ## Run unit tests only
 test-integration: ## Run integration tests only
 	uv run pytest tests/integration/
 
+test-demo: ## Run demo script as acceptance test
+	uv run python examples/demo.py
+
 coverage: ## Run tests with coverage report
 	uv run pytest --cov=src --cov-report=term-missing --cov-report=html

README.md

Lines changed: 18 additions & 0 deletions
@@ -18,6 +18,24 @@ Python library for working with table schemas in Universal Metadata Format (UMF)
 - **Domain Type Inference**: Automatic detection of domain types (SSN, NPI, phone, state codes, etc.)
 - **Change Management**: UMF diffing, atomic change application, and git-based changelogs
 
+## Demo
+
+![tablespec demo](examples/tablespec-demo.gif)
+
+The demo walks through loading a UMF schema, generating SQL/PySpark/JSON schemas, type mappings, domain type inference, Great Expectations baseline generation, LLM prompt generation, UMF diffing, PySpark validation, and sample data generation.
+
+Run it yourself:
+
+```bash
+# Run the demo (requires tablespec[spark])
+uv run python examples/demo.py
+
+# Run as acceptance test
+uv run pytest tests/integration/test_demo.py
+```
+
+A [narrated screencast](examples/tablespec-demo-narrated.mp4) and [asciinema recording](examples/tablespec-demo.cast) are also available.
+
 ## Installation
 
 ### Using uv (recommended)

examples/Dockerfile.demo

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
FROM eclipse-temurin:21-jre-noble

# System deps
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip python3-venv curl ca-certificates git \
    && rm -rf /var/lib/apt/lists/*

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

# Copy project files
COPY pyproject.toml uv.lock README.md ./
COPY src/ src/
COPY examples/ examples/

# Install tablespec with spark extra
RUN uv sync --extra spark --no-dev --frozen

# Suppress Spark noise
ENV SPARK_LOG_LEVEL=ERROR
ENV PYTHONUNBUFFERED=1

# Discard all stderr in the entrypoint: this hides Spark log noise but also Python tracebacks
ENTRYPOINT ["sh", "-c", "uv run python examples/demo.py 2>/dev/null"]

examples/SCREENCAST_SCRIPT.md

Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
# tablespec Screencast Script

**Runtime:** ~3 minutes
**Tools:** Docker, VHS (charmbracelet/vhs)
**Record:** `vhs examples/demo.tape`

---

## COLD OPEN

> Terminal appears with a dark theme. Two comment lines fade in:

```
# tablespec — Universal Metadata Format for table schemas
# A complete walkthrough: schema loading, generation, validation, and PySpark
```

**NARRATOR:** tablespec is a Python library for defining, validating, and
generating table schemas using a single YAML-based format called UMF
(Universal Metadata Format). Let's see what it can do.

---

## SCENE 1: "The Schema"

> The example UMF YAML file is displayed on screen.

```yaml
version: "1.0"
table_name: Medical_Claims
description: Healthcare claims and billing information
columns:
  - name: claim_id
    data_type: VARCHAR
    length: 50
    nullable:
      MD: false # Medicaid
      MP: false # Medicare Part D
      ME: false # Medicare
  - name: claim_amount
    data_type: DECIMAL
    precision: 10
    scale: 2
    nullable:
      MD: true
      MP: true
      ME: true
  - name: provider_id
    data_type: VARCHAR
    length: 20
```

**NARRATOR:** This is a UMF schema for a healthcare claims table. Three
columns, each with a data type and a size (a length, or a precision and
scale). Two of them also carry a nullable configuration per Line of
Business: Medicaid, Medicare Part D, and Medicare. One YAML file is the
single source of truth for everything downstream.

---

## SCENE 2: "The Build"

> Docker image builds with PySpark 4.0, Java 21, and tablespec.

**NARRATOR:** We're building a Docker image with PySpark 4.0 and
tablespec installed. This mirrors what you'd have on Databricks: same
Spark, same library, same behavior.

---

## SCENE 3: "The Demo"

The demo runs in Docker. Each section appears sequentially:

### ACT 1: Load & Inspect (Section 1)

> Output shows table name, description, columns, nullable config.

**NARRATOR:** We load the UMF YAML into a Pydantic model. Every field is
type-checked. The nullable config tells us which columns are required in
which LOB: claim_id is required everywhere, but claim_amount can be null.
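
The per-LOB nullability described in Act 1 can be modeled directly. The sketch below mirrors the Scene 1 YAML with a standard-library `dataclass` instead of Pydantic so it runs without dependencies; the `Column` class, the `required_columns` helper, and the rule that a LOB missing from the `nullable` map means "nullable" are illustrative assumptions, not tablespec's API.

```python
from __future__ import annotations

from dataclasses import dataclass, field

LOBS = ("MD", "MP", "ME")  # Medicaid, Medicare Part D, Medicare, as labeled in the YAML


@dataclass
class Column:
    name: str
    data_type: str
    length: int | None = None
    precision: int | None = None
    scale: int | None = None
    # Per-LOB nullability; a LOB missing from the map is treated as nullable (assumption).
    nullable: dict[str, bool] = field(default_factory=dict)

    def is_required(self, lob: str) -> bool:
        """True when the spec explicitly forbids nulls for this LOB."""
        return self.nullable.get(lob, True) is False


COLUMNS = [
    Column("claim_id", "VARCHAR", length=50, nullable={"MD": False, "MP": False, "ME": False}),
    Column("claim_amount", "DECIMAL", precision=10, scale=2,
           nullable={"MD": True, "MP": True, "ME": True}),
    Column("provider_id", "VARCHAR", length=20),
]


def required_columns(columns: list[Column], lob: str) -> list[str]:
    """Names of the columns that must be non-null in the given LOB."""
    return [c.name for c in columns if c.is_required(lob)]


for lob in LOBS:
    print(lob, required_columns(COLUMNS, lob))
```

Under this rule `provider_id`, which has no `nullable` block, is nullable in every LOB; tablespec may apply a different default.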

---

### ACT 2: Schema Generation (Section 2)

> SQL DDL, PySpark StructType, and JSON Schema appear in sequence.

**NARRATOR:** From one UMF file, we generate three schema formats. SQL DDL
for data warehouses. PySpark StructType for Spark jobs. JSON Schema for API
validation. One source, many targets.
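
Generating DDL from a UMF spec is essentially a walk over the column list. A minimal sketch, assuming a plain-dict column layout that mirrors the Scene 1 YAML and a rule that a column becomes `NOT NULL` only when every LOB forbids nulls; both are illustrative assumptions, and `sql_type` and `create_table_ddl` are hypothetical names rather than tablespec's generator.

```python
from __future__ import annotations

# Hypothetical column specs mirroring the Scene 1 YAML (not tablespec's data model).
COLUMNS = [
    {"name": "claim_id", "data_type": "VARCHAR", "length": 50,
     "nullable": {"MD": False, "MP": False, "ME": False}},
    {"name": "claim_amount", "data_type": "DECIMAL", "precision": 10, "scale": 2,
     "nullable": {"MD": True, "MP": True, "ME": True}},
    {"name": "provider_id", "data_type": "VARCHAR", "length": 20},
]


def sql_type(col: dict) -> str:
    """Render a column's type together with its size parameters."""
    if col["data_type"] == "VARCHAR" and "length" in col:
        return f"VARCHAR({col['length']})"
    if col["data_type"] == "DECIMAL" and "precision" in col:
        return f"DECIMAL({col['precision']},{col.get('scale', 0)})"
    return col["data_type"]


def create_table_ddl(table: str, columns: list[dict]) -> str:
    """Emit CREATE TABLE; NOT NULL only if a nullable map exists and no LOB allows nulls."""
    lines = []
    for col in columns:
        flags = col.get("nullable", {})
        not_null = bool(flags) and not any(flags.values())
        lines.append(f"    {col['name']} {sql_type(col)}{' NOT NULL' if not_null else ''}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n);"


print(create_table_ddl("Medical_Claims", COLUMNS))
```

The same loop could emit PySpark `StructField` entries or JSON Schema properties by swapping the per-column formatter, which is what makes a single source practical.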

---

### ACT 3: Type Mappings (Section 3)

> A table shows VARCHAR -> StringType -> string -> StringType across systems.

**NARRATOR:** The type mapping engine converts between UMF, PySpark, JSON
Schema, and Great Expectations. VARCHAR becomes StringType in Spark, string
in JSON, StringType in GX. DECIMAL stays DECIMAL with precision preserved.
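
The mapping described in Act 3 can be expressed as a lookup from a UMF type to its counterparts. A minimal sketch covering only the two UMF types in the example schema: the VARCHAR row follows the narration, while the DECIMAL row's JSON Schema type (`number`, because JSON Schema has no decimal type) and GX type name (`DecimalType`) are assumptions, and `map_umf_type` is a hypothetical name.

```python
from typing import NamedTuple


class TypeMapping(NamedTuple):
    pyspark: str      # PySpark type constructor as written in code
    json_schema: str  # value for the JSON Schema "type" keyword
    gx: str           # type name a GX type expectation compares against on Spark data


def map_umf_type(data_type: str, precision: int = 10, scale: int = 0) -> TypeMapping:
    """Map one UMF type to its counterparts; defaults mirror Spark's DecimalType()."""
    if data_type == "VARCHAR":
        return TypeMapping("StringType()", "string", "StringType")
    if data_type == "DECIMAL":
        # Precision and scale survive in Spark; JSON Schema has no decimal type, so "number".
        return TypeMapping(f"DecimalType({precision}, {scale})", "number", "DecimalType")
    raise ValueError(f"no mapping for UMF type {data_type!r} in this sketch")


print(map_umf_type("VARCHAR"))
print(map_umf_type("DECIMAL", 10, 2))
```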

---

### ACT 4: Domain Type Inference (Section 4)

> Column names are matched to domain types with confidence scores.

**NARRATOR:** tablespec ships with 42 domain types. Feed it a column name
like "provider_npi" and it recognizes it as an NPI (National Provider
Identifier) with 100% confidence. It even knows the validation rule:
a 10-digit regex. state_code maps to US state codes. member_email maps to
email. All automatic.
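
Name-based inference of this kind can be sketched as an ordered list of patterns, each carrying a confidence and a value-level validation regex. The rules, confidence values, and `infer_domain` function below are hypothetical and far smaller than tablespec's 42 domain types; only the NPI's 10-digit rule comes from the narration.

```python
from __future__ import annotations

import re
from typing import NamedTuple


class DomainMatch(NamedTuple):
    domain: str
    confidence: float
    value_regex: str  # rule that values of a matched column should satisfy


# Hypothetical rules: (column-name pattern, domain, confidence, value regex), checked in order.
RULES = [
    (re.compile(r"(^|_)npi($|_)"), "npi", 1.0, r"^\d{10}$"),
    (re.compile(r"(^|_)state(_code)?$"), "us_state_code", 0.9, r"^[A-Z]{2}$"),
    (re.compile(r"(^|_)e?mail($|_)"), "email", 0.9, r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
]


def infer_domain(column_name: str) -> DomainMatch | None:
    """Return the first rule whose name pattern matches the column name, else None."""
    name = column_name.lower()
    for pattern, domain, confidence, value_regex in RULES:
        if pattern.search(name):
            return DomainMatch(domain, confidence, value_regex)
    return None


for col in ("provider_npi", "state_code", "member_email", "claim_amount"):
    print(col, infer_domain(col))
```

The state-code regex checks shape only; a production rule would compare values against the actual list of state codes.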

---

### ACT 5: Great Expectations Baseline (Section 5)

> 13 expectations are generated with severity levels.

**NARRATOR:** From the same UMF, we generate a baseline Great Expectations
suite. 13 expectations: column existence, type validation, nullability
constraints, length checks. Each tagged with a severity: critical for
data integrity, warning for quality, info for structural checks. No manual
GX authoring needed.
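
A baseline suite follows mechanically from the schema. A minimal sketch, assuming GX 0.x-style expectation configurations (`expectation_type`, `kwargs`, `meta`) and one possible severity assignment; the expectation names are standard Great Expectations expectations, but `baseline_suite`, the Spark type-name mapping, and the severity rules are illustrative, not tablespec's generator.

```python
from __future__ import annotations

from collections import Counter

# Column specs mirroring the Scene 1 YAML (plain dicts, not tablespec's model).
COLUMNS = [
    {"name": "claim_id", "data_type": "VARCHAR", "length": 50,
     "nullable": {"MD": False, "MP": False, "ME": False}},
    {"name": "claim_amount", "data_type": "DECIMAL", "precision": 10, "scale": 2,
     "nullable": {"MD": True, "MP": True, "ME": True}},
    {"name": "provider_id", "data_type": "VARCHAR", "length": 20},
]
# Spark type names that a type expectation would compare against (assumed mapping).
SPARK_TYPE_NAMES = {"VARCHAR": "StringType", "DECIMAL": "DecimalType"}


def expectation(expectation_type: str, severity: str, **kwargs) -> dict:
    """Build a GX 0.x-style expectation config with a severity tag in `meta`."""
    return {"expectation_type": expectation_type, "kwargs": kwargs, "meta": {"severity": severity}}


def baseline_suite(columns: list[dict], lob: str) -> list[dict]:
    """Derive existence, type, null, and length expectations for one line of business."""
    suite = []
    for col in columns:
        name = col["name"]
        # Structural: the column is present.
        suite.append(expectation("expect_column_to_exist", "info", column=name))
        # Integrity: the physical type matches the spec.
        suite.append(expectation("expect_column_values_to_be_of_type", "critical",
                                 column=name, type_=SPARK_TYPE_NAMES[col["data_type"]]))
        # Integrity: non-null only where this LOB explicitly forbids nulls.
        if col.get("nullable", {}).get(lob) is False:
            suite.append(expectation("expect_column_values_to_not_be_null", "critical", column=name))
        # Quality: string values fit the declared length (no lower bound).
        if "length" in col:
            suite.append(expectation("expect_column_value_lengths_to_be_between", "warning",
                                     column=name, min_value=None, max_value=col["length"]))
    return suite


suite = baseline_suite(COLUMNS, lob="MD")
print(len(suite), Counter(e["meta"]["severity"] for e in suite))
```

For the three-column example and the `MD` line of business, this sketch yields 9 expectations (4 critical, 2 warning, 3 info); it is not expected to reproduce the demo's count of 13.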

---

### ACT 6: LLM Prompt Generation (Section 6)

> Prompt lengths and a preview are displayed.

**NARRATOR:** tablespec generates structured prompts for LLMs. A
documentation prompt asks an AI to analyze the table's business purpose,
data flow, and compliance considerations. A validation prompt asks it to
generate multi-column GX rules that go beyond what the baseline suite
covers automatically. The prompts include all column metadata, sample
values, and domain context.
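
A documentation prompt like the one described here is mostly string assembly over the metadata. A minimal sketch; the prompt wording, the `documentation_prompt` name, the column layout, and the sample values are illustrative assumptions.

```python
from __future__ import annotations

# Illustrative metadata: types pre-rendered, optional domain and sample values.
COLUMNS = [
    {"name": "claim_id", "type": "VARCHAR(50)", "samples": ["CLM-0001", "CLM-0002"]},
    {"name": "claim_amount", "type": "DECIMAL(10,2)", "samples": ["125.00", None]},
    {"name": "provider_npi", "type": "VARCHAR(10)", "domain": "npi"},
]


def documentation_prompt(table: str, description: str, columns: list[dict]) -> str:
    """Assemble a documentation prompt from table and column metadata."""
    lines = [
        f"You are documenting the table {table}: {description}.",
        "Explain its business purpose, typical data flow, and compliance considerations.",
        "",
        "Columns:",
    ]
    for col in columns:
        parts = [f"- {col['name']} ({col['type']})"]
        if col.get("domain"):
            parts.append(f"domain={col['domain']}")
        if col.get("samples"):
            # Render missing sample values as NULL so the model sees that nulls occur.
            shown = ["NULL" if v is None else str(v) for v in col["samples"]]
            parts.append("samples=" + ", ".join(shown))
        lines.append("; ".join(parts))
    return "\n".join(lines)


print(documentation_prompt("Medical_Claims", "Healthcare claims and billing information", COLUMNS))
```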

---

### ACT 7: UMF Diffing (Section 7)

> Two changes detected: a new column and a modified description.

**NARRATOR:** Schema evolution tracking. We modified the claims table,
adding a service_date column and updating a description. UMFDiff detects
both changes instantly. This powers changelog generation and schema review
workflows.
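
Detecting the two changes in this act needs only a field-by-field comparison, with columns matched by name. A minimal sketch; `Change` and `diff_umf` are hypothetical and unrelated to tablespec's `UMFDiff`, and the new description text and the `DATE` type for `service_date` are invented for illustration.

```python
from __future__ import annotations

from typing import NamedTuple


class Change(NamedTuple):
    kind: str  # "added", "removed", or "modified"
    path: str  # e.g. "description" or "columns.service_date"
    old: object
    new: object


def diff_umf(old: dict, new: dict) -> list[Change]:
    """Compare table-level fields, then columns matched by name."""
    changes = []
    for key in ("table_name", "description"):
        if old.get(key) != new.get(key):
            changes.append(Change("modified", key, old.get(key), new.get(key)))
    old_cols = {c["name"]: c for c in old.get("columns", [])}
    new_cols = {c["name"]: c for c in new.get("columns", [])}
    for name in sorted(new_cols.keys() - old_cols.keys()):
        changes.append(Change("added", f"columns.{name}", None, new_cols[name]))
    for name in sorted(old_cols.keys() - new_cols.keys()):
        changes.append(Change("removed", f"columns.{name}", old_cols[name], None))
    for name in sorted(old_cols.keys() & new_cols.keys()):
        before, after = old_cols[name], new_cols[name]
        for attr in sorted(before.keys() | after.keys()):
            if before.get(attr) != after.get(attr):
                changes.append(Change("modified", f"columns.{name}.{attr}",
                                      before.get(attr), after.get(attr)))
    return changes


OLD = {
    "table_name": "Medical_Claims",
    "description": "Healthcare claims and billing information",
    "columns": [{"name": "claim_id", "data_type": "VARCHAR", "length": 50}],
}
NEW = {
    "table_name": "Medical_Claims",
    "description": "Healthcare claims, billing, and service dates",
    "columns": [
        {"name": "claim_id", "data_type": "VARCHAR", "length": 50},
        {"name": "service_date", "data_type": "DATE"},
    ],
}
for change in diff_umf(OLD, NEW):
    print(change.kind, change.path)
```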

---

### ACT 8: Spark Session (Section 8)

> Spark 4.0.1 session is created. A DataFrame with 5 claims is displayed.

**NARRATOR:** Now we enter PySpark territory. We create a local Spark
session. The same factory function works on Databricks because it
auto-detects the environment. We create a sample DataFrame with five
claims, including one with a NULL amount.
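
Environment auto-detection can be sketched with an environment-variable check. This assumes that `DATABRICKS_RUNTIME_VERSION` is set on Databricks clusters, a common heuristic rather than tablespec's documented logic; `running_on_databricks` and `get_spark` are hypothetical names.

```python
import os
from typing import Mapping, Optional


def running_on_databricks(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: Databricks clusters export DATABRICKS_RUNTIME_VERSION."""
    env = os.environ if environ is None else environ
    return "DATABRICKS_RUNTIME_VERSION" in env


def get_spark(app_name: str = "tablespec-demo"):
    """Reuse the platform session on Databricks, otherwise start a local session on all cores."""
    from pyspark.sql import SparkSession  # deferred so the detection helper needs no PySpark

    builder = SparkSession.builder.appName(app_name)
    if not running_on_databricks():
        builder = builder.master("local[*]")
    return builder.getOrCreate()
```

Only `running_on_databricks` runs without PySpark installed; `get_spark` imports PySpark when it is called.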

---

### ACT 9: Profiling (Section 9)

> SparkToUmfMapper infers column types from the DataFrame.

**NARRATOR:** SparkToUmfMapper goes the other direction, from a Spark
DataFrame back to UMF. It infers column names, types, and nullability.
Useful for onboarding existing tables that don't have a UMF spec yet.
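
Going from a DataFrame back to UMF starts from its schema. A minimal sketch that works on `(name, simpleString type, nullable)` triples so it runs without Spark; the UMF names for types other than VARCHAR and DECIMAL and the `infer_umf_columns` function are assumptions, not `SparkToUmfMapper`'s behavior.

```python
from __future__ import annotations

import re

# Assumed UMF names; only VARCHAR and DECIMAL appear in the example schema.
SIMPLE_TO_UMF = {
    "string": "VARCHAR",
    "int": "INTEGER",
    "bigint": "BIGINT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "date": "DATE",
    "timestamp": "TIMESTAMP",
}
DECIMAL_RE = re.compile(r"decimal\((\d+),(\d+)\)")


def infer_umf_columns(fields: list[tuple[str, str, bool]]) -> list[dict]:
    """Turn (name, simpleString type, nullable) triples into UMF-like column dicts."""
    columns = []
    for name, simple_type, nullable in fields:
        match = DECIMAL_RE.fullmatch(simple_type)
        if match:
            col = {"name": name, "data_type": "DECIMAL",
                   "precision": int(match.group(1)), "scale": int(match.group(2))}
        else:
            col = {"name": name, "data_type": SIMPLE_TO_UMF.get(simple_type, simple_type.upper())}
        # A DataFrame carries one nullable flag per field; per-LOB rules must be added by hand.
        col["nullable"] = nullable
        columns.append(col)
    return columns


# With PySpark, the triples come from the DataFrame schema:
# fields = [(f.name, f.dataType.simpleString(), f.nullable) for f in df.schema.fields]
print(infer_umf_columns([("claim_id", "string", False), ("claim_amount", "double", True)]))
```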

---

### ACT 10: Validation (Section 10)

> One validation error: claim_amount has the wrong data type.

**NARRATOR:** TableValidator checks the DataFrame against the UMF spec.
It found one issue: claim_amount is a double in Spark but DECIMAL in the
spec. This is exactly the kind of type drift that causes silent data
corruption in pipelines. The validator returns a structured error DataFrame
you can write to a monitoring table.
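
The type-drift check in this act can be sketched as a comparison between the spec and the DataFrame's `dtypes`, producing one error record per problem. The expected-type mapping, the record fields, and `validate_types` are illustrative assumptions, not `TableValidator`'s API.

```python
from __future__ import annotations

SPEC = [
    {"name": "claim_id", "data_type": "VARCHAR", "length": 50},
    {"name": "claim_amount", "data_type": "DECIMAL", "precision": 10, "scale": 2},
    {"name": "provider_id", "data_type": "VARCHAR", "length": 20},
]


def expected_spark_type(col: dict) -> str:
    """Spark simpleString a column is expected to have (assumed mapping)."""
    if col["data_type"] == "DECIMAL":
        return f"decimal({col['precision']},{col['scale']})"
    return {"VARCHAR": "string"}.get(col["data_type"], col["data_type"].lower())


def validate_types(spec: list[dict], actual_dtypes: list[tuple[str, str]]) -> list[dict]:
    """Compare a spec with df.dtypes-style pairs and return one error record per problem."""
    actual = dict(actual_dtypes)
    errors = []
    for col in spec:
        name, expected = col["name"], expected_spark_type(col)
        if name not in actual:
            errors.append({"column": name, "error": "missing_column",
                           "expected": expected, "actual": None})
        elif actual[name] != expected:
            errors.append({"column": name, "error": "type_mismatch",
                           "expected": expected, "actual": actual[name]})
    return errors


# df.dtypes for a DataFrame in which claim_amount was created as a double
DTYPES = [("claim_id", "string"), ("claim_amount", "double"), ("provider_id", "string")]
print(validate_types(SPEC, DTYPES))
```

Records in this flat shape are easy to load into an error DataFrame and append to a monitoring table.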

---

### ACT 11: Sample Data Generation (Section 11)

> Split-format UMF is prepared. 100 rows of claims and providers are generated.

**NARRATOR:** Finally, sample data generation. We save the UMF specs in
split format (the git-friendly directory structure) and generate 100 rows
for each table. The generator respects column types and nullable rules and
produces realistic values. Provider NPIs are 10 digits. State codes are
real states. Foreign keys are coherent across tables.
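
The properties listed in this act (typed values, nullable columns, 10-digit NPIs, valid state codes, coherent foreign keys) can be sketched with a seeded generator. Everything below, including the ID formats, the null rate, the state subset, and the function names, is an illustrative assumption rather than tablespec's generator.

```python
from __future__ import annotations

import random
from decimal import Decimal

# A subset of USPS state codes, kept short for the sketch.
STATES = ["AL", "AK", "AZ", "CA", "CO", "FL", "GA", "NY", "TX", "WA"]


def generate_providers(n: int, rng: random.Random) -> list[dict]:
    """Providers with a 10-digit NPI (shape only: no NPI check digit) and a state code."""
    return [
        {
            "provider_id": f"P{i:05d}",
            "npi": "".join(rng.choice("0123456789") for _ in range(10)),
            "state_code": rng.choice(STATES),
        }
        for i in range(n)
    ]


def generate_claims(n: int, providers: list[dict], rng: random.Random,
                    null_rate: float = 0.1) -> list[dict]:
    """Claims whose provider_id always references one of the generated providers."""
    claims = []
    for i in range(n):
        # claim_amount is nullable in every LOB, so some rows get None; the rest fit DECIMAL(10,2).
        amount = None if rng.random() < null_rate else Decimal(rng.randint(100, 500_000)).scaleb(-2)
        claims.append({
            "claim_id": f"CLM{i:07d}",
            "claim_amount": amount,
            "provider_id": rng.choice(providers)["provider_id"],
        })
    return claims


rng = random.Random(42)  # fixed seed so repeated runs produce the same rows
providers = generate_providers(100, rng)
claims = generate_claims(100, providers, rng)
print(providers[0], claims[0])
```

Drawing each claim's `provider_id` from the generated provider list is what keeps the foreign key coherent; generating the parent table first is the simplest way to guarantee it.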

---

## CLOSING

> "Demo complete!" banner appears.

**NARRATOR:** That's tablespec. One YAML schema drives SQL generation,
Spark schemas, Great Expectations, domain inference, validation, profiling,
LLM prompts, and sample data. Define once, use everywhere.

---

## Production Notes

**To record the screencast:**

```bash
# Build the Docker image first (one-time)
docker build -t tablespec-demo -f examples/Dockerfile.demo .

# Record with VHS
vhs examples/demo.tape
```

**Outputs:**
- `examples/demo.gif`: animated GIF for README / docs
- `examples/demo.mp4`: video for presentations

**To customize:**
- Edit `examples/demo.tape` for timing, theme, font
- Edit `examples/demo.py` to add/remove sections
- The VHS tape runs the demo inside Docker for reproducibility
