Commit 648003d

easel and claude committed
Port SQL plan generation from pulseflow, add to screencast

New modules:
- schemas/sql_generator.py: SQLPlanGenerator generates full SQL execution
  plans from UMF metadata (base views, sequential joins, aggregations,
  window functions, UNPIVOT, derivation/survivorship column mapping)
- schemas/relationship_resolver.py: RelationshipResolver infers join
  sequences, strategies, and base tables from UMF relationships

All pulseflow-specific code removed (CatalogResolver, PipelineDiscovery,
hardcoded healthcare patterns). Generic table_resolver callback replaces
cross-pipeline resolution.

Public API: SQLPlanGenerator, generate_sql_plan()

Tests: 34 new tests covering basic generation, joins, derivations,
relationship resolution, and convenience function

Screencast: new SQL plan generation scene with voice narration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

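
The commit message mentions a generic table_resolver callback, but this excerpt does not show its signature. As a rough sketch only, assuming the callback maps a table name to that table's metadata (or None when the table is unknown), a dict-backed resolver could look like the following. The names `TableResolver`, `make_dict_resolver`, and `resolve_all`, and the use of plain dicts as metadata, are hypothetical and not taken from tablespec.

```python
from typing import Callable, Optional

# Hypothetical callback shape: table name in, metadata (or None) out.
# The real tablespec signature is not visible in this commit excerpt.
TableResolver = Callable[[str], Optional[dict]]


def make_dict_resolver(tables: dict[str, dict]) -> TableResolver:
    """Build a resolver backed by an in-memory mapping."""

    def resolve(table_name: str) -> Optional[dict]:
        return tables.get(table_name)

    return resolve


def resolve_all(
    names: list[str], resolver: TableResolver
) -> tuple[dict[str, dict], list[str]]:
    """Split table names into resolved metadata and unresolved names."""
    found: dict[str, dict] = {}
    missing: list[str] = []
    for name in names:
        meta = resolver(name)
        if meta is None:
            missing.append(name)
        else:
            found[name] = meta
    return found, missing
```

Passing a callable instead of a concrete catalog class keeps the generator independent of where table metadata lives, which appears to be the motivation for dropping the pulseflow-specific resolvers.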
1 parent de829e8 commit 648003d

12 files changed

Lines changed: 5061 additions & 1044 deletions

examples/narrate.sh

Lines changed: 2 additions & 0 deletions
@@ -41,6 +41,7 @@ gen_clip "domains" "42 domain types ship built in. Feed it a column name like p
 gen_clip "gx" "Generate a full Great Expectations suite deterministically from metadata alone. 13 expectations covering column existence, types, nullability, and length constraints."
 gen_clip "prompts" "Generate structured prompts for LLMs. Documentation prompts. Validation rule prompts. All the column metadata and domain context is included automatically."
 gen_clip "diff" "Schema evolution tracking. Modify a table and see exactly what changed. Added columns. Modified descriptions."
+gen_clip "sql_plan" "Generate full SQL execution plans from UMF metadata. Joins, column derivations, survivorship logic, aggregations. All computed automatically from the schema relationships."
 gen_clip "spark" "Now the PySpark features. Starting a Spark session. Creating DataFrames. Profiling schemas. Validating data against UMF specs. And generating sample data. All from the same UMF metadata."
 gen_clip "close" "That's tablespec. Define once. Use everywhere."
 
@@ -59,6 +60,7 @@ CLIP_domains=${CLIP_DUR[domains]}
 CLIP_gx=${CLIP_DUR[gx]}
 CLIP_prompts=${CLIP_DUR[prompts]}
 CLIP_diff=${CLIP_DUR[diff]}
+CLIP_sql_plan=${CLIP_DUR[sql_plan]}
 CLIP_spark=${CLIP_DUR[spark]}
 CLIP_close=${CLIP_DUR[close]}
 EOF

examples/scene.py

Lines changed: 123 additions & 0 deletions
@@ -294,6 +294,128 @@ def scene_spark():
     spark.stop()
 
 
+def scene_sql_plan():
+    from tablespec import (
+        UMF,
+        UMFColumn,
+        UMFColumnDerivation,
+        DerivationCandidate,
+        Nullable,
+        Relationships,
+        OutgoingRelationship,
+        generate_sql_plan,
+    )
+
+    # Build a derived table that joins claims + providers
+    target = UMF(
+        version="1.0",
+        table_name="Claims_Summary",
+        description="Enriched claims with provider info",
+        table_type="generated",
+        columns=[
+            UMFColumn(
+                name="claim_id",
+                data_type="VARCHAR",
+                length=50,
+                description="Unique claim identifier",
+                nullable=Nullable(MD=False, MP=False, ME=False),
+                derivation=UMFColumnDerivation(
+                    strategy="primary_key",
+                    candidates=[
+                        DerivationCandidate(
+                            table="Medical_Claims",
+                            column="claim_id",
+                            priority=1,
+                        )
+                    ],
+                ),
+            ),
+            UMFColumn(
+                name="claim_amount",
+                data_type="DECIMAL",
+                precision=10,
+                scale=2,
+                description="Claim amount",
+                derivation=UMFColumnDerivation(
+                    candidates=[
+                        DerivationCandidate(
+                            table="Medical_Claims",
+                            column="claim_amount",
+                            priority=1,
+                        )
+                    ],
+                ),
+            ),
+            UMFColumn(
+                name="provider_name",
+                data_type="VARCHAR",
+                length=200,
+                description="Provider full name",
+                derivation=UMFColumnDerivation(
+                    candidates=[
+                        DerivationCandidate(
+                            table="Providers",
+                            column="provider_name",
+                            priority=1,
+                        )
+                    ],
+                ),
+            ),
+            UMFColumn(
+                name="state_code",
+                data_type="VARCHAR",
+                length=2,
+                description="Provider state",
+                derivation=UMFColumnDerivation(
+                    candidates=[
+                        DerivationCandidate(
+                            table="Providers",
+                            column="state_code",
+                            priority=1,
+                        )
+                    ],
+                ),
+            ),
+        ],
+        relationships=Relationships(
+            outgoing=[
+                OutgoingRelationship(
+                    target_table="Medical_Claims",
+                    source_column="claim_id",
+                    target_column="claim_id",
+                    type="foreign_to_primary",
+                    confidence=1.0,
+                ),
+                OutgoingRelationship(
+                    target_table="Providers",
+                    source_column="provider_id",
+                    target_column="provider_id",
+                    type="foreign_to_primary",
+                    confidence=1.0,
+                ),
+            ]
+        ),
+    )
+
+    from tablespec import load_umf_from_yaml
+
+    claims = load_umf_from_yaml(str(CLAIMS_YAML))
+    providers = load_umf_from_yaml(str(PROVIDERS_YAML))
+
+    related = {
+        "Medical_Claims": claims,
+        "Providers": providers,
+    }
+
+    sql = generate_sql_plan(target, related)
+    # Show first 40 lines
+    lines = sql.splitlines()
+    for line in lines[:40]:
+        print(line)
+    if len(lines) > 40:
+        print(f"... ({len(lines)} total lines)")
+
+
 # ─── Dispatch ─────────────────────────────────────────────────────
 
 SCENES = {
@@ -305,6 +427,7 @@ def scene_spark():
     "gx": scene_gx,
     "prompts": scene_prompts,
     "diff": scene_diff,
+    "sql_plan": scene_sql_plan,
     "spark": scene_spark,
 }
 
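
The scene above hands two outgoing relationships to `generate_sql_plan`, and the commit message says join sequences are inferred from them. To illustrate the idea only (this is not tablespec's algorithm or output format, neither of which is shown in this excerpt), sequential joins could be assembled from such relationships roughly as follows. `Rel` and `build_join_sql` are hypothetical stand-ins, and both the LEFT JOIN choice and the ordering by confidence are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rel:
    """Hypothetical stand-in for an outgoing relationship."""

    target_table: str
    source_column: str
    target_column: str
    confidence: float = 1.0


def build_join_sql(base_table: str, rels: list[Rel]) -> str:
    """Emit one LEFT JOIN per relationship, highest confidence first."""
    lines = ["SELECT *", f"FROM {base_table} AS base"]
    # sorted() is stable, so equal-confidence relationships keep input order.
    ordered = sorted(rels, key=lambda r: -r.confidence)
    for i, rel in enumerate(ordered, start=1):
        alias = f"t{i}"
        lines.append(
            f"LEFT JOIN {rel.target_table} AS {alias} "
            f"ON base.{rel.source_column} = {alias}.{rel.target_column}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_join_sql(
        "Medical_Claims",
        [Rel("Providers", "provider_id", "provider_id")],
    ))
```

A real generator would also need to pick the base table and map derived columns to their candidates. The sketch covers only the join-ordering step.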

examples/screencast.sh

Lines changed: 10 additions & 1 deletion
@@ -149,7 +149,16 @@ echo
 run uv run python examples/scene.py diff
 wait_for_clip "${CLIP_diff:-5}" 3
 
-# ─── Scene 9: Spark ──────────────────────────────────────────────
+# ─── Scene 9: SQL Plan Generation ────────────────────────────────
+
+divider
+mark "sql_plan"
+narrate "Generate full SQL execution plans from UMF — joins, derivations, survivorship, all automatic."
+echo
+run uv run python examples/scene.py sql_plan
+wait_for_clip "${CLIP_sql_plan:-8}" 4
+
+# ─── Scene 10: Spark ─────────────────────────────────────────────
 
 divider
 mark "spark"
265 KB
Binary file not shown.
