Commit 5acee45

easel and claude committed
Initial release: tablespec - Python library for UMF table schemas
Introducing tablespec, a comprehensive Python library for working with table schemas in Universal Metadata Format (UMF). This library provides type-safe models, validation, profiling integration, and schema generation tools for data quality workflows.

## Core Features

### UMF Schema Models
- Pydantic-based models for UMF format with full validation
- Support for column metadata, validation rules, and relationships
- LOB-specific nullable configuration (MD/MP/ME)
- Comprehensive type system: VARCHAR, CHAR, TEXT, INTEGER, DECIMAL, FLOAT, DATE, DATETIME, BOOLEAN

### Great Expectations Integration
- Baseline expectation generation from UMF metadata (deterministic, no profiling required)
- Constraint extraction from existing GX suites back into UMF format
- Schema validation using Great Expectations
- Full round-trip: UMF ↔ Great Expectations

### Schema Generators
- SQL DDL generation with LOB-specific nullable columns
- PySpark schema generation
- JSON Schema export
- Comprehensive type mappings between formats

### Profiling Support
- Spark DataFrame profiling → UMF conversion
- Deequ profiling results → UMF conversion
- Automatic column type inference and statistics extraction

### Table Validation (PySpark)
- Full DataFrame validation against UMF specifications
- Detailed validation reports with row-level failures
- Integration with Great Expectations validation engine
- Support for custom expectations and validation rules

### LLM Prompt Generators
- Documentation generation prompts
- Validation rule generation prompts
- Relationship discovery prompts
- Column-level validation guidance

## Project Structure
- `src/tablespec/models/` - Pydantic UMF models
- `src/tablespec/schemas/` - Schema generators and JSON schemas
- `src/tablespec/profiling/` - Profiling integrations (Spark, Deequ)
- `src/tablespec/validation/` - Table validation engine
- `src/tablespec/prompts/` - LLM prompt generators
- `src/tablespec/gx_*.py` - Great Expectations integration modules
- `tests/unit/` - Unit tests (pure Python)
- `tests/integration/` - Integration tests (requires external dependencies)

## Development Setup
- Python 3.12+ required
- Uses `uv` for fast dependency management
- Ruff for formatting and linting
- Pyright for type checking
- Pytest with coverage reporting
- Pre-commit hooks for automated quality checks

## Optional Dependencies
- `tablespec[spark]` - Adds PySpark support for profiling and validation features

## CI/CD
- Pre-commit workflow: Runs ruff format, ruff check, pyright, and pytest
- Coverage workflow: Generates and uploads test coverage to Codecov
- All checks automated via GitHub Actions

## Testing
- Comprehensive unit test suite covering all core functionality
- Integration tests for end-to-end workflows
- Test fixtures for GX suites and validation reports
- 100% coverage target for critical modules

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
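
To make the "LOB-specific nullable columns" idea in the commit message concrete, here is a minimal, self-contained sketch of how a per-LOB nullability flag could drive DDL output. It does not use tablespec itself: the `Column` class, the `render_ddl` function, and the UMF-to-Spark-SQL type mapping are all hypothetical names and choices made up for illustration, and the real library's models and mappings may look quite different.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from the UMF type names listed in the commit message
# to Spark SQL type names. The real tablespec mapping may differ.
UMF_TO_SPARK_SQL = {
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "TEXT": "STRING",
    "INTEGER": "INT",
    "DECIMAL": "DECIMAL",
    "FLOAT": "FLOAT",
    "DATE": "DATE",
    "DATETIME": "TIMESTAMP",
    "BOOLEAN": "BOOLEAN",
}


@dataclass(frozen=True)
class Column:
    """Illustrative column spec (not the tablespec Pydantic model)."""

    name: str
    data_type: str
    # Nullability keyed by LOB code (MD/MP/ME); a missing key means nullable.
    nullable: dict[str, bool] = field(default_factory=dict)


def render_ddl(table: str, columns: list[Column], lob: str) -> str:
    """Render CREATE TABLE text using the nullability for one LOB code."""
    lines = []
    for col in columns:
        sql_type = UMF_TO_SPARK_SQL[col.data_type.upper()]
        not_null = "" if col.nullable.get(lob, True) else " NOT NULL"
        lines.append(f"  {col.name} {sql_type}{not_null}")
    return f"CREATE TABLE {table} (\n" + ",\n".join(lines) + "\n)"
```

With `Column("member_id", "VARCHAR", {"MD": False, "MP": True})`, rendering for `"MD"` marks the column `NOT NULL`, while rendering for `"MP"` leaves it nullable.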
1 parent 03f4794 commit 5acee45

56 files changed

Lines changed: 13557 additions & 3 deletions

.github/workflows/pre-commit.yml

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
+name: pre-commit
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  pre-commit:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "latest"
+
+      - name: Set up Python 3.12
+        run: uv python install 3.12
+
+      - name: Install dependencies
+        run: uv sync --all-extras --group dev
+
+      - uses: pre-commit/action@v3.0.1
+        with:
+          extra_args: --all-files

.github/workflows/release.yml

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
+name: Release & Publish
+
+on:
+  push:
+    tags:
+      - 'v*.*.*'
+
+permissions:
+  contents: write
+  pages: write
+  id-token: write
+
+concurrency:
+  group: release-${{ github.ref }}
+  cancel-in-progress: false
+
+jobs:
+  build:
+    name: Build Distribution
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0  # Full history for version calculation
+
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          enable-cache: true
+
+      - name: Build package
+        run: uv build
+
+      - name: Verify build
+        run: |
+          ls -lh dist/
+          echo "Built packages:"
+          ls -1 dist/
+
+      - name: Upload build artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: dist/
+          retention-days: 90
+
+  release:
+    name: Create GitHub Release
+    needs: build
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Download build artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+      - name: Extract release notes
+        id: release_notes
+        run: |
+          TAG=${GITHUB_REF#refs/tags/}
+          echo "tag=$TAG" >> $GITHUB_OUTPUT
+
+          # Create release notes
+          cat > release_notes.md << EOF
+          ## tablespec $TAG
+
+          ### Installation
+
+          \`\`\`bash
+          # From GitHub Pages index
+          pip install tablespec==${TAG#v} --index-url https://easel.github.io/tablespec/simple/
+
+          # With PySpark support
+          pip install tablespec[spark]==${TAG#v} --index-url https://easel.github.io/tablespec/simple/
+          \`\`\`
+
+          ### What's Changed
+
+          See the [commit history](https://github.com/easel/tablespec/commits/$TAG) for details.
+
+          ### Artifacts
+
+          - Source distribution (sdist): \`tablespec-${TAG#v}.tar.gz\`
+          - Wheel distribution: \`tablespec-${TAG#v}-py3-none-any.whl\`
+          EOF
+
+      - name: Create GitHub Release
+        uses: softprops/action-gh-release@v2
+        with:
+          files: dist/*
+          body_path: release_notes.md
+          draft: false
+          prerelease: ${{ contains(steps.release_notes.outputs.tag, 'rc') || contains(steps.release_notes.outputs.tag, 'alpha') || contains(steps.release_notes.outputs.tag, 'beta') }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+  build-index:
+    name: Build PyPI Index for GitHub Pages
+    needs: build
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Download build artifacts
+        uses: actions/download-artifact@v4
+        with:
+          name: dist
+          path: dist/
+
+      - name: Generate PyPI-compatible index
+        run: |
+          mkdir -p pages/simple/tablespec
+
+          # Create root index.html
+          cat > pages/index.html << 'EOF'
+          <!DOCTYPE html>
+          <html>
+          <head>
+          <title>tablespec - Private PyPI Repository</title>
+          <style>
+          body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; margin: 40px; }
+          h1 { color: #0366d6; }
+          code { background: #f6f8fa; padding: 2px 6px; border-radius: 3px; }
+          pre { background: #f6f8fa; padding: 16px; border-radius: 6px; overflow-x: auto; }
+          .container { max-width: 800px; margin: 0 auto; }
+          .badge { display: inline-block; padding: 4px 8px; background: #0366d6; color: white; border-radius: 3px; font-size: 12px; margin: 4px; }
+          </style>
+          </head>
+          <body>
+          <div class="container">
+          <h1>📊 tablespec</h1>
+          <p><span class="badge">Python 3.12+</span> <span class="badge">Apache 2.0</span></p>
+          <p>Python library for working with table schemas in Universal Metadata Format (UMF)</p>
+
+          <h2>Installation</h2>
+          <pre>pip install tablespec --index-url https://easel.github.io/tablespec/simple/</pre>
+
+          <h2>Quick Start</h2>
+          <pre>
+          # Load and validate UMF schema
+          from tablespec import load_umf_from_yaml, generate_sql_ddl
+
+          umf = load_umf_from_yaml("schema.yaml")
+          ddl = generate_sql_ddl(umf)
+          print(ddl)
+
+          # With PySpark support
+          pip install tablespec[spark] --index-url https://easel.github.io/tablespec/simple/
+          </pre>
+
+          <h2>Available Packages</h2>
+          <ul>
+          <li><a href="simple/tablespec/">tablespec</a></li>
+          </ul>
+
+          <h2>Links</h2>
+          <ul>
+          <li><a href="https://github.com/easel/tablespec">GitHub Repository</a></li>
+          </ul>
+          </div>
+          </body>
+          </html>
+          EOF
+
+          # Create simple/index.html (PEP 503)
+          cat > pages/simple/index.html << 'EOF'
+          <!DOCTYPE html>
+          <html>
+          <head><title>Simple Index</title></head>
+          <body>
+          <h1>Simple Index</h1>
+          <a href="tablespec/">tablespec</a><br/>
+          </body>
+          </html>
+          EOF
+
+          # Create package index with all versions
+          cat > pages/simple/tablespec/index.html << 'PKGEOF'
+          <!DOCTYPE html>
+          <html>
+          <head><title>Links for tablespec</title></head>
+          <body>
+          <h1>Links for tablespec</h1>
+          PKGEOF
+
+          # Add links to all distribution files
+          for file in dist/*; do
+            filename=$(basename "$file")
+            # Calculate SHA256 hash
+            hash=$(sha256sum "$file" | cut -d' ' -f1)
+            echo " <a href=\"../../../releases/download/${GITHUB_REF#refs/tags/}/$filename#sha256=$hash\">$filename</a><br/>" >> pages/simple/tablespec/index.html
+          done
+
+          # Close HTML
+          cat >> pages/simple/tablespec/index.html << 'PKGEOF'
+          </body>
+          </html>
+          PKGEOF
+
+          # Create .nojekyll to disable Jekyll processing
+          touch pages/.nojekyll
+
+          echo "Generated index structure:"
+          find pages -type f -exec echo " {}" \;
+
+      - name: Upload Pages artifact
+        uses: actions/upload-pages-artifact@v3
+        with:
+          path: pages/
+
+  deploy-pages:
+    name: Deploy to GitHub Pages
+    needs: build-index
+    runs-on: ubuntu-latest
+    permissions:
+      pages: write
+      id-token: write
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v4
+
+  verify:
+    name: Verify Release
+    needs: [release, deploy-pages]
+    runs-on: ubuntu-latest
+    steps:
+      - name: Set up Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: '3.12'
+
+      - name: Wait for GitHub Pages propagation
+        run: sleep 60
+
+      - name: Verify GitHub Pages installation
+        run: |
+          TAG=${GITHUB_REF#refs/tags/}
+          VERSION=${TAG#v}
+
+          # Try installing from GitHub Pages index
+          pip install --index-url https://easel.github.io/tablespec/simple/ tablespec==$VERSION || echo "GitHub Pages index not yet available"
+
+      - name: Summary
+        run: |
+          TAG=${GITHUB_REF#refs/tags/}
+          cat << EOF
+          ✅ Release $TAG complete!
+
+          📄 GitHub Pages: https://easel.github.io/tablespec/
+          🏷️ GitHub Release: https://github.com/easel/tablespec/releases/tag/$TAG
+          EOF
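
The tag handling and the PEP 503 package page built by the shell steps in this workflow can be sketched in Python as below. This is an illustration under stated assumptions, not the project's code: all four function names are hypothetical, and the links use an absolute `https://github.com/<repo>/releases/download/...` URL. That target is an assumption about the intent; the workflow's relative `../../../releases/download/...` href, when served from `https://easel.github.io/tablespec/simple/tablespec/`, resolves to a path on the Pages host rather than to the GitHub release.

```python
import hashlib
import html
from pathlib import Path


def tag_from_ref(ref: str) -> str:
    """Equivalent of ${GITHUB_REF#refs/tags/}: strip the tag-ref prefix."""
    return ref.removeprefix("refs/tags/")


def version_from_tag(tag: str) -> str:
    """Equivalent of ${TAG#v}: strip a single leading 'v'."""
    return tag.removeprefix("v")


def is_prerelease(tag: str) -> bool:
    """Same markers as the workflow's contains() checks (case-insensitive)."""
    lowered = tag.lower()
    return any(marker in lowered for marker in ("rc", "alpha", "beta"))


def render_package_index(project: str, repo: str, tag: str, dist_dir: Path) -> str:
    """Render a PEP 503 project page with one hashed link per file in dist_dir."""
    rows = []
    for path in sorted(p for p in dist_dir.iterdir() if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # Assumption: the files are served from the GitHub release for this tag.
        url = f"https://github.com/{repo}/releases/download/{tag}/{path.name}"
        rows.append(
            f'<a href="{html.escape(url)}#sha256={digest}">'
            f"{html.escape(path.name)}</a><br/>"
        )
    title = html.escape(project)
    return (
        "<!DOCTYPE html>\n<html>\n"
        f"<head><title>Links for {title}</title></head>\n"
        f"<body>\n<h1>Links for {title}</h1>\n"
        + "\n".join(rows)
        + "\n</body>\n</html>\n"
    )
```

Calling `render_package_index("tablespec", "easel/tablespec", "v1.2.3", Path("dist"))` would return the HTML for `simple/tablespec/index.html`; like the shell loop, it lists only the files from the current build.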

.github/workflows/test.yml

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+name: Coverage
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  coverage:
+    runs-on: ubuntu-latest
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install uv
+        uses: astral-sh/setup-uv@v5
+        with:
+          version: "latest"
+
+      - name: Set up Python 3.12
+        run: uv python install 3.12
+
+      - name: Install dependencies
+        run: uv sync --all-extras --group dev
+
+      - name: Run tests with coverage
+        run: |
+          uv run pytest --cov=src/tablespec --cov-report=xml --cov-report=term
+
+      - name: Upload coverage reports to Codecov
+        uses: codecov/codecov-action@v5
+        with:
+          file: ./coverage.xml
+          fail_ci_if_error: false
+        continue-on-error: true
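
The workflow above uploads `coverage.xml` but never fails on low coverage, while the commit message states a 100% target for critical modules. One way such a target could be checked is sketched below, reading the Cobertura-style report written by `--cov-report=xml`, where each `<class>` element carries `filename` and `line-rate` attributes. The function name, the default threshold, and the idea of selecting critical modules by path prefix are assumptions made for illustration, not part of this commit.

```python
import xml.etree.ElementTree as ET


def files_below_threshold(
    xml_text: str,
    threshold: float = 1.0,
    critical_prefixes: tuple[str, ...] = ("",),
) -> dict[str, float]:
    """Return {filename: line_rate} for selected files under the threshold.

    critical_prefixes is a hypothetical way to pick the "critical" modules;
    the default ("",) matches every file in the report.
    """
    root = ET.fromstring(xml_text)
    failing: dict[str, float] = {}
    for cls in root.iter("class"):
        filename = cls.get("filename", "")
        rate = float(cls.get("line-rate", "0"))
        if filename.startswith(critical_prefixes) and rate < threshold:
            failing[filename] = rate
    return failing
```

A CI step could read `coverage.xml`, call this function, and exit non-zero when the returned mapping is non-empty.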

.gitignore

Lines changed: 4 additions & 2 deletions
@@ -182,9 +182,9 @@ cython_debug/
 .abstra/
 
 # Visual Studio Code
-# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
+# Visual Studio Code specific template is maintained in a separate VisualStudioCode.gitignore
 # that can be found at https://github.com/github/gitignore/blob/main/Global/VisualStudioCode.gitignore
-# and can be added to the global gitignore or merged into this file. However, if you prefer,
+# and can be added to the global gitignore or merged into this file. However, if you prefer,
 # you could uncomment the following to ignore the entire vscode folder
 # .vscode/
 
@@ -205,3 +205,5 @@ cython_debug/
 marimo/_static/
 marimo/_lsp/
 __marimo__/
+
+.claude/settings.local.json

.pre-commit-config.yaml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+repos:
+  - repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.13.3
+    hooks:
+      - id: ruff-format
+        name: ruff format
+      - id: ruff
+        name: ruff check
+        args: [--fix]
+
+  - repo: local
+    hooks:
+      - id: pyright
+        name: pyright
+        entry: uv run pyright src/
+        language: system
+        types: [python]
+        pass_filenames: false
+
+      - id: pytest
+        name: pytest
+        entry: uv run pytest
+        language: system
+        types: [python]
+        pass_filenames: false
+        stages: [commit]
