
Bump ORD_SCHEMA_TAG to v0.6.1 and add parquet support#240

Merged
skearnes merged 3 commits into main from update-ord-schema-v0.6.1 on May 12, 2026

skearnes commented May 9, 2026

Summary

  • Bumps ORD_SCHEMA_TAG from v0.3.93 to v0.6.1 in both submission.yml and validation.yml (a jump across three 0.x release series, from 0.3 to 0.6).
  • Adds *.parquet to .gitattributes so parquet datasets are tracked via Git LFS.
  • Extends the submission file-type checks and the process_dataset.py invocation to accept *.parquet alongside *.pb/*.pbtxt.
  • Adds a parquet pass to the validation matrix (no-op until *.parquet files exist in data/).
  • scripts/upload_to_huggingface.py needs no change: it is already extension-agnostic because it diffs everything under data/**.
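A minimal sketch of how the extended file-type check could behave, assuming it matches file basenames against glob patterns. The names DATASET_PATTERNS, is_dataset_file, and rejected_files are hypothetical and are not taken from the actual workflow; the *.pb* pattern is used because the test plan refers to the existing files that way, and it covers both .pb and .pbtxt (as well as compressed names such as .pb.gz).

```python
import fnmatch
import posixpath

# Hypothetical glob list mirroring the PR description: the existing *.pb*
# datasets (covering .pb and .pbtxt) plus the newly accepted *.parquet.
DATASET_PATTERNS = ("*.pb*", "*.parquet")


def is_dataset_file(path: str) -> bool:
    """Return True if the file's basename matches an accepted dataset glob."""
    name = posixpath.basename(path)
    # fnmatchcase avoids platform-dependent case folding.
    return any(fnmatch.fnmatchcase(name, pattern) for pattern in DATASET_PATTERNS)


def rejected_files(changed_paths):
    """Return changed paths under data/ that are not recognized dataset files."""
    return [
        p for p in changed_paths
        if p.startswith("data/") and not is_dataset_file(p)
    ]


if __name__ == "__main__":
    changed = [
        "data/ab/ord_dataset-example.pb.gz",
        "data/cd/ord_dataset-example.parquet",
        "data/ef/notes.txt",
        "README.md",  # outside data/, so the check ignores it
    ]
    print(rejected_files(changed))  # ['data/ef/notes.txt']
```

Matching on the basename keeps the check independent of which data/ shard a file lives in, so adding a new extension only means adding one pattern.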

Test plan

  • Validation workflow passes across all data/* shards on the existing *.pb* files.
  • Validation workflow's new *.parquet pass exits cleanly with "Found 0 datasets" per shard.
  • Submission workflow passes (no data changes; this exercises only the install and processing path).
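The "Found 0 datasets" expectation for the parquet pass can be illustrated with a per-shard count. This is a sketch under stated assumptions: each immediate subdirectory of data/ is treated as a shard, and the parquet pass is assumed to glob *.parquet inside it. count_datasets_per_shard is a hypothetical helper, not the workflow's actual code, and the printed line only mimics the log message quoted in the test plan.

```python
import glob
import os
import tempfile


def count_datasets_per_shard(data_dir: str, pattern: str) -> dict:
    """Count files matching `pattern` in each immediate subdirectory (shard) of data_dir."""
    counts = {}
    for shard in sorted(os.listdir(data_dir)):
        shard_path = os.path.join(data_dir, shard)
        if not os.path.isdir(shard_path):
            continue  # skip stray files at the top level of data_dir
        counts[shard] = len(glob.glob(os.path.join(shard_path, pattern)))
    return counts


if __name__ == "__main__":
    # Hypothetical layout: two shards that contain only protobuf datasets.
    with tempfile.TemporaryDirectory() as data_dir:
        for shard in ("00", "01"):
            os.makedirs(os.path.join(data_dir, shard))
            open(os.path.join(data_dir, shard, "ord_dataset-example.pb.gz"), "wb").close()
        for shard, n in count_datasets_per_shard(data_dir, "*.parquet").items():
            print(f"{shard}: Found {n} datasets")
```

With no *.parquet files present, every shard reports zero, which is why the new pass is expected to be a clean no-op on the current data.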

skearnes and others added 2 commits May 8, 2026 23:26
Updates the validation and submission workflows to use the latest
ord-schema release (was v0.3.93).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Track *.parquet via Git LFS.
- Accept *.parquet files in submission file-type checks and dataset
  processing.
- Add a parquet pass to the validation matrix (no-op until parquet
  datasets exist).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skearnes changed the title from "Bump ORD_SCHEMA_TAG to v0.6.1" to "Bump ORD_SCHEMA_TAG to v0.6.1 and add parquet support" on May 9, 2026
skearnes requested a review from bdeadman on May 9, 2026 03:47
Drops the unscoped 'push' trigger and the dead push-only checkout step
in process_submission. Every job in this workflow already gates its real
work on pull_request via if: conditions; the push trigger only re-ran
the validation step on direct branch pushes (and on every merge-to-main),
duplicating work the PR run already covered and burning LFS bandwidth.

Also collapses the now-redundant `github.event_name == 'pull_request'`
clauses out of per-job and per-step if: conditions for clarity.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skearnes merged commit bbc5e9d into main on May 12, 2026
13 checks passed
skearnes deleted the update-ord-schema-v0.6.1 branch on May 12, 2026 22:17