feat: bec CLI — agent-first repo tooling, LMS-aligned schemas#4547

Draft
Asi0Flammeus wants to merge 7 commits into dev from feat/bec-cli

@Asi0Flammeus

Collaborator

What

bec — a Python CLI (scripts/bec/, click + jsonschema) replacing the scattered validation/report/proofreading scripts with one tool:

  • bec validate [path|--all] — full-repo validation in ~40s, --json, exit codes 0/1/2
  • bec new course|tutorial|professor|event|resource — scaffolds that pass validation (all 10 resource types)
  • bec add part|chapter|quiz|language — UUID part/chapter IDs, quiz numbering, language-file skeletons
  • bec proofread update|reward|batch-add|status — proofreading metadata + reward calculation
  • bec report translation|images|video|proofreading|analytics [--all] — the 5 HTML/JSON reports
  • bec agent-setup — symlinks AGENTS.md / CLAUDE.md to repo root

Superseded scripts moved to to_delete/ (validation-format, proofreading-metadata, proofreading_report, course_report, video_deployment_overview); docs/PBN-template-repo/ removed (scaffolds replace it). Canonical schemas now live in scripts/bec/src/bec/schemas/ (root schemas/ symlink kept).

Schema alignment — LMS is the source of truth

All 24 JSON schemas were audited against what the LMS sync actually consumes (importer parsers > Zod > Drizzle, bitcoin-learning-management-system@9fc35b345) by an 8-group multi-agent audit with adversarial verification of every edit (~110 edits applied).

Full evidence-per-edit report: https://static.crqpt.com/bec-alignment-report.html · summary in docs/reports/bec-lms-alignment-2026-06.md

Validation impact on the 2,677 current content items:

|          | before | after |
|----------|--------|-------|
| passing  | 645    | 2,547 |
| errors   | 1,567  | 87 (genuine content defects, inventoried in the report; deliberately not fixed here) |
| warnings | 465    | 43    |
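
As a quick sanity check on the figures above, the pass rate can be recomputed from the reported counts. This is only arithmetic on the numbers in this description, not output from bec; the variable and function names are illustrative.

```python
# Counts copied from the validation-impact table; TOTAL is the item count.
TOTAL = 2677
before = {"passing": 645, "errors": 1567, "warnings": 465}
after = {"passing": 2547, "errors": 87, "warnings": 43}


def pass_rate(counts: dict) -> float:
    """Share of items that pass validation, as a percentage."""
    return 100 * counts["passing"] / sum(counts.values())


# Both columns account for every content item.
assert sum(before.values()) == sum(after.values()) == TOTAL

print(f"before: {pass_rate(before):.1f}%")  # before: 24.1%
print(f"after:  {pass_rate(after):.1f}%")   # after:  95.1%
```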

Notable: chapter/part IDs are generated as UUIDs (the LMS uuidValidates them; the originally planned BIP39 word-IDs would have broken sync), quiz question.yml now writes its id (LMS primary key), and the glossary content schema was flattened — it previously false-failed every glossary markdown file.
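
To illustrate why word-based IDs would not survive a UUID check, here is a minimal sketch using only the Python standard library. The helper names are hypothetical, and the regular expression is an assumption about a generic UUID shape, not the LMS's actual validation code.

```python
import re
import uuid

# Generic UUID shape: 8-4-4-4-12 hex digits, with no constraint on the
# version or variant nibble. Assumption: this only approximates what the
# LMS accepts; it is not copied from the LMS.
GENERIC_UUID = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def new_chapter_id() -> str:
    """Hypothetical helper: a random (version 4) UUID as a string."""
    return str(uuid.uuid4())


def looks_like_uuid(value: str) -> bool:
    """True if the whole string has the generic UUID shape."""
    return GENERIC_UUID.fullmatch(value) is not None


chapter_id = new_chapter_id()
print(looks_like_uuid(chapter_id))              # True
print(looks_like_uuid("abandon-ability-able"))  # False
```

A word-style ID such as the made-up "abandon-ability-able" fails the shape check, which is the kind of mismatch the switch to UUIDs avoids.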

Code review

44 adversarially-confirmed findings fixed (crash-on-malformed-YAML, schema-aware null handling, case-insensitive enum prompts, valid placeholder webp, proofreading reward case bug, analytics part-collapsing, packaging deps). 444 tests pass.

Follow-ups (documented in the report, out of scope here)

  • validate professor translation .yml files (traversal gap)
  • schema for assignment.yml
  • fix the 87 invalid content items (mostly unquoted-colon YAML in machine-translated quizzes)
  • courses/dev301: testnet_only / test_only typo

…t commands

Python package at scripts/bec (click + jsonschema + pyyaml), installed via
pip install -e. Content registry at content-types.yml (14 content types,
categories, discipline codes, tags, languages). Canonical JSON schemas at
scripts/bec/src/bec/schemas with root schemas/ symlink. 405 tests.

Agent-first repo orientation in AGENTS.md / CLAUDE.md: content types, bec
command reference, conventions (course IDs, BIP39 chapter IDs, image rules),
validation workflow. bec agent-setup symlinks both files to the repo root.

- scripts/validation-format, proofreading-metadata, proofreading_report,
  course_report, video_deployment_overview, one-off scripts → to_delete/
- docs/PBN-template-repo removed (bec new scaffolds replace templates)
- docs/50-planb-tags.md absorbed into schemas/tags-definitions.json
- plans/bec-cli.md: implementation plan

LMS (bitcoin-learning-management-system) is the source of truth for
content format — precedence: importer parsers > Zod schemas > Drizzle
tables. ~110 verified edits from an 8-group adversarially-verified
audit: relax required fields the importer treats as optional, open
the tag vocabulary (importer lowercases and upserts any string),
generic-UUID patterns instead of v4-strict, nullable unions matching
nullable columns, add importer-consumed fields BEC never documented
(course assignments/pricing, tutorial last_update_date, professor
links/tips, conference id/original_language/proofreading), flatten
word-content-scheme to the layout every other content schema uses.
tags-definitions.json stays as the documented vocabulary.
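
The "nullable unions" idea above can be sketched as a small check on a JSON Schema fragment. This is an illustrative stand-alone function under the assumption that nullability is expressed through `type` lists or `anyOf`/`oneOf` branches; the function names are hypothetical and this is not the bec implementation.

```python
def allows_null(schema: dict) -> bool:
    """Return True if a JSON Schema fragment accepts a null value."""
    declared = schema.get("type")
    if declared == "null":
        return True
    if isinstance(declared, list) and "null" in declared:
        return True
    # Unions written as anyOf/oneOf are nullable if any branch is.
    for key in ("anyOf", "oneOf"):
        for branch in schema.get(key, []):
            if isinstance(branch, dict) and allows_null(branch):
                return True
    return False


def null_warning(name: str, value, schema: dict):
    """Return a warning for an unexpected null, or None if the value is fine."""
    if value is None and not allows_null(schema):
        return f"{name}: null is not allowed by the schema"
    return None


print(null_warning("last_update_date", None, {"type": ["string", "null"]}))
# None
print(null_warning("id", None, {"type": "string"}))
# id: null is not allowed by the schema
```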
- validate: malformed YAML no longer crashes the run; schema-aware
  null handling (no false warnings on nullable fields); content files
  validated with real jsonschema instead of a shallow field loop;
  missing metadata file is an error; per-file paths in --all output
- add: chapter/part IDs are uuid4 (LMS uuidValidates them — BIP39
  wordlist removed); quiz question.yml writes its id; lang and
  chapter-id validated; CRLF-safe markdown append; regional language
  codes (zh-Hans, sr-Latn, nb-NO) recognized
- new: interactive enum prompts match case-insensitively (license was
  impossible to enter); valid placeholder webp; book scaffold writes
  required id; enums loaded from schemas instead of hardcoded copies
- proofread: language-factor lookup case-consistency; project.yml
  reachable (stale builder.yml removed); no literal 'None' in YAML;
  tests no longer mutate real repo content
- report: course analytics splits parts correctly; chapterId/partId
  values excluded from word counts; null video providers guarded
- packaging: jsonschema>=4.18 + referencing declared; schemas shipped
  in package-data; bec report without subcommand shows help, exits 2
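
As an illustration of the case-insensitive enum prompt fix listed above, a minimal sketch: user input is compared case-insensitively, but the canonical spelling from the enum is what gets returned. The function name and the sample values are hypothetical; the list above says the real enums are loaded from the schemas.

```python
from typing import Optional, Sequence


def match_enum(answer: str, choices: Sequence[str]) -> Optional[str]:
    """Map user input to the canonical enum spelling, ignoring case."""
    wanted = answer.strip().casefold()
    for choice in choices:
        if choice.casefold() == wanted:
            return choice
    return None


# Illustrative values only, not the actual license enum.
LICENSES = ["CC-BY-SA", "MIT"]

print(match_enum("cc-by-sa", LICENSES))  # CC-BY-SA
print(match_enum("gpl", LICENSES))       # None
```

Returning the canonical spelling rather than the raw input keeps the written YAML valid against a case-sensitive enum.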

Tests: 444 passed (was 405). Full-repo validation: 2547/2677 pass
(was 645); the remaining 87 errors are genuine content defects.

The LMS importer computes Math.round(p.reward * 100) unguarded, so a
missing reward crashes the sync insert, same as course/tutorial.
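
A minimal Python sketch of the guard that is missing on the importer side, to show why requiring reward up front is the safer option. The function name and the `item` dictionary are assumptions for illustration; the importer itself is JavaScript and is not reproduced here.

```python
import math
from numbers import Real


def reward_to_cents(item: dict) -> int:
    """Convert a reward to integer cents, failing early if it is missing."""
    reward = item.get("reward")
    # bool is a Real subclass in Python, so it is excluded explicitly.
    if isinstance(reward, bool) or not isinstance(reward, Real):
        raise ValueError("reward is required and must be a number")
    # JavaScript's Math.round rounds halves toward +infinity;
    # floor(x + 0.5) reproduces that behaviour.
    return math.floor(reward * 100 + 0.5)


print(reward_to_cents({"reward": 1.5}))  # 150
```

With this check, an item without a reward is rejected with a clear message at validation time instead of failing later during the sync insert.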