Initial design + implementation done via spec-kit + claude code #4
Some notes on e27e009, following from the comment thread, on how to detect metadata that may be erroneously inconsistent: the detection logic would best be designed around the same logic that handles the Inheritance Principle. E.g., just because "task name" has a few different values across a dataset doesn't mean there is acquisition inconsistency if there are actually multiple …
OK, to make progress, I initiated the "Constitution" with/for spec-kit right here. In that process, to capture both the provenance of my prompts and better commit messages from …
- **Package management**: `uv` with `pyproject.toml` as single source of truth.
- **Testing**: `pytest` orchestrated by `tox` (with `tox-uv`).
- **Linting**: `ruff` for formatting and linting.
I will add code duplication detection under the DRY principle.
just-meng left a comment:
I'd be thrilled if the rename-tool could flow into this command line tool :D
1881dd1 to 20dca93
Co-authored-by: Robert Smith <robert.smith@florey.edu.au>
Set up .specify/ directory structure and establish the bids-utils constitution (v1.1.0) defining 9 core principles: Do No Harm, Schema-Driven and Version-Flexible, Library-First, CLI Excellence, Test-First, Performance at Scale, VCS Awareness, Observability, and Simplicity. Includes constitution update checklist for maintaining template consistency.

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
…idator details

- PyBIDS: consult interfaces but require very significant benefit to adopt; bids2table is a lighter alternative to evaluate first
- bids-validator: primary is Deno-based (bids-validator-deno on PyPI); WiP Python validator may be adopted later

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
…onstitution

- Add principle X (Versioning & Breaking Changes): SemVer, migration guides, deprecation warnings
- Add principle XI (DRY): duplication detection via pylint duplicate-code (AST-aware, pragma support) and jscpd (token-based, CI threshold gating), both as tox testenvs
- Add Releases section: mandate automated releases via intuit/auto or comparable tooling with PR-label-driven versioning and changelog generation
- List duplication detection in Tooling section

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo -- '/speckit.constitution something went not right before but we did get constitution in .specify/memory/constitution.md and checklist. Verify that constitution as it should be so we could go to the next step'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo '/speckit.specify Build a Python application/library following what is described in docs/design/00-initial-design.md file'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: c472dec6f48b
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better. Also please add a user story for migrate within 1.x series. Review bids-specification deprecations through the versions (specification is under /home/yoh/proj/bids/bids-specification-master and git grep there for DEPRECATED). So add a prominent user story that we want to add migrations within 1.x (default version to migrate to is a current released) and handling deprecations (e.g. renaming metadata variables and/or migrating values)'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: ba11cad252d1
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better.'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: cd9954849052
20dca93 to a9333c9
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo '/speckit.plan proceed while also doing research on prototypes and related projects listed in docs/design/00-initial-design.md file'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 42c05a1a8f38
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo /speckit.tasks",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 538c9cb6369d
Full implementation of all 10 user stories from the spec:

- Core infrastructure: BIDSPath parsing, BIDSDataset discovery, schema wrapper (bidsschematools), VCS backends (git/git-annex/datalad), sidecar discovery, _scans.tsv and participants.tsv management
- rename: file + sidecar rename with scans update and conflict detection
- migrate: schema-driven 1.x deprecation fixes (field renames, enum renames, path→BIDS URI, DOI format, ScanDate→scans.tsv)
- subject-rename / remove: full dataset-wide subject operations
- session-rename: including move-into-session
- metadata aggregate/segregate/audit: inheritance-aware manipulation
- remove-run: with --shift/--no-shift reindexing
- merge: combine datasets with --into-sessions and conflict handling
- split: extract subsets by suffix or datatype

Also includes:

- Project scaffolding (pyproject.toml, tox.ini, CI, mkdocs, .autorc)
- bids-examples git submodule for integration testing
- 128 tests covering all modules and CLI commands

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1abc11550506
- Fix all ruff lint errors (line length, unused imports/vars, import sorting, set comprehension, zip strict)
- Fix mypy errors (variable shadowing in except/for blocks)
- Eliminate duplicate code by extracting shared helpers:
  - _tsv.py: common TSV read/write used by _scans.py and _participants.py
  - cli/_common.py: output_result() and load_dataset() used by all CLI cmds
  - _types.py: rename_change(), normalize_subject_id(), require_subject_dir()
- Remove TCH lint rules (false positives for runtime Path usage)
- Add CLAUDE.md with mandatory tox-before-commit rule
- Add tox gate rule to .specify/memory/constitution.md

136 tests pass, ruff/mypy/pylint all clean.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1e95ab52403e
The click subcommand modules were never imported in cli/__init__.py, so `bids-utils --help` showed no commands. Add imports after the `main` group definition and add a test that asserts all 9 expected commands are present in --help output.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a97c91242c0b
Entire-Checkpoint: 2ac300b095a4
Remediations for the top issues found by /speckit.analyze:

- C2: Create tests/integration/test_bids_examples.py with sweep tests for rename (dry-run), subject-rename (dry-run), migrate (dry-run), and one mutating rename test. 276 integration tests across all bids-examples datasets.
- Fix migrate crash on datasets with BIDSVersion "n/a" by handling InvalidVersion gracefully in _get_rules().
- H1: Add interactive confirmation prompt to `bids-utils remove` when --force is not passed (constitution Principle I).
- H3: Remove "deduplicate" ghost reference from plan.md Phase 6 (not in spec, tasks, or implementation).
- M7: Unmark 19 tasks in tasks.md that were incorrectly marked [X] (bids-examples sweeps, 2.0 migration, multi-schema testing, performance profiling, suffix deprecation handler).

416 tests pass (140 unit + 276 integration), all linters clean.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: be9fe515efd3
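To illustrate the `BIDSVersion "n/a"` fix, here is a minimal stdlib-only sketch of tolerant version handling. It is not the actual `_get_rules()` code: `parse_bids_version`, `Rule`, `get_rules`, and the rule names are hypothetical, and the choice to treat an unparseable dataset version as "apply every rule up to the target" is an assumption made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_bids_version(raw: object) -> tuple[int, ...] | None:
    """Parse a dotted numeric version such as "1.8.0"; None if unparseable."""
    try:
        return tuple(int(part) for part in str(raw).strip().split("."))
    except ValueError:
        # Covers values such as "n/a" or an empty string.
        return None


@dataclass(frozen=True)
class Rule:
    """Hypothetical migration rule, keyed by the version that deprecated something."""

    name: str
    deprecated_in: tuple[int, ...]


def get_rules(dataset_version: object, target_version: str, rules: list[Rule]) -> list[Rule]:
    """Select rules between the dataset version (exclusive) and target (inclusive).

    Assumption: an unparseable dataset version does not raise; every rule
    up to the target version is considered instead.
    """
    current = parse_bids_version(dataset_version)
    target = parse_bids_version(target_version)
    if target is None:
        raise ValueError(f"invalid target version: {target_version!r}")
    return [
        rule
        for rule in rules
        if (current is None or current < rule.deprecated_in)
        and rule.deprecated_in <= target
    ]
```

Comparing integer tuples rather than strings keeps `1.10.0` ordered after `1.9.0`.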
- Search for EEG (.edf, .vhdr, .set, .bdf), MEG (.fif), fNIRS (.snirf), microscopy (.ome.tif, .ome.zarr), TSV and JSON files, not just .nii.gz, so more bids-examples datasets are tested
- Use `reason=` kwarg in all pytest.skip() calls for clear output
- Results: 304 pass (was 276), 18 skip (was 46, now only atlas-* datasets which use tpl-* not sub-*)

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: d905ac840283
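The broadened search could look roughly like the sketch below. The extension list is taken from the commit message; `DATA_EXTENSIONS` and `find_candidate` are hypothetical names, and restricting the search to `sub-*` directories is an assumption based on the note that `tpl-*` datasets are still skipped.

```python
from __future__ import annotations

from pathlib import Path

# Extensions listed in the commit message above.
DATA_EXTENSIONS = (
    ".nii.gz", ".edf", ".vhdr", ".set", ".bdf", ".fif", ".snirf",
    ".ome.tif", ".ome.zarr", ".tsv", ".json",
)


def find_candidate(dataset_root: Path) -> Path | None:
    """Return the first path under a sub-* directory with a known extension."""
    for subject_dir in sorted(dataset_root.glob("sub-*")):
        for path in sorted(subject_dir.rglob("*")):
            # str.endswith accepts a tuple, which handles multi-part
            # extensions such as .nii.gz and .ome.zarr uniformly.
            if path.name.endswith(DATA_EXTENSIONS):
                return path
    return None
```

A dataset for which this returns `None` (for example one containing only `tpl-*` directories) would be the case where a test calls `pytest.skip()` with an explicit `reason=`.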
So the skip output shows *why* a dataset cannot be loaded (e.g. "Missing BIDSVersion" vs "No dataset_description.json") rather than a generic "cannot load dataset".

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 51478c10f63e
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo '/speckit.clarify there should be easy ways to enable shell completion for bids-utils commands. https://click.palletsprojects.com/en/stable/shell-completion/ since click is used could be the relevant documentation'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 69e313ea2bc1
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo /speckit.tasks",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 0ecffc352f3b
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo -p '/speckit.implement T034 T035 — complete 1.x migration'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 76a8b5466a9d
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo -p '/speckit.implement T054 T061 T068 T072 — sweep tests'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 8a2d1c723bd9
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo -p /speckit.implement",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: 0816729e7f79
Cross-artifact analysis identified 1 CRITICAL, 2 HIGH, 9 MEDIUM, and 4 LOW findings. This commit applies concrete remediations:

- Add Phase 11 (tasks T083-T085) for shell completion (FR-019/020/021) which had zero coverage in tasks.md
- Expand US10 (Split) from 1 to 4 acceptance scenarios
- Resolve all 11 unanswered edge case questions with dispositions
- Add remove_run() and split_dataset() to library API contract
- Add MigrationResult dataclass to data-model.md
- Make SC-003 measurable (5-second benchmark target)
- Mark Phase 4 (BIDS 2.0) as PROVISIONAL pending schema stabilization
- Update plan.md to reflect actual code structure (_tsv.py, test_cli_common.py)
- Annotate FR-016 as specific application of FR-009

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: ac75835aa773
Add `bids-utils completion [bash|zsh|fish]` command (FR-019) that outputs shell-specific activation scripts, with auto-detection from $SHELL.

Add BIDS-aware custom completions (FR-020, FR-021): SubjectCompletion, SessionCompletion, EntityKeyCompletion, and BIDSFileCompletion types wired into rename, subject-rename, session-rename, and remove commands. Dataset root resolved by walking up from CWD to dataset_description.json.

26 tests in tests/test_completion.py cover shell detection, activation script output, subject/session/entity/file completions, and dataset root discovery.

Co-Authored-By: Claude Code 2.1.94 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 0604dcdcb54a
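The two lookups described in this commit (shell auto-detection from `$SHELL` and walking up from the CWD to `dataset_description.json`) can be sketched with the standard library alone. The function names `detect_shell` and `find_dataset_root` are hypothetical; the actual implementation in bids-utils may be organised differently.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from pathlib import Path

SUPPORTED_SHELLS = ("bash", "zsh", "fish")


def detect_shell(environ: Mapping[str, str] | None = None) -> str | None:
    """Guess the shell from $SHELL; None if it is unset or unsupported."""
    env = os.environ if environ is None else environ
    name = Path(env.get("SHELL", "")).name  # "/usr/bin/zsh" -> "zsh"
    return name if name in SUPPORTED_SHELLS else None


def find_dataset_root(start: Path | None = None) -> Path | None:
    """Walk up from `start` (default: CWD) to the nearest directory that
    contains dataset_description.json; None if no ancestor has one."""
    current = (start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / "dataset_description.json").is_file():
            return candidate
    return None
```

Returning `None` instead of raising lets a completion callback degrade to "no suggestions" when it is invoked outside a dataset.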
Add Phase 1b (T086-T091) for content-aware I/O on git-annex/DataLad datasets. Group-level --annexed option (error/get/skip-warning/skip) controls missing content policy; BIDS_UTILS_ANNEXED env var for persistent preference.

VCS protocol extended with four new methods: has_content/get_content for reads, unlock/add for writes. The _io.py layer provides ensure_content (gated by --annexed mode), ensure_writable and mark_modified (always active for annex backends: correctness, not policy). Full lifecycle: get → unlock → read → modify → write → add.

Updates: spec (FR-022, edge cases, clarifications, assumptions), library-api contract (VCSBackend protocol, _io API, BIDSDataset.annexed_mode), plan (Phase 1b steps), tasks (T086-T091 + dependency graph).

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 902a7cfc1d2a
Add --annexed group-level option (error/get/skip-warning/skip) with BIDS_UTILS_ANNEXED env var for git-annex/DataLad datasets where file content may not be locally available.

VCS protocol extended with has_content/get_content (reads) and unlock/add (writes). Content-aware I/O layer (_io.py) provides ensure_content, ensure_writable, mark_modified, read_json, write_json.

Full lifecycle for annexed files: get → unlock → read → modify → write → add. The --annexed mode controls missing-content policy; unlock/add are always active for annex backends (correctness).

Wired through all file readers/writers: _tsv, _scans, _participants, session, subject, rename, metadata (~9 sites), migrate (~17 sites). load_dataset() applies annexed_mode from CLI context automatically.

38 new tests (test_io.py, test_vcs.py, test_cli_common.py additions). All 1288 tests pass across py310-py314, lint, type, duplication.

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 439757156f50
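A rough sketch of how the get → unlock → read → modify → write → add lifecycle might fit together is given below. The method names on `VCSBackend` and the four `--annexed` modes come from the commit message; everything else is an assumption for illustration: the signatures, the `ContentNotAvailableError` exception, and the `update_json` helper are hypothetical and are not the actual `_io.py` API.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path
from typing import Protocol

logger = logging.getLogger(__name__)

ANNEXED_MODES = ("error", "get", "skip-warning", "skip")


class VCSBackend(Protocol):
    """The four content-related methods named in the commit message."""

    def has_content(self, path: Path) -> bool: ...
    def get_content(self, path: Path) -> None: ...
    def unlock(self, path: Path) -> None: ...
    def add(self, path: Path) -> None: ...


class ContentNotAvailableError(RuntimeError):
    """Hypothetical error raised when mode is "error" and content is missing."""


def ensure_content(vcs: VCSBackend, path: Path, mode: str) -> bool:
    """Apply the missing-content policy; True means the file can be read."""
    if mode not in ANNEXED_MODES:
        raise ValueError(f"unknown annexed mode: {mode!r}")
    if vcs.has_content(path):
        return True
    if mode == "get":
        logger.info("fetching content: %s", path)
        vcs.get_content(path)
        return True
    if mode == "error":
        raise ContentNotAvailableError(f"content not available: {path}")
    if mode == "skip-warning":
        logger.warning("skipping (no content): %s", path)
    return False  # "skip-warning" and "skip" both skip the file


def update_json(vcs: VCSBackend, path: Path, updates: dict, mode: str) -> bool:
    """Hypothetical helper: get -> unlock -> read -> modify -> write -> add."""
    if not ensure_content(vcs, path, mode):
        return False
    vcs.unlock(path)  # always done, independent of the --annexed mode
    data = json.loads(path.read_text())
    data.update(updates)
    path.write_text(json.dumps(data, indent=2) + "\n")
    vcs.add(path)  # re-register the modified file with the VCS
    return True
```

Keeping the policy decision in `ensure_content` and the unlock/add steps outside it mirrors the split described above: the mode only governs missing content, while unlock/add are a correctness requirement for every write.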
…2-T098)

BUG FIX: Replace is_file() with not is_dir() for file iteration in session.py, subject.py, run.py, split.py, merge.py, _sidecars.py, migrate.py. Path.is_file() follows symlinks and returns False for annexed files without content, silently skipping them during rename.

ENHANCEMENT: --dry-run now accepts optional value: --dry-run (overview, default) shows summary; --dry-run=detailed lists every file operation. Session rename now enumerates per-file changes before the dry_run check so both modes have full information. Overview mode filters out indented detail lines; detailed mode shows "action: source → target" per file.

ENHANCEMENT: Annex operations logged via Python logging: INFO for content fetches (--annexed=get), DEBUG for unlock/add. CLI verbosity (-v, -q) wired to logging levels.

Add tmp_annex_dataset fixture (git-annex repo with locked symlinks). 6 regression tests verify all files (including symlinks) renamed correctly. 8 tests for --dry-run modes and annex log messages. 1297 tests pass across py310-py314, lint, type, duplication.

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: c963bb384041
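The `is_file()` pitfall is easy to reproduce without git-annex: a dangling symlink stands in for an annexed file whose content is not present. The sketch below is illustrative only; `iter_files` and `demo` are hypothetical names, the symlink target is made up, and creating symlinks is assumed to be permitted on the platform.

```python
import os
import tempfile
from pathlib import Path


def iter_files(root: Path):
    """Yield every non-directory entry under root, dangling symlinks included."""
    for path in sorted(root.iterdir()):
        if path.is_dir():
            yield from iter_files(path)
        else:
            # `not is_dir()` keeps symlinks whose target is missing,
            # which `is_file()` would silently drop.
            yield path


def demo():
    """Compare the two filters on a directory with one dangling symlink."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "sub-01_T1w.json").write_text("{}")
        # Stand-in for a locked annexed file without local content.
        os.symlink(".git/annex/objects/missing-key", root / "sub-01_T1w.nii.gz")
        by_is_file = [p.name for p in sorted(root.iterdir()) if p.is_file()]
        by_not_dir = [p.name for p in iter_files(root)]
        return by_is_file, by_not_dir
```

Here `demo()` returns the two listings side by side: the `is_file()` listing contains only the JSON sidecar, while the `not is_dir()` listing also contains the symlinked `.nii.gz`.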
FWIW -- we have an initial basic implementation. Buggy, but it already does many things ;) I will merge since the PR is no longer reviewable.

Submitting as a PR so it is easier to adopt -- use suggestions to add new commands/features or clarifications!
I would like to invite @effigies @tsalo @Lestropie @CodyCBakerPhD and all other @bids-standard/maintainers and @bids-standard/steering and @bids-standard/bep_leads so we can distill the desired list of features for such a tool, which I hope we will develop and maintain in the long run.
TODOs, as this will potentially just grow into a design and initial implementation ;)