Initial design + implementation done via spec-kit + claude code by yarikoptic · Pull Request #4 · bids-standard/bids-utils

yarikoptic · 2026-03-16T13:32:07Z

Taken from Should this repo get 2nd life as bids-cmdline or smth like that? #2
Submitting as a PR so it is easier to adopt -- use suggestions to add new commands/features or clarifications!

I would like to invite @effigies @tsalo @Lestropie @CodyCBakerPhD and all other @bids-standard/maintainers and @bids-standard/steering and @bids-standard/bep_leads so we can distill the best list of desired features for such a tool, which I hope we will develop and maintain in the long run.

TODOs, as this will potentially just grow into a design and initial implementation ;)

- I moved spec-kit templates and entire.io config into main branch so here we have only the design itself
- initial design doc discussed
- spec-kit driven initial development
  - constitution, seeking feedback
  - specification
  - plan
  - todos
  - tasks
  - actual implementation

Lestropie · 2026-03-18T23:56:54Z

Some notes on e27e009, following on from the comment thread, on how potentially erroneous inconsistency in metadata might be detected:

The algorithmic logic by which this should be done would be best designed around the same logic as the handling of the Inheritance Principle. E.g., just because "task name" has a few different values across a dataset doesn't mean there's acquisition inconsistency if there are actually multiple _task- entities. The existing logic within IP-Freely already treats the absence of a metadata field associated with a data file as effectively a unique value for that metadata field. So any determination of commonality of metadata across data files, whether for IP exploitation or for detecting inconsistency potentially arising from erroneous acquisition, needs to treat metadata absence as information that must be preserved.
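The point about metadata absence can be illustrated with a minimal sketch. This is not the IP-Freely implementation: the `MISSING` sentinel, the helper names, and the sample file names are all hypothetical, and values are assumed to be hashable scalars. For each field it collects the set of values seen across data files, counting absence as a value in its own right.

```python
from typing import Any

# Hypothetical sentinel standing in for "field absent for this data file".
MISSING = object()


def distinct_values(metadata_by_file: dict[str, dict[str, Any]]) -> dict[str, set]:
    """Map each metadata field to the set of values seen across files.

    A field that is absent for a file is recorded as MISSING, so absence
    counts as a value rather than being silently ignored.
    Assumes hashable (scalar) metadata values.
    """
    fields = {key for meta in metadata_by_file.values() for key in meta}
    return {
        field: {meta.get(field, MISSING) for meta in metadata_by_file.values()}
        for field in fields
    }


def common_fields(metadata_by_file: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Return fields present with one identical value in every file."""
    result = {}
    for field, values in distinct_values(metadata_by_file).items():
        if len(values) == 1 and MISSING not in values:
            result[field] = next(iter(values))
    return result


files = {
    "sub-01_task-rest_bold.nii.gz": {"RepetitionTime": 2.0, "TaskName": "rest"},
    "sub-02_task-rest_bold.nii.gz": {"RepetitionTime": 2.0, "TaskName": "rest"},
    "sub-03_task-rest_bold.nii.gz": {"TaskName": "rest"},  # RepetitionTime absent
}
print(common_fields(files))  # {'TaskName': 'rest'}
```

Here `RepetitionTime` is not reported as common even though every file that defines it agrees on 2.0, because its absence for sub-03 is itself a distinct value.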

yarikoptic · 2026-03-21T23:50:05Z

OK, to make progress, I initiated the "Constitution" with/for spec-kit right here. In that process, to capture the provenance of my prompts and also get better commit messages from claude, I used our WiP

NF: make run produce a merge commit when command created commits while running datalad/datalad#7821

and it worked great: now I have a 'RUNCMD' datalad commit, recording the command with the prompt, as a merge commit that joins the commit I ran it from with the changes/commits made by `claude`

yarikoptic · 2026-04-02T13:29:05Z

+
+- **Package management**: `uv` with `pyproject.toml` as single source of truth.
+- **Testing**: `pytest` orchestrated by `tox` (with `tox-uv`).
+- **Linting**: `ruff` for formatting and linting.


I will add a DRY principle with code duplication detection

just-meng

i'd be thrilled if the rename-tool can flow into this command line tool :D

Co-authored-by: Robert Smith <robert.smith@florey.edu.au>

Set up .specify/ directory structure and establish the bids-utils constitution (v1.1.0) defining 9 core principles: Do No Harm, Schema-Driven and Version-Flexible, Library-First, CLI Excellence, Test-First, Performance at Scale, VCS Awareness, Observability, and Simplicity. Includes constitution update checklist for maintaining template consistency.

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>

…idator details

- PyBIDS: consult interfaces but require very significant benefit to adopt; bids2table is a lighter alternative to evaluate first
- bids-validator: primary is Deno-based (bids-validator-deno on PyPI); WiP Python validator may be adopted later

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>

…onstitution

- Add principle X (Versioning & Breaking Changes): SemVer, migration guides, deprecation warnings
- Add principle XI (DRY): duplication detection via pylint duplicate-code (AST-aware, pragma support) and jscpd (token-based, CI threshold gating), both as tox testenvs
- Add Releases section: mandate automated releases via intuit/auto or comparable tooling with PR-label-driven versioning and changelog generation
- List duplication detection in Tooling section

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>

=== Do not change lines below === { "chain": [], "cmd": "yolo -- '/speckit.constitution something went not right before but we did get constitution in .specify/memory/constitution.md and checklist. Verify that constitution as it should be so we could go to the next step'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

=== Do not change lines below === { "chain": [], "cmd": "yolo '/speckit.specify Build a Python application/library following what is described in docs/design/00-initial-design.md file'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: c472dec6f48b

=== Do not change lines below === { "chain": [], "cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better. Also please add a user story for migrate within 1.x series. Review bids-specification deprecations through the versions (specification is under /home/yoh/proj/bids/bids-specification-master and git grep there for DEPRECATED). So add a prominent user story that we want to add migrations within 1.x (default version to migrate to is a current released) and handling deprecations (e.g. renaming metadata variables and/or migrating values)'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: ba11cad252d1

=== Do not change lines below === { "chain": [], "cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better.'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: cd9954849052

=== Do not change lines below === { "chain": [], "cmd": "yolo '/speckit.plan proceed while also doing research on prototypes and related projects listed in docs/design/00-initial-design.md file'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 42c05a1a8f38

=== Do not change lines below === { "chain": [], "cmd": "yolo /speckit.tasks", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 538c9cb6369d

Full implementation of all 10 user stories from the spec:

- Core infrastructure: BIDSPath parsing, BIDSDataset discovery, schema wrapper (bidsschematools), VCS backends (git/git-annex/datalad), sidecar discovery, _scans.tsv and participants.tsv management
- rename: file + sidecar rename with scans update and conflict detection
- migrate: schema-driven 1.x deprecation fixes (field renames, enum renames, path→BIDS URI, DOI format, ScanDate→scans.tsv)
- subject-rename / remove: full dataset-wide subject operations
- session-rename: including move-into-session
- metadata aggregate/segregate/audit: inheritance-aware manipulation
- remove-run: with --shift/--no-shift reindexing
- merge: combine datasets with --into-sessions and conflict handling
- split: extract subsets by suffix or datatype

Also includes:

- Project scaffolding (pyproject.toml, tox.ini, CI, mkdocs, .autorc)
- bids-examples git submodule for integration testing
- 128 tests covering all modules and CLI commands

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1abc11550506
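To make the "BIDSPath parsing" and "rename" items more concrete, here is a rough sketch of splitting a BIDS file name into entities, suffix, and extension and then rebuilding it with one entity changed. `parse_bids_name` and `build_bids_name` are hypothetical helpers, not the bids-utils `BIDSPath` API, and the sketch does not consult the schema or enforce entity order.

```python
import re

# Hypothetical helpers for illustration; not the bids-utils BIDSPath API.
_ENTITY = re.compile(r"^(?P<key>[a-zA-Z0-9]+)-(?P<value>[a-zA-Z0-9]+)$")


def parse_bids_name(filename: str) -> tuple[dict[str, str], str, str]:
    """Split a BIDS file name into (entities, suffix, extension)."""
    # Everything from the first dot onward is treated as the extension,
    # so ".nii.gz" stays in one piece.
    stem, dot, ext = filename.partition(".")
    *entity_parts, suffix = stem.split("_")
    entities: dict[str, str] = {}
    for part in entity_parts:
        match = _ENTITY.match(part)
        if match is None:
            raise ValueError(f"not a key-value entity: {part!r}")
        entities[match["key"]] = match["value"]
    return entities, suffix, dot + ext


def build_bids_name(entities: dict[str, str], suffix: str, extension: str) -> str:
    """Rebuild a file name, keeping the entities in their given order."""
    parts = [f"{key}-{value}" for key, value in entities.items()]
    return "_".join([*parts, suffix]) + extension


entities, suffix, extension = parse_bids_name(
    "sub-01_ses-02_task-rest_run-1_bold.nii.gz"
)
entities["run"] = "2"  # change one entity, keep everything else
renamed = build_bids_name(entities, suffix, extension)
print(renamed)  # sub-01_ses-02_task-rest_run-2_bold.nii.gz
```

A real rename would additionally have to move the sidecars and update `_scans.tsv`, as the commit message notes.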

- Fix all ruff lint errors (line length, unused imports/vars, import sorting, set comprehension, zip strict)
- Fix mypy errors (variable shadowing in except/for blocks)
- Eliminate duplicate code by extracting shared helpers:
  - _tsv.py: common TSV read/write used by _scans.py and _participants.py
  - cli/_common.py: output_result() and load_dataset() used by all CLI cmds
  - _types.py: rename_change(), normalize_subject_id(), require_subject_dir()
- Remove TCH lint rules (false positives for runtime Path usage)
- Add CLAUDE.md with mandatory tox-before-commit rule
- Add tox gate rule to .specify/memory/constitution.md

136 tests pass, ruff/mypy/pylint all clean.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1e95ab52403e

The click subcommand modules were never imported in cli/__init__.py, so `bids-utils --help` showed no commands. Add imports after the `main` group definition and add a test that asserts all 9 expected commands are present in --help output.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a97c91242c0b

Entire-Checkpoint: 2ac300b095a4

Remediations for the top issues found by /speckit.analyze:

- C2: Create tests/integration/test_bids_examples.py with sweep tests for rename (dry-run), subject-rename (dry-run), migrate (dry-run), and one mutating rename test. 276 integration tests across all bids-examples datasets.
- Fix migrate crash on datasets with BIDSVersion "n/a" by handling InvalidVersion gracefully in _get_rules().
- H1: Add interactive confirmation prompt to `bids-utils remove` when --force is not passed (constitution Principle I).
- H3: Remove "deduplicate" ghost reference from plan.md Phase 6 (not in spec, tasks, or implementation).
- M7: Unmark 19 tasks in tasks.md that were incorrectly marked [X] (bids-examples sweeps, 2.0 migration, multi-schema testing, performance profiling, suffix deprecation handler).

416 tests pass (140 unit + 276 integration), all linters clean.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: be9fe515efd3
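The BIDSVersion "n/a" fix can be sketched as follows. The commit mentions `InvalidVersion`, which suggests the `packaging` library; to stay stdlib-only, this sketch substitutes a regex-based parser. `parse_bids_version` and `describe` are hypothetical names, and treating an unparseable version as "unknown" rather than an error is an assumption about what "gracefully" means here.

```python
import re
from typing import Optional

_VERSION = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_bids_version(raw: object) -> Optional[tuple[int, ...]]:
    """Return (major, minor, patch), or None if raw is not a usable version.

    Values such as "n/a", an empty string, or a missing field yield None
    instead of raising, so callers can decide how to proceed.
    """
    if not isinstance(raw, str):
        return None
    match = _VERSION.match(raw.strip())
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def describe(dataset_description: dict) -> str:
    """Summarize the declared version without crashing on odd values."""
    version = parse_bids_version(dataset_description.get("BIDSVersion"))
    if version is None:
        return "unknown BIDS version; version-dependent rules skipped"
    return "BIDS " + ".".join(str(part) for part in version)


print(describe({"BIDSVersion": "1.8.0"}))  # BIDS 1.8.0
print(describe({"BIDSVersion": "n/a"}))
print(describe({}))
```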

- Search for EEG (.edf, .vhdr, .set, .bdf), MEG (.fif), fNIRS (.snirf), microscopy (.ome.tif, .ome.zarr), TSV and JSON files (not just .nii.gz), so more bids-examples datasets are tested
- Use `reason=` kwarg in all pytest.skip() calls for clear output
- Results: 304 pass (was 276), 18 skip (was 46, now only atlas-* datasets which use tpl-* not sub-*)

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: d905ac840283

So the skip output shows *why* a dataset cannot be loaded (e.g. "Missing BIDSVersion" vs "No dataset_description.json") rather than a generic "cannot load dataset".

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 51478c10f63e

=== Do not change lines below === { "chain": [], "cmd": "yolo '/speckit.clarify there should be easy ways to enable shell completion for bids-utils commands. https://click.palletsprojects.com/en/stable/shell-completion/ since click is used could be the relevant documentation'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 69e313ea2bc1

=== Do not change lines below === { "chain": [], "cmd": "yolo /speckit.tasks", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 0ecffc352f3b

=== Do not change lines below === { "chain": [], "cmd": "yolo -p '/speckit.implement T034 T035 — complete 1.x migration'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 76a8b5466a9d

=== Do not change lines below === { "chain": [], "cmd": "yolo -p '/speckit.implement T054 T061 T068 T072 — sweep tests'", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 8a2d1c723bd9

=== Do not change lines below === { "chain": [], "cmd": "yolo -p /speckit.implement", "exit": 0, "extra_inputs": [], "inputs": [], "outputs": [], "pwd": "." } ^^^ Do not change lines above ^^^ Entire-Checkpoint: 0816729e7f79

Cross-artifact analysis identified 1 CRITICAL, 2 HIGH, 9 MEDIUM, and 4 LOW findings. This commit applies concrete remediations:

- Add Phase 11 (tasks T083-T085) for shell completion (FR-019/020/021) which had zero coverage in tasks.md
- Expand US10 (Split) from 1 to 4 acceptance scenarios
- Resolve all 11 unanswered edge case questions with dispositions
- Add remove_run() and split_dataset() to library API contract
- Add MigrationResult dataclass to data-model.md
- Make SC-003 measurable (5-second benchmark target)
- Mark Phase 4 (BIDS 2.0) as PROVISIONAL pending schema stabilization
- Update plan.md to reflect actual code structure (_tsv.py, test_cli_common.py)
- Annotate FR-016 as specific application of FR-009

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: ac75835aa773

Add `bids-utils completion [bash|zsh|fish]` command (FR-019) that outputs shell-specific activation scripts, with auto-detection from $SHELL.

Add BIDS-aware custom completions (FR-020, FR-021): SubjectCompletion, SessionCompletion, EntityKeyCompletion, and BIDSFileCompletion types wired into rename, subject-rename, session-rename, and remove commands. Dataset root resolved by walking up from CWD to dataset_description.json.

26 tests in tests/test_completion.py cover shell detection, activation script output, subject/session/entity/file completions, and dataset root discovery.

Co-Authored-By: Claude Code 2.1.94 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 0604dcdcb54a
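Two mechanisms named in this commit message, shell auto-detection from `$SHELL` and dataset root discovery by walking up to `dataset_description.json`, could look roughly like the following. `detect_shell` and `find_dataset_root` are hypothetical names for illustration and are not taken from the bids-utils code.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional

SUPPORTED_SHELLS = ("bash", "zsh", "fish")


def detect_shell(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Guess the shell from $SHELL; None if it is unset or unsupported."""
    env = os.environ if environ is None else environ
    name = Path(env.get("SHELL", "")).name
    return name if name in SUPPORTED_SHELLS else None


def find_dataset_root(start: Path) -> Optional[Path]:
    """Walk up from start to the first directory holding dataset_description.json."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "dataset_description.json").is_file():
            return candidate
    return None


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "dataset_description.json").write_text("{}")
    nested = root / "sub-01" / "func"
    nested.mkdir(parents=True)
    found_root = find_dataset_root(nested)
    found_matches = found_root == root

print(detect_shell({"SHELL": "/usr/bin/zsh"}))  # zsh
```

A completion callback for subjects would then list `sub-*` directories under the discovered root.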

Add Phase 1b (T086-T091) for content-aware I/O on git-annex/DataLad datasets. Group-level --annexed option (error/get/skip-warning/skip) controls missing content policy; BIDS_UTILS_ANNEXED env var for persistent preference.

VCS protocol extended with four new methods: has_content/get_content for reads, unlock/add for writes. The _io.py layer provides ensure_content (gated by --annexed mode), ensure_writable and mark_modified (always active for annex backends: correctness, not policy). Full lifecycle: get → unlock → read → modify → write → add.

Updates: spec (FR-022, edge cases, clarifications, assumptions), library-api contract (VCSBackend protocol, _io API, BIDSDataset.annexed_mode), plan (Phase 1b steps), tasks (T086-T091 + dependency graph).

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 902a7cfc1d2a

Add --annexed group-level option (error/get/skip-warning/skip) with BIDS_UTILS_ANNEXED env var for git-annex/DataLad datasets where file content may not be locally available.

VCS protocol extended with has_content/get_content (reads) and unlock/add (writes). Content-aware I/O layer (_io.py) provides ensure_content, ensure_writable, mark_modified, read_json, write_json. Full lifecycle for annexed files: get → unlock → read → modify → write → add. The --annexed mode controls missing-content policy; unlock/add are always active for annex backends (correctness).

Wired through all file readers/writers: _tsv, _scans, _participants, session, subject, rename, metadata (~9 sites), migrate (~17 sites). load_dataset() applies annexed_mode from CLI context automatically.

38 new tests (test_io.py, test_vcs.py, test_cli_common.py additions). All 1288 tests pass across py310-py314, lint, type, duplication.

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 439757156f50
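The get → unlock → read → modify → write → add lifecycle can be sketched with a toy backend that only records calls. The method names `has_content`, `get_content`, `unlock`, `add`, and `ensure_content` come from the commit message; their signatures, the `RecordingBackend` class, and the `update_json` helper are assumptions made for illustration, not the actual `_io.py` code.

```python
import json
import tempfile
from pathlib import Path
from typing import Protocol


class VCSBackend(Protocol):
    """Method names are from the commit message; the signatures are assumed."""

    def has_content(self, path: Path) -> bool: ...
    def get_content(self, path: Path) -> None: ...
    def unlock(self, path: Path) -> None: ...
    def add(self, path: Path) -> None: ...


class RecordingBackend:
    """Toy stand-in for an annex backend: records calls, touches no repository."""

    def __init__(self, present: bool) -> None:
        self.present = present
        self.calls: list[str] = []

    def has_content(self, path: Path) -> bool:
        return self.present

    def get_content(self, path: Path) -> None:
        self.calls.append("get_content")
        self.present = True

    def unlock(self, path: Path) -> None:
        self.calls.append("unlock")

    def add(self, path: Path) -> None:
        self.calls.append("add")


def ensure_content(vcs: VCSBackend, path: Path, annexed_mode: str) -> bool:
    """Apply the missing-content policy; return True if the file can be read."""
    if vcs.has_content(path):
        return True
    if annexed_mode == "get":
        vcs.get_content(path)
        return True
    if annexed_mode == "error":
        raise FileNotFoundError(f"annexed content not available: {path}")
    return False  # "skip" and "skip-warning": the caller skips this file


def update_json(vcs: VCSBackend, path: Path, annexed_mode: str, **changes) -> bool:
    """get -> unlock -> read -> modify -> write -> add, for one JSON file."""
    if not ensure_content(vcs, path, annexed_mode):
        return False
    vcs.unlock(path)  # always done for annex backends, regardless of mode
    data = json.loads(path.read_text())
    data.update(changes)
    path.write_text(json.dumps(data, indent=2) + "\n")
    vcs.add(path)
    return True


with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "sub-01_task-rest_bold.json"
    sidecar.write_text('{"RepetitionTime": 2.0}')

    fetching = RecordingBackend(present=False)
    updated = update_json(fetching, sidecar, "get", TaskName="rest")
    result = json.loads(sidecar.read_text())

    skipping = RecordingBackend(present=False)
    skipped = update_json(skipping, sidecar, "skip", TaskName="other")

print(fetching.calls)  # ['get_content', 'unlock', 'add']
```

Keeping the policy decision in `ensure_content` and the unlock/add steps unconditional mirrors the split the commit describes between missing-content policy and correctness.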

…2-T098)

BUG FIX: Replace is_file() with not is_dir() for file iteration in session.py, subject.py, run.py, split.py, merge.py, _sidecars.py, migrate.py. Path.is_file() follows symlinks and returns False for annexed files without content, silently skipping them during rename.

ENHANCEMENT: --dry-run now accepts optional value: --dry-run (overview, default) shows summary; --dry-run=detailed lists every file operation. Session rename now enumerates per-file changes before the dry_run check so both modes have full information. Overview mode filters out indented detail lines; detailed mode shows "action: source → target" per file.

ENHANCEMENT: Annex operations logged via Python logging: INFO for content fetches (--annexed=get), DEBUG for unlock/add. CLI verbosity (-v, -q) wired to logging levels.

Add tmp_annex_dataset fixture (git-annex repo with locked symlinks). 6 regression tests verify all files (including symlinks) renamed correctly. 8 tests for --dry-run modes and annex log messages. 1297 tests pass across py310-py314, lint, type, duplication.

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: c963bb384041
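The is_file() bug is easy to reproduce without git-annex. The sketch below uses a dangling symlink as a stand-in for a locked annexed file whose content is not present; the file names and the symlink target are made up for the demonstration, and it assumes a platform where creating symlinks is permitted.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # A regular sidecar file with content.
    (root / "sub-01_T1w.json").write_text("{}")

    # A dangling symlink, similar to a locked annexed file without content.
    annexed = root / "sub-01_T1w.nii.gz"
    annexed.symlink_to(root / ".git" / "annex" / "objects" / "missing-key")

    # A real subdirectory, which file iteration should not pick up.
    (root / "anat").mkdir()

    # is_file() follows the symlink, finds nothing, and returns False.
    by_is_file = sorted(p.name for p in root.iterdir() if p.is_file())

    # is_dir() is also False for the dangling link, so "not is_dir()" keeps it.
    by_not_is_dir = sorted(p.name for p in root.iterdir() if not p.is_dir())

print(by_is_file)      # ['sub-01_T1w.json']
print(by_not_is_dir)   # ['sub-01_T1w.json', 'sub-01_T1w.nii.gz']
```

With the first filter, a rename would silently leave the image file behind while moving its sidecar.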

yarikoptic · 2026-04-10T19:29:28Z

FWIW -- we have initial basic implementation. Buggy but doing many things already ;) I will merge since PR is no longer reviewable.

yarikoptic mentioned this pull request Mar 16, 2026

Should this repo get 2nd life as bids-cmdline or smth like that? #2

Closed

CodyCBakerPhD reviewed Mar 16, 2026

Comment thread docs/design/00-initial-design.md Outdated

yarikoptic commented Mar 16, 2026

Comment thread docs/design/00-initial-design.md Outdated

Lestropie reviewed Mar 16, 2026

Comment thread docs/design/00-initial-design.md

Lestropie reviewed Mar 16, 2026

Comment thread docs/design/00-initial-design.md Outdated

Lestropie reviewed Mar 18, 2026

Comment thread docs/design/00-initial-design.md Outdated

yarikoptic commented Mar 18, 2026

Comment thread docs/design/00-initial-design.md

yarikoptic changed the title ~~Initial notes taken and extended from #2~~ Initial design notes Mar 18, 2026

This was referenced Mar 19, 2026

ENH: Make ./run_tests.sh to optionally store bids-validator outputs under derivatives/bids-validator bids-standard/bids-examples#547

Draft

What if all validation records were in .jsonl files?! bids-standard/bids-examples#548

Closed

yarikoptic mentioned this pull request Mar 22, 2026

NF: make run produce a merge commit when command created commits while running datalad/datalad#7821

Merged

10 tasks

yarikoptic commented Apr 2, 2026

Comment thread docs/design/00-initial-design.md

yarikoptic commented Apr 2, 2026

Comment thread .specify/memory/constitution.md Outdated

just-meng reviewed Apr 2, 2026

Comment thread docs/design/00-initial-design.md

Comment thread docs/design/00-initial-design.md

yarikoptic force-pushed the 00-initial-design branch from 1881dd1 to 20dca93 Compare April 3, 2026 22:19

yarikoptic and others added 12 commits April 3, 2026 18:23

Initial notes taken and extended from #2

23441de

Clarifications and "deduplicate" as the mode

e6a414c

note that we need to adjust participants.tsv

924dbbe

Co-authored-by: Robert Smith <robert.smith@florey.edu.au>

add "audit" idea

8ce518b

Co-authored-by: Robert Smith <robert.smith@florey.edu.au>

clarification on potential trickiness in 'aggregate'

edfbef9

mention rename-tool

ee60e74

Apply suggestions from code review

888a641

Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>

yarikoptic added 5 commits April 3, 2026 18:23

Reviewed/adjusted a few aspects of user stories

526163c

some notes on the user stories

4963045

pointer to file-mapper @Lestropie mentioned

a9333c9

yarikoptic force-pushed the 00-initial-design branch from 20dca93 to a9333c9 Compare April 3, 2026 22:23

yarikoptic and others added 19 commits April 3, 2026 20:03

[DATALAD RUNCMD] yolo /speckit.tasks

95288bd

gitignore uv.lock and .duct for now

f8cb41a

Entire-Checkpoint: 2ac300b095a4

[DATALAD RUNCMD] yolo /speckit.tasks

0daa94c

[DATALAD RUNCMD] yolo -p /speckit.implement

12e1ab3

yarikoptic changed the title ~~Initial design notes~~ Initial design + implementation done via spec-kit + claude code Apr 10, 2026

yarikoptic merged commit 94874bc into main Apr 10, 2026
4 checks passed

yarikoptic merged 36 commits into main from 00-initial-design

4 participants
