
Initial design + implementation done via spec-kit + claude code #4

Merged
yarikoptic merged 36 commits into main from 00-initial-design
Apr 10, 2026


@yarikoptic
Contributor

@yarikoptic yarikoptic commented Mar 16, 2026

I would like to invite @effigies @tsalo @Lestropie @CodyCBakerPhD and all other @bids-standard/maintainers, @bids-standard/steering, and @bids-standard/bep_leads so that we can distill the list of desired features for this tool, which I hope we will develop and maintain over the long run.

TODOs, since this will potentially grow into a design and initial implementation ;)

  • I moved spec-kit templates and entire.io config into main branch so here we have only the design itself
  • initial design doc discussed
  • spec-kit driven initial development
    • constitution, seeking feedback
    • specification
    • plan
    • todos
    • tasks
    • actual implementation

@yarikoptic yarikoptic changed the title from "Initial notes taken and extended from #2" to "Initial design notes" on Mar 18, 2026
@Lestropie
Contributor

Some notes on e27e009, following on from the comment thread, on how potentially erroneous metadata inconsistencies might be detected:

The algorithm for this would be best designed around the same logic as the handling of the Inheritance Principle. E.g., just because "task name" has a few different values across a dataset doesn't mean there is acquisition inconsistency if there are actually multiple _task- entities. The existing logic in IP-Freely already treats the absence of a metadata field associated with a data file as effectively a unique value for that field. So any determination of commonality of metadata across data files, whether for exploiting the IP or for detecting inconsistency potentially arising from erroneous acquisition, needs to treat metadata absence as information that must be preserved.
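
A minimal sketch of the point above, not taken from IP-Freely: the function names, the `ABSENT` sentinel, and the `"<absent>"` group key are all hypothetical. It groups data files by the value of one metadata field and treats a missing field as a value of its own, so a field only counts as common when every file defines it identically.

```python
from collections import defaultdict

# Hypothetical sentinel for "this field is absent from the file's metadata".
ABSENT = object()


def group_by_value(metadata_per_file, field):
    """Group data files by their value for `field`, counting absence as a value."""
    groups = defaultdict(list)
    for path, metadata in metadata_per_file.items():
        value = metadata.get(field, ABSENT)
        # JSON values may be unhashable (lists, dicts), so key on their repr.
        key = "<absent>" if value is ABSENT else repr(value)
        groups[key].append(path)
    return dict(groups)


def is_common(metadata_per_file, field):
    """A field is common only if every file defines it with the same value."""
    groups = group_by_value(metadata_per_file, field)
    return len(groups) == 1 and "<absent>" not in groups
```

With three files where only two define `RepetitionTime`, `group_by_value` reports two groups (one of them `"<absent>"`), so the field is neither hoisted to a shared sidecar nor silently reported as consistent.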

@yarikoptic
Contributor Author

OK, to make progress, I initiated the "Constitution" with/for spec-kit right here. In the process, to capture both the provenance of my prompts and better commit messages from claude, I used our WiP

and it worked great: now I have a 'RUNCMD' datalad commit, recording the command with the prompt, as a merge commit that brings together the commit I ran it from and the changes/commits made by `claude`

Comment thread docs/design/00-initial-design.md

- **Package management**: `uv` with `pyproject.toml` as single source of truth.
- **Testing**: `pytest` orchestrated by `tox` (with `tox-uv`).
- **Linting**: `ruff` for formatting and linting.
Contributor Author

I will add code duplication detection under the DRY principle.

Comment thread .specify/memory/constitution.md Outdated

@just-meng just-meng left a comment

i'd be thrilled if the rename-tool can flow into this command line tool :D

yarikoptic and others added 12 commits April 3, 2026 18:23
Co-authored-by: Robert Smith <robert.smith@florey.edu.au>
Co-authored-by: Robert Smith <robert.smith@florey.edu.au>
Set up .specify/ directory structure and establish the bids-utils
constitution (v1.1.0) defining 9 core principles: Do No Harm,
Schema-Driven and Version-Flexible, Library-First, CLI Excellence,
Test-First, Performance at Scale, VCS Awareness, Observability,
and Simplicity. Includes constitution update checklist for
maintaining template consistency.

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
…idator details

- PyBIDS: consult interfaces but require very significant benefit to adopt;
  bids2table is a lighter alternative to evaluate first
- bids-validator: primary is Deno-based (bids-validator-deno on PyPI);
  WiP Python validator may be adopted later

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
…onstitution

- Add principle X (Versioning & Breaking Changes): SemVer, migration guides,
  deprecation warnings
- Add principle XI (DRY): duplication detection via pylint duplicate-code
  (AST-aware, pragma support) and jscpd (token-based, CI threshold gating),
  both as tox testenvs
- Add Releases section: mandate automated releases via intuit/auto or
  comparable tooling with PR-label-driven versioning and changelog generation
- List duplication detection in Tooling section

Co-Authored-By: Claude Code 2.1.81 / Claude Opus 4.6 <noreply@anthropic.com>
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo -- '/speckit.constitution something went not right before but we did get constitution in .specify/memory/constitution.md and checklist. Verify that constitution as it should be so we could go to the next step'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^
Co-authored-by: Yaroslav Halchenko <debian@onerussian.com>
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo '/speckit.specify Build a Python application/library following what is described in docs/design/00-initial-design.md file'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: c472dec6f48b
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better. Also please add a user story for migrate within 1.x series.  Review bids-specification deprecations through the versions (specification is under /home/yoh/proj/bids/bids-specification-master and git grep there for DEPRECATED). So add a prominent user story that we want to add migrations within 1.x (default version to migrate to is a current released) and handling deprecations (e.g. renaming metadata variables and/or migrating values)'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: ba11cad252d1
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better.'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: cd9954849052
yarikoptic and others added 19 commits April 3, 2026 20:03
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo '/speckit.plan proceed while also doing research on prototypes and related projects listed in  docs/design/00-initial-design.md file'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 42c05a1a8f38
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo /speckit.tasks",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 538c9cb6369d
Full implementation of all 10 user stories from the spec:

- Core infrastructure: BIDSPath parsing, BIDSDataset discovery,
  schema wrapper (bidsschematools), VCS backends (git/git-annex/datalad),
  sidecar discovery, _scans.tsv and participants.tsv management
- rename: file + sidecar rename with scans update and conflict detection
- migrate: schema-driven 1.x deprecation fixes (field renames, enum
  renames, path→BIDS URI, DOI format, ScanDate→scans.tsv)
- subject-rename / remove: full dataset-wide subject operations
- session-rename: including move-into-session
- metadata aggregate/segregate/audit: inheritance-aware manipulation
- remove-run: with --shift/--no-shift reindexing
- merge: combine datasets with --into-sessions and conflict handling
- split: extract subsets by suffix or datatype

Also includes:
- Project scaffolding (pyproject.toml, tox.ini, CI, mkdocs, .autorc)
- bids-examples git submodule for integration testing
- 128 tests covering all modules and CLI commands

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1abc11550506
- Fix all ruff lint errors (line length, unused imports/vars, import
  sorting, set comprehension, zip strict)
- Fix mypy errors (variable shadowing in except/for blocks)
- Eliminate duplicate code by extracting shared helpers:
  - _tsv.py: common TSV read/write used by _scans.py and _participants.py
  - cli/_common.py: output_result() and load_dataset() used by all CLI cmds
  - _types.py: rename_change(), normalize_subject_id(), require_subject_dir()
- Remove TCH lint rules (false positives for runtime Path usage)
- Add CLAUDE.md with mandatory tox-before-commit rule
- Add tox gate rule to .specify/memory/constitution.md

136 tests pass, ruff/mypy/pylint all clean.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 1e95ab52403e
The click subcommand modules were never imported in cli/__init__.py,
so `bids-utils --help` showed no commands. Add imports after the
`main` group definition and add a test that asserts all 9 expected
commands are present in --help output.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: a97c91242c0b
Entire-Checkpoint: 2ac300b095a4
Remediations for the top issues found by /speckit.analyze:

- C2: Create tests/integration/test_bids_examples.py with sweep tests
  for rename (dry-run), subject-rename (dry-run), migrate (dry-run),
  and one mutating rename test. 276 integration tests across all
  bids-examples datasets.
- Fix migrate crash on datasets with BIDSVersion "n/a" by handling
  InvalidVersion gracefully in _get_rules().
- H1: Add interactive confirmation prompt to `bids-utils remove`
  when --force is not passed (constitution Principle I).
- H3: Remove "deduplicate" ghost reference from plan.md Phase 6
  (not in spec, tasks, or implementation).
- M7: Unmark 19 tasks in tasks.md that were incorrectly marked [X]
  (bids-examples sweeps, 2.0 migration, multi-schema testing,
  performance profiling, suffix deprecation handler).

416 tests pass (140 unit + 276 integration), all linters clean.

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: be9fe515efd3
- Search for EEG (.edf, .vhdr, .set, .bdf), MEG (.fif), fNIRS
  (.snirf), microscopy (.ome.tif, .ome.zarr), TSV and JSON files —
  not just .nii.gz — so more bids-examples datasets are tested
- Use `reason=` kwarg in all pytest.skip() calls for clear output
- Results: 304 pass (was 276), 18 skip (was 46, now only atlas-*
  datasets which use tpl-* not sub-*)

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: d905ac840283
So the skip output shows *why* a dataset cannot be loaded (e.g.
"Missing BIDSVersion" vs "No dataset_description.json") rather
than a generic "cannot load dataset".

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 51478c10f63e
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo '/speckit.clarify there should be easy ways to enable shell completion for bids-utils commands. https://click.palletsprojects.com/en/stable/shell-completion/ since click is used could be the relevant documentation'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 69e313ea2bc1
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo /speckit.tasks",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 0ecffc352f3b
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo -p '/speckit.implement T034 T035  — complete 1.x migration'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 76a8b5466a9d
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo -p '/speckit.implement T054 T061 T068 T072 — sweep tests'",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 8a2d1c723bd9
=== Do not change lines below ===
{
 "chain": [],
 "cmd": "yolo -p /speckit.implement",
 "exit": 0,
 "extra_inputs": [],
 "inputs": [],
 "outputs": [],
 "pwd": "."
}
^^^ Do not change lines above ^^^

Entire-Checkpoint: 0816729e7f79
Cross-artifact analysis identified 1 CRITICAL, 2 HIGH, 9 MEDIUM, and
4 LOW findings. This commit applies concrete remediations:

- Add Phase 11 (tasks T083-T085) for shell completion (FR-019/020/021)
  which had zero coverage in tasks.md
- Expand US10 (Split) from 1 to 4 acceptance scenarios
- Resolve all 11 unanswered edge case questions with dispositions
- Add remove_run() and split_dataset() to library API contract
- Add MigrationResult dataclass to data-model.md
- Make SC-003 measurable (5-second benchmark target)
- Mark Phase 4 (BIDS 2.0) as PROVISIONAL pending schema stabilization
- Update plan.md to reflect actual code structure (_tsv.py, test_cli_common.py)
- Annotate FR-016 as specific application of FR-009

Co-Authored-By: Claude Code 2.1.92 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: ac75835aa773
Add `bids-utils completion [bash|zsh|fish]` command (FR-019) that outputs
shell-specific activation scripts, with auto-detection from $SHELL.

Add BIDS-aware custom completions (FR-020, FR-021): SubjectCompletion,
SessionCompletion, EntityKeyCompletion, and BIDSFileCompletion types
wired into rename, subject-rename, session-rename, and remove commands.
Dataset root resolved by walking up from CWD to dataset_description.json.
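
The two lookups described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code from this commit: `find_dataset_root` and `detect_shell` are hypothetical names, and returning `None` for an unsupported shell is an assumed behavior.

```python
import os
from pathlib import Path

SUPPORTED_SHELLS = ("bash", "zsh", "fish")


def find_dataset_root(start):
    """Return the nearest directory at or above `start` holding dataset_description.json."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / "dataset_description.json").is_file():
            return candidate
    return None


def detect_shell(environ=None):
    """Guess the shell from $SHELL; None if it is unset or not supported."""
    environ = os.environ if environ is None else environ
    name = Path(environ.get("SHELL", "")).name
    return name if name in SUPPORTED_SHELLS else None
```

Passing the environment as an argument keeps `detect_shell` testable without touching the real `$SHELL`.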

26 tests in tests/test_completion.py cover shell detection, activation
script output, subject/session/entity/file completions, and dataset
root discovery.

Co-Authored-By: Claude Code 2.1.94 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 0604dcdcb54a
Add Phase 1b (T086-T091) for content-aware I/O on git-annex/DataLad
datasets. Group-level --annexed option (error/get/skip-warning/skip)
controls missing content policy; BIDS_UTILS_ANNEXED env var for
persistent preference.

VCS protocol extended with four new methods: has_content/get_content
for reads, unlock/add for writes. The _io.py layer provides
ensure_content (gated by --annexed mode), ensure_writable and
mark_modified (always active for annex backends — correctness, not
policy). Full lifecycle: get → unlock → read → modify → write → add.
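
As a rough illustration of the lifecycle named above: the four method names come from this commit message, but the signatures, the `RecordingBackend` stand-in, and the `update_json` helper are assumptions made for the sketch. Unlike the design described here, the sketch always fetches missing content instead of gating that step on the `--annexed` mode.

```python
import json
from pathlib import Path
from typing import Callable, Protocol


class VCSBackend(Protocol):
    # Method names are from the commit message; the signatures are assumed.
    def has_content(self, path: Path) -> bool: ...
    def get_content(self, path: Path) -> None: ...
    def unlock(self, path: Path) -> None: ...
    def add(self, path: Path) -> None: ...


class RecordingBackend:
    """Stand-in backend that only records which calls it receives."""

    def __init__(self, present: bool = False) -> None:
        self.present = present
        self.calls: list[str] = []

    def has_content(self, path: Path) -> bool:
        return self.present

    def get_content(self, path: Path) -> None:
        self.calls.append("get")
        self.present = True

    def unlock(self, path: Path) -> None:
        self.calls.append("unlock")

    def add(self, path: Path) -> None:
        self.calls.append("add")


def update_json(vcs: VCSBackend, path: Path, modify: Callable[[dict], dict]) -> None:
    """Hypothetical helper: get -> unlock -> read -> modify -> write -> add."""
    if not vcs.has_content(path):
        vcs.get_content(path)                    # get
    vcs.unlock(path)                             # unlock
    data = json.loads(path.read_text())          # read
    data = modify(data)                          # modify
    path.write_text(json.dumps(data, indent=2))  # write
    vcs.add(path)                                # add
```

Keeping unlock and add unconditional mirrors the "correctness, not policy" distinction: only the fetch step is a matter of user preference.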

Updates: spec (FR-022, edge cases, clarifications, assumptions),
library-api contract (VCSBackend protocol, _io API, BIDSDataset.annexed_mode),
plan (Phase 1b steps), tasks (T086-T091 + dependency graph).

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 902a7cfc1d2a
Add --annexed group-level option (error/get/skip-warning/skip) with
BIDS_UTILS_ANNEXED env var for git-annex/DataLad datasets where file
content may not be locally available.

VCS protocol extended with has_content/get_content (reads) and
unlock/add (writes). Content-aware I/O layer (_io.py) provides
ensure_content, ensure_writable, mark_modified, read_json, write_json.

Full lifecycle for annexed files: get → unlock → read → modify →
write → add. The --annexed mode controls missing-content policy;
unlock/add are always active for annex backends (correctness).
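
A possible shape for the missing-content policy, as a sketch only. The four mode values and the `BIDS_UTILS_ANNEXED` variable are from this commit message; the precedence (CLI value, then environment, then a default of `error`), the meaning attached to each mode, and the function signatures are assumptions inferred from the names.

```python
import os
import warnings
from enum import Enum


class AnnexedMode(Enum):
    ERROR = "error"
    GET = "get"
    SKIP_WARNING = "skip-warning"
    SKIP = "skip"


def resolve_mode(cli_value=None, environ=None):
    """CLI value wins, then BIDS_UTILS_ANNEXED, then an assumed default of 'error'."""
    environ = os.environ if environ is None else environ
    raw = cli_value or environ.get("BIDS_UTILS_ANNEXED") or "error"
    return AnnexedMode(raw)


def ensure_content(path, mode, has_content, get_content):
    """Return True if `path` can be read, False if it should be skipped."""
    if has_content(path):
        return True
    if mode is AnnexedMode.GET:
        get_content(path)
        return True
    if mode is AnnexedMode.ERROR:
        raise FileNotFoundError(f"content not available locally: {path}")
    if mode is AnnexedMode.SKIP_WARNING:
        warnings.warn(f"skipping {path}: content not available locally")
    return False
```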

Wired through all file readers/writers: _tsv, _scans, _participants,
session, subject, rename, metadata (~9 sites), migrate (~17 sites).
load_dataset() applies annexed_mode from CLI context automatically.

38 new tests (test_io.py, test_vcs.py, test_cli_common.py additions).
All 1288 tests pass across py310-py314, lint, type, duplication.

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 439757156f50
…2-T098)

BUG FIX: Replace is_file() with not is_dir() for file iteration in
session.py, subject.py, run.py, split.py, merge.py, _sidecars.py,
migrate.py. Path.is_file() follows symlinks and returns False for
annexed files without content, silently skipping them during rename.
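
The behavior behind this fix can be reproduced with a dangling symlink, which is how a locked annexed file without local content appears on disk. The sketch below is illustrative only; the file name and the fake target path are made up, and it assumes a platform where unprivileged symlink creation works.

```python
import os
import tempfile
from pathlib import Path


def demo_broken_symlink():
    """Create a dangling symlink and report how pathlib classifies it."""
    with tempfile.TemporaryDirectory() as tmp:
        link = Path(tmp) / "sub-01_T1w.nii.gz"
        # The target does not exist, as for annexed content never fetched.
        os.symlink(Path(tmp) / ".git" / "annex" / "objects" / "missing", link)
        return {
            "is_file": link.is_file(),        # follows the link, so False
            "is_symlink": link.is_symlink(),  # the link itself exists
            "not_is_dir": not link.is_dir(),  # True, so iteration keeps it
        }
```

A loop filtering on `is_file()` would drop this entry, while one filtering on `not is_dir()` keeps it.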

ENHANCEMENT: --dry-run now accepts optional value: --dry-run (overview,
default) shows summary; --dry-run=detailed lists every file operation.
Session rename now enumerates per-file changes before the dry_run check
so both modes have full information. Overview mode filters out indented
detail lines; detailed mode shows "action: source → target" per file.
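
One way the two modes could share a single list of output lines, sketched under the assumption that per-file detail lines are the indented ones; `render_dry_run` is a hypothetical name, not a function from this commit.

```python
def render_dry_run(lines, mode="overview"):
    """Return the lines to display for a given --dry-run mode."""
    if mode == "detailed":
        return list(lines)
    if mode == "overview":
        # Detail lines are the indented ones; keep only top-level summaries.
        return [line for line in lines if not line.startswith((" ", "\t"))]
    raise ValueError(f"unknown dry-run mode: {mode!r}")
```

Enumerating every change first and filtering at display time is what lets both modes work from the same information.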

ENHANCEMENT: Annex operations logged via Python logging — INFO for
content fetches (--annexed=get), DEBUG for unlock/add. CLI verbosity
(-v, -q) wired to logging levels.
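
A common way to wire repeated `-v`/`-q` flags to logging levels is shown below as a sketch. The `WARNING` baseline and the function name `verbosity_to_level` are assumptions; the commit message only says that verbosity maps to logging levels, with INFO used for content fetches and DEBUG for unlock/add.

```python
import logging


def verbosity_to_level(verbose=0, quiet=0):
    """Map counts of -v and -q flags to a logging level (assumed baseline: WARNING)."""
    level = logging.WARNING - 10 * (verbose - quiet)
    # Clamp so extra flags never go below DEBUG or above CRITICAL.
    return max(logging.DEBUG, min(logging.CRITICAL, level))
```

Under this mapping a single `-v` reveals the INFO-level fetch messages and `-vv` adds the DEBUG-level unlock/add messages.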

Add tmp_annex_dataset fixture (git-annex repo with locked symlinks).
6 regression tests verify all files (including symlinks) renamed
correctly. 8 tests for --dry-run modes and annex log messages.
1297 tests pass across py310-py314, lint, type, duplication.

Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: c963bb384041
@yarikoptic yarikoptic changed the title from "Initial design notes" to "Initial design + implementation done via spec-kit + claude code" on Apr 10, 2026
@yarikoptic
Contributor Author

FWIW -- we have an initial basic implementation. Buggy, but already doing many things ;) I will merge since the PR is no longer reviewable.

@yarikoptic yarikoptic merged commit 94874bc into main Apr 10, 2026
4 checks passed

4 participants