BUG FIX: Replace is_file() with not is_dir() for file iteration in
session.py, subject.py, run.py, split.py, merge.py, _sidecars.py,
migrate.py. Path.is_file() follows symlinks and returns False for
annexed files without content, silently skipping them during rename.
ENHANCEMENT: --dry-run now accepts optional value: --dry-run (overview,
default) shows summary; --dry-run=detailed lists every file operation.
Session rename now enumerates per-file changes before the dry_run check
so both modes have full information. Overview mode filters out indented
detail lines; detailed mode shows "action: source → target" per file.
ENHANCEMENT: Annex operations logged via Python logging — INFO for
content fetches (--annexed=get), DEBUG for unlock/add. CLI verbosity
(-v, -q) wired to logging levels.
Add tmp_annex_dataset fixture (git-annex repo with locked symlinks).
6 regression tests verify all files (including symlinks) renamed
correctly. 8 tests for --dry-run modes and annex log messages.
1297 tests pass across py310-py314, lint, type, duplication.
Co-Authored-By: Claude Code 2.1.98 / Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: c963bb384041
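
The two dry-run modes described in the commit message can be sketched as follows. This is only an illustration, not the project's `output_result()`: the `Change` dataclass, its fields, and `render_dry_run()` are hypothetical names. The sketch assumes that each summary entry is followed by indented per-file entries, which overview mode drops.

```python
from dataclasses import dataclass


@dataclass
class Change:
    """Hypothetical record of one planned operation (not the real bids-utils type)."""

    action: str
    source: str
    target: str
    detail: bool = False  # True for per-file entries, False for summary entries


def render_dry_run(changes: list[Change], mode: str = "overview") -> list[str]:
    """Return output lines; per-file entries are indented and shown only in detailed mode."""
    if mode not in ("overview", "detailed"):
        raise ValueError(f"unknown dry-run mode: {mode!r}")
    lines = []
    for change in changes:
        if change.detail and mode == "overview":
            continue  # overview filters out the indented detail lines
        indent = "  " if change.detail else ""
        lines.append(f"{indent}{change.action}: {change.source} → {change.target}")
    return lines


# Made-up example data: one summary entry plus one per-file entry.
changes = [
    Change("rename", "sub-01/ses-pre", "sub-01/ses-baseline"),
    Change(
        "rename",
        "sub-01/ses-pre/anat/sub-01_ses-pre_T1w.nii.gz",
        "sub-01/ses-baseline/anat/sub-01_ses-baseline_T1w.nii.gz",
        detail=True,
    ),
]
```

Calling `render_dry_run(changes)` yields only the summary line, while `render_dry_run(changes, "detailed")` also yields the indented per-file line.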
.specify/specs/00-initial-design.md
Lines changed: 13 additions & 3 deletions
@@ -203,7 +203,7 @@ A dataset needs to be split — for example, extracting only behavioral data or
 - What happens when a rename creates a filename that exceeds OS path length limits?
 → **Resolution**: Refuse with exit code 2 and a clear error. Covered by FR-011 (refuse invalid state). No extra task needed — implement as a guard in `rename_file()`.
 - How does the tool handle symlinked files (common with git-annex)?
-→ **Resolution**: `_vcs.py` GitAnnex backend handles this. Locked annexed files are symlinks; `git annex` commands operate on them correctly. Covered by T017-T018.
+→ **Resolution**: All file iteration code MUST treat symlinks as files (FR-023). `Path.is_file()` follows symlinks and returns `False` for annexed files without content — use `not path.is_dir()` instead. VCS operations (`git mv`, `git annex unlock/add`) handle symlinks correctly. Covered by T092.
 - What happens when `_scans.tsv` references files that don't exist on disk (dangling references)?
 → **Resolution**: Warn but do not fail. Dangling references are a pre-existing dataset issue, not caused by bids-utils. Log at `-v` verbosity.
 - How does the tool handle partial datasets (e.g., missing `dataset_description.json`)?
@@ -241,13 +241,20 @@ A dataset needs to be split — for example, extracting only behavioral data or
 - Q: What about writing to annexed files? → A: Annexed files in locked mode (symlinks to `.git/annex/objects`) are read-only. Before modification, `unlock(paths)` must be called (`git annex unlock` / `datalad unlock`). After modification, `add(paths)` must be called (`git annex add`) to re-annex the file. The I/O layer provides `ensure_writable()` (unlock) and `mark_modified()` (add) to bracket writes. The full lifecycle for a modify operation on an annexed file is: get → unlock → read → modify → write → add.
 - Q: Should `unlock`/`add` be implicit or require `--annexed=get`? → A: `unlock` and `add` apply whenever the VCS is git-annex/DataLad, regardless of `--annexed` mode. The `--annexed` mode only controls what happens when content is *missing*. If content is present but the file is locked, any write operation must unlock first — this is a VCS-level concern, not a policy choice.

+### Session 2026-04-10
+
+- Q: Should `--dry-run` show every file operation or just a summary? → A: Both. `--dry-run` (no value or `--dry-run=overview`) shows the current summary view (one line per subject/session). `--dry-run=detailed` lists every individual file rename, file edit, and `_scans.tsv` update. The detailed mode is what users need to verify correctness before committing. The overview mode remains the default for quick checks.
+- Q: How should annexed content operations be logged? → A: When `--annexed=get` fetches content, log each file fetched at normal verbosity. In `--dry-run` mode, report which files *would* need content fetched. At `-v`, also log `unlock` and `add` operations.
+- **BUG**: `session.py` and `subject.py` use `Path.is_file()` to filter files for renaming, but `is_file()` follows symlinks — returning `False` for annexed files without local content (broken symlinks into `.git/annex/objects`). This means **annexed data files (`.nii.gz`, etc.) are silently skipped during rename**. The fix: use `not path.is_dir()` or `path.is_file() or path.is_symlink()` everywhere that iterates over files for processing. This affects `session.py`, `subject.py`, `run.py`, `split.py`, `merge.py`, `_sidecars.py`, and `migrate.py`. All existing tests missed this because they use `tmp_path` fixtures with real files, never symlinks.
+- Q: Why didn't the `bids-examples` integration tests catch the symlink bug? → A: `bids-examples` datasets contain regular files, not annexed symlinks. Integration tests need a fixture that creates a git-annex repo with locked (symlinked) files to exercise this path. Add a `tmp_annex_dataset` fixture.
+
 ## Requirements *(mandatory)*

 ### Functional Requirements

 - **FR-001**: System MUST provide a Python library (`bids_utils`) with a clean, importable public API. Every CLI command maps to a library function.
 - **FR-002**: System MUST provide a CLI (`bids-utils`) as a thin wrapper over the library API.
-- **FR-003**: Every mutating command MUST support `--dry-run` / `-n` mode showing exactly what would change without modifying any files.
+- **FR-003**: Every mutating command MUST support `--dry-run` / `-n` mode showing exactly what would change without modifying any files. `--dry-run` (or `--dry-run=overview`) shows a summary view; `--dry-run=detailed` lists every individual file operation (rename, edit, content fetch). SC-002 applies to the detailed mode.
 - **FR-004**: System MUST detect and use VCS (git, git-annex, DataLad) when present — `git mv` instead of `os.rename`, etc. When no VCS is detected, operate directly on filesystem.
 - **FR-005**: System MUST update `_scans.tsv` entries whenever referenced files are renamed or removed.
 - **FR-006**: System MUST update `participants.tsv` when subjects are renamed or removed.
@@ -266,6 +273,8 @@ A dataset needs to be split — for example, extracting only behavioral data or
 - **FR-019**: System MUST provide a `bids-utils completion [SHELL]` subcommand that outputs shell completion activation scripts. When `SHELL` argument is omitted, auto-detect from the `$SHELL` environment variable. Supported shells: Bash, Zsh, Fish (matching Click 8.0+ built-in completion support). Output goes to stdout only (no `--install` flag).
 - **FR-020**: CLI MUST resolve the BIDS dataset root by: (1) using the `--dataset`/`-d` flag if provided, or (2) walking up the directory hierarchy from CWD until `dataset_description.json` is found. This resolution is used both by commands and by shell completion.
 - **FR-021**: Shell completion MUST provide BIDS-aware completions: filesystem-derived items (`sub-*` directories, `ses-*` directories, BIDS file paths) and entity keys from the `bidsschematools` schema (e.g., `task=`, `run=`, `acq=`). Entity value completion (e.g., `task=rest`) is deferred to a later release.
+- **FR-023**: All code that iterates over files MUST treat symlinks as files (not skip them). Use `not path.is_dir()` or `path.is_file() or path.is_symlink()` instead of bare `path.is_file()`. This is critical for git-annex datasets where data files are symlinks to `.git/annex/objects`.
+- **FR-024**: Annexed content operations (get, unlock, add) MUST be logged. At normal verbosity, log each file fetched by `--annexed=get`. In `--dry-run` mode, report files that would need content fetched. At `-v`, also log unlock/add operations. This gives users visibility into what the annex layer is doing.
 - **FR-022**: System MUST provide a group-level `--annexed` option controlling behavior when git-annex/DataLad file content is not locally available. Modes: `error` (default — informative error listing missing files and suggesting `--annexed=get` or `git annex get`), `get` (automatically fetch content via `git annex get` / `datalad get` before reading), `skip-warning` (skip files without content with a per-file warning), `skip` (skip silently). The option MUST also be settable via `BIDS_UTILS_ANNEXED` environment variable (CLI flag takes precedence). The VCS backend protocol MUST expose: `has_content(path)` and `get_content(paths)` for reads; `unlock(paths)` to make locked annexed files writable before modification; `add(paths)` to re-annex modified files after writes (restoring them to their original tracked state). All file reads (TSV, JSON sidecars) MUST go through a content-aware I/O layer. All file writes to potentially-annexed files MUST go through an unlock-before/add-after lifecycle managed by the I/O layer.

 ### Key Entities
@@ -282,7 +291,8 @@ A dataset needs to be split — for example, extracting only behavioral data or
 ### Measurable Outcomes

 - **SC-001**: Every bids-examples dataset that is valid before a `rename`/`subject-rename`/`session-rename` operation is still valid after the operation completes.
-- **SC-002**: `--dry-run` output for every command matches the actual changes when run without `--dry-run` (verified by comparing dry-run output to actual filesystem diff).
+- **SC-002**: `--dry-run=detailed` output for every command matches the actual changes when run without `--dry-run` (verified by comparing dry-run output to actual filesystem diff). `--dry-run=overview` provides a human-friendly summary.
+- **SC-008**: All file-renaming operations (session-rename, subject-rename, rename) correctly handle git-annex symlinks — verified by tests using a `tmp_annex_dataset` fixture with locked annexed files.
 - **SC-003**: All commands complete on a 1000-subject dataset in O(n) time relative to affected files (not O(n²) in total dataset size). Single-entity operations (rename, remove-run) must not scan the entire dataset. Benchmark target: `rename` on a single file in a 1000-subject dataset completes in under 5 seconds.
 - **SC-004**: Library API is independently usable: all acceptance scenarios can be executed via Python imports without the CLI.
 - **SC-005**: 100% of mutating commands have both `--dry-run` and `--json` modes tested in CI.
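
The behavior behind FR-023 can be reproduced with the standard library alone. The sketch below creates a dangling symlink, which is what a locked annexed file looks like when its content is absent, and compares `Path.is_file()` with the `not path.is_dir()` filter. The helper `iter_files()` and the file names are illustrative assumptions, not code from bids-utils, and the example assumes a platform where unprivileged symlink creation is allowed.

```python
import os
import tempfile
from pathlib import Path


def iter_files(directory: Path) -> list[Path]:
    """Return every non-directory entry, including dangling symlinks."""
    return sorted(p for p in directory.iterdir() if not p.is_dir())


def demo() -> dict[str, bool]:
    """Create a dangling symlink and report how the pathlib predicates see it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        link = root / "sub-01_T1w.nii.gz"
        # The target does not exist, like annexed content that was never fetched.
        os.symlink(root / ".git" / "annex" / "objects" / "missing", link)
        return {
            "is_symlink": link.is_symlink(),
            "is_file": link.is_file(),  # follows the link, so the missing target counts
            "is_dir": link.is_dir(),
            "found_by_iter_files": link in iter_files(root),
        }


result = demo()
```

Here `result["is_file"]` is `False` while `result["found_by_iter_files"]` is `True`, which is why a bare `is_file()` filter skips such entries and the `not is_dir()` filter does not.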
.specify/specs/00-initial-design/contracts/library-api.md
Lines changed: 1 addition & 1 deletion
@@ -213,7 +213,7 @@ Group-level options (before the command):
 - `--annexed MODE`: How to handle git-annex files without local content. Modes: `error` (default), `get`, `skip-warning`, `skip`. Also settable via `BIDS_UTILS_ANNEXED` env var.

 Per-command common options:
-- `--dry-run` / `-n`: Show what would change without modifying
+- `--dry-run` / `-n`: Show what would change without modifying. Accepts optional value: `overview` (default, summary) or `detailed` (every file operation listed).
 - `--json`: Machine-readable JSON output
 - `-v` / `-q`: Verbosity control
 - `--force`: Skip confirmation on destructive operations
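
An option that works both as a bare flag and with an explicit value can be sketched with the standard library. The spec references Click, so `argparse` is used here purely as a stdlib stand-in to show the parsing behavior; the program name and the `parse_dry_run()` helper are hypothetical, and this is not the project's `common_options`.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose --dry-run/-n option takes an optional value."""
    parser = argparse.ArgumentParser(prog="bids-utils-sketch")
    parser.add_argument(
        "-n",
        "--dry-run",
        dest="dry_run",
        nargs="?",  # the value may be omitted
        const="overview",  # used when the flag is given without a value
        default=None,  # used when the flag is absent (no dry run)
        choices=["overview", "detailed"],
    )
    return parser


def parse_dry_run(argv: list[str]) -> Optional[str]:
    """Return the dry-run mode for the given arguments, or None for a real run."""
    return build_parser().parse_args(argv).dry_run
```

With this parser, `parse_dry_run(["--dry-run"])` and `parse_dry_run(["-n"])` both return `"overview"`, and `parse_dry_run(["--dry-run=detailed"])` returns `"detailed"`. One caveat of `nargs="?"` is that `argparse` also consumes a space-separated argument following the flag, so the `=` form is the unambiguous one.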
**Goal**: Fix the `is_file()` symlink bug that silently skips annexed data files during rename operations. Enhance `--dry-run` to show per-file detail. Add annex operation logging.
+
+**Steps**:
+1. **Symlink bug fix (T092)**: Audit all `is_file()` calls used for file iteration. Replace with `not path.is_dir()` in `session.py`, `subject.py`, `run.py`, `split.py`, `merge.py`, `_sidecars.py`, `migrate.py`. Keep `is_file()` where checking for file existence (not iteration).
+2. **Annex test fixture (T093)**: `tmp_annex_dataset` in conftest.py — git-annex repo with locked symlinks alongside regular files.
+3. **Regression tests (T094)**: Session/subject/file rename on annexed dataset — verify all files including symlinks are renamed.
+4. **Dry-run detail (T095-T096)**: `--dry-run=overview|detailed`. Update `common_options`, ensure all library functions populate per-file `Change` entries. `output_result` renders overview vs detailed.
+5. **Annex logging (T097)**: INFO-level logging for get/unlock/add operations in `_io.py`.
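
The annex logging step can be sketched with Python's `logging` module. The sketch follows the levels named in the commit message and FR-024 (content fetches at INFO, unlock/add at DEBUG), which differs slightly from the "INFO-level" wording in the step above. The logger name, both function names, and the choice of WARNING for `-q` are assumptions made for illustration.

```python
import logging

# Hypothetical logger name; the real module layout may differ.
logger = logging.getLogger("bids_utils_sketch.annex")


def verbosity_to_level(verbose: int = 0, quiet: int = 0) -> int:
    """Map -v/-q counts to a level: default INFO, -v DEBUG, -q WARNING (assumed)."""
    if verbose > quiet:
        return logging.DEBUG
    if quiet > verbose:
        return logging.WARNING
    return logging.INFO


def log_annex_operation(operation: str, path: str) -> None:
    """Log content fetches at INFO and unlock/add at DEBUG."""
    level = logging.INFO if operation == "get" else logging.DEBUG
    logger.log(level, "annex %s: %s", operation, path)


# Default verbosity: fetches are visible, unlock/add are not.
logger.setLevel(verbosity_to_level())
log_annex_operation("get", "sub-01/anat/sub-01_T1w.nii.gz")
log_annex_operation("unlock", "sub-01/anat/sub-01_T1w.nii.gz")
```

At the default level only the `get` message passes the logger's threshold; passing `verbose=1` lowers the threshold to DEBUG so that unlock and add messages are emitted as well.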
**Purpose**: Fix critical git-annex symlink handling bug and enhance `--dry-run` to show per-file detail. These are blocking issues for real-world usage on annexed datasets.
- [X] T092 Replace all bare `path.is_file()` calls used for file iteration with `not path.is_dir()` (or `path.is_file() or path.is_symlink()`) in: `session.py` (2 sites), `subject.py` (2 sites), `run.py` (2 sites), `split.py` (1 site), `merge.py` (1 site), `_sidecars.py` (1 site), `migrate.py` (1 site). Preserve `is_file()` where semantically correct (e.g., `_dataset.py` checking `dataset_description.json` existence, `_scans.py` checking `_scans.tsv` existence — these are never annexed).
+- [X] T093 Add `tmp_annex_dataset` pytest fixture in `tests/conftest.py`: creates a git-annex repo with locked (symlinked) data files (`.nii.gz`) alongside regular git files (`.json`, `.tsv`). Requires `git annex` to be installed (mark tests `skipif` otherwise).
+- [X] T094 Write regression tests using `tmp_annex_dataset` for session-rename, subject-rename, and rename — verify that ALL files (including annexed symlinks) are renamed correctly (SC-008). Test both with content present and content absent.
+
+### Enhanced dry-run (FR-003 update)
+
+- [X] T095 Change `--dry-run` / `-n` from a boolean flag to an optional-value option: `--dry-run` (or `--dry-run=overview`) for current summary behavior, `--dry-run=detailed` for per-file listing. Update `common_options` in `cli/_common.py`, `OperationResult`, and `output_result()`. Library functions already populate `result.changes` with per-file detail — the change is in how `output_result` renders them.
+- [X] T096 Ensure all library functions populate `result.changes` with per-file detail (not just one summary `Change` per subject/session). Audit `session.py`, `subject.py`, `rename.py` — the rename function already does this; session/subject need to add per-file `Change` entries for individual file renames within the session/subject operation.
+
+### Annex operation logging (FR-024)
+
+- [X] T097 Add logging to `_io.py` for annex operations: log at INFO level when `ensure_content` fetches a file (`--annexed=get`), when `ensure_writable` unlocks, when `mark_modified` re-adds. In `--dry-run` mode, report which files would need content fetched. Wire through to CLI verbosity (`-v` enables DEBUG, default shows INFO, `-q` suppresses).
+
+### Tests
+
+- [X] T098 Write tests for `--dry-run=detailed` output: verify per-file change listing for session-rename, subject-rename, rename. Verify `--dry-run=overview` retains current behavior. Verify `--dry-run` without value defaults to overview.
+
+**Checkpoint**: `bids-utils --annexed=get session-rename --dry-run=detailed` shows every file that would be renamed/edited/fetched. Running without `--dry-run` on an annexed dataset correctly renames all files including symlinks.
+
+---
+
 ## Phase 12: Polish & Cross-Cutting Concerns

 **Purpose**: Improvements that affect multiple user stories.
@@ -323,6 +350,7 @@
 - **Phase 0 (Scaffolding)**: No dependencies — start immediately
 - **Phase 1 (Infrastructure)**: Depends on Phase 0 — BLOCKS all user stories
 - **Phase 1b (Annexed Content / FR-022)**: Depends on Phase 1. Can be done at any point but SHOULD be done before real-world usage on git-annex/DataLad datasets. Retroactively completes VCS integration from Phase 1.
+- **Phase 1c (Symlink Safety & Dry-Run Detail / FR-003, FR-023, FR-024)**: Depends on Phase 1b. BLOCKS real-world usage on annexed datasets — the symlink bug causes silent data loss (files not renamed). Should be done immediately after Phase 1b.
 - **Phase 2 (Rename / US1)**: Depends on Phase 1
 - **Phase 3 (Migrate 1.x / US2)**: Depends on Phase 2 (uses rename for suffix changes)
 - **Phase 4 (Migrate 2.0 / US3)**: Depends on Phase 3