You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[DATALAD RUNCMD] yolo '/speckit.clarify - in prior commit...
=== Do not change lines below ===
{
"chain": [],
"cmd": "yolo '/speckit.clarify - in prior commit I added some changes to user stories -- please analyze and potentially adjust user stories to reflect better.'",
"exit": 0,
"extra_inputs": [],
"inputs": [],
"outputs": [],
"pwd": "."
}
^^^ Do not change lines above ^^^
Entire-Checkpoint: cd9954849052
Copy file name to clipboardExpand all lines: .specify/specs/00-initial-design.md
+7-7Lines changed: 7 additions & 7 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -117,7 +117,7 @@ Similar to subject-rename but for session entities. Includes the special case of
117
117
118
118
### User Story 6 — Bubble-up / condense / organize metadata (Priority: P2, need: medium)
119
119
120
-
A dataset has metadata duplicated across many sidecar JSON files at the leaf level. The user runs `bids-utils metadata aggregate` to hoist common key-value pairs up the BIDS inheritance hierarchy, reducing redundancy and making the dataset easier to overview.
120
+
A dataset has metadata duplicated across many sidecar JSON files at the leaf level. The user runs `bids-utils metadata aggregate` to hoist common key-value pairs up the BIDS inheritance hierarchy, reducing redundancy and making the dataset easier to overview. Both `aggregate` and `segregate` accept optional path arguments to scope their operation (e.g., per-subject only) and support `--mode copy|move` to control whether metadata is duplicated or relocated.
121
121
122
122
**Why this priority**: Medium need per design doc. Addresses a real pain point with large datasets. The `aggregate`, `segregate`, and `deduplicate` modes serve different workflows.
123
123
@@ -129,8 +129,8 @@ A dataset has metadata duplicated across many sidecar JSON files at the leaf lev
129
129
2.**Given** a subject that is missing a `_bold.json` entirely (but has `_bold.nii.gz`), **When** aggregation is attempted for `RepetitionTime`, **Then** the tool does NOT aggregate that key (since the value is unknown for that subject, not merely identical).
130
130
3.**Given** a user running `bids-utils metadata segregate`, **When** the command completes, **Then** all metadata is pushed down to leaf-level files (full self-contained sidecars per file).
131
131
4.**Given**`bids-utils metadata audit`, **When** run, **Then** the tool reports metadata keys that are neither fully unique nor fully equivalent across files — indicating potential acquisition inconsistencies.
132
-
5.The `aggregate` and `segregate` could be given specific path(s) to aggregate or segregate to, e.g. `aggregate sub-*/`would bubble-up common metadata per each subject only. Both commands by default would operate across all levels, thus bringing up/down common/different metadata to appropriate level in the hierarchy.
133
-
6.Both `aggregate` and `segregate` should have an option on either to 'normalize' metadata presense, as to "copy" (duplicate) or to "move" thus making it defined uniquely (without duplication).
132
+
5.**Given** a dataset with multiple subjects, **When**`bids-utils metadata aggregate sub-01/` is run, **Then** only metadata within `sub-01/`is aggregated (common keys bubble up to `sub-01/` level sidecars), while other subjects' metadata is untouched. By default (no path argument), aggregation operates across all levels of the hierarchy.
133
+
6.**Given**`bids-utils metadata aggregate --mode copy`, **When** run, **Then** common metadata is written to the higher-level sidecar but also retained in leaf-level files (normalization by duplication). **Given**`--mode move` (the default), **When** run, **Then** common metadata is removed from leaf-level files after being placed at the higher level (no duplication).
134
134
135
135
---
136
136
@@ -166,7 +166,7 @@ A specific run needs to be removed and subsequent run indices shifted to maintai
166
166
167
167
### User Story 9 — Merge datasets (Priority: P3, need: medium)
168
168
169
-
Two BIDS datasets need to be combined — either by simply combining subjects (failing on conflicts) or by placing each dataset into a separate session.
169
+
Two BIDS datasets need to be combined — either by simply combining subjects (failing on conflicts) or by placing each dataset into a separate session. A common workflow is incremental merge: BIDS conversion is done per subject/session producing many small datasets, which are then merged one-by-one into a growing target dataset. Merge must also handle intra-session file conflicts (e.g., additional runs from a split acquisition) and metadata conflicts (e.g., differing `participants.tsv` values or aggregated sidecar metadata).
170
170
171
171
**Why this priority**: Medium per Yarik. Implementation builds on session-rename and also potentially on metadata aggregate/segregate.
172
172
@@ -177,9 +177,9 @@ Two BIDS datasets need to be combined — either by simply combining subjects (f
177
177
1.**Given** two valid datasets with non-overlapping subjects, **When**`bids-utils merge datasetA datasetB --output merged/` is run, **Then** all subjects from both datasets appear in the output and the merged dataset is valid.
178
178
2.**Given** two datasets with overlapping subject IDs, **When** merge is run without `--into-sessions`, **Then** the tool refuses with exit code 2 listing the conflicts.
179
179
3.**Given**`--into-sessions ses-A ses-B`, **When** merge is run, **Then** each dataset's data is placed under the respective session.
180
-
4.Actually there is a "common" workflow for such functionality: when conversion to BIDS is done per subject/session thus collecting lots of small BIDS datasets, and then "merging" them into a single dataset, potentially incrementally.
181
-
5.Could be a case where some session was interrupted and then additional data acquired in a separate acquisition session -- we might convert it separately but then would still want to merge those extra files into the same, already present session (e.g. more _run-'s of `_bold`). Option might need to be given on how to address "conflicts", as e.g. whether to add new _run- indices.
182
-
6.During merge there could be conflicts in e.g. different ages (across sessions) for the same subject. Or top level sidecar metadata .json files aggregated. One of the strategies could be 'segregate' into the next level (sub-*/) and then re-aggregate into the top thus accounting for potential differences etc.
180
+
4.**Given** an existing target dataset and a newly converted single-subject dataset, **When**`bids-utils merge newdata/ --into existing/`is run, **Then** the new subject is added incrementally to the existing dataset without disturbing other subjects. This supports the common workflow of converting subjects one at a time and merging each into the growing dataset.
181
+
5.**Given** a target dataset with `sub-01/ses-01/func/sub-01_ses-01_task-rest_run-01_bold.nii.gz`and a source dataset with the same subject/session containing additional BOLD runs, **When**`bids-utils merge --on-conflict add-runs` is run, **Then** the incoming files are assigned the next available `run-` indices (e.g., `run-02`) and merged into the session. **Given**`--on-conflict error` (default), **Then** the tool refuses with exit code 2 listing the conflicting filenames.
182
+
6.**Given** two datasets with differing `participants.tsv` values for the same subject (e.g., different `age`across sessions), **When** merge is run, **Then**the tool reports the conflict. **Given** top-level sidecar metadata that differs between the datasets, **When** merge is run with `--reconcile-metadata`, **Then** the tool segregates conflicting metadata down to the appropriate level and re-aggregates to produce correct inheritance.
0 commit comments