Commit 7efc4b6

Add Command line utility to create a dataset
1 parent 248c5f7 commit 7efc4b6

5 files changed

Lines changed: 591 additions & 0 deletions

README.md

Lines changed: 43 additions & 0 deletions
@@ -147,6 +147,49 @@ pre-commit install

This ensures that every time you commit, all the hooks are executed automatically on the staged files.

### 2.3 Build a PLAID dataset from raw CSV data

PLAID provides a CLI entry point to build a dataset from a raw-data directory layout in one command:

```bash
plaid-build-dataset --input-dir="/path/to/raw/data" --output-dir="/path/to/plaid/output"
```

Equivalent module invocation:

```bash
python -m plaid.cli.build_dataset --input-dir="/path/to/raw/data" --output-dir="/path/to/plaid/output"
```

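When the build step is driven from a Python script rather than a shell, the module invocation can be wrapped with `subprocess`. The sketch below is illustrative only: `build_command` and `run_build` are hypothetical helper names, not part of PLAID, and it relies solely on the `--input-dir`, `--output-dir`, and `--overwrite` flags documented in this README section.

```python
import subprocess
import sys
from pathlib import Path


def build_command(input_dir, output_dir, overwrite=False):
    """Assemble the argument list for `python -m plaid.cli.build_dataset`."""
    cmd = [
        sys.executable,  # same interpreter as the calling script
        "-m",
        "plaid.cli.build_dataset",
        f"--input-dir={Path(input_dir)}",
        f"--output-dir={Path(output_dir)}",
    ]
    if overwrite:
        cmd.append("--overwrite")
    return cmd


def run_build(input_dir, output_dir, overwrite=False):
    """Run the CLI; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(input_dir, output_dir, overwrite), check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.
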
Expected raw-data layout (example):

```text
/path/to/raw/data
├── input_scalars/
│   ├── scalars_00000.csv
│   ├── scalars_00001.csv
│   └── ...
├── output_scalars/
│   ├── scalars_00000.csv
│   ├── scalars_00001.csv
│   └── ...
├── field_1/
│   ├── scalars_00000.csv
│   ├── scalars_00001.csv
│   └── ...
├── field_2/
│   ├── scalars_00000.csv
│   ├── scalars_00001.csv
│   └── ...
└── ...
```

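To try the command on toy data, the scalar part of this layout can be generated with a few lines of Python. This is a minimal sketch under stated assumptions: `write_scalar_file` and `make_toy_raw_data` are hypothetical helpers, the column names are invented, and field directories are left out because the content format of field files is not described here.

```python
import csv
import tempfile
from pathlib import Path


def write_scalar_file(path, values):
    """Write one header row and one data row, as scalar files require."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(values.keys())
        writer.writerow(values.values())


def make_toy_raw_data(root, n_samples=3):
    """Create input_scalars/ and output_scalars/ with matching sample IDs."""
    root = Path(root)
    for name in ("input_scalars", "output_scalars"):
        (root / name).mkdir(parents=True, exist_ok=True)
    for i in range(n_samples):
        # Column names below are made up for illustration only.
        write_scalar_file(
            root / "input_scalars" / f"scalars_{i:05d}.csv",
            {"angle": 0.5 * i, "speed": 10.0 + i},
        )
        write_scalar_file(
            root / "output_scalars" / f"scalars_{i:05d}.csv",
            {"drag": 1.0 + i},
        )
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in sorted(make_toy_raw_data(tmp).rglob("*.csv")):
            print(path.relative_to(tmp))
```

The zero-padded `{i:05d}` suffix mirrors the `scalars_00000.csv` naming in the layout above.
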
Notes:
- Scalar files must contain one header row and one data row.
- Sample IDs are inferred from numeric filename suffixes (e.g. `scalars_00012.csv` -> id `12`) and must match across directories.
- Field directories are auto-detected (all subdirectories except `input_scalars` and `output_scalars`) unless passed explicitly via `--field-dirs`.
- Use `--overwrite` to replace an existing output directory.

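The rules in these notes can be checked before running the CLI. The following is a sketch of such a pre-flight check, not PLAID's actual implementation (`build_dataset.py` is not shown in this excerpt): all function names are hypothetical, the regular expression is one plausible reading of "numeric filename suffix", and scalar values are assumed to parse as floats.

```python
import csv
import re
from pathlib import Path

SCALAR_DIRS = {"input_scalars", "output_scalars"}
_SUFFIX = re.compile(r"(\d+)$")  # trailing digits of the file stem


def sample_id(path):
    """Return the integer ID from a name such as scalars_00012.csv."""
    match = _SUFFIX.search(Path(path).stem)
    if match is None:
        raise ValueError(f"no numeric suffix in {path}")
    return int(match.group(1))


def detect_field_dirs(root):
    """List every subdirectory except the two scalar directories."""
    return sorted(
        p.name for p in Path(root).iterdir()
        if p.is_dir() and p.name not in SCALAR_DIRS
    )


def read_scalar_file(path):
    """Read a scalar CSV, insisting on one header row plus one data row."""
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]
    if len(rows) != 2 or len(rows[0]) != len(rows[1]):
        raise ValueError(f"{path}: expected one header row and one data row")
    # Assumption: every scalar value is numeric.
    return dict(zip(rows[0], map(float, rows[1])))


def check_ids_match(root, dirs):
    """Raise if the directories do not share the same set of sample IDs."""
    ids = {d: {sample_id(p) for p in (Path(root) / d).glob("*.csv")} for d in dirs}
    if not ids:
        return []
    reference = next(iter(ids.values()))
    mismatched = [d for d, found in ids.items() if found != reference]
    if mismatched:
        raise ValueError(f"sample IDs differ in: {mismatched}")
    return sorted(reference)
```

Comparing sets of integer IDs rather than filenames means `scalars_00012.csv` and a differently padded name with the same number count as the same sample; whether the real tool is that lenient is not stated.
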
## 3. Call for Contributions

The PLAID project welcomes your expertise and enthusiasm!

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -56,6 +56,9 @@ files=["LICENSE.txt"]
file="README.md"
content-type = "text/markdown"

[project.scripts]
plaid-build-dataset = "plaid.cli.build_dataset:main"

[tool.setuptools]
platforms = [
"Linux",
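
The `[project.scripts]` entry makes the installer generate a `plaid-build-dataset` executable that imports `plaid.cli.build_dataset` and calls its `main` function, passing the return value to `sys.exit`. The file `build_dataset.py` is not shown in this excerpt, so the following is only a sketch of what a compatible `main` could look like, assuming `argparse`. The flag names come from the README; the space-separated form of `--field-dirs`, the help strings, and the placeholder body are assumptions.

```python
import argparse


def build_parser():
    """Build a parser exposing the flags named in the README."""
    parser = argparse.ArgumentParser(prog="plaid-build-dataset")
    parser.add_argument("--input-dir", required=True, help="raw-data directory")
    parser.add_argument("--output-dir", required=True, help="dataset destination")
    # Assumption: field directories are given as space-separated names.
    parser.add_argument("--field-dirs", nargs="*", default=None,
                        help="field subdirectories (auto-detected if omitted)")
    parser.add_argument("--overwrite", action="store_true",
                        help="replace an existing output directory")
    return parser


def main(argv=None):
    """Callable without arguments, as a console-script entry point must be."""
    args = build_parser().parse_args(argv)
    # Placeholder: the real conversion logic is not shown in this excerpt.
    print(f"would build {args.output_dir} from {args.input_dir}")
    return 0  # becomes the process exit code


if __name__ == "__main__":
    raise SystemExit(main())
```

Accepting an optional `argv` keeps the function usable both as an entry point (where it falls back to `sys.argv`) and from tests.
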

src/plaid/cli/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
"""Command-line utilities for PLAID."""
