Commit a51ac95

docs: add structured AGENTS.md with nested module guides
Add a comprehensive root AGENTS.md following modeles_d_agents best practices, plus nested AGENTS.md files for the containers and storage modules.
1 parent 71e9eec commit a51ac95

3 files changed

Lines changed: 256 additions & 0 deletions

AGENTS.md

Lines changed: 182 additions & 0 deletions
@@ -0,0 +1,182 @@
# AGENTS.md -- plaid (pyplaid)

## Project identity

**plaid** is the foundational data model library of the [PLAID ecosystem](https://github.com/PLAID-lib).
Published on PyPI as `pyplaid`, it provides a structured format for representing physics
simulation data (meshes, fields, boundary conditions) and abstracts storage backends
(zarr, HuggingFace datasets, CGNS).

Every other library in the ecosystem depends on plaid.

## Expected agent behavior

### Role

You are a senior Python developer with experience in scientific computing, data modeling,
and open-source library design. You prioritize backward compatibility and clean abstractions.

### Decision priorities

1. **Backward compatibility** > new features -- this is a foundational library, and breaking downstream users is costly
2. **Correctness** > performance -- data integrity in scientific computing is non-negotiable
3. **Readability** > cleverness -- contributors come from diverse scientific backgrounds

### When in doubt

- Do not change public API signatures without explicit approval
- Prefer adding new optional parameters with sensible defaults
- Check if the change impacts `scimm` or `maestro` (downstream consumers)
- Run the full test suite before proposing changes

## Tech stack

- **Language**: Python 3.11--3.13
- **Package manager**: uv (with `pyproject.toml`)
- **Build backend**: setuptools with setuptools-scm (dynamic versioning)
- **Linter/formatter**: ruff
- **Test framework**: pytest
- **Documentation**: Sphinx (ReadTheDocs)
- **CI/CD**: GitHub Actions

## Project structure

```
.
├── AGENTS.md                     <- This file
├── pyproject.toml                <- Dependencies and project metadata
├── ruff.toml                     <- Ruff linter/formatter configuration
├── CHANGELOG.md                  <- Version history
├── CONTRIBUTING.md               <- Contribution guidelines
├── src/plaid/                    <- Source code
│   ├── __init__.py
│   ├── constants.py              <- Global constants
│   ├── problem_definition.py     <- ProblemDefinition (core concept)
│   ├── containers/               <- Dataset, Sample, Features (see nested AGENTS.md)
│   ├── storage/                  <- Storage backends: zarr, hf_datasets, cgns (see nested AGENTS.md)
│   ├── bridges/                  <- HuggingFace bridge utilities
│   ├── pipelines/                <- sklearn-compatible processing blocks
│   ├── post/                     <- Post-processing (metrics, bisection)
│   └── examples/                 <- Built-in example datasets
├── tests/                        <- Test suite
├── docs/                         <- Sphinx documentation source
├── examples/                     <- Usage examples
└── benchmarks/                   <- Performance benchmarks
```

## Architecture and key concepts

### Core abstractions

| Concept | Module | Description |
|---------|--------|-------------|
| `ProblemDefinition` | `problem_definition.py` | Declares fields, meshes, and their roles (input/output/context) for a physics problem |
| `Sample` | `containers/sample.py` | One simulation snapshot: mesh + field values |
| `Dataset` | `containers/dataset.py` | Ordered collection of Samples with shared ProblemDefinition |
| `Features` | `containers/features.py` | Named tensor-like data with metadata |
| `FeatureIdentifier` | `containers/feature_identifier.py` | Unique key to identify a feature across samples |

### Storage pattern

Storage uses a **Registry pattern** (`storage/registry.py`) to dispatch read/write
operations to the correct backend (zarr, hf_datasets, cgns). Each backend implements
a `reader.py` and `writer.py` following a common interface defined in `storage/common/`.

### Dependency graph (ecosystem)

```
plaid (pyplaid)             scimm
     ^                        ^
     | pyplaid>=0.1.13        | scimm>=0.2.0
     |                        |
     +----------+-------------+
                |
      maestro (glue layer)
```

plaid has **no dependency** on scimm or maestro. Changes here propagate downstream.

## Code conventions

### Formatting and linting

Ruff is configured in `ruff.toml`:
- **Line length**: 88 characters
- **Lint rules**: `D` (docstrings), `E`/`W` (pycodestyle), `F` (pyflakes), `ARG` (unused arguments), `I` (import sorting)
- **Docstring convention**: Google style
- **Excluded directories**: `examples/`, `docs/`, `benchmarks/`
- **Test files**: docstring rules (`D`) and `S101` (assert) are ignored

```bash
# Check linting
uv run ruff check .

# Auto-fix
uv run ruff check --fix .

# Format
uv run ruff format .
```

### Type hints

- Required on all public functions and methods
- Use modern syntax: `list[str]`, `dict[str, int]`, `X | None` (not `Optional[X]`)
- Never use deprecated `typing.List`, `typing.Dict`, `typing.Optional`

### Docstrings

- Google style (enforced by ruff rule `D` with `convention = "google"`)
- Required on all public modules, classes, functions, and methods
- Update docstrings whenever you modify code behavior

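For reference, a minimal Google-style docstring with `Args`, `Returns`, and `Raises` sections. `relative_error` is a hypothetical function used only to show the layout.

```python
def relative_error(predicted: float, reference: float) -> float:
    """Compute the relative error of a prediction.

    Args:
        predicted: Value produced by the model.
        reference: Ground-truth value from the simulation.

    Returns:
        The absolute difference divided by the magnitude of the reference.

    Raises:
        ValueError: If ``reference`` is zero.
    """
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return abs(predicted - reference) / abs(reference)
```
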
## Testing

- **Framework**: pytest
- **Location**: `tests/`
- **Run all**: `uv run pytest`
- **Run specific**: `uv run pytest tests/path/to/test_file.py`
- **With coverage**: `uv run pytest --cov=src` (requires the `pytest-cov` plugin)

Guidelines:
- Write tests for new public functions, classes, and methods
- Test edge cases and error conditions
- Use descriptive test names that explain the scenario
- Mock external dependencies (file I/O, network) to keep tests fast
- Do not test trivial code or third-party libraries

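A sketch of how these guidelines might look in practice. `fetch_sample_count` is a hypothetical helper, not a plaid function. The tests use plain `assert` statements in `test_*` functions, which pytest collects, and `unittest.mock.mock_open` stands in for real file I/O.

```python
from collections.abc import Callable
from typing import Any
from unittest import mock


def fetch_sample_count(path: str, opener: Callable[..., Any] = open) -> int:
    """Count non-empty lines in an index file (hypothetical helper)."""
    with opener(path) as handle:
        return sum(1 for line in handle if line.strip())


def test_fetch_sample_count_ignores_blank_lines() -> None:
    # mock_open replaces the file system, so no file is touched.
    fake_open = mock.mock_open(read_data="sample_0\n\nsample_1\n")
    assert fetch_sample_count("index.txt", opener=fake_open) == 2
    fake_open.assert_called_once_with("index.txt")


def test_fetch_sample_count_returns_zero_for_empty_file() -> None:
    # Edge case: an empty index must not raise.
    fake_open = mock.mock_open(read_data="")
    assert fetch_sample_count("index.txt", opener=fake_open) == 0
```

Each test name states the scenario and the expected outcome, so a failure report is readable without opening the file.
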
## Commands

```bash
# Install dependencies
uv sync

# Run tests
uv run pytest

# Check linting
uv run ruff check .

# Auto-fix linting issues
uv run ruff check --fix .

# Format code
uv run ruff format .

# Build documentation
cd docs && make html
```

## Contribution workflow

When making changes:

1. Read and understand existing code before modifying
2. Write or update code with type hints
3. Write unit tests for new functionality
4. Update docstrings (Google style)
5. Update Sphinx documentation if functionality changed
6. Run linter: `uv run ruff check --fix .`
7. Run formatter: `uv run ruff format .` (after the linter, since auto-fixes can leave code that needs reformatting)
8. Run tests: `uv run pytest`
9. Check if changes are breaking and inform the reviewer if a major version bump is needed

src/plaid/containers/AGENTS.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
# AGENTS.md -- plaid/containers

This module defines the core data containers of the PLAID data model.

## Key classes

| Class | File | Description |
|-------|------|-------------|
| `Dataset` | `dataset.py` | Ordered collection of `Sample` objects sharing a common `ProblemDefinition`. Main entry point for loading and manipulating simulation data. |
| `Sample` | `sample.py` | Single simulation snapshot containing mesh coordinates and field values as `Features`. |
| `Features` | `features.py` | Named tensor-like container with shape and dtype metadata. Wraps numpy arrays. |
| `FeatureIdentifier` | `feature_identifier.py` | Immutable key (name + location) used to uniquely identify a feature across samples. |
| `DefaultManager` | `managers/default_manager.py` | Manages default values and missing data for features within a dataset. |

## Design constraints

- `Dataset` is a **large class** (~1800 lines). Avoid adding new responsibilities to it. Prefer extracting logic into helper functions or dedicated modules.
- `Sample` and `Features` are **value objects** -- they should remain simple, with minimal business logic.
- `FeatureIdentifier` is **immutable and hashable** -- it is used as dictionary keys throughout the codebase. Do not add mutable state.
- All containers must support **serialization** through the storage backends (zarr, hf_datasets, cgns).

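To illustrate why immutability matters for a dictionary key, here is a minimal sketch using a frozen dataclass. `FeatureKey` is a hypothetical stand-in. Its two fields follow the "name + location" description above, but the actual `FeatureIdentifier` implementation may differ.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class FeatureKey:
    """Immutable, hashable key (hypothetical stand-in for FeatureIdentifier)."""

    name: str
    location: str


key = FeatureKey(name="pressure", location="nodes")
values = {key: [1.0, 2.0]}

# An equal key built elsewhere finds the same entry.
lookup = values[FeatureKey("pressure", "nodes")]

# Mutation is rejected, which keeps the hash stable while the key is in use.
try:
    key.name = "velocity"
except FrozenInstanceError:
    print("FeatureKey is read-only")
```

With `frozen=True` and the default `eq=True`, the dataclass derives `__hash__` from its fields, so equal keys hash equally. Adding mutable state would break that guarantee.
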
## Downstream impact

These classes are the public API surface consumed by `maestro` and end users. Any backward-incompatible signature change is a **breaking change** that requires a major version bump.

## Testing

Tests are in `tests/`. When modifying a container class, verify that storage round-trips (write then read) still produce identical data.

src/plaid/storage/AGENTS.md

Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
# AGENTS.md -- plaid/storage

This module implements the multi-backend storage layer for reading and writing PLAID datasets.

## Architecture

Storage follows a **Registry pattern**:

```
storage/
├── registry.py          <- Dispatches to the correct backend based on format
├── reader.py            <- Public read API (delegates to backend readers)
├── writer.py            <- Public write API (delegates to backend writers)
├── common/              <- Abstract interfaces and shared utilities
│   ├── reader.py        <- Base reader interface
│   ├── writer.py        <- Base writer interface
│   ├── bridge.py        <- Format conversion helpers
│   └── preprocessor.py
├── zarr/                <- Zarr backend (reader.py, writer.py, bridge.py)
├── hf_datasets/         <- HuggingFace datasets backend (reader.py, writer.py, bridge.py)
└── cgns/                <- CGNS backend (reader.py, writer.py)
```

## How it works

1. The **registry** (`registry.py`) maps format identifiers to backend modules.
2. The public `reader.py` and `writer.py` at the top level accept a format parameter and delegate to the appropriate backend.
3. Each backend implements the interfaces defined in `common/reader.py` and `common/writer.py`.

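The dispatch described in these three steps can be sketched roughly as below. This is a generic illustration of the registry pattern, not the code in `registry.py`: `register_reader`, `read`, the `Reader` alias, and the toy backends are all assumed names.

```python
from collections.abc import Callable
from typing import Any

# Hypothetical reader signature; the real contract lives in storage/common/.
Reader = Callable[[str], Any]

_READERS: dict[str, Reader] = {}


def register_reader(fmt: str, reader: Reader) -> None:
    """Associate a format identifier with a backend reader."""
    if fmt in _READERS:
        raise ValueError(f"format {fmt!r} is already registered")
    _READERS[fmt] = reader


def read(path: str, fmt: str) -> Any:
    """Public entry point: look up the backend and delegate to it."""
    try:
        reader = _READERS[fmt]
    except KeyError:
        known = ", ".join(sorted(_READERS)) or "none"
        raise ValueError(f"unknown format {fmt!r} (registered: {known})") from None
    return reader(path)


# Toy backends standing in for the real zarr and cgns readers.
register_reader("zarr", lambda path: f"zarr dataset at {path}")
register_reader("cgns", lambda path: f"cgns dataset at {path}")

result = read("data/run_01", fmt="zarr")
```

Callers only ever see `read`; adding a format means one more registration and no change to the public entry point.
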
## Adding a new backend

1. Create a new subdirectory under `storage/` (e.g., `storage/my_format/`).
2. Implement `reader.py` and `writer.py` following the interfaces in `common/`.
3. Register the new backend in `registry.py`.
4. Add round-trip tests (write then read) to verify data integrity.

37+
## Design constraints
38+
39+
- Backends must be **stateless** -- all configuration is passed through function parameters.
40+
- Read/write operations must preserve **data integrity** exactly (no lossy conversions without explicit user consent).
41+
- The `common/` interfaces are the **contract** -- do not add backend-specific parameters to the public API without updating the contract first.
42+
- `zarr` is the primary backend and the most feature-complete. Use it as the reference when implementing others.
43+
44+
## Testing
45+
46+
Each backend should have round-trip tests that write a dataset and read it back, asserting equality. Tests are in `tests/`.
