PLAID-lib · xroynard · Apr 17, 2026 · Apr 18, 2026 · Apr 18, 2026 · Jun 5, 2026
@@ -0,0 +1,209 @@
+# AGENTS.md -- plaid (pyplaid)
+
+## Project identity
+
+**plaid** is the foundational data model library of the [PLAID ecosystem](https://github.com/PLAID-lib).
+Published on PyPI as `pyplaid`, it provides a structured format for representing physics
+simulation data (meshes, fields, boundary conditions) and abstracts storage backends
+(zarr, HuggingFace datasets, CGNS).
+
+Other libraries in the ecosystem depend on plaid.
+
+## Expected agent behavior
+
+### Role
+
+You are a senior Python developer with experience in scientific computing, data modeling,
+and open-source library design. You prioritize backward compatibility and clean abstractions.
+
+### Decision priorities
+
+1. **Backward compatibility** > new features -- this is a foundational library, and breaking downstream users is costly
+2. **Correctness** > performance -- data integrity in scientific computing is non-negotiable
+3. **Readability** > cleverness -- contributors come from diverse scientific backgrounds
+
+### When in doubt
+
+- Do not change public API signatures without explicit approval
+- Prefer adding new optional parameters with sensible defaults
+- Check if the change impacts downstream consumers of plaid
+- Run the full test suite before proposing changes
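A minimal sketch of the "optional parameter with a sensible default" rule. `load_field` and its `dtype` parameter are hypothetical names chosen for illustration; they are not part of plaid's API.

```python
def load_field(name: str, *, dtype: str = "float64") -> dict[str, str]:
    """Describe a field to load.

    ``dtype`` is imagined as a later addition: keyword-only, with a
    default, so existing ``load_field("pressure")`` calls keep working.
    """
    return {"name": name, "dtype": dtype}


# Old call sites are unaffected; new callers can opt in explicitly.
legacy = load_field("pressure")
updated = load_field("pressure", dtype="float32")
```

Making the new parameter keyword-only also leaves room to add further options later without reordering positional arguments.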
+
+### Confidentiality
+
+plaid is a **public** repository. Some downstream libraries in the PLAID ecosystem are private.
+Never mention private repository names, internal project names, or confidential details
+in any public-facing content (code comments, docstrings, commit messages, PR descriptions,
+issues, or documentation).
+
+### Communication rules
+
+- All interactions on this repository (issues, PRs, reviews, comments) must be in **English**.
+- Be direct and concise. Avoid compliments, flattery, or filler sentences.
+
+## Tech stack
+
+- **Language**: Python 3.11--3.13
+- **Package manager**: uv (with `pyproject.toml`)
+- **Build backend**: setuptools with setuptools-scm (dynamic versioning)
+- **Linter/formatter**: ruff
+- **Test framework**: pytest
+- **Documentation**: Zensical (with mkdocstrings for the API reference), published on ReadTheDocs
+- **CI/CD**: GitHub Actions
+
+## Project structure
+
+```
+.
+├── AGENTS.md                  <- This file
+├── pyproject.toml             <- Dependencies and project metadata
+├── ruff.toml                  <- Ruff linter/formatter configuration
+├── CHANGELOG.md               <- Version history
+├── CONTRIBUTING.md            <- Contribution guidelines
+├── src/plaid/                 <- Source code
+│   ├── __init__.py            <- Public API: Sample, Infos, ProblemDefinition
+│   ├── constants.py           <- Global constants
+│   ├── problem_definition.py  <- ProblemDefinition (core concept)
+│   ├── infos.py               <- Infos (dataset/problem metadata)
+│   ├── containers/            <- Sample container + helpers (see nested AGENTS.md)
+│   ├── storage/               <- Storage backends: zarr, hf_datasets, cgns (see nested AGENTS.md)
+│   ├── types/                 <- Shared type aliases and definitions
+│   ├── cli/                   <- Command-line entry points (e.g. plaidcheck)
+│   ├── viewer/                <- Dataset visualization services
+│   └── downloadable_examples/ <- Built-in downloadable example datasets
+├── tests/                     <- Test suite
+├── docs/                      <- Documentation source
+└── examples/                  <- Usage examples
+```
+
+> Note: the v1.0.0 reorganization removed the top-level `Dataset` re-export and the
+> `bridges/`, `pipelines/`, `post/` and `examples/` source packages. Data is now handled
+> through `Sample` objects and the `storage` layer. See `docs/source/upgrade_guide.md`.
+
+## Architecture and key concepts
+
+### Core abstractions
+
+| Concept | Module | Description |
+|---------|--------|-------------|
+| `ProblemDefinition` | `problem_definition.py` | Declares fields, meshes, and their roles (input/output/context) for a physics problem |
+| `Infos` | `infos.py` | Metadata describing a dataset/problem (legal, data production, etc.) |
+| `Sample` | `containers/sample.py` | One simulation snapshot: mesh + field values (a pydantic `BaseModel`) |
+
+`Sample`, `Infos` and `ProblemDefinition` are re-exported at the top level of the
+`plaid` package, together with the helpers `get_number_of_samples` and `get_sample_ids`
+from `containers/utils.py`.
+
+### Storage pattern
+
+Storage uses a **Registry pattern** (`storage/registry.py`) to dispatch read/write
+operations to the correct backend (zarr, hf_datasets, cgns). Each backend implements
+a `reader.py` and `writer.py` following the backend contract defined in
+`storage/backend_api.py` and the shared interfaces in `storage/common/`.
+Reading/writing a collection of samples is done through this storage layer rather
+than through a dedicated `Dataset` class.
+
+## Code conventions
+
+### Formatting and linting
+
+Ruff is configured in `ruff.toml`:
+- **Line length**: 88 characters
+- **Lint rules**: `D` (docstrings), `E`/`W` (pycodestyle), `F` (pyflakes), `ARG` (unused arguments), `I` (import sorting)
+- **Docstring convention**: Google style
+- **Excluded directories**: `examples/`, `docs/`, `benchmarks/`
+- **Test files**: docstring rules (`D`) and `S101` (assert) are ignored
+
+```bash
+# Check linting
+uv run ruff check .
+
+# Auto-fix
+uv run ruff check --fix .
+
+# Format
+uv run ruff format .
+```
+
+### Type hints
+
+- Required on all public functions and methods
+- Use modern syntax: `list[str]`, `dict[str, int]`, `X | None` (not `Optional[X]`)
+- Never use deprecated `typing.List`, `typing.Dict`, `typing.Optional`
+
+### Docstrings
+
+- Google style (enforced by ruff rule `D` with `convention = "google"`)
+- Required on all public modules, classes, functions, and methods
+- Update docstrings whenever you modify code behavior
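For reference, a Google-style docstring on a small function might look like the sketch below. `scale_values` is a hypothetical helper, not a plaid function.

```python
import math


def scale_values(values: list[float], factor: float = 1.0) -> list[float]:
    """Scale every value by a constant factor.

    Args:
        values: Values to scale.
        factor: Multiplier applied to each value.

    Returns:
        A new list with each value multiplied by ``factor``.

    Raises:
        ValueError: If ``factor`` is NaN or infinite.
    """
    if not math.isfinite(factor):
        raise ValueError(f"factor must be finite, got {factor!r}")
    return [value * factor for value in values]
```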
+
+## Testing
+
+- **Framework**: pytest
+- **Location**: `tests/`
+- **Run all**: `uv run pytest`
+- **Run specific**: `uv run pytest tests/path/to/test_file.py`
+- **With coverage**: `uv run pytest --cov=src`
+
+Guidelines:
+- Write tests for new public functions, classes, and methods
+- Test edge cases and error conditions
+- Use descriptive test names that explain the scenario
+- Mock external dependencies (file I/O, network) to keep tests fast
+- Do not test trivial code or third-party libraries
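The guidelines above could translate into tests like the following sketch. `count_remote_samples` is a hypothetical function invented for this example; the network call it depends on is replaced by a `unittest.mock.Mock`, and the test functions use plain `assert` so that pytest can collect them as written.

```python
from collections.abc import Callable
from unittest.mock import Mock


def count_remote_samples(fetch_ids: Callable[[], list[int]]) -> int:
    """Hypothetical helper: count the sample ids returned by ``fetch_ids``."""
    return len(fetch_ids())


def test_count_remote_samples_returns_zero_for_empty_listing() -> None:
    fetch_ids = Mock(return_value=[])  # stands in for a network call
    assert count_remote_samples(fetch_ids) == 0
    fetch_ids.assert_called_once_with()


def test_count_remote_samples_counts_every_listed_id() -> None:
    fetch_ids = Mock(return_value=[0, 1, 2])
    assert count_remote_samples(fetch_ids) == 3
```

Each test name states the scenario and the expected outcome, so a failure report is readable without opening the test body.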
+
+## Pull request rules
+
+PR titles **must start with one of the following emojis** to indicate the type of change:
+
+| Emoji | Type |
+|-------|------|
+| 🐛 | Bug fix |
+| 📄 | Documentation |
+| 🎉 | New feature or initial commit |
+| 🚀 | Performance or deployment |
+| ♻️ | Refactor or cleanup |
+| 📦 | Packaging or dependency management |
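The title rule is simple enough to check mechanically. The sketch below is an illustration only: `has_valid_prefix` is a hypothetical helper, and nothing in this repository is known to run such a check.

```python
# Allowed leading emojis, taken from the table above.
ALLOWED_PREFIXES: tuple[str, ...] = (
    "🐛",  # bug fix
    "📄",  # documentation
    "🎉",  # new feature or initial commit
    "🚀",  # performance or deployment
    "\u267b",  # recycling symbol; matches with or without U+FE0F
    "📦",  # packaging or dependency management
)


def has_valid_prefix(title: str) -> bool:
    """Return True if the PR title starts with one of the allowed emojis."""
    return title.startswith(ALLOWED_PREFIXES)
```

`str.startswith` accepts a tuple of prefixes, so no loop or regular expression is needed.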
+
+PR checklist (from `.github/pull_request_template.md`):
+- Typing enforced
+- Documentation updated
+- Changelog updated
+- Tests and examples updated
+- Coverage should be 100%
+
+## Commands
+
+```bash
+# Install dependencies
+uv sync
+
+# Run tests
+uv run pytest
+
+# Check linting
+uv run ruff check .
+
+# Auto-fix linting issues
+uv run ruff check --fix .
+
+# Format code
+uv run ruff format .
+
+# Build documentation
+bash docs/generate_doc.sh
+```
+
+## Contribution workflow
+
+When making changes:
+
+1. Read and understand existing code before modifying
+2. Write or update code with type hints
+3. Write unit tests for new functionality
+4. Update docstrings (Google style)
+5. Update the documentation if functionality changed
+6. Run formatter: `uv run ruff format .`
+7. Run linter: `uv run ruff check --fix .`
+8. Run tests: `uv run pytest`
+9. Check if changes are breaking and inform the reviewer if a major version bump is needed
@@ -0,0 +1,37 @@
+# AGENTS.md -- plaid/containers
+
+This module defines the core data container of the PLAID data model.
+
+## Key classes
+
+| Class | File | Description |
+|-------|------|-------------|
+| `Sample` | `sample.py` | Single simulation snapshot containing mesh coordinates and field values. Implemented as a pydantic `BaseModel`. This is the main data container exposed by plaid. |
+| `DefaultManager` | `managers/default_manager.py` | Manages default values and missing data for features within a sample. |
+
+Helper functions live in `utils.py` (e.g. `get_number_of_samples`, `get_sample_ids`)
+and are re-exported at the top level of the `plaid` package.
+
+> Note: the v1.0.0 reorganization removed the `Dataset`, `Features` and
+> `FeatureIdentifier` classes. A collection of samples is now read/written through the
+> `storage` layer rather than a dedicated `Dataset` class. See `docs/source/upgrade_guide.md`.
+
+## Design constraints
+
+- `Sample` is a **value object** built on pydantic -- keep it focused on holding mesh
+  and field data, with minimal business logic. Prefer extracting heavy logic into
+  helper functions or dedicated modules.
+- `DefaultManager` centralizes default/missing-data handling -- do not duplicate this
+  logic inside `Sample`.
+- All containers must support **serialization** through the storage backends
+  (zarr, hf_datasets, cgns).
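The first constraint can be illustrated with a sketch. To stay dependency-free it uses a frozen standard-library `dataclass` as a stand-in for the pydantic model; `SnapshotStub` and `field_l2_norm` are hypothetical names and do not reflect the real `Sample` attributes.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SnapshotStub:
    """Hypothetical data-only container (a stand-in, not plaid's Sample)."""

    coordinates: tuple[tuple[float, float], ...]
    fields: dict[str, tuple[float, ...]] = field(default_factory=dict)


def field_l2_norm(snapshot: SnapshotStub, name: str) -> float:
    """Compute the L2 norm of one field outside the container class."""
    return math.sqrt(sum(v * v for v in snapshot.fields[name]))
```

Keeping the computation in a free function means the container's public surface does not grow each time a new analysis is needed.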
+
+## Downstream impact
+
+`Sample` is part of the public API surface consumed by downstream libraries and end
+users. Any signature change is a **breaking change** that requires a major version bump.
+
+## Testing
+
+Tests are in `tests/`. When modifying a container class, verify that storage round-trips
+(write then read) still produce identical data.
@@ -0,0 +1,52 @@
+# AGENTS.md -- plaid/storage
+
+This module implements the multi-backend storage layer for reading and writing PLAID datasets.
+
+## Architecture
+
+Storage follows a **Registry pattern**:
+
+```
+storage/
+├── registry.py        <- Dispatches to the correct backend based on format
+├── backend_api.py     <- Backend contract (BackendModule Protocol)
+├── reader.py          <- Public read API (delegates to backend readers)
+├── writer.py          <- Public write API (delegates to backend writers)
+├── common/            <- Abstract interfaces and shared utilities
+│   ├── reader.py      <- Base reader interface
+│   ├── writer.py      <- Base writer interface
+│   ├── bridge.py      <- Format conversion helpers
+│   └── preprocessor.py
+├── zarr/              <- Zarr backend (reader.py, writer.py, bridge.py)
+├── hf_datasets/       <- HuggingFace datasets backend (reader.py, writer.py, bridge.py)
+└── cgns/              <- CGNS backend (reader.py, writer.py)
+```
+
+## How it works
+
+1. The **registry** (`registry.py`) holds a `BACKENDS` dict mapping each format name
+   (`"cgns"`, `"hf_datasets"`, `"zarr"`) to its backend class, exposed through
+   `get_backend(name)` and `available_backends()`.
+2. The public `reader.py` and `writer.py` at the top level of `storage/` accept a format parameter and delegate to the appropriate backend.
+3. Each backend exposes a backend class (e.g. `ZarrBackend`, `HFBackend`, `CgnsBackend`)
+   that conforms to the `BackendModule` Protocol in `backend_api.py`, and implements the
+   read/write logic in its `reader.py` and `writer.py`.
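The three steps above can be sketched as follows. Only the names `BACKENDS`, `get_backend`, `available_backends` and `BackendModule` come from this document; the method names `read`/`write`, their signatures, and the `JsonBackend` class are assumptions made for illustration, and the real contract in `backend_api.py` may differ.

```python
import json
from pathlib import Path
from typing import Protocol


class BackendModule(Protocol):
    """Assumed contract; the real Protocol in backend_api.py may differ."""

    @staticmethod
    def write(path: Path, samples: list[dict[str, float]]) -> None: ...

    @staticmethod
    def read(path: Path) -> list[dict[str, float]]: ...


class JsonBackend:
    """Hypothetical stateless backend used only for this illustration."""

    @staticmethod
    def write(path: Path, samples: list[dict[str, float]]) -> None:
        path.write_text(json.dumps(samples))

    @staticmethod
    def read(path: Path) -> list[dict[str, float]]:
        return json.loads(path.read_text())


# Format name -> backend class, mirroring the role of BACKENDS in registry.py.
BACKENDS: dict[str, type[BackendModule]] = {"json": JsonBackend}


def available_backends() -> list[str]:
    """List the registered format names in sorted order."""
    return sorted(BACKENDS)


def get_backend(name: str) -> type[BackendModule]:
    """Look up a backend class, failing loudly on unknown names."""
    try:
        return BACKENDS[name]
    except KeyError:
        choices = available_backends()
        raise ValueError(f"Unknown backend {name!r}; choose from {choices}") from None
```

Because `BackendModule` is a `Protocol`, a backend class satisfies the contract structurally and does not need to inherit from it.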
+
+## Adding a new backend
+
+1. Create a new subdirectory under `storage/` (e.g., `storage/my_format/`).
+2. Implement a backend class conforming to the `BackendModule` Protocol
+   (`backend_api.py`), with its `reader.py` and `writer.py` following the interfaces in `common/`.
+3. Register the new backend by adding it to the `BACKENDS` dict in `registry.py`.
+4. Add round-trip tests (write then read) to verify data integrity.
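Steps 2 to 4 might look like the sketch below. Everything here is hypothetical: `MyFormatBackend`, its `read`/`write` signatures, the one-float-per-line format, and the local `BACKENDS` dict, which only stands in for the one in `registry.py`.

```python
import tempfile
from pathlib import Path


class MyFormatBackend:
    """Hypothetical backend: one float per line, written with repr()."""

    @staticmethod
    def write(path: Path, values: list[float]) -> None:
        # repr() round-trips Python floats exactly, so no precision is lost.
        path.write_text("\n".join(repr(v) for v in values))

    @staticmethod
    def read(path: Path) -> list[float]:
        return [float(line) for line in path.read_text().splitlines()]


# Stand-in for the BACKENDS dict in registry.py (step 3).
BACKENDS: dict[str, type] = {}
BACKENDS["my_format"] = MyFormatBackend


def roundtrip(name: str, values: list[float]) -> list[float]:
    """Write then read ``values`` with the named backend (step 4)."""
    backend = BACKENDS[name]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "data.txt"
        backend.write(path, values)
        return backend.read(path)
```

A round-trip test then reduces to asserting `roundtrip("my_format", values) == values` for representative inputs, including the empty case.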
+
+## Design constraints
+
+- Backends must be **stateless** -- all configuration is passed through function parameters.
+- Read/write operations must preserve **data integrity** exactly (no lossy conversions without explicit user consent).
+- The `common/` interfaces are the **contract** -- do not add backend-specific parameters to the public API without updating the contract first.
+- `zarr` is the primary backend and the most feature-complete. Use it as the reference when implementing others.
+
+## Testing
+
+Each backend should have round-trip tests that write a dataset and read it back, asserting equality. Tests are in `tests/`.