209 changes: 209 additions & 0 deletions AGENTS.md
@@ -0,0 +1,209 @@
# AGENTS.md -- plaid (pyplaid)

## Project identity

**plaid** is the foundational data model library of the [PLAID ecosystem](https://github.com/PLAID-lib).
Published on PyPI as `pyplaid`, it provides a structured format for representing physics
simulation data (meshes, fields, boundary conditions) and abstracts storage backends
(zarr, HuggingFace datasets, CGNS).

Other libraries in the ecosystem depend on plaid.

## Expected agent behavior

### Role

You are a senior Python developer with experience in scientific computing, data modeling,
and open-source library design. You prioritize backward compatibility and clean abstractions.

### Decision priorities

1. **Backward compatibility** > new features -- this is a foundational library, so breaking downstream users is costly
2. **Correctness** > performance -- data integrity in scientific computing is non-negotiable
3. **Readability** > cleverness -- contributors come from diverse scientific backgrounds

### When in doubt

- Do not change public API signatures without explicit approval
- Prefer adding new optional parameters with sensible defaults
- Check if the change impacts downstream consumers of plaid
- Run the full test suite before proposing changes
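
A minimal sketch of the "optional parameter with a sensible default" approach. `summarize_field` and its `ignore_nan` flag are hypothetical and not part of plaid; the point is only that existing call sites keep their behavior.

```python
import math


def summarize_field(values: list[float], *, ignore_nan: bool = False) -> float:
    """Return the mean of a field (hypothetical helper, not part of plaid).

    Args:
        values: Field values to average.
        ignore_nan: New optional flag. The default keeps the previous behavior.

    Returns:
        The arithmetic mean of the selected values.
    """
    if ignore_nan:
        values = [v for v in values if not math.isnan(v)]
    return sum(values) / len(values)


# Existing call sites keep working unchanged.
legacy_mean = summarize_field([1.0, 2.0, 3.0])
# New callers opt in explicitly through the keyword-only flag.
clean_mean = summarize_field([1.0, float("nan"), 3.0], ignore_nan=True)
```

Making the new parameter keyword-only avoids clashes with positional arguments that callers may already pass.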

### Confidentiality

plaid is a **public** repository. Some downstream libraries in the PLAID ecosystem are private.
Never mention private repository names, internal project names, or confidential details
in any public-facing content (code comments, docstrings, commit messages, PR descriptions,
issues, or documentation).

### Communication rules

- All interactions on this repository (issues, PRs, reviews, comments) must be in **English**.
- Be direct and concise. Avoid compliments, flattery, or filler sentences.

## Tech stack

- **Language**: Python 3.11--3.13
- **Package manager**: uv (with `pyproject.toml`)
- **Build backend**: setuptools with setuptools-scm (dynamic versioning)
- **Linter/formatter**: ruff
- **Test framework**: pytest
- **Documentation**: Zensical (with mkdocstrings for the API reference), published on ReadTheDocs
- **CI/CD**: GitHub Actions

## Project structure

```
.
├── AGENTS.md <- This file
├── pyproject.toml <- Dependencies and project metadata
├── ruff.toml <- Ruff linter/formatter configuration
├── CHANGELOG.md <- Version history
├── CONTRIBUTING.md <- Contribution guidelines
├── src/plaid/ <- Source code
│ ├── __init__.py <- Public API: Sample, Infos, ProblemDefinition
│ ├── constants.py <- Global constants
│ ├── problem_definition.py <- ProblemDefinition (core concept)
│ ├── infos.py <- Infos (dataset/problem metadata)
│ ├── containers/ <- Sample container + helpers (see nested AGENTS.md)
│ ├── storage/ <- Storage backends: zarr, hf_datasets, cgns (see nested AGENTS.md)
│ ├── types/ <- Shared type aliases and definitions
│ ├── cli/ <- Command-line entry points (e.g. plaidcheck)
│ ├── viewer/ <- Dataset visualization services
│ └── downloadable_examples/ <- Built-in downloadable example datasets
├── tests/ <- Test suite
├── docs/ <- Documentation source
└── examples/ <- Usage examples
```

> Note: the v1.0.0 reorganization removed the top-level `Dataset` re-export and the
> `bridges/`, `pipelines/`, `post/` and `examples/` source packages. Data is now handled
> through `Sample` objects and the `storage` layer. See `docs/source/upgrade_guide.md`.

## Architecture and key concepts

### Core abstractions

| Concept | Module | Description |
|---------|--------|-------------|
| `ProblemDefinition` | `problem_definition.py` | Declares fields, meshes, and their roles (input/output/context) for a physics problem |
| `Infos` | `infos.py` | Metadata describing a dataset/problem (legal, data production, etc.) |
| `Sample` | `containers/sample.py` | One simulation snapshot: mesh + field values (a pydantic `BaseModel`) |

`Sample`, `Infos` and `ProblemDefinition` are re-exported at the top level of the
`plaid` package, together with the helpers `get_number_of_samples` and `get_sample_ids`
from `containers/utils.py`.

### Storage pattern

Storage uses a **Registry pattern** (`storage/registry.py`) to dispatch read/write
operations to the correct backend (zarr, hf_datasets, cgns). Each backend implements
a `reader.py` and `writer.py` following the backend contract defined in
`storage/backend_api.py` and the shared interfaces in `storage/common/`.
Reading/writing a collection of samples is done through this storage layer rather
than through a dedicated `Dataset` class.

## Code conventions

### Formatting and linting

Ruff is configured in `ruff.toml`:
- **Line length**: 88 characters
- **Lint rules**: `D` (docstrings), `E`/`W` (pycodestyle), `F` (pyflakes), `ARG` (unused arguments), `I` (import sorting)
- **Docstring convention**: Google style
- **Excluded directories**: `examples/`, `docs/`, `benchmarks/`
- **Test files**: docstring rules (`D`) and `S101` (assert) are ignored

```bash
# Check linting
uv run ruff check .

# Auto-fix
uv run ruff check --fix .

# Format
uv run ruff format .
```

### Type hints

- Required on all public functions and methods
- Use modern syntax: `list[str]`, `dict[str, int]`, `X | None` (not `Optional[X]`)
- Never use deprecated `typing.List`, `typing.Dict`, `typing.Optional`

### Docstrings

- Google style (enforced by ruff rule `D` with `convention = "google"`)
- Required on all public modules, classes, functions, and methods
- Update docstrings whenever you modify code behavior
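
A sketch of the expected docstring layout, using a hypothetical `scale_field` function that does not exist in plaid:

```python
import math


def scale_field(values: list[float], factor: float = 1.0) -> list[float]:
    """Scale every value of a field by a constant factor.

    Args:
        values: Field values to scale.
        factor: Multiplicative factor applied to each value.

    Returns:
        A new list with the scaled values; the input list is not modified.

    Raises:
        ValueError: If ``factor`` is not finite.
    """
    if not math.isfinite(factor):
        raise ValueError(f"factor must be finite, got {factor!r}")
    return [v * factor for v in values]
```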

## Testing

- **Framework**: pytest
- **Location**: `tests/`
- **Run all**: `uv run pytest`
- **Run specific**: `uv run pytest tests/path/to/test_file.py`
- **With coverage**: `uv run pytest --cov=src`

Guidelines:
- Write tests for new public functions, classes, and methods
- Test edge cases and error conditions
- Use descriptive test names that explain the scenario
- Mock external dependencies (file I/O, network) to keep tests fast
- Do not test trivial code or third-party libraries
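
These guidelines could look as follows in practice. The helper `count_sample_files` is hypothetical; file access is mocked with the standard library's `unittest.mock`, so the tests never touch the disk.

```python
from pathlib import Path
from unittest.mock import mock_open, patch


def count_sample_files(index_path: Path) -> int:
    """Count the non-empty lines of an index file (hypothetical helper)."""
    with open(index_path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.strip())


def test_count_sample_files_ignores_blank_lines() -> None:
    fake_file = mock_open(read_data="sample_0\n\nsample_1\n")
    with patch("builtins.open", fake_file):
        assert count_sample_files(Path("index.txt")) == 2


def test_count_sample_files_returns_zero_for_empty_index() -> None:
    # Edge case: an empty index file contains no samples.
    with patch("builtins.open", mock_open(read_data="")):
        assert count_sample_files(Path("index.txt")) == 0
```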

## Pull request rules

PR titles **must start with one of the following emojis** to indicate the type of change:

| Emoji | Type |
|-------|------|
| 🐛 | Bug fix |
| 📄 | Documentation |
| 🎉 | New feature or initial commit |
| 🚀 | Performance or deployment |
| ♻️ | Refactor or cleanup |
| 📦 | Packaging or dependency management |

PR checklist (from `.github/pull_request_template.md`):
- Typing enforced
- Documentation updated
- Changelog updated
- Tests and example updates
- Coverage should be 100%

## Commands

```bash
# Install dependencies
uv sync

# Run tests
uv run pytest

# Check linting
uv run ruff check .

# Auto-fix linting issues
uv run ruff check --fix .

# Format code
uv run ruff format .

# Build documentation
bash docs/generate_doc.sh
```

## Contribution workflow

When making changes:

1. Read and understand existing code before modifying
2. Write or update code with type hints
3. Write unit tests for new functionality
4. Update docstrings (Google style)
5. Update the documentation if functionality changed
6. Run formatter: `uv run ruff format .`
7. Run linter: `uv run ruff check --fix .`
8. Run tests: `uv run pytest`
9. Check if changes are breaking and inform the reviewer if a major version bump is needed
37 changes: 37 additions & 0 deletions src/plaid/containers/AGENTS.md
@@ -0,0 +1,37 @@
# AGENTS.md -- plaid/containers

This module defines the core data container of the PLAID data model.

## Key classes

| Class | File | Description |
|-------|------|-------------|
| `Sample` | `sample.py` | Single simulation snapshot containing mesh coordinates and field values. Implemented as a pydantic `BaseModel`. This is the main data container exposed by plaid. |
| `DefaultManager` | `managers/default_manager.py` | Manages default values and missing data for features within a sample. |

Helper functions live in `utils.py` (e.g. `get_number_of_samples`, `get_sample_ids`)
and are re-exported at the top level of the `plaid` package.

> Note: the v1.0.0 reorganization removed the `Dataset`, `Features` and
> `FeatureIdentifier` classes. A collection of samples is now read/written through the
> `storage` layer rather than a dedicated `Dataset` class. See `docs/source/upgrade_guide.md`.

## Design constraints

- `Sample` is a **value object** built on pydantic -- keep it focused on holding mesh
and field data, with minimal business logic. Prefer extracting heavy logic into
helper functions or dedicated modules.
- `DefaultManager` centralizes default/missing-data handling -- do not duplicate this
logic inside `Sample`.
- All containers must support **serialization** through the storage backends
(zarr, hf_datasets, cgns).
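
A sketch of the first constraint, keeping heavy logic outside the container. `MiniSample` and `field_l2_norm` are hypothetical; the real `Sample` is a pydantic `BaseModel`, and a frozen dataclass is used here only as a dependency-free stand-in.

```python
import math
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MiniSample:
    """Stand-in value object holding mesh nodes and field values."""

    nodes: tuple[tuple[float, float], ...]
    fields: dict[str, tuple[float, ...]] = field(default_factory=dict)


def field_l2_norm(sample: MiniSample, name: str) -> float:
    """Compute the L2 norm of a field outside of the container class."""
    return math.sqrt(sum(v * v for v in sample.fields[name]))


sample = MiniSample(
    nodes=((0.0, 0.0), (1.0, 0.0)),
    fields={"pressure": (3.0, 4.0)},
)
norm = field_l2_norm(sample, "pressure")
```

The container only stores data; the computation lives in a free function that can be tested and evolved without touching the container's signature.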

## Downstream impact

`Sample` is part of the public API surface consumed by downstream libraries and end
users. Any signature change is a **breaking change** that requires a major version bump.

## Testing

Tests are in `tests/`. When modifying a container class, verify that storage round-trips
(write then read) still produce identical data.
52 changes: 52 additions & 0 deletions src/plaid/storage/AGENTS.md
@@ -0,0 +1,52 @@
# AGENTS.md -- plaid/storage

This module implements the multi-backend storage layer for reading and writing PLAID datasets.

## Architecture

Storage follows a **Registry pattern**:

```
storage/
├── registry.py <- Dispatches to the correct backend based on format
├── backend_api.py <- Backend contract (BackendModule Protocol)
├── reader.py <- Public read API (delegates to backend readers)
├── writer.py <- Public write API (delegates to backend writers)
├── common/ <- Abstract interfaces and shared utilities
│ ├── reader.py <- Base reader interface
│ ├── writer.py <- Base writer interface
│ ├── bridge.py <- Format conversion helpers
│ └── preprocessor.py
├── zarr/ <- Zarr backend (reader.py, writer.py, bridge.py)
├── hf_datasets/ <- HuggingFace datasets backend (reader.py, writer.py, bridge.py)
└── cgns/ <- CGNS backend (reader.py, writer.py)
```

## How it works

1. The **registry** (`registry.py`) holds a `BACKENDS` dict mapping each format name
(`"cgns"`, `"hf_datasets"`, `"zarr"`) to its backend class, exposed through
`get_backend(name)` and `available_backends()`.
2. The public `reader.py` and `writer.py` at the top level of `storage/` accept a format parameter and delegate to the appropriate backend.
3. Each backend exposes a backend class (e.g. `ZarrBackend`, `HFBackend`, `CgnsBackend`)
that conforms to the `BackendModule` Protocol in `backend_api.py`, and implements the
read/write logic in its `reader.py` and `writer.py`.
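
The dispatch described above can be sketched as follows. The names `BACKENDS`, `get_backend`, `available_backends`, `BackendModule` and the backend class names come from this document; the signatures, the `name` attribute and the error handling are assumptions, and the real code in `registry.py` may differ.

```python
from typing import Protocol


class BackendModule(Protocol):
    """Toy contract; the members of plaid's real Protocol are not shown."""

    name: str


class ZarrBackend:
    name = "zarr"


class HFBackend:
    name = "hf_datasets"


class CgnsBackend:
    name = "cgns"


# Maps each format name to its backend class.
BACKENDS: dict[str, type[BackendModule]] = {
    "cgns": CgnsBackend,
    "hf_datasets": HFBackend,
    "zarr": ZarrBackend,
}


def available_backends() -> list[str]:
    """Return the registered format names in sorted order."""
    return sorted(BACKENDS)


def get_backend(name: str) -> type[BackendModule]:
    """Return the backend class registered under ``name``."""
    try:
        return BACKENDS[name]
    except KeyError:
        known = ", ".join(available_backends())
        message = f"Unknown backend {name!r}; expected one of: {known}"
        raise ValueError(message) from None
```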

## Adding a new backend

1. Create a new subdirectory under `storage/` (e.g., `storage/my_format/`).
2. Implement a backend class conforming to the `BackendModule` Protocol
(`backend_api.py`), with its `reader.py` and `writer.py` following the interfaces in `common/`.
3. Register the new backend by adding it to the `BACKENDS` dict in `registry.py`.
4. Add round-trip tests (write then read) to verify data integrity.
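
A toy walk-through of steps 2 to 4, under stated assumptions: `MyFormatBackend` and its `read`/`write` methods are hypothetical, JSON stands in for a real storage format, and the local `BACKENDS` dict stands in for the one in `registry.py`. The actual `BackendModule` Protocol defines the methods a real backend must provide.

```python
import json
import tempfile
from pathlib import Path


class MyFormatBackend:
    """Hypothetical stateless backend: every input is a function parameter."""

    name = "my_format"

    @staticmethod
    def write(path: Path, samples: list[dict[str, list[float]]]) -> None:
        """Write samples to ``path`` as JSON."""
        path.write_text(json.dumps(samples), encoding="utf-8")

    @staticmethod
    def read(path: Path) -> list[dict[str, list[float]]]:
        """Read back the samples stored at ``path``."""
        return json.loads(path.read_text(encoding="utf-8"))


# Stand-in for the BACKENDS dict of registry.py (step 3).
BACKENDS: dict[str, type] = {"my_format": MyFormatBackend}


def test_my_format_round_trip_preserves_samples() -> None:
    samples = [{"pressure": [1.0, 2.5]}, {"pressure": [0.0, -3.25]}]
    backend = BACKENDS["my_format"]
    with tempfile.TemporaryDirectory() as tmp_dir:
        target = Path(tmp_dir) / "dataset.json"
        backend.write(target, samples)
        # Step 4: what is read back must equal what was written.
        assert backend.read(target) == samples
```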

## Design constraints

- Backends must be **stateless** -- all configuration is passed through function parameters.
- Read/write operations must preserve **data integrity** exactly (no lossy conversions without explicit user consent).
- The `BackendModule` Protocol (`backend_api.py`) and the `common/` interfaces are the **contract** -- do not add backend-specific parameters to the public API without updating the contract first.
- `zarr` is the primary backend and the most feature-complete. Use it as the reference when implementing others.

## Testing

Each backend should have round-trip tests that write a dataset and read it back, asserting equality. Tests are in `tests/`.