An MCP (Model Context Protocol) server for working with PLAID datasets. It exposes tools that an LLM assistant (such as Claude via Cline) can call to scan raw simulation data, inspect its structure, and convert it into the PLAID format.
PLAID is a format and library for storing and sharing physics simulation datasets. Converting raw simulation outputs (HDF5, NumPy, PLY, CGNS, CSV, ...) into PLAID requires understanding the structure of the raw data and mapping its variables to PLAID features.
This MCP server automates that workflow by providing tools that:
- Scan a raw data directory to detect its layout and candidate simulations
- (Planned) Inspect individual simulation files to identify variables and their shapes
- (Planned) Define a conversion configuration (split mapping, feature mapping, backend)
- (Planned) Execute the conversion using `plaid.storage.save_to_disk`

This project is managed with uv.
```bash
git clone <repo-url>
cd plaid-mcp-server
uv sync
```

Add the following to your MCP client configuration (e.g. Cline's `cline_mcp_settings.json`):
```json
{
  "mcpServers": {
    "plaid-mcp-server": {
      "timeout": 60,
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/plaid-mcp-server",
        "plaid-server"
      ],
      "type": "stdio"
    }
  }
}
```

`scan_raw_dataset` scans a directory of raw simulation files to detect its layout.

Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `raw_dir` | string | Yes | Path to the root directory containing raw simulation data |

Output:
```json
{
  "session_id": "6093e66b",
  "raw_dir": "/data/my_sims",
  "layout": "one_subdir_per_simulation",
  "dominant_extensions": [".h5"],
  "noise_files": ["README.md", "run_info.txt"],
  "candidate_simulations": [
    {"id": 0, "path": "run_001", "files": ["flow.h5", "mesh.cgns"]},
    {"id": 1, "path": "run_002", "files": ["flow.h5", "mesh.cgns"]}
  ],
  "total_candidates": 42,
  "truncated": true
}
```

The tool detects three layout patterns:

- `flat`: all simulation files sit directly in the root directory (e.g. `sim_001.h5`, `sim_002.h5`, ...)
- `one_subdir_per_simulation`: each immediate subdirectory contains one simulation (e.g. `run_001/flow.h5`, `run_002/flow.h5`, ...)
- `nested`: deeper or mixed nesting

The output is capped at 20 candidate simulations. The full list is stored in the session for use by follow-up tools. Noise files (.txt, .md, .log, .DS_Store, etc.) are identified and excluded from candidates.
A session_id is returned and can be referenced in subsequent tool calls.
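
The layout rules and the 20-candidate cap described above can be sketched in a few lines of Python. This is an illustrative approximation, not the server's actual implementation: the function name `scan_layout`, the noise lists, and the handling of the `nested` case are all assumptions.

```python
from pathlib import Path

# Assumed noise markers, taken from the examples listed above; the real
# server may use a different list.
NOISE_SUFFIXES = {".txt", ".md", ".log"}
NOISE_NAMES = {".DS_Store"}
MAX_REPORTED = 20  # cap on candidates returned to the client


def is_noise(path: Path) -> bool:
    return path.name in NOISE_NAMES or path.suffix.lower() in NOISE_SUFFIXES


def scan_layout(raw_dir: str) -> dict:
    """Hypothetical helper: classify raw_dir as flat, one-subdir, or nested."""
    entries = sorted(Path(raw_dir).iterdir())
    noise = [p.name for p in entries if p.is_file() and is_noise(p)]
    root_files = [p for p in entries if p.is_file() and not is_noise(p)]
    subdirs = [p for p in entries if p.is_dir()]

    if root_files and not subdirs:
        layout = "flat"
        candidates = [{"path": p.name, "files": [p.name]} for p in root_files]
    elif subdirs and not root_files and all(
        not any(child.is_dir() for child in d.iterdir()) for d in subdirs
    ):
        layout = "one_subdir_per_simulation"
        candidates = [
            {
                "path": d.name,
                "files": sorted(
                    c.name for c in d.iterdir() if c.is_file() and not is_noise(c)
                ),
            }
            for d in subdirs
        ]
    else:
        # Deeper or mixed nesting needs a recursive walk, omitted in this sketch.
        layout = "nested"
        candidates = []

    return {
        "layout": layout,
        "noise_files": noise,
        "candidate_simulations": [
            {"id": i, **c} for i, c in enumerate(candidates[:MAX_REPORTED])
        ],
        "total_candidates": len(candidates),
        "truncated": len(candidates) > MAX_REPORTED,
    }
```

Here `total_candidates` counts every candidate found, while `candidate_simulations` holds at most 20 entries, which is why the two can disagree when `truncated` is `true`.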
`init_from_disk` loads an existing PLAID dataset from a local directory and returns its metadata.
Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `local_dir` | string | Yes | Path to the local directory containing the saved dataset |
| `splits` | array of strings | No | Splits to load. If omitted, all splits are loaded. |

Output:
```json
{
  "local_dir": "/data/my_plaid_dataset",
  "splits": ["train", "test"],
  "num_samples_per_split": {"train": 500, "test": 100},
  "backend": "hf_datasets",
  "variable_features": ["pressure", "velocity_x"],
  "constant_features": ["reynolds_number"]
}
```

The server maintains an in-memory session registry for the duration of the server process. Each call to `scan_raw_dataset` creates a new session that stores the scan result. This allows follow-up tools to reference the scan without re-running it, using the returned `session_id`.

Sessions are not persisted across server restarts.
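
A minimal sketch of such a registry is shown below. The class names `SessionManager` and `ConversionSession` appear in `session.py`, but the fields, method names, and the way the 8-character ID is generated are assumptions made for illustration only.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConversionSession:
    session_id: str
    scan_result: dict
    plan: Optional[dict] = None  # assumed slot for a later conversion plan


class SessionManager:
    """In-memory registry; its contents vanish when the process exits."""

    def __init__(self) -> None:
        self._sessions: dict = {}

    def create(self, scan_result: dict) -> ConversionSession:
        # Assumption: a short random hex ID, similar in shape to "6093e66b".
        session_id = uuid.uuid4().hex[:8]
        session = ConversionSession(session_id, scan_result)
        self._sessions[session_id] = session
        return session

    def get(self, session_id: str) -> ConversionSession:
        if session_id not in self._sessions:
            raise KeyError(
                f"Unknown session_id {session_id!r}; run scan_raw_dataset first"
            )
        return self._sessions[session_id]
```

Because the registry is a plain dictionary held by the server process, a restart leaves clients holding IDs that no longer resolve, and they need to call `scan_raw_dataset` again.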

Run the tests:

```bash
uv run pytest
```

Format and lint:

```bash
uv run ruff format .
uv run ruff check --fix .
```

Type-check:

```bash
uvx ty check src/
```

Project layout:

```
src/plaid_mcp_server/
├── server.py          # MCP server entry point, tool registration
├── session.py         # SessionManager and ConversionSession
└── tools/
    ├── inspection.py  # inspect_simulation_file tool
    ├── scanning.py    # scan_raw_dataset tool
    └── storage.py     # init_from_disk tool
tests/
└── tools/
    ├── test_inspection.py
    └── test_scanning.py
```

`inspect_simulation_file` opens a single raw simulation file and reports its variables, shapes, and data types.
Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `file_path` | string | Yes | Path to the simulation file to inspect |

Supported formats: .npy, .npz, .ply, .csv, .h5, .hdf5, .vtu, .vtp, .vtk
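
For the simplest of these formats, `.npy`, the inspection can be approximated as follows. This is a sketch that assumes NumPy is installed; the helper name `inspect_npy` is hypothetical and the tool's real code path may differ.

```python
from pathlib import Path

import numpy as np


def inspect_npy(file_path: str) -> dict:
    """Hypothetical helper: report shape and dtype of a .npy file."""
    # mmap_mode="r" maps the file instead of reading it, so a large array is
    # not loaded into memory just to report its shape and dtype.
    arr = np.load(file_path, mmap_mode="r")
    shape = list(arr.shape)
    dtype = str(arr.dtype)
    return {
        "file": file_path,
        "format": "npy",
        # Assumption: the variable is named after the file stem.
        "variables": [{"name": Path(file_path).stem, "shape": shape, "dtype": dtype}],
        "summary": f"NumPy array: shape={shape}, dtype={dtype}",
    }
```
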
Output examples:
```json
{
  "file": "/data/sim/press.npy",
  "format": "npy",
  "variables": [{"name": "press", "shape": [3682], "dtype": "float64"}],
  "summary": "NumPy array: shape=[3682], dtype=float64"
}
```

```json
{
  "file": "/data/sim/tri_mesh.ply",
  "format": "ply",
  "variables": [
    {"name": "vertex", "count": 3586, "properties": [{"name": "x", "dtype": "=f8"}, ...]},
    {"name": "face", "count": 7168, "properties": [{"name": "vertex_indices", "dtype": "|O"}]}
  ],
  "summary": "PLY file: 2 element(s) — vertex(3586), face(7168)"
}
```

```json
{
  "file": "/data/sim.h5",
  "format": "hdf5",
  "variables": [
    {"name": "pressure", "shape": [100], "dtype": "float32"},
    {"name": "velocity/u", "shape": [100], "dtype": "float64"}
  ],
  "summary": "HDF5 file: 2 dataset(s) — pressure, velocity/u"
}
```

`propose_conversion_plan` analyses the scan result stored in a session and produces a structured PLAID conversion plan, then writes a ready-to-run Python conversion script to `output_script_path`.
Infers: sample semantics (static vs temporal), mesh format and element type, PLAID CGNS feature path identifiers, and split definitions from sidecar files (train.txt, test.txt, ...).
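
As an illustration of the split inference, the sketch below reads sidecar files and builds a summary shaped like the `splits` field in the output. It assumes each sidecar lists one simulation identifier per line and that only the names in `SPLIT_NAMES` are considered; both are assumptions, and `read_splits` is a hypothetical helper.

```python
from pathlib import Path

# Assumed candidate split names; the source only mentions train.txt and test.txt.
SPLIT_NAMES = ("train", "val", "test")


def read_splits(raw_dir: str) -> dict:
    """Hypothetical helper: summarize <split>.txt sidecar files in raw_dir."""
    splits = {}
    for name in SPLIT_NAMES:
        sidecar = Path(raw_dir) / f"{name}.txt"
        if not sidecar.is_file():
            continue
        # Assumption: one simulation identifier per non-empty line.
        ids = [line.strip() for line in sidecar.read_text().splitlines() if line.strip()]
        splits[name] = {"source": sidecar.name, "count": len(ids), "ids": ids}
    return splits
```
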
Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `session_id` | string | Yes | Session ID from a previous `scan_raw_dataset` call. |
| `output_script_path` | string | Yes | Path where the generated conversion script is written. |
| `dataset_name` | string | No | Human-readable name used in the script header (default: `"dataset"`). |

Output:
```json
{
  "session_id": "6093e66b",
  "sample_semantics": "static",
  "closest_example": "shapenetcar.py",
  "external_dependencies": ["plyfile", "Muscat"],
  "mesh": {"format": "ply", "element_type": "Triangle_3"},
  "features": {
    "input": ["Base_3_3/Zone/GridCoordinates/CoordinateX", "..."],
    "output": ["Base_3_3/Zone/VertexFields/press"],
    "constant": ["Base_3_3/Zone/Elements_Triangle_3/ElementConnectivity", "..."]
  },
  "splits": {"train": {"source": "train.txt", "count": 2900}, "test": {"source": "test.txt", "count": 100}},
  "edge_cases": ["PLY watertight meshes may have fewer vertices than the pressure array..."],
  "backends": ["hf_datasets", "cgns", "zarr"],
  "generated_script_path": "/path/to/convert.py"
}
```

The generated script is a complete, runnable Python file. Edit `RAW_DATA_DIR`, `OUTPUT_DIR`, and the field loading block as needed before running.

Executes the conversion script generated by propose_conversion_plan. Patches OUTPUT_DIR and BACKEND in the script, runs it in a subprocess, and returns the status.
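
The patch-and-run step can be sketched roughly as below. It assumes `OUTPUT_DIR` and `BACKEND` are plain top-level assignments in the generated script; the helper names and the `"error"` status value are illustrative assumptions, not the tool's confirmed behavior.

```python
import re
import subprocess
import sys
from pathlib import Path


def patch_constant(source: str, name: str, value: str) -> str:
    """Replace the first top-level `NAME = ...` line with a new string value."""
    pattern = re.compile(rf"^{re.escape(name)}\s*=.*$", flags=re.MULTILINE)
    # A function replacement avoids backslash-escape surprises in paths.
    patched, count = pattern.subn(lambda _m: f"{name} = {value!r}", source, count=1)
    if count == 0:
        raise ValueError(f"{name} is not assigned at the top level of the script")
    return patched


def run_conversion(script_path: str, output_dir: str, backend: str = "hf_datasets") -> dict:
    """Hypothetical helper: patch the script in place, then execute it."""
    script = Path(script_path)
    source = script.read_text()
    source = patch_constant(source, "OUTPUT_DIR", output_dir)
    source = patch_constant(source, "BACKEND", backend)
    script.write_text(source)

    # Use the current interpreter so the script sees the same environment.
    result = subprocess.run(
        [sys.executable, str(script)], capture_output=True, text=True
    )
    ok = result.returncode == 0
    return {
        "output_dir": output_dir,
        "backend": backend,
        "script_path": str(script),
        "status": "success" if ok else "error",  # "error" is an assumed value
        "message": "Conversion completed successfully." if ok else result.stderr.strip(),
    }
```

Running the script in a subprocess keeps a crash or a long-running conversion from taking down the server process itself.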
Input:
| Parameter | Type | Required | Description |
|---|---|---|---|
| `session_id` | string | Yes | Session ID with a completed conversion plan. |
| `output_dir` | string | Yes | Directory where the PLAID dataset will be written. |
| `backend` | string | No | PLAID storage backend: `hf_datasets`, `cgns`, or `zarr` (default: `hf_datasets`). |

Output:
```json
{
  "session_id": "6093e66b",
  "output_dir": "/data/plaid/shapenetcar",
  "backend": "hf_datasets",
  "script_path": "/path/to/convert.py",
  "status": "success",
  "message": "Conversion completed successfully."
}
```

- `get_conversion_status`: query the progress of an ongoing conversion job