
PLAID-lib/plaid-mcp-server


plaid-mcp-server

An MCP (Model Context Protocol) server for working with PLAID datasets. It exposes tools that an LLM assistant (such as Claude via Cline) can call to scan raw simulation data, inspect its structure, and convert it into the PLAID format.

Overview

PLAID is a format and library for storing and sharing physics simulation datasets. Converting raw simulation outputs (HDF5, NumPy, PLY, CGNS, CSV, ...) into PLAID requires understanding the structure of the raw data and mapping its variables to PLAID features.

This MCP server automates that workflow with tools that:

  1. Scan a raw data directory to detect its layout and candidate simulations (scan_raw_dataset)
  2. Inspect individual simulation files to identify their variables, shapes, and data types (inspect_simulation_file)
  3. Propose a conversion plan (split mapping, feature mapping, backend) and generate a conversion script (propose_conversion_plan)
  4. Execute the conversion using plaid.storage.save_to_disk (run_conversion)

Installation

This project is managed with uv.

git clone <repo-url>
cd plaid-mcp-server
uv sync

MCP Server Configuration

Add the following to your MCP client configuration (e.g. Cline's cline_mcp_settings.json):

{
  "mcpServers": {
    "plaid-mcp-server": {
      "timeout": 60,
      "command": "uv",
      "args": [
        "run",
        "--directory",
        "/path/to/plaid-mcp-server",
        "plaid-server"
      ],
      "type": "stdio"
    }
  }
}

Available Tools

scan_raw_dataset

Scans a directory of raw simulation files to detect its layout.

Input:

| Parameter | Type   | Required | Description |
|-----------|--------|----------|-------------|
| raw_dir   | string | Yes      | Path to the root directory containing raw simulation data |

Output:

{
  "session_id": "6093e66b",
  "raw_dir": "/data/my_sims",
  "layout": "one_subdir_per_simulation",
  "dominant_extensions": [".h5"],
  "noise_files": ["README.md", "run_info.txt"],
  "candidate_simulations": [
    {"id": 0, "path": "run_001", "files": ["flow.h5", "mesh.cgns"]},
    {"id": 1, "path": "run_002", "files": ["flow.h5", "mesh.cgns"]}
  ],
  "total_candidates": 42,
  "truncated": true
}

The tool detects three layout patterns:

  • flat: all simulation files sit directly in the root directory (e.g. sim_001.h5, sim_002.h5, ...)
  • one_subdir_per_simulation: each immediate subdirectory contains one simulation (e.g. run_001/flow.h5, run_002/flow.h5, ...)
  • nested: simulations sit more than one directory level below the root, or at mixed depths
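The three patterns can be told apart by looking only at the root directory and its immediate children. The function below is a minimal sketch under that assumption; detect_layout, is_noise, and the noise suffix set are hypothetical names, not the server's actual code.

```python
from pathlib import Path

# Assumed noise rules; the README lists .txt, .md, .log, .DS_Store "etc."
NOISE_SUFFIXES = {".txt", ".md", ".log"}
NOISE_NAMES = {".DS_Store"}


def is_noise(path: Path) -> bool:
    """Return True for files that are not simulation data (hypothetical rule)."""
    return path.name in NOISE_NAMES or path.suffix.lower() in NOISE_SUFFIXES


def detect_layout(raw_dir: str) -> str:
    """Classify raw_dir as 'flat', 'one_subdir_per_simulation', or 'nested'."""
    root = Path(raw_dir)
    # Ignore noise files at the root so a README does not change the verdict.
    entries = [p for p in root.iterdir() if not (p.is_file() and is_noise(p))]
    files = [p for p in entries if p.is_file()]
    subdirs = [p for p in entries if p.is_dir()]

    if files and not subdirs:
        return "flat"
    if subdirs and not files:
        # One simulation per subdirectory: each subdirectory holds files only.
        if all(all(child.is_file() for child in d.iterdir()) for d in subdirs):
            return "one_subdir_per_simulation"
    # Deeper directories, or files and directories mixed at the root.
    return "nested"
```

Only the first two levels are examined, which keeps the check cheap on large datasets; anything that does not fit the two simple patterns falls through to nested.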

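The output lists at most 20 candidate simulations: total_candidates gives the full count, and truncated is true when the list was cut. The full list is stored in the session for use by follow-up tools. Noise files (.txt, .md, .log, .DS_Store, etc.) are reported under noise_files and excluded from the candidates.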
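The capping and noise filtering can be sketched as a pure function over file names. This assumes a flat layout (one file per candidate); summarize_flat_scan, is_noise, and MAX_CANDIDATES are illustrative names. The capped response is returned together with the full list, which stands in for what the server keeps in the session.

```python
from pathlib import PurePath
from typing import Dict, List, Tuple

MAX_CANDIDATES = 20  # cap stated in this README
NOISE_SUFFIXES = {".txt", ".md", ".log"}  # assumed subset; the README says "etc."
NOISE_NAMES = {".DS_Store"}


def is_noise(name: str) -> bool:
    p = PurePath(name)
    return p.name in NOISE_NAMES or p.suffix.lower() in NOISE_SUFFIXES


def summarize_flat_scan(
    filenames: List[str], limit: int = MAX_CANDIDATES
) -> Tuple[Dict, List[Dict]]:
    """Return (capped tool response, full candidate list) for a flat layout."""
    noise = sorted(n for n in filenames if is_noise(n))
    data = sorted(n for n in filenames if not is_noise(n))
    candidates = [{"id": i, "path": n, "files": [n]} for i, n in enumerate(data)]
    response = {
        "noise_files": noise,
        "candidate_simulations": candidates[:limit],  # what the LLM sees
        "total_candidates": len(candidates),
        "truncated": len(candidates) > limit,
    }
    return response, candidates  # the full list would go into the session
```

Capping the response keeps the tool output small enough for the model's context, while the untruncated list remains available to later tools.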

A session_id is returned and can be referenced in subsequent tool calls.

init_from_disk

Loads an existing PLAID dataset from a local directory and returns its metadata.

Input:

| Parameter | Type             | Required | Description |
|-----------|------------------|----------|-------------|
| local_dir | string           | Yes      | Path to the local directory containing the saved dataset |
| splits    | array of strings | No       | Splits to load. If omitted, all splits are loaded. |

Output:

{
  "local_dir": "/data/my_plaid_dataset",
  "splits": ["train", "test"],
  "num_samples_per_split": {"train": 500, "test": 100},
  "backend": "hf_datasets",
  "variable_features": ["pressure", "velocity_x"],
  "constant_features": ["reynolds_number"]
}

Session Manager

The server keeps an in-memory session registry for the lifetime of the server process. Each call to scan_raw_dataset creates a new session that stores the scan result, so follow-up tools can reuse the scan through the returned session_id instead of re-running it. propose_conversion_plan then records its plan in that session, and run_conversion expects a session with a completed plan.

Sessions are not persisted across server restarts.
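A registry with this behavior needs little more than a dictionary keyed by session ID. The sketch below borrows the SessionManager and ConversionSession names from session.py, but its fields and methods (create, get, scan_result, plan) are assumptions, not the project's actual API.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ConversionSession:
    """Hypothetical session record; field names are illustrative."""

    session_id: str
    scan_result: Dict[str, Any]
    plan: Optional[Dict[str, Any]] = None  # filled in by a later planning step


class SessionManager:
    """Process-local registry: sessions vanish when the server process exits."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ConversionSession] = {}

    def create(self, scan_result: Dict[str, Any]) -> ConversionSession:
        # Short hex IDs in the style of "6093e66b"; retry on the rare collision.
        session_id = uuid.uuid4().hex[:8]
        while session_id in self._sessions:
            session_id = uuid.uuid4().hex[:8]
        session = ConversionSession(session_id=session_id, scan_result=scan_result)
        self._sessions[session_id] = session
        return session

    def get(self, session_id: str) -> ConversionSession:
        if session_id not in self._sessions:
            raise KeyError(f"unknown session_id: {session_id!r}")
        return self._sessions[session_id]
```

Because the dictionary lives in the server process, restarting the server discards every session, which matches the behavior described above.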

Development

Running tests

uv run pytest

Linting and formatting

uv run ruff format .
uv run ruff check --fix .

Type checking

uvx ty check src/

Project structure

src/plaid_mcp_server/
├── server.py           # MCP server entry point, tool registration
├── session.py          # SessionManager and ConversionSession
└── tools/
    ├── inspection.py   # inspect_simulation_file tool
    ├── scanning.py     # scan_raw_dataset tool
    └── storage.py      # init_from_disk tool

tests/
└── tools/
    ├── test_inspection.py
    └── test_scanning.py

Available Tools (continued)

inspect_simulation_file

Opens a single raw simulation file and reports its variables, shapes, and data types.

Input:

Parameter Type Required Description
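| Parameter | Type   | Required | Description |
|-----------|--------|----------|-------------|
| file_path | string | Yes      | Path to the simulation file to inspect |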

Supported formats: .npy, .npz, .ply, .csv, .h5, .hdf5, .vtu, .vtp, .vtk

Output examples:

{
  "file": "/data/sim/press.npy",
  "format": "npy",
  "variables": [{"name": "press", "shape": [3682], "dtype": "float64"}],
  "summary": "NumPy array: shape=[3682], dtype=float64"
}

{
  "file": "/data/sim/tri_mesh.ply",
  "format": "ply",
  "variables": [
    {"name": "vertex", "count": 3586, "properties": [{"name": "x", "dtype": "=f8"}, ...]},
    {"name": "face", "count": 7168, "properties": [{"name": "vertex_indices", "dtype": "|O"}]}
  ],
  "summary": "PLY file: 2 element(s) — vertex(3586), face(7168)"
}

{
  "file": "/data/sim.h5",
  "format": "hdf5",
  "variables": [
    {"name": "pressure", "shape": [100], "dtype": "float32"},
    {"name": "velocity/u", "shape": [100], "dtype": "float64"}
  ],
  "summary": "HDF5 file: 2 dataset(s) — pressure, velocity/u"
}
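For the NumPy formats, the npy example above can be reproduced with numpy alone. This is a sketch assuming numpy is installed; inspect_numpy_file is a hypothetical helper, and its .npz summary string is invented because no .npz example is documented.

```python
from pathlib import Path

import numpy as np


def inspect_numpy_file(file_path: str) -> dict:
    """Describe a .npy or .npz file using the output layout shown above (sketch)."""
    path = Path(file_path)
    suffix = path.suffix.lower()
    variables = []
    if suffix == ".npy":
        # mmap_mode="r" parses the header and maps the data instead of reading it.
        arr = np.load(path, mmap_mode="r")
        variables.append({"name": path.stem, "shape": list(arr.shape), "dtype": str(arr.dtype)})
        del arr  # drop the memory map so the file can be released
        v = variables[0]
        summary = f"NumPy array: shape={v['shape']}, dtype={v['dtype']}"
    elif suffix == ".npz":
        with np.load(path) as archive:
            for name in archive.files:
                a = archive[name]  # loads this member into memory
                variables.append({"name": name, "shape": list(a.shape), "dtype": str(a.dtype)})
        # Hypothetical summary format for archives.
        summary = f"NumPy archive: {len(variables)} array(s): " + ", ".join(
            v["name"] for v in variables
        )
    else:
        raise ValueError(f"not a NumPy file: {path.name}")
    return {"file": str(path), "format": suffix.lstrip("."), "variables": variables, "summary": summary}
```

Memory-mapping the .npy case keeps inspection cheap even for large field arrays, since only the header is needed to report shape and dtype.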

propose_conversion_plan

Analyses the scan result stored in a session, produces a structured PLAID conversion plan, and writes a complete Python conversion script to output_script_path.

The plan infers the sample semantics (static vs. temporal), the mesh format and element type, the PLAID CGNS feature paths, and the split definitions from sidecar files (train.txt, test.txt, ...).

Input:

Parameter Type Required Description
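| Parameter          | Type   | Required | Description |
|--------------------|--------|----------|-------------|
| session_id         | string | Yes      | Session ID returned by a previous scan_raw_dataset call. |
| output_script_path | string | Yes      | Path where the generated conversion script is written. |
| dataset_name       | string | No       | Human-readable name used in the script header (default: "dataset"). |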

Output:

{
  "session_id": "6093e66b",
  "sample_semantics": "static",
  "closest_example": "shapenetcar.py",
  "external_dependencies": ["plyfile", "Muscat"],
  "mesh": {"format": "ply", "element_type": "Triangle_3"},
  "features": {
    "input": ["Base_3_3/Zone/GridCoordinates/CoordinateX", "..."],
    "output": ["Base_3_3/Zone/VertexFields/press"],
    "constant": ["Base_3_3/Zone/Elements_Triangle_3/ElementConnectivity", "..."]
  },
  "splits": {"train": {"source": "train.txt", "count": 2900}, "test": {"source": "test.txt", "count": 100}},
  "edge_cases": ["PLY watertight meshes may have fewer vertices than the pressure array..."],
  "backends": ["hf_datasets", "cgns", "zarr"],
  "generated_script_path": "/path/to/convert.py"
}

The generated script is a complete, runnable Python file. Review it before running, and edit RAW_DATA_DIR, OUTPUT_DIR, and the field-loading block if needed.
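Split inference from sidecar files can be sketched as follows, assuming each <split>.txt sits in the raw data directory and lists one sample ID per line. infer_splits and the CANDIDATE_SPLITS tuple are hypothetical; the returned mapping mirrors the splits field of the output above.

```python
from pathlib import Path
from typing import Dict

# Assumed convention: "<split>.txt" next to the data lists one sample ID per line.
CANDIDATE_SPLITS = ("train", "val", "test")


def infer_splits(raw_dir: str) -> Dict[str, dict]:
    """Map each sidecar split file found in raw_dir to its source file and sample count."""
    root = Path(raw_dir)
    splits: Dict[str, dict] = {}
    for name in CANDIDATE_SPLITS:
        sidecar = root / f"{name}.txt"
        if not sidecar.is_file():
            continue
        # Blank lines are ignored; every other line counts as one sample.
        ids = [line.strip() for line in sidecar.read_text().splitlines() if line.strip()]
        splits[name] = {"source": sidecar.name, "count": len(ids)}
    return splits
```

Splits without a sidecar file are simply absent from the result, so a dataset that ships only train.txt yields a single train split.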

run_conversion

Executes the conversion script generated by propose_conversion_plan. Patches OUTPUT_DIR and BACKEND in the script, runs it in a subprocess, and returns the status.

Input:

| Parameter  | Type   | Required | Description |
|------------|--------|----------|-------------|
| session_id | string | Yes      | Session ID with a completed conversion plan. |
| output_dir | string | Yes      | Directory where the PLAID dataset will be written. |
| backend    | string | No       | PLAID storage backend: hf_datasets, cgns, or zarr (default: hf_datasets). |

Output:

{
  "session_id": "6093e66b",
  "output_dir": "/data/plaid/shapenetcar",
  "backend": "hf_datasets",
  "script_path": "/path/to/convert.py",
  "status": "success",
  "message": "Conversion completed successfully."
}
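The patch-and-run step can be sketched with re and subprocess. The function below assumes the generated script contains top-level OUTPUT_DIR = ... and BACKEND = ... lines; patch_and_run, the "error" status value, and the stderr-based message are assumptions, and the session lookup is left out.

```python
import re
import subprocess
import sys
from pathlib import Path

BACKENDS = ("hf_datasets", "cgns", "zarr")  # backends listed in this README


def patch_and_run(script_path: str, output_dir: str, backend: str = "hf_datasets") -> dict:
    """Rewrite OUTPUT_DIR/BACKEND in a generated script, run it, and report the outcome."""
    if backend not in BACKENDS:
        raise ValueError(f"backend must be one of {BACKENDS}, got {backend!r}")
    path = Path(script_path)
    source = path.read_text()
    for name, value in (("OUTPUT_DIR", output_dir), ("BACKEND", backend)):
        assignment = f"{name} = {value!r}"
        # Replace the first top-level `NAME = ...` line. A function replacement
        # stops re from treating backslashes (e.g. Windows paths) as escapes.
        source, count = re.subn(
            rf"^{name}\s*=.*$", lambda _m: assignment, source, count=1, flags=re.MULTILINE
        )
        if count == 0:
            raise ValueError(f"no {name} assignment found in {path}")
    path.write_text(source)

    # Same interpreter as the caller, so the script sees the same installed packages.
    proc = subprocess.run([sys.executable, str(path)], capture_output=True, text=True)
    ok = proc.returncode == 0
    return {
        "output_dir": output_dir,
        "backend": backend,
        "script_path": str(path),
        "status": "success" if ok else "error",  # "error" is an assumed value
        "message": "Conversion completed successfully." if ok else proc.stderr.strip()[-2000:],
    }
```

Running the script with sys.executable keeps it in the server's Python environment, which matters when the script imports plaid and its optional dependencies.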

Roadmap

  • get_conversion_status: query the progress of an ongoing conversion job

About

MCP server for converting datasets
