Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion .dockerignore
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
.idea/
.gitignore
.pre-commit-config.yaml
AGENTS.md
CLAUDE.md
CODE_OF_CONDUCT.md
CONTRIBUTING.md
Expand All @@ -23,4 +24,4 @@ workdir/
# Local environment and config files
.env
.env.*
config.yaml
config.yaml
120 changes: 120 additions & 0 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,120 @@
# AGENTS.md

This file provides guidance to coding agents when working with code in this repository (for example, OpenAI Codex and Claude Code).

`AGENTS.md` is the canonical version of this document. `CLAUDE.md` is kept as a compatibility symlink that points here, so if you opened this file via `CLAUDE.md`, you are in the right place.

Comment thread
marcellodebernardi marked this conversation as resolved.
## Overview

Plexe is an agentic framework for building ML models from natural language. It employs a multi-agent architecture where
specialized AI agents collaborate to analyze data, generate solutions, and build functional ML models through an
autonomous 6-phase workflow.

**Entry point:** `python -m plexe.main --train-dataset-uri <uri> --user-id <id> --intent "<task>" --spark-mode <local|databricks>`

**Docker:**
- `docker build .` — default PySpark image (local Spark execution)
- `docker build --target databricks .` — Databricks Connect image (remote execution)

## Architecture

### 6-Phase Workflow (`plexe/workflow.py`)

1. **Data Understanding**: Statistical analysis → ML task identification → metric selection (with optional custom metric generation)
2. **Data Preparation**: Dataset splitting → intelligent sampling (default: 30k train, 10k val samples)
3. **Baseline Models**: Heuristic baseline with retry logic
4. **Model Search**: Hypothesis-driven tree search on samples (iterative improvement)
5. **Final Evaluation**: Optional test evaluation (default: disabled, uses validation performance)
6. **Packaging**: Consolidates artifacts into `work_dir/model/` (schemas/, config/, artifacts/, src/, evaluation/)

### Multi-Agent System (`plexe/agents/`)

14 specialized agents orchestrate the workflow:
- **LayoutDetectionAgent** → **StatisticalAnalyserAgent** → **MLTaskAnalyserAgent** → **MetricSelectorAgent** (Phase 1)
- **MetricImplementationAgent**: Generates custom metric code if not in `StandardMetric` enum
- **DatasetSplitterAgent** → **SamplingAgent** (Phase 2)
- **BaselineBuilderAgent** (Phase 3)
- **HypothesiserAgent** → **PlannerAgent** → **FeatureProcessorAgent** + **ModelDefinerAgent** (Phase 4 loop)
- **InsightExtractorAgent**: Analyzes variant results, populates `InsightStore` for future hypotheses
- **ModelEvaluatorAgent**: Multi-phase evaluation (Phase 5)

### Tree Search (`plexe/search/`)

Three-stage tree expansion strategy:
1. **Bootstrap**: Create diverse initial solutions from scratch (no parent)
2. **Debug**: Probabilistically fix buggy leaf nodes (max depth: 2)
3. **Improve**: Greedily expand best-performing solutions

Each iteration is hypothesis-driven: `HypothesiserAgent` analyzes the journal + accumulated insights to decide
what to try next, `PlannerAgent` turns that into concrete plans (FeaturePlan + ModelPlan), and `InsightExtractorAgent`
distills learnings from results back into the `InsightStore` to inform future iterations.

Search runs on **samples** (fast), best solution retrained on **full dataset** (accurate).

### Workflow Integration (`plexe/integrations/`)

Pluggable interface for connecting plexe to external infrastructure:
- **`WorkflowIntegration`** (`base.py`): ABC defining the contract (8 methods)
- **`StandaloneIntegration`** (`standalone.py`): Default implementation (local + optional S3)
- **`storage/`**: Composable storage helpers (`S3Helper`, Azure/GCS stubs)

Custom integrations implement `WorkflowIntegration` and pass the instance to `main(integration=MyIntegration(...))`.

### State Management

- **BuildContext** (`models.py`): Central state object passed through workflow
- **SearchJournal** (`search/journal.py`): DAG of all solutions, tracks ancestry and performance
- **InsightStore** (`search/insight_store.py`): Accumulates learnings across search iterations

## Commands

```bash
# Install
poetry install

# Run locally
python -m plexe.main --train-dataset-uri data.parquet --user-id user123 --intent "predict churn" --spark-mode local

# Build Docker images
make build # PySpark (default)
make build-databricks # Databricks Connect

# Run tests
poetry run pytest tests/unit/

# Format and lint
poetry run black .
poetry run ruff check . --fix

# Quick integration test via Docker
make test-quick
```

## Key Files

- `plexe/workflow.py`: Main orchestrator (6 phases)
- `plexe/main.py`: CLI entry point
- `plexe/config.py`: Config dataclass + StandardMetric enum + LLM routing
- `plexe/models.py`: Data models (BuildContext, Solution, Baseline, Hypothesis, Plan)
- `plexe/helpers.py`: Metric computation and model type selection
- `plexe/integrations/base.py`: WorkflowIntegration ABC
- `plexe/integrations/standalone.py`: Default integration (local + S3)
- `plexe/integrations/storage/`: Storage helper ABCs and implementations
- `plexe/search/journal.py`: Solution DAG tracking
- `plexe/search/tree_policy.py`: Search strategy
- `plexe/agents/*.py`: 14 specialized agents
- `plexe/templates/`: Code generation templates (training, inference, features, packaging)
- `plexe/utils/litellm_wrapper.py`: LLM wrapper with retries and optional `on_llm_call` hook
- `plexe/utils/tooling.py`: `@agentinspectable` decorator for agent-callable functions

## Code Style

- **Functions**: Max 50 lines (excluding docstrings)
- **Formatting**: Black with 120 char line length
- **Linting**: Ruff with E203/E501/E402 ignored
- **Typing**: Type hints and Pydantic models required
- **Imports**: ALWAYS at top level in order: stdlib, third-party, local; NEVER inside functions
- **__init__.py**: No implementation code except in `__init__.py` files
- **Docstrings**: Required for public APIs; Sphinx style
- **Testing**: Write pytest tests in `tests/unit/` mirroring `plexe/` package structure
- **Elegance**: Write the simplest solution possible; avoid over-engineering; prefer deleting code over adding code
118 changes: 0 additions & 118 deletions CLAUDE.md

This file was deleted.

1 change: 1 addition & 0 deletions CLAUDE.md