 # Code Index: plexe
 
-> Generated on 2026-03-02 19:57:53
+> Generated on 2026-03-02 22:03:39
 
 Code structure and public interface documentation for the **plexe** package.
 
@@ -207,14 +207,14 @@ Local process runner - executes training in subprocess.
 
 **`LocalProcessRunner`** - Runs training in local subprocess.
 - `__init__(self, work_dir: str)`
-- `run_training(self, template: str, model: Any, feature_pipeline: Pipeline, train_uri: str, val_uri: str, timeout: int, target_columns: list[str], optimizer: Any, loss: Any, epochs: int, batch_size: int, group_column: str | None) -> Path` - Execute training in subprocess.
+- `run_training(self, template: str, model: Any, feature_pipeline: Pipeline, train_uri: str, val_uri: str, timeout: int, target_columns: list[str], task_type: str, optimizer: Any, loss: Any, epochs: int, batch_size: int, group_column: str | None, mixed_precision: bool, dataloader_workers: int) -> Path` - Execute training in subprocess.
 
 ---
 ## `execution/training/runner.py`
 Training runner abstract base class.
 
 **`TrainingRunner`** - Abstract base class for training execution environments.
-- `run_training(self, template: str, model: Any, feature_pipeline: Pipeline, train_uri: str, val_uri: str, timeout: int, target_columns: list[str]) -> Path` - Execute model training and return path to artifacts.
+- `run_training(self, template: str, model: Any, feature_pipeline: Pipeline, train_uri: str, val_uri: str, timeout: int, target_columns: list[str], task_type: str) -> Path` - Execute model training and return path to artifacts.
 
 ---
 ## `helpers.py`
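Both `run_training` signatures in this hunk gain a `task_type: str` parameter, so every concrete runner has to accept it. The sketch below restates the abstract interface from the index in plain Python; `feature_pipeline` is typed as `Any` to avoid depending on the real `Pipeline` class, and `EchoRunner` is a hypothetical subclass that trains nothing and only shows the new argument flowing through. It is not plexe's implementation.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class TrainingRunner(ABC):
    """Restates the indexed interface; feature_pipeline is typed Any instead of Pipeline."""

    @abstractmethod
    def run_training(
        self,
        template: str,
        model: Any,
        feature_pipeline: Any,
        train_uri: str,
        val_uri: str,
        timeout: int,
        target_columns: list[str],
        task_type: str,
    ) -> Path:
        """Execute model training and return path to artifacts."""


class EchoRunner(TrainingRunner):
    """Hypothetical runner: it trains nothing and only shows task_type arriving."""

    def __init__(self, work_dir: str) -> None:
        self.work_dir = Path(work_dir)

    def run_training(
        self,
        template: str,
        model: Any,
        feature_pipeline: Any,
        train_uri: str,
        val_uri: str,
        timeout: int,
        target_columns: list[str],
        task_type: str,
    ) -> Path:
        # A real runner would execute the template; here we only build an artifact path.
        return self.work_dir / template / task_type


runner = EchoRunner("/tmp/plexe-work")
print(runner.run_training("xgboost", None, None, "train.parquet", "val.parquet", 600, ["label"], "binary_classification"))
```

A subclass still written against the old seven-argument signature would raise `TypeError` as soon as a caller passes `task_type`, which is why the base class and `LocalProcessRunner` change together.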
@@ -312,6 +312,9 @@ Simple dataclasses for model building workflow.
 
 **`DataLayout`** - Physical structure of dataset (not semantic meaning).
 
+**`TaskType`** - Canonical ML task type determined during Phase 1.
+- `is_classification(self) -> bool` - No description
+
 **`Metric`** - Evaluation metric definition.
 
 **`BuildContext`** - Context passed through workflow phases.
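The new `TaskType` entry documents only `is_classification()`. A minimal sketch of such an enum follows; the member names and string values are hypothetical, since the index does not list them. Mixing in `str` is likewise an assumption, not something the index confirms: it lets a member be passed wherever the training signatures expect `task_type: str`.

```python
from enum import Enum


class TaskType(str, Enum):
    """Hypothetical members; the index documents only is_classification()."""

    BINARY_CLASSIFICATION = "binary_classification"
    MULTICLASS_CLASSIFICATION = "multiclass_classification"
    REGRESSION = "regression"

    def is_classification(self) -> bool:
        # Only the two classification members return True.
        return self in (TaskType.BINARY_CLASSIFICATION, TaskType.MULTICLASS_CLASSIFICATION)


print(TaskType("regression").is_classification())
```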
@@ -452,6 +455,7 @@ Standard Keras predictor - NO Plexe dependencies.
 **`KerasPredictor`** - Standalone Keras predictor.
 - `__init__(self, model_dir: str)`
 - `predict(self, x: pd.DataFrame) -> pd.DataFrame` - Make predictions on input DataFrame.
+- `predict_proba(self, x: pd.DataFrame) -> pd.DataFrame` - Predict per-class probabilities on input DataFrame.
 
 ---
 ## `templates/inference/lightgbm_predictor.py`
@@ -468,6 +472,7 @@ Standard PyTorch predictor - NO Plexe dependencies.
 **`PyTorchPredictor`** - Standalone PyTorch predictor.
 - `__init__(self, model_dir: str)`
 - `predict(self, x: pd.DataFrame) -> pd.DataFrame` - Make predictions on input DataFrame.
+- `predict_proba(self, x: pd.DataFrame) -> pd.DataFrame` - Predict per-class probabilities on input DataFrame.
 
 ---
 ## `templates/inference/xgboost_predictor.py`
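`KerasPredictor` and `PyTorchPredictor` both gain `predict_proba`, which returns per-class probabilities rather than labels. For a neural network whose last layer emits raw scores (logits), that typically means a row-wise softmax; the standard-library sketch below shows that arithmetic and the arg-max that recovers a label. It assumes logit outputs and uses plain lists instead of `pd.DataFrame`; the helper names are hypothetical and this is not the predictors' actual code.

```python
import math


def softmax_rows(logits: list[list[float]]) -> list[list[float]]:
    """Row-wise softmax: turns raw per-class scores into probabilities summing to 1."""
    probs = []
    for row in logits:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        probs.append([e / total for e in exps])
    return probs


def labels_from_proba(probs: list[list[float]], classes: list[str]) -> list[str]:
    """Pick the highest-probability class per row, the label a classifier usually predicts."""
    return [classes[max(range(len(row)), key=row.__getitem__)] for row in probs]


proba = softmax_rows([[2.0, 0.0, -1.0], [0.1, 3.0, 0.2]])
print(labels_from_proba(proba, ["cat", "dog", "bird"]))
```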
@@ -489,37 +494,37 @@ Model card template generator.
 Hardcoded robust CatBoost training loop.
 
 **Functions:**
-- `train_catboost(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str) -> dict` - Train CatBoost model directly (no Spark).
+- `train_catboost(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, task_type: str | None) -> dict` - Train CatBoost model directly (no Spark).
 - `main()` - No description
 
 ---
 ## `templates/training/train_keras.py`
-Hardcoded robust Keras training loop.
+Keras training template with streaming data loading, multi-GPU (MirroredStrategy), and mixed precision.
 
 **Functions:**
-- `train_keras(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, epochs: int, batch_size: int) -> dict` - Train Keras model directly.
+- `train_keras(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, epochs: int, batch_size: int, use_multi_gpu: bool, use_mixed_precision: bool, task_type: str | None) -> dict` - Train Keras model with streaming data, optional multi-GPU, and mixed precision.
 
 ---
 ## `templates/training/train_lightgbm.py`
 Hardcoded robust LightGBM training loop.
 
 **Functions:**
-- `train_lightgbm(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, group_column: str | None) -> dict` - Train LightGBM model directly (no Spark).
+- `train_lightgbm(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, group_column: str | None, task_type: str | None) -> dict` - Train LightGBM model directly (no Spark).
 - `main()` - No description
 
 ---
 ## `templates/training/train_pytorch.py`
-Hardcoded robust PyTorch training loop.
+PyTorch training template with streaming data loading, multi-GPU (DDP), and mixed precision.
 
 **Functions:**
-- `train_pytorch(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, epochs: int, batch_size: int) -> dict` - Train PyTorch model directly.
+- `train_pytorch(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, epochs: int, batch_size: int, num_workers: int, use_ddp: bool, use_mixed_precision: bool, task_type: str | None) -> dict` - Train PyTorch model with streaming data, optional DDP, and mixed precision.
 
 ---
 ## `templates/training/train_xgboost.py`
 Hardcoded robust XGBoost training loop.
 
 **Functions:**
-- `train_xgboost(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, group_column: str | None) -> dict` - Train XGBoost model directly (no Spark).
+- `train_xgboost(untrained_model_path: Path, train_uri: str, val_uri: str, output_dir: Path, target_column: str, group_column: str | None, task_type: str | None) -> dict` - Train XGBoost model directly (no Spark).
 - `main()` - No description
 
 ---
@@ -624,7 +629,7 @@ Utility functions for dashboard data loading.
 - `load_report(exp_path: Path, report_name: str) -> dict | None` - Load YAML report from DirNames.BUILD_DIR/reports/.
 - `load_code_file(file_path: Path) -> str | None` - Load Python code file.
 - `load_parquet_sample(uri: str, limit: int) -> pd.DataFrame | None` - Load first N rows from parquet file.
-- `get_parquet_row_count(uri: str) -> int | None` - Get row count from parquet file.
+- `get_parquet_row_count(uri: str) -> int | None` - Get row count from parquet metadata without reading data.
 - `load_json_file(file_path: Path) -> dict | None` - Load JSON file.
 
 ---
@@ -636,6 +641,21 @@ LiteLLM model wrapper with retry logic and optional post-call hook.
 - `generate(self)` - Generate with automatic retries, header injection, and post-call hook.
 - `chat(self)` - Chat with automatic retries, header injection, and post-call hook.
 
+---
+## `utils/parquet_dataset.py`
+Streaming parquet data loading utilities for large-dataset training.
+
+**`ParquetIterableDataset`** - Streaming parquet dataset for PyTorch DataLoader.
+- `__init__(self, uri: str, target_column: str, task_type: str)`
+- `total_rows(self) -> int` - No description
+
+**Functions:**
+- `get_parquet_row_count(uri: str) -> int` - Get total row count from parquet metadata without reading data.
+- `get_dataset_size_bytes(uri: str) -> int` - Get dataset size in bytes for a local file or directory of parquet files.
+- `parquet_batch_generator(uri: str, target_column: str, batch_size: int, task_type: str | None) -> Iterator[tuple[np.ndarray, np.ndarray]]` - Streaming parquet batch generator for Keras/TensorFlow.
+- `get_parquet_feature_count(uri: str, target_column: str) -> int` - Get number of feature columns (total columns minus target).
+- `get_steps_per_epoch(uri: str, batch_size: int) -> int` - Compute number of steps per epoch for a parquet dataset.
+
 ---
 ## `utils/reporting.py`
 Utilities for saving agent reports to disk.
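The new `utils/parquet_dataset.py` helpers compute sizes and batch counts so that training can stream data instead of loading it all at once. The sketch below mirrors three of them with the standard library only: it uses plain dicts in place of parquet rows, takes a row count rather than a `uri` for the steps calculation, and its function names are hypothetical stand-ins for `get_dataset_size_bytes`, `get_steps_per_epoch`, and `parquet_batch_generator`. Counting nested `*.parquet` files in a directory is also an assumption. A real implementation would read parquet through a library; with pyarrow, for example, `pyarrow.parquet.ParquetFile(path).metadata.num_rows` returns the row count from the file footer without reading data, though the index does not say which library plexe uses.

```python
from collections.abc import Iterable, Iterator
from pathlib import Path


def dataset_size_bytes(uri: str) -> int:
    """Size of a single local file, or the sum over *.parquet files under a directory."""
    path = Path(uri)
    if path.is_file():
        return path.stat().st_size
    return sum(p.stat().st_size for p in path.rglob("*.parquet") if p.is_file())


def steps_per_epoch(total_rows: int, batch_size: int) -> int:
    """Integer ceiling division, so a final partial batch still counts as one step."""
    return (total_rows + batch_size - 1) // batch_size


def batch_generator(
    rows: Iterable[dict], target_column: str, batch_size: int
) -> Iterator[tuple[list[list], list]]:
    """Yield (features, targets) batches without materialising the whole dataset."""
    features, targets = [], []
    for row in rows:
        targets.append(row[target_column])
        features.append([v for k, v in row.items() if k != target_column])
        if len(targets) == batch_size:
            yield features, targets
            features, targets = [], []
    if targets:  # flush the last, shorter batch
        yield features, targets


rows = ({"a": i, "b": i * 2, "y": i % 2} for i in range(5))
for x, y in batch_generator(rows, "y", batch_size=2):
    print(x, y)
```

Ceiling division in `steps_per_epoch` matches the generator's behaviour of flushing a final short batch, so an epoch of `steps_per_epoch(n, b)` steps consumes every row exactly once.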