This repository contains the source code for the EACL 2026 paper "PromptFE: Automated Feature Engineering by Prompting".
PromptFE uses large language models (LLMs) to automatically engineer features for tabular machine learning datasets. It iteratively prompts an LLM to propose feature transformations, scores each candidate with cross-validation, and keeps the best-performing features.
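The prompt, evaluate, select loop described above can be sketched in a few lines. This is a minimal illustration, not the logic in `run.py`: the function name `promptfe_style_search` and its two callables are hypothetical. `propose` stands in for the LLM call and `evaluate` stands in for cross-validated model scoring (higher is better). Greedy forward selection, one feature added per round, is an assumption about how "keeps the best-performing features" might work.

```python
from typing import Callable, List, Sequence, Tuple


def promptfe_style_search(
    propose: Callable[[List[str], int], Sequence[str]],
    evaluate: Callable[[List[str]], float],
    n_rounds: int = 3,
    n_candidates: int = 5,
) -> Tuple[List[str], float]:
    """Hypothetical greedy loop: ask for candidates, score them, keep the best one."""
    selected: List[str] = []
    best_score = evaluate(selected)  # baseline: original features only
    for _ in range(n_rounds):
        # Stand-in for prompting the LLM with the features kept so far.
        candidates = [c for c in propose(selected, n_candidates) if c not in selected]
        round_best, round_score = None, best_score
        for cand in candidates:
            # Stand-in for cross-validating the model with this extra feature.
            score = evaluate(selected + [cand])
            if score > round_score:
                round_best, round_score = cand, score
        if round_best is None:
            break  # no candidate improved on the current feature set; stop early
        selected.append(round_best)
        best_score = round_score
    return selected, best_score
```

In practice `propose` would format the dataset description and current features into a prompt and parse the reply into feature expressions, and `evaluate` would wrap something like scikit-learn's `cross_val_score`.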
Requirements:
- Python 3.8+
- An OpenAI API key
Install dependencies:
```
pip install -r requirements.txt
```
Create a `.env` file in the project root containing your OpenAI API key:
```
OPENAI_API_KEY=your_api_key_here
```
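Projects like this often read `.env` with `python-dotenv`'s `load_dotenv()`; whether this repository does so is not stated here. As a dependency-free illustration of what loading the file amounts to, the sketch below parses `KEY=VALUE` lines into `os.environ`. The helper names `parse_env` and `load_env` are hypothetical.

```python
import os
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> None:
    """Copy entries from a .env file into os.environ without overwriting existing ones."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for key, value in parse_env(fh.read()).items():
            os.environ.setdefault(key, value)
```

After `load_env()`, the key is available as `os.environ["OPENAI_API_KEY"]`. Using `setdefault` lets a variable already exported in the shell take precedence over the file.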
- To generate and evaluate features, run:
  ```
  python run.py
  ```
  This prompts GPT to suggest feature transformations, evaluates them with the specified model on the specified dataset, and saves the results to the `results/` directory.
- To read and analyze results, run:
  ```
  python read.py
  ```
  This loads the saved results from `results/` and writes a performance summary to `log_result.txt`.
Project structure:
```
PromptFE/
├── data/                   # Dataset files
│   ├── *.csv               # Raw data
│   ├── *.json              # Dataset metadata (task type, feature info)
│   └── *.txt               # Feature descriptions
├── dataset_descriptions/   # Extended dataset descriptions
├── params/                 # Model hyperparameter configurations
│   └── {MODEL}-{DATASET}.txt
├── utils/
│   └── tools.py            # Logging and timing utilities
├── run.py                  # Main script: feature generation and evaluation
├── read.py                 # Results analysis script
├── env.py                  # Model training and evaluation environment
├── feat_tree.py            # Feature tree construction and RPN parsing
├── dataset.py              # Dataset loading and preprocessing
├── search_space.py         # Feature operation definitions
├── metrics.py              # Custom evaluation metrics
└── requirements.txt
```
Supported datasets:

| Dataset | Task |
|---|---|
| AIDS | Classification |
| Airfoil | Regression |
| Bikeshare | Regression |
| Credit Default | Classification |
| German Credit | Classification |
| Housing | Regression |
| Wine Quality (Red) | Regression |
Supported models:
- Classification: `LogisticRegression`, `RandomForestClassifier`, `XGBClassifier`, `LGBMClassifier`
- Regression: `Lasso`, `RandomForestRegressor`, `XGBRegressor`, `LGBMRegressor`
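The model lists above and the `params/{MODEL}-{DATASET}.txt` naming convention from the project layout can be captured in a small lookup. This is a hypothetical helper, not code from the repository: `MODELS`, `models_for`, and `params_file` are illustrative names, and the exact spelling of the dataset token in file names (for example `airfoil` versus `Airfoil`) is an assumption; check the `params/` directory for the real names.

```python
from pathlib import Path
from typing import Dict, List

# Model names as listed in this README, grouped by task type.
MODELS: Dict[str, List[str]] = {
    "classification": [
        "LogisticRegression", "RandomForestClassifier", "XGBClassifier", "LGBMClassifier",
    ],
    "regression": [
        "Lasso", "RandomForestRegressor", "XGBRegressor", "LGBMRegressor",
    ],
}


def models_for(task: str) -> List[str]:
    """Return the supported model names for a task type (case-insensitive)."""
    return MODELS[task.lower()]


def params_file(model: str, dataset: str, root: str = "params") -> Path:
    """Build the hyperparameter file path following params/{MODEL}-{DATASET}.txt."""
    return Path(root) / f"{model}-{dataset}.txt"
```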
PromptFE supports the following feature transformation types:
- Unary: `log`, `sqrt`, `reciprocal`, `min-max`
- Binary: `+`, `-`, `*`, `/`, `mod`
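The operation set above can be illustrated as column-level functions in plain Python. This is a sketch under stated assumptions, not the contents of `search_space.py`, which likely uses vectorized array code and may handle edge cases differently. Mapping out-of-domain inputs (log of a non-positive value, division or `mod` by zero) to NaN, and mapping a constant column to 0.0 under `min-max`, are assumptions. `min-max` is the only operation that needs the whole column, since it depends on the column's minimum and maximum.

```python
import math
from typing import Callable, Dict, List

Column = List[float]


def _elementwise(fn: Callable[[float], float]) -> Callable[[Column], Column]:
    """Lift a scalar function to act on every value of a column."""
    return lambda col: [fn(v) for v in col]


def _min_max(col: Column) -> Column:
    """Rescale a column to [0, 1] using its own minimum and maximum."""
    lo, hi = min(col), max(col)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in col]  # constant column: assumption, map to 0.0
    return [(v - lo) / span for v in col]


# Out-of-domain inputs become NaN (assumed handling).
UNARY_OPS: Dict[str, Callable[[Column], Column]] = {
    "log": _elementwise(lambda v: math.log(v) if v > 0 else math.nan),
    "sqrt": _elementwise(lambda v: math.sqrt(v) if v >= 0 else math.nan),
    "reciprocal": _elementwise(lambda v: 1.0 / v if v != 0 else math.nan),
    "min-max": _min_max,
}

BINARY_OPS: Dict[str, Callable[[float, float], float]] = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else math.nan,
    "mod": lambda a, b: a % b if b != 0 else math.nan,
}


def apply_unary(op: str, col: Column) -> Column:
    return UNARY_OPS[op](col)


def apply_binary(op: str, left: Column, right: Column) -> Column:
    if len(left) != len(right):
        raise ValueError("columns must have the same length")
    return [BINARY_OPS[op](a, b) for a, b in zip(left, right)]
```

For example, `apply_unary("min-max", [2.0, 4.0, 6.0])` returns `[0.0, 0.5, 1.0]`.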
Features are represented as trees using canonical Reverse Polish Notation (cRPN) and evaluated iteratively to find the best-performing subset.
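To make the tree and cRPN representation concrete, the sketch below parses a space-separated RPN string into an expression tree, canonicalizes it, serializes it back, and evaluates it on a single row. It is an illustration, not the code in `feat_tree.py`. The canonicalization rule used here (sorting the operands of the commutative operators `+` and `*` by their RPN string, so that `x2 x1 +` and `x1 x2 +` collapse to one form) is an assumption about what "canonical" means. Treating every non-operator token as a column name is also an assumption, and `min-max` is left out because it needs the whole column rather than one row.

```python
import math
from typing import Dict, List, Tuple, Union

# Row-wise operations; out-of-domain inputs become NaN (assumed handling).
UNARY_FN = {
    "log": lambda v: math.log(v) if v > 0 else math.nan,
    "sqrt": lambda v: math.sqrt(v) if v >= 0 else math.nan,
    "reciprocal": lambda v: 1.0 / v if v != 0 else math.nan,
}
BINARY_FN = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else math.nan,
    "mod": lambda a, b: a % b if b != 0 else math.nan,
}
COMMUTATIVE = {"+", "*"}

# A leaf is a column name; an inner node is (op, child) or (op, left, right).
Node = Union[str, Tuple]


def parse_rpn(expr: str) -> Node:
    """Build an expression tree from space-separated RPN tokens."""
    stack: List[Node] = []
    for tok in expr.split():
        if tok in BINARY_FN:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {tok!r} in {expr!r}")
            right = stack.pop()
            left = stack.pop()
            stack.append((tok, left, right))
        elif tok in UNARY_FN:
            if not stack:
                raise ValueError(f"missing operand for {tok!r} in {expr!r}")
            stack.append((tok, stack.pop()))
        else:
            stack.append(tok)  # any other token is treated as a column name
    if len(stack) != 1:
        raise ValueError(f"malformed RPN expression: {expr!r}")
    return stack[0]


def to_rpn(node: Node) -> str:
    """Serialize a tree back to RPN: operands first, then the operator."""
    if isinstance(node, str):
        return node
    op, *children = node
    return " ".join([to_rpn(c) for c in children] + [op])


def canonicalize(node: Node) -> Node:
    """Sort operands of commutative ops so equivalent features share one string."""
    if isinstance(node, str):
        return node
    op, *children = node
    children = [canonicalize(c) for c in children]
    if op in COMMUTATIVE:
        children.sort(key=to_rpn)
    return (op, *children)


def evaluate(node: Node, row: Dict[str, float]) -> float:
    """Compute the feature value for one row of named column values."""
    if isinstance(node, str):
        return row[node]
    op, *children = node
    values = [evaluate(c, row) for c in children]
    if op in UNARY_FN:
        return UNARY_FN[op](values[0])
    return BINARY_FN[op](values[0], values[1])
```

For example, `to_rpn(canonicalize(parse_rpn("x2 x1 * log")))` gives `"x1 x2 * log"`, and evaluating that tree on `{"x1": 2.0, "x2": 0.5}` gives `log(1.0) = 0.0`. A canonical string like this lets duplicate LLM suggestions be detected by simple string comparison before any model is trained.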
If you use this code, please cite the paper:
```bibtex
@inproceedings{zou-etal-2026-promptfe,
    title = "{P}rompt{FE}: Automated Feature Engineering by Prompting",
    author = "Zou, Yufeng and Utke, Jean and Klabjan, Diego and Liu, Han",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-long.28/",
    doi = "10.18653/v1/2026.eacl-long.28",
    pages = "653--681",
    ISBN = "979-8-89176-380-7"
}
```
A portion of the code is adapted from DIFER.