This repository contains the implementation of the Linearly Interpretable Concept Embedding Model (LICEM) for text classification tasks. LICEM leverages concept-based embeddings to provide interpretable predictions while maintaining competitive performance.
To set up the environment, follow these steps:
- Clone the repository.
- Move to the working directory:
    cd Linearly-Interpretable-Concept-Embedding-Model-for-Text
- Install dependencies using conda:
    conda env create -f environment.yml
    conda activate licem
- Configure environment variables: after installing dependencies, update the environment variables in env.py:
    - PROJECT_NAME: Set this to your project name.
    - HOME: Specify the path to your project directory.
    - OPENAI_API_KEY: Replace this with your OpenAI API key.
    - MISTRALAIKEY: Replace this with your Mistral AI API key.
These variables are essential for the proper functioning of the code.
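As a rough illustration, env.py could look like the sketch below. The four variable names come from the list above; everything else is an assumption: the file shipped with the repository may be laid out differently, and reading the values from OS environment variables (including the hypothetical LICEM_HOME name and the placeholder defaults) is only one way to keep API keys out of version control.

```python
# env.py (hypothetical layout; only the four variable names come from this README)
import os

# Name of your project ("licem" is just a placeholder default).
PROJECT_NAME = os.environ.get("PROJECT_NAME", "licem")

# Path to your project directory; LICEM_HOME is an assumed variable name,
# and the current working directory is used when it is not set.
HOME = os.environ.get("LICEM_HOME", os.getcwd())

# API keys: replace the placeholders or export the variables in your shell.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "<your-openai-api-key>")
MISTRALAIKEY = os.environ.get("MISTRALAIKEY", "<your-mistral-ai-api-key>")
```

With this layout the keys never need to be written into the file itself; exporting them in the shell before running the code is enough.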
Ensure your datasets are stored in the directory specified in env.py.
Supported datasets include:
- cebab
- imdb
- trec50
- wos
- clinc
- banking
If you want to use preprocessed datasets, set use_stored_dataset: True in the configuration file (conf/general_sweep.yaml).
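Before launching a run, it can help to check that a dataset name is one of the supported ones. The helper below is a minimal sketch and not part of the repository: check_dataset is a hypothetical function, and it assumes the supported identifiers are cebab, imdb, trec50, wos, clinc, and banking.

```python
# Dataset identifiers as read from this README (assumed spelling).
SUPPORTED_DATASETS = ("cebab", "imdb", "trec50", "wos", "clinc", "banking")


def check_dataset(name: str) -> str:
    """Return the normalized dataset name, or raise ValueError if unsupported."""
    normalized = name.strip().lower()
    if normalized not in SUPPORTED_DATASETS:
        supported = ", ".join(SUPPORTED_DATASETS)
        raise ValueError(f"Unknown dataset '{name}'. Supported: {supported}")
    return normalized


print(check_dataset("IMDB"))
```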
The experiments are configured using Hydra. The main configuration file is located at conf/general_sweep.yaml. Key parameters include:
- Dataset: Specify the dataset to use (dataset).
- Model: Choose the model (model).
- Supervision Strategy: Define the supervision strategy (supervision).
- Training Parameters: Adjust parameters like max_epochs, batch_size, and lr_patience.
For detailed configuration options, refer to the comments in conf/general_sweep.yaml.
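Hydra also accepts key=value overrides on the command line, which can be convenient for a quick run without editing the YAML file. The sketch below only builds such a command; build_command is a hypothetical helper, the values are illustrative, and it assumes the keys sit at the top level of conf/general_sweep.yaml (nested keys would need a dotted path instead).

```python
import shlex


def build_command(overrides: dict) -> list:
    """Build an argv list that runs main.py with Hydra key=value overrides."""
    args = ["python", "main.py"]
    for key, value in overrides.items():
        args.append(f"{key}={value}")
    return args


# Illustrative values only; check conf/general_sweep.yaml for valid options.
cmd = build_command({"dataset": "imdb", "max_epochs": 10, "batch_size": 32})
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run, or the printed string can be pasted into a shell.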
To reproduce the experiments, run the following command:
    python main.py
This script will:
- Load or preprocess the dataset.
- Train the specified model.
- Evaluate the model on the test set.
These steps are repeated for each combination of parameters defined in the sweeper section of conf/general_sweep.yaml.
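To get a feel for how many runs a sweep produces, the combinations can be enumerated as a Cartesian product. This is an illustration of the idea rather than Hydra's own code: expand_sweep is a hypothetical helper, and the values (in particular the strategy names) are made up; the real ones are in the sweeper section of the configuration file.

```python
from itertools import product

# Hypothetical sweep values, for illustration only.
sweep = {
    "dataset": ["cebab", "imdb"],
    "supervision": ["strategy_a", "strategy_b"],
}


def expand_sweep(params: dict) -> list:
    """Return one dict per combination of the swept parameter values."""
    keys = list(params)
    value_lists = [params[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


runs = expand_sweep(sweep)
print(len(runs))  # 2 datasets x 2 strategies -> 4
for run in runs:
    print(run)
```

The number of runs grows multiplicatively with each swept parameter, so it is worth checking the count before starting a long sweep.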
This repository supports logging with Weights & Biases. To enable WandB logging:
- Set your WandB project name and entity in conf/general_sweep.yaml:
    wandb:
      project: "licem"
      entity: "your-wandb-entity"
- Results will be logged to your WandB dashboard.
Results, including intervention data and metrics, are stored in the outputs directory. The path is dynamically generated based on the current timestamp.
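Because the output path changes with every run, a small helper can locate the most recent one. The sketch below is an assumption-laden convenience, not part of the repository: latest_run_dir is a hypothetical function, and it treats every directory without subdirectories under outputs as a run directory and picks the one with the newest modification time, without relying on a specific timestamp format.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(outputs_root: str = "outputs") -> Optional[Path]:
    """Return the most recently modified leaf directory under outputs_root."""
    root = Path(outputs_root)
    if not root.is_dir():
        return None
    # A leaf directory is one that contains no subdirectories.
    leaves = [
        path
        for path in root.rglob("*")
        if path.is_dir() and not any(child.is_dir() for child in path.iterdir())
    ]
    return max(leaves, key=lambda path: path.stat().st_mtime, default=None)


print(latest_run_dir())
```

The function returns None when the outputs directory does not exist yet or contains no run directories.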
If you use LICEM in your research, please cite our work:
@article{de2024self,
title={Self-supervised Interpretable Concept-based Models for Text Classification},
author={De Santis, Francesco and Bich, Philippe and Ciravegna, Gabriele and Barbiero, Pietro and Giordano, Danilo and Cerquitelli, Tania},
journal={arXiv preprint arXiv:2406.14335},
year={2024}
}