We support two built-in multi-hop knowledge graph question answering (KGQA) datasets:

- `webqsp`
- `cwq`
We first pre-compute and cache entity and relation embeddings for all samples, which saves time during later retriever training and inference.
We use gte-large-en-v1.5 as the text encoder, hence the environment name.
```
conda create -n gte_large_en_v1-5 python=3.10 -y
conda activate gte_large_en_v1-5
pip install -r requirements/gte_large_en_v1-5.txt
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
```

```
python emb.py -d D
```

where `D` should be a dataset mentioned in "Supported Datasets".
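The compute-once caching pattern behind this pre-computation step can be sketched as follows. This is illustrative only: the cache directory name and the `encode` function are assumptions (a hashed bag-of-words stands in for gte-large-en-v1.5, and `emb.py` may organize its cache differently).

```python
import os
import zlib

import numpy as np

CACHE_DIR = "emb_cache"  # hypothetical cache location; emb.py may use a different layout

def encode(texts, dim=8):
    """Stand-in encoder producing deterministic pseudo-embeddings.

    The real pipeline uses gte-large-en-v1.5; a hashed bag-of-words is
    substituted here purely to illustrate the compute-once pattern.
    """
    out = np.zeros((len(texts), dim), dtype=np.float32)
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            out[i, zlib.crc32(tok.encode()) % dim] += 1.0
    return out

def cached_embeddings(name, texts):
    """Encode `texts` once; later calls load the cached result from disk."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{name}.npy")
    if os.path.exists(path):
        return np.load(path)   # cache hit: skip re-encoding
    emb = encode(texts)
    np.save(path, emb)         # cache miss: encode and persist
    return emb

entities = ["Barack Obama", "Honolulu"]
first = cached_embeddings("webqsp_entities", entities)   # computed
second = cached_embeddings("webqsp_entities", entities)  # served from the cache
```

With a large encoder, the one-time encoding pass dominates; every later training or inference run then reads the cached arrays instead of re-running the model.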
We now train a retriever, employ it for retrieval (inference), and evaluate the retrieval results.
```
conda create -n retriever python=3.10 -y
conda activate retriever
pip install -r requirements/retriever.txt
pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install torch_geometric==2.5.3
pip install pyg_lib==0.3.1 torch_scatter==2.1.2 torch_sparse==0.6.18 -f https://data.pyg.org/whl/torch-2.1.0+cu121.html
```

```
python train.py -d D
```

where `D` should be a dataset mentioned in "Supported Datasets".
Logged learning curves can be viewed in the corresponding wandb (Weights & Biases) interface.
Once training finishes, a folder of the form `{dataset}_{time}` (e.g., `webqsp_Nov08-01:14:47/`) will appear in the current directory, containing the trained model checkpoint `cpt.pth`.
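The run-folder naming above can be reproduced with the standard library. A minimal sketch, assuming the timestamp format `%b%d-%H:%M:%S` (inferred from the example folder name; `train.py` may format it differently):

```python
import os
import time

def make_run_dir(dataset, base="."):
    """Create a run folder named {dataset}_{time}, e.g. webqsp_Nov08-01:14:47/.

    The timestamp format is an assumption inferred from the example name.
    Note: colons in directory names are fine on Linux/macOS but invalid on Windows.
    """
    stamp = time.strftime("%b%d-%H:%M:%S")
    run_dir = os.path.join(base, f"{dataset}_{stamp}")
    os.makedirs(run_dir, exist_ok=True)
    return run_dir

run_dir = make_run_dir("webqsp")
ckpt_path = os.path.join(run_dir, "cpt.pth")  # the trained weights would be saved here
```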
```
python inference.py -p P
```

where `P` is the path to a saved model checkpoint. The predicted retrieval result is stored in the same folder as the model checkpoint. For example, if `P` is `webqsp_Nov08-01:14:47/cpt.pth`, the retrieval result will be saved as `webqsp_Nov08-01:14:47/retrieval_result.pth`.
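Conceptually, retrieval at inference time ranks candidate graph elements by similarity to the question. A minimal NumPy sketch of similarity-based ranking over cached embeddings — illustrative only, since the repo's retriever is a trained model rather than raw cosine similarity:

```python
import numpy as np

def cosine_scores(question_emb, candidate_embs):
    """Cosine similarity between one question vector and N candidate vectors."""
    q = question_emb / np.linalg.norm(question_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    return c @ q

def top_k(scores, k):
    """Indices of the k highest-scoring candidates, best first."""
    return np.argsort(scores)[::-1][:k]

# Toy data standing in for cached question/entity/relation embeddings.
rng = np.random.default_rng(0)
question = rng.normal(size=16)
candidates = rng.normal(size=(100, 16))

scores = cosine_scores(question, candidates)
retrieved = top_k(scores, k=5)  # indices of the 5 most similar candidates
```

The saved `retrieval_result.pth` would then hold such ranked candidates per sample, ready for downstream evaluation or answer generation.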
```
python eval.py -d D -p P
```

where `D` should be a dataset mentioned in "Supported Datasets" and `P` is the path to the inference result, e.g., `webqsp_Nov08-01:14:47/retrieval_result.pth`.
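A common way to score KGQA retrieval is answer recall: the fraction of gold answer entities covered by the retrieved set. The metrics `eval.py` actually reports are not specified here, so the sketch below is a generic illustration; the Freebase-style entity IDs are hypothetical.

```python
def answer_recall(retrieved, gold):
    """Fraction of gold answer entities present in the retrieved set."""
    retrieved, gold = set(retrieved), set(gold)
    if not gold:
        return 0.0
    return len(retrieved & gold) / len(gold)

# One sample: 2 of the 3 gold answers appear among the retrieved entities.
single = answer_recall(["m.01", "m.02", "m.07", "m.09"], ["m.01", "m.02", "m.05"])

# Macro-average over a small batch of (retrieved, gold) pairs.
batch = [
    (["m.01"], ["m.01"]),          # recall 1.0
    (["m.03", "m.04"], ["m.05"]),  # recall 0.0
]
macro = sum(answer_recall(p, g) for p, g in batch) / len(batch)
```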