| Version | Pretrained weights | Finetuned Weights |
|---|---|---|
| Res-50, routing, #1 | OneDrive | OneDrive (MLT19 Task4 H-mean: 50.3) |
| Res-50, routing, #3 | OneDrive | OneDrive (MLT19 Task4 H-mean: 51.2) |
Python 3.8 + PyTorch 1.9.0 + CUDA 11.1 + Detectron2 (v0.6)
# git clone https://github.com/ViTAE-Transformer/DeepSolo.git
# cd DeepSolo/DeepSolo++
conda create -n deepsolo++ python=3.8 -y
conda activate deepsolo++
pip install torch==1.9.0+cu111 torchvision==0.10.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
python -m pip install -e detectron2
python setup.py build develop
Synthetic multilingual training images from ICDAR 2019: Arabic | Bangla | Chinese | Japanese | Korean | Latin | Hindi
Training images of ArT, LSVT, MLT19: Link
Training images of RCTW: Link
All json files: Link
Some image files need to be renamed. Organize them as follows (lexicon files are not listed here):
|- ./datasets
|- Arabic
| |- train_images
| └ train.json
|- Bangla
| |- train_images
| └ train.json
|- Chinese
| |- train_images
| └ train.json
|- Hindi
| |- train_images
| └ train.json
|- Japanese
| |- train_images
| └ train.json
|- Korean
| |- train_images
| └ train.json
|- Latin
| |- train_images
| └ train.json
|- RCTW
| |- train_images
| └ train.json
|- ArT
| |- rename_artimg_train
| └ train.json
|- LSVT
| |- rename_lsvtimg_train
| └ train.json
|- mlt19
| |- train_images
| |- test_images
| |- mlt19_train.json
| └ mlt19_test.json
|- mlt17
| |- test_images
| └ mlt17_test.json
Before training, download DeepSolo and put it in ./pretrained_model for initialization.
1. Pre-train
python tools/train_net.py --config-file configs/R_50/mlt19_multihead/pretrain.yaml --num-gpus 8
2. Fine-tune
python tools/train_net.py --config-file configs/R_50/mlt19_multihead/finetune.yaml --num-gpus 8
python tools/train_net.py --config-file configs/R_50/mlt19_multihead/finetune.yaml --eval-only MODEL.WEIGHTS ${MODEL_PATH}
Note: To conduct evaluation on ICDAR MLT 2019, you can directly submit the saved file under output/R50/bs8_600k_synth-textocr-init/finetune to the official website for evaluation.
python demo/demo.py --config-file configs/R_50/mlt19_multihead/finetune.yaml --input ${IMAGES_FOLDER_OR_ONE_IMAGE_PATH} --output ${OUTPUT_PATH} --opts MODEL.WEIGHTS <MODEL_PATH>
@article{ye2023deepsolo++,
title={DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multilingual Text Spotting},
author={Ye, Maoyuan and Zhang, Jing and Zhao, Shanshan and Liu, Juhua and Liu, Tongliang and Du, Bo and Tao, Dacheng},
booktitle={arxiv preprint arXiv:2305.19957},
year={2023}
}