| 1 | +--- |
| 2 | +license: mit |
| 3 | +task_categories: |
| 4 | +- question-answering |
| 5 | +language: |
| 6 | +- en |
| 7 | +tags: |
| 8 | +- finance |
| 9 | +size_categories: |
| 10 | +- 100K<n<1M |
| 11 | +--- |
| 12 | +# ReceiptQA: A Comprehensive Dataset for Receipt Understanding and Question Answering |
| 13 | + |
| 14 | +ReceiptQA is a large-scale dataset for research on receipt understanding through question-answering (QA) tasks. Its questions are derived from real-world receipt images and cover text extraction, layout understanding, and numerical reasoning. ReceiptQA serves as a benchmark for evaluating and improving models on receipt-based QA. |
| 15 | + |
| 16 | + |
| 17 | + |
| 18 | +## Dataset Overview |
| 19 | +ReceiptQA consists of 3,500 receipt images paired with approximately 171,000 question-answer pairs, constructed using two complementary approaches: |
| 20 | + |
| 21 | +1. **LLM-Generated Subset:** 70,000 QA pairs generated by GPT-4o and validated by human annotators for accuracy and relevance. |
| 22 | +2. **Human-Created Subset:** Approximately 101,000 manually written QA pairs, including both answerable and unanswerable questions. |
| 23 | + |
| 24 | +### Key Features |
| 25 | +- Covers five domains: Retail, Food Services, Supermarkets, Fashion, and Medical. |
| 26 | +- Includes both straightforward and complex questions. |
| 27 | +- Offers a comprehensive benchmark for receipt-specific QA tasks. |
| 28 | + |
| 29 | +### Dataset Statistics |
| 30 | +| Domain | Receipts | Human QA Pairs | LLM QA Pairs | |
| 31 | +|-----------------|----------|----------------|--------------| |
| 32 | +| Retail | 800 | 23,200 | 16,000 | |
| 33 | +| Food Services | 700 | 20,300 | 14,000 | |
| 34 | +| Supermarkets | 700 | 20,300 | 14,000 | |
| 35 | +| Fashion | 650 | 18,850 | 13,000 | |
| 36 | +| Coffee Shop     | 650      | 18,850         | 13,000       | |
| 37 | +| **Total** | **3,500**| **101,935** | **70,000** | |
| 38 | + |
| 39 | +### Example of Data |
| 40 | + |
| 41 | +Here is a sample of the data structure used in the ReceiptQA dataset: |
| 42 | + |
| 43 | +```json |
| 44 | +{ |
| 45 | + "question": "What is the total amount for this receipt?", |
| 46 | + "answer": "559.99 L.E" |
| 47 | +}, |
| 48 | +{ |
| 49 | + "question": "What is the name of item 1?", |
| 50 | + "answer": "Pullover PU-SOK1175" |
| 51 | +}, |
| 52 | +{ |
| 53 | + "question": "What is the transaction number?", |
| 54 | + "answer": "29786" |
| 55 | +}, |
| 56 | +{ |
| 57 | + "question": "How many items were purchased?", |
| 58 | + "answer": "2" |
| 59 | +} |
| 60 | +``` |
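The sample above can be read with a few lines of Python. This is a minimal sketch, assuming a label file is a JSON array of objects with `question` and `answer` keys; the actual file layout may differ, and `parse_qa_pairs` is a hypothetical helper, not part of the dataset tooling.

```python
import json

# Assumption: a label file holds a JSON array of objects with "question" and
# "answer" keys, mirroring the sample above. The real layout may differ.
sample_labels = """
[
  {"question": "What is the total amount for this receipt?", "answer": "559.99 L.E"},
  {"question": "How many items were purchased?", "answer": "2"}
]
"""


def parse_qa_pairs(raw_json):
    """Return (question, answer) tuples, skipping records that lack either key."""
    records = json.loads(raw_json)
    pairs = []
    for record in records:
        if isinstance(record, dict) and "question" in record and "answer" in record:
            # Answers are kept as strings so numeric and text answers compare uniformly.
            pairs.append((record["question"], str(record["answer"])))
    return pairs


qa_pairs = parse_qa_pairs(sample_labels)
for question, answer in qa_pairs:
    print(f"Q: {question}\nA: {answer}")
```

Keeping answers as strings avoids losing formatting such as currency suffixes when comparing predictions against the ground truth.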
| 61 | +## Requirements |
| 62 | +```bash |
| 63 | +# Install required libraries for inference |
| 64 | +pip install torch==1.10.0 |
| 65 | +pip install transformers==4.5.0 |
| 66 | +pip install datasets==2.3.0 |
| 67 | +pip install Pillow |
| 68 | +``` |
| 69 | + |
| 70 | + |
| 71 | + |
| 72 | +## Download Links |
| 73 | + |
| 74 | +### Full Dataset |
| 75 | +- **Train Set:** [Images](https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main/train_images.zip?download=true) | [Labels](https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main/train_label.zip?download=true) |
| 76 | +- **Validation Set:** [Images](https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main/validation_images.zip?download=true) | [Labels](https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main/validation_label.zip?download=true) |
| 77 | +- **Test Set:** [Images](https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main/test_images.zip?download=true) | [Labels](https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main/test_label.zip?download=true) |
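The six archives follow a regular naming pattern, so they can also be fetched from a script. The sketch below uses only the Python standard library and rebuilds the URLs listed above; `archive_url` and `download_archive` are hypothetical helper names, and the `data` target directory is an assumption taken from the usage steps.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://huggingface.co/datasets/mahmoud2019/ReceiptQA/resolve/main"
SPLITS = ("train", "validation", "test")
KINDS = ("images", "label")  # matches the archive names in the links above


def archive_url(split, kind):
    """Build the download URL for one archive, e.g. train_images.zip."""
    if split not in SPLITS or kind not in KINDS:
        raise ValueError(f"unknown archive: {split}_{kind}")
    return f"{BASE_URL}/{split}_{kind}.zip?download=true"


def download_archive(split, kind, target_dir="data"):
    """Download one archive into target_dir and return the local path."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    destination = target / f"{split}_{kind}.zip"
    urllib.request.urlretrieve(archive_url(split, kind), destination)
    return destination


# Example (requires network access):
# download_archive("train", "label")
```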
| 78 | + |
| 79 | + |
| 80 | +## Using ReceiptQA |
| 81 | +To use ReceiptQA for training or evaluation, follow these steps: |
| 82 | + |
| 83 | +### Step 1: Clone the Repository |
| 84 | +```bash |
| 85 | +git clone https://github.com/MahmoudElsayedMahmoud/ReceiptQA-A-Comprehensive-Dataset-for-Receipt-Understanding-and-Question-Answering ReceiptQA |
| 86 | +cd ReceiptQA |
| 87 | +``` |
| 88 | + |
| 89 | +### Step 2: Download the Dataset |
| 90 | +Download the dataset using the links provided above and place it in the `data/` directory. |
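If the archives are saved as `.zip` files inside `data/`, they can be unpacked in one pass. This is a sketch under that assumption; `extract_archives` is a hypothetical helper, and extracting each archive into a folder named after it is a layout choice, not something the repository prescribes.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir="data"):
    """Extract every .zip in data_dir into a sibling folder named after the archive."""
    data_path = Path(data_dir)
    extracted = []
    for archive in sorted(data_path.glob("*.zip")):
        target = data_path / archive.stem  # e.g. data/train_images
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted


# Example:
# for folder in extract_archives("data"):
#     print("extracted", folder)
```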
| 91 | + |
| 92 | + |
| 93 | +## Evaluation Metrics |
| 94 | +ReceiptQA provides the following metrics for evaluating QA models: |
| 95 | +- **Exact Match (EM):** Measures if the predicted answer exactly matches the ground truth. |
| 96 | +- **F1 Score:** Evaluates the overlap between the predicted and ground truth answers. |
| 97 | +- **Precision:** Measures the accuracy of the predictions. |
| 98 | +- **Recall:** Measures the ability to retrieve relevant answers. |
| 99 | +- **Answer Containment:** Checks if the ground truth answer is included in the predicted response. |
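The metrics above can be sketched in plain Python. The version below assumes token-level precision, recall, and F1 over whitespace-split tokens, with lowercasing and whitespace collapsing as the only normalization; the official evaluation may normalize or tokenize differently, and all function names here are hypothetical.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace (an assumed, minimal normalization)."""
    return " ".join(text.lower().split())


def exact_match(prediction, truth):
    return normalize(prediction) == normalize(truth)


def precision_recall_f1(prediction, truth):
    """Token-overlap precision, recall, and F1 between prediction and truth."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    if not pred_tokens or not truth_tokens:
        # Both empty counts as a perfect match; one empty counts as a miss.
        score = float(pred_tokens == truth_tokens)
        return score, score, score
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def answer_containment(prediction, truth):
    """True if the normalized ground truth appears inside the prediction."""
    norm_truth = normalize(truth)
    if not norm_truth:
        # An empty ground truth is only "contained" in an empty prediction.
        return not normalize(prediction)
    return norm_truth in normalize(prediction)


p, r, f1 = precision_recall_f1("The total is 559.99 L.E", "559.99 L.E")
print(exact_match("The total is 559.99 L.E", "559.99 L.E"), p, r, f1)
print(answer_containment("The total is 559.99 L.E", "559.99 L.E"))
```

Answer Containment is more forgiving than Exact Match for generative models, which often wrap the correct value in a full sentence.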
| 100 | + |
| 101 | +## Models Benchmarked |
| 102 | +ReceiptQA has been used to evaluate state-of-the-art models, including: |
| 103 | +- **GPT-4** |
| 104 | +- **Llama3.2 (11B)** |
| 105 | +- **Gemini 2.0** |
| 106 | +- **Phi 3.5 Vision** |
| 107 | +- **InternVL2 (4B/8B)** |
| 108 | +- **LLaVA 7B** |
| 109 | + |
| 110 | + |
| 111 | + |
| 112 | +## Citation |
| 113 | +If you use ReceiptQA in your research, please cite our paper: |
| 114 | +``` |
| 115 | +To be published soon. |
| 116 | +``` |
| 117 | + |
| 118 | + |
| 119 | + |
| 120 | +## Contact |
| 121 | +For questions or feedback, please contact: |
| 122 | +- Mahmoud Abdalla: [mahmoudelsayed@chungbuk.ac.kr](mailto:mahmoudelsayed@chungbuk.ac.kr) |
| 123 | +- GitHub Issues: [Submit an issue](https://github.com/MahmoudElsayedMahmoud/ReceiptQA-A-Comprehensive-Dataset-for-Receipt-Understanding-and-Question-Answering/issues) |