- Angelica Noriega Stoudenikina
- Zaid Minhas
- Héna Ricucci
- Beaudelaire Tsoungui Nzodoumkouo
Chosen challenge: #1 Data Mining, Processing & Enrichment
Information is currently lost because visual elements in reports are ignored and French-language reports from Canada are not supported.
Efficiently extract, classify and analyse the visual elements of the modern slavery reports provided by companies.
- Python script that scrapes Public Safety Canada's Supply Chains Act library to get the PDF URLs in both French and English
- Python script that extracts the visual elements from these statements
- AI classification model that categorizes the images into one of the following categories: Signature, Logo, Scanned Page, Diagram, or Other
- Model that extracts information from diagrams and text from scanned pages
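As an illustration of the classification output described above, here is a minimal sketch of how per-category scores could be mapped to one of the five labels. The names `VisualCategory`, `pick_category`, and the `min_confidence` threshold are hypothetical and are not taken from the project code; falling back to Other for low-confidence predictions is an assumption, not necessarily what the team's model does.

```python
from enum import Enum
from typing import Mapping


class VisualCategory(str, Enum):
    """The five categories named in the solution description."""
    SIGNATURE = "Signature"
    LOGO = "Logo"
    SCANNED_PAGE = "Scanned Page"
    DIAGRAM = "Diagram"
    OTHER = "Other"


def pick_category(scores: Mapping[str, float],
                  min_confidence: float = 0.5) -> VisualCategory:
    """Return the highest-scoring category.

    Falls back to OTHER when there are no scores, when the best score is
    below min_confidence (hypothetical threshold), or when the best label
    is not one of the known categories.
    """
    if not scores:
        return VisualCategory.OTHER
    best_label = max(scores, key=scores.get)
    if scores[best_label] < min_confidence:
        return VisualCategory.OTHER
    try:
        return VisualCategory(best_label)
    except ValueError:
        return VisualCategory.OTHER
```

A classifier that returns `{"Logo": 0.9, "Diagram": 0.05}` would be mapped to `VisualCategory.LOGO`, while a set of uniformly low scores would be mapped to `VisualCategory.OTHER`.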
Watch our solution pitch here!
Location: Online
• Public Safety Canada's Supply Chains Act Library
Location: /project
It includes the following:
• Data transformations, merging & quality assurance
• Model related code (projection, prediction, correlation etc.)
- Python 3.9+ installed on your system
- Node.js 16+ and npm installed
- Git (to clone the repository)
- Tesseract OCR
  - macOS: brew install tesseract
  - Ubuntu/Debian: sudo apt-get install tesseract-ocr
  - Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
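The prerequisites above can be checked before running the setup, for example with the following sketch. It uses only the Python standard library; `check_prerequisites` and `REQUIRED_TOOLS` are hypothetical names, and this is not the logic of the repository's `setup.sh`. It only checks that the tools are on the `PATH` and that Python is 3.9 or newer; it does not verify the Node.js version.

```python
import shutil
import sys
from typing import Callable, Dict, Optional, Tuple

# Command-line tools listed in the prerequisites (assumed executable names).
REQUIRED_TOOLS = ("node", "npm", "git", "tesseract")


def check_prerequisites(
    which: Callable[[str], Optional[str]] = shutil.which,
    python_version: Tuple[int, int] = tuple(sys.version_info[:2]),
) -> Dict[str, bool]:
    """Return a mapping of prerequisite name to whether it looks satisfied."""
    status = {"python>=3.9": tuple(python_version) >= (3, 9)}
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = which(tool) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

The `which` and `python_version` parameters exist only so the check can be exercised without touching the real environment.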
# Clone the repository
git clone <repository-url>
cd AIMS-repo
# Run the automated setup script
chmod +x setup.sh
./setup.sh
The setup script will automatically:
- ✅ Install all Python dependencies (backend + ML/AI libraries)
- ✅ Install all Node.js dependencies (React frontend)
- ✅ Verify all prerequisites are met
- ✅ Provide clear next steps
After setup completes, start both services:
Terminal 1 - Backend:
cd backend
python3 app.py
Terminal 2 - Frontend:
cd frontend
npm run dev
Then open your browser to: http://localhost:5173
If you prefer manual setup or encounter issues with the automated script:
cd backend
pip3 install -r requirements.txt
# Install ML/AI dependencies
cd ../project_code
pip3 install -r requirements.txt

# Install frontend dependencies (from the repository root)
cd frontend
npm install

# Terminal 1: Backend
cd backend && python3 app.py
# Terminal 2: Frontend
cd frontend && npm run dev

- Click "Start Extraction" on the main dashboard
- The system will automatically:
  - Crawl Canadian Supply Chains Act statements (if needed)
  - Extract visual elements from PDF documents
  - Classify images using AI models
  - Display results in an interactive table
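The crawl step listed above could collect PDF links along the following lines. This is a standard-library sketch only: `PdfLinkParser` and `extract_pdf_urls` are hypothetical names, the example URLs are placeholders, and the structure of the real library page is not described here, so the project's own scraper may work quite differently (for instance if the page is rendered dynamically).

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that point to PDF files."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pdf_urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.split("?")[0].lower().endswith(".pdf"):
            url = urljoin(self.base_url, href)
            if url not in self.pdf_urls:  # keep first occurrence only
                self.pdf_urls.append(url)


def extract_pdf_urls(html: str, base_url: str) -> List[str]:
    """Return de-duplicated PDF URLs found in html, resolved against base_url."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.pdf_urls
```

The HTML itself would still need to be downloaded separately, and any split between English and French statements would depend on how the library labels them.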
- Overview Tab: Statistics and summary of extracted data
- Data Table Tab: Detailed view of all extracted images with sorting
- Image Preview: Click any row to see the actual extracted image
- Image Classifier: Upload and classify individual images
- PDF Extractor: Extract visuals from your own PDF files
AIMS-repo/
├── backend/ # Flask API server
├── frontend/ # React web interface
├── project_code/ # Core extraction & AI modules
│ ├── classification/ # Image classification models
│ └── data_extraction/ # PDF processing & web scraping
├── setup.sh # Automated setup script
└── README.md # This file
- "command not found: python" → Use python3 instead
- Module import errors → Ensure you've installed project_code dependencies
- Port conflicts → Backend runs on :5001, Frontend on :5173
- Image loading issues → Check that extraction has completed successfully
- Check the browser console for frontend errors
- Check terminal output for backend errors
- Ensure all dependencies are installed correctly
- Verify Python and Node.js versions meet requirements
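For the port conflicts mentioned above, a quick way to see whether something is already listening on the backend or frontend port is sketched below. It uses only the standard library; `port_in_use` is a hypothetical helper, not part of the repository, and it assumes both services listen on the local loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports as stated in the troubleshooting notes.
    for name, port in (("backend", 5001), ("frontend", 5173)):
        state = "in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {state}")
```

If a port is reported as in use before either service has been started, another process is occupying it and should be stopped first.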
Location: /docs
• PowerPoint presentation
• Flyers
• Additional videos/demo
• Protocols
• Guides
This project builds on the open research of Project AIMS (AI against Modern Slavery) by Mila and QUT.
GitHub repository: ai4h_aims-au.
- Describe here the resources used in developing your solution (e.g. GPUs, etc).
This repository and its accompanying models, datasets, metrics, dashboards, and comparative analyses are provided strictly for research and demonstration purposes.
Any comparisons, rankings, or assessments of companies or organizations are exploratory in nature. They may be affected by incomplete data, modeling limitations, or methodological choices. These results must not be used to make factual, legal, or reputational claims about any entity without independent expert review and validation.
Do not use this repository’s contents to make public statements or claims about specific companies, organizations, or individuals.
By submitting this solution to the AIMS Hackathon, our team acknowledges and agrees to abide by the Event’s Terms and Conditions.