Thank you for your interest in contributing to My Web Intelligence (MWI)! This document provides guidelines and instructions for contributing.
If you find a bug, please open an issue on GitHub with:
- A clear, descriptive title
- Steps to reproduce the bug
- Expected behavior vs. actual behavior
- Your environment (Python version, OS, Docker version if applicable)
- Relevant log output or error messages
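The environment details requested above can be gathered with a short script. This is a minimal sketch using only the Python standard library; the function names `docker_version` and `environment_report` are hypothetical and not part of MWI.

```python
import platform
import shutil
import subprocess


def docker_version() -> str:
    """Return the output of `docker --version`, or a placeholder if Docker is not usable."""
    # Assumption: the Docker CLI, if present, is on PATH under the name "docker".
    if shutil.which("docker") is None:
        return "not installed"
    try:
        result = subprocess.run(
            ["docker", "--version"], capture_output=True, text=True, timeout=10, check=False
        )
    except (OSError, subprocess.SubprocessError):
        return "unavailable"
    return result.stdout.strip() or "unavailable"


def environment_report() -> str:
    """Build a three-line, paste-ready description of the local environment."""
    lines = [
        f"Python: {platform.python_version()}",
        f"OS: {platform.platform()}",
        f"Docker: {docker_version()}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Running the script prints three lines (Python, OS, Docker) that can be pasted directly into the issue.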
Feature requests are welcome! Please open an issue with:
- A clear description of the feature
- The use case / research scenario it addresses
- Any implementation ideas you have
- Fork the repository and create your branch from `master`.
- Set up your development environment:

      git clone https://github.com/YOUR_USERNAME/mwi.git
      cd mwi
      python -m venv venv
      source venv/bin/activate  # or venv\Scripts\activate on Windows
      pip install -r requirements.txt
      pip install -r requirements-dev.txt  # if available

- Make your changes following the code style guidelines below.
- Write or update tests for your changes:

      make test
      # or
      PYTHONPATH=. pytest tests/ -v

- Run code quality checks:

      flake8 mwi/
      mypy mwi/

- Commit your changes with a clear commit message:

      Add feature X for research scenario Y

      - Detailed description of changes
      - Any breaking changes noted

- Push and open a Pull Request against the `master` branch.

- Follow PEP 8 conventions
- Use meaningful variable and function names
- Maximum line length: 120 characters
- Use type hints where practical
- Document public functions and classes with docstrings
- Update README.md if adding new features or changing CLI commands
- Write tests for new functionality
- Ensure existing tests pass before submitting PR
- Target >85% code coverage for new code
- Use pytest fixtures from `tests/conftest.py`
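The style and testing guidelines above can be illustrated with a small example. This is a sketch only: `read_url_list` is a hypothetical helper, not part of the MWI code base, and because the fixtures defined in `tests/conftest.py` are not listed here, pytest's built-in `tmp_path` fixture stands in for them. Project fixtures are requested the same way, by naming them as test parameters.

```python
from pathlib import Path
from typing import List


def read_url_list(path: Path) -> List[str]:
    """Return the non-empty, non-comment lines of a text file, stripped of whitespace.

    Hypothetical helper for illustration only; it is not part of the MWI code base.
    """
    urls: List[str] = []
    for line in path.read_text(encoding="utf-8").splitlines():
        entry = line.strip()
        # Skip blank lines and lines starting with "#".
        if entry and not entry.startswith("#"):
            urls.append(entry)
    return urls


def test_read_url_list_skips_blanks_and_comments(tmp_path: Path) -> None:
    """pytest injects ``tmp_path`` (a built-in fixture) because the parameter carries its name."""
    url_file = tmp_path / "urls.txt"
    url_file.write_text("# seed list\nhttps://example.org\n\n  https://example.com/page  \n", encoding="utf-8")

    assert read_url_list(url_file) == ["https://example.org", "https://example.com/page"]
```

The function carries type hints and a docstring, every line stays under 120 characters, and the test relies on fixture injection rather than building its own temporary directory.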

Local setup:

    # Clone and install
    git clone https://github.com/MyWebIntelligence/mwi.git
    cd mwi
    python -m venv venv
    source venv/bin/activate
    pip install -r requirements.txt

    # Initialize database
    python mywi.py db setup

    # Run tests
    make test

Extra components:

    # For Mercury Parser (content extraction)
    npm install -g @postlight/mercury-parser

    # For Playwright (dynamic media extraction)
    python install_playwright.py

    # For ML features (embeddings, NLI)
    pip install -r requirements-ml.txt

Docker:

    # Build and run
    docker compose up -d --build

    # Execute commands
    docker compose exec mwi python mywi.py land list

    # View logs
    docker compose logs -f mwi

Project layout:

    mwi/
    ├── cli.py          # Command-line interface
    ├── controller.py   # Business logic controllers
    ├── core.py         # Core algorithms
    ├── model.py        # Database schema (Peewee ORM)
    ├── export.py       # Export functionality
    └── ...             # Other modules
    tests/
    ├── conftest.py     # Shared fixtures
    ├── fixtures/       # Test data
    └── test_*.py       # Test files

- Open an issue for questions about contributing
- Check existing issues and discussions first
- For research methodology questions, see the JOSS paper
By contributing, you agree that your contributions will be licensed under the MIT License.