Thank you for your interest in contributing to DICE Embeddings! This guide outlines our development workflow and best practices.
git clone https://github.com/dice-group/dice-embeddings.git
cd dice-embeddings
# Install in development mode with all dependencies
pip install -e '.[dev]' --extra-index-url https://download.pytorch.org/whl/cpu
# For GPU support, use:
pip install -e '.[dev]'
wget https://files.dice-research.org/datasets/dice-embeddings/KGs.zip --no-check-certificate
unzip KGs.zip
Always run these checks before committing:
# 1. Run ruff linter (required)
ruff check dicee/ --line-length=200
# 2. Run type checker (recommended)
mypy dicee/ --config-file=pyproject.toml
# 3. Run tests
python -m pytest -p no:warnings -x
Our CI pipeline automatically runs:
- ✅ Ruff linting (blocking)
- ✅ Type checking with mypy (non-blocking, currently)
- ✅ Full test suite with coverage
Pull requests must pass ruff and pytest to be merged.
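The blocking and non-blocking split above can be mirrored locally. The following is a minimal sketch, not part of the repository: the `CHECKS` table, `run_checks`, and the injectable `runner` argument are illustrative names, and only the three commands themselves are taken from this guide.

```python
import subprocess
import sys
from typing import Callable, List, Sequence, Tuple

# Each entry is (label, command, blocking); the commands are copied from this guide.
CHECKS: List[Tuple[str, List[str], bool]] = [
    ("ruff", ["ruff", "check", "dicee/", "--line-length=200"], True),
    ("mypy", ["mypy", "dicee/", "--config-file=pyproject.toml"], False),
    ("pytest", [sys.executable, "-m", "pytest", "-p", "no:warnings", "-x"], True),
]


def _run(command: Sequence[str]) -> int:
    """Run one command in the current directory and return its exit code."""
    return subprocess.run(list(command)).returncode


def run_checks(runner: Callable[[Sequence[str]], int] = _run) -> int:
    """Run every check; return 1 if any blocking check failed, else 0."""
    failed_blocking = False
    for label, command, blocking in CHECKS:
        code = runner(command)
        if code == 0:
            print(f"[ok]   {label}")
        elif blocking:
            print(f"[FAIL] {label} (blocking)")
            failed_blocking = True
        else:
            # A failing non-blocking check is reported but does not fail the run.
            print(f"[warn] {label} failed (non-blocking)")
    return 1 if failed_blocking else 0
```

Calling `sys.exit(run_checks())` from the repository root would run all three tools in order. The `runner` parameter exists only so the decision logic can be exercised without the tools installed.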
We are progressively adding type hints to the codebase. For new code:
- ✅ DO add type hints to all public functions
- ✅ DO use `Optional[T]` for optional parameters
- ✅ DO specify return types
- ⚠️ CONSIDER adding type hints to internal functions
Example:
from typing import Optional, List, Tuple, Union

def predict_topk(
    self,
    *,
    h: Optional[Union[str, List[str]]] = None,
    r: Optional[Union[str, List[str]]] = None,
    t: Optional[Union[str, List[str]]] = None,
    topk: int = 10
) -> Union[List[Tuple[str, float]], List[List[Tuple[str, float]]]]:
    """
    Predict top-k missing items in a triple pattern.

    Args:
        h: Head entity/entities. None to predict heads.
        r: Relation/relations. None to predict relations.
        t: Tail entity/entities. None to predict tails.
        topk: Number of top predictions to return.

    Returns:
        For single query: List[(item, score), ...]
        For batch query: List of such lists
    """
    ...

Provide actionable error messages with:
- Clear problem description
- Suggested solutions
- Example commands
- Link to documentation
Example:
raise ValueError(
    f"Dataset directory not found: {path}\n"
    f"\nSuggestions:\n"
    f"  1. Download datasets:\n"
    f"     wget https://files.dice-research.org/datasets/dice-embeddings/KGs.zip\n"
    f"  2. Use absolute path: --dataset_dir /absolute/path/to/KGs/UMLS\n"
    f"\nSee docs/guides/troubleshooting.md for more solutions\n"
)

Use Google-style docstrings with type information:
def my_function(param1: str, param2: int = 10) -> bool:
    """
    One-line summary of function purpose.

    Longer description if needed, with examples and context.

    Args:
        param1: Description of param1
        param2: Description of param2 (default: 10)

    Returns:
        Description of return value

    Raises:
        ValueError: When param1 is empty

    Examples:
        >>> my_function("test", 5)
        True
        >>> my_function("example")
        False

    See Also:
        - related_function(): Related functionality
        - docs/guide.md: Documentation reference
    """
    ...

- Place tests in `tests/` directory
- Name test files `test_*.py`
- Use descriptive test function names
Example test:
def test_predict_topk_single_query():
    """Test predict_topk with a single (h, r, ?) query."""
    from dicee import KGE

    model = KGE(path="path/to/trained/model")
    results = model.predict_topk(h="Mongolia", r="isLocatedIn", topk=3)
    assert len(results) == 3
    assert all(isinstance(item, tuple) for item in results)
    assert all(len(item) == 2 for item in results)

# Run all tests
python -m pytest -p no:warnings -x
# Run specific test file
python -m pytest tests/test_predict_kge.py -p no:warnings
# Run with coverage
coverage run -m pytest -p no:warnings -x
coverage report -m

Tests serve as living documentation! When adding examples to the README:
- ✅ DO link to test files instead of creating standalone examples
- ✅ DO keep tests up-to-date and CI-verified
- ⚠️ DON'T create example scripts that can become stale
| Content Type | Location |
|---|---|
| API Reference | Docstrings (auto-generated to docs/) |
| User Guides | docs/guides/*.md |
| Troubleshooting | docs/guides/troubleshooting.md |
| Multi-hop Queries | docs/guides/multi_hop_queries.md |
| Examples | Test files in tests/test_*.py |
| Dataset Formats | docs/guides/datasets.md |
When adding new features:
- ✅ Add comprehensive docstrings with examples
- ✅ Create or update relevant guide in `docs/guides/`
- ✅ Add test cases demonstrating usage
- ✅ Update README.md with link to test file
- ✅ Update CHANGELOG if applicable
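Docstring examples, which the first item in the list above asks for, can be checked mechanically so they do not drift from the code. The sketch below uses the standard-library `doctest` module; `normalize_label` and `docstring_examples_pass` are hypothetical names invented for illustration and do not exist in `dicee`.

```python
import doctest
from typing import Callable


def normalize_label(label: str) -> str:
    """
    Strip surrounding whitespace and lowercase an entity label.

    Args:
        label: Raw label text.

    Returns:
        The normalized label.

    Examples:
        >>> normalize_label("  Mongolia ")
        'mongolia'
    """
    return label.strip().lower()


def docstring_examples_pass(func: Callable) -> bool:
    """Run the doctest examples in func's docstring; True if none of them fail."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = 0
    # Expose the function under its own name so the examples can call it.
    for test in finder.find(func, name=func.__name__, globs={func.__name__: func}):
        failed += runner.run(test).failed
    return failed == 0
```

A test function could then simply assert `docstring_examples_pass(normalize_label)`. Note that a function with no examples also passes, so this only guards examples that already exist.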
git checkout develop
git pull origin develop
git checkout -b feature/my-new-feature
- Write code with type hints
- Add comprehensive tests
- Update documentation
- Write clear commit messages
# Run all checks
ruff check dicee/ --line-length=200
mypy dicee/ --config-file=pyproject.toml
python -m pytest -p no:warnings -x
git add -A
git commit -m "feat: add new feature X
- Detailed description of changes
- Added type hints to all new functions
- Added tests in tests/test_feature_x.py
- Updated docs/guides/feature_guide.md"
git push origin feature/my-new-feature
- Target branch: `develop` (not `main`)
- Clear description of changes
- Link to related issues
- Ensure CI passes
Use conventional commits format:
<type>: <subject>
<body>
Types:
- `feat:` New feature
- `fix:` Bug fix
- `docs:` Documentation changes
- `refactor:` Code restructuring
- `test:` Adding tests
- `chore:` Maintenance tasks
- `perf:` Performance improvements
Example:
feat: add multi-hop query support for union operations
- Implemented 2u (two-way union) query pattern
- Added up (union + projection) query pattern
- Added comprehensive type hints to answer_multi_hop_query()
- Added tests in tests/test_answer_multi_hop_query.py
- Updated docs/guides/multi_hop_queries.md with examples
Closes #123
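The header format above is simple enough to check before committing. This is a minimal sketch under stated assumptions: `check_commit_message` is a hypothetical helper, not part of the repository, and it accepts only the plain `<type>: <subject>` form with the types listed in this guide (no scopes such as `feat(api):`).

```python
import re
from typing import Optional

# Commit types listed in this guide.
ALLOWED_TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf")

# "<type>: <subject>" on the first line; the subject must be non-empty.
_HEADER = re.compile(r"^(?P<type>[a-z]+): (?P<subject>\S.*)$")


def check_commit_message(message: str) -> Optional[str]:
    """Return None if the header line is valid, otherwise a short problem description."""
    lines = message.splitlines()
    header = lines[0] if lines else ""
    match = _HEADER.match(header)
    if match is None:
        return "header must look like '<type>: <subject>'"
    commit_type = match.group("type")
    if commit_type not in ALLOWED_TYPES:
        allowed = ", ".join(ALLOWED_TYPES)
        return f"unknown type '{commit_type}'; expected one of: {allowed}"
    return None
```

Only the first line is inspected; the body and trailers such as `Closes #123` are left unchecked on purpose.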
See .github/skills/add-model/SKILL.md for the complete workflow.
- Extend `AbstractTrainer` in `dicee/abstracts.py`
- Implement required methods: `fit()`, `configure_callbacks()`
- Register in `dicee/trainer/__init__.py`
- Add tests in `tests/test_trainers.py`
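The shape of the first two steps can be sketched as follows. This guide does not show the constructor or method signatures of `AbstractTrainer`, so the sketch uses a local stand-in base class: `TrainerBase`, `MeanLossTrainer`, and every signature below are assumptions made for illustration, and the real class in `dicee/abstracts.py` may look quite different.

```python
from abc import ABC, abstractmethod
from typing import Callable, List

Callback = Callable[[int, float], None]


class TrainerBase(ABC):
    """Hypothetical stand-in for AbstractTrainer; signatures are assumed, not dicee's."""

    @abstractmethod
    def configure_callbacks(self) -> List[Callback]:
        """Return the callbacks invoked after each epoch."""

    @abstractmethod
    def fit(self, batches: List[List[float]], epochs: int = 1) -> List[float]:
        """Train for the given number of epochs and return one loss per epoch."""


class MeanLossTrainer(TrainerBase):
    """Toy trainer: the 'loss' of an epoch is the mean of all batch values."""

    def __init__(self) -> None:
        self.history: List[float] = []

    def configure_callbacks(self) -> List[Callback]:
        def record(epoch: int, loss: float) -> None:
            self.history.append(loss)

        return [record]

    def fit(self, batches: List[List[float]], epochs: int = 1) -> List[float]:
        values = [value for batch in batches for value in batch]
        if not values:
            raise ValueError("No training data: every batch is empty.")
        callbacks = self.configure_callbacks()
        losses: List[float] = []
        for epoch in range(epochs):
            loss = sum(values) / len(values)
            for callback in callbacks:
                callback(epoch, loss)
            losses.append(loss)
        return losses
```

Declaring both methods abstract means a subclass that forgets one of them fails at instantiation rather than midway through training. Registration and tests are left to the real module layout described in the list.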
- Check `docs/guides/troubleshooting.md`
- Run with verbose logging: `--verbose 1`
- Enable debug mode if available
- Add minimal reproduction in test file
- 📖 Documentation: https://dice-embeddings.readthedocs.io/
- 💬 Issues: https://github.com/dice-group/dice-embeddings/issues
- 🧪 Examples: See `tests/test_*.py` files
- 📋 Guides: See `docs/guides/` directory
Before submitting a pull request:
- Code follows line length limit (200 characters)
- Ruff linting passes: `ruff check dicee/ --line-length=200`
- Type hints added to new functions
- Tests added and passing: `pytest -p no:warnings -x`
- Docstrings written with examples
- Documentation updated (guides, README)
- Commit messages follow convention
- PR targets `develop` branch
- CI pipeline passes
Thank you for contributing! 🎉