# Contributing

Thank you for your interest in contributing to the Self-Play Energy Forecasting & Anomaly Detection project! This document provides guidelines for contributing to the codebase.
## Branching Strategy

We use a Git flow branching model:

- `main`: Production-ready code, tagged releases
- `dev`: Integration branch for ongoing development
- Feature branches: `feature/your-feature-name`, branched from `dev`
- Hotfix branches: `hotfix/issue-description`, for urgent fixes
## Development Workflow

1. **Fork and Clone**

   ```bash
   git clone https://github.com/USERNAME/FYP-Predictive_Anomaly_Detection.git
   cd FYP-Predictive_Anomaly_Detection
   ```

2. **Set Up Development Environment**

   ```bash
   # Install dependencies
   poetry install

   # Activate virtual environment
   poetry shell

   # Install pre-commit hooks
   pre-commit install

   # Initialize DVC
   dvc init --no-scm
   ```

3. **Create Feature Branch**

   ```bash
   git checkout dev
   git pull origin dev
   git checkout -b feature/your-feature-name
   ```

4. **Make Changes**

   - Write code following our coding standards
   - Add tests for new functionality
   - Update documentation as needed
   - Ensure all checks pass locally

5. **Local Testing**

   ```bash
   # Run pre-commit checks
   pre-commit run --all-files

   # Run tests
   pytest tests/ -v

   # Check DVC pipeline
   dvc repro --force
   ```

6. **Submit Pull Request**

   - Push your feature branch to your fork
   - Create a PR against the `dev` branch
   - Fill out the PR template completely
   - Address any review feedback
## Commit Message Guidelines

We follow the Conventional Commits specification:

```
<type>[optional scope]: <description>

[optional body]

[optional footer(s)]
```
- `feat`: New feature for the user
- `fix`: Bug fix for the user
- `docs`: Changes to documentation
- `style`: Formatting, missing semicolons, etc.; no code change
- `refactor`: Refactoring production code
- `test`: Adding missing tests, refactoring tests; no production code change
- `build`: Changes that affect the build system or external dependencies
- `ci`: Changes to CI configuration files and scripts
- `perf`: Performance improvements
- `revert`: Reverting a previous commit
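As a quick sanity check, a commit header can be matched against this format. The sketch below is illustrative, not project tooling; the regex and the `ALLOWED_TYPES` set are assumptions based on the type list above and the Conventional Commits spec:

```python
import re

# Types from the list above; an optional "!" after the type/scope marks a breaking change.
ALLOWED_TYPES = {"feat", "fix", "docs", "style", "refactor",
                 "test", "build", "ci", "perf", "revert"}

HEADER_RE = re.compile(
    r"^(?P<type>\w+)(?:\((?P<scope>[^)]+)\))?(?P<bang>!)?: (?P<desc>.+)$"
)

def is_valid_header(header: str) -> bool:
    """Return True if the first commit line follows <type>[optional scope]: <description>."""
    match = HEADER_RE.match(header)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```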
Examples:

```
feat(selfplay): add EV charging spike scenario generator
fix(data): handle missing values in UK-DALE preprocessing
docs(api): update docstrings for forecasting models
test(verifier): add physics constraint validation tests
ci: update Python version to 3.11 in GitHub Actions
```

A longer example with a body and footer:

```
feat(models): implement PatchTST for energy forecasting

- Add patch-based transformer architecture
- Include energy-specific modifications for seasonal patterns
- Support uncertainty quantification through quantile heads
- Add comprehensive unit tests for model components

Closes #123
```

## Code Style

We use Black for code formatting and Ruff for linting:

```bash
# Format code
black src/ tests/

# Check linting
ruff check src/ tests/

# Fix auto-fixable issues
ruff check --fix src/ tests/
```
- **Type Hints**: All functions must include type hints.

  ```python
  def process_household_data(
      data: pd.DataFrame,
      resolution: str = "30min",
  ) -> pd.DataFrame:
      """Process household energy consumption data."""
      pass
  ```

- **Docstrings**: All public functions and classes must have docstrings.

  ```python
  def train_forecasting_model(
      data: np.ndarray,
      config: ModelConfig,
  ) -> ForecastingModel:
      """Train a forecasting model on household energy data.

      Args:
          data: Training data with shape (n_samples, n_features)
          config: Model configuration including hyperparameters

      Returns:
          Trained forecasting model ready for prediction

      Raises:
          ValueError: If data is empty or config is invalid
      """
      pass
  ```

- **Error Handling**: Use specific exception types and helpful messages.

  ```python
  if data.empty:
      raise ValueError("Cannot train on empty dataset")
  ```
### Testing Standards

- **Test Coverage**: Maintain >80% test coverage for new code
- **Test Types**: Include unit tests, integration tests, and smoke tests
- **Test Naming**: Use descriptive test names explaining the scenario

  ```python
  def test_ev_spike_scenario_generation_with_winter_conditions():
      """Test EV charging spike generation during winter heating season."""
      pass

  def test_forecast_accuracy_on_holiday_periods():
      """Test model accuracy during holiday periods with atypical patterns."""
      pass
  ```

### Experiment Tracking

- **Naming Convention**: Follow the experiment taxonomy in `docs/experiments.md`
- **Required Artifacts**: Log config, model, metrics, and plots
- **Reproducibility**: Set seeds and log all parameters
```python
with mlflow.start_run(run_name="selfplay_patchtst_ukdale_v1_42"):
    # Set reproducibility
    mlflow.log_param("random_seed", 42)

    # Log configuration
    mlflow.log_dict(config.dict(), "config.yaml")

    # Log model and metrics
    mlflow.pytorch.log_model(model, "best_model")
    mlflow.log_metrics(evaluation_metrics)
```

### Running Tests

```bash
# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src/fyp --cov-report=html

# Run specific test types
pytest tests/ -m "unit"         # Unit tests only
pytest tests/ -m "integration"  # Integration tests only
pytest tests/ -m "not slow"     # Skip slow tests

# Run tests in parallel
pytest tests/ -n auto
```
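For the `-m` selections above to work without "unknown mark" warnings, the custom markers need to be registered. A sketch of the registration, assuming markers live in `pyproject.toml` (the section placement and descriptions are assumptions, not the project's actual config):

```toml
[tool.pytest.ini_options]
markers = [
    "unit: fast, isolated unit tests",
    "integration: tests that exercise multiple components together",
    "slow: long-running tests, skipped with -m 'not slow'",
]
```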
- **Test Structure**: Use the AAA pattern (Arrange, Act, Assert)

  ```python
  def test_scenario_generator_produces_valid_spikes():
      # Arrange
      generator = EVSpikeGenerator(power_range=(3.5, 7.0))
      baseline_data = create_sample_household_data()

      # Act
      scenario = generator.generate_scenario(baseline_data)

      # Assert
      assert scenario.peak_power >= 3.5
      assert scenario.peak_power <= 7.0
      assert scenario.duration > 0
  ```

- **Fixtures**: Use pytest fixtures for common test data

  ```python
  @pytest.fixture
  def sample_household_data():
      """Provide sample household consumption data for testing."""
      return pd.DataFrame({
          'timestamp': pd.date_range('2023-01-01', periods=48, freq='30min'),
          'consumption': np.random.normal(1.5, 0.5, 48),
      })
  ```

- **Parameterized Tests**: Test multiple scenarios efficiently

  ```python
  @pytest.mark.parametrize("season,expected_multiplier", [
      ("winter", 1.2),
      ("summer", 0.8),
      ("spring", 1.0),
      ("autumn", 1.0),
  ])
  def test_seasonal_adjustments(season, expected_multiplier):
      """Test seasonal consumption adjustments."""
      pass
  ```
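A hypothetical implementation that a parameterized test like the one above might target; the function name and multiplier table are illustrative, taken from the test's parameters rather than from project code:

```python
# Illustrative seasonal multipliers matching the parameterized test above.
SEASONAL_MULTIPLIERS = {
    "winter": 1.2,
    "summer": 0.8,
    "spring": 1.0,
    "autumn": 1.0,
}

def apply_seasonal_adjustment(consumption_kwh: float, season: str) -> float:
    """Scale a consumption value by the season's multiplier."""
    try:
        return consumption_kwh * SEASONAL_MULTIPLIERS[season]
    except KeyError:
        raise ValueError(f"Unknown season: {season!r}") from None
```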
### Docstrings

Use Google-style docstrings:

```python
def aggregate_households_to_feeder(
    household_forecasts: List[np.ndarray],
    diversity_factors: Dict[int, float],
    transformer_capacity: float = 500.0,
) -> np.ndarray:
    """Aggregate household forecasts into realistic distribution feeder load.

    Applies diversity factors and transformer constraints to simulate
    realistic distribution network loading from household consumption.

    Args:
        household_forecasts: List of individual household load forecasts
        diversity_factors: Mapping from household count to diversity factor
        transformer_capacity: Maximum transformer capacity in kVA

    Returns:
        Aggregated feeder load profile respecting network constraints

    Raises:
        ValueError: If household_forecasts is empty
        OverloadError: If aggregated load exceeds transformer capacity

    Example:
        >>> forecasts = [np.random.rand(48) for _ in range(50)]
        >>> diversity = {50: 0.6}  # 50 households have 60% diversity
        >>> feeder_load = aggregate_households_to_feeder(forecasts, diversity)
        >>> assert len(feeder_load) == 48
    """
    pass
```

### Documentation Updates

When adding significant features:
- Update the main README.md if user-facing
- Add examples to relevant documentation files
- Update the project roadmap if applicable
### PR Title Format

PR titles follow the same convention as commit messages:

```
<type>[scope]: <description>
```

Examples:

```
feat(selfplay): implement proposer scenario generation
fix(data): resolve UK-DALE missing value handling
docs(api): add comprehensive model usage examples
```
## Summary
Brief description of changes and motivation.
## Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring (no functional changes)
## Testing
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing completed
## Documentation
- [ ] Code changes are documented with docstrings
- [ ] README updated if needed
- [ ] API documentation updated if needed
## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] No merge conflicts
- [ ] All CI checks pass
- [ ] Linked to relevant issues
## Screenshots/Examples
(If applicable, include screenshots or example outputs)

### DVC Pipeline Checks

```bash
# Full pipeline check (should complete quickly with placeholder stages)
dvc repro

# Check pipeline status
dvc status

# Visualize pipeline
dvc dag
```

### Pre-commit Hooks

```bash
# Run all pre-commit hooks
pre-commit run --all-files

# Run specific hooks
pre-commit run black
pre-commit run ruff
pre-commit run mypy
```

### Working with Data

For local development with actual data:

```bash
# Add sample datasets (small subsets for development)
dvc add data/raw/ukdale_sample/
dvc add data/raw/lcl_sample/

# Configure local DVC remote (optional)
dvc remote add local /path/to/local/storage
dvc push
```

### Performance Considerations

- Profile code with `cProfile` to find performance bottlenecks
- Use vectorized operations (NumPy, Polars) instead of loops
- Consider memory usage for large datasets
- Implement batch processing for large-scale experiments
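To illustrate the vectorization advice, a small sketch comparing a Python loop with the equivalent NumPy operation (the half-hourly consumption array is made up for the example):

```python
import numpy as np

# Simulated half-hourly consumption readings for one day (48 values, kWh).
rng = np.random.default_rng(42)
consumption = rng.normal(1.5, 0.5, 48)

# Loop version: convert each kWh reading over 0.5 h to average power in kW.
power_loop = []
for reading in consumption:
    power_loop.append(reading / 0.5)

# Vectorized version: one NumPy operation over the whole array.
power_vec = consumption / 0.5

assert np.allclose(power_loop, power_vec)
```

The vectorized form avoids the per-element Python interpreter overhead, which matters once arrays grow from one day of readings to months of multi-household data.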
### MLflow Tips

- Log metrics incrementally during training (not all at once)
- Use MLflow's automatic logging when possible
- Clean up old experiment runs periodically
- Use tags for efficient filtering and organization
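A small helper can keep run names consistent with the experiment taxonomy. The `<stage>_<model>_<dataset>_v<version>_<seed>` pattern is an assumption inferred from the `selfplay_patchtst_ukdale_v1_42` example earlier in this guide, not a documented project convention:

```python
def make_run_name(stage: str, model: str, dataset: str, version: int, seed: int) -> str:
    """Build an MLflow run name like 'selfplay_patchtst_ukdale_v1_42'."""
    return f"{stage}_{model}_{dataset}_v{version}_{seed}"
```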
- **Issues**: Open a GitHub issue for bugs or feature requests
- **Discussions**: Use GitHub Discussions for questions and ideas
- **Documentation**: Check the `docs/` directory for detailed information
- **Code Review**: Request reviews from maintainers for significant changes
Contributors are recognized in:
- GitHub contributor graphs
- Release notes for significant contributions
- Project documentation acknowledgments
- Academic citations for research contributions
Thank you for contributing to advancing energy forecasting research! 🔋⚡