Contributing to FYP Energy Forecasting

Thank you for your interest in contributing to the Self-Play Energy Forecasting & Anomaly Detection project! This document provides guidelines for contributing to the codebase.

Development Workflow

Branching Model

We use a Git flow branching model:

  • main: Production-ready code, tagged releases
  • dev: Integration branch for ongoing development
  • Feature branches: feature/your-feature-name branched from dev
  • Hotfix branches: hotfix/issue-description for urgent fixes
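For example, an urgent fix would be branched directly from main and, following Git flow, merged back into both main and dev once reviewed (the branch name below is illustrative):

git checkout main
git pull origin main
git checkout -b hotfix/fix-dvc-cache-path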

Contribution Process

  1. Fork and Clone

    git clone https://github.com/USERNAME/FYP-Predictive_Anomaly_Detection.git
    cd FYP-Predictive_Anomaly_Detection
  2. Set Up Development Environment

    # Install dependencies
    poetry install
    
    # Activate virtual environment
    poetry shell
    
    # Install pre-commit hooks
    pre-commit install
    
    # Initialize DVC
    dvc init --no-scm
  3. Create Feature Branch

    git checkout dev
    git pull origin dev
    git checkout -b feature/your-feature-name
  4. Make Changes

    • Write code following our coding standards
    • Add tests for new functionality
    • Update documentation as needed
    • Ensure all checks pass locally
  5. Local Testing

    # Run pre-commit checks
    pre-commit run --all-files
    
    # Run tests
    pytest tests/ -v
    
    # Check DVC pipeline
    dvc repro --force
  6. Submit Pull Request

    • Push feature branch to your fork
    • Create PR against dev branch
    • Fill out PR template completely
    • Address any review feedback

Commit Message Convention

We follow the Conventional Commits specification:

Format

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Types

  • feat: New feature for the user
  • fix: Bug fix for the user
  • docs: Changes to documentation
  • style: Formatting, missing semicolons, etc.; no code change
  • refactor: Refactoring production code
  • test: Adding missing tests, refactoring tests; no production code change
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • perf: Performance improvements
  • revert: Reverting a previous commit

Examples

feat(selfplay): add EV charging spike scenario generator
fix(data): handle missing values in UK-DALE preprocessing
docs(api): update docstrings for forecasting models
test(verifier): add physics constraint validation tests
ci: update Python version to 3.11 in GitHub Actions

Multi-line Commits

feat(models): implement PatchTST for energy forecasting

- Add patch-based transformer architecture
- Include energy-specific modifications for seasonal patterns
- Support uncertainty quantification through quantile heads
- Add comprehensive unit tests for model components

Closes #123

Coding Standards

Python Code Style

We use Black for code formatting and Ruff for linting:

# Format code
black src/ tests/

# Check linting
ruff check src/ tests/

# Fix auto-fixable issues
ruff check --fix src/ tests/

Code Quality Requirements

  1. Type Hints: All functions must include type hints

    def process_household_data(
        data: pd.DataFrame,
        resolution: str = "30min"
    ) -> pd.DataFrame:
        """Process household energy consumption data."""
        pass
  2. Docstrings: All public functions and classes must have docstrings

    def train_forecasting_model(
        data: np.ndarray,
        config: ModelConfig
    ) -> ForecastingModel:
        """Train a forecasting model on household energy data.
    
        Args:
            data: Training data with shape (n_samples, n_features)
            config: Model configuration including hyperparameters
    
        Returns:
            Trained forecasting model ready for prediction
    
        Raises:
            ValueError: If data is empty or config is invalid
        """
        pass
  3. Error Handling: Use specific exception types and helpful messages

    if data.empty:
        raise ValueError("Cannot train on empty dataset")

Testing Standards

  1. Test Coverage: Maintain >80% test coverage for new code
  2. Test Types: Include unit tests, integration tests, and smoke tests
  3. Test Naming: Use descriptive test names explaining the scenario
def test_ev_spike_scenario_generation_with_winter_conditions():
    """Test EV charging spike generation during winter heating season."""
    pass

def test_forecast_accuracy_on_holiday_periods():
    """Test model accuracy during holiday periods with atypical patterns."""
    pass

MLflow Experiment Standards

  1. Naming Convention: Follow experiment taxonomy in docs/experiments.md
  2. Required Artifacts: Log config, model, metrics, and plots
  3. Reproducibility: Set seeds and log all parameters
with mlflow.start_run(run_name="selfplay_patchtst_ukdale_v1_42"):
    # Log the random seed (set seeds before the run for true reproducibility)
    mlflow.log_param("random_seed", 42)

    # Log configuration
    mlflow.log_dict(config.dict(), "config.yaml")

    # Log model and metrics
    mlflow.pytorch.log_model(model, "best_model")
    mlflow.log_metrics(evaluation_metrics)

Testing Guidelines

Running Tests

# Run all tests
pytest tests/

# Run with coverage
pytest tests/ --cov=src/fyp --cov-report=html

# Run specific test types
pytest tests/ -m "unit"        # Unit tests only
pytest tests/ -m "integration" # Integration tests only
pytest tests/ -m "not slow"    # Skip slow tests

# Run tests in parallel
pytest tests/ -n auto
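The -m selections above assume tests are tagged with pytest markers. A minimal sketch of how tests might be tagged (test names are illustrative; the unit, integration, and slow markers would need to be registered in the project's pytest configuration):

import pytest

@pytest.mark.unit
def test_diversity_factor_lookup_returns_known_value():
    """Fast, isolated unit test with no external data."""
    pass

@pytest.mark.integration
@pytest.mark.slow
def test_smoke_pipeline_runs_on_sample_data():
    """Slower integration test; excluded by -m "not slow"."""
    pass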

Writing Tests

  1. Test Structure: Use AAA pattern (Arrange, Act, Assert)

    def test_scenario_generator_produces_valid_spikes():
        # Arrange
        generator = EVSpikeGenerator(power_range=(3.5, 7.0))
        baseline_data = create_sample_household_data()
    
        # Act
        scenario = generator.generate_scenario(baseline_data)
    
        # Assert
        assert scenario.peak_power >= 3.5
        assert scenario.peak_power <= 7.0
        assert scenario.duration > 0
  2. Fixtures: Use pytest fixtures for common test data

    @pytest.fixture
    def sample_household_data():
        """Provide sample household consumption data for testing."""
        return pd.DataFrame({
            'timestamp': pd.date_range('2023-01-01', periods=48, freq='30min'),
            'consumption': np.random.normal(1.5, 0.5, 48)
        })
  3. Parameterized Tests: Test multiple scenarios efficiently

    @pytest.mark.parametrize("season,expected_multiplier", [
        ("winter", 1.2),
        ("summer", 0.8),
        ("spring", 1.0),
        ("autumn", 1.0)
    ])
    def test_seasonal_adjustments(season, expected_multiplier):
        """Test seasonal consumption adjustments."""
        pass

Documentation Standards

Docstring Format

Use Google-style docstrings:

def aggregate_households_to_feeder(
    household_forecasts: List[np.ndarray],
    diversity_factors: Dict[int, float],
    transformer_capacity: float = 500.0
) -> np.ndarray:
    """Aggregate household forecasts into realistic distribution feeder load.

    Applies diversity factors and transformer constraints to simulate
    realistic distribution network loading from household consumption.

    Args:
        household_forecasts: List of individual household load forecasts
        diversity_factors: Mapping from household count to diversity factor
        transformer_capacity: Maximum transformer capacity in kVA

    Returns:
        Aggregated feeder load profile respecting network constraints

    Raises:
        ValueError: If household_forecasts is empty
        OverloadError: If aggregated load exceeds transformer capacity

    Example:
        >>> forecasts = [np.random.rand(48) for _ in range(50)]
        >>> diversity = {50: 0.6}  # 50 households have 60% diversity
        >>> feeder_load = aggregate_households_to_feeder(forecasts, diversity)
        >>> assert len(feeder_load) == 48
    """
    pass

README Updates

When adding significant features:

  1. Update the main README.md if user-facing
  2. Add examples to relevant documentation files
  3. Update the project roadmap if applicable

Pull Request Guidelines

PR Title Format

<type>[scope]: <description>

Examples:

  • feat(selfplay): implement proposer scenario generation
  • fix(data): resolve UK-DALE missing value handling
  • docs(api): add comprehensive model usage examples

PR Description Template

## Summary
Brief description of changes and motivation.

## Type of Change
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Documentation update
- [ ] Performance improvement
- [ ] Refactoring (no functional changes)

## Testing
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing completed

## Documentation
- [ ] Code changes are documented with docstrings
- [ ] README updated if needed
- [ ] API documentation updated if needed

## Checklist
- [ ] Code follows project style guidelines
- [ ] Self-review completed
- [ ] No merge conflicts
- [ ] All CI checks pass
- [ ] Linked to relevant issues

## Screenshots/Examples
(If applicable, include screenshots or example outputs)

Local Development Pipeline

Running the Smoke Pipeline

# Full pipeline check (should complete quickly with placeholder stages)
dvc repro

# Check pipeline status
dvc status

# Visualize pipeline
dvc dag

Pre-commit Checks

# Run all pre-commit hooks
pre-commit run --all-files

# Run specific hooks
pre-commit run black
pre-commit run ruff
pre-commit run mypy

Development Database

For local development with actual data:

# Add sample datasets (small subsets for development)
dvc add data/raw/ukdale_sample/
dvc add data/raw/lcl_sample/

# Configure a local DVC remote and make it the default (optional)
dvc remote add -d local /path/to/local/storage
dvc push

Performance Considerations

Computational Efficiency

  • Profile code with cProfile to find performance bottlenecks (see the sketch below)
  • Use vectorized operations (NumPy, Polars) instead of loops
  • Consider memory usage for large datasets
  • Implement batch processing for large-scale experiments
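As a rough illustration of the profiling and vectorization points above (a self-contained sketch with synthetic data, not project code):

import cProfile
import pstats
import numpy as np

def rolling_mean_loop(consumption: np.ndarray, window: int = 48) -> np.ndarray:
    """Naive loop-based rolling mean (slow for long series)."""
    out = np.empty(len(consumption) - window + 1)
    for i in range(len(out)):
        out[i] = consumption[i : i + window].mean()
    return out

def rolling_mean_vectorized(consumption: np.ndarray, window: int = 48) -> np.ndarray:
    """Vectorized rolling mean via a cumulative sum; no Python-level loop."""
    csum = np.cumsum(np.insert(consumption, 0, 0.0))
    return (csum[window:] - csum[:-window]) / window

if __name__ == "__main__":
    data = np.random.rand(500_000)
    profiler = cProfile.Profile()
    profiler.enable()
    rolling_mean_loop(data)
    rolling_mean_vectorized(data)
    profiler.disable()
    # The loop version should dominate cumulative time in the report
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)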

MLflow Best Practices

  • Log metrics incrementally during training (not all at once)
  • Use MLflow's automatic logging when possible
  • Clean up old experiment runs periodically
  • Use tags for efficient filtering and organization
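A minimal sketch of incremental metric logging and tagging (the run name, tags, and placeholder loss values are illustrative):

import math
import mlflow

with mlflow.start_run(run_name="example_incremental_logging"):
    # Tags make filtering and organizing runs in the MLflow UI easier
    mlflow.set_tag("stage", "development")
    mlflow.set_tag("dataset", "ukdale_sample")

    for epoch in range(10):
        # Placeholder losses; in practice these come from the training loop
        train_loss = math.exp(-0.3 * epoch)
        val_loss = math.exp(-0.25 * epoch)
        # Log per-epoch values with a step index rather than one final dump
        mlflow.log_metric("train_loss", train_loss, step=epoch)
        mlflow.log_metric("val_loss", val_loss, step=epoch)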

Getting Help

  • Issues: Open a GitHub issue for bugs or feature requests
  • Discussions: Use GitHub Discussions for questions and ideas
  • Documentation: Check docs/ directory for detailed information
  • Code Review: Request reviews from maintainers for significant changes

Recognition

Contributors are recognized in:

  • GitHub contributor graphs
  • Release notes for significant contributions
  • Project documentation acknowledgments
  • Academic citations for research contributions

Thank you for contributing to advancing energy forecasting research! 🔋⚡