Contributing to ROUGE-Torch

Thank you for your interest in contributing to ROUGE-Torch! We welcome contributions of all kinds, from bug reports and feature requests to code improvements and documentation updates.

🤝 Ways to Contribute

  • 🐛 Bug Reports: Report issues you encounter
  • 💡 Feature Requests: Suggest new features or improvements
  • 📝 Documentation: Improve docs, examples, or tutorials
  • 🔧 Code: Fix bugs, implement features, or optimize performance
  • 🧪 Tests: Add or improve test coverage
  • 📊 Benchmarks: Performance improvements or comparisons

🚀 Getting Started

Development Setup

  1. Fork and Clone

    git clone https://github.com/your-username/rouge-torch.git
    cd rouge-torch
  2. Create Development Environment

    # Using conda (recommended)
    conda create -n rouge-torch python=3.8
    conda activate rouge-torch
    
    # Or using venv
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install Dependencies

    # Install package in development mode with all dependencies
    pip install -e ".[dev,test]"
    
    # Or install basic dependencies
    # Quote the version specifiers so the shell does not treat ">" as a redirect
    pip install "torch>=1.9.0" "pytest>=6.0" black isort flake8 mypy
  4. Verify Installation

    python -c "import rouge_torch; print('Installation successful!')"
    python -m pytest test_rouge_torch.py -v

Development Workflow

  1. Create a Branch

    git checkout -b feature/your-feature-name
    # or
    git checkout -b bugfix/issue-description
  2. Make Changes

    • Write your code following our style guidelines (see below)
    • Add tests for new functionality
    • Update documentation as needed
  3. Run Tests and Checks

    # Run all tests
    python -m pytest test_rouge_torch.py -v
    
    # Run specific test
    python -m pytest test_rouge_torch.py::TestROUGEScoreTorch::test_overfit_convergence -v
    
    # Check code formatting
    black --check rouge_torch/ test_rouge_torch.py example.py
    isort --check-only rouge_torch/ test_rouge_torch.py example.py
    
    # Check code style
    flake8 rouge_torch/ test_rouge_torch.py example.py
    
    # Type checking
    mypy rouge_torch/
  4. Format Code (if needed)

    black rouge_torch/ test_rouge_torch.py example.py
    isort rouge_torch/ test_rouge_torch.py example.py
  5. Commit and Push

    git add .
    git commit -m "Add: brief description of changes"
    git push origin feature/your-feature-name
  6. Create Pull Request

    • Go to GitHub and create a pull request
    • Fill out the PR template with details about your changes
    • Link any related issues

📝 Code Style Guidelines

General Principles

  • Readability: Code should be self-documenting and easy to understand
  • Performance: Maintain the vectorized, GPU-optimized approach
  • Compatibility: Ensure compatibility with PyTorch 1.9+ and Python 3.8+
  • Testing: All new features must include comprehensive tests

Python Style

We follow PEP 8 with a few modifications (see Code Formatting below), as in this example:

from typing import Dict, List, Optional, Sequence

import torch

# Good: Clear, documented function
def compute_rouge_loss(
    self,
    candidate_logits: torch.Tensor,
    reference_logits: List[torch.Tensor],
    rouge_types: Sequence[str] = ("rouge_1", "rouge_l"),  # immutable default
    weights: Optional[Dict[str, float]] = None,
    reduction: str = "mean",
) -> torch.Tensor:
    """
    Compute ROUGE-based loss for training.
    
    Args:
        candidate_logits: Model output logits (batch_size, seq_len, vocab_size)
        reference_logits: List of reference tensors
        rouge_types: ROUGE metrics to combine
        weights: Optional weights for different ROUGE types
        reduction: 'mean', 'sum', or 'none'
        
    Returns:
        Loss tensor with proper bounds [0, N] where N = len(rouge_types)
    """

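The `reduction` argument in the docstring above follows the usual PyTorch loss convention. A minimal sketch of how such an argument is typically applied is shown here; `apply_reduction` is a hypothetical helper used only for illustration and is not part of the ROUGE-Torch API.

```python
import torch


def apply_reduction(per_sample_loss: torch.Tensor, reduction: str = "mean") -> torch.Tensor:
    """Hypothetical helper: reduce a (batch_size,) loss tensor."""
    if reduction == "mean":
        return per_sample_loss.mean()  # scalar, average over the batch
    if reduction == "sum":
        return per_sample_loss.sum()  # scalar, total over the batch
    if reduction == "none":
        return per_sample_loss  # unchanged, one value per sample
    raise ValueError(f"Unknown reduction: {reduction!r}")


per_sample = torch.tensor([0.5, 1.5])
mean_loss = apply_reduction(per_sample, "mean")
unreduced = apply_reduction(per_sample, "none")
```

Raising on an unknown value, instead of silently falling back to a default, keeps typos such as `"avg"` from going unnoticed during training.
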
Code Formatting

  • Line Length: 88 characters (Black default)
  • Imports: Use isort for consistent import ordering
  • Type Hints: Required for all public functions and methods
  • Docstrings: Use Google-style docstrings

PyTorch Specific

  • Device Agnostic: Always respect the device parameter
  • Vectorized Operations: Avoid Python loops in favor of tensor operations
  • Memory Efficient: Consider memory usage for large batches/sequences
  • Gradient-Safe: Ensure operations work correctly with autograd
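
As a rough illustration of three of these points (device handling, vectorization, and gradient safety), the sketch below computes a soft unigram overlap directly from logits. The function `soft_unigram_overlap` is hypothetical and is not taken from the ROUGE-Torch codebase; it only shows the style we are asking for.

```python
import torch


def soft_unigram_overlap(
    candidate_logits: torch.Tensor,
    reference_logits: torch.Tensor,
) -> torch.Tensor:
    """Hypothetical helper: soft unigram overlap per batch element.

    Both inputs have shape (batch_size, seq_len, vocab_size).
    Returns a tensor of shape (batch_size,).
    """
    # Device agnostic: follow the candidate's device instead of hard-coding one
    reference_logits = reference_logits.to(candidate_logits.device)

    # Gradient-safe: softmax is differentiable, whereas argmax would cut the graph
    cand_counts = torch.softmax(candidate_logits, dim=-1).sum(dim=1)
    ref_counts = torch.softmax(reference_logits, dim=-1).sum(dim=1)

    # Vectorized: one elementwise minimum over the whole batch, no Python loops
    return torch.minimum(cand_counts, ref_counts).sum(dim=-1)


logits = torch.randn(2, 5, 7, requires_grad=True)
overlap = soft_unigram_overlap(logits, logits.detach())
overlap.sum().backward()  # gradients reach the input logits
```

When the candidate and reference are identical, the overlap equals the sequence length, which makes for a convenient sanity check.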

🧪 Testing Guidelines

Test Categories

  1. Unit Tests: Test individual functions and methods
  2. Integration Tests: Test end-to-end workflows
  3. Performance Tests: Validate speed and memory usage
  4. Validation Tests: The overfit test that validates loss convergence

Writing Tests

def test_your_feature(self):
    """Test description explaining what this validates."""
    # Setup
    device = self.device
    rouge_scorer = self.rouge
    
    # Test data
    candidate_logits = torch.randn(2, 10, self.vocab_size, device=device)
    ref_logits = [torch.randn(2, 10, self.vocab_size, device=device)]
    
    # Execute
    result = rouge_scorer.your_method(candidate_logits, ref_logits)
    
    # Validate
    self.assertEqual(result.shape, (2,))
    self.assertTrue(torch.all(result >= 0))
    self.assertTrue(torch.all(result <= 1))

Critical Tests

  • Loss Bounds: Verify loss ∈ [0, N] for N ROUGE types
  • Perfect Match: loss = 0 when F1 = 1
  • No Match: loss = N when F1 = 0
  • Gradient Flow: Ensure backpropagation works correctly
  • Batch Consistency: Same results for batched vs individual processing
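
The bounds above are consistent with a loss of the form "sum of (1 - F1) over the N ROUGE types". The sketch below works through that arithmetic in plain Python; the loss form is an assumption inferred from the stated bounds, and `unigram_f1` and `toy_loss` are hypothetical names, not library functions.

```python
from collections import Counter
from typing import List


def unigram_f1(candidate: List[int], reference: List[int]) -> float:
    """ROUGE-1 style F1 on token ids, using clipped unigram counts."""
    overlap = sum((Counter(candidate) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(candidate)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)


def toy_loss(f1_scores: List[float]) -> float:
    """Assumed loss form: sum of (1 - F1) over N ROUGE types."""
    return sum(1.0 - f1 for f1 in f1_scores)


perfect = unigram_f1([1, 2, 3], [1, 2, 3])        # 1.0
disjoint = unigram_f1([1, 2, 3], [4, 5, 6])       # 0.0
partial = unigram_f1([1, 2, 3, 4], [1, 2, 5, 6])  # 0.5

# With N = 2 ROUGE types the loss stays within [0, 2]
assert toy_loss([perfect, perfect]) == 0.0    # perfect match: loss = 0
assert toy_loss([disjoint, disjoint]) == 2.0  # no match: loss = N
assert 0.0 <= toy_loss([partial, partial]) <= 2.0
```

A slow reference implementation like this is also useful as an oracle when testing a vectorized version against known values.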

📋 Pull Request Guidelines

PR Title Format

  • Add: new feature description
  • Fix: bug description
  • Update: improvement description
  • Docs: documentation changes
  • Test: test improvements
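
If you want to check a title locally before opening a pull request, the format above can be expressed as a small regular expression. This is only a sketch: `is_valid_pr_title` is a hypothetical helper, and the project does not necessarily enforce titles this way.

```python
import re

# Prefixes taken from the PR title list above
PR_TITLE_PATTERN = re.compile(r"^(Add|Fix|Update|Docs|Test): \S.*$")


def is_valid_pr_title(title: str) -> bool:
    """Return True if the title is '<Prefix>: <non-empty description>'."""
    return PR_TITLE_PATTERN.match(title) is not None


ok = is_valid_pr_title("Fix: handle empty reference list")
missing_prefix = is_valid_pr_title("handle empty reference list")
```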

PR Description Template

## Description
Brief description of changes made.

## Type of Change
- [ ] Bug fix (non-breaking change fixing an issue)
- [ ] New feature (non-breaking change adding functionality) 
- [ ] Breaking change (fix or feature causing existing functionality to change)
- [ ] Documentation update

## Testing
- [ ] All existing tests pass
- [ ] New tests added for new functionality
- [ ] Manual testing performed

## Performance Impact
- [ ] No performance impact
- [ ] Performance improved
- [ ] Performance regression (justify why)

## Checklist
- [ ] Code follows style guidelines
- [ ] Self-review completed
- [ ] Documentation updated
- [ ] Tests added/updated

🐛 Bug Reports

When reporting bugs, please include:

  1. Environment Information

    import sys
    import torch
    import rouge_torch
    print(f"Python: {sys.version}")
    print(f"PyTorch: {torch.__version__}")
    print(f"ROUGE-Torch: {rouge_torch.__version__}")
    print(f"CUDA Available: {torch.cuda.is_available()}")
  2. Minimal Reproduction Example

    # Minimal code that reproduces the issue
    import torch
    from rouge_torch import ROUGEScoreTorch
    
    # Your reproduction code here
  3. Expected vs Actual Behavior

  4. Error Messages (full traceback)

  5. Additional Context

💡 Feature Requests

For feature requests, please provide:

  1. Problem Statement: What problem does this solve?
  2. Proposed Solution: How should it work?
  3. Alternative Solutions: Other approaches considered
  4. Use Cases: Real-world scenarios where this would be useful
  5. Performance Considerations: Impact on speed/memory

🔍 Code Review Process

What We Look For

  1. Correctness: Does the code work as intended?
  2. Performance: Does it maintain the vectorized, GPU-optimized approach?
  3. Testing: Is there adequate test coverage for new functionality?
  4. Documentation: Are the docstrings and comments clear?
  5. Style: Does it follow project conventions?
  6. Backward Compatibility: Does it avoid breaking existing APIs?

Review Timeline

  • Initial review within 3-5 days
  • Follow-up reviews within 1-2 days
  • Merge after approval and passing CI

🏆 Recognition

Contributors will be:

  • Listed in CHANGELOG.md for their contributions
  • Acknowledged in release notes
  • Added to a CONTRIBUTORS.md file (coming soon)

📚 Development Resources

Understanding ROUGE Metrics

PyTorch Best Practices

Project-Specific Knowledge

❓ Getting Help

📄 License

By contributing to ROUGE-Torch, you agree that your contributions will be licensed under the same MIT License that covers the project.


Thank you for contributing to ROUGE-Torch! Your help makes this project better for everyone. 🎉