Welcome to the GitHub Organization Statistics Tool project! We're excited that you're interested in contributing. This guide will help you get started with contributing to our comprehensive tool for analyzing GitHub organization statistics.
- Project Overview
- Getting Started
- Development Setup
- Contributing Guidelines
- Testing
- Documentation
- Release Process
- Code of Conduct
- Getting Help
## Project Overview

The GitHub Organization Statistics Tool is an open-source Python application that analyzes GitHub organization statistics, including:
- Repository metrics (stars, forks, issues, languages)
- Contributor activity and insights
- Code quality metrics and dependency analysis
- Multi-organization support with GitHub Apps
- Advanced filtering and bot detection
- Multiple output formats (JSON, CSV, Excel)

Key features:

- Multi-authentication: Supports both Personal Access Tokens and GitHub Apps
- Advanced Analytics: Dependency analysis, submodule detection, GitHub Actions integration
- Performance Optimized: Adaptive batch sizing, rate limit management, memory optimization
- Production Ready: Comprehensive error handling, logging, and testing
## Getting Started

Before contributing, ensure you have:
- Python 3.7+ (we support Python 3.7-3.12)
- Git for version control
- pip package manager
- A GitHub account for testing and contributions

1. Fork the repository on GitHub.

2. Clone your fork locally:

       git clone https://github.com/YOUR-USERNAME/github-org-stats.git
       cd github-org-stats

3. Add the upstream remote:

       git remote add upstream https://github.com/zoharbabin/github-org-stats.git

## Development Setup

1. Create a virtual environment (recommended):

       python -m venv venv
       source venv/bin/activate  # On Windows: venv\Scripts\activate

2. Install in development mode with all dependencies (the quotes keep shells such as zsh from treating the brackets as a glob pattern):

       pip install -e ".[dev]"

This installs:
- Core dependencies: PyGithub, pandas, numpy, requests, PyJWT, tqdm, openpyxl, pytz
- Development tools: pytest, pytest-cov, black, flake8, mypy
Our development stack includes:
- pytest (≥6.0.0) - Testing framework
- pytest-cov (≥2.10.0) - Coverage reporting
- black (≥21.0.0) - Code formatting
- flake8 (≥3.8.0) - Linting
- mypy (≥0.800) - Type checking
Run the comprehensive test suite:

    # Run all tests
    python -m pytest tests/

    # Run with coverage
    python -m pytest tests/ --cov=github_org_stats --cov-report=html

    # Run specific test categories
    python tests/test_github_org_stats.py --category auth
    python tests/test_github_org_stats.py --category data
    python tests/test_github_org_stats.py --category excel

We maintain high code quality standards:

    # Format code with Black
    black .

    # Check linting with flake8
    flake8 github_org_stats.py tests/

    # Type checking with mypy
    mypy github_org_stats.py

Our code style configuration is defined in pyproject.toml:
- Black: 88 character line length, Python 3.7+ target
- pytest: Configured for the tests/ directory with verbose output
- mypy: Strict type checking enabled

## Contributing Guidelines

### How to Report Bugs

1. **Check existing issues** first to avoid duplicates
2. **Use the bug report template** when creating new issues
3. **Include detailed information**:
   - Python version and operating system
   - Command used and expected vs. actual behavior
   - Complete error messages and stack traces
   - Minimal reproduction steps

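The environment details requested above can be gathered with a short script rather than typed by hand. The following is a minimal sketch using only the standard library; the function name `describe_environment` is hypothetical and is not part of this project.

```python
import platform
import sys


def describe_environment() -> str:
    """Return a short, paste-ready summary of the local environment."""
    lines = [
        f"- Python {platform.python_version()}",
        f"- {platform.system()} {platform.release()}",
        f"- Executable: {sys.executable}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the summary so it can be pasted into the Environment section.
    print(describe_environment())
```

The output can be pasted directly under the Environment heading of a bug report; add the installed tool version by hand.
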
Example Bug Report:
**Bug Description**: Rate limit exceeded when analyzing large organizations
**Environment**:
- Python 3.9.0
- macOS 12.0
- github-org-stats v1.0.0
**Command Used**:
```bash
python github_org_stats.py --org large-org --token ghp_xxx --max-repos 1000
```

**Error Message**: RateLimitExceededException: API rate limit exceeded

**Expected**: Should handle rate limits gracefully with retries

**Actual**: Script crashes with unhandled exception

### How to Suggest Features
1. **Check the roadmap** and existing feature requests
2. **Open a feature request issue** with:
- Clear description of the proposed feature
- Use cases and benefits
- Potential implementation approach
- Any breaking changes or considerations
### Pull Request Process
1. **Create a feature branch**:
   ```bash
   git checkout -b feature/amazing-feature
   ```
2. **Make your changes** following our coding standards
3. **Add or update tests** for your changes
4. **Run the full test suite**:

       python -m pytest tests/
       black .
       flake8 github_org_stats.py tests/
       mypy github_org_stats.py

5. **Update documentation** if needed
6. **Commit your changes** with clear messages:

       git commit -m "feat: add dependency vulnerability scanning

       - Add support for scanning package.json, requirements.txt
       - Include vulnerability count in repository metrics
       - Add tests for vulnerability detection
       - Update documentation with new feature"

7. **Push to your fork** and create a pull request
8. **Address review feedback** promptly

### Coding Standards

- Follow PEP 8 with Black formatting (88 character line length)
- Use type hints for all function parameters and return values
- Write docstrings for all public functions and classes
- Keep functions focused and under 50 lines when possible
- Use meaningful variable names and avoid abbreviations

Example:

    from typing import Any, Dict

    from github.Repository import Repository


    def get_repository_metrics(
        repo: Repository,
        days_back: int = 30,
        exclude_bots: bool = True
    ) -> Dict[str, Any]:
        """
        Collect comprehensive metrics for a GitHub repository.

        Args:
            repo: PyGithub Repository object
            days_back: Number of days to analyze for activity metrics
            exclude_bots: Whether to exclude bot accounts from statistics

        Returns:
            Dictionary containing repository metrics including stars, forks,
            issues, contributors, and activity statistics

        Raises:
            GithubException: If repository access fails
            ValueError: If days_back is negative
        """
        if days_back < 0:
            raise ValueError("days_back must be non-negative")

        metrics = {
            'name': repo.name,
            'stars': repo.stargazers_count,
            'forks': repo.forks_count,
            'issues': repo.open_issues_count,
        }

        # Add activity metrics (get_commit_stats is assumed to be defined elsewhere)
        commit_stats = get_commit_stats(repo, days_back, exclude_bots)
        metrics.update(commit_stats)

        return metrics

We follow Conventional Commits:
- feat: New features
- fix: Bug fixes
- docs: Documentation changes
- style: Code style changes (formatting, etc.)
- refactor: Code refactoring
- test: Adding or updating tests
- chore: Maintenance tasks
Examples:
feat: add support for GitHub Enterprise Server
fix: handle rate limit exceeded gracefully
docs: update installation instructions for Windows
test: add integration tests for multi-org analysis
refactor: extract authentication logic into separate class
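
The commit convention above is regular enough to check mechanically, for example in a local hook. The following is a minimal sketch, not a tool shipped with this project: the function name `is_conventional_subject` is hypothetical, it accepts only the seven types listed in this guide, and it checks only the subject line (it does not handle the `!` breaking-change marker or message bodies).

```python
import re

# Types listed in this guide; the wider Conventional Commits ecosystem uses more.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")

# "<type>: <description>", with an optional "(scope)" directly after the type.
_SUBJECT_RE = re.compile(
    r"^(?P<type>[a-z]+)(\((?P<scope>[^)]+)\))?: (?P<description>\S.*)$"
)


def is_conventional_subject(subject: str) -> bool:
    """Return True if a commit subject line follows the convention above."""
    match = _SUBJECT_RE.match(subject)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

For example, `is_conventional_subject("fix: handle rate limit exceeded gracefully")` is true, while a subject such as `"Added stuff"` is rejected because it has no type prefix.
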

## Testing

Our test suite is organized by functionality:

    tests/
    ├── __init__.py
    └── test_github_org_stats.py  # Main test file with all test classes

Test categories:

- Authentication Tests: GitHub App and PAT authentication
- Bot Detection Tests: Bot account filtering and detection
- Data Processing Tests: Repository analysis and metrics collection
- Excel Output Tests: Data sanitization and Excel generation
- Error Handling Tests: Error tracking and recovery
- Configuration Tests: Argument parsing and config loading
- Integration Tests: End-to-end workflow testing
- Performance Tests: Scaling and optimization features
When adding new functionality:
- Add unit tests for individual functions
- Add integration tests for complete workflows
- Test error conditions and edge cases
- Mock external dependencies (GitHub API calls)
- Maintain test coverage above 80%
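
The guidelines on error conditions, edge cases, and mocking can be sketched as follows. This is an illustration under stated assumptions: `summarize_repo` is a hypothetical stand-in defined inside the example so that it runs on its own, and it is not a function from this project.

```python
import unittest
from unittest.mock import Mock


def summarize_repo(repo, days_back=30):
    """Hypothetical stand-in for a metrics function; not the project's code."""
    if days_back < 0:
        raise ValueError("days_back must be non-negative")
    return {"name": repo.name, "stars": repo.stargazers_count}


class SummarizeRepoEdgeCases(unittest.TestCase):
    def test_negative_days_back_is_rejected(self):
        # Error condition: invalid input should fail before the repo is used.
        with self.assertRaises(ValueError):
            summarize_repo(Mock(), days_back=-1)

    def test_zero_days_back_is_allowed(self):
        # Edge case: zero is the boundary of the valid range.
        repo = Mock()  # stands in for a PyGithub object; no API call is made
        repo.name = "empty-repo"
        repo.stargazers_count = 0
        self.assertEqual(
            summarize_repo(repo, days_back=0),
            {"name": "empty-repo", "stars": 0},
        )
```

Note that `name` is set after constructing the `Mock`, because `Mock(name=...)` configures the mock's own display name rather than a `name` attribute. pytest collects `unittest.TestCase` classes, so a test like this runs under `python -m pytest`.
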

Example Test:

    def test_get_repository_metrics_with_valid_repo(self):
        """Test repository metrics collection with valid repository."""
        mock_repo = Mock()
        mock_repo.name = "test-repo"
        mock_repo.stargazers_count = 100
        mock_repo.forks_count = 25
        mock_repo.open_issues_count = 5

        metrics = get_repository_metrics(mock_repo, days_back=30)

        self.assertEqual(metrics['name'], "test-repo")
        self.assertEqual(metrics['stars'], 100)
        self.assertEqual(metrics['forks'], 25)
        self.assertEqual(metrics['issues'], 5)

Running specific tests:

    # Run authentication tests only
    python tests/test_github_org_stats.py --category auth

    # Run with verbose output
    python tests/test_github_org_stats.py --verbose

    # Stop on first failure
    python tests/test_github_org_stats.py --failfast

## Documentation

When contributing, please update relevant documentation:
- README.md: For new features, installation changes, or usage updates
- CHANGELOG.md: For all user-facing changes
- Code comments: For complex logic or algorithms
- Docstrings: For all public functions and classes

Documentation style:

- Use clear, concise language
- Include code examples for new features
- Update command-line help for new arguments
- Add troubleshooting sections for common issues
When adding new features, consider adding:
- Usage examples in the README
- Configuration examples in config/
- Setup guides in examples/

## Release Process

We follow Semantic Versioning:
- MAJOR: Breaking changes
- MINOR: New features (backward compatible)
- PATCH: Bug fixes (backward compatible)
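
The three rules above determine the next version number mechanically. The following is a minimal sketch; the helper name `bump_version` is hypothetical and not part of this project. It accepts an optional leading `v` and returns the bare number, and it does not cover pre-release or build suffixes.

```python
import re

_VERSION_RE = re.compile(r"v?(\d+)\.(\d+)\.(\d+)")


def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    match = _VERSION_RE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())

    if change == "major":  # breaking change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if change == "minor":  # backward-compatible feature: reset PATCH
        return f"{major}.{minor + 1}.0"
    if change == "patch":  # backward-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump_version("1.0.0", "minor"))  # 1.1.0
```

A new feature on top of 1.0.0 therefore yields 1.1.0, while a breaking change would yield 2.0.0.
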

Release steps:

1. Update version in pyproject.toml
2. Update CHANGELOG.md with release notes
3. Create release branch: release/v1.1.0
4. Run full test suite and quality checks
5. Create pull request to main branch
6. Tag release after merge: git tag v1.1.0
7. Create GitHub release with release notes

Release checklist:

- All tests pass
- Documentation updated
- CHANGELOG.md updated
- Version bumped in pyproject.toml
- No breaking changes without major version bump
- Examples and guides tested

## Code of Conduct

We are committed to providing a welcoming and inclusive environment:
- Be respectful and considerate in all interactions
- Be collaborative and help others learn and grow
- Be patient with newcomers and different experience levels
- Be constructive in feedback and criticism
- Focus on what's best for the community and project

Unacceptable behavior includes:

- Harassment, discrimination, or offensive comments
- Personal attacks or trolling
- Publishing private information without consent
- Spam or off-topic discussions
Project maintainers are responsible for clarifying standards and will take appropriate action for unacceptable behavior. This may include warnings, temporary bans, or permanent removal from the project.
Report unacceptable behavior to the project maintainers via:
- GitHub Issues (for public matters)
- Email to project maintainers (for private matters)

## Getting Help

- GitHub Discussions: For general questions and community support
- GitHub Issues: For bug reports and feature requests
- README.md: For usage instructions and examples
- Code Comments: For implementation details

Q: How do I set up GitHub App authentication?
A: See our GitHub App Setup Guide

Q: How do I analyze multiple organizations?
A: Use the --installation-id parameter with multiple mappings: "org1:id1,org2:id2"

Q: Why is my analysis slow?
A: Try reducing --max-repos, using --exclude-bots, or switching to JSON output format

Q: How do I contribute a new output format?
A: Add the format logic to the main script, update argument parsing, add tests, and update documentation

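To make the `"org1:id1,org2:id2"` mapping format from the FAQ concrete, here is a minimal parsing sketch. How the tool itself parses `--installation-id` is not shown in this guide, so `parse_installation_mapping` is a hypothetical function, and the sketch assumes installation IDs are plain integers.

```python
from typing import Dict


def parse_installation_mapping(raw: str) -> Dict[str, int]:
    """Parse 'org1:id1,org2:id2' into {'org1': id1, 'org2': id2}."""
    mapping: Dict[str, int] = {}
    for pair in raw.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing comma or stray whitespace
        org, sep, install_id = pair.partition(":")
        org, install_id = org.strip(), install_id.strip()
        if not sep or not org or not install_id.isdigit():
            raise ValueError(f"expected 'org:id', got {pair!r}")
        mapping[org] = int(install_id)
    return mapping
```

Rejecting malformed pairs with a clear error, rather than skipping them silently, keeps a typo in one mapping from quietly dropping an organization from the analysis.
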
For troubleshooting contributions:

    python github_org_stats.py \
      --org test-org \
      --token your-token \
      --log-level DEBUG \
      --log-file debug.log \
      --max-repos 5

Resources:

- PyGithub Documentation - GitHub API wrapper
- pandas Documentation - Data manipulation
- pytest Documentation - Testing framework
- GitHub API Documentation - REST API reference
Thank you for contributing to the GitHub Organization Statistics Tool! Your contributions help make this tool better for everyone in the open-source community.
Questions? Don't hesitate to ask in GitHub Discussions or open an issue.
Happy Contributing! 🎉
This contributing guide is a living document. Please suggest improvements via pull requests or issues.