docs/branch_report_generator.md (new file: 151 additions, 0 deletions)

# Branch Report Generator

This feature generates an Excel report of the branches in a GitHub repository.

## Features

The generated Excel report includes:
- **Branch Name**: Name of each branch in the repository
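- **PR Status**: Status of the most recent pull request opened from the branch (merged, open, closed, or none)
- **Created By**: Author of the most recent commit on the branch (GitHub does not record who created a branch, so this is an approximation)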
- **PR URL**: Link to the associated pull request (if any)
- **Protected**: Whether the branch is protected
- **Last Commit Date**: Date and time of the last commit on the branch
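Each row corresponds to one dictionary returned by `get_branch_info()` in `github_excel_generator.py`. The `BranchRow` type below is a hypothetical annotation (the module itself returns plain `Dict` objects) that documents the keys and the placeholder values substituted when data is missing; the values in `example` are illustrative only.

```python
from typing import Literal, TypedDict


# Hypothetical type for one entry returned by get_branch_info();
# the real module returns untyped dicts with these same keys.
class BranchRow(TypedDict):
    branch_name: str
    pr_status: Literal["merged", "open", "closed", "none"]
    created_by: str        # GitHub login or commit author name; "Unknown" if unavailable
    pr_url: str            # "N/A" when the branch has no pull request
    protected: bool
    last_commit_date: str  # formatted as "%Y-%m-%d %H:%M:%S", or "N/A"


# Illustrative values only
example: BranchRow = {
    "branch_name": "main",
    "pr_status": "none",
    "created_by": "octocat",
    "pr_url": "N/A",
    "protected": True,
    "last_commit_date": "2024-01-15 09:30:00",
}
```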

## Setup

### Prerequisites

1. Install required dependencies:
```bash
pip install -r requirements.txt
```

2. Set up GitHub authentication (required for accessing private repositories and higher rate limits):
```bash
export GITHUB_TOKEN='your_github_personal_access_token'
```

To create a GitHub personal access token:
- Go to GitHub Settings → Developer settings → Personal access tokens
- Generate a new token with `repo` scope for private repositories
- For public repositories only, the token is optional but recommended to avoid rate limits
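To confirm that a token is picked up before generating a report, you can query GitHub's `GET /rate_limit` REST endpoint, which does not count against your quota. The sketch below uses only the standard library; `check_token` and `summarize_rate_limit` are hypothetical helpers, and treating any limit above 60 as "authenticated" is a heuristic.

```python
import json
import os
import urllib.request
from typing import Optional

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def summarize_rate_limit(payload: dict) -> str:
    """Summarize the core REST API quota from a /rate_limit response body."""
    core = payload["resources"]["core"]
    # Heuristic: unauthenticated clients get 60 requests/hour, tokens get more.
    mode = "authenticated" if core["limit"] > 60 else "unauthenticated"
    return f"{mode}: {core['remaining']}/{core['limit']} requests remaining"


def check_token(token: Optional[str] = None) -> str:
    """Call GET /rate_limit with the token (if any) and summarize the result."""
    token = token or os.getenv("GITHUB_TOKEN")
    request = urllib.request.Request(RATE_LIMIT_URL)
    request.add_header("Accept", "application/vnd.github+json")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return summarize_rate_limit(json.load(response))
```

Running `print(check_token())` requires network access; a summary that starts with `unauthenticated` means the token was not found or not sent.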

## Usage

### Method 1: API Endpoint

Use the FastAPI endpoint to generate reports:

```http
GET /api/v4/generate_branch_report?owner=<owner>&repo=<repo>
```

**Parameters:**
- `owner` (required): Repository owner (username or organization)
- `repo` (required): Repository name

**Example:**
```bash
curl -X GET "http://localhost:8000/api/v4/generate_branch_report?owner=microsoft&repo=Multi-Agent-Custom-Automation-Engine-Solution-Accelerator" \
-H "user_principal_id: your-user-id" \
-o branch_report.xlsx
```
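The same request can be made from Python. This is a minimal sketch assuming, as the `curl -o branch_report.xlsx` example implies, that the endpoint returns the `.xlsx` file in the response body; `build_report_url` and `download_report` are hypothetical names, and the meaning of the `user_principal_id` header is defined by the backend.

```python
import urllib.parse
import urllib.request


def build_report_url(base_url: str, owner: str, repo: str) -> str:
    """Build the branch-report URL with properly encoded query parameters."""
    query = urllib.parse.urlencode({"owner": owner, "repo": repo})
    return f"{base_url.rstrip('/')}/api/v4/generate_branch_report?{query}"


def download_report(base_url: str, owner: str, repo: str, user_id: str,
                    output_path: str = "branch_report.xlsx") -> str:
    """Fetch the report and write the response body to output_path."""
    request = urllib.request.Request(build_report_url(base_url, owner, repo))
    # Same header as the curl example above
    request.add_header("user_principal_id", user_id)
    with urllib.request.urlopen(request, timeout=120) as response:
        with open(output_path, "wb") as f:
            f.write(response.read())
    return output_path
```

For example, `download_report("http://localhost:8000", "microsoft", "Multi-Agent-Custom-Automation-Engine-Solution-Accelerator", "your-user-id")` mirrors the curl command.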

### Method 2: Standalone Script

Run the standalone script directly:

```bash
cd src/backend
python generate_branch_report.py <owner> <repo> [output_file]
```

**Examples:**
```bash
# Generate report with default filename
python generate_branch_report.py microsoft Multi-Agent-Custom-Automation-Engine-Solution-Accelerator

# Generate report with custom filename
python generate_branch_report.py microsoft Multi-Agent-Custom-Automation-Engine-Solution-Accelerator my_report.xlsx
```
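`generate_branch_report.py` is not part of this diff, so the following is only a hypothetical sketch of argument handling that matches the usage line `<owner> <repo> [output_file]`. The fallback filename is an assumption; the script's actual default is not documented here.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Generate an Excel branch report for a GitHub repository."
    )
    parser.add_argument("owner", help="Repository owner (user or organization)")
    parser.add_argument("repo", help="Repository name")
    parser.add_argument("output_file", nargs="?", default=None,
                        help="Output .xlsx path (optional)")
    args = parser.parse_args(argv)
    if args.output_file is None:
        # Hypothetical default; the real script may use a different name.
        args.output_file = f"{args.owner}_{args.repo}_branch_report.xlsx"
    return args
```

The parsed values would then be passed to `GitHubExcelGenerator().generate_report(args.owner, args.repo, args.output_file)`.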

## Output Format

The Excel file contains a single sheet named "Branch Information" with:
- **Styled headers** (bold white text on a blue background)
- **Alternating row shading** for easier reading
- **Fixed column widths** sized for typical content
- **Clickable PR URLs** (when applicable)
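The stdlib-only sketch below mirrors how `generate_excel()` turns each branch dictionary into cell values and which rows receive the gray `F2F2F2` fill, which can help when checking output without opening Excel. `to_row` and `is_shaded` are hypothetical helpers, not functions in the module.

```python
from typing import Dict, List

HEADERS = ["Branch Name", "PR Status", "Created By", "PR URL", "Protected", "Last Commit Date"]


def to_row(branch: Dict) -> List[str]:
    """Map one branch dict to cell values, using the same defaults as generate_excel()."""
    return [
        branch.get("branch_name", ""),
        branch.get("pr_status", "none"),
        branch.get("created_by", "Unknown"),
        branch.get("pr_url", "N/A"),
        "Yes" if branch.get("protected") else "No",
        branch.get("last_commit_date", "N/A"),
    ]


def is_shaded(row_num: int) -> bool:
    """Data starts on row 2 (row 1 is the header); even rows get the gray fill."""
    return row_num % 2 == 0
```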

## Troubleshooting

### Rate Limiting

If you encounter rate limiting errors:
- Set the `GITHUB_TOKEN` environment variable with a valid token
- Authenticated requests have a limit of 5,000 requests per hour
- Unauthenticated requests are limited to 60 per hour
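GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header and the reset time, as UTC epoch seconds, in `X-RateLimit-Reset`. The generator does not retry on its own; if you wrap calls in a retry loop, a hypothetical helper like the one below computes how long to wait.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how many seconds to wait before retrying, based on rate-limit headers.

    A plain dict is case-sensitive; real HTTP header objects (e.g. http.client's)
    look names up case-insensitively.
    """
    if int(headers.get("X-RateLimit-Remaining", "1")) > 0:
        return 0.0  # quota left, no need to wait
    now = time.time() if now is None else now
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    return max(0.0, reset_at - now)
```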

### Access Errors

If you cannot access a repository:
- Ensure the repository is public, or your token has access to private repositories
- Verify the owner and repository names are correct
- Check that your token has the necessary scopes (`repo` for private repos)
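A common cause of access errors is passing `owner/repo` or a full URL as the repository name. The check below approximates GitHub's naming rules (owners: letters, digits, and single hyphens, not starting or ending with a hyphen, up to 39 characters; repositories: letters, digits, `.`, `-`, `_`, up to 100 characters). It is an assumption-based sanity check, not an official validator.

```python
import re
from typing import List

# Approximations of GitHub naming rules (not an official validator)
OWNER_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9])){0,38}")
REPO_RE = re.compile(r"[A-Za-z0-9._-]{1,100}")


def check_names(owner: str, repo: str) -> List[str]:
    """Return a list of likely problems with owner/repo; empty if they look valid."""
    problems = []
    if not OWNER_RE.fullmatch(owner):
        problems.append(f"owner {owner!r} does not look like a GitHub user or organization name")
    if not REPO_RE.fullmatch(repo) or repo in {".", ".."}:
        problems.append(f"repo {repo!r} does not look like a repository name")
    return problems
```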

### No Data

If no report is generated (the generator returns `False` and writes no file when it finds no branch data):
- Verify the repository exists and has branches
- Check the console logs for specific error messages
- Ensure network connectivity to GitHub API

## API Rate Limits

GitHub API has the following rate limits:
- **Authenticated**: 5,000 requests per hour
- **Unauthenticated**: 60 requests per hour

Each report typically requires:
- 1 request to fetch the repository
- 1 request per page of branches (GitHub returns up to 30 branches per page by default)
- About 1 request per branch to load commit details
- 1 request per branch to check for pull requests

For large repositories, consider using authentication to avoid rate limits.
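The request budget can be estimated from the list above. This sketch assumes 30 branches per page and two per-branch lookups (commit details and pull requests); actual counts may differ, for example when a request fails or a library fetches extra data lazily.

```python
import math


def estimate_requests(branch_count: int, per_page: int = 30) -> int:
    """Rough number of API calls needed for one report (assumptions above)."""
    repo_lookup = 1
    branch_pages = max(1, math.ceil(branch_count / per_page))
    per_branch = 2  # commit details + pull request lookup
    return repo_lookup + branch_pages + per_branch * branch_count


def fits_quota(branch_count: int, authenticated: bool) -> bool:
    """Compare the estimate against GitHub's hourly REST limits."""
    limit = 5000 if authenticated else 60
    return estimate_requests(branch_count) <= limit
```

Under these assumptions a 100-branch repository needs about 205 requests: well beyond the unauthenticated limit of 60 per hour, but comfortably within 5,000.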

## Security Notes

- Never commit your `GITHUB_TOKEN` to version control
- Use environment variables or secure secret management
- Tokens should have minimal required permissions
- Regularly rotate your tokens for security
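One way to follow these practices in code is to read the token only from the environment and never log it in full. The helpers below are hypothetical and not part of the module.

```python
import os


def load_token() -> str:
    """Read the token from the environment instead of hard-coding it."""
    token = os.environ.get("GITHUB_TOKEN", "")
    if not token:
        raise RuntimeError("GITHUB_TOKEN is not set")
    return token


def mask_token(token: str, visible: int = 4) -> str:
    """Return a log-safe form of a token that keeps only the last few characters."""
    if len(token) <= visible:
        return "*" * len(token)
    return "*" * (len(token) - visible) + token[-visible:]
```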

## Development

### Testing the Utility

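To exercise the `GitHubExcelGenerator` class directly:

```python
import os

# Run from src/backend so that the `common` package is importable
from common.utils.github_excel_generator import GitHubExcelGenerator

# Read the token from the environment rather than hard-coding it
generator = GitHubExcelGenerator(github_token=os.getenv("GITHUB_TOKEN"))

# Returns True on success, False otherwise
success = generator.generate_report(
    "microsoft",
    "Multi-Agent-Custom-Automation-Engine-Solution-Accelerator",
    "output.xlsx",
)
print("Report generated" if success else "Report generation failed")
```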
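The PR-status rule (merged takes precedence, then open, otherwise closed, and "none" when there is no PR) can be unit-tested without network access by using stand-in objects. `classify_pr` is a hypothetical helper extracted for testability; it checks `merged_at`, an attribute of PyGithub's `PullRequest`, and `SimpleNamespace` stands in for real PR objects.

```python
import unittest
from types import SimpleNamespace
from typing import Optional


def classify_pr(pr: Optional[object]) -> str:
    """Status for the PR Status column: merged, open, closed, or none."""
    if pr is None:
        return "none"
    if getattr(pr, "merged_at", None) is not None:
        return "merged"
    return "open" if pr.state == "open" else "closed"


class ClassifyPrTest(unittest.TestCase):
    def test_statuses(self):
        self.assertEqual(classify_pr(None), "none")
        self.assertEqual(classify_pr(SimpleNamespace(state="closed", merged_at="2024-01-01")), "merged")
        self.assertEqual(classify_pr(SimpleNamespace(state="open", merged_at=None)), "open")
        self.assertEqual(classify_pr(SimpleNamespace(state="closed", merged_at=None)), "closed")
```

Run it with `python -m unittest <file>`.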

### Extending the Report

To add more columns to the report, modify:
1. `github_excel_generator.py`: Update `get_branch_info()` to collect additional data
2. `github_excel_generator.py`: Update `generate_excel()` to include new columns in the spreadsheet

## Support

For issues or questions:
- Check the application logs for detailed error messages
- Review GitHub API status at https://www.githubstatus.com/
- Ensure all prerequisites are properly configured

src/backend/common/utils/github_excel_generator.py (new file: 200 additions, 0 deletions)

"""
GitHub Excel Generator - Utility to generate Excel reports from GitHub repository data.

This module provides functionality to fetch branch and pull request information from
a GitHub repository and generate an Excel spreadsheet with the data.
"""

import logging
import os
from typing import Dict, List, Optional

from github import Github, GithubException
from openpyxl import Workbook
from openpyxl.styles import Alignment, Font, PatternFill
from openpyxl.utils import get_column_letter

logger = logging.getLogger(__name__)


class GitHubExcelGenerator:
    """Generator for Excel reports containing GitHub repository branch and PR information."""

    def __init__(self, github_token: Optional[str] = None):
        """
        Initialize the GitHub Excel Generator.

        Args:
            github_token: GitHub personal access token. If not provided, the
                GITHUB_TOKEN environment variable is used. Without a token the
                client is unauthenticated: only public repositories are
                accessible and the rate limit is 60 requests per hour.
        """
        self.token = github_token or os.getenv("GITHUB_TOKEN")
        self.github_client: Optional[Github] = None
        try:
            if self.token:
                self.github_client = Github(self.token)
                logger.info("Authenticated GitHub client initialized")
            else:
                # Unauthenticated access works for public repositories only
                self.github_client = Github()
                logger.warning(
                    "No GitHub token provided; using unauthenticated access "
                    "(public repositories only, 60 requests/hour)."
                )
        except Exception as e:
            logger.error(f"Failed to initialize GitHub client: {e}")

    def get_branch_info(self, owner: str, repo: str) -> List[Dict]:
        """
        Fetch branch information from a GitHub repository.

        Args:
            owner: Repository owner (username or organization)
            repo: Repository name

        Returns:
            List of dictionaries containing branch information
        """
        if not self.github_client:
            logger.error("GitHub client not initialized")
            return []

        branch_info = []
        try:
            repository = self.github_client.get_repo(f"{owner}/{repo}")

            for branch in repository.get_branches():
                pr_status = "none"
                pr_url = None

                # Look up pull requests whose head is this branch. GitHub sorts
                # them newest first by default, so the first one is the latest.
                try:
                    pulls = repository.get_pulls(state="all", head=f"{owner}:{branch.name}")
                    latest_pr = next(iter(pulls), None)
                    if latest_pr is not None:
                        # merged_at is included in the list response, so this
                        # avoids the extra API call that `.merged` would trigger
                        if latest_pr.merged_at is not None:
                            pr_status = "merged"
                        elif latest_pr.state == "open":
                            pr_status = "open"
                        else:
                            pr_status = "closed"
                        pr_url = latest_pr.html_url
                except Exception as pr_error:
                    logger.debug(f"Error fetching PRs for branch {branch.name}: {pr_error}")

                # GitHub does not record who created a branch, so the author of
                # the branch's latest (head) commit is used as an approximation.
                # Defaults are set first so a failure here cannot leave these
                # names undefined or carry over values from the previous branch.
                created_by = "Unknown"
                last_commit_date = "N/A"
                try:
                    commit = branch.commit
                    git_author = commit.commit.author if commit and commit.commit else None
                    if commit and commit.author:
                        created_by = commit.author.login
                    elif git_author:
                        created_by = git_author.name
                    if git_author and git_author.date:
                        last_commit_date = git_author.date.strftime("%Y-%m-%d %H:%M:%S")
                except Exception as e:
                    logger.debug(f"Error getting commit details for branch {branch.name}: {e}")

                branch_info.append({
                    "branch_name": branch.name,
                    "pr_status": pr_status,
                    "created_by": created_by,
                    "pr_url": pr_url or "N/A",
                    "protected": branch.protected,
                    "last_commit_date": last_commit_date,
                })

            logger.info(f"Successfully fetched information for {len(branch_info)} branches")
            return branch_info

        except GithubException as e:
            logger.error(f"GitHub API error: {e}")
            return []
        except Exception as e:
            logger.error(f"Error fetching branch information: {e}")
            return []

    def generate_excel(self, branch_data: List[Dict], output_path: str) -> bool:
        """
        Generate an Excel file from branch data.

        Args:
            branch_data: List of dictionaries containing branch information
            output_path: Path where the Excel file should be saved

        Returns:
            True if successful, False otherwise
        """
        try:
            wb = Workbook()
            ws = wb.active
            ws.title = "Branch Information"

            headers = [
                "Branch Name",
                "PR Status",
                "Created By",
                "PR URL",
                "Protected",
                "Last Commit Date",
            ]

            # Header style: bold white text on a blue background
            header_fill = PatternFill(start_color="366092", end_color="366092", fill_type="solid")
            header_font = Font(bold=True, color="FFFFFF", size=12)
            header_alignment = Alignment(horizontal="center", vertical="center")

            for col_num, header in enumerate(headers, 1):
                cell = ws.cell(row=1, column=col_num, value=header)
                cell.fill = header_fill
                cell.font = header_font
                cell.alignment = header_alignment

            # Light gray fill applied to every other data row for readability
            stripe_fill = PatternFill(start_color="F2F2F2", end_color="F2F2F2", fill_type="solid")

            for row_num, branch in enumerate(branch_data, 2):
                pr_url = branch.get("pr_url", "N/A")
                values = [
                    branch.get("branch_name", ""),
                    branch.get("pr_status", "none"),
                    branch.get("created_by", "Unknown"),
                    pr_url,
                    "Yes" if branch.get("protected") else "No",
                    branch.get("last_commit_date", "N/A"),
                ]
                for col_num, value in enumerate(values, 1):
                    cell = ws.cell(row=row_num, column=col_num, value=value)
                    if row_num % 2 == 0:
                        cell.fill = stripe_fill

                # Plain strings are not hyperlinks in Excel; set one explicitly
                if pr_url and pr_url != "N/A":
                    url_cell = ws.cell(row=row_num, column=4)
                    url_cell.hyperlink = pr_url
                    url_cell.font = Font(color="0563C1", underline="single")

            # Fixed column widths; get_column_letter also handles columns past Z
            column_widths = [25, 15, 25, 60, 12, 20]
            for col_num, width in enumerate(column_widths, 1):
                ws.column_dimensions[get_column_letter(col_num)].width = width

            wb.save(output_path)
            logger.info(f"Excel file successfully created at {output_path}")
            return True

        except Exception as e:
            logger.error(f"Error generating Excel file: {e}")
            return False

    def generate_report(self, owner: str, repo: str, output_path: str) -> bool:
        """
        Generate a complete Excel report for a GitHub repository.

        Args:
            owner: Repository owner
            repo: Repository name
            output_path: Path where the Excel file should be saved

        Returns:
            True if successful, False otherwise
        """
        logger.info(f"Generating report for {owner}/{repo}")
        branch_data = self.get_branch_info(owner, repo)

        if not branch_data:
            logger.warning("No branch data found or error occurred")
            return False

        return self.generate_excel(branch_data, output_path)