1 change: 1 addition & 0 deletions .gitignore
@@ -21,6 +21,7 @@ static-analyzer-report
.settings*
.clangd
.vscode
.changelog
scripts/docs/en/components_schema
scripts/docs/en/dynamic_configs
scripts/docs/en/versions.md
103 changes: 103 additions & 0 deletions scripts/changelog_tool/AGENTS.md
@@ -0,0 +1,103 @@
# Changelog Tool

This agent is responsible for running the changelog tool, which collects commit information and identifies external contributors.

## Heuristics for LLM Analysis

The tool uses heuristics to decide which commits are sent to an LLM for changelog analysis.

We calculate a `score_size` metric as `lines_added + lines_deleted` for each commit.

The tool will NOT send a commit to the LLM if it meets any of these criteria:
1. Any file path contains "docs/" or "documentation", OR the commit title contains a documentation keyword
2. The commit title contains a fix/bug keyword AND the commit is small (`score_size <= 20`)
3. The commit is small (`score_size <= 20`), regardless of its title (this subsumes rule 2)

Keyword lists:
- Documentation keywords: "doc", "docs", "documentation", "readme"
- Fix/bug keywords: "fix", "bugfix", "bug", "patch", "repair", "correct", "resolve"
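
The rules above can be read as a single predicate. The following is a minimal sketch, not the tool's actual implementation: the `Commit` dataclass, the `should_skip_llm` name, and the case-insensitive whole-word keyword matching are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Sequence

SMALL_COMMIT_THRESHOLD = 20
DOC_KEYWORDS = ("doc", "docs", "documentation", "readme")
FIX_KEYWORDS = ("fix", "bugfix", "bug", "patch", "repair", "correct", "resolve")


@dataclass
class Commit:
    # Hypothetical container; the real tool may model commits differently.
    title: str
    lines_added: int
    lines_deleted: int
    paths: List[str] = field(default_factory=list)

    @property
    def score_size(self) -> int:
        return self.lines_added + self.lines_deleted


def _has_keyword(title: str, keywords: Sequence[str]) -> bool:
    # Assumption: keywords are matched as whole words, case-insensitively.
    words = set(re.findall(r"[a-z]+", title.lower()))
    return any(keyword in words for keyword in keywords)


def should_skip_llm(commit: Commit) -> bool:
    """Return True if the commit should NOT be sent to the LLM."""
    touches_docs = any("docs/" in p or "documentation" in p for p in commit.paths)
    if touches_docs or _has_keyword(commit.title, DOC_KEYWORDS):
        return True  # rule 1: documentation change
    small = commit.score_size <= SMALL_COMMIT_THRESHOLD
    if small and _has_keyword(commit.title, FIX_KEYWORDS):
        return True  # rule 2: small fix
    return small  # rule 3: any small commit
```

Under these assumptions, a large commit titled "fix core: big rewrite" is still sent to the LLM, because the fix/bug keywords only matter for small commits.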

## Usage

IMPORTANT: The changelog tool must always be run with the virtual environment activated:

```bash
# Always activate the virtual environment first
source .venv/bin/activate

# Run the tool
./changelog-tool [command] [options]
```

## Commands

### collect

Collects commits from the specified range and classifies them using heuristics and LLM analysis.

```bash
./changelog-tool collect [options]
```

Options:
- `--from-sha`: Starting commit SHA (overrides config)
- `--to-sha`: Ending commit SHA (overrides config)
- `--repo-path`: Path to the repository (overrides config)

### review

Generates a markdown report and an override YAML file for reviewing classified commits.

```bash
./changelog-tool review
```

The review command generates two files in the output directory:
- `review_report.md`: A markdown report showing all commits, sorted by size, with their classification status, changelog lines, and analysis
- `override.yaml`: A commented YAML file containing all commits that can be uncommented and modified to override classifications

The report is divided into two sections:
1. **Not in Changelog**: Commits that are not included in the changelog (either filtered by heuristics or marked as unclear)
2. **In Changelog**: Commits that are included in the changelog

Each commit in the report shows:
- Commit hash with link to GitHub
- Commit title
- Status (✅ In Changelog, ❌ Not in Changelog, or ❓ Unclear)
- Size (number of lines changed)
- Changelog line (if available)
- Analysis (if available)

### report

Generates a formatted Markdown changelog based on the review output and applies user overrides.

```bash
./changelog-tool report
```

The report command performs the following steps:
1. Loads classified commits from `classified.json`
2. Applies overrides from `override.yaml` (if present)
3. Identifies commits marked for the changelog that lack changelog lines or analysis
4. Runs these commits through the LLM with 1.5x increased prompt size and diff truncation enabled
5. Generates a formatted Markdown changelog grouped by classification:
   - Breaking Changes
   - Features
   - Optimizations
   - Bug Fixes
   - Refactoring
   - Minor Changes
   - Documentation
6. Appends "Many thanks to [Name] for the PR!" for external contributors in the changelog
7. Appends a section at the end for external contributors not included in the changelog
8. Saves the generated changelog to `changelog.md` in the output directory
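
Step 4 can be illustrated with a small sketch. The 1.5x factor and the `max_user_prompt_length` setting come from this document; the helper names `scaled_prompt_limit` and `truncate_diff`, the truncation marker, and the choice to keep the head of the diff are assumptions, not the tool's confirmed behavior.

```python
PROMPT_SCALE = 1.5  # factor stated for the report step
TRUNCATION_MARKER = "\n... [diff truncated]"  # assumed marker text


def scaled_prompt_limit(max_user_prompt_length: int) -> int:
    """Prompt budget for the report step: 1.5x the configured limit."""
    return int(max_user_prompt_length * PROMPT_SCALE)


def truncate_diff(diff: str, budget: int) -> str:
    """Cut the diff to at most `budget` characters, keeping its head (assumed strategy)."""
    if len(diff) <= budget:
        return diff
    keep = max(budget - len(TRUNCATION_MARKER), 0)
    # The final slice guards against budgets shorter than the marker itself.
    return (diff[:keep] + TRUNCATION_MARKER)[:budget]
```

With the configured `max_user_prompt_length: 100000`, this sketch yields a budget of 150000 characters for the retried commits.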

## Output Directory

By default, the tool outputs classified commits to `.changelog/preclassified.json`. You can customize this with the `--output-dir` global option:

```bash
# Run with custom output directory
./changelog-tool --output-dir ./my-output-dir collect
./changelog-tool --output-dir ./my-output-dir review
```
174 changes: 174 additions & 0 deletions scripts/changelog_tool/README.md
@@ -0,0 +1,174 @@
# Changelog Tool

A tool for automatically generating changelogs from git commits using LLM analysis.

## Features

- **Automatic commit classification**: Classifies commits into categories (feature, bug, optimization, refactor, minor, docs, unclear)
- **LLM-powered analysis**: Uses LLM to analyze commits and generate changelog entries
- **External contributor detection**: Identifies external contributors and generates acknowledgments
- **Component extraction**: Extracts component names from commit titles for better organization
- **Override support**: Allows manual override of classifications and changelog entries
- **State persistence**: Saves LLM analysis results to avoid reprocessing
- **Rate limiting**: Configurable rate limiting and concurrent request limits

## Installation

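1. Ensure you have Python 3.10+ installed (the `changelog-tool` script uses `X | None` type hints, which fail at import time on older versions)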
2. Install dependencies:
```bash
cd scripts/changelog_tool
python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install -r requirements.txt
```

3. Set up environment variables:
```bash
export CHANGELOG_LLM_URL="https://your-llm-api.com/v1"
export CHANGELOG_LLM_API_KEY="your-api-key"
export CHANGELOG_LLM_MODEL="your-model-name"
```

## Configuration

The tool is configured via `changelog.yaml`:

```yaml
collect:
  from_sha: <commit-sha>  # Starting commit SHA
  to_sha: HEAD  # Ending commit SHA (default: HEAD)
  repo_path: ../..  # Path to the repository (default: ../..)
  core_team_patterns:  # Patterns to identify core team members
    - ".*@userver\\.tech"
    - ".*@yandex-team\\.com"

llm-config:
  target_rps: 1  # Target requests per second
  retries: 3  # Number of retry attempts
  max_commits_per_batch: 10  # Maximum commits per LLM batch
  max_user_prompt_length: 100000  # Maximum prompt length in characters
  include_diff: true  # Include diff in LLM prompt
  truncate_diff: false  # Truncate diff if too long
  max_concurrent_requests: 5  # Maximum concurrent requests

review:
  github_url: "https://github.com/userver-framework/userver"

report:
  github_url: "https://github.com/userver-framework/userver"
```
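
To illustrate how `core_team_patterns` could separate core team members from external contributors, here is a minimal sketch. It assumes the patterns are regular expressions applied to the commit author's email and that a pattern must match the whole address; the `is_external` helper is hypothetical and not taken from the tool. Note that the YAML double-quoted `\\.` becomes the regex `\.`, written below as a raw string.

```python
import re
from typing import Iterable

# The same patterns as in changelog.yaml, after YAML unescaping.
CORE_TEAM_PATTERNS = (r".*@userver\.tech", r".*@yandex-team\.com")


def is_external(author_email: str, core_team_patterns: Iterable[str]) -> bool:
    """Return True if the email matches none of the core team patterns."""
    # Assumption: full-address, case-insensitive matching.
    return not any(
        re.fullmatch(pattern, author_email, flags=re.IGNORECASE)
        for pattern in core_team_patterns
    )
```

Using `re.fullmatch` rather than `re.search` is a design choice of this sketch: it prevents an address such as `dev@userver.tech.example.org` from being treated as a core team member.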

## Usage

### Step 1: Collect Commits

Run the `collect` command to gather commits and analyze them:

```bash
source .venv/bin/activate
./changelog-tool collect
```

The tool will:
1. Fetch commits from the specified range
2. Classify commits using heuristics
3. Send unclear commits to LLM for analysis
4. Save results to `.changelog/classified.json`

**Important**: Run the `collect` command repeatedly until you see a message like:
```
Found 10 commits, 10 already processed, 0 to process via LLM
```

This ensures all commits have been processed by the LLM. The tool uses state persistence to avoid reprocessing commits, so running it multiple times is safe and recommended for reliability.
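
The state persistence described above can be pictured as a JSON file keyed by commit SHA. This is only a sketch under that assumption: the file layout, the helper names (`load_state`, `save_state`, `pending_commits`, `summary`), and the exact meaning of "already processed" are hypothetical; only the shape of the summary message is taken from this README.

```python
import json
import pathlib
from typing import Dict, List


def load_state(path: pathlib.Path) -> Dict[str, dict]:
    """Load saved LLM results keyed by commit SHA (empty if there is no file yet)."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: pathlib.Path, state: Dict[str, dict]) -> None:
    """Write the results back so that the next run can skip finished commits."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def pending_commits(shas: List[str], state: Dict[str, dict]) -> List[str]:
    """Commits that have no saved result and still need an LLM request."""
    return [sha for sha in shas if sha not in state]


def summary(shas: List[str], state: Dict[str, dict]) -> str:
    todo = pending_commits(shas, state)
    done = len(shas) - len(todo)
    return f"Found {len(shas)} commits, {done} already processed, {len(todo)} to process via LLM"
```

Because only commits missing from the saved state are retried, a run that fails partway (for example, after exhausting `retries`) loses no finished work, which is why rerunning is safe.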

### Step 2: Review and Override

Run the `review` command to generate a review report:

```bash
./changelog-tool review
```

This generates two files in `.changelog/`:
- `review_report.md`: A markdown report showing all commits with their classification status
- `override.yaml`: A commented YAML file for overriding classifications

Review the report and uncomment/modify entries in `override.yaml` to override classifications:

```yaml
# Example override.yaml
commit_sha_1:
  to_changelog: true
  changelog_line: "Added support for async LLM processing"

commit_sha_2:
  to_changelog: false
  classification: "minor"
```

You can leave `classification` or `changelog_line` empty; the LLM will fill them in during the next step.
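
As a rough illustration of how overrides might be merged into the classified commits, consider the sketch below. It works on plain dictionaries (as they would look after loading `classified.json` and `override.yaml`); the function names, the set of overridable fields, and the rule for deciding which commits go back to the LLM are assumptions based on the fields shown in the example above.

```python
from typing import Dict, Optional

OVERRIDABLE_FIELDS = ("to_changelog", "classification", "changelog_line")


def apply_overrides(
    classified: Dict[str, dict], overrides: Optional[Dict[str, dict]]
) -> Dict[str, dict]:
    """Return a copy of `classified` with non-empty override fields applied."""
    merged = {sha: dict(entry) for sha, entry in classified.items()}
    for sha, override in (overrides or {}).items():
        if sha not in merged:
            continue  # assumption: overrides for unknown commits are ignored
        for key in OVERRIDABLE_FIELDS:
            value = (override or {}).get(key)
            if value is not None:
                merged[sha][key] = value
    return merged


def needs_llm(entry: dict) -> bool:
    """Assumed rule: a changelog commit with an empty field is sent to the LLM."""
    if not entry.get("to_changelog"):
        return False
    return not (entry.get("classification") and entry.get("changelog_line"))
```

In this sketch, overriding only `to_changelog: true` for a commit that has no changelog line marks it for LLM processing in the next step.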

### Step 3: Generate Changelog

Run the `report` command to generate the final changelog:

```bash
./changelog-tool report
```

This will:
1. Load classified commits from `classified.json`
2. Apply overrides from `override.yaml`
3. Process commits needing LLM analysis with increased prompt size (1.5x) and diff truncation
4. Generate a formatted Markdown changelog grouped by classification and component
5. Save the changelog to `.changelog/changelog.md`

**Important**: Run the `report` command repeatedly until you see a message like:
```
Found 10 commits, 10 already processed, 0 to process via LLM
```

This ensures all commits that need LLM analysis have been processed.

## Output Format

The generated changelog has the following structure:

```markdown
* Breaking Change
  * component1
    * changelog line 1 <!-- abc12345 -->
    * changelog line 2 <!-- def67890 -->
  * changelog line without component <!-- ghi12345 -->

* Feature
  * component1
    * changelog line 3 <!-- jkl67890 -->
  * changelog line without component <!-- mno12345 -->

* Optimization
  * component2
    * changelog line 4 <!-- pqr12345 -->

* Bug
  * component1
    * changelog line 5 <!-- stu67890 -->

* Refactor
  * component3
    * changelog line 6 <!-- vwx12345 -->

* Minor
  * changelog line 7 <!-- yza67890 -->

* Documentation
  * changelog line 8 <!-- bcd12345 -->

* Many thanks to:
  * External Contributor 1 for commit title 1!
  * External Contributor 2 for:
    * commit title 2
    * commit title 3
```
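
The grouping by classification and then by component could be produced roughly as follows. This is a sketch, not the tool's renderer: the `render_changelog` function, the tuple-based input, the alphabetical component order, and the 8-character hash in the trailing comment are assumptions inferred from the sample above.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple

SECTION_ORDER = [
    "Breaking Change", "Feature", "Optimization",
    "Bug", "Refactor", "Minor", "Documentation",
]

# (section, component or None, changelog line, commit SHA)
Entry = Tuple[str, Optional[str], str, str]


def render_changelog(entries: Iterable[Entry]) -> str:
    """Render entries as a nested Markdown list: section, component, line."""
    grouped = defaultdict(lambda: defaultdict(list))
    for section, component, line, sha in entries:
        grouped[section][component].append(f"{line} <!-- {sha[:8]} -->")

    out = []
    for section in SECTION_ORDER:
        if section not in grouped:
            continue
        if out:
            out.append("")  # blank line between sections
        out.append(f"* {section}")
        components = grouped[section]
        # Lines with a component come first, nested one level deeper.
        for component in sorted(c for c in components if c is not None):
            out.append(f"  * {component}")
            out.extend(f"    * {item}" for item in components[component])
        # Lines without a component sit directly under the section.
        out.extend(f"  * {item}" for item in components.get(None, []))
    return "\n".join(out)
```

The acknowledgments section for external contributors is left out of this sketch.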
53 changes: 53 additions & 0 deletions scripts/changelog_tool/changelog-tool
@@ -0,0 +1,53 @@
#!/usr/bin/env python3
import pathlib
import click

import changelog_tool.config as cfg
import changelog_tool.collect.command as collect_cmd
import changelog_tool.review.command as review_cmd
import changelog_tool.report.command as report_cmd

@click.group()
@click.option('--config', default='changelog.yaml')
@click.option('--output-dir', type=pathlib.Path, default=None)
@click.pass_context
def cli(ctx: click.Context, config: str, output_dir: pathlib.Path | None):
    ctx.ensure_object(dict)
    parsed_config = cfg.parse_config(pathlib.Path(config))
    if output_dir:
        parsed_config.collect.output_dir = output_dir
        parsed_config.review.output_dir = output_dir
        parsed_config.report.output_dir = output_dir
    ctx.obj["CONFIG"] = parsed_config

@cli.command()
@click.option('--from-sha')
@click.option('--to-sha')
@click.option('--repo-path', type=pathlib.Path)
@click.pass_context
def collect(ctx: click.Context, from_sha: str | None, to_sha: str | None, repo_path: pathlib.Path | None):
    # Get the config and override with CLI options if provided
    config = ctx.obj["CONFIG"]
    if from_sha:
        config.collect.from_sha = from_sha
    if to_sha:
        config.collect.to_sha = to_sha
    if repo_path:
        config.collect.repo_path = repo_path

    collect_cmd.collect(config)

@cli.command()
@click.pass_context
def review(ctx: click.Context):
    config = ctx.obj["CONFIG"]
    review_cmd.review(config)

@cli.command()
@click.pass_context
def report(ctx: click.Context):
    config = ctx.obj["CONFIG"]
    report_cmd.report(config)

if __name__ == '__main__':
    cli()
22 changes: 22 additions & 0 deletions scripts/changelog_tool/changelog.yaml
@@ -0,0 +1,22 @@
collect:
  from_sha: da8642900398c33333e29e2bd3e91ca4e181f602
  to_sha: HEAD
  repo_path: ../..
  core_team_patterns:
    - ".*@userver\\.tech"
    - ".*@yandex-team\\.com"

llm-config:
  target_rps: 1
  retries: 7
  max_commits_per_batch: 4
  max_user_prompt_length: 100000
  include_diff: true
  truncate_diff: false
  max_concurrent_requests: 2

review:
  github_url: "https://github.com/userver-framework/userver"

report:
  github_url: "https://github.com/userver-framework/userver"