userver-framework · Malevrovich · Jun 9, 2026
diff --git a/.gitignore b/.gitignore
@@ -21,6 +21,7 @@ static-analyzer-report
 .settings*
 .clangd
 .vscode
+.changelog
 scripts/docs/en/components_schema
 scripts/docs/en/dynamic_configs
 scripts/docs/en/versions.md

diff --git a/scripts/changelog_tool/AGENTS.md b/scripts/changelog_tool/AGENTS.md
@@ -0,0 +1,103 @@
+# Changelog Tool
+
+This agent is responsible for running the changelog tool, which collects commit information and identifies external contributors.
+
+## Heuristics for LLM Analysis
+
+The tool uses heuristics to determine which commits should be sent to an LLM for changelog analysis:
+
+We calculate a `score_size` metric as `lines_added + lines_deleted` for each commit.
+
+The tool will NOT send commits to the LLM if they meet any of these criteria:
+1. Any file path contains "docs/" or "documentation", OR commit title contains documentation keywords
+2. Commit title contains fix/bug keywords AND the commit is small (score_size <= 20)
+3. The commit is small (score_size <= 20), regardless of its title
+
+Documentation keywords: "doc", "docs", "documentation", "readme"
+Fix/bug keywords: "fix", "bugfix", "bug", "patch", "repair", "correct", "resolve"
+
+## Usage
+
+IMPORTANT: The changelog tool must always be run with the virtual environment activated:
+
+```bash
+# Always activate the virtual environment first
+source .venv/bin/activate
+
+# Run the tool
+./changelog-tool [command] [options]
+```
+
+## Commands
+
+### collect
+
+Collects commits from the specified range and classifies them using heuristics and LLM analysis.
+
+```bash
+./changelog-tool collect [options]
+```
+
+Options:
+- `--from-sha`: Starting commit SHA (overrides config)
+- `--to-sha`: Ending commit SHA (overrides config)
+- `--repo-path`: Path to the repository (overrides config)
+
+### review
+
+Generates a markdown report and an override YAML file for reviewing classified commits.
+
+```bash
+./changelog-tool review
+```
+
+The review command generates two files in the output directory:
+- `review_report.md`: A markdown report showing all commits, sorted by size, with their classification status, changelog lines, and analysis
+- `override.yaml`: A commented YAML file containing all commits that can be uncommented and modified to override classifications
+
+The report is divided into two sections:
+1. **Not in Changelog**: Commits that are not included in the changelog (either filtered by heuristics or marked as unclear)
+2. **In Changelog**: Commits that are included in the changelog
+
+Each commit in the report shows:
+- Commit hash with link to GitHub
+- Commit title
+- Status (✅ In Changelog, ❌ Not in Changelog, or ❓ Unclear)
+- Size (number of lines changed)
+- Changelog line (if available)
+- Analysis (if available)
+
+### report
+
+Generates a formatted Markdown changelog based on the review output and applies user overrides.
+
+```bash
+./changelog-tool report
+```
+
+The report command performs the following steps:
+1. Loads classified commits from `classified.json`
+2. Applies overrides from `override.yaml` (if present)
+3. Identifies commits marked for the changelog that lack changelog lines or analysis
+4. Runs these commits through the LLM with the prompt size limit raised 1.5x and diff truncation enabled
+5. Generates a formatted Markdown changelog grouped by classification:
+   - Breaking Changes
+   - Features
+   - Optimizations
+   - Bug Fixes
+   - Refactoring
+   - Minor Changes
+   - Documentation
+6. Appends "Many thanks to [Name] for the PR!" for external contributors in the changelog
+7. Appends a section at the end for external contributors not included in the changelog
+8. Saves the generated changelog to `changelog.md` in the output directory
+
+## Output Directory
+
+By default, the tool writes classified commits to `.changelog/classified.json`. You can change the directory with the global `--output-dir` option:
+
+```bash
+# Run with custom output directory
+./changelog-tool --output-dir ./my-output-dir collect
+./changelog-tool --output-dir ./my-output-dir review
+```
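Review note: the skip heuristics described in AGENTS.md above can be sketched as follows. This is a minimal illustration, not the tool's actual code. The `Commit` dataclass, the function names, and the choice of case-insensitive substring matching for keywords are all assumptions; the keyword lists and the threshold of 20 come from the text.

```python
from dataclasses import dataclass, field

# Keyword lists and threshold as documented in AGENTS.md.
DOC_KEYWORDS = ("doc", "docs", "documentation", "readme")
FIX_KEYWORDS = ("fix", "bugfix", "bug", "patch", "repair", "correct", "resolve")
SMALL_COMMIT_THRESHOLD = 20


@dataclass
class Commit:
    """Hypothetical commit record; the real tool's data model may differ."""
    title: str
    lines_added: int
    lines_deleted: int
    paths: list = field(default_factory=list)


def score_size(commit):
    """score_size = lines_added + lines_deleted."""
    return commit.lines_added + commit.lines_deleted


def skip_llm(commit):
    """Return True if the commit should NOT be sent to the LLM."""
    title = commit.title.lower()  # assumption: case-insensitive substring match
    small = score_size(commit) <= SMALL_COMMIT_THRESHOLD

    # Rule 1: documentation paths or documentation keywords in the title.
    if any("docs/" in p or "documentation" in p for p in commit.paths):
        return True
    if any(keyword in title for keyword in DOC_KEYWORDS):
        return True
    # Rule 2: fix/bug keyword in the title AND a small commit.
    if small and any(keyword in title for keyword in FIX_KEYWORDS):
        return True
    # Rule 3: any small commit (as written, this also covers rule 2).
    return small
```

With substring matching, a title such as "Update docker image" would also hit the "doc" keyword, so a real implementation may prefer word-boundary matching.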
diff --git a/scripts/changelog_tool/README.md b/scripts/changelog_tool/README.md
@@ -0,0 +1,174 @@
+# Changelog Tool
+
+A tool for automatically generating changelogs from git commits using LLM analysis.
+
+## Features
+
+- **Automatic commit classification**: Classifies commits into categories (feature, bug, optimization, refactor, minor, docs, unclear)
+- **LLM-powered analysis**: Uses an LLM to analyze commits and generate changelog entries
+- **External contributor detection**: Identifies external contributors and generates acknowledgments
+- **Component extraction**: Extracts component names from commit titles for better organization
+- **Override support**: Allows manual override of classifications and changelog entries
+- **State persistence**: Saves LLM analysis results to avoid reprocessing
+- **Rate limiting**: Configurable rate limiting and concurrent request limits
+
+## Installation
+
+1. Ensure you have Python 3.10+ installed (the CLI script uses `X | None` type annotations)
+2. Install dependencies:
+```bash
+cd scripts/changelog_tool
+python3 -m venv .venv
+source .venv/bin/activate
+python3 -m pip install -r requirements.txt
+```
+
+3. Set up environment variables:
+```bash
+export CHANGELOG_LLM_URL="https://your-llm-api.com/v1"
+export CHANGELOG_LLM_API_KEY="your-api-key"
+export CHANGELOG_LLM_MODEL="your-model-name"
+```
+
+## Configuration
+
+The tool is configured via `changelog.yaml`:
+
+```yaml
+collect:
+  from_sha: <commit-sha>  # Starting commit SHA
+  to_sha: HEAD           # Ending commit SHA (default: HEAD)
+  repo_path: ../..       # Path to the repository (default: ../..)
+  core_team_patterns:    # Patterns to identify core team members
+    - ".*@userver\\.tech"
+    - ".*@yandex-team\\.com"
+
+llm-config:
+  target_rps: 1                    # Target requests per second
+  retries: 3                       # Number of retry attempts
+  max_commits_per_batch: 10        # Maximum commits per LLM batch
+  max_user_prompt_length: 100000   # Maximum prompt length in characters
+  include_diff: true               # Include diff in LLM prompt
+  truncate_diff: false             # Truncate diff if too long
+  max_concurrent_requests: 5       # Maximum concurrent requests
+
+review:
+  github_url: "https://github.com/userver-framework/userver"
+
+report:
+  github_url: "https://github.com/userver-framework/userver"
+```
+
+## Usage
+
+### Step 1: Collect Commits
+
+Run the `collect` command to gather commits and analyze them:
+
+```bash
+source .venv/bin/activate
+./changelog-tool collect
+```
+
+The tool will:
+1. Fetch commits from the specified range
+2. Classify commits using heuristics
+3. Send unclear commits to the LLM for analysis
+4. Save results to `.changelog/classified.json`
+
+**Important**: Run the `collect` command repeatedly until you see a message like:
+```
+Found 10 commits, 10 already processed, 0 to process via LLM
+```
+
+This ensures all commits have been processed by the LLM. The tool uses state persistence to avoid reprocessing commits, so running it multiple times is safe and recommended for reliability.
+
+### Step 2: Review and Override
+
+Run the `review` command to generate a review report:
+
+```bash
+./changelog-tool review
+```
+
+This generates two files in `.changelog/`:
+- `review_report.md`: A markdown report showing all commits with their classification status
+- `override.yaml`: A commented YAML file for overriding classifications
+
+Review the report and uncomment/modify entries in `override.yaml` to override classifications:
+
+```yaml
+# Example override.yaml
+commit_sha_1:
+  to_changelog: true
+  changelog_line: "Added support for async LLM processing"
+
+commit_sha_2:
+  to_changelog: false
+  classification: "minor"
+```
+
+Feel free to leave `classification` or `changelog_line` empty; the LLM will fill them in at the next step.
+
+### Step 3: Generate Changelog
+
+Run the `report` command to generate the final changelog:
+
+```bash
+./changelog-tool report
+```
+
+This will:
+1. Load classified commits from `classified.json`
+2. Apply overrides from `override.yaml`
+3. Process commits needing LLM analysis with increased prompt size (1.5x) and diff truncation
+4. Generate a formatted Markdown changelog grouped by classification and component
+5. Save the changelog to `.changelog/changelog.md`
+
+**Important**: Run the `report` command repeatedly until you see a message like:
+```
+Found 10 commits, 10 already processed, 0 to process via LLM
+```
+
+This ensures all commits that need LLM analysis have been processed.
+
+## Output Format
+
+The generated changelog has the following structure:
+
+```markdown
+* Breaking Change
+  * component1
+    * changelog line 1 <!-- abc12345 -->
+    * changelog line 2 <!-- def67890 -->
+  * changelog line without component <!-- ghi12345 -->
+
+* Feature
+  * component1
+    * changelog line 3 <!-- jkl67890 -->
+  * changelog line without component <!-- mno12345 -->
+
+* Optimization
+  * component2
+    * changelog line 4 <!-- pqr12345 -->
+
+* Bug
+  * component1
+    * changelog line 5 <!-- stu67890 -->
+
+* Refactor
+  * component3
+    * changelog line 6 <!-- vwx12345 -->
+
+* Minor
+  * changelog line 7 <!-- yza67890 -->
+
+* Documentation
+  * changelog line 8 <!-- bcd12345 -->
+
+* Many thanks to:
+  * External Contributor 1 for commit title 1!
+  * External Contributor 2 for:
+    * commit title 2
+    * commit title 3
+```
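Review note: the grouped layout shown in the README's "Output Format" section can be produced along these lines. This is a sketch under stated assumptions, not the tool's renderer: `render_changelog`, the tuple-based entry shape, the alphabetical component order, and the 8-character SHA prefix are hypothetical choices made to match the sample.

```python
from collections import defaultdict

# Section order taken from the sample output in the README.
SECTION_ORDER = [
    "Breaking Change", "Feature", "Optimization",
    "Bug", "Refactor", "Minor", "Documentation",
]


def render_changelog(entries):
    """Render (section, component_or_None, line, sha) tuples as nested bullets."""
    grouped = defaultdict(lambda: defaultdict(list))
    for section, component, line, sha in entries:
        # Assumption: the trailing HTML comment holds an 8-character SHA prefix.
        grouped[section][component].append(f"{line} <!-- {sha[:8]} -->")

    out = []
    for section in SECTION_ORDER:
        if section not in grouped:
            continue
        if out:
            out.append("")  # blank line between sections
        out.append(f"* {section}")
        by_component = grouped[section]
        # Component groups first, then lines that have no component.
        for component in sorted(c for c in by_component if c is not None):
            out.append(f"  * {component}")
            out.extend(f"    * {item}" for item in by_component[component])
        out.extend(f"  * {item}" for item in by_component.get(None, []))
    return "\n".join(out)
```

The "Many thanks to" section is left out of the sketch because the README does not specify how contributors are matched to commits.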
diff --git a/scripts/changelog_tool/changelog-tool b/scripts/changelog_tool/changelog-tool
@@ -0,0 +1,53 @@
+#!/usr/bin/env python3
+import pathlib
+import click
+
+import changelog_tool.config as cfg
+import changelog_tool.collect.command as collect_cmd
+import changelog_tool.review.command as review_cmd
+import changelog_tool.report.command as report_cmd
+
+@click.group()
+@click.option('--config', default='changelog.yaml')
+@click.option('--output-dir', type=pathlib.Path, default=None)
+@click.pass_context
+def cli(ctx: click.Context, config: str, output_dir: pathlib.Path | None):
+    ctx.ensure_object(dict)
+    parsed_config = cfg.parse_config(pathlib.Path(config))
+    if output_dir:
+        parsed_config.collect.output_dir = output_dir
+        parsed_config.review.output_dir = output_dir
+        parsed_config.report.output_dir = output_dir
+    ctx.obj["CONFIG"] = parsed_config
+
+@cli.command()
+@click.option('--from-sha')
+@click.option('--to-sha')
+@click.option('--repo-path', type=pathlib.Path)
+@click.pass_context
+def collect(ctx: click.Context, from_sha: str | None, to_sha: str | None, repo_path: pathlib.Path | None):
+    # Get the config and override with CLI options if provided
+    config = ctx.obj["CONFIG"]
+    if from_sha:
+        config.collect.from_sha = from_sha
+    if to_sha:
+        config.collect.to_sha = to_sha
+    if repo_path:
+        config.collect.repo_path = repo_path
+
+    collect_cmd.collect(config)
+
+@cli.command()
+@click.pass_context
+def review(ctx: click.Context):
+    config = ctx.obj["CONFIG"]
+    review_cmd.review(config)
+
+@cli.command()
+@click.pass_context
+def report(ctx: click.Context):
+    config = ctx.obj["CONFIG"]
+    report_cmd.report(config)
+
+if __name__ == '__main__':
+    cli()
diff --git a/scripts/changelog_tool/changelog.yaml b/scripts/changelog_tool/changelog.yaml
@@ -0,0 +1,22 @@
+collect:
+  from_sha: da8642900398c33333e29e2bd3e91ca4e181f602
+  to_sha: HEAD
+  repo_path: ../..
+  core_team_patterns:
+    - ".*@userver\\.tech"
+    - ".*@yandex-team\\.com"
+
+llm-config:
+  target_rps: 1
+  retries: 7
+  max_commits_per_batch: 4
+  max_user_prompt_length: 100000
+  include_diff: true
+  truncate_diff: false
+  max_concurrent_requests: 2
+
+review:
+  github_url: "https://github.com/userver-framework/userver"
+
+report:
+  github_url: "https://github.com/userver-framework/userver"
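Review note: one plausible way the `core_team_patterns` entries above could be used to flag external contributors. This is only a sketch; the function name `is_external`, matching against the author email, and the use of a full match rather than a search are assumptions, since the diff shown here does not include the detection code.

```python
import re

# Patterns as they appear in changelog.yaml after YAML unescaping.
CORE_TEAM_PATTERNS = [r".*@userver\.tech", r".*@yandex-team\.com"]


def is_external(author_email, patterns=CORE_TEAM_PATTERNS):
    """Return True if the email matches none of the core team patterns."""
    # Assumption: a full match, so "x@userver.tech.example.org" stays external.
    return not any(re.fullmatch(pattern, author_email) for pattern in patterns)
```

A full match is the stricter choice here: with `re.search`, any address that merely contains a core team domain would be treated as internal.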
diff --git a/scripts/changelog_tool/changelog_tool/__init__.py b/scripts/changelog_tool/changelog_tool/__init__.py
diff --git a/scripts/changelog_tool/changelog_tool/collect/__init__.py b/scripts/changelog_tool/changelog_tool/collect/__init__.py