
Commit b85ff03

feat(scripts): Add dependency version scanner tool (#16867)
This adds a utility with the ability to scan for common references to dependencies (Python runtimes and package dependencies) to facilitate updating code when runtimes and dependencies change.

* It can be run against an entire repo OR against specific packages within a monorepo
* It is customizable with [regex patterns and examples here](https://github.com/googleapis/google-cloud-python/pull/16867/changes#diff-d17423afcf0604a287af7c6a590da37df7105674bbb715248d4f0c53545f3ea9)
* The test suite checks each regex against the examples to ensure the efficacy of the patterns
* The current patterns account for edge cases such as finding `< 3.8` when searching for references to `3.7` since they are semantically equivalent even if syntactically different.
* The scanner produces a CSV report with:

```
path/filename, package name, line number, matching pattern, full line for context, etc.
```
1 parent c7b49a9 commit b85ff03

14 files changed

Lines changed: 1837 additions & 0 deletions

scripts/version_scanner/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
.conductor/
scanner_report.csv
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
# Directories and files to ignore by the version scanner
.git
__pycache__
.tox
.nox
venv
.venv
.conductor
version_scanner
docs
samples
changelog.md
.librarian
goldens
# Ignore pandoc references in repositories.bzl
repositories.bzl

# Ignore binary media files
*.jpg
*.png
*.gif
*.ico

scripts/version_scanner/README.md

Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
# Automated Dependency Version Scanner

This tool scans the repository for hardcoded references to specific dependency versions (like Python 3.7) that need to be upgraded or removed.

## Usage

Run the script from the repository root:

```bash
python3 scripts/version_scanner/version_scanner.py -d <dependency> -v <version> [options]
```

### Options

* `-d`, `--dependency`: Name of the dependency (e.g., python, protobuf)
* `-v`, `--version`: Specific version to search for (e.g., 3.7, 4.25.8)
* `-p`, `--path`: Root directory to scan (defaults to current directory)
* `--package`: Specific subdirectory filter (useful for monorepos)
* `--package-file`: Path to a file containing a list of package directories to scan (e.g., `scripts/version_scanner/small_package_list.txt`)
* `--config`: Path to the regex configuration file (defaults to `scripts/version_scanner/regex_config.yaml`)
* `-o`, `--output`: Path to the output CSV file (defaults to `<dependency>-<version>-<timestamp>.csv`)
* `--github-repo`: GitHub repository URL base (defaults to https://github.com/googleapis/google-cloud-python)
* `--branch`: GitHub branch for links (defaults to `main`)
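
When the scanner is driven from another script, it can help to assemble the argument list in one place. The sketch below is only an illustration: `build_scanner_command` is a hypothetical helper, not part of the tool, and it uses only flags listed above. It builds the command without running it.

```python
from typing import List, Optional


def build_scanner_command(
    dependency: str,
    version: str,
    path: Optional[str] = None,
    package: Optional[str] = None,
    output: Optional[str] = None,
    scanner: str = "scripts/version_scanner/version_scanner.py",
) -> List[str]:
    """Assemble an argv list from the documented flags (hypothetical helper)."""
    cmd = ["python3", scanner, "-d", dependency, "-v", version]
    if path:
        cmd += ["-p", path]          # root directory to scan
    if package:
        cmd += ["--package", package]  # subdirectory filter
    if output:
        cmd += ["-o", output]        # CSV report path
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.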

## Installation & Setup

By default, the core scanner only depends on Python's standard library and **`pyyaml`** to read the configuration file.

If you want to use the Google Drive upload feature (`--upload`), you must install the optional Google API client dependencies:
```bash
pip install -r scripts/version_scanner/requirements.txt
```

## Scope: Handwritten vs. Generated Code

> [!NOTE]
> **This scanner is primarily intended for auditing handwritten code, configuration files, CI scripts, and documentation.**
> You do **not** need to scan or manually edit auto-generated GAPIC libraries. Any dependency updates for generated code are handled upstream by editing the generator templates in the `gapic-generator-python` repository. When the templates are updated, the changes naturally trickle downstream to correct all generated client libraries upon the next regeneration.

## Limitations

* **Single-Line Matching Only**: The scanner processes files line-by-line to ensure high performance and simplicity. Consequently, version declarations or dependency lists that span across multiple lines (such as multiline lists in a `setup.py` file) will not be caught by the regex patterns.
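
To make the single-line limitation concrete, here is a minimal sketch. The pattern and the `scan_lines` helper are assumptions made up for this example and are not taken from `regex_config.yaml`. The same pattern finds the declaration when it sits on one line but misses it when the value wraps onto the next line.

```python
import re

# Hypothetical pattern for illustration only.
PATTERN = re.compile(r"python_requires\s*=\s*[\"']>=\s*3\.7")

single_line = 'setup(name="demo", python_requires=">=3.7")\n'
multi_line = 'setup(\n    name="demo",\n    python_requires=\n        ">=3.7",\n)\n'


def scan_lines(text: str) -> list:
    """Return 1-based numbers of lines that match PATTERN on their own."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]
```

Searching the whole text at once with `PATTERN.search(multi_line)` does match, because `\s*` can span the newline. A line-by-line pass never sees both halves together.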

## Configuration

The scanner uses a YAML configuration file (`regex_config.yaml`) to define rules and regex patterns.
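
The layout of `regex_config.yaml` is not shown here, so the following is only a sketch of the general idea of pairing each pattern with examples that must match. It uses a plain dict instead of YAML to stay dependency-free. The rule names, the `{version}` and `{next_minor}` placeholders, and both helper functions are invented for this illustration. The second rule mirrors the `< 3.8` edge case: a search for `3.7` also flags an upper bound on the next minor version.

```python
import re

# Hypothetical rules; the real configuration file may be structured differently.
RULES = {
    "exact_version": {
        "pattern": r"(?<![\d.]){version}(?![\d.])",
        "examples": ['python_requires=">=3.7"', "Python 3.7 is supported"],
        "non_examples": ["version 13.75", "3.70"],
    },
    "less_than_next_minor": {
        "pattern": r"<\s*{next_minor}(?![\d.])",
        "examples": ['python_requires="<3.8"', "requires python < 3.8"],
        "non_examples": ["<3.80", "<= 3.8"],
    },
}


def compile_rule(rule: dict, version: str) -> "re.Pattern":
    """Fill the placeholders for a 'major.minor[.patch]' version and compile."""
    major, minor = version.split(".")[:2]
    next_minor = f"{major}.{int(minor) + 1}"
    source = rule["pattern"]
    source = source.replace("{version}", re.escape(version))
    source = source.replace("{next_minor}", re.escape(next_minor))
    return re.compile(source)


def check_rules(rules: dict, version: str) -> list:
    """Return (rule_name, text) pairs where a rule behaved unexpectedly."""
    failures = []
    for name, rule in rules.items():
        regex = compile_rule(rule, version)
        for text in rule["examples"]:
            if not regex.search(text):
                failures.append((name, text))
        for text in rule["non_examples"]:
            if regex.search(text):
                failures.append((name, text))
    return failures
```

An empty list from `check_rules(RULES, "3.7")` means every example matched and every non-example was rejected.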

## Ignoring Directories

You can create a `.scannerignore` file in the directory you are scanning (usually the repo root) to list directories and files to skip, one per line.
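
One way such an ignore file could be applied during a directory walk is sketched below. This is not the scanner's actual code: `load_ignored_names` and `walk_files` are hypothetical helpers, and treating `#` lines as comments is an assumption.

```python
import os
from typing import Iterator, Set


def load_ignored_names(ignore_file: str) -> Set[str]:
    """Read one entry per line, skipping blank lines and '#' comments."""
    if not os.path.exists(ignore_file):
        return set()
    with open(ignore_file, encoding="utf-8") as handle:
        return {
            line.strip()
            for line in handle
            if line.strip() and not line.lstrip().startswith("#")
        }


def walk_files(root: str, ignored: Set[str]) -> Iterator[str]:
    """Yield file paths under root, skipping ignored directory and file names."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        for name in filenames:
            if name not in ignored:
                yield os.path.join(dirpath, name)
```

Pruning `dirnames` in place avoids walking large ignored trees such as `.git` or `.nox` at all.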

## Known Issues & Future Investigations
- **Binary Ignores in `.scannerignore`**: Recursive wildcard ignores (e.g., `*.jpg`) currently do not effectively ignore deeply nested binary files. The scanner logic should be investigated to support robust globbing or full-path suffix matching.
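
As one possible direction for the investigation above, wildcard entries could be matched against every path component with the standard-library `fnmatch` module. This is a sketch of that idea only, not the scanner's current behavior; `is_ignored` is a hypothetical helper.

```python
import fnmatch
import os
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if any component of path matches any ignore pattern."""
    parts = path.replace(os.sep, "/").split("/")
    return any(
        fnmatch.fnmatchcase(part, pattern)
        for pattern in patterns
        for part in parts
    )
```

Because each component is checked separately, `*.jpg` matches a file at any depth, and a plain name such as `.nox` still matches a directory anywhere in the path.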

---

## Universal Prompt for EOL Runtime & Dependency Migration

### Context & Overview

#### Overview
This prompt is provided as an example. It outlines an approach for updating Python packages to drop support for end-of-life Python runtimes (3.7, 3.8, 3.9) or deprecated dependencies, and for ensuring the packages are configured for modern Python. It may help you resolve version mismatches faster. The prompt comes with no guarantees; your mileage may vary. LLMs make mistakes, so always double-check the LLM's work and test thoroughly.

#### High-Level Strategy
- **One Branch Per Package**: To keep PRs manageable and isolated, we suggest a dedicated worktree and branch for each package (e.g., `feat/drop-<dependency>-<version>-<package-name>`, such as `feat/drop-protobuf-4.25.8-google-cloud-bigquery`).
- **Small & Reversible Commits**: Group changes into logical commits (Metadata, Nox, Docs, Cleanup, Tests) following Conventional Commits.

---

### Per-Package Workflow

Follow these steps for each package in the target list. Context and warnings are provided inline before the steps where they apply.

#### Step 1: Sync & Branch
1. Ensure the `main` branch is up to date.
2. Create the feature branch: `git checkout -b feat/drop-<dependency>-<version>-<package-name>`.

#### Step 2: Scan (Baseline)
1. Run the `version_scanner` for the package to get a list of all occurrences of the dependency and version.
> [!TIP]
> Use `# version-scanner: ignore` or `# version-scanner: ignore-next-line` in code to silence genuine false positives and keep reports clean.
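
The pragma comments in the tip above could be honored by a line scanner roughly as follows. This is a sketch under stated assumptions, not the scanner's implementation: `scan_with_pragmas` is a hypothetical function, and it assumes that `ignore` silences its own line while `ignore-next-line` silences only the line after it.

```python
import re
from typing import List, Tuple

IGNORE_LINE = "version-scanner: ignore"
IGNORE_NEXT = "version-scanner: ignore-next-line"


def scan_with_pragmas(lines: List[str], pattern: re.Pattern) -> List[Tuple[int, str]]:
    """Return (line_number, text) for matches not silenced by a pragma."""
    matches = []
    skip_next = False
    for number, line in enumerate(lines, start=1):
        suppressed = skip_next
        # Check the longer pragma first: IGNORE_LINE is a prefix of IGNORE_NEXT.
        has_next_pragma = IGNORE_NEXT in line
        has_line_pragma = IGNORE_LINE in line and not has_next_pragma
        skip_next = has_next_pragma
        if suppressed or has_line_pragma:
            continue
        if pattern.search(line):
            matches.append((number, line.rstrip("\n")))
    return matches
```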

---

#### 💡 Context for Step 3: Standards & Cleanup
*Before applying changes, review these standards to ensure consistency:*

##### Runtime Version Checks
- **Standard**: Use `sys.version_info < (X, Y)`.
- **Rationale**: Python compares tuples element by element, so `(3, 10)` correctly sorts after `(3, 9)`.
- **Avoid**: `sys.version_info.minor < Y` or string conversions.
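
A short illustration of why the tuple form is preferred. `is_older_than` is a hypothetical helper written for this example; the comparisons themselves are standard Python behavior.

```python
import sys
from typing import Tuple


def is_older_than(
    required: Tuple[int, int],
    current: Tuple[int, ...] = tuple(sys.version_info[:3]),
) -> bool:
    """True if `current` (default: the running interpreter) is below `required`."""
    return current < required


# Tuples compare element by element, so 3.10 sorts after 3.9.
tuple_ok = (3, 10) > (3, 9)

# Strings compare character by character, so "3.10" sorts before "3.9".
string_pitfall = "3.10" < "3.9"
```

Both `tuple_ok` and `string_pitfall` evaluate to `True`, which is exactly why string conversions give the wrong answer from Python 3.10 onward.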

##### Pytest Skips
- **Standard**: `@pytest.mark.skipif(sys.version_info < (X, Y), reason="Requires Python X.Y+")`.
- **Avoid**: String-based conditions like `@pytest.mark.skipif("sys.version_info < ...")`.

##### Noxfile Version Matches
- **Standard**: `session.python == "X.Y"` (Nox uses strings).
- **Avoid**: `float(session.python) < X.Y` (fails for `3.10`, because `float("3.10")` is `3.1`).
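
The float pitfall can be checked directly. When an ordering comparison is really needed rather than string equality, one option is to parse the string into a tuple first. `parse_python` and `python_at_least` are hypothetical helpers for this sketch, not Nox APIs.

```python
from typing import Tuple


def parse_python(value: str) -> Tuple[int, int]:
    """Turn a version string such as "3.10" into (3, 10)."""
    major, minor = value.split(".")[:2]
    return int(major), int(minor)


def python_at_least(session_python: str, minimum: Tuple[int, int]) -> bool:
    """Compare a Nox-style version string against a (major, minor) tuple."""
    return parse_python(session_python) >= minimum


# float() drops the trailing zero, so 3.10 looks older than 3.7.
float_pitfall = float("3.10") < 3.7
```

In a noxfile this would be called as `python_at_least(session.python, (3, 10))`.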

##### Cleanup Rules
- **Polyfills**: Remove dead `try/except` blocks guarding polyfills for features now standard in 3.10+.
- **Obsolete Skips**: Remove pytest skips for features now universally available.

##### Dependency-Specific Rules
- Use idiomatic Python references to detect dependency versions and to compare against the target version.

---

#### 💡 Context for Step 3: Disposition Rules
*Every reference to the dependency version found by the scanner must be dispositioned in one of these ways:*

1. **Update**: Update the reference if still necessary (e.g., changing `3.9` to `3.10` in support files).
2. **Delete**: Delete if no longer relevant (dead code, obsolete comments).
3. **Pragma Ignore**: Use `# version-scanner: ignore` or `# version-scanner: ignore-next-line`, but ONLY for immutable historical facts or genuine false positives. Do NOT use them for things that might change in future upgrades.

#### Step 3: Apply Changes
1. Update `setup.py` or `pyproject.toml` metadata and `requires-python`.
2. Update `noxfile.py` to remove old versions from sessions.
3. Update `README.rst` and `CONTRIBUTING.rst` documentation.
4. Remove compatibility code and skips based on the standards above.
5. **Sync Documentation**: If the package has a `docs` folder containing a `README.rst`, copy the updated top-level `README.rst` to overwrite it (unless it is a symlink).
6. Continue with the update process until all rows from the scan have been properly dispositioned.

---

#### Step 4: Verify (Post-Scan)
1. Run the `version_scanner` again. The result should be 0 matches (or only valid ignores).
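
The post-scan check can be automated by counting the data rows in the CSV report. The sketch below assumes the report starts with a header row, which is not confirmed here, and `count_report_rows` is a hypothetical helper rather than part of the scanner.

```python
import csv


def count_report_rows(report_path: str) -> int:
    """Count non-empty data rows, assuming the first row is a header."""
    with open(report_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the assumed header row
        return sum(1 for row in reader if row)
```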

---

#### 💡 Context for Step 5: Constraints & Conflicts
*Review these lessons learned when dealing with constraints:*

- **Lowest Runtime Constraints**: The file for the lowest accepted runtime (e.g., `constraints-3.10.txt`) must have pins matching the lowest acceptable versions in `setup.py` or `pyproject.toml`.
- **Philosophy on Warnings**: Do not simply suppress warnings (such as those raised by `six` or `pkg_resources`) to make tests pass. **Bump the lower bounds** of dependencies to versions that don't trigger warnings on the current lowest acceptable runtime. This protects customers who use strict warning filters.
- **SQLAlchemy Transition**: For libraries supporting both 1.4 and 2.0, use `SQLALCHEMY_SILENCE_UBER_WARNING=1` in specific legacy Nox sessions rather than silencing globally.
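
The point about strict warning filters in the list above can be demonstrated with the standard-library `warnings` module. In this sketch, `legacy_helper` stands in for any dependency that emits a deprecation warning, and both function names are hypothetical. Under an `error` filter the warning becomes an exception, so suppressing it only in the test suite would leave such users broken.

```python
import warnings


def legacy_helper() -> int:
    """Stand-in for a dependency call that emits a deprecation warning."""
    warnings.warn("legacy_helper is deprecated", DeprecationWarning, stacklevel=2)
    return 42


def call_with_strict_filters(func):
    """Call func the way a user running with warnings-as-errors would see it."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # every warning is raised as an exception
        return func()
```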

---

#### Step 5: Local Test
1. Run unit tests using Nox (e.g., `nox -s unit`).
> [!TIP]
> Use `nox -s unit-3.10` to save time when debugging specific runtime failures.
2. Run `blacken` and `lint` sessions.

#### Step 6: Push & PR
1. Push the branch and create the PR using the template in the Appendix.

---

## Appendix

### PR Template [^1]
```text
This PR updates `<dependency>` to establish version x.y.z as the minimum supported version.

### Changes
* Configuration: Updated `setup.py` and `noxfile.py` to require <dependency> <version> and remove references to older versions.
* Cleanup: Removed dead code and polyfills no longer needed.

Fixes internal issue: http://b/482126936 🦕
```

---

## Candidates for `.conductor` or `gemini.md`

*The following guidelines are universal for AI assistants working in this repo and should be moved to `.conductor` files or Gemini memories:*

1. **AI & LLM Guidelines for Verification**:
   - Use Git worktrees to scan branches without switching.
   - Run the scanner from the main branch, pointing it at the worktree.
   - Bypass environment artifacts: a worktree checks out only tracked files.
2. **Automated Bisection**:
   - Use `version_bisector.py` to find the lowest workable versions.
   - Abort tests early as soon as collection succeeds to save time.
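
The contents of `version_bisector.py` are not shown here, so the following is only a generic sketch of the bisection idea, not that script's logic. It assumes the candidate versions are sorted oldest first and that once a version works, every newer one does too. `lowest_working_version` and the `works` callback are hypothetical names.

```python
from typing import Callable, Optional, Sequence


def lowest_working_version(
    versions: Sequence[str],
    works: Callable[[str], bool],
) -> Optional[str]:
    """Binary-search sorted versions for the first one where works() is True."""
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if works(versions[mid]):
            hi = mid        # this version works; an older one might too
        else:
            lo = mid + 1    # this version fails; look at newer ones
    return versions[lo] if lo < len(versions) else None
```

In practice `works` would install the candidate version and run a quick check, such as test collection, which keeps each probe cheap.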

[^1]: Adapted from the standard PR template used in this repository.
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import os
import random
import subprocess
import sys
import tempfile
import time
from typing import List, Dict

def get_package_subset(packages_dir: str, count: int) -> List[str]:
    """
    Get a randomized subset of package names from the specified directory.

    Args:
        packages_dir: Path to the directory containing packages.
        count: Number of packages to return.

    Returns:
        A list of package directory names.
    """
    all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))]

    if count >= len(all_packages):
        return all_packages

    return random.sample(all_packages, count)

def run_benchmark(
    scanner_path: str,
    root_path: str,
    package_file: str,
    dependency: str,
    version: str
) -> float:
    """
    Run the scanner and return the duration in seconds.
    """
    cmd = [
        "python3", scanner_path,
        "-d", dependency,
        "-v", version,
        "-p", root_path,
        "--package-file", package_file
    ]

    start_time = time.perf_counter()

    try:
        result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
    except subprocess.CalledProcessError as e:
        print(f"Error running benchmark: {e}")
        return -1.0

    duration = time.perf_counter() - start_time
    return duration

def run_benchmarks(
    scanner_path: str,
    root_path: str,
    packages_dir: str,
    counts: List[int],
    dependency: str,
    version: str
) -> Dict[int, float]:
    """Runs benchmarks for specified counts and returns a dict of results."""
    results = {}

    for count in counts:
        subset = get_package_subset(packages_dir, count)
        print(f" Testing {len(subset)} packages (e.g., {subset[:3]}...)")

        # Create temp package file
        with tempfile.NamedTemporaryFile(mode='w', delete=False) as f:
            for pkg in subset:
                f.write(f"packages/{pkg}\n")
            pkg_file = f.name

        try:
            duration = run_benchmark(scanner_path, root_path, pkg_file, dependency, version)
            results[count] = duration
        finally:
            # Clean up
            if os.path.exists(pkg_file):
                os.remove(pkg_file)

    return results

def main():
    parser = argparse.ArgumentParser(description="Benchmark the version scanner.")

    parser.add_argument(
        "-s", "--scanner-path",
        default="version_scanner.py",
        help="Path to version_scanner.py"
    )

    parser.add_argument(
        "-r", "--root-path",
        required=True,
        help="Path to the monorepo root directory"
    )

    parser.add_argument(
        "-p", "--packages-dir",
        help="Path to packages directory (defaults to <root-path>/packages)"
    )

    parser.add_argument(
        "-d", "--dependency",
        default="python",
        help="Dependency to search for"
    )

    parser.add_argument(
        "-v", "--version",
        default="3.7",
        help="Version to search for"
    )

    parser.add_argument(
        "-c", "--counts",
        default="1,10,50",
        help="Comma-separated list of package counts to test"
    )

    args = parser.parse_args()

    packages_dir = args.packages_dir or os.path.join(args.root_path, "packages")

    if not os.path.exists(packages_dir):
        print(f"Error: Packages directory not found: {packages_dir}", file=sys.stderr)
        sys.exit(1)

    counts = [int(c) for c in args.counts.split(',')]

    all_packages = [d for d in os.listdir(packages_dir) if os.path.isdir(os.path.join(packages_dir, d))]

    total_packages = len(all_packages)

    print(f"Found {total_packages} packages in {packages_dir}")

    # Filter counts that are greater than total packages
    counts = [c for c in counts if c <= total_packages]
    # Add total if not already there
    if total_packages not in counts:
        counts.append(total_packages)

    print(f"Running benchmarks for counts: {counts}")

    results = run_benchmarks(
        scanner_path=args.scanner_path,
        root_path=args.root_path,
        packages_dir=packages_dir,
        counts=counts,
        dependency=args.dependency,
        version=args.version
    )

    print("\nBenchmark Results:")
    print(f"{'Packages':<10} | {'Time (seconds)':<15}")
    print("-" * 30)
    for count, duration in results.items():
        print(f"{count:<10} | {duration:<15.4f}")

if __name__ == "__main__":
    main()
