feat: GitLab MR support and per-user analysis pipeline #12

Open
urishalit wants to merge 5 commits into lemonade-hq:main from urishalit:gitlab-support-fix

@urishalit

Summary

Adds end-to-end support for analyzing GitLab merge requests (including self-hosted instances) alongside existing GitHub PR support, plus a personal per-user analysis script kept outside the package. The CLI auto-detects the provider from the URL and routes to the appropriate API client.
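As a rough illustration of URL-based provider detection, the sketch below classifies a URL as a GitHub PR or a GitLab MR (gitlab.com or self-hosted). The function name detect_provider, the ParsedURL tuple, and both regexes are hypothetical; the real parse_mr_url in cli/utils.py may accept other URL shapes and return a different structure. It assumes GitLab MR URLs use the /-/merge_requests/<iid> path segment.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical patterns; the real parse_mr_url in cli/utils.py may differ.
GITHUB_PR_RE = re.compile(
    r"^https?://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)/pull/(?P<num>\d+)/?$"
)
# GitLab (gitlab.com or self-hosted): any host, nested groups, "/-/merge_requests/<iid>".
GITLAB_MR_RE = re.compile(
    r"^https?://(?P<host>[^/]+)/(?P<project>.+?)/-/merge_requests/(?P<num>\d+)/?$"
)


class ParsedURL(NamedTuple):
    provider: str  # "github" or "gitlab"
    host: str
    project: str  # "owner/repo" or "group/subgroup/project"
    number: int  # PR number or MR iid


def detect_provider(url: str) -> Optional[ParsedURL]:
    """Return parsed URL parts, or None if the URL is neither a PR nor an MR."""
    m = GITHUB_PR_RE.match(url)
    if m:
        return ParsedURL(
            "github", "github.com", f"{m['owner']}/{m['repo']}", int(m["num"])
        )
    m = GITLAB_MR_RE.match(url)
    if m:
        return ParsedURL("gitlab", m["host"], m["project"], int(m["num"]))
    return None
```

Because GitLab project paths can contain nested groups, the project part is matched lazily up to /-/merge_requests/ rather than as a fixed owner/repo pair.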

What's new

GitLab merge request support (6ed2262, 1c71acf)

  • cli/gitlab.py — full GitLab API client: fetch_mr, fetch_mr_with_rotation, diff pagination, normalization to the GitHub-style file dict shape consumed by preprocess.py.
  • cli/main.py: analyze-pr, batch-analyze, and label-pr now accept GitLab MR URLs. Provider is detected via parse_mr_url (cli/utils.py).
  • cli/config.py: get_gitlab_token / get_gitlab_tokens, with GL_TOKEN / GL_TOKENS aliases for consistency with the GitHub equivalents.
  • cli/errors.py: ErrorHandler.handle_gitlab_404 and handle_gitlab_error mirror the GitHub handlers.
  • Domain warning when using a non-standard GitLab instance (token-leak safety).
  • Input validation on project_path and mr_iid.
  • Rate-limit handling honors Retry-After.
  • Labeling is gracefully skipped for GitLab MRs (in this codebase, GitLab labels are scoped to projects, not MRs).
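The normalization step can be pictured as a mapping from one entry of GitLab's MR diffs response (old_path, new_path, diff, and the new_file / renamed_file / deleted_file flags) onto a GitHub-style file dict. This is a sketch only: normalize_gitlab_diff is a hypothetical name, and the output keys (filename, status, additions, deletions, changes, patch, previous_filename) follow GitHub's pull request files response, which may not match exactly what preprocess.py reads. Line counts are derived by counting +/- lines, assuming the diff field holds hunks without ---/+++ file headers.

```python
from typing import Any


def normalize_gitlab_diff(entry: dict[str, Any]) -> dict[str, Any]:
    """Map one GitLab MR diff entry onto a GitHub-style file dict (hypothetical sketch)."""
    patch = entry.get("diff") or ""
    # Assumes the diff holds hunks only ("@@ ..." headers plus body lines), no ---/+++ headers.
    additions = sum(1 for line in patch.splitlines() if line.startswith("+"))
    deletions = sum(1 for line in patch.splitlines() if line.startswith("-"))
    if entry.get("new_file"):
        status = "added"
    elif entry.get("deleted_file"):
        status = "removed"
    elif entry.get("renamed_file"):
        status = "renamed"
    else:
        status = "modified"
    file_dict: dict[str, Any] = {
        "filename": entry["new_path"],
        "status": status,
        "additions": additions,
        "deletions": deletions,
        "changes": additions + deletions,
        "patch": patch,
    }
    if status == "renamed":
        # GitHub reports the old name of a renamed file under previous_filename.
        file_dict["previous_filename"] = entry["old_path"]
    return file_dict
```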

Self-hosted GitLab quirks

  • --no-verify-ssl flag on batch-analyze (and an SSL_NO_VERIFY env var, threaded through every httpx.Client via a new cli/utils.ssl_verify_enabled() helper).
  • GITLAB_DIFFS_PER_PAGE = 20 constant — git.datik.io returns HTTP 500 when /api/v4/projects/:id/merge_requests/:iid/diffs is requested with per_page > 20. Other endpoints keep per_page = 100.
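Both self-hosted quirks fit in a few lines. The sketch below assumes SSL_NO_VERIFY counts as set when it holds 1, true, or yes (the real ssl_verify_enabled() may accept other values), and models the diffs request as an injected get_page(page, per_page) callable so the paging logic can be shown without a network; fetch_all_diffs is a hypothetical name. It stops on the first empty or short page, which covers the single-page, multi-page, empty-page, and full-then-empty cases listed under Tests.

```python
import os
from typing import Any, Callable

# Value from this PR: /diffs returns HTTP 500 above 20 on git.datik.io.
GITLAB_DIFFS_PER_PAGE = 20


def ssl_verify_enabled() -> bool:
    """Sketch: verification stays on unless SSL_NO_VERIFY holds an assumed truthy value."""
    value = os.environ.get("SSL_NO_VERIFY", "").strip().lower()
    return value not in {"1", "true", "yes"}


def fetch_all_diffs(
    get_page: Callable[[int, int], list[dict[str, Any]]],
    per_page: int = GITLAB_DIFFS_PER_PAGE,
) -> list[dict[str, Any]]:
    """Collect every diff page; get_page(page, per_page) stands in for one GET of /diffs."""
    diffs: list[dict[str, Any]] = []
    page = 1
    while True:
        batch = get_page(page, per_page)
        diffs.extend(batch)
        # An empty or short page means there is nothing left to fetch.
        if len(batch) < per_page:
            return diffs
        page += 1
```

In a real client the flag would typically be passed as httpx.Client(verify=ssl_verify_enabled()), and the pager could equally follow GitLab's X-Next-Page response header instead of checking page length.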

Tests (ffca5b9)

  • tests/test_gitlab.py — headers, validation, diff parsing, pagination (single page / multi-page / empty page / full-then-empty), single-fetch optimization, token rotation, error handling.
  • tests/test_utils.py: parse_mr_url across GitHub / GitLab.com / self-hosted GitLab URL shapes.
  • tests/test_config.py: get_gitlab_token / get_gitlab_tokens precedence and GL_* aliases.
  • tests/test_errors.py: handle_gitlab_404 / handle_gitlab_error.
  • tests/test_batch.py — mixed-provider batches, label skipping for GitLab URLs.

Personal analysis scripts kept out of the package (8a49719, b63abeb)

  • 8a49719 originally added analyze_user.sh, cli/list_user_prs.py, and cli/add_pr_dates.py as a per-user pipeline (list a user's PRs/MRs across GitHub + GitLab in parallel → batch-analyze → enrich with pr_date).
  • b63abeb then moves them to a gitignored scripts/ folder so they don't ship with the production CLI, and drops the corresponding list-user-prs / add-pr-dates typer subcommands and imports from cli/main.py. The relocated scripts use absolute imports against cli.* and load tokens from scripts/.env via python-dotenv.
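The pr_date enrichment step amounts to one date lookup per row, fanned out over a thread pool, followed by a rewritten CSV. In this sketch insert_pr_dates and lookup_date are hypothetical; lookup_date stands in for whatever API call fetches a PR's or MR's creation date (in the real script it would share one httpx.Client across worker threads). The column is placed right after pr_url to match the pr_url, pr_date, complexity, explanation layout of the orengriffin.csv example under Test plan.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def insert_pr_dates(
    csv_text: str, lookup_date: Callable[[str], str], max_workers: int = 8
) -> str:
    """Return csv_text with a pr_date column inserted right after pr_url (sketch)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = list(reader.fieldnames or [])
    if not rows or "pr_url" not in fields:
        return csv_text  # nothing to enrich
    # pool.map keeps results in input order, so dates line up with rows.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        dates = list(pool.map(lookup_date, (row["pr_url"] for row in rows)))
    fields.insert(fields.index("pr_url") + 1, "pr_date")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row, date in zip(rows, dates):
        writer.writerow({**row, "pr_date": date})
    return out.getvalue()
```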

Housekeeping in b63abeb

  • Fixes three tests/test_gitlab.py pagination cases that hung under GITLAB_DIFFS_PER_PAGE = 20 because the mocks were still sized for 100.
  • Drops unused imports flagged by ruff.
  • Applies black formatting drift across cli/ and tests/.
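One plausible reading of the hang: a pager that stops only on a short page never terminates if a mock returns the same 100 items for every page, because every response is "full" at per_page = 20. A page-aware mock avoids depending on the constant at all. The helpers below (paged_side_effect, drain) are hypothetical test utilities, not code from this PR; drain adds a page cap so a misconfigured mock fails fast instead of hanging.

```python
from typing import Any, Callable
from unittest import mock


def paged_side_effect(items: list[Any]) -> Callable[[int, int], list[Any]]:
    """Build a fake get_page(page, per_page) that slices items like a real paginated API."""

    def get_page(page: int, per_page: int) -> list[Any]:
        start = (page - 1) * per_page
        return items[start : start + per_page]

    return get_page


def drain(
    get_page: Callable[[int, int], list[Any]], per_page: int, max_pages: int = 1000
) -> list[Any]:
    """Stop-on-short-page loop with a safety cap, so a bad mock raises instead of hanging."""
    out: list[Any] = []
    for page in range(1, max_pages + 1):
        batch = get_page(page, per_page)
        out.extend(batch)
        if len(batch) < per_page:
            return out
    raise RuntimeError(f"pagination did not terminate within {max_pages} pages")


# The mock stays correct whatever per_page the code under test uses.
fake_diffs = mock.Mock(side_effect=paged_side_effect(list(range(100))))
```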

Production CLI surface (unchanged for existing users)

complexity-cli exposes the same four commands as before:

  • analyze-pr — now accepts GitLab MR URLs
  • batch-analyze — now accepts mixed-provider input files; gains --no-verify-ssl and --gitlab-token
  • rate-limit
  • label-pr — GitHub-only (GitLab MRs are skipped with a clear message in batch mode)

Test plan

  • pytest -q: 210 passed
  • ruff check — clean
  • black --check — clean
  • Single MR: complexity-cli analyze-pr <self-hosted-gitlab-MR-URL> --no-verify-ssl succeeds against git.datik.io.
  • Mixed batch: complexity-cli batch-analyze --input-file <urls.txt> --output out.csv analyzed 54 GitHub PRs end-to-end (54-row CSV with provider, complexity, explanation columns).
  • Per-user pipeline (gitignored): bash scripts/analyze_user.sh orengriffin 1 produced a complete orengriffin.csv with pr_url, pr_date, complexity, explanation.
  • Reviewer should sanity-check that no production code path imports anything from scripts/.

🤖 Generated with Claude Code

urishalit and others added 5 commits March 30, 2026 22:44
Add support for analyzing GitLab MRs alongside GitHub PRs. Auto-detects
the provider from the URL and routes to the appropriate API client.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add pagination to GitLab diffs endpoint using GITLAB_PER_PAGE
- Fetch diffs once per MR (was double-fetching via separate calls)
- Reuse httpx.Client across requests within fetch_mr
- Add domain warning for non-standard GitLab instances (token safety)
- Add input validation for project_path and mr_iid
- Add ErrorHandler.handle_gitlab_error/handle_gitlab_404
- Use Retry-After for rate limits, remove redundant time import
- Log warning for truncated GitLab diffs
- Fix batch labeling to use parse_mr_url (was crashing on GitLab URLs)
- Skip labeling for GitLab MRs with clear message
- Add GL_TOKEN/GL_TOKENS env var aliases for consistency
- Import regexes from utils.py instead of duplicating in main.py
- Remove unused get_gitlab_tokens import
- Pass gitlab_token in main() callback direct-URL path

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- test_gitlab.py: headers, validation, diff parsing, pagination,
  single-fetch optimization, token rotation, error handling
- test_utils.py: parse_mr_url for GitHub/GitLab/self-hosted URLs,
  domain warning for non-standard instances
- test_config.py: get_gitlab_token/get_gitlab_tokens with GL_TOKEN aliases
- test_errors.py: handle_gitlab_404 and handle_gitlab_error
- test_batch.py: mixed GitHub/GitLab batch analysis, label skipping
  for GitLab URLs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- analyze_user.sh: end-to-end script that lists a user's PRs/MRs across
  GitHub + GitLab in parallel, runs batch-analyze, then enriches with
  creation dates.
- cli/list_user_prs.py: list PRs/MRs by author from GitHub
  (search/issues) or GitLab (/api/v4/merge_requests).
- cli/add_pr_dates.py: prepend pr_date column to a complexity CSV using
  a shared httpx.Client across worker threads.
- cli/utils.py: ssl_verify_enabled() helper reading SSL_NO_VERIFY env
  var, plumbed through to every httpx.Client in cli/gitlab.py.
- cli/main.py: --no-verify-ssl flag on batch-analyze.

Fix: cap per_page at 20 for GitLab /api/v4/.../merge_requests/:iid/diffs
(returns HTTP 500 above 20 on git.datik.io). New GITLAB_DIFFS_PER_PAGE
constant; other endpoints keep per_page=100.

The list-user-prs, add-pr-dates, and analyze_user.sh entry points were
personal scripts, not part of the production CLI. Move them to a local
scripts/ folder (gitignored) and drop the corresponding typer subcommands
and module imports from cli/main.py.

Also rolls in housekeeping that surfaced during convention alignment:
- Fix three tests/test_gitlab.py pagination cases that hung under the
  current GITLAB_DIFFS_PER_PAGE = 20 (mock data was sized for 100).
- Drop unused imports flagged by ruff in cli/ and tests/.
- Apply black formatting drift across cli/ and tests/.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>