feat: add ingestion, scoring, pipeline to drift-engine package (ADR-100 phase 3a-ingestion) #716

Open
mick-gsk wants to merge 1 commit into main from split/576-708b-engine-ingestion-scoring

Owner

mick-gsk commented May 3, 2026

Split from #576 (ADR-100 monorepo migration). Part of the PR decomposition into atomic, reviewable units.


Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR introduces the core drift-engine package plumbing for ingestion, scoring, and end-to-end analysis orchestration as part of the ADR-100 package split.

Changes:

  • Adds the new scoring engine and pipeline phases for parsing, signal execution, scoring, and result assembly.
  • Adds ingestion utilities for file discovery, Python/TypeScript parsing, git history/blame, test detection, external report import, and GitHub API access.
  • Wires a new top-level analyzer entrypoint and adds smoke tests to verify package/module importability.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 30 comments.

| File | Description |
| --- | --- |
| packages/drift-engine/tests/test_drift_engine_smoke.py | Adds import smoke tests for the new package modules. |
| packages/drift-engine/src/drift_engine/scoring/engine.py | Implements composite scoring, calibration, gating, and path override helpers. |
| packages/drift-engine/src/drift_engine/scoring/__init__.py | Re-exports scoring APIs from the new engine module. |
| packages/drift-engine/src/drift_engine/pipeline.py | Adds the new ingestion/signal/scoring/result assembly pipeline. |
| packages/drift-engine/src/drift_engine/ingestion/ts_parser.py | Adds TypeScript/TSX parsing and pattern extraction via tree-sitter. |
| packages/drift-engine/src/drift_engine/ingestion/test_detection.py | Adds helpers to classify test/generated/production files. |
| packages/drift-engine/src/drift_engine/ingestion/github_api.py | Adds a minimal stdlib GitHub REST client for calibration data. |
| packages/drift-engine/src/drift_engine/ingestion/git_history.py | Adds git history parsing, AI attribution heuristics, and indexing. |
| packages/drift-engine/src/drift_engine/ingestion/git_blame.py | Adds git blame parsing, caching, and parallel blame execution. |
| packages/drift-engine/src/drift_engine/ingestion/file_discovery.py | Adds repo file discovery, language detection, and manifest caching. |
| packages/drift-engine/src/drift_engine/ingestion/external_report.py | Adds JSON adapters for SonarQube, pylint, and CodeClimate reports. |
| packages/drift-engine/src/drift_engine/ingestion/ast_parser.py | Adds Python AST parsing plus a minimal TS fallback parser. |
| packages/drift-engine/src/drift_engine/ingestion/__init__.py | Re-exports ingestion entrypoints from the new package. |
| packages/drift-engine/src/drift_engine/analyzer.py | Adds top-level repo/diff analysis orchestration using the new pipeline. |

Comment on lines +1333 to +1340
# Apply per-path overrides (filter + re-weight) before scoring
if config.path_overrides:
    all_findings = apply_path_overrides(
        all_findings,
        config.path_overrides,
        effective_weights,
        breadth_cap=breadth_cap,
    )
Comment on lines +1372 to +1374
signal_scores = self._signal_score_fn(all_findings, **scoring_kwargs)
repo_score = self._repo_score_fn(signal_scores, effective_weights)
module_scores = self._module_score_fn(all_findings, effective_weights)
Comment on lines +341 to +351
"""Return (invalidator_type, invalidator_value) for cache lookup."""
invalidator_type = "git_head"
invalidator_value = _current_git_head(repo_path)
if invalidator_value is None:
    invalidator_type = "mtime"
    invalidator_value = _mtime_fingerprint(
        repo_path,
        include_patterns,
        prepared_exclude,
        supported,
    )
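
The fallback in this excerpt (prefer the git HEAD, otherwise use an mtime fingerprint) can be sketched in isolation. This is a minimal illustration rather than the package's code: `choose_invalidator` and its `mtime_fingerprint` callable are hypothetical stand-ins for `_current_git_head` and `_mtime_fingerprint`, whose implementations are not shown in the review.

```python
from __future__ import annotations

from typing import Callable, Optional


def choose_invalidator(
    git_head: Optional[str],
    mtime_fingerprint: Callable[[], str],
) -> tuple[str, str]:
    """Return (invalidator_type, invalidator_value) for a cache lookup.

    Hypothetical helper: the git HEAD is used when it is known; otherwise
    the fingerprint callable is invoked as a fallback.
    """
    if git_head is not None:
        return "git_head", git_head
    # Only pay for the fingerprint when there is no usable git HEAD.
    return "mtime", mtime_fingerprint()


with_git = choose_invalidator("abc123", lambda: "fp-1")
without_git = choose_invalidator(None, lambda: "fp-1")
```

Passing the fingerprint as a callable is a design choice made for this sketch: it keeps the potentially expensive file walk off the path where a git HEAD is available.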
Comment on lines +618 to +622
cached = _check_discovery_cache(
    manifest, cache_key, invalidator_type, invalidator_value, skipped_out
)
if cached is not None:
    return cached
Comment on lines +1027 to +1035
eligible: list[tuple[ParseResult, str, str]] = [] # (pr, posix, content_hash)
for pr in parsed.parse_results:
    if callable(should_process) and not should_process(pr):
        continue
    p = pr.file_path.as_posix()
    file_hash = parsed.file_hashes.get(p)
    if not file_hash:
        continue
    eligible.append((pr, p, SignalCache.content_hash_for_file(file_hash)))
Comment on lines +189 to +197
# Re-weight if override provides custom weights
if override.weights is not None:
    wd = override.weights.as_dict()
    key = _SIGNAL_WEIGHT_KEYS.get(f.signal_type, f.signal_type)
    w = wd.get(key, 0.1)
    breadth = min(breadth_cap, 1 + math.log(1 + len(f.related_files)))
    f.impact = round(w * f.score * breadth, 4)

kept.append(f)
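
The re-weighting arithmetic in this excerpt can be checked with a small standalone function. This is a sketch under assumptions: `reweighted_impact` is a hypothetical name, and the `breadth_cap` values used below are illustrative only, since the configured cap is not shown in the review.

```python
import math


def reweighted_impact(
    weight: float,
    score: float,
    related_file_count: int,
    breadth_cap: float,
) -> float:
    """Mirror the excerpt's formula: weight * score * capped log breadth."""
    # Breadth grows logarithmically with the number of related files
    # and is clamped at breadth_cap.
    breadth = min(breadth_cap, 1 + math.log(1 + related_file_count))
    return round(weight * score * breadth, 4)


# No related files: breadth is exactly 1, so impact is weight * score.
isolated = reweighted_impact(0.1, 0.5, related_file_count=0, breadth_cap=4.0)

# Many related files: 1 + ln(1001) exceeds the cap, so breadth is 2.0.
capped = reweighted_impact(0.2, 0.5, related_file_count=1000, breadth_cap=2.0)
```

Under these inputs `isolated` is 0.05 and `capped` is 0.2, which shows that the cap, not the file count, bounds the multiplier for widely connected findings.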
Comment on lines +202 to +207
def compute_signal_scores(
    findings: list[Finding],
    *,
    dampening_k: int = _DAMPENING_K,
    min_findings: int = 0,
) -> dict[str, float]:
Comment on lines +220 to +222
Args:
    dampening_k: count-dampening constant (default 10; small repos use 20).
    min_findings: per-signal minimum finding count to score (below → 0).
Comment on lines +160 to +164
Uses tiered heuristics:
- Co-author tag from known AI tool → 0.95 confidence
- Tier 1 formulaic message (specific AI patterns) → 0.40 + boost
- Conventional commit in AI-tool repo → 0.20 + boost
- Tier 2 formulaic message (generic verb-noun) → 0.15 + boost
Comment on lines +191 to +196
# Tier 1.5: conventional-commit format — meaningful only with tool indicators
if indicator_boost > 0:
    conv_match = _CONVENTIONAL_COMMIT_RE.match(msg_first_line)
    if conv_match and not msg_body:
        conf = min(0.40 + indicator_boost, 0.95)
        return True, round(conf, 2)
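
The clamped confidence in this excerpt can be reproduced with a short helper. This is a sketch, not the PR's code: `clamped_confidence` is a hypothetical name, and the base is left as a parameter because the docstring excerpt lists `0.20 + boost` for the conventional-commit tier while this branch adds the boost to 0.40.

```python
def clamped_confidence(
    base: float,
    indicator_boost: float,
    ceiling: float = 0.95,
) -> float:
    """Add the boost to the tier base, clamp at the ceiling, round to 2 places."""
    return round(min(base + indicator_boost, ceiling), 2)


# Base used by the code excerpt (0.40) with a moderate, illustrative boost.
from_code = clamped_confidence(0.40, 0.30)

# Base stated in the docstring excerpt (0.20) with the same boost.
from_docstring = clamped_confidence(0.20, 0.30)

# A large boost is clamped at the 0.95 ceiling.
saturated = clamped_confidence(0.40, 0.90)
```

With the same boost, the two bases differ by 0.20 until the ceiling is reached, so the result depends on which base the tier is meant to use.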

def _serialize_commit(commit: CommitInfo) -> dict[str, object]:
    hashed_coauthors = [
        hashlib.sha256(coauthor.strip().lower().encode("utf-8")).hexdigest()

# Trend context compatibility wrappers (ADR-005)
# ---------------------------------------------------------------------------

_NOISE_FLOOR = NOISE_FLOOR

    return parse_python_file(file_path, repo_path)

if language in ("typescript", "tsx", "javascript", "jsx"):
    from drift_engine.ingestion.ts_parser import parse_typescript_file

    if tree_sitter_available():
        langs |= {"typescript", "tsx", "javascript", "jsx"}
except ImportError:

try:
    ts = int(raw_line[12:])
    date = datetime.date.fromtimestamp(ts)
except (ValueError, OSError):

    files_changed.append(fpath)
    total_ins += ins
    total_del += dels
except ValueError:

"""
if not tree_sitter_available():
    # Delegate to the regex stub in ast_parser
    from drift_engine.ingestion.ast_parser import _parse_typescript_stub

    value = int(env_override)
    if value >= 1:
        return value
except ValueError:


# Re-export for backwards compat; canonical implementation in models.py
_severity_for_score = severity_for_score

candidate = projected[key] + residue
if min_weight <= candidate <= max_weight:
    projected[key] = candidate
    residue = 0.0

@codecov-commenter

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

mick-gsk closed this May 4, 2026
mick-gsk deleted the split/576-708b-engine-ingestion-scoring branch May 4, 2026 05:09
mick-gsk restored the split/576-708b-engine-ingestion-scoring branch May 4, 2026 05:14
mick-gsk reopened this May 4, 2026
mick-gsk added the release:feature and size/XL labels May 4, 2026
github-actions bot added the agent-review-requested and lane/standard labels May 4, 2026
Labels

  • agent-review-requested: Agent review was requested automatically
  • lane/standard: Fixes and features, standard review path
  • release:feature: Include this PR under Features in release notes
  • size/XL: Diff ≥ 500 lines, consider splitting

5 participants