feat: add ingestion, scoring, pipeline to drift-engine package (ADR-100 phase 3a-ingestion) #716
Open
mick-gsk wants to merge 1 commit into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR introduces the core drift-engine package plumbing for ingestion, scoring, and end-to-end analysis orchestration as part of the ADR-100 package split.
Changes:
- Adds the new scoring engine and pipeline phases for parsing, signal execution, scoring, and result assembly.
- Adds ingestion utilities for file discovery, Python/TypeScript parsing, git history/blame, test detection, external report import, and GitHub API access.
- Wires a new top-level analyzer entrypoint and adds smoke tests to verify package/module importability.
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 30 comments.
| File | Description |
|---|---|
| packages/drift-engine/tests/test_drift_engine_smoke.py | Adds import smoke tests for the new package modules. |
| packages/drift-engine/src/drift_engine/scoring/engine.py | Implements composite scoring, calibration, gating, and path override helpers. |
| packages/drift-engine/src/drift_engine/scoring/__init__.py | Re-exports scoring APIs from the new engine module. |
| packages/drift-engine/src/drift_engine/pipeline.py | Adds the new ingestion/signal/scoring/result assembly pipeline. |
| packages/drift-engine/src/drift_engine/ingestion/ts_parser.py | Adds TypeScript/TSX parsing and pattern extraction via tree-sitter. |
| packages/drift-engine/src/drift_engine/ingestion/test_detection.py | Adds helpers to classify test/generated/production files. |
| packages/drift-engine/src/drift_engine/ingestion/github_api.py | Adds a minimal stdlib GitHub REST client for calibration data. |
| packages/drift-engine/src/drift_engine/ingestion/git_history.py | Adds git history parsing, AI attribution heuristics, and indexing. |
| packages/drift-engine/src/drift_engine/ingestion/git_blame.py | Adds git blame parsing, caching, and parallel blame execution. |
| packages/drift-engine/src/drift_engine/ingestion/file_discovery.py | Adds repo file discovery, language detection, and manifest caching. |
| packages/drift-engine/src/drift_engine/ingestion/external_report.py | Adds JSON adapters for SonarQube, pylint, and CodeClimate reports. |
| packages/drift-engine/src/drift_engine/ingestion/ast_parser.py | Adds Python AST parsing plus a minimal TS fallback parser. |
| packages/drift-engine/src/drift_engine/ingestion/__init__.py | Re-exports ingestion entrypoints from the new package. |
| packages/drift-engine/src/drift_engine/analyzer.py | Adds top-level repo/diff analysis orchestration using the new pipeline. |
Comment on lines +1333 to +1340
| # Apply per-path overrides (filter + re-weight) before scoring | ||
| if config.path_overrides: | ||
| all_findings = apply_path_overrides( | ||
| all_findings, | ||
| config.path_overrides, | ||
| effective_weights, | ||
| breadth_cap=breadth_cap, | ||
| ) |
Comment on lines +1372 to +1374
| signal_scores = self._signal_score_fn(all_findings, **scoring_kwargs) | ||
| repo_score = self._repo_score_fn(signal_scores, effective_weights) | ||
| module_scores = self._module_score_fn(all_findings, effective_weights) |
Comment on lines +341 to +351
| """Return (invalidator_type, invalidator_value) for cache lookup.""" | ||
| invalidator_type = "git_head" | ||
| invalidator_value = _current_git_head(repo_path) | ||
| if invalidator_value is None: | ||
| invalidator_type = "mtime" | ||
| invalidator_value = _mtime_fingerprint( | ||
| repo_path, | ||
| include_patterns, | ||
| prepared_exclude, | ||
| supported, | ||
| ) |
Comment on lines +618 to +622
| cached = _check_discovery_cache( | ||
| manifest, cache_key, invalidator_type, invalidator_value, skipped_out | ||
| ) | ||
| if cached is not None: | ||
| return cached |
Comment on lines +1027 to +1035
| eligible: list[tuple[ParseResult, str, str]] = [] # (pr, posix, content_hash) | ||
| for pr in parsed.parse_results: | ||
| if callable(should_process) and not should_process(pr): | ||
| continue | ||
| p = pr.file_path.as_posix() | ||
| file_hash = parsed.file_hashes.get(p) | ||
| if not file_hash: | ||
| continue | ||
| eligible.append((pr, p, SignalCache.content_hash_for_file(file_hash))) |
Comment on lines +189 to +197
| # Re-weight if override provides custom weights | ||
| if override.weights is not None: | ||
| wd = override.weights.as_dict() | ||
| key = _SIGNAL_WEIGHT_KEYS.get(f.signal_type, f.signal_type) | ||
| w = wd.get(key, 0.1) | ||
| breadth = min(breadth_cap, 1 + math.log(1 + len(f.related_files))) | ||
| f.impact = round(w * f.score * breadth, 4) | ||
|
|
||
| kept.append(f) |
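To make the re-weighting arithmetic in this hunk easier to check, here is a small standalone sketch of the same formula: impact = weight × score × breadth, where breadth grows with the log of the related-file count and is capped. The `Finding` dataclass and the `reweight` helper are simplified, hypothetical stand-ins; the PR's real model and its `_SIGNAL_WEIGHT_KEYS` mapping are not reproduced here.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Finding:
    """Hypothetical stand-in for the PR's Finding model (fields assumed)."""
    signal_type: str
    score: float
    related_files: List[str] = field(default_factory=list)
    impact: float = 0.0


def reweight(
    finding: Finding,
    weights: Dict[str, float],
    breadth_cap: float,
    default_weight: float = 0.1,  # same fallback value as wd.get(key, 0.1) in the hunk
) -> float:
    """Recompute impact from an override weight, the score, and a capped breadth."""
    w = weights.get(finding.signal_type, default_weight)
    # Breadth is 1 for a finding with no related files and grows logarithmically.
    breadth = min(breadth_cap, 1 + math.log(1 + len(finding.related_files)))
    finding.impact = round(w * finding.score * breadth, 4)
    return finding.impact
```

With no related files the breadth term is exactly 1, so a weight of 0.2 and a score of 0.5 give an impact of 0.1. A signal type missing from the override silently falls back to 0.1 rather than to the effective weight, which may or may not be the intended behavior.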
Comment on lines +202 to +207
| def compute_signal_scores( | ||
| findings: list[Finding], | ||
| *, | ||
| dampening_k: int = _DAMPENING_K, | ||
| min_findings: int = 0, | ||
| ) -> dict[str, float]: |
Comment on lines +220 to +222
| Args: | ||
| dampening_k: count-dampening constant (default 10; small repos use 20). | ||
| min_findings: per-signal minimum finding count to score (below → 0). |
Comment on lines +160 to +164
| Uses tiered heuristics: | ||
| - Co-author tag from known AI tool → 0.95 confidence | ||
| - Tier 1 formulaic message (specific AI patterns) → 0.40 + boost | ||
| - Conventional commit in AI-tool repo → 0.20 + boost | ||
| - Tier 2 formulaic message (generic verb-noun) → 0.15 + boost |
Comment on lines +191 to +196
| # Tier 1.5: conventional-commit format — meaningful only with tool indicators | ||
| if indicator_boost > 0: | ||
| conv_match = _CONVENTIONAL_COMMIT_RE.match(msg_first_line) | ||
| if conv_match and not msg_body: | ||
| conf = min(0.40 + indicator_boost, 0.95) | ||
| return True, round(conf, 2) |
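The quoted docstring lists the conventional-commit tier as `0.20 + boost`, while the quoted branch computes `0.40 + indicator_boost`. The sketch below keeps the base as a parameter so either value can be tried. It is a simplified illustration covering only the co-author and conventional-commit tiers; the regex, the tool-name list, and the function name are assumptions, not the PR's definitions.

```python
import re
from typing import Iterable, Tuple

# Hypothetical pattern; the PR's _CONVENTIONAL_COMMIT_RE is not shown in the hunk.
CONVENTIONAL_COMMIT_RE = re.compile(
    r"^(feat|fix|chore|docs|refactor|test)(\([^)]*\))?!?: .+"
)
# Assumed tool names, for illustration only.
KNOWN_AI_COAUTHORS = ("copilot", "claude")


def ai_confidence(
    first_line: str,
    body: str,
    coauthors: Iterable[str],
    indicator_boost: float,
    conventional_base: float = 0.20,  # docstring value; the quoted branch uses 0.40
) -> Tuple[bool, float]:
    """Return (is_ai_attributed, confidence) using two of the tiers."""
    # Co-author tag from a known AI tool: highest confidence.
    if any(tool in c.lower() for c in coauthors for tool in KNOWN_AI_COAUTHORS):
        return True, 0.95
    # Conventional commit with no body counts only when the repo shows tool indicators.
    if indicator_boost > 0 and not body and CONVENTIONAL_COMMIT_RE.match(first_line):
        return True, round(min(conventional_base + indicator_boost, 0.95), 2)
    return False, 0.0
```

Passing `conventional_base=0.40` reproduces the arithmetic of the quoted branch, which makes the gap between the docstring and the code easy to see side by side.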
|
|
||
| def _serialize_commit(commit: CommitInfo) -> dict[str, object]: | ||
| hashed_coauthors = [ | ||
| hashlib.sha256(coauthor.strip().lower().encode("utf-8")).hexdigest() |
| # Trend context compatibility wrappers (ADR-005) | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| _NOISE_FLOOR = NOISE_FLOOR |
| return parse_python_file(file_path, repo_path) | ||
|
|
||
| if language in ("typescript", "tsx", "javascript", "jsx"): | ||
| from drift_engine.ingestion.ts_parser import parse_typescript_file |
|
|
||
| if tree_sitter_available(): | ||
| langs |= {"typescript", "tsx", "javascript", "jsx"} | ||
| except ImportError: |
| try: | ||
| ts = int(raw_line[12:]) | ||
| date = datetime.date.fromtimestamp(ts) | ||
| except (ValueError, OSError): |
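A hedged sketch of the timestamp parsing above, assuming the line is an `author-time` header as emitted by `git blame --porcelain` (that prefix is 12 characters long, which would match `raw_line[12:]`). Two differences from the hunk are deliberate: the conversion is pinned to UTC so the resulting date does not depend on the machine's timezone, and `OverflowError` is caught as well, since `fromtimestamp` can raise it for out-of-range values. The function name is hypothetical.

```python
import datetime
from typing import Optional

AUTHOR_TIME_PREFIX = "author-time "  # assumed header; 12 characters


def parse_author_time(raw_line: str) -> Optional[datetime.date]:
    """Parse an 'author-time <epoch seconds>' line into a UTC date, or None."""
    if not raw_line.startswith(AUTHOR_TIME_PREFIX):
        return None
    try:
        ts = int(raw_line[len(AUTHOR_TIME_PREFIX):])
        return datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc).date()
    except (ValueError, OverflowError, OSError):
        # Non-numeric or out-of-range timestamps are treated as missing.
        return None
```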
| files_changed.append(fpath) | ||
| total_ins += ins | ||
| total_del += dels | ||
| except ValueError: |
| """ | ||
| if not tree_sitter_available(): | ||
| # Delegate to the regex stub in ast_parser | ||
| from drift_engine.ingestion.ast_parser import _parse_typescript_stub |
| value = int(env_override) | ||
| if value >= 1: | ||
| return value | ||
| except ValueError: |
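The override parsing in this hunk follows a common pattern: accept an environment value only if it is an integer of at least 1, otherwise fall back to a default. A minimal sketch, in which the function name, the variable name, and the default are all hypothetical (the hunk does not show which setting is being read):

```python
import os


def positive_int_from_env(env_var: str, default: int) -> int:
    """Return the env value if it parses as an integer >= 1, else the default."""
    env_override = os.environ.get(env_var)
    if env_override is not None:
        try:
            value = int(env_override)
            if value >= 1:
                return value
        except ValueError:
            # Non-numeric overrides are ignored rather than raised.
            pass
    return default
```

Ignoring a malformed value silently is convenient, but it can hide configuration mistakes; logging a warning on the fallback path is a reasonable alternative.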
|
|
||
|
|
||
| # Re-export for backwards compat; canonical implementation in models.py | ||
| _severity_for_score = severity_for_score |
| candidate = projected[key] + residue | ||
| if min_weight <= candidate <= max_weight: | ||
| projected[key] = candidate | ||
| residue = 0.0 |
|
Codecov Report: All modified and coverable lines are covered by tests.
Split from #576 (ADR-100 monorepo migration). Part of the PR decomposition into atomic, reviewable units.