An analyzer should inspect one aspect of a repo and return:
- a dimension name
- a score
- short findings
- any details needed by downstream report surfaces

An analyzer should also:
- stay focused on one dimension
- return stable fields so reports stay additive
- keep network work minimal
- prefer repo-local evidence first
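
The return shape listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual result class: `AnalyzerResult`, `LicenseAnalyzer`, the `analyze()` signature, and the 0.0 to 1.0 score range are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class AnalyzerResult:
    """Hypothetical result shape; field names mirror the list above."""
    dimension: str                                          # dimension name
    score: float                                            # assumed range 0.0-1.0
    findings: list[str] = field(default_factory=list)       # short findings
    details: dict[str, Any] = field(default_factory=dict)   # data for report surfaces


class LicenseAnalyzer:
    """Toy analyzer: one dimension, repo-local evidence only, no network."""
    dimension = "license"

    def analyze(self, repo_files: list[str]) -> AnalyzerResult:
        has_license = any(name.upper().startswith("LICENSE") for name in repo_files)
        return AnalyzerResult(
            dimension=self.dimension,
            score=1.0 if has_license else 0.0,
            findings=[] if has_license else ["No LICENSE file found"],
            details={"has_license": has_license},
        )
```

Keeping extra data in `details` rather than adding top-level attributes is one way to keep the shape stable as analyzers grow.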
- Add the analyzer under `src/analyzers/`.
- Make sure it fits the existing result shape used by scoring and report writers.
- Add tests for both the analyzer output and any score/report behavior it changes.
- Re-run `pytest` and `make workbook-gate` if workbook-facing summaries change.
In addition to existing README quality fields, `ReadmeAnalyzer` now produces:
- `readme_last_touched_days`: days since the README file was last modified, based on Git history
- `code_last_touched_days`: days since any non-README file in the repo was last modified
- `readme_staleness_ratio`: `readme_last_touched_days / code_last_touched_days`; higher means the README is aging faster than the code
- `readme_stale`: boolean; `true` when `readme_staleness_ratio > 5.0` AND `code_last_touched_days < 90`, i.e., the README is more than five times older than the code and the code is still being actively touched
Excel and control-center surfacing for these staleness fields is wired via S2.4.
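
The staleness rule can be written out as a small function. This is a sketch under assumptions, not the `ReadmeAnalyzer` implementation: the function name `readme_staleness` is hypothetical, and the handling of `code_last_touched_days == 0` (clamping the divisor to 1) is a guess, since the source does not say how that case is treated.

```python
def readme_staleness(readme_last_touched_days: int, code_last_touched_days: int) -> dict:
    """Derive the ratio and the stale flag from the two day counts."""
    # Assumption: clamp the divisor to 1 so code touched today does not divide by zero.
    divisor = max(code_last_touched_days, 1)
    ratio = readme_last_touched_days / divisor
    return {
        "readme_last_touched_days": readme_last_touched_days,
        "code_last_touched_days": code_last_touched_days,
        "readme_staleness_ratio": ratio,
        # Stale only when the README lags badly AND the code is still active.
        "readme_stale": ratio > 5.0 and code_last_touched_days < 90,
    }
```

For example, a README last touched 400 days ago in a repo whose code changed 10 days ago gives a ratio of 40.0 and is flagged stale. A README at 600 days with code at 100 days gives a ratio of 6.0 but is not flagged, because the code itself is no longer being actively touched.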
`ActivityAnalyzer` now produces release signal fields via `GithubClient.get_releases()`:
- `has_any_release`: boolean; whether the repo has at least one published release
- `release_count`: total number of releases fetched (capped at 10 per run)
- `releases_available`: whether the releases endpoint was reachable
- `latest_release_age_days`: days since the most recent release was published
- `latest_release_is_prerelease`: boolean; whether the most recent release is marked as a pre-release
Excel and control-center surfacing for these release fields is wired via S2.4.
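
As an illustration of how these fields relate to one another, the sketch below derives them from a list of release records. It does not call `GithubClient`; the function `release_signals` is hypothetical, the input is assumed to be a list of dicts carrying the `published_at` and `prerelease` keys found on GitHub REST API release objects, and passing `None` to mean "endpoint unreachable" is an assumption, as are the `None` placeholder values when no release exists.

```python
from datetime import datetime, timezone
from typing import Optional

MAX_RELEASES = 10  # per-run cap stated above


def _parse(ts: str) -> datetime:
    # GitHub timestamps end in "Z"; older fromisoformat versions need "+00:00".
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def release_signals(releases: Optional[list[dict]], now: datetime) -> dict:
    """Build the release fields; releases=None means the endpoint was unreachable."""
    signals = {
        "releases_available": releases is not None,
        "has_any_release": False,
        "release_count": 0,
        "latest_release_age_days": None,
        "latest_release_is_prerelease": None,
    }
    fetched = (releases or [])[:MAX_RELEASES]
    if not fetched:
        return signals
    # Pick the most recently published release.
    latest = max(fetched, key=lambda r: _parse(r["published_at"]))
    signals.update(
        has_any_release=True,
        release_count=len(fetched),
        latest_release_age_days=(now - _parse(latest["published_at"])).days,
        latest_release_is_prerelease=bool(latest["prerelease"]),
    )
    return signals
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the derivation deterministic and easy to test.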
Analyzers can opt in to per-(repo, sha, analyzer) result caching. Cached results are stored in the `analyzer_cache` SQLite table. A cache hit is fully transparent to callers: the framework substitutes the stored result without invoking the analyzer.
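
The lookup-or-compute flow can be sketched with the standard `sqlite3` module. The table name comes from the text above, but the column layout, the JSON serialization, and the helper names `open_cache` and `run_cached` are assumptions for illustration; the real schema may differ, and this sketch keys only on the (repo, sha, analyzer) triple.

```python
import json
import sqlite3
from typing import Any, Callable


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open the cache database and create the (assumed) table layout."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analyzer_cache ("
        " repo TEXT, sha TEXT, analyzer TEXT, result_json TEXT,"
        " PRIMARY KEY (repo, sha, analyzer))"
    )
    return conn


def run_cached(
    conn: sqlite3.Connection,
    repo: str,
    sha: str,
    analyzer: str,
    compute: Callable[[], Any],
) -> Any:
    """Return the stored result on a hit; otherwise compute and store it."""
    row = conn.execute(
        "SELECT result_json FROM analyzer_cache"
        " WHERE repo = ? AND sha = ? AND analyzer = ?",
        (repo, sha, analyzer),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # cache hit: the analyzer is never invoked
    result = compute()
    conn.execute(
        "INSERT OR REPLACE INTO analyzer_cache (repo, sha, analyzer, result_json)"
        " VALUES (?, ?, ?, ?)",
        (repo, sha, analyzer, json.dumps(result)),
    )
    conn.commit()
    return result
```

Callers see the same return value on a hit and on a miss, which is what makes the cache transparent.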
Implement `cache_inputs_hash()` on your analyzer class. It must return a stable string hash derived from all inputs that affect the result. Inputs that count as stable:
- Lockfile bytes (content hash, not path)
- README file content + git commit timestamps
- Sorted directory listing + primary language string
Inputs that are not stable (do not include): wall-clock time, run-specific IDs, mutable config values.
Three analyzers currently opt in:
- `DependenciesAnalyzer`: hashes lockfile bytes
- `ReadmeAnalyzer`: hashes README content + git timestamps
- `StructureAnalyzer`: hashes sorted directory listing + primary language
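
A `cache_inputs_hash()` in the spirit of the sorted-listing-plus-language input might look like the sketch below. `ToyStructureAnalyzer` and `hash_parts` are hypothetical names, and the choice of SHA-256 with length-prefixed parts is an assumption; the actual analyzers may hash differently.

```python
import hashlib


def hash_parts(*parts: bytes) -> str:
    """Hash several byte strings into one hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        # Length prefix keeps ("ab", "c") distinct from ("a", "bc").
        digest.update(len(part).to_bytes(8, "big"))
        digest.update(part)
    return digest.hexdigest()


class ToyStructureAnalyzer:
    """Illustrative analyzer that opts in to caching."""

    def __init__(self, paths: list[str], primary_language: str) -> None:
        self.paths = paths
        self.primary_language = primary_language

    def cache_inputs_hash(self) -> str:
        # Sort so filesystem traversal order cannot change the hash.
        listing = "\n".join(sorted(self.paths)).encode("utf-8")
        return hash_parts(listing, self.primary_language.encode("utf-8"))
```

Nothing here reads the clock, a run ID, or mutable configuration, so the same repo state always produces the same hash.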
Pass `--reconcile-cache` to re-run all analyzers after the audit with the cache disabled and deep-compare the fresh results against the cached values (1e-6 float tolerance for numeric fields). The run exits non-zero on any divergence and writes `output/cache-reconcile-<user>-<date>.json` with a full diff. This is a CI release-gate tool and is not intended for normal runs.
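
The deep comparison can be sketched as a recursive diff. This is an illustration only: `deep_diff` and its path notation are hypothetical, and treating the 1e-6 tolerance as an absolute (rather than relative) difference is an assumption.

```python
import math
from typing import Any

FLOAT_TOL = 1e-6  # tolerance stated above; assumed to be absolute


def _is_number(value: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def deep_diff(cached: Any, fresh: Any, path: str = "$") -> list[str]:
    """Return one message per divergence between cached and fresh results."""
    if _is_number(cached) and _is_number(fresh):
        if math.isclose(cached, fresh, rel_tol=0.0, abs_tol=FLOAT_TOL):
            return []
        return [f"{path}: {cached!r} != {fresh!r}"]
    if isinstance(cached, dict) and isinstance(fresh, dict):
        diffs: list[str] = []
        for key in sorted(set(cached) | set(fresh), key=str):
            if key not in cached:
                diffs.append(f"{path}.{key}: missing in cached result")
            elif key not in fresh:
                diffs.append(f"{path}.{key}: missing in fresh result")
            else:
                diffs.extend(deep_diff(cached[key], fresh[key], f"{path}.{key}"))
        return diffs
    if isinstance(cached, list) and isinstance(fresh, list):
        if len(cached) != len(fresh):
            return [f"{path}: length {len(cached)} != {len(fresh)}"]
        diffs = []
        for index, (old, new) in enumerate(zip(cached, fresh)):
            diffs.extend(deep_diff(old, new, f"{path}[{index}]"))
        return diffs
    if type(cached) is type(fresh) and cached == fresh:
        return []
    return [f"{path}: {cached!r} != {fresh!r}"]
```

A gate built on this would exit non-zero whenever the returned list is non-empty and write the messages to the diff file.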
`--no-analyzer-cache` disables the cache for the entire run without the post-run comparison. Use it when you need a guaranteed-fresh pass without the overhead of reconciliation.
- The workbook and HTML surfaces consume the same scored audit facts.
- Prefer additive result fields over renaming existing ones.
- If a new analyzer changes the explanation story, update the explainability surfaces too.