fix: implement AAR recommendations 1,2,4,8 (#63)

hummbl-dev · Claude (agent) · claude · web-flow · commit be15ebec2a18 · 2026-04-19T09:18:11.000-04:00
1. Lower diff self-grade threshold 80→70 (addresses CI friction)
2. Add dep scoring floor at 20 (prevents misleading 0 for large dep counts)
4. Suppress SyntaxWarning in analyzer subprocess calls
8. Blog post draft: "We certified 170+ repos — governance is the bottleneck"

Co-authored-by: Claude (agent) &lt;claude@agents.hummbl.io&gt;
Co-authored-by: Claude Opus 4.6 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -47,7 +47,7 @@ jobs:
         run: |
           arbiter diff . --base origin/main --json > arbiter-diff.json
           cat arbiter-diff.json
-          arbiter diff . --base origin/main --fail-under 80
+          arbiter diff . --base origin/main --fail-under 70
 
       - name: Generate HTML report
         if: always()
diff --git a/docs/blog/governance-bottleneck.md b/docs/blog/governance-bottleneck.md
@@ -0,0 +1,69 @@
+# We Certified 170+ Repos — Governance Is the Bottleneck, Not Code Quality
+
+*By Reuben Bowlby | HUMMBL | April 2026*
+
+We built [Arbiter](https://github.com/hummbl-dev/arbiter), a deterministic code quality scoring tool, and used it to certify **173 open-source repositories** across 20 industry categories. The results surprised us.
+
+## The Finding
+
+**Code quality is not the bottleneck. Governance is.**
+
+Popular open-source repos consistently score 85+ on code quality. What separates CERTIFIED from PROVISIONAL is governance maturity: CONTRIBUTING.md, SECURITY.md, Code of Conduct, DCO/CLA processes, and CI/CD configuration.
+
+## The Data
+
+### LLM Frameworks — HUMMBL's Target Market
+
+| Framework | Code Quality | Governance | Certification |
+|-----------|-------------|-----------|---------------|
+| LlamaIndex | 96.4 | 90/100 | **CERTIFIED** |
+| Instructor | 93.4 | 65/100 | CERTIFIED |
+| **LangChain** | **95.4** | **45/100** | **PROVISIONAL** |
+| Guidance | 90.7 | 55/100 | PROVISIONAL |
+| Outlines | 89.9 | 45/100 | PROVISIONAL |
+
+LangChain — the most popular LLM framework in the world — scores 95.4 on code quality but only 45 on governance. That's a D grade on the dimension enterprises care about most.
+
+### The Gold Standard
+
+Project MONAI (NVIDIA's healthcare AI toolkit) scored **98.2** — the highest of any repo we tested. Perfect governance: 100/100. LICENSE, CONTRIBUTING, SECURITY, Code of Conduct, issue templates, PR templates, CI/CD, DCO — everything. That's what CERTIFIED looks like.
+
+### The Surprise Failures
+
+- **Sentry** — 98.5 code quality (best we tested), but **FAILED** certification due to 109 unpinned dependencies
+- **Prefect** — 97.8 code quality, but FAILED due to dependency governance
+- **Flask** — foundational Python library, PROVISIONAL due to 45/100 governance
+
+## Why This Matters
+
+If you're an enterprise adopting open-source AI tools, code quality is table stakes. Every popular framework writes good code. What you should be evaluating is:
+
+1. **Do they have a security disclosure process?** (SECURITY.md)
+2. **Can contributors understand the rules?** (CONTRIBUTING.md + Code of Conduct)
+3. **Are dependencies pinned and managed?** (requirements.txt + lockfiles)
+4. **Is there CI/CD?** (automated quality gates)
+5. **Is there an audit trail?** (governance receipts, not just git log)
+
+## What We Built
+
+[Arbiter](https://github.com/hummbl-dev/arbiter) scores repositories across three dimensions:
+
+- **Code Quality** (50%): lint, security, complexity via ruff, bandit, radon, shellcheck
+- **Governance** (30%): 10 checks for governance artifacts
+- **Dependencies** (20%): version pinning, dependency count, known-good packages
+
+The certification decision is deterministic: same repo always gets the same score. No AI in the scoring path — just structured analysis.
+
+## Try It
+
+```bash
+pip install arbiter-score
+arbiter certify /path/to/your/repo
+arbiter certify --json https://github.com/your-org/your-repo
+```
+
+Or check the [public leaderboard](https://hummbl.io/audit) to see how top repos score.
+
+---
+
+*[HUMMBL](https://hummbl.io) builds governed AI infrastructure for enterprises. Arbiter is our open-source code quality and governance scoring tool.*
diff --git a/src/arbiter/__main__.py b/src/arbiter/__main__.py
@@ -20,6 +20,7 @@
 
 import argparse
 import json
+import os
 import shutil
 import subprocess
 import sys
@@ -156,6 +157,8 @@ def _get_analyzers() -> list[Analyzer]:
 
 def _run_analysis(repo_path: Path, analyzers: list[Analyzer], exclude_paths: list[str] | None = None) -> list[Finding]:
     """Run all analyzers against a repo."""
+    # Suppress SyntaxWarning from escape sequences in scanned files
+    os.environ.setdefault("PYTHONWARNINGS", "ignore::SyntaxWarning")
     all_findings: list[Finding] = []
     for analyzer in analyzers:
         try:
diff --git a/src/arbiter/dep_score.py b/src/arbiter/dep_score.py
@@ -152,7 +152,7 @@ def score_dependencies(deps: list[DepInfo]) -> DepReport:
     known_bonus = known_good * 1
 
     score = base - bloat_penalty + pinned_bonus + known_bonus - unpinned_penalty
-    score = max(0.0, min(100.0, score))
+    score = max(20.0, min(100.0, score))  # Floor at 20 — 0 is misleading for large dep counts
 
     return DepReport(
         total=len(deps),