Commit be15ebe

hummbl-dev, Claude (agent), and claude authored
fix: implement AAR recommendations 1,2,4,8 (#63)
1. Lower diff self-grade threshold 80→70 (addresses CI friction)
2. Add dep scoring floor at 20 (prevents misleading 0 for large dep counts)
4. Suppress SyntaxWarning in analyzer subprocess calls
8. Blog post draft: "We certified 170+ repos — governance is the bottleneck"

Co-authored-by: Claude (agent) <claude@agents.hummbl.io>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4feb795 commit be15ebe

4 files changed

Lines changed: 74 additions & 2 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ jobs:
         run: |
           arbiter diff . --base origin/main --json > arbiter-diff.json
           cat arbiter-diff.json
-          arbiter diff . --base origin/main --fail-under 80
+          arbiter diff . --base origin/main --fail-under 70
 
       - name: Generate HTML report
         if: always()
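
The hunk above only moves the pass mark for `arbiter diff`. As a rough illustration of what a `--fail-under` style gate does in CI, here is a minimal sketch. The `gate` function and the example score are hypothetical, and the inclusive `>=` comparison is an assumption rather than confirmed Arbiter behavior.

```python
def gate(score: float, fail_under: float) -> int:
    """Return a shell-style exit code: 0 lets the CI step pass, 1 fails it."""
    # Assumption: a score exactly at the threshold passes.
    return 0 if score >= fail_under else 1


# A hypothetical diff score of 75 fails the old gate but passes the new one.
old_exit = gate(75.0, fail_under=80.0)
new_exit = gate(75.0, fail_under=70.0)
```

Under these assumptions, lowering the threshold changes which diffs block a merge without changing how any diff is scored.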

docs/blog/governance-bottleneck.md

Lines changed: 69 additions & 0 deletions
@@ -0,0 +1,69 @@
# We Certified 170+ Repos: Governance Is the Bottleneck, Not Code Quality

*By Reuben Bowlby | HUMMBL | April 2026*

We built [Arbiter](https://github.com/hummbl-dev/arbiter), a deterministic code quality scoring tool, and used it to certify **173 open-source repositories** across 20 industry categories. The results surprised us.

## The Finding

**Code quality is not the bottleneck. Governance is.**

Popular open-source repos consistently score 85+ on code quality. What separates CERTIFIED from PROVISIONAL is governance maturity: CONTRIBUTING.md, SECURITY.md, Code of Conduct, DCO/CLA processes, and CI/CD configuration.

## The Data

### LLM Frameworks: HUMMBL's Target Market

| Framework | Code Quality | Governance | Certification |
|-----------|-------------|-----------|---------------|
| LlamaIndex | 96.4 | 90/100 | **CERTIFIED** |
| Instructor | 93.4 | 65/100 | CERTIFIED |
| **LangChain** | **95.4** | **45/100** | **PROVISIONAL** |
| Guidance | 90.7 | 55/100 | PROVISIONAL |
| Outlines | 89.9 | 45/100 | PROVISIONAL |

LangChain, the most popular LLM framework in the world, scores 95.4 on code quality but only 45 on governance. That's a D grade on the dimension enterprises care about most.

### The Gold Standard

Project MONAI (NVIDIA's healthcare AI toolkit) scored **98.2** overall, the highest of any repo we tested. Its governance is perfect at 100/100: LICENSE, CONTRIBUTING, SECURITY, Code of Conduct, issue templates, PR templates, CI/CD, and DCO are all present. That's what CERTIFIED looks like.

### The Surprise Failures

- **Sentry**: 98.5 code quality (best we tested), but **FAILED** certification due to 109 unpinned dependencies
- **Prefect**: 97.8 code quality, but FAILED due to dependency governance
- **Flask**: foundational Python library, PROVISIONAL due to 45/100 governance

## Why This Matters

If you're an enterprise adopting open-source AI tools, code quality is table stakes. Every popular framework writes good code. What you should be evaluating is:

1. **Do they have a security disclosure process?** (SECURITY.md)
2. **Can contributors understand the rules?** (CONTRIBUTING.md + Code of Conduct)
3. **Are dependencies pinned and managed?** (requirements.txt + lockfiles)
4. **Is there CI/CD?** (automated quality gates)
5. **Is there an audit trail?** (governance receipts, not just git log)
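
Several items on this checklist come down to whether a file exists. The sketch below shows one way to check that for a local clone. It is not Arbiter's governance scorer: the function name is hypothetical, and the list of file names and the choice to look only in the repository root are assumptions.

```python
from pathlib import Path

# Assumed file names; real projects may keep these under .github/ or docs/.
GOVERNANCE_FILES = ("LICENSE", "CONTRIBUTING.md", "SECURITY.md", "CODE_OF_CONDUCT.md")


def missing_governance_files(repo_root: Path) -> list[str]:
    """Return the governance artifacts that are absent from the repo root."""
    return [name for name in GOVERNANCE_FILES if not (repo_root / name).is_file()]
```

Calling `missing_governance_files(Path("/path/to/your/repo"))` returns an empty list when all four files are present.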

## What We Built

[Arbiter](https://github.com/hummbl-dev/arbiter) scores repositories across three dimensions:

- **Code Quality** (50%): lint, security, complexity via ruff, bandit, radon, shellcheck
- **Governance** (30%): 10 checks for governance artifacts
- **Dependencies** (20%): version pinning, dependency count, known-good packages

The certification decision is deterministic: the same repo always gets the same score. No AI is in the scoring path, just structured analysis.
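
To make the weighting concrete, here is a minimal sketch of how three sub-scores could be combined. The weights are the ones quoted above. The linear combination, the rounding, and the `overall_score` name are assumptions, and the example inputs are hypothetical rather than taken from any repo in this post.

```python
# Weights quoted in the post; the linear combination itself is an assumption.
WEIGHTS = {"code_quality": 0.50, "governance": 0.30, "dependencies": 0.20}


def overall_score(sub_scores: dict[str, float]) -> float:
    """Combine 0-100 sub-scores into one 0-100 score using fixed weights."""
    return round(sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS), 1)


# Hypothetical sub-scores, not measurements of a real repository.
example = overall_score(
    {"code_quality": 90.0, "governance": 50.0, "dependencies": 80.0}
)
```

Because the function is pure arithmetic over fixed weights, the same inputs always produce the same output, which is the sense of "deterministic" used here.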

## Try It

```bash
pip install arbiter-score
arbiter certify /path/to/your/repo
arbiter certify --json https://github.com/your-org/your-repo
```

Or check the [public leaderboard](https://hummbl.io/audit) to see how top repos score.

---

*[HUMMBL](https://hummbl.io) builds governed AI infrastructure for enterprises. Arbiter is our open-source code quality and governance scoring tool.*

src/arbiter/__main__.py

Lines changed: 3 additions & 0 deletions
@@ -20,6 +20,7 @@
 
 import argparse
 import json
+import os
 import shutil
 import subprocess
 import sys
@@ -156,6 +157,8 @@ def _get_analyzers() -> list[Analyzer]:
 
 def _run_analysis(repo_path: Path, analyzers: list[Analyzer], exclude_paths: list[str] | None = None) -> list[Finding]:
     """Run all analyzers against a repo."""
+    # Suppress SyntaxWarning from escape sequences in scanned files
+    os.environ.setdefault("PYTHONWARNINGS", "ignore::SyntaxWarning")
     all_findings: list[Finding] = []
     for analyzer in analyzers:
         try:
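
The added lines rely on two behaviors: child processes inherit the parent's environment, and `setdefault` leaves an existing value alone. The sketch below demonstrates both with a child interpreter. Unlike the commit, it passes an explicit copy of the environment instead of mutating `os.environ`, purely to keep the demo free of side effects. The function name is hypothetical.

```python
import os
import subprocess
import sys


def run_child_with_default_filter(base_env: dict[str, str]) -> str:
    """Run a child interpreter and report the PYTHONWARNINGS value it sees."""
    env = dict(base_env)
    # setdefault keeps a value the caller already chose and only fills a gap.
    env.setdefault("PYTHONWARNINGS", "ignore::SyntaxWarning")
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['PYTHONWARNINGS'])"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Start from the current environment with any existing setting removed.
clean_env = {k: v for k, v in os.environ.items() if k != "PYTHONWARNINGS"}
```

One consequence of using `setdefault` is that a user who exports their own `PYTHONWARNINGS` keeps that setting, so the suppression is a default and not an override.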

src/arbiter/dep_score.py

Lines changed: 1 addition & 1 deletion
@@ -152,7 +152,7 @@ def score_dependencies(deps: list[DepInfo]) -> DepReport:
     known_bonus = known_good * 1
 
     score = base - bloat_penalty + pinned_bonus + known_bonus - unpinned_penalty
-    score = max(0.0, min(100.0, score))
+    score = max(20.0, min(100.0, score)) # Floor at 20 — 0 is misleading for large dep counts
 
     return DepReport(
         total=len(deps),
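
The one-line change swaps the lower bound of a clamp. A minimal standalone sketch of that clamp follows. The `clamp_dep_score` helper and the raw value are hypothetical; only the `max(..., min(100.0, ...))` expression comes from the hunk above.

```python
def clamp_dep_score(raw: float, floor: float = 20.0, ceiling: float = 100.0) -> float:
    """Clamp a raw dependency score into the range [floor, ceiling]."""
    return max(floor, min(ceiling, raw))


raw = -35.0  # hypothetical raw score after heavy penalties
before = clamp_dep_score(raw, floor=0.0)  # old bound: 0.0
after = clamp_dep_score(raw)              # new bound: 20.0
```

A side effect worth noting is that every raw score at or below 20 now reports as exactly 20, so the floor trades resolution at the low end for a less alarming number.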
