| 1 | +# We Certified 170+ Repos — Governance Is the Bottleneck, Not Code Quality |
| 2 | + |
| 3 | +*By Reuben Bowlby | HUMMBL | April 2026* |
| 4 | + |
| 5 | +We built [Arbiter](https://github.com/hummbl-dev/arbiter), a deterministic code quality scoring tool, and used it to certify **173 open-source repositories** across 20 industry categories. The results surprised us. |
| 6 | + |
| 7 | +## The Finding |
| 8 | + |
| 9 | +**Code quality is not the bottleneck. Governance is.** |
| 10 | + |
| 11 | +Popular open-source repos consistently score 85+ on code quality. What separates CERTIFIED from PROVISIONAL is governance maturity: CONTRIBUTING.md, SECURITY.md, Code of Conduct, DCO/CLA processes, and CI/CD configuration. |
| 12 | + |
| 13 | +## The Data |
| 14 | + |
| 15 | +### LLM Frameworks — HUMMBL's Target Market |
| 16 | + |
| 17 | +| Framework | Code Quality | Governance | Certification | |
| 18 | +|-----------|-------------|-----------|---------------| |
| 19 | +| LlamaIndex | 96.4 | 90/100 | **CERTIFIED** | |
| 20 | +| Instructor | 93.4 | 65/100 | CERTIFIED | |
| 21 | +| **LangChain** | **95.4** | **45/100** | **PROVISIONAL** | |
| 22 | +| Guidance | 90.7 | 55/100 | PROVISIONAL | |
| 23 | +| Outlines | 89.9 | 45/100 | PROVISIONAL | |
| 24 | + |
| 25 | +LangChain, arguably the most widely used LLM framework, scores 95.4 on code quality but only 45 on governance. That's a D grade on the dimension enterprises care about most. |
| 26 | + |
| 27 | +### The Gold Standard |
| 28 | + |
| 29 | +Project MONAI (the NVIDIA-backed healthcare AI toolkit) scored **98.2** overall, the highest of any repo we tested, with perfect governance: 100/100. It has LICENSE, CONTRIBUTING, SECURITY, Code of Conduct, issue templates, PR templates, CI/CD, and DCO. That's what CERTIFIED looks like. |
| 30 | + |
| 31 | +### The Surprise Failures |
| 32 | + |
| 33 | +- **Sentry**: 98.5 code quality (the best code quality score we tested), but **FAILED** certification due to 109 unpinned dependencies |
| 34 | +- **Prefect**: 97.8 code quality, but **FAILED** due to dependency governance |
| 35 | +- **Flask**: a foundational Python library, PROVISIONAL due to 45/100 governance |
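To make "unpinned dependencies" concrete, here is a minimal sketch of one way to flag them in a `requirements.txt`-style file. It assumes a requirement counts as pinned only when it names a single exact version with `==`; the function names `is_pinned` and `find_unpinned` are hypothetical, and this is not necessarily the rule Arbiter applies.

```python
def is_pinned(requirement: str) -> bool:
    """Return True if the requirement names exactly one fixed version.

    Assumption for this sketch: only a single `==` specifier without a
    wildcard counts as pinned. Ranges (>=, ~=), wildcards (==8.*), and
    bare package names are treated as unpinned.
    """
    spec = requirement.split(";", 1)[0]  # ignore environment markers
    return "==" in spec and "*" not in spec and "," not in spec


def find_unpinned(requirements_text: str) -> list:
    """List the requirement lines that are not pinned to an exact version."""
    unpinned = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e)
            continue
        if not is_pinned(line):
            unpinned.append(line)
    return unpinned
```

Calling `find_unpinned(open("requirements.txt").read())` on your own repo gives a quick list of lines to review. A real checker would also need to handle lockfiles, `pyproject.toml`, and URL-based requirements, which this sketch ignores.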
| 36 | + |
| 37 | +## Why This Matters |
| 38 | + |
| 39 | +If you're an enterprise adopting open-source AI tools, code quality is table stakes. The popular frameworks we tested all score well on it. What you should be evaluating is: |
| 40 | + |
| 41 | +1. **Do they have a security disclosure process?** (SECURITY.md) |
| 42 | +2. **Can contributors understand the rules?** (CONTRIBUTING.md + Code of Conduct) |
| 43 | +3. **Are dependencies pinned and managed?** (requirements.txt + lockfiles) |
| 44 | +4. **Is there CI/CD?** (automated quality gates) |
| 45 | +5. **Is there an audit trail?** (governance receipts, not just git log) |
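The file-based items in this checklist can be approximated with a short script. The sketch below is an assumption about how such a check might look, not Arbiter's implementation: the `CHECKS` mapping, its candidate paths, and the `check_governance` name are all hypothetical. Item 5 (an audit trail) is left out because it cannot be inferred from the presence of a file.

```python
from pathlib import Path

# Hypothetical mapping from checklist item to paths that would satisfy it.
# The candidate locations are illustrative; adjust them to your ecosystem.
CHECKS = {
    "security_policy": ["SECURITY.md", ".github/SECURITY.md", "docs/SECURITY.md"],
    "contributing": ["CONTRIBUTING.md", ".github/CONTRIBUTING.md", "docs/CONTRIBUTING.md"],
    "code_of_conduct": ["CODE_OF_CONDUCT.md", ".github/CODE_OF_CONDUCT.md", "docs/CODE_OF_CONDUCT.md"],
    "lockfile": ["poetry.lock", "uv.lock", "Pipfile.lock"],
    "ci": [".github/workflows", ".gitlab-ci.yml", ".circleci/config.yml"],
}


def check_governance(repo) -> dict:
    """Report, per checklist item, whether any candidate path exists in the repo."""
    root = Path(repo)
    return {
        name: any((root / candidate).exists() for candidate in candidates)
        for name, candidates in CHECKS.items()
    }
```

For example, `check_governance("/path/to/your/repo")` returns a dictionary such as `{"security_policy": True, "ci": False, ...}`. Presence of a file says nothing about its quality, so treat a passing result as a starting point for review.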
| 46 | + |
| 47 | +## What We Built |
| 48 | + |
| 49 | +[Arbiter](https://github.com/hummbl-dev/arbiter) scores repositories across three dimensions: |
| 50 | + |
| 51 | +- **Code Quality** (50%): lint, security, complexity via ruff, bandit, radon, shellcheck |
| 52 | +- **Governance** (30%): 10 checks for governance artifacts |
| 53 | +- **Dependencies** (20%): version pinning, dependency count, known-good packages |
| 54 | + |
| 55 | +The certification decision is deterministic: the same repo contents always produce the same score. There is no AI in the scoring path, just structured analysis. |
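As a rough illustration of the 50/30/20 weighting, the sketch below combines three per-dimension scores into one number. It assumes each dimension is scored from 0 to 100 and that the weights are applied as a simple linear sum; the post does not state the exact formula, so `composite_score` is a hypothetical helper. It also covers only the aggregate, not the certification decision, which evidently involves more than the weighted total (Sentry failed despite a 98.5 code quality score).

```python
# Weights taken from the three dimensions listed above.
WEIGHTS = {"code_quality": 0.50, "governance": 0.30, "dependencies": 0.20}


def composite_score(scores: dict) -> float:
    """Weighted sum of per-dimension scores (each assumed to be 0-100).

    A pure function with no randomness or external state, so the same
    inputs always give the same output.
    """
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    return round(sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS), 1)


# LangChain's published code quality and governance scores, with a
# HYPOTHETICAL dependency score of 80 (the post does not report one).
example = composite_score(
    {"code_quality": 95.4, "governance": 45, "dependencies": 80}
)
```

Under these assumptions, a 45 on governance pulls the composite well below the code quality score on its own, which matches the pattern in the tables above.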
| 56 | + |
| 57 | +## Try It |
| 58 | + |
| 59 | +```bash |
| 60 | +pip install arbiter-score |
| 61 | +arbiter certify /path/to/your/repo |
| 62 | +arbiter certify --json https://github.com/your-org/your-repo |
| 63 | +``` |
| 64 | + |
| 65 | +Or check the [public leaderboard](https://hummbl.io/audit) to see how top repos score. |
| 66 | + |
| 67 | +--- |
| 68 | + |
| 69 | +*[HUMMBL](https://hummbl.io) builds governed AI infrastructure for enterprises. Arbiter is our open-source code quality and governance scoring tool.* |