cff-version: 1.2.0
message: "If you use this work, please cite it as below."
title: "From RALPH to Zenith: Designing Harnesses for Long-Running Agents"
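authors:
  - name: "Intelligent Internet"
# CFF 1.2.0 allows `type: report` and `year` only inside a reference object,
# so the report's citation details are given under `preferred-citation`.
preferred-citation:
  type: report
  title: "From RALPH to Zenith: Designing Harnesses for Long-Running Agents"
  authors:
    - name: "Intelligent Internet"
  year: 2026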
date-released: 2026-05-08
url: "https://github.com/Intelligent-Internet/zenith"
repository-code: "https://github.com/Intelligent-Internet/zenith"
license: CC-BY-4.0
abstract: >-
  Long-running agents often fail not because they cannot make progress, but
  because they stop before the task is truly complete. We tested five harness
  designs across eight long-horizon tasks to isolate the control mechanisms
  that matter: repeated gap-finding, revisable planning, independent
  verification, adaptive orchestration, and stopping discipline. RALPH is the
  strongest simple baseline because it forces each new session to reopen the
  gap between the current project state and the original requirement, but it
  is expensive and has no principled stopping rule. Our Zenith method keeps
  the useful parts of repeated review while making the loop adaptive: the
  orchestrator dynamically allocates workers, testers, reusable skills,
  replanning, and stopping decisions. In this study, Zenith achieved the best
  mean rank while using less than half of RALPH's per-task cost.
keywords:
  - agent harness
  - long-horizon tasks
  - language model agents
  - test-time scaling
  - orchestration