cff-version: 1.2.0
message: "If you use this work, please cite it as below."
title: "From RALPH to Zenith: Designing Harnesses for Long-Running Agents"
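authors:
  - name: "Intelligent Internet"
preferred-citation:
  type: report
  title: "From RALPH to Zenith: Designing Harnesses for Long-Running Agents"
  authors:
    - name: "Intelligent Internet"
  year: 2026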
date-released: 2026-05-08
url: "https://github.com/Intelligent-Internet/zenith"
repository-code: "https://github.com/Intelligent-Internet/zenith"
license: CC-BY-4.0
abstract: >-
  Long-running agents often fail not because they cannot make progress, but
  because they stop before the task is truly complete. We tested five harness
  designs across eight long-horizon tasks to isolate the control mechanisms
  that matter: repeated gap-finding, revisable planning, independent
  verification, adaptive orchestration, and stopping discipline. RALPH is the
  strongest simple baseline because it forces each new session to reopen the
  gap between the current project state and the original requirement, but it
  is expensive and has no principled stopping rule. Our Zenith method keeps
  the useful parts of repeated review while making the loop adaptive: the
  orchestrator dynamically allocates workers, testers, reusable skills,
  replanning, and stopping decisions. In this study, Zenith achieved the best
  mean rank while using less than half of RALPH's per-task cost.
keywords:
  - agent harness
  - long-horizon tasks
  - language model agents
  - test-time scaling
  - orchestration