id: <category>_task_<N>_<short_name>
name: Human-readable task name
category: <category>
timeout_seconds: 300

Prompt

The task instruction sent to the agent. Write it as if speaking directly to the agent.

All input files are located under /tmp_workspace/. The agent should save outputs to /tmp_workspace/results/.

Important: Do NOT use ## (level-2 headings) inside the Prompt section. The parser splits sections by ## headings, so a ## here will truncate the prompt. Use ### or lower for any sub-headings within the prompt.
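
To see why a level-2 heading truncates the prompt, here is a minimal sketch of a section splitter. The real parser is not shown in this template, so `split_sections` and its regex are assumptions made purely for illustration; the only behavior taken from the text above is that sections are split on `##` headings.

```python
import re


def split_sections(markdown_text: str) -> dict:
    """Hypothetical splitter: map each level-2 heading to the text beneath it."""
    sections = {}
    current = None
    for line in markdown_text.splitlines():
        # Matches "## Title" only; "### Title" fails because "##" must be
        # followed directly by whitespace.
        match = re.match(r"^##\s+(.*)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return {name: body.strip() for name, body in sections.items()}


# A "## Details" heading inside the prompt starts a new section,
# so the prompt loses everything after it.
bad = "## Prompt\nDo the task.\n## Details\nSave the output.\n## Expected Behavior\nDone."
# With "### Details" the sub-heading stays inside the prompt.
good = "## Prompt\nDo the task.\n### Details\nSave the output.\n## Expected Behavior\nDone."

bad_prompt = split_sections(bad)["Prompt"]
good_prompt = split_sections(good)["Prompt"]
```

Under this assumed splitter, `bad_prompt` contains only the first sentence, while `good_prompt` keeps the sub-heading and the text that follows it.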

Expected Behavior

Describe what a correct agent execution looks like, step by step. This section is for human readers only and is not sent to the agent.

Grading Criteria

  • Criterion 1
  • Criterion 2
  • ...

Automated Checks

def grade(**kwargs) -> dict:
    """
    Return a dict of metric_name -> float (0.0 to 1.0).
    Must include an "overall_score" key.
    Runs inside the container with cwd=/tmp_workspace.
    """
    scores = {}
    # ... grading logic ...
    scores["overall_score"] = 0.0
    return scores
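
As an illustration of how the skeleton above might be filled in, the following sketch scores two simple checks and averages them. The output file name `report.txt`, the `results_dir` keyword, and the equal-weight averaging are all assumptions chosen for the example; the template itself only requires a dict of floats between 0.0 and 1.0 that includes `overall_score`.

```python
from pathlib import Path


def grade(**kwargs) -> dict:
    """Example grader: checks that a (hypothetical) report file exists and is non-empty."""
    # "results_dir" is a hypothetical keyword for local testing. Inside the
    # container, cwd is /tmp_workspace, so the relative default resolves to
    # /tmp_workspace/results.
    results_dir = Path(kwargs.get("results_dir", "results"))
    report = results_dir / "report.txt"  # hypothetical output file name

    scores = {}
    exists = report.is_file()
    scores["report_exists"] = 1.0 if exists else 0.0
    # Only read the file if it exists; whitespace-only content counts as empty.
    nonempty = exists and bool(report.read_text(encoding="utf-8").strip())
    scores["report_nonempty"] = 1.0 if nonempty else 0.0

    # Equal-weight average of the individual checks (an assumed weighting).
    scores["overall_score"] = sum(scores.values()) / len(scores)
    return scores
```

Keeping each check as its own metric makes partial credit visible: a file that exists but is empty would score 0.5 overall under this weighting.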

Workspace Path

workspace/<category>/task_<N>_<short_name>

Skills

Env

Warmup