WildClawBench/tasks/task0_template.md at main · InternLM/WildClawBench

id	<category>_task_<N>_<short_name>
name	Human-readable task name
category	<category>
timeout_seconds	300

Prompt

The task instruction sent to the agent. Write it as if speaking directly to the agent.

All input files are located under /tmp_workspace/. The agent should save outputs to /tmp_workspace/results/.

Important: Do NOT use ## (level-2 headings) inside the Prompt section. The parser splits sections on ## headings, so a ## here will truncate the prompt at that point. Use ### or a deeper heading level for any sub-headings within the prompt.
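To see why a stray ## truncates the prompt, here is a minimal sketch of a splitter that breaks a task file on level-2 headings. It is an illustration only: `split_sections` is a hypothetical name, and the benchmark's real parser may work differently.

```python
import re


def split_sections(markdown_text: str) -> dict[str, str]:
    """Split a task file into {heading: body} on level-2 (##) headings only.

    Hypothetical helper for illustration; not the benchmark's actual parser.
    """
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown_text.splitlines():
        # "## Title" matches; "### Title" does not, because the character
        # after the two hashes must be whitespace.
        match = re.match(r"^##\s+(.*)$", line)
        if match:
            current = match.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


bad = "## Prompt\nDo the task.\n## Details\nSave to results.\n## Expected Behavior\nSteps."
good = "## Prompt\nDo the task.\n### Details\nSave to results.\n## Expected Behavior\nSteps."

bad_prompt = split_sections(bad)["Prompt"]
good_prompt = split_sections(good)["Prompt"]
```

With the ## sub-heading, `bad_prompt` ends after "Do the task." and the rest lands in a separate "Details" section. With ###, `good_prompt` keeps the sub-heading and its text.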

Expected Behavior

Describe what a correct agent execution looks like, step by step. This section is for human readers only and is not sent to the agent.

Grading Criteria

Criterion 1
Criterion 2
...

Automated Checks

def grade(**kwargs) -> dict:
    """
    Return a dict of metric_name -> float (0.0 to 1.0).
    Must include an "overall_score" key.
    Runs inside the container with cwd=/tmp_workspace.
    """
    scores = {}
    # ... grading logic ...
    scores["overall_score"] = 0.0
    return scores
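
As an illustration of how the skeleton above might be filled in, the following sketch grades a hypothetical task whose agent must write a single answer to results/answer.txt. The file name, the expected value "42", and the metric names `output_exists` and `answer_correct` are all assumptions made for this example. Only the `overall_score` key and the 0.0 to 1.0 range come from the template.

```python
from pathlib import Path


def grade(**kwargs) -> dict:
    """Hypothetical grader: checks results/answer.txt relative to the cwd.

    Returns metric_name -> float in [0.0, 1.0], including "overall_score".
    """
    scores = {"output_exists": 0.0, "answer_correct": 0.0}

    # Paths are relative because the grader runs with cwd=/tmp_workspace.
    answer_file = Path("results") / "answer.txt"  # hypothetical output file

    if answer_file.is_file():
        scores["output_exists"] = 1.0
        text = answer_file.read_text(encoding="utf-8", errors="replace").strip()
        if text == "42":  # hypothetical expected answer
            scores["answer_correct"] = 1.0

    # Average the individual metrics; computed before the key is added,
    # so overall_score does not count itself.
    scores["overall_score"] = sum(scores.values()) / len(scores)
    return scores
```

Starting every metric at 0.0 and raising it only when a check passes means a missing or malformed output still yields a valid score dict instead of an exception.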

Workspace Path

workspace/<category>/task_<N>_<short_name>
