
LLM Agent Annotation Guidelines

There are fundamental differences between how humans and LLM Agents process information. Human cognition is closer to a recurrent neural network (RNN): information processing is incremental and subject to forgetting. In contrast, LLM Agents rely heavily on KV cache to retain large amounts of context, which can easily lead to context explosion. To effectively leverage LLM Agents while preventing context overload, we establish the following annotation guidelines to reduce token consumption and improve task efficiency.

1. Limit the Amount of Context Displayed

Before opening a file, use wc -l to check the number of lines.
Only files with fewer than 50 lines may be displayed directly using cat.
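
The rule above can be sketched as a small shell helper. This is only an illustration: `view_file` and `MAX_LINES` are hypothetical names that are not part of these guidelines, and the 50-line threshold is taken from the rule itself.

```shell
# Hypothetical helper: print a file only when it is short enough.
MAX_LINES=50  # threshold taken from the guideline above

view_file() {
    file=$1
    # "wc -l < file" prints only the count; tr strips padding some systems add
    lines=$(wc -l < "$file" | tr -d '[:space:]')
    if [ "$lines" -lt "$MAX_LINES" ]; then
        cat "$file"
    else
        echo "$file has $lines lines; view a range with sed -n instead." >&2
        return 1
    fi
}

# Demo on two throwaway files (illustrative data only)
demo_dir=$(mktemp -d)
seq 1 10 > "$demo_dir/short.txt"
seq 1 200 > "$demo_dir/long.txt"

short_out=$(view_file "$demo_dir/short.txt")
long_out=$(view_file "$demo_dir/long.txt" 2>/dev/null) || long_status=$?
```

The short file is printed in full, while the long file yields only a one-line notice on stderr, so the context cost stays bounded in both cases.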

2. Precise Localization and Partial Viewing

For large files, always use grep -n or sed -n for precise localization to avoid repeated searches.
Use inspect to obtain an overview of a Python file’s structure.
Use sed -n 'a,bp' to display a specific line range.
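
A minimal sketch of the locate-then-view pattern described above. The demo file, the search string `target_function`, and the two-line margin are all assumptions made for illustration.

```shell
# Build a throwaway 100-line file with one line of interest (demo data)
demo_file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 100; i++) print (i == 42 ? "def target_function():" : "line " i) }' > "$demo_file"

# Step 1: locate the match; grep -n prefixes each hit with its line number
hit=$(grep -n 'target_function' "$demo_file" | head -n 1 | cut -d: -f1)

# Step 2: compute a small window around the hit, clamped at line 1
start=$((hit - 2))
end=$((hit + 2))
if [ "$start" -lt 1 ]; then start=1; fi

# Step 3: print only that range instead of the whole file
window=$(sed -n "${start},${end}p" "$demo_file")
echo "$window"
```

Only five lines enter the context here, and the line number found in step 1 can be reused for later edits without searching again.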

3. Context Reservation

When using grep, prefer the -A n and -B m options, which include n lines after and m lines before each match, so the surrounding context is captured in a single call.
A context window of 5–10 lines is generally appropriate.
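
As an illustration of the options above, the following sketch searches a throwaway log for a single error line and keeps five lines on each side. The log contents and the pattern `ERROR` are invented for the demo.

```shell
# Throwaway 30-line log with one error on line 15 (demo data)
log_file=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 30; i++) print (i == 15 ? "ERROR: disk full" : "info " i) }' > "$log_file"

# -n adds line numbers; -B 5 and -A 5 keep 5 lines before and after the match
context=$(grep -n -B 5 -A 5 'ERROR' "$log_file")
echo "$context"
```

In grep's output, the matching line is marked with a colon after its line number and the context lines with a hyphen, which makes the hit easy to spot inside the window.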

4. Non-Interactive Editing

The use of interactive editors such as vim or nano is prohibited.
Use non-interactive tools such as replace, sed -i, or patch for text modification.

5. Incremental Modifications

For small changes, avoid rewriting the entire file.
Perform in-place, localized edits using tools like sed -i or replace.
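
A minimal sketch of a non-interactive, localized edit with sed. The config file and its keys are made up for the example. The `-i.bak` form (in-place edit with a backup suffix) is used because it is accepted by both GNU and BSD sed, whereas a bare `-i` behaves differently between the two.

```shell
# Throwaway config file (demo data)
conf=$(mktemp)
printf 'host = localhost\nport = 8080\ndebug = true\n' > "$conf"

# Change only the matching line; -i.bak edits in place and keeps a backup copy
sed -i.bak 's/^port = 8080$/port = 9090/' "$conf"

# Verify the change by diffing against the backup instead of re-reading the file.
# diff exits non-zero when the files differ, hence the "|| true".
diff "$conf.bak" "$conf" || true
```

The diff shows just the one changed line, which is usually cheaper to inspect than printing the whole file again.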

6. Redirect and Preview Redundant Output (Optimization)

For commands with verbose output (e.g., pip install, compilation logs, long-running commands), redirect output to a file first.
Use tail -n 20 logfile to quickly assess whether the command succeeded.
If it failed, use grep -Ei "error|warning|fail" logfile to locate the issue.

Example:

pip install package > pip_log.txt 2>&1
if tail -n 20 pip_log.txt | grep -q "Successfully installed"; then
    echo "Installation succeeded; no need to inspect detailed logs."
else
    grep -Ei "error|warning|fail" pip_log.txt || tail -n 30 pip_log.txt
fi

7. Task Modularization

Decompose large tasks into independent sub-tasks.
Manage context separately for each sub-task.

8. Explicit Search Boundaries

When performing searches, explicitly specify file scopes or date ranges.
Avoid unrestricted global scans.
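
The sketch below shows one way to bound a search by directory, file type, and modification time. The project layout and the `TODO` pattern are invented for the demo, and `--include` is assumed to be available (it is in GNU and BSD grep, but may be missing from minimal implementations).

```shell
# Throwaway project tree (demo data)
proj=$(mktemp -d)
mkdir -p "$proj/src" "$proj/vendor"
echo 'TODO: fix parser' > "$proj/src/parser.py"
echo 'TODO: vendored note' > "$proj/vendor/lib.py"
echo 'TODO: readme note' > "$proj/src/notes.md"

# Bounded by scope: only *.py files under src/, not the whole tree
hits=$(grep -rn --include='*.py' 'TODO' "$proj/src")
echo "$hits"

# Bounded by time: only *.py files under src/ modified within the last day
recent=$(find "$proj/src" -type f -name '*.py' -mtime -1)
echo "$recent"
```

Both commands skip `vendor/` and the markdown file, so the output contains a single hit rather than every occurrence in the tree.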

9. Precise Path Specification

Always use absolute paths or clearly defined relative paths.
This reduces the cost of implicit context inference.
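
One portable way to turn a relative path into an absolute one is sketched below. The directory layout and file name are assumptions made for the demo, and only POSIX utilities are used.

```shell
# Throwaway project layout (demo data)
work=$(mktemp -d)
mkdir -p "$work/project/data"
echo 'x' > "$work/project/data/input.csv"

cd "$work/project"
rel='data/input.csv'

# Resolve the path once, while the working directory is known
abs="$(cd "$(dirname "$rel")" && pwd -P)/$(basename "$rel")"

# The absolute path keeps working after the working directory changes
cd /
wc -l "$abs"
```

Resolving the path once and reusing `abs` avoids having to track which directory each later command runs in.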

10. Mandatory Review and Human Intervention

If the Agent continues to produce major errors after multiple modification attempts, a human may pause the Agent using wait or similar commands.
Use explicit prompts such as "I think I ignore xxx" or "I think I miss xxx" to force the Agent to re-examine its mistakes.
If the Agent still fails to identify the issue, the human should intervene directly with a statement like "I think I am wrong because xxx" to enforce correction.
If the model becomes stuck in a loop, the human should interrupt with a prompt such as "wait, I think I was trapped xxx".

11. Writing Code

Before writing code, it is strongly recommended to have the model verbally plan the solution first; otherwise, errors and improper modularization are likely.
When browsing code, the model should open and analyze it incrementally to ensure correctness.

12. When the Annotator Cannot Judge Action Validity

If the annotator cannot accurately determine whether a particular Agent action is reasonable, allow the Agent to continue.
Once an error becomes explicit, correct it.
During correction, previous erroneous steps may be preserved, and only the current step should be forcibly corrected, to improve the model’s error-recovery capability.

13. Context Management (TODO)

Provide the model with tools to summarize and delete context.
Consider partitioning context into segments, similar to the approach proposed by Linsir.

14. Workflow Management (TODO)

Provide the LLM with tools that can issue parallel calls to multiple LLMs.

15. `replace` Tool Caveat

The replace tool is prone to errors when line numbers are involved.

16. Debugging

Observed weakness: during debugging, the model does not insert print statements to inspect intermediate state.
It also fails to analyze the stack traces that the errors produce.

17. When an Agent Must Implement Complex Logic from Scratch (e.g., MLE Tasks)

It is recommended that the annotator manually intervene and force the Agent to interact with ipython.
This prevents bugs from accumulating in large source files.
