LLM Agent Annotation Guidelines

Humans and LLM Agents process information in fundamentally different ways. Human cognition resembles a recurrent neural network (RNN): information is processed incrementally and is subject to forgetting. LLM Agents, in contrast, rely heavily on the KV cache to retain large amounts of context, which can easily lead to context explosion. To make effective use of LLM Agents while preventing context overload, we establish the following annotation guidelines, which aim to reduce token consumption and improve task efficiency.


1. Limit the Amount of Context Displayed

  • Before opening a file, use wc -l to check the number of lines.
  • Only files with fewer than 50 lines may be displayed directly using cat.
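
The two rules above can be combined into a single guard. The following is a minimal sketch, not part of the original guidelines: `show_if_small` is a hypothetical helper name, and its optional second argument (defaulting to the 50-line threshold) is an assumption added for illustration.

```shell
# Hypothetical helper: print a file only when it is short enough.
# The threshold defaults to 50 lines, the limit stated in the guideline.
show_if_small() {
    local file="$1"
    local max_lines="${2:-50}"
    local n
    n=$(wc -l < "$file")   # reading from stdin makes wc print only the count
    n=$((n))               # normalize the padding some wc implementations add
    if [ "$n" -lt "$max_lines" ]; then
        cat "$file"
    else
        # Refuse to dump the file; point at the partial-viewing tools instead.
        echo "$file has $n lines; view part of it with grep -n or sed -n." >&2
        return 1
    fi
}
```

A call such as `show_if_small /abs/path/to/config.yaml` prints the file when it is short and otherwise reports only the line count on stderr, so a large file never enters the context by accident.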

2. Precise Localization and Partial Viewing

  • For large files, always use grep -n or sed -n for precise localization to avoid repeated searches.
  • Use inspect to obtain an overview of a Python file’s structure.
  • Use sed -n 'a,bp' to display a specific line range.
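
As a rough illustration of localized viewing, the sketch below pairs a structural overview with a line-range view. Both function names are hypothetical, and `outline_py` is only a grep-based stand-in that lists `def` and `class` lines; it is not the inspect tool mentioned above.

```shell
# Hypothetical helper: list the line numbers of class and function
# definitions in a Python file (a grep-based approximation of an outline).
outline_py() {
    local file="$1"
    grep -nE '^[[:space:]]*(def|class) ' "$file"
}

# Hypothetical helper: print lines START..END and stop reading at END,
# so the rest of a large file is never scanned.
show_range() {
    local file="$1" start="$2" end="$3"
    sed -n "${start},${end}p;${end}q" "$file"
}
```

A typical sequence would be `outline_py /abs/path/to/model.py` to find the line of interest, followed by `show_range /abs/path/to/model.py 120 160` to read only that region.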

3. Context Retention

  • When using grep, prefer the -A n and -B m options so that the surrounding context is shown together with each match.
  • A context window of 5–10 lines is generally appropriate.
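
A small wrapper makes the context window explicit. This is a sketch under stated assumptions: `grep_ctx` is a hypothetical name, and the default of 5 lines on each side is simply the lower end of the range suggested above.

```shell
# Hypothetical helper: search with N lines of context before and after
# each match. N defaults to 5.
grep_ctx() {
    local pattern="$1" file="$2" n="${3:-5}"
    grep -n -B "$n" -A "$n" -- "$pattern" "$file"
}
```

With -n, grep marks matching lines as `LINE:text` and context lines as `LINE-text`, so the Agent can tell the hit apart from its surroundings without a second search.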

4. Non-Interactive Editing

  • The use of interactive editors such as vim or nano is prohibited.
  • Use non-interactive tools such as replace, sed -i, or patch for text modification.

5. Incremental Modifications

  • For small changes, avoid rewriting the entire file.
  • Perform in-place, localized edits using tools like sed -i or replace.
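
One possible shape for a non-interactive, localized edit is sketched below. `rename_token` is a hypothetical helper; it assumes that the old and new strings contain no `/` characters and that the old string is meant as a basic regular expression, since both are passed straight into a sed substitution.

```shell
# Hypothetical helper: replace OLD with NEW throughout FILE, in place,
# keeping the previous version as FILE.bak.
# `sed -i.bak` (suffix attached, no space) works with both GNU and BSD sed.
rename_token() {
    local file="$1" old="$2" new="$3"
    sed -i.bak "s/${old}/${new}/g" "$file"
    # Show only what changed. diff exits non-zero when the files differ,
    # so its status is deliberately ignored here.
    diff -u "${file}.bak" "$file" || true
}
```

Printing the unified diff instead of the whole file lets the Agent confirm the edit while adding only a few lines to its context.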

6. Redirect and Preview Verbose Output (Optimization)

  • For commands with verbose output (e.g., pip install, compilation logs, long-running commands), redirect output to a file first.
  • Use tail -n 20 logfile to quickly assess whether the command succeeded.
  • If it failed, use grep -Ei "error|warning|fail" logfile to locate the issue.

Example:

pip install package > pip_log.txt 2>&1
if tail -n 20 pip_log.txt | grep -q "Successfully installed"; then
    echo "Installation succeeded; no need to inspect detailed logs."
else
    grep -Ei "error|warning|fail" pip_log.txt || tail -n 30 pip_log.txt
fi

7. Task Modularization

  • Decompose large tasks into independent sub-tasks.
  • Manage context separately for each sub-task.

8. Explicit Search Boundaries

  • When performing searches, explicitly specify file scopes or date ranges.
  • Avoid unrestricted global scans.
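
Bounded searches might look like the sketch below. Both helper names are hypothetical. The `--include` option of grep and the `-newermt` test of find are available in the GNU and BSD versions of these tools but are not part of POSIX, so their presence on a given system is an assumption.

```shell
# Hypothetical helper: search recursively, but only inside DIR and only in
# files whose names match GLOB (for example '*.py').
search_scoped() {
    local pattern="$1" dir="$2" glob="$3"
    grep -rn --include="$glob" -- "$pattern" "$dir"
}

# Hypothetical helper: list files under DIR matching GLOB that were
# modified after the date SINCE (for example '2023-01-01').
modified_since() {
    local dir="$1" glob="$2" since="$3"
    find "$dir" -type f -name "$glob" -newermt "$since"
}
```

For instance, `search_scoped 'TODO' /abs/path/to/src '*.py'` restricts the scan to Python sources in one directory instead of the whole repository.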

9. Precise Path Specification

  • Always use absolute paths or clearly defined relative paths.
  • This reduces the cost of implicit context inference.
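
When only a relative path is at hand, it can be resolved once and reused. The helper below is a minimal sketch with a hypothetical name; it assumes the parent directory of the target exists and uses only `cd`, `pwd -P`, `dirname`, and `basename`.

```shell
# Hypothetical helper: print the absolute path of TARGET.
# Runs in a subshell so the caller's working directory is unchanged.
abs_path() {
    local target="$1"
    (
        cd "$(dirname "$target")" &&
        printf '%s/%s\n' "$(pwd -P)" "$(basename "$target")"
    )
}
```

Storing the result, as in `cfg=$(abs_path ./config.yaml)`, lets every later command refer to the same unambiguous location regardless of the current directory.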

10. Mandatory Review and Human Intervention

  • If the Agent keeps making significant errors after several modification attempts, the human may pause it with "wait" or a similar command.
  • Use explicit prompts such as "I think I ignore xxx" or "I think I miss xxx" to force the Agent to re-examine its mistakes.
  • If the Agent still fails to identify the issue, the human should intervene directly with a statement such as "I think I am wrong because xxx" to enforce the correction.
  • If the model gets stuck in a loop at some point, the human should interrupt with a prompt such as "wait, I think I was trapped xxx".

11. Writing Code

  • Before writing code, it is strongly recommended to have the model verbally plan the solution first; otherwise, errors and improper modularization are likely.
  • When browsing code, the model should open and analyze it incrementally to ensure correctness.

12. When the Annotator Cannot Judge Action Validity

  • If the annotator cannot accurately determine whether a particular Agent action is reasonable, allow the Agent to continue.
  • Once an error becomes explicit, correct it.
  • When correcting, the earlier erroneous steps may be left in place; forcibly correct only the current step, which helps improve the model’s error-recovery capability.

13. Context Management (TODO)

  • Provide the model with tools to summarize and delete context.
  • Consider partitioning context into segments, similar to the approach proposed by Linsir.

14. Workflow Management (TODO)

  • Provide the LLM with tools that can make parallel calls to multiple LLMs.

15. replace Tool Caveat

  • The replace tool is prone to errors when line numbers are involved.

16. Debugging

  • Observed problem: during debugging, the model does not use print statements.
  • It also fails to analyze the corresponding stack traces.

17. When an Agent Must Implement Complex Logic from Scratch (e.g., MLE Tasks)

  • It is recommended that the annotator manually intervene and force the Agent to interact with ipython.
  • This prevents bugs from accumulating in large source files.