There are fundamental differences between how humans and LLM Agents process information. Human cognition is closer to a recurrent neural network (RNN): information is processed incrementally and is subject to forgetting. In contrast, LLM Agents rely heavily on the KV cache to retain large amounts of context, which can easily lead to context explosion. To leverage LLM Agents effectively while preventing context overload, we establish the following annotation guidelines, which aim to reduce token consumption and improve task efficiency.
- Before opening a file, use `wc -l` to check the number of lines.
- Only files with fewer than 50 lines may be displayed directly using `cat`.
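
The line-count check above can be wrapped in a small helper. This is a minimal sketch: the function name `show_if_small` and its optional second argument are illustrative assumptions, and only the 50-line threshold comes from the guideline.

```bash
# show_if_small FILE [LIMIT]
# Hypothetical helper: print FILE only when it has fewer than LIMIT lines
# (default 50, the threshold from the guideline above).
show_if_small() {
    local file="$1" limit="${2:-50}"
    local lines
    # Read from stdin so that wc prints only the count, not the file name.
    lines=$(wc -l < "$file") || return 1
    lines=$((lines))   # normalise padding that some wc implementations emit
    if [ "$lines" -lt "$limit" ]; then
        cat "$file"
    else
        echo "$file has $lines lines; use grep -n or sed -n instead of cat." >&2
        return 2
    fi
}
```

Calling `show_if_small README.md` prints the file when it is short and otherwise writes only a one-line hint to stderr, so a long file never reaches the context.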
- For large files, always use `grep -n` or `sed -n` for precise localization to avoid repeated searches.
- Use `inspect` to obtain an overview of a Python file's structure.
- Use `sed -n 'a,bp'` to display a specific line range.
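
As a rough illustration of combining the two commands, the sketch below finds the first match with `grep -n` and then prints a bounded range with `sed -n`. The helper name `locate_then_show` and the default radius of 5 lines are assumptions made for this example.

```bash
# locate_then_show PATTERN FILE [RADIUS]
# Hypothetical helper: find the first line matching PATTERN, then print
# only that line plus RADIUS lines on either side.
locate_then_show() {
    local pattern="$1" file="$2" radius="${3:-5}"
    local hit start end
    # grep -n prefixes each match with "LINE:"; -m 1 stops after the first match.
    hit=$(grep -n -m 1 -e "$pattern" "$file" | cut -d: -f1)
    [ -n "$hit" ] || return 1
    # Clamp the start of the range to line 1.
    start=$((hit > radius ? hit - radius : 1))
    end=$((hit + radius))
    # sed -n 'a,bp' prints lines a through b and nothing else.
    sed -n "${start},${end}p" "$file"
}
```

For example, `locate_then_show 'def main' app.py 10` would display at most 21 lines around the first definition instead of the whole file (`app.py` is a placeholder name).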
- When using `grep`, prefer the `-A n -B m` options to retain surrounding context.
- A context window of 5–10 lines is generally appropriate.
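
A minimal sketch of a context-aware search follows. The wrapper name `grep_ctx` and its argument order are assumptions for illustration; the `-n`, `-A`, and `-B` options are standard `grep` flags.

```bash
# grep_ctx PATTERN FILE [BEFORE] [AFTER]
# Hypothetical wrapper: search with line numbers and a bounded context window.
# Both windows default to 5 lines, the lower end of the suggested range.
grep_ctx() {
    local pattern="$1" file="$2" before="${3:-5}" after="${4:-5}"
    # Matching lines are printed as "LINE:text", context lines as "LINE-text".
    grep -n -B "$before" -A "$after" -e "$pattern" "$file"
}
```

Because the output carries line numbers, a follow-up `sed -n 'a,bp'` can target the exact range without searching again.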
- The use of interactive editors such as `vim` or `nano` is prohibited.
- Use non-interactive tools such as `replace`, `sed -i`, or `patch` for text modification.
- For small changes, avoid rewriting the entire file.
- Perform in-place, localized edits using tools like `sed -i` or `replace`.
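
One way to keep an edit localized is to restrict the substitution to a single line number. The sketch below assumes a hypothetical helper named `edit_line`. It uses the attached-suffix form `-i.bak`, which both GNU and BSD `sed` accept, and it assumes the old and new strings contain no `/` or regex metacharacters.

```bash
# edit_line FILE LINE_NUMBER OLD NEW
# Hypothetical helper: replace OLD with NEW on one specific line only.
# Assumes OLD and NEW contain no "/" and no regex metacharacters.
edit_line() {
    local file="$1" line="$2" old="$3" new="$4"
    # -i.bak edits in place and keeps the original as FILE.bak.
    sed -i.bak "${line}s/${old}/${new}/" "$file" || return 1
    # Print just the edited line so the change can be verified cheaply.
    sed -n "${line}p" "$file"
}
```

Printing only the edited line confirms the change without re-reading the file, and the `.bak` copy allows a quick rollback if the substitution was wrong.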
- For commands with verbose output (e.g., `pip install`, compilation logs, long-running commands), redirect output to a file first.
- Use `tail -n 20 logfile` to quickly assess whether the command succeeded.
- If it failed, use `grep -Ei "error|warning|fail" logfile` to locate the issue.
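Example:

```bash
# Capture both stdout and stderr in a log file instead of the context.
pip install package > pip_log.txt 2>&1
# Check only the tail of the log for pip's success message.
if tail -n 20 pip_log.txt | grep -q "Successfully installed"; then
    echo "Installation succeeded; no need to inspect detailed logs."
else
    # Show the suspicious lines, or fall back to the last 30 lines.
    grep -Ei "error|warning|fail" pip_log.txt || tail -n 30 pip_log.txt
fi
```

- Decompose large tasks into independent sub-tasks.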
- Manage context separately for each sub-task.
- When performing searches, explicitly specify file scopes or date ranges.
- Avoid unrestricted global scans.
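
A scoped search might look like the sketch below. The helper names `search_scoped` and `modified_since` are assumptions for illustration. `--include` and `-newermt` are supported by GNU `grep` and GNU `find`; other implementations may differ.

```bash
# search_scoped PATTERN DIR GLOB
# Hypothetical helper: recursive search limited to one directory and one
# file-name pattern, e.g. only '*.py' files under ./src.
search_scoped() {
    local pattern="$1" dir="$2" glob="$3"
    grep -rn --include="$glob" -e "$pattern" "$dir"
}

# modified_since DIR GLOB DATE
# Hypothetical helper: list files matching GLOB under DIR that were modified
# after DATE (for example '2024-01-01').
modified_since() {
    local dir="$1" glob="$2" since="$3"
    find "$dir" -type f -name "$glob" -newermt "$since"
}
```

Both helpers require an explicit directory and file pattern, so an accidental scan of the whole file system is not possible by default.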
- Always use absolute paths or clearly defined relative paths.
- This reduces the cost of implicit context inference.
- If the Agent continues to produce large errors after multiple modification attempts, a human may pause the Agent using `wait` or similar commands.
- Use explicit prompts such as `I think I ignore xxx` or `I think I miss xxx` to force the Agent to re-examine its mistakes.
- If the Agent still fails to identify the issue, the human should directly intervene using statements like `I think I am wrong because xxx` to enforce correction.
- If the model becomes stuck in a loop at a certain point, the human should interrupt with a prompt such as `wait, I think I was trapped xxx`.
- Before writing code, it is strongly recommended to have the model verbally plan the solution first; otherwise, errors and improper modularization are likely.
- When browsing code, the model should open and analyze it incrementally to ensure correctness.
- If the annotator cannot accurately determine whether a particular Agent action is reasonable, allow the Agent to continue.
- Once an error becomes explicit, correct it.
- During correction, previous erroneous steps may be preserved, and only the current step should be forcibly corrected, to improve the model’s error-recovery capability.
- Provide the model with tools to summarize and delete context.
- Consider partitioning context into segments, similar to the approach proposed by Linsir.
- Provide LLM tools with the ability to make parallel calls to multiple LLMs.
- The `replace` tool is prone to errors when line numbers are involved.
- During debugging, the model does not use `print` statements.
- It also fails to analyze the corresponding stack traces.
- It is recommended that the annotator manually intervene and force the Agent to interact with `ipython`.
- This prevents bugs from accumulating in large source files.