# Issue Triage Workflow: Human + Agent Collaboration

This document describes the interactive workflow for triaging GitHub issues
using a human maintainer and a Claude Code agent working together. This is
how we reduced the bitsandbytes issue tracker from 152 open issues to ~60
in a single session.

The key insight: the agent handles volume (reading every issue, spotting
patterns, drafting comments, executing closures) while the human handles
judgment (deciding what's a real bug, what tone to strike, what the project's
priorities are). Neither could do this efficiently alone.

## How It Works

### Phase 1: Landscape scan

The agent fetches all open issues and groups them by pattern. This is the
most time-consuming step when done manually, but an agent can read 150+
issues and classify them in minutes.

What the agent does:
- Fetches issue data with `fetch_issues.py`
- Queries by label (`Duplicate`, `Proposing to Close`, `Waiting for Info`, etc.)
- Reads every issue with `show --brief` in batches of 10-15
- Identifies clusters: issues that share the same root cause, error message,
  or theme

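The label queries above can be sketched with the `gh` CLI. The wrapper below is a hypothetical helper, not the actual `fetch_issues.py` interface; the `run` parameter is injectable so it can be exercised without network access:

```python
import json
import subprocess

def issues_with_label(label, run=subprocess.run):
    """List open issues carrying `label` via the GitHub CLI (illustrative helper)."""
    cmd = ["gh", "issue", "list", "--label", label, "--state", "open",
           "--json", "number,title", "--limit", "200"]
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

A triage pass would call this once per triage label and hand the grouped results to the human.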
What the agent produces:
- A grouped table of issues, organized by pattern
- For each group: issue numbers, titles, and a short rationale for why
  they're closeable
- An estimate of how many issues can be closed

The human reviews the groups and says which ones to proceed with. The agent
does not close anything without human approval.

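The clustering step can be approximated with a simple signature heuristic; this is an illustrative sketch, not the agent's actual method, and the issue dictionaries are assumed shapes:

```python
import re
from collections import defaultdict

def cluster_by_signature(issues):
    """Group issues whose bodies share a normalized error line.

    Digits are collapsed so "line 166" and "line 168" variants still match.
    Only clusters with more than one member are returned.
    """
    clusters = defaultdict(list)
    for issue in issues:
        for line in issue["body"].splitlines():
            if "error" in line.lower():
                sig = re.sub(r"\d+", "N", line.strip().lower())
                clusters[sig].append(issue["number"])
                break
    return {sig: nums for sig, nums in clusters.items() if len(nums) > 1}
```

Singleton signatures are dropped because a cluster of one is just an ordinary issue, not a pattern worth batching.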
### Phase 2: Iterative triage

This is the core loop. It works in rounds:

1. **Agent presents a group** (e.g., "13 issues all report the same legacy
   CUDA setup error on bnb 0.41.x-0.42.x").

2. **Human decides** — close all, close some, investigate further, or skip.
   The human may also:
   - Ask the agent to investigate a specific issue more deeply
   - Provide domain context ("this was fixed in v0.43.0", "FSDP1 is not
     going to be supported", "the offset value was empirically optimized")
   - Override the agent's recommendation ("don't close that, it's a real bug")
   - Specify tone ("no comment needed", "explain what they were asking",
     "say we're working on it but no ETA")

3. **Agent executes** — closes issues with tailored comments, using `gh
   issue close --comment`. The agent adapts the comment to each issue's
   specific context (version, platform, error message) rather than
   copy-pasting a template.

4. **Agent reports back** — confirms what was closed, then identifies the
   next group.

This loop typically runs 5-8 rounds in a session. Each round closes 5-25
issues depending on the cluster size.

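Step 3 boils down to one `gh` invocation per issue. The sketch below shows the shape of that call; the comment text and issue fields are hypothetical fill-ins, drafted fresh per issue rather than copy-pasted:

```python
def close_command(number, comment):
    """Build the `gh issue close` invocation used to close one issue."""
    return ["gh", "issue", "close", str(number), "--comment", comment]

def tailored_comment(issue):
    """Draft a closure note from the issue's own context (fields are illustrative)."""
    return (f"Closing: this was a bitsandbytes {issue['version']} problem on "
            f"{issue['platform']}. Please upgrade to a current release and "
            "reopen if the error persists.")
```

Keeping the command as a list (rather than a shell string) avoids quoting bugs when comments contain backticks or quotes.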
### Phase 3: Discussion and documentation

Some issues are not simply closeable — they reveal gaps in documentation,
recurring user confusion, or real bugs that need work. The triage session
naturally surfaces these:

- **Documentation gaps**: If 5 issues ask the same question about NF4, the
  code needs better docstrings. The agent drafts the documentation, the
  human reviews, and they commit together.

- **Real bugs that need work**: The agent writes a dispatch prompt file
  (see `dispatch_guide.md`) so another agent session can work on the fix
  independently.

- **Pattern documentation**: New patterns discovered during triage get added
  to `issue_patterns.md` so future triage sessions can reference them.

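A dispatch prompt file can be as small as the sketch below. The real format lives in `dispatch_guide.md`, so treat these fields and the filename scheme as placeholders:

```python
from pathlib import Path

def write_dispatch_prompt(issue, out_dir):
    """Write a minimal worker-agent prompt for one surviving bug (hypothetical format)."""
    path = Path(out_dir) / f"issue_{issue['number']}.md"
    path.write_text(
        f"# Fix issue #{issue['number']}: {issue['title']}\n\n"
        f"Context: {issue['summary']}\n\n"
        "Done when: the bug is reproduced, fixed, and covered by a regression test.\n"
    )
    return path
```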
## The Human's Role

The human's judgment is essential for:

- **Deciding what's a real bug vs. user error.** The agent can identify
  patterns, but the human knows the codebase history and what's been fixed.

- **Setting project priorities.** "We're not going to support FSDP1" or
  "mixed quantization is something we're working toward" — these are
  project decisions the agent can't make.

- **Tone and messaging.** The human decides whether an issue gets a detailed
  explanation, a brief "this was fixed, please upgrade", or no comment at
  all. Some issues deserve a thoughtful response even when being closed.

- **Catching false positives.** The agent may recommend closing something
  that looks stale but is actually an important edge case. The human's
  domain knowledge catches these.

- **Cross-referencing.** "Before closing duplicates, are they
  cross-referenced to the canonical issue?" — the human ensures no
  information is lost.

## The Agent's Role

The agent handles the work that's tedious for humans but trivial for an LLM:

- **Reading every issue.** An agent can read and classify 150 issues in a
  few minutes. A human doing this manually would spend hours.

- **Pattern detection.** The agent identifies that 15 issues all reference
  `cuda_setup/main.py` line 166, or that 5 issues all try to load `.so`
  files on Windows — patterns a human might miss when reading issues one
  at a time.

- **Comment drafting.** Each closed issue gets a tailored comment explaining
  why it's being closed and what the user should do. The agent writes these
  with the specific context of each issue (version, platform, error message).

- **Cross-reference checking.** Before closing a duplicate, the agent
  verifies the canonical issue exists, is still open, and already
  cross-references the duplicate.

- **Batch execution.** Closing 15 issues with individual comments would
  take a human 30+ minutes of copy-paste. The agent does it in parallel.

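The cross-reference check is mechanical enough to sketch. The issue dictionaries here are illustrative shapes, not the actual tool output:

```python
def duplicate_is_safe_to_close(dup_number, canonical):
    """Return True only if the canonical issue is open and already mentions the duplicate."""
    if canonical is None or canonical["state"] != "open":
        return False
    mention = f"#{dup_number}"
    return (mention in canonical.get("body", "")
            or any(mention in c for c in canonical.get("comments", [])))
```

If this returns False, the agent adds the cross-reference first (or flags the pair for the human) instead of closing blind.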
## Practical Tips

### Starting a session

```
cd ~/git/bitsandbytes
claude
```

Then say something like: "Look at the open issues and identify groups of
issues that can be closed — duplicates, stale issues, old version problems,
questions that aren't bugs. Give me an overview before closing anything."

### Pacing

Don't try to close everything at once. Work in groups:
1. Start with the lowest-hanging fruit (already labeled Duplicate, Proposing
   to Close)
2. Move to pattern clusters (CUDA setup, Windows pre-support, etc.)
3. Then handle the one-offs (stale questions, third-party app issues)
4. End with discussion items that need human judgment

### When the agent is wrong

The agent will occasionally recommend closing something that shouldn't be
closed. This is expected and fine — that's why the human reviews before
execution. Common false positives:
- Issues that look stale but are actually waiting on a specific release
- Feature requests that look like questions but represent real community
  demand
- Issues the agent thinks are old-version problems but actually reproduce
  on current code

Just say "don't close that one" and move on.

### Turning triage into action

The best outcome of a triage session isn't just fewer open issues — it's
discovering what work actually needs to be done. Issues that survive triage
are the real backlog. During the session:

- If an issue is a real bug, consider generating a dispatch prompt
  (see `dispatch_guide.md`) so a worker agent can fix it.
- If multiple issues reveal the same documentation gap, fix the docs in
  the same session and reference the commit when closing the issues.
- If a cluster of issues reveals a systemic problem (e.g., "everyone on
  Jetson hits the same error"), that's a signal to prioritize platform
  support work.

## Related Documents

- `issue_patterns.md` — catalog of known closeable patterns with templates
- `issue_maintenance_guide.md` — autonomous agent guide for triage (no
  human in the loop)
- `dispatch_guide.md` — how to generate prompts for worker agents to fix
  real bugs
- `github_tools_guide.md` — reference for the issue query tools