Commit f9be77e

fix: reduce agentic-workflows test scope and strengthen safe-output instructions in Agent Persona Explorer (#26152)
1 parent 61febad commit f9be77e

1 file changed: +10 -4 lines changed

.github/workflows/agent-persona-explorer.md

Lines changed: 10 additions & 4 deletions
@@ -73,7 +73,7 @@ Store all scenarios in cache memory.
 
 ## Phase 3: Test Agent Responses (15 minutes)
 
-**Token Budget Optimization**: Test a **representative subset of 6-8 scenarios** (not all scenarios) to reduce token consumption while maintaining quality insights.
+**Token Budget Optimization**: Test a **representative subset of 3-4 scenarios** (not all scenarios) to reduce token consumption and ensure budget remains for Phase 5 publishing.
 
 For each selected scenario, invoke the "agentic-workflows" custom agent tool and:
 
@@ -99,6 +99,7 @@ For each selected scenario, invoke the "agentic-workflows" custom agent tool and
 - You are ONLY testing the agent's responses, NOT creating actual workflows
 - **Keep responses focused and concise** - summarize findings instead of verbose descriptions
 - Aim for quality over quantity - fewer well-analyzed scenarios are better than many shallow ones
+- **If any tool call fails, record the error briefly and move on to the next scenario** - do NOT retry or get stuck
 
 ## Phase 4: Analyze Results (4 minutes)
 
@@ -124,7 +125,9 @@ Review all captured responses and identify:
 
 ## Phase 5: Document and Publish Findings (1 minute)
 
-Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings.
+**MANDATORY OUTPUT**: Regardless of how many phases completed successfully, you MUST call either the `create discussion` or the `noop` safe-output tool before finishing. Failing to call a safe-output tool is the most common cause of workflow failures.
+
+Create a GitHub discussion with a **concise** summary report. Use the `create discussion` safe-output to publish your findings. Even if only 1-2 scenarios were tested, create the discussion with partial results.
 
 **Discussion title**: "Agent Persona Exploration - [DATE]" (e.g., "Agent Persona Exploration - 2024-01-16")
 
@@ -221,15 +224,18 @@ Example:
 ## Success Criteria
 
 Your effectiveness is measured by:
+- **Safe output**: ALWAYS call either `create discussion` or `noop` — this is the most critical requirement
 - **Efficiency**: Complete analysis within token budget (timeout: 180 minutes, concise outputs)
-- **Quality over quantity**: Test 6-8 representative scenarios thoroughly rather than all scenarios superficially
+- **Quality over quantity**: Test 3-4 representative scenarios thoroughly rather than many scenarios superficially
 - **Actionable insights**: Provide 3-5 concrete, implementable recommendations
 - **Concise documentation**: Report under 1000 words with progressive disclosure
 - **Consistency**: Maintain objective, research-focused methodology
 
 Execute all phases systematically and maintain an objective, research-focused approach to understanding the agentic-workflows custom agent's capabilities and limitations.
 
-**Important**: If no action is needed after completing your analysis, you **MUST** call the `noop` safe-output tool with a brief explanation. Failing to call any safe-output tool is the most common cause of safe-output workflow failures.
+**CRITICAL**: You MUST call a safe-output tool before finishing. Choose one:
+1. Call `create discussion` to publish findings (preferred — even partial results are valuable)
+2. Call `noop` if you were completely unable to gather any data
 
 ```json
 {"noop": {"message": "No action needed: [brief explanation of what was analyzed and why]"}}
