Commit 0b335d0

feat: add /strands test command for TUI testing via MCP harness
- Add tester mode to process-inputs.cjs (routes /strands test)
- Add task-tester.sop.md with TUI testing instructions
- Add tui-test-flows.md with 5 test flows
- Add Node.js setup + build steps for tester mode in workflow
- Wire TUI harness MCP server (stdio) into the Strands agent
1 parent d41e14b commit 0b335d0

4 files changed: +163 -5 lines changed
.github/agent-sops/task-tester.sop.md

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+# Task Tester SOP
+
+## Role
+
+You are a TUI Tester. Your goal is to verify the AgentCore CLI's interactive TUI behavior by driving it through
+predefined test flows using the TUI harness MCP tools. You post results as PR comments.
+
+You MUST NOT modify any code, create branches, or push commits. Your only output is test result comments.
+
+## Tools Available
+
+You have TUI harness MCP tools: `tui_launch`, `tui_send_keys`, `tui_action`, `tui_wait_for`, `tui_screenshot`,
+`tui_read_screen`, `tui_close`, `tui_list_sessions`.
+
+You also have `shell` for setup commands and GitHub tools for posting comments.
+
+## Steps
+
+### 1. Setup
+
+- Read the test spec file at `.github/agent-sops/tui-test-flows.md`
+- The CLI is already built and available. Launch TUI sessions from the repo root using the default command (which runs
+  `node dist/cli/index.mjs`).
+
+### 2. Run Test Flows
+
+For each flow in the test spec:
+
+1. Create any required setup (e.g., temp directories, minimal projects) using `shell`
+2. Use `tui_launch` to start the CLI with the specified arguments and `cwd`
+3. Follow the flow steps: use `tui_action` (preferred — combines send + wait + read in one call) or `tui_wait_for` +
+   `tui_send_keys` for multi-step interactions
+4. Verify each expectation against the screen content
+5. On **pass**: record the flow name as passed
+6. On **failure**: use `tui_screenshot` to capture the terminal state, record the flow name, expected behavior, actual
+   behavior, and the screenshot text
+7. Always `tui_close` the session when done, even on failure
+
+**Constraints:**
+
+- Use `timeoutMs: 10000` (10 seconds) minimum for all `tui_wait_for` and `tui_action` pattern waits
+- Use small terminal dimensions: `cols: 100, rows: 24`
+- If a wait times out, retry once before declaring failure
+- Use text format screenshots only (not SVG)
+- Keep terminal dimensions consistent across all flows
+
+### 3. Post Results
+
+Post a single summary comment on the PR with this format:
+
+```markdown
+## 🧪 TUI Test Results
+
+**X/Y flows passed**
+
+### ✅ Passed
+
+- Flow name 1
+- Flow name 2
+
+### ❌ Failed
+
+#### Flow name 3
+
+**Expected:** description of what should have happened **Actual:** description of what happened
+
+<details>
+<summary>Screenshot</summary>
+```
+
+(terminal screenshot here)
+
+```
+
+</details>
+```
+
+If all flows pass, omit the Failed section.
+
+## Forbidden Actions
+
+- You MUST NOT modify, create, or delete any source files
+- You MUST NOT run git add, git commit, or git push
+- You MUST NOT create or update branches
+- You MUST NOT approve or merge the pull request
+- You MUST NOT run deploy, invoke, or any command that creates AWS resources
+- Your ONLY output is test result comments on the pull request
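
Reviewer note: the comment format in "3. Post Results" of the SOP above can be sketched as a small formatter. This is only an illustration under assumptions: `formatResults` and the shape of each result object (`name`, `passed`, `expected`, `actual`, `screenshot`) are hypothetical and not part of the commit. The sketch also puts Expected and Actual in separate paragraphs and skips the Passed heading when nothing passed; both are choices made here, not requirements stated in the SOP.

```javascript
// Sketch only: `formatResults` and the result-object shape are hypothetical,
// chosen to mirror the comment template in task-tester.sop.md.
function formatResults(results) {
  const passed = results.filter((r) => r.passed);
  const failed = results.filter((r) => !r.passed);
  const fence = '`'.repeat(3); // builds a Markdown fence without writing one literally

  const lines = [
    '## 🧪 TUI Test Results',
    '',
    `**${passed.length}/${results.length} flows passed**`,
  ];

  // Assumption: skip the Passed heading when no flow passed.
  if (passed.length > 0) {
    lines.push('', '### ✅ Passed', '', ...passed.map((r) => `- ${r.name}`));
  }

  // Per the SOP, the Failed section is omitted when every flow passes.
  if (failed.length > 0) {
    lines.push('', '### ❌ Failed');
    for (const r of failed) {
      lines.push(
        '',
        `#### ${r.name}`,
        '',
        `**Expected:** ${r.expected}`,
        '',
        `**Actual:** ${r.actual}`,
        '',
        '<details>',
        '<summary>Screenshot</summary>',
        '',
        fence,
        r.screenshot,
        fence,
        '',
        '</details>'
      );
    }
  }

  return lines.join('\n');
}

// Usage: one passing and one failing flow.
console.log(
  formatResults([
    { name: 'Help text lists all subcommands', passed: true },
    {
      name: 'Invalid project name shows error',
      passed: false,
      expected: 'non-zero exit code or a project-name error',
      actual: 'exit code 0 and no error text',
      screenshot: '(terminal screenshot here)',
    },
  ])
);
```

Keeping the formatter a pure function of the recorded results makes the "single summary comment" requirement easy to check before anything is posted.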
.github/agent-sops/tui-test-flows.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+# TUI Test Flows
+
+Each flow describes a user interaction to verify. The tester agent drives these using the TUI harness MCP tools.
+
+---
+
+## Flow: Help text lists all subcommands
+
+1. Launch: `agentcore --help` (use `tui_launch` with args `["--help"]`)
+2. Wait for: "Usage:" on screen
+3. Expect all of these subcommands visible: `create`, `deploy`, `invoke`, `status`, `logs`, `add`, `remove`
+4. Close session
+
+---
+
+## Flow: Create wizard prompts for project name
+
+1. Launch: `agentcore create` (no flags, in a temp directory)
+2. Wait for: a prompt asking for the project name (look for "name" or "project")
+3. Expect: an input field or prompt is visible
+4. Close session (Ctrl+C)
+
+---
+
+## Flow: Create with --json produces valid JSON
+
+1. In a temp directory, run via shell:
+   `agentcore create --name TestProj --language Python --framework Strands --model-provider Bedrock --memory none --json`
+2. Expect: stdout contains valid JSON with `"success": true` and `"projectPath"`
+3. Verify the project directory was created
+
+---
+
+## Flow: Add agent shows framework selection
+
+1. First create a project via shell: `agentcore create --name AgentTest --no-agent --json` (in a temp directory)
+2. Launch: `agentcore add agent` in the created project directory
+3. Wait for: agent name prompt
+4. Type a name, press Enter
+5. Wait for: framework or language selection to appear
+6. Expect: at least "Strands" and "LangChain_LangGraph" visible as options
+7. Close session (Ctrl+C)
+
+---
+
+## Flow: Invalid project name shows error
+
+1. In a temp directory, run via shell:
+   `agentcore create --name "123invalid" --language Python --framework Strands --model-provider Bedrock --memory none --json`
+2. Expect: exit code is non-zero OR output contains an error about the project name (must start with a letter)
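
Reviewer note: to make the first flow above concrete, here is a rough sketch of driving "Help text lists all subcommands" while honoring the SOP constraints (10 second timeout, 100x24 terminal, one retry on timeout, always close). Much of it is assumed: `callTool(name, args)` is a hypothetical stand-in for an MCP client call, and the argument names `sessionId`, `pattern`, `format` plus the `{ sessionId }` and `{ text }` result shapes are guesses, not the harness's documented API. Only the tool names, `args`, `cwd`, `cols`, `rows` and `timeoutMs` come from the files in this commit.

```javascript
// Sketch only. `callTool` and the argument/result shapes are assumptions (see note above).
const TERMINAL = { cols: 100, rows: 24 }; // SOP: small, consistent dimensions
const TIMEOUT_MS = 10000; // SOP: 10 second minimum for pattern waits

// Wait for a pattern; a timeout is modelled here as a rejected promise.
// SOP: retry once before declaring failure.
async function waitWithRetry(callTool, sessionId, pattern) {
  const request = { sessionId, pattern, timeoutMs: TIMEOUT_MS };
  try {
    return await callTool('tui_wait_for', request);
  } catch {
    return callTool('tui_wait_for', request); // second and final attempt
  }
}

// Drive the help flow and return a pass/fail record.
async function runHelpFlow(callTool, cwd) {
  const name = 'Help text lists all subcommands';
  const subcommands = ['create', 'deploy', 'invoke', 'status', 'logs', 'add', 'remove'];
  const { sessionId } = await callTool('tui_launch', { args: ['--help'], cwd, ...TERMINAL });
  try {
    const screen = await waitWithRetry(callTool, sessionId, 'Usage:');
    // Plain substring check: loose, but enough for a sketch.
    const missing = subcommands.filter((cmd) => !screen.text.includes(cmd));
    if (missing.length === 0) {
      return { name, passed: true };
    }
    // SOP: on failure, capture a text-format screenshot.
    const shot = await callTool('tui_screenshot', { sessionId, format: 'text' });
    return {
      name,
      passed: false,
      expected: `all of: ${subcommands.join(', ')}`,
      actual: `missing: ${missing.join(', ')}`,
      screenshot: shot.text,
    };
  } finally {
    // SOP: always close the session, even on failure.
    await callTool('tui_close', { sessionId });
  }
}
```

If both wait attempts time out, `runHelpFlow` rejects after closing the session, so a caller would need to catch that and record the flow as failed.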

.github/scripts/javascript/process-inputs.cjs

Lines changed: 9 additions & 5 deletions
@@ -78,6 +78,7 @@ function buildPrompts(mode, issueId, isPullRequest, command, branchName, inputs)
     implementer: '.github/agent-sops/task-implementer.sop.md',
     reviewer: '.github/agent-sops/task-reviewer.sop.md',
     refiner: '.github/agent-sops/task-refiner.sop.md',
+    tester: '.github/agent-sops/task-tester.sop.md',
   };
   const scriptFile = sopFiles[mode] || sopFiles.refiner;

@@ -94,11 +95,13 @@ module.exports = async (context, github, core, inputs) => {
     const { issueId, command, issue } = await getIssueInfo(github, context, inputs);

     const isPullRequest = !!issue.data.pull_request;
-    const mode = command.startsWith('review')
-      ? 'reviewer'
-      : isPullRequest || command.startsWith('implement')
-        ? 'implementer'
-        : 'refiner';
+    const mode = command.startsWith('test')
+      ? 'tester'
+      : command.startsWith('review')
+        ? 'reviewer'
+        : isPullRequest || command.startsWith('implement')
+          ? 'implementer'
+          : 'refiner';
     console.log(`Is PR: ${isPullRequest}, Mode: ${mode}`);

     const branchName = await determineBranch(github, context, issueId, mode, isPullRequest);
@@ -113,6 +116,7 @@ module.exports = async (context, github, core, inputs) => {
     core.setOutput('session_id', sessionId);
     core.setOutput('system_prompt', systemPrompt);
     core.setOutput('prompt', prompt);
+    core.setOutput('mode', mode);
   } catch (error) {
     const errorMsg = `Failed: ${error.message}`;
     console.error(errorMsg);
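
Reviewer note: the nested ternary in the second hunk is easier to reason about as a standalone function. The sketch below is not part of the commit; `selectMode` is a hypothetical name, and it assumes `command` holds the text that follows `/strands` (for example `test`). It shows why the `test` check has to come first: on a pull request, any command that is not `test` or `review` falls through to `implementer`.

```javascript
// Sketch only: `selectMode` is a hypothetical helper that isolates the routing
// expression from process-inputs.cjs so its precedence can be exercised directly.
function selectMode(command, isPullRequest) {
  if (command.startsWith('test')) return 'tester';
  if (command.startsWith('review')) return 'reviewer';
  if (isPullRequest || command.startsWith('implement')) return 'implementer';
  return 'refiner';
}

console.log(selectMode('test', true)); // tester (checked before the PR fallback)
console.log(selectMode('review', true)); // reviewer
console.log(selectMode('anything else', true)); // implementer (any other command on a PR)
console.log(selectMode('implement', false)); // implementer
console.log(selectMode('', false)); // refiner
```

Because the match is a prefix test, a command such as `testing` would also route to `tester`. That mirrors how `review` and `implement` were already matched.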

.github/workflows/strands-command.yml

Lines changed: 17 additions & 0 deletions
@@ -94,6 +94,20 @@ jobs:
             };
             await processInputs(context, github, core, inputs);

+      - name: Setup Node.js (tester mode)
+        if: steps.process-inputs.outputs.mode == 'tester'
+        uses: actions/setup-node@v6
+        with:
+          node-version: 20.x
+          cache: 'npm'
+
+      - name: Build CLI and TUI harness (tester mode)
+        if: steps.process-inputs.outputs.mode == 'tester'
+        run: |
+          npm ci
+          npm run build
+          npm run build:harness
+
       - name: Run Strands Agent
         uses: ./.github/actions/strands-action
         with:
@@ -102,6 +116,9 @@ jobs:
           provider: 'bedrock'
           model: 'us.anthropic.claude-sonnet-4-5-20250929-v1:0'
           tools: 'strands_tools:shell,retrieve'
+          mcp_servers:
+            ${{ steps.process-inputs.outputs.mode == 'tester' &&
+            '{"mcpServers":{"tui-harness":{"command":"node","args":["dist/mcp-harness/index.mjs"]}}}' || '' }}
           aws_role_arn: ${{ secrets.AWS_ROLE_ARN }}
           aws_region: 'us-west-2'
           pat_token: ${{ secrets.GITHUB_TOKEN }}
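
Reviewer note: the `mcp_servers` value uses the `condition && value || fallback` idiom, which GitHub Actions expressions offer in place of a ternary. JavaScript's `&&` and `||` behave the same way for these operands, so the evaluation can be mirrored in a few lines. `mcpServersFor` is a hypothetical helper for illustration only; how `strands-action` treats an empty `mcp_servers` input is not shown in this commit.

```javascript
// Sketch only: `mcpServersFor` mirrors the workflow expression in JavaScript.
function mcpServersFor(mode) {
  const config =
    '{"mcpServers":{"tui-harness":{"command":"node","args":["dist/mcp-harness/index.mjs"]}}}';
  // Same shape as: ${{ mode == 'tester' && '<json>' || '' }}
  // The idiom is safe here because the JSON string is non-empty, so it is never falsy.
  return (mode === 'tester' && config) || '';
}

// In tester mode the input is a JSON document describing one stdio server.
const parsed = JSON.parse(mcpServersFor('tester'));
console.log(parsed.mcpServers['tui-harness'].command); // node
console.log(parsed.mcpServers['tui-harness'].args[0]); // dist/mcp-harness/index.mjs

// In every other mode the input is an empty string.
console.log(JSON.stringify(mcpServersFor('reviewer'))); // ""
```

The server entry points at `dist/mcp-harness/index.mjs`, which is presumably produced by the `npm run build:harness` step added in the first hunk, so both tester-only steps need to run before the agent starts.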
