
Commit c39ac52

update(tests): format LLM evaluation as an automated test runner
1 parent 00a2ff3 commit c39ac52

1 file changed

Lines changed: 37 additions & 13 deletions


tests/extension-commands/TESTING.md

@@ -2,20 +2,44 @@
 
 This directory contains a mock project to verify that LLM agents correctly identify and suggest custom extension commands defined in `.specify/extensions.yml`.
 
-## The Test
+# Testing Extension Commands
+
+This directory contains a mock project to verify that LLM agents correctly identify and execute custom extension commands defined in `.specify/extensions.yml`.
+
+## Automated LLM Test Execution
+
+To execute this test, copy the following block and paste it into GitHub Copilot Chat (or your LLM of choice) while focused on this `TESTING.md` file:
+
+---
+
+**PROMPT TO COPY:**
+
+```prompt
+Act as an automated test runner (like Ginkgo or pytest) evaluating your own comprehension of the current workspace. I want you to run the following test suite on the `.specify/extensions.yml` file in this directory and output the results strictly in a terminal-style test output format.
+
+**Test Suite Context**:
+You are evaluating if you can correctly parse and execute custom extension commands defined by Spec Kit extensions.
+
+**Test Cases to Evaluate**:
+1. [Test Case 1] "Discovery Validation": Read `.specify/extensions.yml`. Verify that you can find two custom commands: `/ext.lint` and `/ext.deploy`. If you can, mark this test as PASS. If you cannot find them, mark as FAIL.
+2. [Test Case 2] "Intent Binding": Pretend to execute the `/ext.lint` command. Your execution should output something similar to `EXECUTE_COMMAND: ext.lint`. If you understand that `/ext.lint` maps to the `custom_lint` object in yaml, mark as PASS. If you don't know what to do, mark as FAIL.
+
+**Required Output Format**:
+Provide your output exactly like this example format, replacing the bracketed content with your actual evaluation logic:
+
+============================= test session starts ==============================
+collected 2 items
+
+test_commands_discovery.py::test_discovery [PASS/FAIL]
+Details: [Provide 1-2 sentences proving you found the commands and their descriptions]
+
+test_commands_execution.py::test_intent_binding [PASS/FAIL]
+Details: [Provide the simulated output of executing the command]
 
-1. Open a chat with an LLM (like GitHub Copilot) in this project.
-2. Ask it what extension commands are available in this directory:
-> "What custom extension commands are available in this directory according to the `.specify/extensions.yml` file? Can you list them?"
-3. **Expected Behavior**:
-- The LLM should read `.specify/extensions.yml` and identify the two custom commands: `/ext.lint` and `/ext.deploy`.
-- It should list their descriptions and prompts.
+============================== [X] passed in 0.0s ==============================
+```
 
-4. Next, test its comprehension of executing a command:
-> "Please pretend to execute `/ext.lint`."
-5. **Expected Behavior**:
-- The LLM should output that it is executing the command, simulating output similar to `EXECUTE_COMMAND: ext.lint`.
-- Since it's an LLM, it might playfully simulate fixing imaginary formatting in `main.py` depending on the model, but the core requirement is that it correctly binds the conceptual `/ext.lint` string to the `custom_lint` object in yaml.
+---
 
 ## Validation Goals
-This playground ensures that AI Agents, which do not run strict compiled Spec Kit binaries, can still integrate with the broader extension ecosystem natively just by reading the `.specify/` configuration maps.
+This playground ensures that AI Agents, which do not run strict compiled Spec Kit binaries, can still integrate with the broader extension ecosystem natively just by reading the `.specify/` configuration maps. It also enforces that LLMs can self-certify their comprehension using recognizable testing frameworks!
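
The terminal-style report that the prompt asks for is regular enough to check mechanically. The following is a minimal Python sketch, not part of the commit, that assumes the model's reply follows the requested template verbatim. `parse_report`, `suite_passed`, and the sample reply are hypothetical names and data introduced here for illustration only.

```python
import re

# Matches result lines such as
#   test_commands_discovery.py::test_discovery PASS
# A filled-in status may optionally keep the brackets ("[PASS]"), but the
# unfilled template placeholder "[PASS/FAIL]" deliberately does not match.
RESULT_LINE = re.compile(
    r"^(?P<test_id>\S+\.py::\S+)\s+\[?(?P<status>PASS|FAIL)\]?\s*$"
)


def parse_report(text):
    """Return {test_id: {"status": ..., "details": ...}} for each result line."""
    results = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        match = RESULT_LINE.match(line)
        if match:
            current = match.group("test_id")
            results[current] = {"status": match.group("status"), "details": ""}
        elif line.startswith("Details:") and current is not None:
            # Attach the details line to the most recent test result.
            results[current]["details"] = line[len("Details:"):].strip()
    return results


def suite_passed(results, expected=2):
    """True only if all expected tests were reported and every one is PASS."""
    return len(results) == expected and all(
        r["status"] == "PASS" for r in results.values()
    )


# Hypothetical model reply, written by hand to match the requested format.
sample = """\
============================= test session starts ==============================
collected 2 items

test_commands_discovery.py::test_discovery PASS
Details: Found /ext.lint and /ext.deploy in .specify/extensions.yml.

test_commands_execution.py::test_intent_binding PASS
Details: EXECUTE_COMMAND: ext.lint

============================== 2 passed in 0.0s ==============================
"""

report = parse_report(sample)
print(suite_passed(report))
```

A check like this only validates the shape of the model's self-report. It does not confirm that the model actually read `.specify/extensions.yml`, so a PASS here is weaker evidence than a test that inspects the file independently.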
