test(commands): update TESTING.md to evaluate discovery, lint, and deploy explicitly

dhilipkumars · dhilipkumars · commit ea3a460a2d0a · 2026-03-04T15:50:56.000-05:00
diff --git a/tests/extension-commands/TESTING.md b/tests/extension-commands/TESTING.md
@@ -1,9 +1,5 @@
 # Testing Extension Commands
 
-This directory contains a mock project to verify that LLM agents correctly identify and suggest custom extension commands defined in `.specify/extensions.yml`.
-
-# Testing Extension Commands
-
 This directory contains a mock project to verify that LLM agents correctly identify and execute custom extension commands defined in `.specify/extensions.yml`.
 
 ## Automated LLM Test Execution
@@ -23,20 +19,24 @@ Before you begin, read `../../extensions/EXTENSION-API-REFERENCE.md` to understa
 You are evaluating if you can correctly parse `.specify/extensions.yml` to map a command name to a markdown file, and then execute the script described in that markdown file.
 
 **Test Cases to Evaluate**:
-1. [Test Case 1] "Discovery Validation": Read `.specify/extensions.yml`. Verify that you can find the commands `speckit.test.lint` mapping to `lint.md`, and `speckit.test.deploy` mapping to `deploy.md`. If you can, mark this test as PASS. If you cannot find them, mark as FAIL.
-2. [Test Case 2] "Intent Binding": Pretend to execute the `speckit.test.lint` command. First, look at `.specify/extensions.yml` to find its matching file (`lint.md`). Open `lint.md` to find the exact python command required for the linter. Since this is a sandbox, to successfully evaluate the command you must actually run that python command here to get the execution output. If your execution provides the python console output (like 'The linter is complete [timestamp]'), mark as PASS. If you don't know what to do, mark as FAIL.
+1. [Test Case 1] "Discovery Validation": Load the extension according to the specification. Verify that you have discovered the commands `speckit.test.lint` and `speckit.test.deploy`. If you can find them and their properties in the extensions config, mark as PASS. If not, mark as FAIL.
+2. [Test Case 2] "Simulate Lint Command": Simulate a user running `/speckit.test.lint`. Look up its execution configuration in `.specify/extensions.yml` and find its matching script file (`lint.md`). Since this is a testing sandbox, actually execute the required python command to get the execution output. Validate the output of the python program (it should be 'The linter is complete [timestamp]'). If you successfully produce this output, mark as PASS.
+3. [Test Case 3] "Simulate Deploy Command": Simulate a user running `/speckit.test.deploy`. Find its matching script file (`deploy.md`). Actually execute the required python command to get the execution output. Validate the output of the python program (it should be 'Staging deployment is completed at [timestamp]'). If you successfully produce this output, mark as PASS.
 
 **Required Output Format**:
 Provide your output exactly like this example format, replacing the bracketed content with your actual evaluation logic:
 
 ============================= test session starts ==============================
-collected 2 items
+collected 3 items
 
 test_commands_discovery.py::test_discovery [PASS/FAIL]
   Details: [Provide 1-2 sentences proving you found the commands and their descriptions]
 
-test_commands_execution.py::test_intent_binding [PASS/FAIL]
-  Details: [Provide the specific command output, including the generated python timestamp string]
+test_commands_execution.py::test_lint_command [PASS/FAIL]
+  Details: [Provide the specific command output for lint, including the generated python timestamp string]
+
+test_commands_execution.py::test_deploy_command [PASS/FAIL]
+  Details: [Provide the specific command output for deploy, including the generated python timestamp string]
 
 ============================== [X] passed in 0.0s ==============================
 ```