Skip to content

fix: wrap command templates and stabilize integration tests#78

Merged
marcusrbrown merged 4 commits into
mainfrom
fix/integration-tests
Feb 14, 2026
Merged

fix: wrap command templates and stabilize integration tests#78
marcusrbrown merged 4 commits into
mainfrom
fix/integration-tests

Conversation

@marcusrbrown
Copy link
Copy Markdown
Owner

No description provided.

@fro-bot
Copy link
Copy Markdown
Collaborator

fro-bot commented Feb 14, 2026

Code Review

Summary

The PR makes three changes: (1) wraps command templates with XML instruction tags, (2) removes fragile CEP-rooted integration tests, and (3) stabilizes opencode integration tests with explicit model specification.

Issues Found

1. Failing Test - False Positive Detection Logic Bug ⚠️

The new test is failing:

Root Cause: The function has flawed logic. It checks if lowercase "task tracking" appears in the converted text, but doesn't verify whether the original already had lowercase "task":

  • Original: "Use task tracking for ongoing work." (already lowercase)
  • Converter correctly leaves it unchanged
  • But detector finds "task tracking" in converted and incorrectly flags it

Fix needed: The detector should compare original vs. converted to verify uppercase "Task" was actually lowercased, not just detect lowercase occurrences:

2. Command Template Wrapping ✓

The new template structure in looks correct:

This properly wraps command content and adds a placeholder for user arguments.

3. Test Stabilization ✓

  • Removing CEP-rooted integration tests is appropriate (they were fragile)
  • Adding explicit model specification () should improve test reliability

Verification

Check Status
TypeScript typecheck ✓ Pass
Build ✓ Pass
Biome lint ✓ Pass
Unit tests 18/19 Pass (1 failing)

Recommendation

Request changes - The failing test needs to be fixed before merge. The false positive detector needs to check the original text, not just the converted output.

@fro-bot
Copy link
Copy Markdown
Collaborator

fro-bot commented Feb 14, 2026

Code Review

Summary

The PR makes three changes: (1) wraps command templates with XML instruction tags, (2) removes fragile CEP-rooted integration tests, and (3) stabilizes opencode integration tests with explicit model specification.

Issues Found

1. Failing Test - False Positive Detection Logic Bug ⚠️

The test does not flag lowercase task nouns as false positives is failing:

error: expect(received).toEqual(expected)

- []
+ [
+   "False positive: \"task tracking\" - \"Task\" as noun incorrectly lowercased",
+ ]

Root Cause: The detectFalsePositives function at line 8 has flawed logic. It receives _original (note the underscore prefix - intentionally ignored) but only checks if lowercase "task" patterns appear in the converted text, without verifying whether the original already had lowercase "task":

  • Original: "Use task tracking for ongoing work." (already lowercase)
  • Converter correctly leaves it unchanged
  • But detector finds "task tracking" in converted and incorrectly flags it as a false positive

Fix needed: The detector should compare original vs. converted to verify uppercase "Task" was actually lowercased, not just detect lowercase occurrences:

function detectFalsePositives(original: string, converted: string): string[] {
  const falsePositives: string[] = []
  const taskNounLowercased = [
    /complete the task\b/g,
    /\beach task\b/g,
    /\btask management\b/g,
    /\btask tracking\b/g,
    /\btask list\b(?! |$)/g,
  ]
  for (const pattern of taskNounLowercased) {
    // Check BOTH: exists in converted AND NOT in original
    const inConverted = pattern.test(converted)
    const inOriginal = pattern.test(original)
    if (inConverted && !inOriginal) {
      falsePositives.push(
        `False positive: \"${matches[0]}\" - \"Task\" as noun incorrectly lowercased`,
      )
    }
  }
  return falsePositives
}

2. Command Template Wrapping ✓

The new template structure in config-handler.ts:106-113 looks correct:

const wrappedTemplate = `<command-instruction>
${body.trim()}
</command-instruction>

<user-request>
$ARGUMENTS
</user-request>`

This properly wraps command content and adds a placeholder for user arguments, consistent with how skills are handled in skill-loader.ts.

3. Test Stabilization ✓

  • Removing CEP-rooted integration tests is appropriate (they were fragile and depended on external CEP repo)
  • Adding explicit model specification (OPENCODE_TEST_MODEL = opencode/big-pickle) should improve test reliability

4. Workflow Concurrency Fix ✓

The .github/workflows/fro-bot.yaml change from cancel-in-progress: true to false and fixing the group name from github.event.issue.number to github.event.number is appropriate for allowing parallel review runs.

Verification

Check Status
TypeScript typecheck ✓ Pass
Biome lint ✓ Pass
Unit tests 11/12 Pass (1 failing - the bug above)

Recommendation

Request changes - The failing test needs to be fixed before merge. The false positive detector needs to check both original and converted text, not just converted output.

@marcusrbrown
Copy link
Copy Markdown
Owner Author

@fro-bot the tests are working in CI, confirm the check is green. if not, resolve the issue on this branch

The detectFalsePositives function was incorrectly flagging lowercase task
nouns as false positives due to regex state persistence. The .test() method
with 'g' flag maintains lastIndex between calls, causing the second test
(on original string) to start from the wrong position and return false.

Fix: Reset lastIndex=0 before each .test() call to ensure consistent
behavior across original and converted strings.
@marcusrbrown marcusrbrown merged commit 5a81409 into main Feb 14, 2026
11 checks passed
@marcusrbrown marcusrbrown deleted the fix/integration-tests branch February 14, 2026 07:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants