fix: wrap command templates and stabilize integration tests by marcusrbrown · Pull Request #78 · marcusrbrown/systematic

marcusrbrown · 2026-02-14T04:19:35Z

No description provided.

fro-bot · 2026-02-14T06:34:21Z

Code Review

Summary

The PR makes three changes: (1) wraps command templates with XML instruction tags, (2) removes fragile CEP-rooted integration tests, and (3) stabilizes opencode integration tests with explicit model specification.

Issues Found

1. Failing Test - False Positive Detection Logic Bug ⚠️

The new test is failing:

Root Cause: The function has flawed logic. It checks if lowercase "task tracking" appears in the converted text, but doesn't verify whether the original already had lowercase "task":

Original: "Use task tracking for ongoing work." (already lowercase)
Converter correctly leaves it unchanged
But detector finds "task tracking" in converted and incorrectly flags it

Fix needed: The detector should compare original vs. converted to verify uppercase "Task" was actually lowercased, not just detect lowercase occurrences:

2. Command Template Wrapping ✓

The new template structure in looks correct:

This properly wraps command content and adds a placeholder for user arguments.

3. Test Stabilization ✓

Removing CEP-rooted integration tests is appropriate (they were fragile)
Adding explicit model specification () should improve test reliability

Verification

Check	Status
TypeScript typecheck	✓ Pass
Build	✓ Pass
Biome lint	✓ Pass
Unit tests	18/19 Pass (1 failing)

Recommendation

Request changes - The failing test needs to be fixed before merge. The false positive detector needs to check the original text, not just the converted output.

fro-bot · 2026-02-14T06:43:36Z

Code Review

Summary

The PR makes three changes: (1) wraps command templates with XML instruction tags, (2) removes fragile CEP-rooted integration tests, and (3) stabilizes opencode integration tests with explicit model specification.

Issues Found

1. Failing Test - False Positive Detection Logic Bug ⚠️

The test does not flag lowercase task nouns as false positives is failing:

error: expect(received).toEqual(expected)

- []
+ [
+   "False positive: \"task tracking\" - \"Task\" as noun incorrectly lowercased",
+ ]

Root Cause: The detectFalsePositives function at line 8 has flawed logic. It receives _original (note the underscore prefix - intentionally ignored) but only checks if lowercase "task" patterns appear in the converted text, without verifying whether the original already had lowercase "task":

Original: "Use task tracking for ongoing work." (already lowercase)
Converter correctly leaves it unchanged
But detector finds "task tracking" in converted and incorrectly flags it as a false positive

Fix needed: The detector should compare original vs. converted to verify uppercase "Task" was actually lowercased, not just detect lowercase occurrences:

function detectFalsePositives(original: string, converted: string): string[] {
  const falsePositives: string[] = []
  const taskNounLowercased = [
    /complete the task\b/g,
    /\beach task\b/g,
    /\btask management\b/g,
    /\btask tracking\b/g,
    /\btask list\b(?! |$)/g,
  ]
  for (const pattern of taskNounLowercased) {
    // Check BOTH: exists in converted AND NOT in original
    const inConverted = pattern.test(converted)
    const inOriginal = pattern.test(original)
    if (inConverted && !inOriginal) {
      falsePositives.push(
        `False positive: \"${matches[0]}\" - \"Task\" as noun incorrectly lowercased`,
      )
    }
  }
  return falsePositives
}

2. Command Template Wrapping ✓

The new template structure in config-handler.ts:106-113 looks correct:

const wrappedTemplate = `<command-instruction>
${body.trim()}
</command-instruction>

<user-request>
$ARGUMENTS
</user-request>`

This properly wraps command content and adds a placeholder for user arguments, consistent with how skills are handled in skill-loader.ts.

3. Test Stabilization ✓

Removing CEP-rooted integration tests is appropriate (they were fragile and depended on external CEP repo)
Adding explicit model specification (OPENCODE_TEST_MODEL = opencode/big-pickle) should improve test reliability

4. Workflow Concurrency Fix ✓

The .github/workflows/fro-bot.yaml change from cancel-in-progress: true to false and fixing the group name from github.event.issue.number to github.event.number is appropriate for allowing parallel review runs.

Verification

Check	Status
TypeScript typecheck	✓ Pass
Biome lint	✓ Pass
Unit tests	11/12 Pass (1 failing - the bug above)

Recommendation

Request changes - The failing test needs to be fixed before merge. The false positive detector needs to check both original and converted text, not just converted output.

marcusrbrown · 2026-02-14T06:50:00Z

@fro-bot the tests are working in CI, confirm the check is green. if not, resolve the issue on this branch

The detectFalsePositives function was incorrectly flagging lowercase task nouns as false positives due to regex state persistence. The .test() method with 'g' flag maintains lastIndex between calls, causing the second test (on original string) to start from the wrong position and return false. Fix: Reset lastIndex=0 before each .test() call to ensure consistent behavior across original and converted strings.

wrap command templates and stabilize integration tests

387b8ab

marcusrbrown self-assigned this Feb 14, 2026

fro-bot added agent: working and removed agent: working labels Feb 14, 2026

test: remove CEP-rooted converter integration checks

f3a4c76

fro-bot added the agent: working label Feb 14, 2026

ci(fro-bot): change cancel-in-progress to false

67d4ef4

fro-bot added agent: working and removed agent: working labels Feb 14, 2026

marcusrbrown merged commit 5a81409 into main Feb 14, 2026
11 checks passed

marcusrbrown deleted the fix/integration-tests branch February 14, 2026 07:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: wrap command templates and stabilize integration tests#78

fix: wrap command templates and stabilize integration tests#78
marcusrbrown merged 4 commits into
mainfrom
fix/integration-tests

marcusrbrown commented Feb 14, 2026

Uh oh!

fro-bot commented Feb 14, 2026

Uh oh!

fro-bot commented Feb 14, 2026

Uh oh!

marcusrbrown commented Feb 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

marcusrbrown commented Feb 14, 2026

Uh oh!

fro-bot commented Feb 14, 2026

Code Review

Summary

Issues Found

1. Failing Test - False Positive Detection Logic Bug ⚠️

2. Command Template Wrapping ✓

3. Test Stabilization ✓

Verification

Recommendation

Uh oh!

fro-bot commented Feb 14, 2026

Code Review

Summary

Issues Found

1. Failing Test - False Positive Detection Logic Bug ⚠️

2. Command Template Wrapping ✓

3. Test Stabilization ✓

4. Workflow Concurrency Fix ✓

Verification

Recommendation

Uh oh!

marcusrbrown commented Feb 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants