feat: add evals for fork-specific features (v0.1.2) #79

Merged
kanfil merged 2 commits into main from evals-refactor-v0.1.2
Mar 17, 2026

@kanfil
Member

@kanfil kanfil commented Mar 14, 2026

Summary

Add test coverage for agentic-sdlc preset functionality that was missing from the upstream evals framework.

Changes

New Tests (7 total)

  • Mission Brief suite (4 tests): Tests the new Mission Brief enforcement in /adlc.spec.specify

    • Completeness: Goal, Success Criteria, Constraints, Demo Sentence present
    • Quality: Goal is concise, criteria are measurable, demo is observable
    • Constraint extraction: technical, business, regulatory constraints captured
    • Approval flow: "Proceed with this Mission Brief?" prompt present
  • Fork spec sections (3 tests): Tests fork-specific spec template sections

    • Goal, Demo Sentence, Boundary Map present
    • Boundary Map has Produces/Consumes structure
    • Constraints are extracted and documented
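The Boundary Map structure check listed above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `boundary_map_section` and `has_produces_consumes` are hypothetical, and it assumes the spec is Markdown with a `Boundary Map` heading whose body mentions both Produces and Consumes.

```python
import re
from typing import Optional


def boundary_map_section(spec_text: str) -> Optional[str]:
    """Return the body under a 'Boundary Map' heading, or None if absent.

    Hypothetical helper: the body runs until the next heading at the same
    or a shallower level, so nested sub-headings stay inside the section.
    """
    level = None
    collected = []
    for line in spec_text.splitlines():
        heading = re.match(r"^(#+)\s*(.*?)\s*$", line)
        if level is None:
            # Still looking for the Boundary Map heading itself.
            if heading and heading.group(2).lower() == "boundary map":
                level = len(heading.group(1))
            continue
        if heading and len(heading.group(1)) <= level:
            break  # a sibling or parent heading ends the section
        collected.append(line)
    return "\n".join(collected) if level is not None else None


def has_produces_consumes(spec_text: str) -> bool:
    """True when the Boundary Map section names both Produces and Consumes."""
    section = boundary_map_section(spec_text)
    if section is None:
        return False
    return bool(re.search(r"\bproduces\b", section, re.IGNORECASE)) and bool(
        re.search(r"\bconsumes\b", section, re.IGNORECASE)
    )
```

Scoping the search to the section body, rather than the whole spec, avoids a false pass when "Consumes" only appears elsewhere in the document.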

New Graders

  • check_mission_brief_completeness() - validates Mission Brief has all required elements
  • check_mission_brief_quality() - validates quality of each Mission Brief element
  • check_fork_spec_sections() - validates fork-specific spec sections
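For readers unfamiliar with the grader shape, here is a rough sketch of what a completeness grader along these lines might look like. It is an assumption-laden illustration rather than the implementation in `evals/graders/custom_graders.py`: the `(output, context)` signature and the `pass`/`score`/`reason` result dict are assumed from promptfoo's Python assertion convention, and `REQUIRED_ELEMENTS` is a hypothetical constant built from the four elements named in the Mission Brief suite.

```python
import re

# Hypothetical constant: the four elements the completeness test looks for.
REQUIRED_ELEMENTS = ("Goal", "Success Criteria", "Constraints", "Demo Sentence")


def check_mission_brief_completeness(output: str, context: dict) -> dict:
    """Sketch of a grader: pass only if every required element is mentioned.

    Assumes a promptfoo-style Python assertion that receives the model
    output and a context dict and returns a grading-result dict.
    """
    missing = [
        name
        for name in REQUIRED_ELEMENTS
        if not re.search(rf"\b{re.escape(name)}\b", output, flags=re.IGNORECASE)
    ]
    found = len(REQUIRED_ELEMENTS) - len(missing)
    return {
        "pass": not missing,
        "score": found / len(REQUIRED_ELEMENTS),
        "reason": (
            "All Mission Brief elements present"
            if not missing
            else "Missing: " + ", ".join(missing)
        ),
    }
```

Returning a fractional score alongside the pass flag makes partial briefs visible in the eval report instead of collapsing them into a bare failure.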

Fixes

  • check_extension_manifest() now accepts both speckit.* and adlc.* command patterns
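The widened command pattern might look something like the sketch below. The exact regex in `check_extension_manifest()` is not shown in this PR description, so `COMMAND_PATTERN` and `is_valid_command_name` are hypothetical names, and the assumption that command segments are lowercase alphanumerics with hyphens, separated by dots, is mine.

```python
import re

# Hypothetical pattern: a speckit. or adlc. prefix followed by one or more
# dot-separated lowercase segments (for example adlc.spec.specify).
COMMAND_PATTERN = re.compile(r"(?:speckit|adlc)\.[a-z][a-z0-9-]*(?:\.[a-z][a-z0-9-]*)*")


def is_valid_command_name(name: str) -> bool:
    """Return True when the whole name matches either accepted prefix."""
    return COMMAND_PATTERN.fullmatch(name) is not None
```

Under these assumptions `speckit.specify` and `adlc.spec.specify` are both accepted, while an unknown prefix or an empty trailing segment is rejected. A leading slash, as in `/adlc.spec.specify`, is not handled here and would need to be stripped by the caller.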

Files Changed

  • evals/prompts/spec-prompt.txt - added fork sections (Goal, Demo Sentence, Boundary Map)
  • evals/prompts/mission-brief-prompt.txt - new prompt for Mission Brief tests
  • evals/configs/promptfooconfig-mission-brief.js - new config for Mission Brief suite
  • evals/configs/promptfooconfig.js - added 4 Mission Brief tests
  • evals/configs/promptfooconfig-spec.js - added 3 fork section tests
  • evals/graders/custom_graders.py - added 3 new graders, fixed command pattern regex
  • evals/README.md - updated test counts and documentation
  • pyproject.toml - version bump to 0.1.2
  • CHANGELOG.md - added v0.1.2 entry

Test Results

  • All 356 pytest tests pass
  • Test counts: 29 LLM tests + 39 unit tests across 7 suites (was 22+39 / 6)

Related

  • Follows v0.1.1 which added Mission Brief enforcement to /adlc.spec.specify
  • Part of the fork-specific feature testing initiative

Add test coverage for agentic-sdlc preset functionality:

- Mission Brief test suite: 4 tests for completeness, quality,
  constraint extraction, approval flow
- Fork spec section tests: 3 tests for Goal, Demo Sentence,
  Boundary Map, Constraints
- New graders: check_mission_brief_completeness,
  check_mission_brief_quality, check_fork_spec_sections
- spec-prompt.txt updated with fork-specific sections
- Command pattern grader now accepts both speckit.* and adlc.*

Evals: 29 LLM tests + 39 unit tests across 7 suites (was 22+39 / 6)
@kfinkels
Collaborator

LGTM, but you have conflicts that need to be addressed

@kanfil
Member Author

kanfil commented Mar 16, 2026

@kfinkels merge it, but we do need to do src evals as well

@kanfil kanfil merged commit 3959c83 into main Mar 17, 2026
8 checks passed
kanfil added a commit that referenced this pull request Mar 18, 2026
Merge Strategy:
- Reset to pre-merge commit 2f0852a (last clean tikalk state)
- Re-applied PR #79 evals-refactor changes
- Merged upstream/main with careful conflict resolution

Tikalk-specific code preserved:
- Config management (get_global_config_path, load_config, save_config, etc.)
- Architecture config (get_architecture_diagram_format, get_adr_heuristic, etc.)
- Skills config (get_skills_config, set_skills_config)
- Skill subcommand app with all skill commands (search, install, update, etc.)
- show_skills_banner function
- Orange theme colors (ACCENT_COLOR, BANNER_COLORS)
- Agentic SDLC branding (TAGLINE, show_banner extensions display)
- _run_git_command and sync_team_ai_directives functions
- install_bundled_extensions and install_bundled_presets
- _ensure_commands_for_agent function
- _validate_ai_assistant and _validate_ai_commands_dir callbacks

Upstream features merged:
- New agents: kimi, trae, pi, bob, vibe, tabnine, etc.
- Updated preset system with new PresetCatalog features
- Test improvements and new test files
- Extension system enhancements
- Various bug fixes and improvements

Conflict Resolution:
- Kept tikalk versions: README, pyproject.toml, docs, bash scripts
- Kept upstream + tikalk additions: src/specify_cli/__init__.py
- Kept upstream versions: agents.py, presets.py, tests

All 444 tests pass.
