Problem: Different AI agents use different tool names for the same action. Your test expects
read_file, but the agent usesbash cat. The test fails even though the behavior is correct.Solution: EvalView's tool categories let you test by intent instead of exact tool name. Define
file_readand it matchesread_file,bash cat,text_editor, and more.
expected:
tools:
- read_file # Fails if agent uses bash, text_editor, etc.expected:
categories:
- file_read # Passes for read_file, bash cat, text_editor, etc.| Category | Matches |
|---|---|
file_read |
read_file, bash, text_editor, cat, view, str_replace_editor |
file_write |
write_file, bash, text_editor, edit_file, create_file |
file_list |
list_directory, bash, ls, find, directory_tree |
search |
grep, ripgrep, bash, search_files, code_search |
shell |
bash, shell, terminal, execute, run_command |
web |
web_search, browse, fetch_url, http_request, curl |
git |
git, bash, git_commit, git_push, github |
python |
python, bash, python_repl, execute_python, jupyter |
Add project-specific categories in config.yaml:
# .evalview/config.yaml
tool_categories:
database:
- postgres_query
- mysql_execute
- sql_run
my_custom_api:
- internal_api_call
- legacy_endpointDifferent agents use different tools for the same task. Categories let you test behavior, not implementation.
For example, all of these accomplish "read a file":
- Claude Code:
read_file - OpenAI:
bashwithcat - Custom agent:
text_editor
With categories, your test passes for all of them:
expected:
categories:
- file_read # All three approaches passYou can mix exact tool names and categories:
expected:
tools:
- my_specific_tool # Must use this exact tool
categories:
- file_read # Plus any file reading approach