Tool Categories — Flexible Tool Matching by Intent for AI Agent Tests

Problem: Different AI agents use different tool names for the same action. Your test expects read_file, but the agent uses bash cat. The test fails even though the behavior is correct.

Solution: EvalView's tool categories let you test by intent instead of exact tool name. Define file_read and it matches read_file, bash cat, text_editor, and more.

Before (Brittle)

expected:
  tools:
    - read_file      # Fails if agent uses bash, text_editor, etc.

After (Flexible)

expected:
  categories:
    - file_read      # Passes for read_file, bash cat, text_editor, etc.

Built-in Categories

Category	Matches
`file_read`	read_file, bash, text_editor, cat, view, str_replace_editor
`file_write`	write_file, bash, text_editor, edit_file, create_file
`file_list`	list_directory, bash, ls, find, directory_tree
`search`	grep, ripgrep, bash, search_files, code_search
`shell`	bash, shell, terminal, execute, run_command
`web`	web_search, browse, fetch_url, http_request, curl
`git`	git, bash, git_commit, git_push, github
`python`	python, bash, python_repl, execute_python, jupyter

Custom Categories

Add project-specific categories in config.yaml:

# .evalview/config.yaml
tool_categories:
  database:
    - postgres_query
    - mysql_execute
    - sql_run
  my_custom_api:
    - internal_api_call
    - legacy_endpoint

Why This Matters

Different agents use different tools for the same task. Categories let you test behavior, not implementation.

For example, all of these accomplish "read a file":

Claude Code: read_file
OpenAI: bash with cat
Custom agent: text_editor

With categories, your test passes for all of them:

expected:
  categories:
    - file_read  # All three approaches pass

Combining Tools and Categories

You can mix exact tool names and categories:

expected:
  tools:
    - my_specific_tool    # Must use this exact tool
  categories:
    - file_read           # Plus any file reading approach

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Tool Categories — Flexible Tool Matching by Intent for AI Agent Tests

Before (Brittle)

After (Flexible)

Built-in Categories

Custom Categories

Why This Matters

Combining Tools and Categories

Related Documentation

FilesExpand file tree

TOOL_CATEGORIES.md

Latest commit

History

TOOL_CATEGORIES.md

File metadata and controls

Tool Categories — Flexible Tool Matching by Intent for AI Agent Tests

Before (Brittle)

After (Flexible)

Built-in Categories

Custom Categories

Why This Matters

Combining Tools and Categories

Related Documentation