Commit e79b753
feat: Adding new set validator (#543)
Issue description:
We need a validator that checks whether specific tools were called during test
execution without requiring a particular order, unlike the existing
ToolCallValidator, which enforces sequential order. This makes it possible to
test vaguer prompts, such as `My payments-latency SLO is
breaching. Can you investigate the root cause?`, and to assert only that the
expected tool or tools were called. The validator is being introduced for new
tests (to be submitted in a separate PR) covering the Change
Indicators tool introduced as part of
awslabs/mcp#1944
Description of changes:
Added ToolCallSetValidator class to mcp-testing/evals/core/validator.py:
- Validates that all expected tools are called, regardless of order
- Supports filtering out file-related tools via ignore_file_tools
parameter
- Returns detailed validation results including missing tools, extra
tools called, and the complete list of called tools
- Provides clear pass/fail criteria with reasoning for test results
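
The behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code added by this commit: the constructor signature, the `validate` method, the result keys, the contents of `FILE_TOOLS`, and the tool names in the usage lines are all assumptions made for the example. It also assumes that extra tool calls are reported but do not fail the check, which the description does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List

# Assumption: placeholder names for "file-related" tools. The real list is
# defined by the eval framework and may differ.
FILE_TOOLS = frozenset({"read_file", "write_file", "list_files"})


@dataclass
class ToolCallSetValidator:
    """Order-independent check that every expected tool was called (sketch)."""

    expected_tools: List[str]
    ignore_file_tools: bool = True

    def validate(self, called_tools: Iterable[str]) -> Dict[str, Any]:
        # Keep call order for reporting, optionally dropping file tools.
        called = [
            name
            for name in called_tools
            if not (self.ignore_file_tools and name in FILE_TOOLS)
        ]

        # Set comparison: order and repeat calls do not matter.
        expected = set(self.expected_tools)
        seen = set(called)
        missing = sorted(expected - seen)
        extra = sorted(seen - expected)

        # Assumption: only missing tools cause a failure; extras are reported.
        passed = not missing
        if passed:
            reasoning = "All expected tools were called."
        else:
            reasoning = "Missing expected tools: " + ", ".join(missing)

        return {
            "passed": passed,
            "reasoning": reasoning,
            "missing_tools": missing,
            "extra_tools": extra,
            "called_tools": called,
        }


# Usage with hypothetical tool names.
validator = ToolCallSetValidator(expected_tools=["tool_a", "tool_b"])
result = validator.validate(["tool_b", "read_file", "tool_c", "tool_a"])
print(result["passed"], result["missing_tools"], result["extra_tools"])
```

Comparing sets rather than sequences is what lets a vague prompt pass no matter which tool the agent reaches for first, while the ordered `called_tools` list is still returned for debugging.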
Rollback procedure:
Yes, this commit can be safely reverted. It only adds a new validator
class without modifying existing functionality. No migration or cleanup
steps are required.
By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license.
---------
Co-authored-by: Omkar Masur <omasur@amazon.com>
1 parent 067eb03 commit e79b753
1 file changed
Lines changed: 78 additions & 0 deletions
mcp-testing/evals/core/validator.py: new lines 319-396 added after existing line 318 (diff content not captured)