While these evals are now running, we are seeing failures that need to be addressed:
$ mcpchecker summary ~/Downloads/mcpchecker-results/mcpchecker-openai-agent-kubernetes-test-out.json
=== Evaluation Summary ===
✓ clone-vm (assertions: 3/3)
✓ create-basic-vm (assertions: 3/3)
✓ create-ubuntu-vm (assertions: 3/3)
✓ create-vm-with-instancetype (assertions: 3/3)
✓ create-vm-with-size (assertions: 3/3)
✓ create-vm-with-vlan (assertions: 3/3)
✗ delete-vm
failed to setup task: setup[0] failed: script execution failed: exit status 1
output: namespace/vm-test created
virtualmachine.kubevirt.io/deleted-vm created
virtualmachine.kubevirt.io/deleted-vm condition met
error: timed out waiting for the condition on virtualmachines/deleted-vm
✗ pause-vm (assertions: 3/3)
verification failed: verify[0] failed: script execution failed: exit status 1
output: error: timed out waiting for the condition on virtualmachines/paused-vm
✗ restore-vm (assertions: 1/3)
verification failed: verify[0] failed: script execution failed: exit status 1
output: ERROR: VirtualMachineRestore 'test-restore' not found
- ToolsUsed: Required tool not called: server=kubernetes, tool=, pattern=.*
- MinToolCalls: Too few tool calls: expected >= 1, got 0
✗ snapshot-vm (assertions: 1/3)
verification failed: verify[0] failed: script execution failed: exit status 1
output: ERROR: VirtualMachineSnapshot 'test-snapshot' not found
- ToolsUsed: Required tool not called: server=kubernetes, tool=, pattern=.*
- MinToolCalls: Too few tool calls: expected >= 1, got 0
✗ troubleshoot-vm (assertions: 3/3)
verification failed: verify[0] failed: script execution failed: exit status 1
output: === Verification: Checking if agent fixed the VM ===
✓ VirtualMachine broken-vm exists
✗ Secret vm-cloud-init was not created - agent did not fix the issue
✗ update-vm-resources (assertions: 1/3)
verification failed: verify[0] failed: script execution failed: exit status 1
output: virtualmachine.kubevirt.io/test-vm-update condition met
VirtualMachine test-vm-update created successfully
✗ Expected 2 CPU cores, found: 1
- ToolsUsed: Required tool not called: server=kubernetes, tool=, pattern=.*
- MinToolCalls: Too few tool calls: expected >= 1, got 0
Tasks: 6/12 passed (50.00%)
Assertions: 27/33 passed (81.82%)
Agent used tokens:
Input: 48283 tokens
Output: 20761 tokens
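For triage, the six failures in the summary appear to split into three groups: one setup failure (delete-vm), five verification failures, and three tasks where the agent made no tool calls at all (restore-vm, snapshot-vm, update-vm-resources). Below is a rough sketch for bucketing them from the summary text. It is not part of mcpchecker; the `triage` helper and the bucket names are hypothetical, and it assumes that task names contain no whitespace and that the line prefixes match the output pasted here.

```python
import re
from collections import defaultdict

# A task line is a status mark, a single-token task name, and an optional
# assertion count. Assumption: task names never contain whitespace, which
# keeps nested script output such as "✗ Expected 2 CPU cores, found: 1"
# from being mistaken for a task.
TASK_RE = re.compile(r"^([✓✗]) (\S+)(?: \(assertions: (\d+)/(\d+)\))?$")


def triage(summary: str) -> dict[str, list[str]]:
    """Group failed task names into coarse, hypothetical failure buckets."""
    buckets = defaultdict(list)
    current = None  # name of the failed task whose detail lines we are reading
    for raw in summary.splitlines():
        line = raw.strip()
        match = TASK_RE.match(line)
        if match:
            # Only failed tasks are tracked; a passing task resets the state.
            current = match.group(2) if match.group(1) == "✗" else None
            continue
        if current is None:
            continue
        if line.startswith("failed to setup task"):
            buckets["setup"].append(current)
        elif line.startswith("verification failed"):
            buckets["verification"].append(current)
        elif line.startswith("- MinToolCalls"):
            buckets["no_tool_calls"].append(current)
    return dict(buckets)


# Excerpt copied from the summary above.
SAMPLE = """\
✓ clone-vm (assertions: 3/3)
✗ delete-vm
failed to setup task: setup[0] failed: script execution failed: exit status 1
✗ pause-vm (assertions: 3/3)
verification failed: verify[0] failed: script execution failed: exit status 1
✗ restore-vm (assertions: 1/3)
verification failed: verify[0] failed: script execution failed: exit status 1
- MinToolCalls: Too few tool calls: expected >= 1, got 0
"""

for bucket, tasks in triage(SAMPLE).items():
    print(f"{bucket}: {', '.join(tasks)}")
```

The three buckets likely need different fixes: the setup failure and the timeouts point at the test scripts or the cluster, while the zero-tool-call tasks point at the agent never reaching the kubernetes server.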
/cc @ksimon1
/cc @codingben
/cc @Ruclo
#835 (comment)