
Commit 1a4b043

Tyler Payne authored and committed
--model is optional if no --test
Update README to reflect this new usage as well as some additional cleanup
1 parent 852702a commit 1a4b043

7 files changed

Lines changed: 1662 additions & 436 deletions


README.md

Lines changed: 146 additions & 39 deletions
@@ -1,8 +1,35 @@
-# mcp-interviewer
-
-The MCP Interviewer is a Python CLI tool that helps you ***catch MCP server issues before your agents do.***
-
-It does this via the following features:
+<div align="center">
+
+<h1>MCP Interviewer</h1>
+<div>
+<i>
+A Python CLI tool that helps you catch MCP server issues before your agents do.
+</i>
+</div>
+<a href="./README.md#installation">PyPI (coming soon!)</a> | <a href="./README.md">Blog</a> | <a href="./mcp-interview.md">Example</a>
+</div>
+
+---
+
+## Table of Contents
+
+- [How it works](#how-it-works)
+  - [🔎 Constraint checking](#-constraint-checking)
+  - [🛠️ Functional testing](#️-functional-testing)
+  - [🤖 LLM evaluation](#-llm-evaluation)
+  - [📋 Report generation](#-report-generation)
+- [Quick Start](#quick-start)
+- [Installation](#installation)
+- [Example](#example)
+- [Usage](#usage)
+  - [CLI](#cli)
+  - [Bring Your Own Models](#bring-your-own-models)
+  - [Python](#python)
+- [Limitations](#limitations)
+- [Contributing](#contributing)
+- [Trademarks](#trademarks)
+
+## How it works

 ### 🔎 Constraint checking

@@ -28,16 +55,16 @@ Use `--constraints [CODE ...]` to customize output.

 ### 🛠️ Functional testing

-MCP servers are intended to be used by LLM agents, so we can optionally test them with an LLM agent. When enabled with the `--test` flag, the interviewer uses your specified LLM to generate a test plan based on the MCP server's capabilities and then executes that plan (e.g. by calling tools), collecting statistics about observed tool behavior.
+MCP servers are intended to be used by LLM agents, so the interviewer can optionally test them with an LLM agent. When enabled with the `--test` flag, the interviewer uses your specified LLM to generate a test plan based on the MCP server's capabilities and then executes that plan (e.g. by calling tools), collecting statistics about observed tool behavior.

-### 🧪 LLM evaluation
+### 🤖 LLM evaluation

 ***Note: this is an experimental feature. All LLM generated evaluations should be manually inspected for errors.***

 The interviewer can also use your specified LLM to provide structured and natural language evaluations of the server's features.


-### 📋 Reports
+### 📋 Report generation

 The interviewer generates a Markdown report (and accompanying `.json` file with raw data) summarizing the interview results.

@@ -63,39 +90,62 @@ Use `--reports [CODE ...]` to customize output.

 </details>

+## Installation
+
+### As a CLI tool
+
+The easiest way to install `mcp-interviewer` is as a `uv` tool. Follow [these instructions](https://docs.astral.sh/uv/getting-started/installation/) to install uv.
+
+```bash
+uv tool install --from "git+ssh://git@github.com/microsoft/mcp-interviewer.git" mcp-interviewer
+
+# Then,
+mcp-interviewer ...
+```
+
+Read more about [CLI usage](./README.md#cli).
+
+### As a dependency
+
+Via `uv`
+
+```bash
+uv add git+ssh://git@github.com/microsoft/mcp-interviewer.git
+```
+
+Via `pip`
+
+```bash
+pip install git+ssh://git@github.com/microsoft/mcp-interviewer.git
+```
+
+Read more about [Python usage](./README.md#python).

 ## Quick Start

 ⚠️ ***mcp-interviewer arbitrarily executes the provided MCP server command in a child process. Whenever possible, run your server in a container like in the examples below to isolate the server from your host system.***

-🚨 ***mcp-interviewer actually invokes the server's tools, DO NOT use mcp-interviewer with admin privileges etc***
+First, [install](./README.md#as-a-cli-tool) `mcp-interviewer` as a CLI tool.

 ```bash
 # Command to run npx safely inside a Docker container
 NPX_CONTAINER="docker run -i --rm node:lts npx"

-# Test any MCP server with one command
-uvx mcp-interviewer \
-  --model gpt-4o \
+# Interview the MCP reference server
+mcp-interviewer \
   "$NPX_CONTAINER -y @modelcontextprotocol/server-everything"
 ```

-Generates `mcp-interview.md` and `mcp-interview.json` with a full evaluation report.
-
-## Installation
-
-```bash
-pip install git+ssh://git@github.com/microsoft/mcp-interviewer.git
-```
+Generates a Markdown report, `mcp-interview.md`, and the corresponding JSON data, `mcp-interview.json`.

 ## Example

-To interview the MCP reference server, you can run the following command:
+To interview the MCP reference server with constraint checking and functional testing, you can run the following command:

 ```bash
 NPX_CONTAINER="docker run -i --rm node:lts npx"

-mcp-interviewer --model gpt-4o "$NPX_CONTAINER -y @modelcontextprotocol/server-everything"
+mcp-interviewer --test --model gpt-4.1 "$NPX_CONTAINER -y @modelcontextprotocol/server-everything"
 ```

 Which will generate a report like [this](./mcp-interview.md).
@@ -105,32 +155,47 @@ Which will generate a report like [this](./mcp-interview.md).
 ### CLI

 **Key Flags:**
-- `--test`: Enable functional testing (disabled by default for faster execution)
-- `--judge`: Enable experimental LLM evaluation of tools and tests
-- `--reports [CODE ...]`: Customize which report sections to include
+
 - `--constraints [CODE ...]`: Customize which constraints to check
+- `--reports [CODE ...]`: Customize which report sections to include
+
+
+
+- `--test`: Enable functional testing. 🚨 ***This option causes mcp-interviewer to invoke the server's tools. Be careful to limit the server's access to your host system, sensitive data, etc., before using this option.***
+- `--judge-tools`: Enable experimental LLM evaluation of tools
+- `--judge-test`: Enable experimental LLM evaluation of functional tests (requires `--test`)
+- `--judge`: Enable all LLM evaluation (equivalent to `--judge-tools --judge-test`)

 ```bash
 # Docker command to run uvx inside a container
 UVX_CONTAINER="docker run -i --rm ghcr.io/astral-sh/uv:python3.12-alpine uvx"

-# Basic constraint checking and server inspection (no functional testing)
-mcp-interviewer --model gpt-4o "$UVX_CONTAINER mcp-server-fetch"
+# Basic constraint checking, server inspection, and report generation (no --model needed)
+mcp-interviewer "$UVX_CONTAINER mcp-server-fetch"
+
+# Add functional testing with --test (requires --model)
+mcp-interviewer --model gpt-4.1 --test "$UVX_CONTAINER mcp-server-fetch"
+
+# Add LLM tool evaluation with --judge-tools (requires --model)
+mcp-interviewer --model gpt-4.1 --judge-tools "$UVX_CONTAINER mcp-server-fetch"
+
+# Add LLM test evaluation with --judge-test (requires --model and --test)
+mcp-interviewer --model gpt-4.1 --test --judge-test "$UVX_CONTAINER mcp-server-fetch"

-# Constraint checking with functional testing and default report generation
-mcp-interviewer --model gpt-4o --test "$UVX_CONTAINER mcp-server-fetch"
+# Add all LLM evaluation with --judge (requires --model and --test)
+mcp-interviewer --model gpt-4.1 --test --judge "$UVX_CONTAINER mcp-server-fetch"

-# Constraint checking, functional testing, LLM evaluation, default report generation
-mcp-interviewer --model gpt-4o --test --judge "$UVX_CONTAINER mcp-server-fetch"
+# Customize report sections with --reports
+mcp-interviewer --model gpt-4.1 --test --reports SI TS FT CV "$UVX_CONTAINER mcp-server-fetch"

-# Constraint checking with functional testing and custom report generation
-mcp-interviewer --model gpt-4o --test --reports SI TS FT CV "$UVX_CONTAINER mcp-server-fetch"
+# Customize constraint checking with --constraints
+mcp-interviewer --constraints OTC ONL "$UVX_CONTAINER mcp-server-fetch"

-# Custom constraint checking with functional testing and report generation
-mcp-interviewer --model gpt-4o --test --constraints OTC ONL "$UVX_CONTAINER mcp-server-fetch"
+# Fail on constraint warnings for CI/CD pipelines
+mcp-interviewer --fail-on-warnings "$UVX_CONTAINER mcp-server-fetch"

 # Test remote servers
-mcp-interviewer --model gpt-4o "https://my-mcp-server.com/sse"
+mcp-interviewer "https://my-mcp-server.com/sse"
 ```

 ### Bring Your Own Models
@@ -141,14 +206,15 @@ The CLI provides two ways of customizing your model client:

 1. `openai.OpenAI` keyword arguments

-You can provide keyword arguments to the OpenAI client constructor via the "--client-kwargs" CLI option. For example, to connect to gpt-oss:20b running locally via Ollama:
+You can provide keyword arguments to the OpenAI client constructor via the `--client-kwargs` CLI option. For example, to connect to gpt-oss:20b running locally via Ollama for LLM features:

 ```bash
 mcp-interviewer \
   --client-kwargs \
     "base_url=http://localhost:11434/v1" \
     "api_key=ollama" \
   --model "gpt-oss:20b" \
+  --test \
   "docker run -i --rm node:lts npx -y @modelcontextprotocol/server-everything"
 ```

@@ -166,13 +232,31 @@ The CLI provides two ways of customizing your model client:
 ```bash
 mcp-interviewer \
   --client "my_client.azure_client" \
-  --model "gpt-4o_2024-11-20" \
+  --model "gpt-4.1_2024-11-20" \
+  --test \
   "docker run -i --rm node:lts npx -y @modelcontextprotocol/server-everything"
 ```


 ### Python

+**Basic usage (constraint checking and server inspection only):**
+
+```python
+from mcp_interviewer import MCPInterviewer, StdioServerParameters
+
+params = StdioServerParameters(
+    command="docker",
+    args=["run", "-i", "--rm", "node:lts", "npx", "-y", "@modelcontextprotocol/server-everything"]
+)
+
+# No client or model needed for basic functionality
+interviewer = MCPInterviewer(None, None)
+interview = await interviewer.interview_server(params)
+```
+
+**With LLM features (functional testing and evaluation):**
+
 ```python
 from openai import OpenAI
 from mcp_interviewer import MCPInterviewer, StdioServerParameters
@@ -184,13 +268,36 @@ params = StdioServerParameters(
     args=["run", "-i", "--rm", "node:lts", "npx", "-y", "@modelcontextprotocol/server-everything"]
 )

-interviewer = MCPInterviewer(client, "gpt-4o", should_run_functional_test=True)
+interviewer = MCPInterviewer(client, "gpt-4.1", should_run_functional_test=True)
 interview = await interviewer.interview_server(params)
 ```

+**Using the main function directly (includes constraint checking and report generation):**
+
+```python
+from mcp_interviewer import main, StdioServerParameters
+
+params = StdioServerParameters(
+    command="docker",
+    args=["run", "-i", "--rm", "node:lts", "npx", "-y", "@modelcontextprotocol/server-everything"]
+)
+
+# Basic usage: no client or model needed
+exit_code = main(None, None, params)
+
+# With LLM features
+from openai import OpenAI
+client = OpenAI()
+exit_code = main(client, "gpt-4.1", params, should_run_functional_test=True)
+```
+
 ## Limitations

-MCP Interviewer was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios. The MCP Python SDK executes arbitrary commands on the host machine, so users should run server commands in isolated containers and use external security tools to validate MCP server safety before running MCP Interviewer. Additionally, MCP Servers may have malicious or misleading tool metadata that may cause inaccurate MCP Interviewer outputs. Users should manually examine MCP Interviewer outputs for signs of malicious manipulation.
+MCP Interviewer was developed for research and experimental purposes. Further testing and validation are needed before considering its application in commercial or real-world scenarios.
+
+The MCP Python SDK executes arbitrary commands on the host machine, so users should run server commands in isolated containers and use external security tools to validate MCP server safety before running MCP Interviewer.
+
+Additionally, MCP servers may have malicious or misleading tool metadata that can cause inaccurate MCP Interviewer outputs. Users should manually examine MCP Interviewer outputs for signs of malicious manipulation.

 See [TRANSPARENCY.md](./TRANSPARENCY.md) for more information.

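The central change in this commit is that `--model` is now optional unless an LLM-backed feature is requested. The updated README spells out these rules:

- `--test`, `--judge-tools`, `--judge-test`, and `--judge` require `--model`.
- `--judge-test` and `--judge` also require `--test`.
- `--judge` is shorthand for `--judge-tools --judge-test`.

The Python sketch below encodes those rules as a small validation function. It is a hypothetical illustration only. The `Flags` dataclass, the `check_flags` function, and the error messages are assumptions and do not reflect mcp-interviewer's actual argument parsing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flags:
    """Hypothetical container mirroring the CLI flags described in the README."""
    model: Optional[str] = None
    test: bool = False
    judge_tools: bool = False
    judge_test: bool = False
    judge: bool = False


def check_flags(flags: Flags) -> list[str]:
    """Return the problems with a flag combination (an empty list means valid).

    Hypothetical sketch of the documented rules, not the real mcp-interviewer parser.
    """
    # --judge is documented as equivalent to --judge-tools --judge-test.
    judge_tools = flags.judge_tools or flags.judge
    judge_test = flags.judge_test or flags.judge

    problems = []
    # Every LLM-backed feature needs a model; plain constraint checking does not.
    if (flags.test or judge_tools or judge_test) and flags.model is None:
        problems.append("--model is required with --test, --judge-tools, --judge-test, or --judge")
    # Judging functional tests only makes sense if functional tests ran.
    if judge_test and not flags.test:
        problems.append("--judge-test and --judge require --test")
    return problems


if __name__ == "__main__":
    # Like running `mcp-interviewer --judge ...` without --model or --test.
    for problem in check_flags(Flags(judge=True)):
        print(problem)
```

Checking the rules in one place keeps the "no model needed" path (constraint checking, inspection, and reports) free of any LLM configuration. This is the behavior the commit message describes.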