
Commit bd43191

chore: add root pickled.yml for agent-legibility checks
Pickled (https://docs.pickled.dev) runs scripted scenarios across a matrix of interfaces, sources, and toolsets and scores answers with deterministic checks. This config covers one scenario today (the custom React toolbar question) across two interfaces (Claude Code on Haiku, OpenAI Responses), two sources (none, the superdoc_docs llms-full.txt bundle), and four toolsets (none, web, the official SuperDoc Mintlify docs MCP, Context7 MCP): 2 × 2 × 4 = 16 cells per run.

The file sits alongside evals/ rather than under it. evals/ is the Promptfoo suite that scores the SuperDoc tool surface; this config is the outside-in view of how agents talk about SuperDoc when asked to build with it.

Run with: bunx @pickled-dev/cli check .
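The count of 16 cells follows if Pickled takes the full cartesian product of the scenario's three matrix axes. A minimal TypeScript sketch of that expansion, assuming cartesian-product semantics (the `Cell` and `Matrix` types and the `expandMatrix` helper are hypothetical illustrations, not Pickled's API):

```typescript
// Hypothetical sketch: expand a scenario matrix into cells as a full
// cartesian product. Axis values are copied from pickled.yml; the types
// and expandMatrix helper are illustrative, not Pickled's real API.

interface Cell {
  iface: string; // target name under `targets:`
  source: string; // `none` or a key under `docs.sources`
  toolset: string; // key under `toolsets:`
}

interface Matrix {
  interfaces: string[];
  sources: string[];
  toolsets: string[];
}

const matrix: Matrix = {
  interfaces: ["quick", "openai_api"],
  sources: ["none", "superdoc_docs"],
  toolsets: ["none", "web", "superdoc_mintlify_mcp", "context7_mcp"],
};

function expandMatrix(m: Matrix): Cell[] {
  const cells: Cell[] = [];
  // One cell per (interface, source, toolset) combination.
  for (const iface of m.interfaces) {
    for (const source of m.sources) {
      for (const toolset of m.toolsets) {
        cells.push({ iface, source, toolset });
      }
    }
  }
  return cells;
}

const cells = expandMatrix(matrix);
console.log(cells.length); // 16 (2 interfaces × 2 sources × 4 toolsets)
```

Under this assumption the `web` and MCP toolsets also run once with `source: superdoc_docs`, so those cells combine two context-delivery paths.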
1 parent 177596b commit bd43191

1 file changed

Lines changed: 91 additions & 0 deletions

pickled.yml

@@ -0,0 +1,91 @@
+# 🥒 pickled.yml - measure what agents understand about SuperDoc
+#
+# Pickled runs the scenarios below across a matrix of interfaces,
+# sources, and toolsets, then scores answers with deterministic checks.
+# Each cell isolates one context-delivery path (docs link / web tools /
+# MCP server) so the report shows where agents do well and where they
+# do not.
+#
+# Quick start:
+#   bunx @pickled-dev/cli check .
+#
+# Docs: https://docs.pickled.dev
+
+tool:
+  name: superdoc
+  description: "Document engine for the modern web (.docx-native editor + SDK + MCP)"
+
+docs:
+  sources:
+    # Official SuperDoc docs bundle. Injected only in cells where
+    # `source: superdoc_docs` is selected; the MCP and web cells use
+    # `source: none` so the toolset is the only delivery path.
+    superdoc_docs: https://docs.superdoc.dev/llms-full.txt
+
+targets:
+  # Claude Code via the Agent SDK. Cheap, fast, matches how most
+  # external users first try SuperDoc inside their IDE.
+  quick:
+    category: cli
+    provider: claude-code
+    model: claude-haiku-4-5
+    maxTurns: 10
+
+  # OpenAI Responses API. The other interface that today supports both
+  # `web` and `mcp` toolsets, so the matrix can cover the same context
+  # modes across two providers.
+  openai_api:
+    category: api
+    provider: openai
+    model: gpt-5.2
+    temperature: 0
+    maxTokens: 4096
+
+toolsets:
+  none: {}
+
+  # Each interface's built-in web tools. On Claude Code this scopes to
+  # WebSearch + WebFetch; on OpenAI it uses the server-side web_search.
+  web:
+    webSearch: true
+    webFetch: true
+
+  # SuperDoc's official Mintlify docs MCP server. Public HTTP endpoint,
+  # no auth. Exposes search_super_doc + query_docs_filesystem_super_doc
+  # so the agent can search docs and read pages as files.
+  superdoc_mintlify_mcp:
+    mcpServers:
+      superdoc:
+        type: http
+        url: https://docs.superdoc.dev/mcp
+
+  # Third-party Context7 index. Requires CONTEXT7_API_KEY in the env.
+  # Kept as a comparison surface alongside the official Mintlify server.
+  context7_mcp:
+    mcpServers:
+      context7:
+        type: http
+        url: https://mcp.context7.com/mcp
+        headers:
+          CONTEXT7_API_KEY: ${CONTEXT7_API_KEY}
+
+scenarios:
+  # Custom React toolbar. The correct answer names SuperDocUIProvider
+  # and useSuperDocUI from the superdoc/ui/react surface. The wrong
+  # answers name the legacy headless toolbar (createHeadlessToolbar)
+  # or reach for activeEditor.commands.
+  - name: "Custom React toolbar surface"
+    prompt: "I am building with SuperDoc in React and want to add a custom toolbar. Which SuperDoc surface should I use, what should I import, and what should I avoid?"
+    matrix:
+      interfaces: [quick, openai_api]
+      sources: [none, superdoc_docs]
+      toolsets: [none, web, superdoc_mintlify_mcp, context7_mcp]
+    expected:
+      symbols:
+        - "SuperDocUIProvider"
+        - "useSuperDocUI"
+      paths:
+        - "superdoc/ui/react"
+      excludes:
+        - "createHeadlessToolbar"
+        - "activeEditor.commands"
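The `expected` block reads like a deterministic check: required symbols and paths must appear in the answer, and excluded strings must not. A minimal TypeScript sketch of such a check, assuming plain substring matching (Pickled's real scoring rules are not shown in this commit; the `Expected` and `CheckResult` types, the `checkAnswer` helper, and the sample answers are all hypothetical):

```typescript
// Hypothetical sketch of a deterministic check over the `expected` block.
// Assumes plain substring matching; Pickled's actual rules may differ.

interface Expected {
  symbols: string[];
  paths: string[];
  excludes: string[];
}

interface CheckResult {
  pass: boolean;
  missing: string[]; // required strings the answer never mentions
  forbidden: string[]; // excluded strings the answer mentions anyway
}

const expected: Expected = {
  symbols: ["SuperDocUIProvider", "useSuperDocUI"],
  paths: ["superdoc/ui/react"],
  excludes: ["createHeadlessToolbar", "activeEditor.commands"],
};

function checkAnswer(answer: string, exp: Expected): CheckResult {
  const missing = [...exp.symbols, ...exp.paths].filter((s) => !answer.includes(s));
  const forbidden = exp.excludes.filter((s) => answer.includes(s));
  return { pass: missing.length === 0 && forbidden.length === 0, missing, forbidden };
}

// Illustrative answers only, not real model output.
const good = 'Import SuperDocUIProvider and useSuperDocUI from "superdoc/ui/react".';
const bad = "Use createHeadlessToolbar and call activeEditor.commands directly.";

console.log(checkAnswer(good, expected).pass); // true
console.log(checkAnswer(bad, expected).pass); // false
```

Plain substring matching would also fail an answer that correctly warns "avoid createHeadlessToolbar", which the prompt's "what should I avoid?" invites; how Pickled treats such mentions is not stated in this config.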