Skip to content

Commit 674c994

Browse files
LEANDERANTONYclaude
andcommitted
feat(resume-builder): Slice 1F — web_search via function-wrapped OpenAI built-in
User's original ask included "you have all the capabilities to access urls if provided or browse web yourself" — Slice 1A delivered the GitHub-URL path (fetch_github_readme), Slice 1F delivers the general-web path. ARCHITECTURE DECISION (the non-obvious part) First attempt: add OpenAI's built-in {"type": "web_search"} directly to RESUME_BUILDER_TOOL_SPECS. Probed in isolation, works fine. But running it in the actual intake loop blew up every call with: 400 - "Web Search cannot be used with JSON mode." Our intake contract REQUIRES text.format = json_object (structured envelope with draft_updates / assistant_message / status / etc.). OpenAI rejects the combination at the API boundary. Adding web_search silently 400'd every intake turn and the service fell back to the regex step-machine — exactly the silent-fallback pattern Slice 1D pact-tests were built to catch (the agentic eval immediately surfaced the regression: 3/10 passing). Considered alternatives: - Two-call pattern: intake (JSON) decides if search needed → second call (no JSON) executes. Doubles latency, search results only visible to the agent on the NEXT turn. Worse UX than not searching. - External provider (Tavily / Brave / Exa) as a local function tool: clean architecture but adds a new external dependency + API key + cost commitment we shouldn't make without operator approval. - Function-wrap (this slice): expose web_search as a FUNCTION tool to the agent. When the agent calls it, our dispatcher fires its own inner responses.create — WITHOUT json_object format, WITH OpenAI's built-in {"type": "web_search"} — and returns the synthesized text as the function_call_output. Main loop stays JSON-mode safe; agent gets a research capability on-demand. Zero new dependencies, no new API key, same shape as fetch_github_readme. The function wrap is the cleanest landing. WHAT LANDED backend/services/resume_builder_tools.py - _web_search(query, *, openai_service) — fires the inner non-JSON call with built-in web_search tool, extracts synthesized text, caps at 8KB, returns {"ok": True, "result": str} or {"ok": False, "error": str} - WEB_SEARCH_TOOL_SPEC — function-tool shape with required `query` arg - _TOOLS_REQUIRING_OPENAI sentinel set so the dispatcher knows which tools need the OpenAIService forwarded (only web_search; fetch is HTTP-only) - execute_tool() gains an optional openai_service kwarg, forwards to tools that need it backend/services/resume_builder_service.py - _run_llm_turn now binds openai_service into the tool_executor closure so the agent can dispatch web_search through the loop prompts/resume_builder/v1.json + test_prompts.py byte-mirror - "Tools you can call" block now lists web_search alongside fetch_github_readme - Prompt teaches the agent WHEN to use it (external context, company / role norms, industry questions) and WHEN NOT TO (anything the user already shared, generic advice, small talk, speculative queries). Cites the "use sparingly" rule explicitly because each search is a separate API call (latency + cost). tests/backend/test_resume_builder_tools.py - Updated test_tool_spec_includes_web_search to assert the function- tool shape (NOT the server-side shape) - 5 new hermetic tests for _web_search via a stubbed OpenAI client: success path, empty-query reject, no-service reject, dispatch exception captured (never raised), oversize-result truncation, and a guard that fetch_github_readme does NOT receive openai_service (would crash if it did) tests/quality/resume_builder_agentic_runner.py - 2 new LLM scenarios: web_search_fires_on_external_context_question (positive — user asks "what does Anthropic look for on a Senior MLE resume?") web_search_skipped_for_user_provided_info (negative — user is sharing their own background, no search should fire) VERIFICATION - 145 hermetic tests across affected suites green. - 10/10 LLM scenarios pass on the live API (gpt-5.4). Inspection of the web_search scenario shows the model fires the tool ONCE, receives a grounded answer ("Anthropic's Senior MLE postings tend to emphasize strong Python + ML + software engineering, production ML systems, and measurable impact like scale, latency, reliability, or cost improvements..."), and synthesizes a tailored reply with no hallucination. Cost-of-search: each web_search invocation adds one extra responses.create call (gpt-5.4-mini, ~600 tokens). Realistic usage per session: 0-2 invocations (prompt explicitly tells the agent to use this sparingly). Latency: ~1-2s per search. WHAT'S STILL PARKED (Phase 2 remainder) - Full eval expansion to 15-20 fixtures with rubric scoring - ADR-031 documenting the agentic shape Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a2699d8 commit 674c994

6 files changed

Lines changed: 462 additions & 4 deletions

File tree

backend/services/resume_builder_service.py

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2034,7 +2034,16 @@ def _run_llm_turn(
20342034
prompt["system"],
20352035
prompt["user"],
20362036
tools=RESUME_BUILDER_TOOL_SPECS,
2037-
tool_executor=execute_resume_builder_tool,
2037+
# web_search needs an OpenAI client (fires its own
2038+
# responses.create with the built-in search tool from
2039+
# outside JSON mode), so we bind the service into the
2040+
# executor closure. fetch_github_readme ignores the
2041+
# openai_service kwarg and runs HTTP-only.
2042+
tool_executor=lambda call_name, call_args_json: execute_resume_builder_tool(
2043+
call_name,
2044+
call_args_json,
2045+
openai_service=openai_service,
2046+
),
20382047
expected_keys=prompt["expected_keys"],
20392048
# Twelve iterations: covers the realistic worst case of a
20402049
# user pasting ~10 GitHub URLs in one turn and the model

backend/services/resume_builder_tools.py

Lines changed: 205 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -279,26 +279,225 @@ def fetch_github_readme(url: str) -> dict:
279279
}
280280

281281

282+
# ---------------------------------------------------------------------------
283+
# web_search (FUNCTION-wrapped, dispatches to OpenAI's built-in web_search)
284+
# ---------------------------------------------------------------------------
285+
#
286+
# Why this is wrapped as a function tool instead of being a raw
287+
# {"type": "web_search"} server-side tool: OpenAI's API rejects
288+
# combining ``web_search`` with ``text.format = json_object`` —
289+
# the 400 reads "Web Search cannot be used with JSON mode." Our
290+
# main intake loop NEEDS json_object (we return a structured envelope
291+
# with draft_updates / assistant_message / status / etc.), so we
292+
# can't put web_search on the same call.
293+
#
294+
# The wrap: the agent calls ``web_search(query)`` like any other
295+
# function tool. The dispatcher makes a SEPARATE internal
296+
# responses.create call — no json_object format, with OpenAI's
297+
# built-in web_search enabled — and returns the synthesized text as
298+
# the function_call_output. Main loop stays JSON-mode safe; the
299+
# agent gets a research capability on-demand.
300+
#
301+
# Cost shape: each ``web_search`` invocation is ONE extra
302+
# responses.create call (so two API calls per search round-trip:
303+
# the model decides to search, the dispatcher fires the search,
304+
# the model uses the result on the next iteration). The prompt
305+
# tells the agent to use this SPARINGLY, so realistic usage is
306+
# 0–2 web_search calls per session.
307+
#
308+
# Search execution itself runs on OpenAI infrastructure (US-hosted).
309+
# Our EU-data-residency posture (docs/competitive-landscape.md)
310+
# applies to user data we store, not to outbound research queries
311+
# — so this is acceptable for v1. Revisit when ADR-028 D1 multi-
312+
# provider eval lands and a concrete EU alternative exists.
313+
WEB_SEARCH_TIMEOUT_SECONDS = 30.0
314+
# Hard cap on the synthesized result string — searches can return
315+
# long answers, and we feed the result back into the main loop's
316+
# input list (which we want to keep manageable). 8KB ≈ ~2k tokens,
317+
# enough for a research summary, small enough not to bloat history.
318+
WEB_SEARCH_MAX_RESULT_CHARS = 8 * 1024
319+
320+
321+
def _web_search(
322+
query: str,
323+
*,
324+
openai_service: Any | None = None,
325+
) -> dict:
326+
"""Run a web_search via OpenAI's built-in tool and return the
327+
synthesized text result.
328+
329+
Routed through a fresh ``responses.create`` call WITHOUT json
330+
mode so OpenAI's strict-mode check doesn't reject the combination.
331+
The model in this inner call is small + cheap (the search
332+
synthesis doesn't need the main intake model's reasoning depth);
333+
we hard-code ``gpt-5.4-mini`` to keep latency + cost predictable.
334+
335+
Returns ``{"ok": True, "result": str, "query": str}`` on success
336+
or ``{"ok": False, "error": str, "message": str}`` on failure.
337+
Like ``fetch_github_readme``, errors are first-class outputs —
338+
never raised across the tool boundary.
339+
"""
340+
cleaned_query = str(query or "").strip()
341+
if not cleaned_query:
342+
return {
343+
"ok": False,
344+
"error": "empty_query",
345+
"message": "web_search needs a non-empty query.",
346+
}
347+
if openai_service is None or not getattr(openai_service, "is_available", lambda: False)():
348+
return {
349+
"ok": False,
350+
"error": "no_openai_service",
351+
"message": (
352+
"Web search requires an OpenAI client. Ask the user "
353+
"to describe the external context directly."
354+
),
355+
}
356+
client = getattr(openai_service, "_client", None)
357+
if client is None:
358+
return {
359+
"ok": False,
360+
"error": "no_openai_client",
361+
"message": "OpenAI client is not initialized.",
362+
}
363+
364+
try:
365+
response = client.responses.create(
366+
model="gpt-5.4-mini",
367+
instructions=(
368+
"You are a research assistant. Use the web_search tool to "
369+
"answer the user's question with concrete sources. Cite the "
370+
"source domain inline. Keep the answer short — 4 sentences "
371+
"max. If the search returns nothing useful, say so plainly "
372+
"instead of inventing."
373+
),
374+
input=cleaned_query,
375+
store=False,
376+
max_output_tokens=600,
377+
tools=[{"type": "web_search"}],
378+
tool_choice="auto",
379+
)
380+
except Exception as exc:
381+
LOGGER.exception("web_search dispatch failed.")
382+
return {
383+
"ok": False,
384+
"error": "search_dispatch_failed",
385+
"message": f"{type(exc).__name__}: {exc}"[:240],
386+
}
387+
388+
# Extract the synthesized text. The inner response can carry both
389+
# ``web_search_call`` items (OpenAI's record of the search) and
390+
# ``message`` items containing the answer text. We only need the
391+
# text — the search-call items are an implementation detail.
392+
collected: list[str] = []
393+
for item in getattr(response, "output", None) or []:
394+
item_type = getattr(item, "type", None)
395+
if isinstance(item, dict):
396+
item_type = item.get("type")
397+
if item_type != "message":
398+
continue
399+
content = (
400+
item.content
401+
if not isinstance(item, dict)
402+
else item.get("content")
403+
) or []
404+
for part in content:
405+
part_type = getattr(part, "type", None)
406+
if isinstance(part, dict):
407+
part_type = part.get("type")
408+
if part_type == "output_text":
409+
text = (
410+
part.text
411+
if not isinstance(part, dict)
412+
else part.get("text")
413+
)
414+
if text:
415+
collected.append(str(text))
416+
result_text = "\n".join(collected).strip()
417+
if not result_text:
418+
return {
419+
"ok": False,
420+
"error": "empty_result",
421+
"message": "Search returned no synthesized text.",
422+
}
423+
if len(result_text) > WEB_SEARCH_MAX_RESULT_CHARS:
424+
result_text = (
425+
result_text[: WEB_SEARCH_MAX_RESULT_CHARS].rstrip()
426+
+ "…[truncated]"
427+
)
428+
return {"ok": True, "query": cleaned_query, "result": result_text}
429+
430+
431+
WEB_SEARCH_TOOL_SPEC: dict[str, Any] = {
432+
"type": "function",
433+
"name": "web_search",
434+
"description": (
435+
"Search the web (via OpenAI's built-in search) for EXTERNAL CONTEXT "
436+
"the user is asking about — company profile, role expectations, "
437+
"industry norms, what a specific employer typically looks for on a "
438+
"resume. Returns a short synthesized answer with source attribution. "
439+
"Use SPARINGLY: don't search for info the user already shared, don't "
440+
"search for generic resume advice, don't search for speculative "
441+
"questions like 'what salary will I get'. The query should be "
442+
"specific and short — one focused question, not a paragraph."
443+
),
444+
"parameters": {
445+
"type": "object",
446+
"additionalProperties": False,
447+
"properties": {
448+
"query": {
449+
"type": "string",
450+
"description": (
451+
"A specific, focused search query — e.g. 'Senior MLE "
452+
"role expectations at Anthropic 2026' or 'standard "
453+
"sections on a fintech compliance resume'."
454+
),
455+
},
456+
},
457+
"required": ["query"],
458+
},
459+
}
460+
461+
282462
RESUME_BUILDER_TOOL_SPECS: list[dict[str, Any]] = [
283463
FETCH_GITHUB_README_TOOL_SPEC,
464+
WEB_SEARCH_TOOL_SPEC,
284465
]
285466

286467

287468
# Internal registry — maps the Responses-API ``name`` field on a
288469
# function_call item to the Python callable that implements it.
289470
_TOOL_IMPLEMENTATIONS: dict[str, Callable[..., dict]] = {
290471
"fetch_github_readme": fetch_github_readme,
472+
"web_search": _web_search,
291473
}
292474

293475

294-
def execute_tool(name: str, arguments_json: str) -> str:
476+
# Tools that need the ``OpenAIService`` injected into their call (set
477+
# by name to avoid signature introspection at hot-path execute time).
478+
# ``_web_search`` needs it to fire its inner ``responses.create`` for
479+
# the search; ``fetch_github_readme`` is HTTP-only and doesn't.
480+
_TOOLS_REQUIRING_OPENAI: frozenset[str] = frozenset({"web_search"})
481+
482+
483+
def execute_tool(
484+
name: str,
485+
arguments_json: str,
486+
*,
487+
openai_service: Any | None = None,
488+
) -> str:
295489
"""Dispatch a Responses-API function_call to the matching tool.
296490
297491
Inputs:
298492
name: The function name the LLM asked for.
299493
arguments_json: Raw JSON string of arguments. The Responses API
300494
always passes arguments as a JSON-encoded string; we parse it
301495
here so each tool can sign its own typed signature.
496+
openai_service: Optional OpenAIService instance. Forwarded only
497+
to tools that declare in ``_TOOLS_REQUIRING_OPENAI`` that they
498+
need it (e.g. ``web_search``, which makes its own internal
499+
``responses.create`` call to use OpenAI's built-in search
500+
tool from outside JSON mode).
302501
303502
Returns a string. The agentic-loop driver attaches this string to a
304503
``function_call_output`` item, which is fed back to the model on
@@ -333,6 +532,11 @@ def execute_tool(name: str, arguments_json: str) -> str:
333532
"message": "Arguments must be a JSON object.",
334533
}
335534
)
535+
if name in _TOOLS_REQUIRING_OPENAI:
536+
# Inject as a kwarg so the tool's signature can declare it
537+
# explicitly (and so a model-emitted ``openai_service`` arg
538+
# can't slip in via ``args`` — we always overwrite).
539+
args["openai_service"] = openai_service
336540
try:
337541
result = impl(**args)
338542
except TypeError as exc:

0 commit comments

Comments
 (0)