@@ -83,6 +83,8 @@ p + dl.props { margin-top: -0.5em; }
8383<pre class="link-defaults">
8484spec:html; type:dfn;
8585 text:form-associated element
86+ text:browsing context group set
87+ text:unique internal value
8688</pre>
8789
8890<h2 id="intro">Introduction</h2>
@@ -439,6 +441,100 @@ The <dfn>synthesize a declarative JSON Schema object algorithm</dfn>, given a <{
439441}
440442</pre>
441443
444+ <h2 id="interaction-with-agents">Interaction with agents</h2>
445+
446+ <h3 id="event-loop">Event loop integration</h3>
447+
448+ A web site's functionality is exposed to [=agents=] as tools that live in a [=Document=]'s [=event
449+ loop=] and that are registered with the APIs in this specification.
450+
451+ The [=user agent=]'s [=browser agent=] runs [=in parallel=] to any [=event loops=] associated
452+ with a {{ModelContext}} [=relevant global object=]. Steps running on the [=browser agent=] get
453+ queued on its <dfn>AI agent queue</dfn>, which is the result of [=starting a new parallel queue=].
454+
455+ Conversely, steps queued *from* the [=browser agent=] onto the [=event loop=] of a given
456+ {{ModelContext}} object (i.e., the "main thread" where JavaScript runs) are queued on its [=relevant
457+ global object=]'s <dfn noexport>tool calling task source</dfn>.
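
As a rough, non-normative mental model, the two directions of queuing might be sketched in TypeScript as follows. All names here (`ParallelQueue`, `aiAgentQueue`, `queueToolCallingTask`) are hypothetical, and a promise chain only approximates a real parallel queue, which runs off the main thread.

```typescript
// Hypothetical model: steps enqueued on a parallel queue run one at a time, in order.
class ParallelQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(steps: () => void | Promise<void>): Promise<void> {
    // Chain the new steps after everything that was enqueued earlier.
    const run = this.tail.then(async () => {
      await steps();
    });
    // Keep the chain usable even if these steps throw.
    this.tail = run.catch(() => {});
    return run;
  }
}

// Assumed stand-in for the browser agent's AI agent queue.
const aiAgentQueue = new ParallelQueue();

// Assumed stand-in for queuing a task on a global's tool calling task source;
// setTimeout merely defers the callback to a later turn of the event loop.
function queueToolCallingTask(task: () => void): void {
  setTimeout(task, 0);
}
```

The sketch only captures ordering: work on the agent side is serialized, and anything that touches page state is handed back to the event loop rather than run inline.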
458+
459+ <h3 id="observations">Page observations</h3>
460+
461+ <em>This section is non-normative. It contains an example of infrastructure that a [=user agent=] might
462+ employ to expose a tab's tools to a [=browser agent=], and illustrates how that infrastructure
463+ interacts with the web platform, for the purposes of implementer guidance.</em>
464+
465+ <hr>
466+
467+ In-page [=agents=] implemented in JavaScript can "observe" the tools that a page offers by using the
468+ {{ModelContext}} APIs directly, and any other platform APIs to obtain necessary context about the
469+ page in order to actuate it appropriately.
470+
471+ The [=browser agent=], on the other hand, does not run JavaScript on the page. Instead, it obtains a
472+ view of the page's tools and any other relevant context by getting an [=observation=]. An
473+ <dfn>observation</dfn> is an [=implementation-defined=] data structure containing at least a <dfn
474+ for=observation>tool map</dfn>, which is a [=map=] whose [=map/keys=] are [=Document/unique ID=]s,
475+ and whose [=map/values=] are [=lists=] of [=tool definition=] [=structs=].
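
The minimum shape described above could be modeled like this. This is an illustrative sketch only: the `ToolDefinition` fields and the use of `symbol` as a stand-in for a unique internal value are assumptions, and a real observation is implementation-defined and may carry much more.

```typescript
// Hypothetical stand-in for a Document's unique ID (a unique internal value).
type UniqueId = symbol;

// Assumed, simplified fields; the tool definition struct is defined elsewhere in the specification.
interface ToolDefinition {
  name: string;
  description: string;
}

interface Observation {
  // Keys are document unique IDs; values are lists of tool definitions.
  toolMap: Map<UniqueId, ToolDefinition[]>;
}

// Create an observation with an empty tool map.
function newObservation(): Observation {
  return { toolMap: new Map<UniqueId, ToolDefinition[]>() };
}
```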
476+
477+ Note: An [=observation=] is usually a "snapshot" distillation of a page being presented to the user,
478+ along with any other state the [=user agent=] believes is relevant for the [=browser agent=]; this
479+ often includes screenshots of the page, not just a DOM serialization. See [Annotated Page Content
480+ (APC)](https://chromium.googlesource.com/chromium/src.git/+/main/third_party/blink/renderer/modules/content_extraction/readme.md)
481+ in the Chromium project for an example of what might contribute to an observation.
482+
483+ <hr>
484+
485+ <div algorithm>
486+ To <dfn>perform an observation</dfn> given a [=top-level traversable=] |traversable|, run these
487+ steps:
488+
489+ 1. [=Assert=]: This algorithm is running in the [=browser agent=]'s [=AI agent queue=].
490+
491+ 1. [=Assert=]: |traversable|'s [=navigable/active document=] is [=Document/fully active=].
492+
493+ 1. Let |observation| be a new [=observation=] .
494+
495+ 1. Let |flat descendants| be the [=Document/inclusive descendant navigables=] of |traversable|'s
496+ [=navigable/active document=] .
497+
498+ 1. [=list/For each=] [=navigable=] |descendant| of |flat descendants|:
499+
500+     1. Let |document| be |descendant|'s [=navigable/active document=].
501+
502+     1. Let |id| be |document|'s [=Document/unique ID=].
503+
504+     1. Set |observation|'s [=observation/tool map=][|id|] = |document|'s [=relevant global
505+        object=]'s {{Navigator}}'s [=Navigator/modelContext=]'s [=ModelContext/internal context=]'s
506+        [=model context/tool map=]'s [=map/values=], which are [=tool definitions=].
507+
508+ 1. Perform any [=implementation-defined=] steps to add anything to |observation| that the [=user
509+ agent=] might deem useful or necessary, besides just populating the [=observation/tool map=].
510+ This might include annotated screenshots of the page, parts of the accessibility tree, etc.
511+
512+ 1. Perform any [=implementation-defined=] steps with |observation| and the [=browser agent=], to
513+ expose |observation|'s [=observation/tool map=] to the [=browser agent=] in whatever way it
514+ accepts.
515+
516+ Note: Despite the name of this API (i.e., Web*MCP*), this specification does not prescribe the
517+ format in which tools are exposed to the [=browser agent=]. Browsers are free to distill and
518+ expose tools via Model Context Protocol, other proprietary "function calling" methods, or any
519+ other way they deem appropriate.
520+
521+ Advisement: Implementations are expected to convey to the [=browser agent=] any relevant
522+ security information associated with [=tool definitions=], such as the originating [=origin=],
523+ among other things, so that the backing model has an idea of the different parties at play, and
524+ can most safely carry out the end user's intent.
525+
526+ </div>
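
The tool-map portion of the steps above can be sketched over a simplified in-memory model. This is a non-normative illustration: `NavigableModel`, `DocumentModel`, and their fields are hypothetical stand-ins for the corresponding specification concepts, and the implementation-defined steps (screenshots, accessibility data, handing the result to the browser agent) are omitted.

```typescript
// Assumed, simplified tool definition fields.
interface ToolDefinition {
  name: string;
  description: string;
}

// Hypothetical model of a document and the navigables nested inside it.
interface DocumentModel {
  uniqueId: symbol;                   // stands in for the document's unique ID
  tools: Map<string, ToolDefinition>; // stands in for the model context's tool map
  childNavigables: NavigableModel[];  // e.g. navigables created by iframes
}

interface NavigableModel {
  activeDocument: DocumentModel;
}

interface Observation {
  toolMap: Map<symbol, ToolDefinition[]>;
}

// Collect a navigable and every navigable nested in its active document, in tree order.
function inclusiveDescendantNavigables(root: NavigableModel): NavigableModel[] {
  const result: NavigableModel[] = [root];
  for (const child of root.activeDocument.childNavigables) {
    result.push(...inclusiveDescendantNavigables(child));
  }
  return result;
}

// Build an observation whose tool map has one entry per document in the tree.
function performObservation(traversable: NavigableModel): Observation {
  const observation: Observation = { toolMap: new Map() };
  for (const descendant of inclusiveDescendantNavigables(traversable)) {
    const doc = descendant.activeDocument;
    observation.toolMap.set(doc.uniqueId, Array.from(doc.tools.values()));
  }
  return observation;
}
```

Keying the map by document rather than flattening all tools into one list keeps each tool attributable to the document that registered it, which is what lets an implementation attach per-origin security information later.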
527+
528+ Each {{Document}} object has a <dfn for=Document>unique ID</dfn>, which is a [=unique internal
529+ value=].
530+
531+ The times at which a [=browser agent=] [=performs an observation=] are [=implementation-defined=].
532+ A [=browser agent=] may [=parallel queue/enqueue steps=] to the [=AI agent queue=] to [=perform an
533+ observation=] given any [=top-level browsing context=] in the [=user agent=]'s [=browsing context
534+ group set=], at any time, although implementations typically reserve this operation for when the
535+ user is interacting with a [=browser agent=] while web content is in view.
536+
537+
442538<h2 id="security-privacy">Security and privacy considerations</h2>
443539
444540<!--