Commit e1ddba1

Add basic page observation infrastructure (#164)
1 parent a383af1 commit e1ddba1

1 file changed

Lines changed: 96 additions & 0 deletions

index.bs

@@ -83,6 +83,8 @@ p + dl.props { margin-top: -0.5em; }
 <pre class="link-defaults">
 spec:html; type:dfn;
 text:form-associated element
+text:browsing context group set
+text:unique internal value
 </pre>
 
 <h2 id="intro">Introduction</h2>
@@ -439,6 +441,100 @@ The <dfn>synthesize a declarative JSON Schema object algorithm</dfn>, given a <{
 }
 </pre>
 
+<h2 id="interaction-with-agents">Interaction with agents</h2>
+
+<h3 id="event-loop">Event loop integration</h3>
+
+A web site's functionality is exposed to [=agents=] as tools that live in a [=Document=]'s [=event
+loop=] and are registered with the APIs in this specification.
+
+The [=user agent=]'s [=browser agent=] runs [=in parallel=] to any [=event loops=] associated
+with a {{ModelContext}}'s [=relevant global object=]. Steps running on the [=browser agent=] get
+queued on its <dfn>AI agent queue</dfn>, which is the result of [=starting a new parallel queue=].
+
+Conversely, steps queued *from* the [=browser agent=] onto the [=event loop=] of a given
+{{ModelContext}} object (i.e., the "main thread" where JavaScript runs) are queued on its [=relevant
+global object=]'s <dfn noexport>tool calling task source</dfn>.
+
+<h3 id="observations">Page observations</h3>
+
+<em>This section is non-normative. It contains an example of infrastructure that a [=user agent=] might
+employ to expose a tab's tools to a [=browser agent=], and illustrates how that infrastructure
+interacts with the web platform, for the purposes of implementer guidance.</em>
+
+<hr>
+
+In-page [=agents=] implemented in JavaScript can "observe" the tools that a page offers by using the
+{{ModelContext}} APIs directly, and any other platform APIs to obtain necessary context about the
+page in order to actuate it appropriately.
+
+The [=browser agent=], on the other hand, does not run JavaScript on the page. Instead, it obtains a
+view of the page's tools and any other relevant context by getting an [=observation=]. An
+<dfn>observation</dfn> is an [=implementation-defined=] data structure containing at least a <dfn
+for=observation>tool map</dfn>, which is a [=map=] whose [=map/keys=] are [=Document/unique ID=]s,
+and whose [=map/values=] are [=lists=] of [=tool definition=] [=structs=].
+
+Note: An [=observation=] is usually a "snapshot" distillation of a page being presented to the user,
+along with any other state the [=user agent=] believes is relevant for the [=browser agent=]; this
+often includes screenshots of the page, not just a DOM serialization. See [Annotated Page Content
+(APC)](https://chromium.googlesource.com/chromium/src.git/+/main/third_party/blink/renderer/modules/content_extraction/readme.md)
+in the Chromium project for an example of what might contribute to an observation.
+
+<hr>
+
+<div algorithm>
+To <dfn>perform an observation</dfn> given a [=top-level traversable=] |traversable|, run these
+steps:
+
+1. [=Assert=]: This algorithm is running in the [=browser agent=]'s [=AI agent queue=].
+
+1. [=Assert=]: |traversable|'s [=navigable/active document=] is [=Document/fully active=].
+
+1. Let |observation| be a new [=observation=].
+
+1. Let |flat descendants| be the [=Document/inclusive descendant navigables=] of |traversable|'s
+[=navigable/active document=].
+
+1. [=list/For each=] [=navigable=] |descendant| of |flat descendants|:
+
+    1. Let |document| be |descendant|'s [=navigable/active document=].
+
+    1. Let |id| be |document|'s [=Document/unique ID=].
+
+    1. Set |observation|'s [=observation/tool map=][|id|] = |document|'s [=relevant global
+    object=]'s {{Navigator}}'s [=Navigator/modelContext=]'s [=ModelContext/internal context=]'s
+    [=model context/tool map=]'s [=map/values=], which are [=tool definitions=].
+
+1. Perform any [=implementation-defined=] steps to add anything to |observation| that the [=user
+agent=] might deem useful or necessary, besides just populating the [=observation/tool map=].
+This might include annotated screenshots of the page, parts of the accessibility tree, etc.
+
+1. Perform any [=implementation-defined=] steps with |observation| and the [=browser agent=], to
+expose the |observation|'s [=observation/tool map=] to the [=browser agent=] in whatever way it
+accepts.
+
+Note: Despite the name of this API (i.e., Web*MCP*), this specification does not prescribe the
+format in which tools are exposed to the [=browser agent=]. Browsers are free to distill and
+expose tools via Model Context Protocol, other proprietary "function calling" methods, or any
+other way they deem appropriate.
+
+Advisement: Implementations are expected to convey to the [=browser agent=] any relevant
+security information associated with [=tool definitions=], such as the originating [=origin=],
+so that the backing model has an idea of the different parties at play, and
+can most safely carry out the end user's intent.
+
+</div>
+
+Each {{Document}} object has a <dfn for=Document>unique ID</dfn>, which is a [=unique internal
+value=].
+
+The times at which a [=browser agent=] [=performs an observation=] are [=implementation-defined=].
+A [=browser agent=] may [=parallel queue/enqueue steps=] to the [=AI agent queue=] to [=perform an
+observation=] given any [=top-level browsing context=] in the [=user agent=]'s [=browsing context
+group set=], at any time, although implementations typically reserve this operation for when the
+user is interacting with a [=browser agent=] while web content is in view.
+
+
 <h2 id="security-privacy">Security and privacy considerations</h2>
 
 <!--

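The "perform an observation" steps added in this commit can be read as a walk over a tab's navigable tree that collects each document's tools under that document's unique ID. The following TypeScript is only a rough sketch of that reading, not the specification's or any browser's implementation. Every name in it (`ToolDefinition`, `DocumentModel`, `Navigable`, `Observation`, `inclusiveDescendantNavigables`, `performObservation`, and the sample data) is a hypothetical stand-in, and the implementation-defined steps (screenshots, accessibility tree, delivery to the browser agent) are left out.

```typescript
// Hypothetical stand-ins for spec concepts. None of these names come from the specification.
interface ToolDefinition {
  name: string;
  description: string;
}

interface DocumentModel {
  uniqueId: string; // models the Document's "unique ID"
  tools: Map<string, ToolDefinition>; // models the model context's tool map
}

interface Navigable {
  activeDocument: DocumentModel;
  children: Navigable[]; // child navigables (e.g. iframes)
}

interface Observation {
  // Keys are document unique IDs, values are lists of tool definitions.
  toolMap: Map<string, ToolDefinition[]>;
}

// Returns the navigable itself followed by all of its descendants, in tree order.
function inclusiveDescendantNavigables(root: Navigable): Navigable[] {
  const result: Navigable[] = [root];
  for (const child of root.children) {
    result.push(...inclusiveDescendantNavigables(child));
  }
  return result;
}

// Sketch of the tool-map portion of "perform an observation".
function performObservation(traversable: Navigable): Observation {
  const observation: Observation = { toolMap: new Map() };
  for (const descendant of inclusiveDescendantNavigables(traversable)) {
    const doc = descendant.activeDocument;
    // Copy the values so the observation is a snapshot, not a live view.
    observation.toolMap.set(doc.uniqueId, [...doc.tools.values()]);
  }
  return observation;
}

// Sample data: a top-level page with one tool and an embedded frame with none.
const frame: Navigable = {
  activeDocument: { uniqueId: "doc-2", tools: new Map() },
  children: [],
};
const topLevel: Navigable = {
  activeDocument: {
    uniqueId: "doc-1",
    tools: new Map([
      ["search", { name: "search", description: "Search the site" }],
    ]),
  },
  children: [frame],
};
const observation = performObservation(topLevel);
```

Keying the map by document rather than flattening all tools into one list keeps tools from different frames apart, which is what lets an implementation attach per-document security context such as the originating origin.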