Status: draft
Owner: jcode
Audience: jcode core, browser bridge authors, adapter authors
jcode should expose a single first-class browser tool while remaining compatible with multiple browser automation backends:
- Firefox Agent Bridge
- Chrome Agent Bridge
- Chrome remote debugging / CDP adapters
- WebDriver / BiDi adapters
- Safari automation adapters
- other third-party browser control systems
The protocol in this document defines the normalized contract between jcode and a browser provider.
This is intentionally not a demand that every bridge speak exactly the same native command language. Instead:
- jcode defines a core semantic layer it can rely on
- providers declare the capabilities and commands they support
- providers may expose provider-specific commands beyond the core
- adapters can translate a provider's native model into this protocol
That gives us both consistency and room for bridge-specific power features.
Goals:
- One first-class tool in jcode
  - The model should use a single browser tool.
- Multiple provider implementations
  - Firefox, Chrome, Safari, Edge, WebDriver, and other systems should fit.
- Capability negotiation
  - jcode should know what each provider can and cannot do.
- Extensibility without fragmentation
  - We need a standard core, but providers must have room for browser-specific features.
- Stable session and element references
  - The model should be able to snapshot a page, then act on returned references.
- Transport-neutral semantics
  - The semantic protocol should be the same whether the provider is in-process, over stdio, over a socket, or wrapped through another adapter.
Non-goals:
- Standardizing every low-level browser primitive.
- Forcing all providers to support deep DOM, network, or JS introspection.
- Requiring all providers to attach to the user's existing browser profile.
- Making provider-specific commands part of the required core.
Terminology:
- browser tool: the user/model-facing jcode tool.
- provider: a backend implementation that satisfies this protocol.
- bridge: an external browser integration such as Firefox Agent Bridge.
- adapter: glue code that translates a bridge's native API into this protocol.
- browser session: the provider's isolated session or attachment scope for a jcode session.
- page: a tab, target, or browsing surface under a session.
- element ref: an opaque provider-issued handle for an actionable element.
Providers do not need to implement everything.
A provider should support these normalized operations to be considered certified:
- provider.describe
- provider.status
- session.ensure
- session.close
- page.open
- page.snapshot
- page.click
- page.type
- page.wait
- page.screenshot
Optional normalized operations:
- page.go_back
- page.go_forward
- page.reload
- tab.list
- tab.activate
- tab.close
- page.eval
- page.press
- page.scroll
- page.select
- download.list
Providers may expose additional commands such as:
- firefox.install_extension
- chrome.attach_debug_target
- cdp.send
- webdriver.perform_actions
These are allowed, but they are not part of the required core.
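As a rough illustration of how jcode might check a provider's declared methods against the required core, here is a minimal sketch. The constant and function names are hypothetical, and it assumes `provider.describe` and `provider.status` are implied by the provider being able to answer at all, so only the session and page methods are checked.

```rust
/// Required session/page methods from the core list in this document.
/// Assumption: provider.describe and provider.status are implied and not checked here.
pub const REQUIRED_CORE: [&str; 8] = [
    "session.ensure",
    "session.close",
    "page.open",
    "page.snapshot",
    "page.click",
    "page.type",
    "page.wait",
    "page.screenshot",
];

/// Returns the required core methods that the provider did not declare,
/// in the order they appear in `REQUIRED_CORE`.
pub fn missing_core_methods(declared: &[&str]) -> Vec<&'static str> {
    REQUIRED_CORE
        .iter()
        .copied()
        .filter(|required| !declared.iter().any(|d| d == required))
        .collect()
}
```

An empty result would mean the provider at least claims the full core; whether those methods behave correctly is a separate conformance question.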
This protocol defines message semantics, not one required wire format.
Supported implementation styles may include:
- direct Rust trait calls inside jcode
- stdio JSON request/response
- local socket RPC
- wrapped remote API
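For the in-process style, one possible shape is a single trait with a method-name entry point, so the same semantics could later sit behind stdio or a socket. This is only a sketch: `BrowserProvider`, `Params`, `CallError`, and `FakeProvider` are hypothetical names, and parameters are flattened to string maps purely to avoid dependencies.

```rust
use std::collections::HashMap;

/// Flat string map used for params and results in this sketch only;
/// a real implementation would likely use typed structs.
pub type Params = HashMap<String, String>;

#[derive(Debug, Clone, PartialEq)]
pub struct CallError {
    pub code: String,
    pub message: String,
}

/// Hypothetical trait: one entry point keyed by the normalized method name.
pub trait BrowserProvider {
    fn call(&mut self, method: &str, params: &Params) -> Result<Params, CallError>;
}

/// Toy in-process provider that implements only `page.open`.
pub struct FakeProvider;

impl BrowserProvider for FakeProvider {
    fn call(&mut self, method: &str, params: &Params) -> Result<Params, CallError> {
        match method {
            "page.open" => {
                // `url` is a required field for page.open.
                let url = params.get("url").ok_or_else(|| CallError {
                    code: "invalid_request".to_string(),
                    message: "url is required".to_string(),
                })?;
                let mut result = Params::new();
                result.insert("page_id".to_string(), "page_1".to_string());
                result.insert("url".to_string(), url.clone());
                Ok(result)
            }
            other => Err(CallError {
                code: "unsupported_method".to_string(),
                message: format!("This provider does not implement {}", other),
            }),
        }
    }
}
```

A stdio or socket transport could wrap the same trait by serializing `method` and `params` into the envelope and deserializing the reply.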
For external-process integrations, requests and responses should use a stable, JSON-RPC-like envelope.
Request:
{
"protocol_version": "0.1",
"id": "req_123",
"method": "page.open",
"params": {
"session_id": "sess_abc",
"url": "https://example.com"
}
}
Success response:
{
"protocol_version": "0.1",
"id": "req_123",
"ok": true,
"result": {
"page_id": "page_1",
"url": "https://example.com",
"title": "Example Domain"
},
"warnings": []
}
Error response:
{
"protocol_version": "0.1",
"id": "req_123",
"ok": false,
"error": {
"code": "unsupported_method",
"message": "This provider does not implement page.eval",
"retryable": false,
"details": {}
}
}
If a provider emits async events, use:
{
"protocol_version": "0.1",
"event": "page.navigated",
"payload": {
"session_id": "sess_abc",
"page_id": "page_1",
"url": "https://example.com/next"
}
}
Events are optional in v1.
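On the jcode side, the success and error envelopes could be folded into one type and converted to a `Result`. The sketch below assumes hypothetical type names (`Response`, `Outcome`, `ErrorBody`) and leaves out JSON parsing; treating a mismatched `id` as `internal_error` is a choice made here, not something the protocol prescribes.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct ErrorBody {
    pub code: String,
    pub message: String,
    pub retryable: bool,
}

/// Mirrors the `ok: true` / `ok: false` split of the response envelope.
#[derive(Debug, Clone, PartialEq)]
pub enum Outcome<T> {
    Success { result: T, warnings: Vec<String> },
    Failure { error: ErrorBody },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Response<T> {
    pub protocol_version: String,
    pub id: String,
    pub outcome: Outcome<T>,
}

impl<T> Response<T> {
    /// Checks that the response answers `request_id`, then unwraps the envelope.
    pub fn into_result(self, request_id: &str) -> Result<T, ErrorBody> {
        if self.id != request_id {
            // Assumed policy: a mismatched id is reported as an internal error.
            return Err(ErrorBody {
                code: "internal_error".to_string(),
                message: format!(
                    "response id {} does not match request id {}",
                    self.id, request_id
                ),
                retryable: false,
            });
        }
        match self.outcome {
            Outcome::Success { result, .. } => Ok(result),
            Outcome::Failure { error } => Err(error),
        }
    }
}
```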
provider.describe returns static and semi-static metadata about the provider.
Example:
{
"provider_id": "firefox_agent_bridge",
"provider_label": "Firefox Agent Bridge",
"provider_version": "1.2.3",
"protocol_version": "0.1",
"browser_families": ["firefox"],
"transport": "stdio-json",
"certification_tier": "candidate",
"capabilities": {
"core_methods": [
"session.ensure",
"session.close",
"page.open",
"page.snapshot",
"page.click",
"page.type",
"page.wait",
"page.screenshot"
],
"optional_methods": [
"tab.list",
"tab.activate",
"page.eval"
],
"features": [
"element_refs",
"a11y_snapshot",
"attach_existing_browser",
"persistent_profile"
],
"custom_methods": [
{
"name": "firefox.install_extension",
"stability": "experimental",
"description": "Install or verify the Firefox extension"
}
]
}
}
provider.status returns current availability and setup state.
Example fields:
{
"availability": "ready",
"browser_detected": true,
"browser_running": true,
"setup_state": "complete",
"requires_manual_setup": false,
"recommended_browser": "firefox",
"manual_steps": [],
"diagnostics": [
{
"level": "info",
"code": "native_host_detected",
"message": "Native host manifest found"
}
]
}
Suggested enums:
- availability: ready | degraded | unavailable
- setup_state: complete | partial | required | broken
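The suggested enums map naturally onto Rust enums parsed from their wire strings. In this sketch the type names and the `can_start_session` policy are assumptions: it treats a provider as usable when it is not `unavailable` and setup is `complete` or `partial`.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Availability {
    Ready,
    Degraded,
    Unavailable,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SetupState {
    Complete,
    Partial,
    Required,
    Broken,
}

impl FromStr for Availability {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "ready" => Ok(Availability::Ready),
            "degraded" => Ok(Availability::Degraded),
            "unavailable" => Ok(Availability::Unavailable),
            other => Err(format!("unknown availability: {}", other)),
        }
    }
}

impl FromStr for SetupState {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "complete" => Ok(SetupState::Complete),
            "partial" => Ok(SetupState::Partial),
            "required" => Ok(SetupState::Required),
            "broken" => Ok(SetupState::Broken),
            other => Err(format!("unknown setup_state: {}", other)),
        }
    }
}

/// Assumed policy (not defined by the protocol): a session may be started
/// unless the provider is unavailable or setup is required/broken.
pub fn can_start_session(availability: Availability, setup: SetupState) -> bool {
    availability != Availability::Unavailable
        && matches!(setup, SetupState::Complete | SetupState::Partial)
}
```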
jcode should not care whether a provider uses tabs, contexts, profiles, or remote targets internally. It only needs a stable handle it can reuse.
session.ensure creates or reuses a browser session for a jcode session.
Request:
{
"client_session_id": "jcode_session_123",
"browser_preference": "auto",
"isolation": "per_jcode_session",
"attach": "prefer",
"persist": true,
"metadata": {
"owner": "agent",
"purpose": "browser_tool"
}
}
Response:
{
"session_id": "browser_sess_1",
"browser_family": "firefox",
"browser_label": "Firefox",
"attached_to_existing_browser": true,
"isolation": "per_jcode_session",
"default_page_id": "page_1"
}
session.close closes or detaches the provider session.
Providers may choose whether this closes tabs, detaches from a target, or merely releases provider-side state. The behavior should be documented in provider.describe or provider.status diagnostics.
All resource identifiers are opaque strings.
Examples:
- session_id
- page_id
- tab_id
- element_ref
- download_id
jcode must not assume identifier shape or encode browser semantics into them.
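One way to keep identifiers opaque on the Rust side is the newtype pattern: each kind of ID gets its own wrapper, so a page ID cannot be passed where a session ID is expected, and nothing parses the inner string. The type and function names below are illustrative only.

```rust
/// Opaque session handle; the inner string is stored and echoed back, never parsed.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SessionId(String);

/// Opaque page handle, deliberately a distinct type from `SessionId`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PageId(String);

impl SessionId {
    pub fn new(raw: impl Into<String>) -> Self {
        SessionId(raw.into())
    }
    /// Exposed only so the value can be sent back to the provider.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

impl PageId {
    pub fn new(raw: impl Into<String>) -> Self {
        PageId(raw.into())
    }
    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Takes both handles by type, so swapping the arguments is a compile error.
pub fn describe_target(session: &SessionId, page: &PageId) -> String {
    format!("session={} page={}", session.as_str(), page.as_str())
}
```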
These are the semantics jcode can rely on.
page.open opens a URL in the current page or a new page.
Request fields:
- session_id (required)
- url (required)
- page_id (optional)
- new_page (optional)
- foreground (optional)
- wait_until (optional): load | domcontentloaded | networkidle | provider_default
Response fields:
- page_id
- url
- title (optional)
- navigation_state (optional)
page.snapshot returns a normalized view of the current page for agent reasoning.
This is the most important method for model use.
Request fields:
- session_id (required)
- page_id (optional)
- include_screenshot (optional)
- include_html (optional)
- include_dom (optional)
- include_a11y (optional)
- include_text (optional)
- max_nodes (optional)
Response fields:
- page_id
- url
- title
- snapshot
- elements
- text
- screenshot_ref (optional)
- provider_data (optional)
Providers may use different internal representations, but page.snapshot should normalize into a common minimum format:
{
"snapshot": {
"format": "jcode.page_snapshot.v1",
"root": {
"node_id": "n1",
"role": "document",
"name": "Example Domain",
"children": ["n2", "n3"]
},
"nodes": [
{
"node_id": "n2",
"role": "heading",
"name": "Example Domain",
"text": "Example Domain",
"element_ref": "el_1",
"actionable": false
},
{
"node_id": "n3",
"role": "link",
"name": "More information...",
"text": "More information...",
"element_ref": "el_2",
"actionable": true
}
]
}
}
For agent convenience, providers should also return a flattened actionable list when possible:
{
"elements": [
{
"element_ref": "el_2",
"role": "link",
"name": "More information...",
"text": "More information...",
"actionable": true,
"enabled": true,
"selector_hint": "a"
}
]
}
A provider that cannot produce rich DOM/a11y data may still return a weaker snapshot, but it should say so in capabilities.
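If a provider returns only the node list, an adapter could derive the flattened actionable list itself. The sketch below assumes simplified, hypothetical `Node` and `Element` structs carrying a subset of the fields shown in the examples above.

```rust
/// Subset of a snapshot node, following the field names in the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct Node {
    pub node_id: String,
    pub role: String,
    pub name: String,
    pub element_ref: Option<String>,
    pub actionable: bool,
}

/// Entry in the flattened actionable list.
#[derive(Debug, Clone, PartialEq)]
pub struct Element {
    pub element_ref: String,
    pub role: String,
    pub name: String,
}

/// Keeps nodes that are marked actionable and carry an element_ref,
/// preserving snapshot order.
pub fn actionable_elements(nodes: &[Node]) -> Vec<Element> {
    nodes
        .iter()
        .filter(|node| node.actionable)
        .filter_map(|node| {
            node.element_ref.as_ref().map(|element_ref| Element {
                element_ref: element_ref.clone(),
                role: node.role.clone(),
                name: node.name.clone(),
            })
        })
        .collect()
}
```

Applied to the two example nodes, only the link (`el_2`) would survive, since the heading is not actionable.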
page.click clicks an element.
The request should support multiple targeting modes:
- element_ref
- selector
- text_query
- position
At least one must be provided.
Response fields:
- page_id
- clicked (boolean)
- navigation_occurred (optional)
- url (optional)
Providers should prefer element_ref when available.
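The targeting rules can be sketched as a small resolver: reject requests with no target, and prefer `element_ref` when it is present. The struct and enum names are hypothetical, the fallback order after `element_ref` (selector, then text query, then position) is an assumption, and so is modelling `position` as an `(x, y)` pair.

```rust
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ClickRequest {
    pub element_ref: Option<String>,
    pub selector: Option<String>,
    pub text_query: Option<String>,
    /// Assumed shape: (x, y) coordinates; the protocol does not define this yet.
    pub position: Option<(i32, i32)>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ClickTarget {
    ElementRef(String),
    Selector(String),
    TextQuery(String),
    Position(i32, i32),
}

/// Picks the targeting mode to use, preferring element_ref.
/// Returns the standard error code `invalid_request` when no target is given.
pub fn resolve_target(req: &ClickRequest) -> Result<ClickTarget, &'static str> {
    if let Some(element_ref) = &req.element_ref {
        return Ok(ClickTarget::ElementRef(element_ref.clone()));
    }
    if let Some(selector) = &req.selector {
        return Ok(ClickTarget::Selector(selector.clone()));
    }
    if let Some(text_query) = &req.text_query {
        return Ok(ClickTarget::TextQuery(text_query.clone()));
    }
    if let Some((x, y)) = req.position {
        return Ok(ClickTarget::Position(x, y));
    }
    Err("invalid_request")
}
```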
page.type types or sets text in an input-like target.
Request fields:
- element_ref (optional)
- selector (optional)
- text (required)
- replace (optional)
- submit (optional)
Response fields:
- page_id
- typed (boolean)
page.wait waits for a condition.
Request fields may include:
- text_present
- text_absent
- selector_present
- selector_absent
- element_ref_present
- url_matches
- navigation_complete
- timeout_ms
Response fields:
- satisfied (boolean)
- matched_condition (optional)
- url (optional)
page.screenshot captures a screenshot.
Request fields:
- session_id
- page_id (optional)
- full_page (optional)
- clip (optional)
- element_ref (optional)
Response fields:
- page_id
- image or image_ref
- media_type
- width
- height
Providers may return inline base64 data or a provider-managed image reference depending on transport constraints.
These methods are standardized when present, but not required for certification in the first pass.
- page.go_back, page.go_forward, page.reload
- page.press, page.select, page.hover, page.scroll
- tab.list, tab.activate, tab.close, tab.new
- page.eval, network.list, console.list, storage.get, cookie.list
- download.list, download.wait, upload.set_files
Custom methods are the key mechanism that gives providers leeway for provider-specific commands.
Custom methods should use a namespaced method name, for example:
- firefox.install_extension
- chrome.attach_debug_target
- cdp.send
- webdriver.actions
Every custom method should appear in provider.describe.capabilities.custom_methods with:
- name
- description
- stability: stable | experimental | deprecated
- optional input_schema
- optional output_schema
The main browser tool should prefer the standard core and optional normalized methods.
Provider-specific methods should only be used when:
- the user explicitly asks for them
- a jcode-side adapter knows how to use them safely
- or a future advanced/debug mode is enabled
If we want an escape hatch, the browser tool can support something like:
{
"action": "provider_command",
"provider_method": "cdp.send",
"params": {
"method": "Network.enable"
}
}
This should be considered advanced/debug behavior, not the primary UX.
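The namespacing rule and the three conditions for using provider-specific methods can be combined into a small gate. This is a sketch under assumptions: the list of normalized namespaces is collected from the method names in this document, and `InvocationContext` and both function names are hypothetical.

```rust
/// Namespaces used by the normalized methods named in this document.
const NORMALIZED_NAMESPACES: [&str; 10] = [
    "provider", "session", "page", "tab", "download",
    "upload", "network", "console", "storage", "cookie",
];

/// Returns the namespace of a custom method such as `cdp.send`,
/// or None if the name is not namespaced or belongs to a normalized namespace.
pub fn custom_namespace(method: &str) -> Option<&str> {
    let (namespace, name) = method.split_once('.')?;
    let is_normalized = NORMALIZED_NAMESPACES.iter().any(|n| *n == namespace);
    if namespace.is_empty() || name.is_empty() || is_normalized {
        return None;
    }
    Some(namespace)
}

/// The three conditions under which provider-specific methods may be used.
#[derive(Debug, Clone, Copy, Default)]
pub struct InvocationContext {
    pub user_requested: bool,
    pub adapter_supported: bool,
    pub debug_mode: bool,
}

/// A custom method may run only if it is properly namespaced and at least
/// one of the three conditions holds.
pub fn may_invoke_custom(method: &str, ctx: InvocationContext) -> bool {
    custom_namespace(method).is_some()
        && (ctx.user_requested || ctx.adapter_supported || ctx.debug_mode)
}
```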
Providers should report both methods and higher-level features.
Methods are concrete callable operations:
- page.open
- page.snapshot
- tab.list
Features are semantics or qualities that influence jcode behavior:
- element_refs
- a11y_snapshot
- dom_snapshot
- html_snapshot
- full_page_screenshot
- attach_existing_browser
- persistent_profile
- isolated_contexts
- js_eval
- network_observe
- console_observe
- file_upload
- download_observe
- manual_setup_required
- extension_required
- remote_debugging_required
Each feature or method may optionally include a stability tag:
- stable
- experimental
- deprecated
A browser provider often requires manual setup. The protocol should make that machine-readable.
{
"level": "warning",
"code": "extension_missing",
"message": "Firefox extension is not installed",
"manual_steps": [
"Open Firefox",
"Install the extension from /path/to/bridge.xpi",
"Restart Firefox if prompted"
]
}
Setup-related methods:
- provider.status
- provider.setup_guide (optional)
- provider.verify (optional)
provider.setup_guide may return browser-specific instructions, URLs, file paths, permissions, or troubleshooting steps.
Standard error codes should include:
- unsupported_method
- unsupported_target
- invalid_request
- invalid_selector
- element_not_found
- element_not_actionable
- navigation_timeout
- not_ready
- setup_required
- permission_denied
- browser_not_running
- session_not_found
- page_not_found
- internal_error
Providers may add provider-specific detail codes in error.details.
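Standard codes let jcode react uniformly regardless of provider. The mapping below is purely illustrative: the `Recovery` enum and the grouping of codes into recovery actions are assumptions made for this sketch, not behavior the protocol defines.

```rust
/// Hypothetical jcode-side reactions to a failed call.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Recovery {
    ShowSetupGuidance,
    ResnapshotAndRetry,
    TryAnotherProvider,
    ReportToUser,
}

/// Maps a standard error code to an assumed recovery action.
/// Unknown or provider-specific codes fall through to `ReportToUser`.
pub fn recovery_for(code: &str) -> Recovery {
    match code {
        // Setup or environment problems: surface manual steps to the user.
        "setup_required" | "not_ready" | "browser_not_running" | "permission_denied" => {
            Recovery::ShowSetupGuidance
        }
        // The element reference may be stale: take a fresh snapshot first.
        "element_not_found" | "element_not_actionable" => Recovery::ResnapshotAndRetry,
        // This provider cannot do it; another one might.
        "unsupported_method" | "unsupported_target" => Recovery::TryAnotherProvider,
        _ => Recovery::ReportToUser,
    }
}
```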
The protocol should be versioned independently from provider versions.
- protocol_version identifies the semantic protocol version.
- Providers should declare the protocol version they implement.
- Minor additive changes should not break existing certified providers.
- Breaking changes require a new protocol version.
For now use:
protocol_version = "0.1"
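A compatibility check could be as simple as comparing major versions. This sketch assumes versions have the form `major.minor` and that a differing major number is what signals a breaking change; the document does not yet define either rule, and both function names are hypothetical.

```rust
/// Parses "major.minor" into numbers; returns None for any other shape.
pub fn parse_version(version: &str) -> Option<(u32, u32)> {
    let (major, minor) = version.split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Assumed rule: two protocol versions are compatible when both parse
/// and their major components match (minor changes are additive).
pub fn is_compatible(ours: &str, theirs: &str) -> bool {
    match (parse_version(ours), parse_version(theirs)) {
        (Some((our_major, _)), Some((their_major, _))) => our_major == their_major,
        _ => false,
    }
}
```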
A provider can be classified into one of these tiers:
- certified:
  - passes conformance tests for required core methods
  - returns stable identifiers and normalized results
  - reports setup/diagnostics correctly
  - behaves predictably across repeated runs
- candidate:
  - supports some or most normalized methods
  - may have missing features or partial behavior
  - useful, but not yet fully certified
- lowest tier:
  - adapter exists, but semantics are incomplete or unstable
A future conformance suite should verify at least:
- provider.describe succeeds
- provider.status reports a coherent state
- session.ensure creates or reuses a session
- page.open navigates to a test page
- page.snapshot returns usable text and at least one actionable reference when applicable
- page.click can activate a known element
- page.type can fill a known input
- page.wait observes a deterministic page change
- page.screenshot returns an image
- session.close cleans up or detaches cleanly
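A harness for these checks might run them in order and skip everything after the first failure, since later steps depend on earlier ones. The sketch below is hypothetical throughout: the runner takes a single closure that performs the named step, and it does not attempt cleanup after a failure, which a real harness would probably want.

```rust
/// Conformance steps in the order listed above.
pub const CONFORMANCE_ORDER: [&str; 10] = [
    "provider.describe",
    "provider.status",
    "session.ensure",
    "page.open",
    "page.snapshot",
    "page.click",
    "page.type",
    "page.wait",
    "page.screenshot",
    "session.close",
];

#[derive(Debug, Clone, PartialEq)]
pub enum StepOutcome {
    Passed,
    Failed(String),
    Skipped,
}

/// Runs each step through `run_step`; after the first failure the
/// remaining steps are recorded as skipped rather than executed.
pub fn run_in_order<F>(mut run_step: F) -> Vec<(&'static str, StepOutcome)>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let mut report = Vec::new();
    let mut failed = false;
    for name in CONFORMANCE_ORDER.iter().copied() {
        let outcome = if failed {
            StepOutcome::Skipped
        } else {
            match run_step(name) {
                Ok(()) => StepOutcome::Passed,
                Err(reason) => {
                    failed = true;
                    StepOutcome::Failed(reason)
                }
            }
        };
        report.push((name, outcome));
    }
    report
}
```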
The jcode browser tool should:
- prefer normalized core methods
- choose a provider based on user preference, availability, and capability quality
- expose provider-specific methods only behind an explicit advanced path
- return setup guidance when no ready provider is available
- avoid baking Firefox-specific or Chrome-specific assumptions into the core tool API
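Provider selection could be sketched as a filter plus a ranking. Everything here is an assumption for illustration: the `Candidate` struct, using the count of supported methods as a stand-in for "capability quality", and ranking the user's preferred browser family above that count.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Candidate {
    pub provider_id: String,
    pub browser_family: String,
    /// True when provider.status reported the provider as ready.
    pub ready: bool,
    /// Stand-in for capability quality: how many normalized methods it supports.
    pub supported_methods: usize,
}

/// Picks a ready provider, preferring the user's browser family and then
/// the larger method count. Returns None when no provider is ready, in
/// which case the caller should return setup guidance instead.
pub fn choose_provider<'a>(
    candidates: &'a [Candidate],
    preferred_family: Option<&str>,
) -> Option<&'a Candidate> {
    candidates
        .iter()
        .filter(|candidate| candidate.ready)
        .max_by_key(|candidate| {
            let matches_preference = preferred_family == Some(candidate.browser_family.as_str());
            (matches_preference, candidate.supported_methods)
        })
}
```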
These are intentionally left open for the next iteration.
- Should screenshots always be inline, or can providers return file/image handles?
- Should event streaming be required for advanced integrations?
- How much of raw HTML/DOM should be normalized versus returned as provider data?
- Should page.snapshot support multiple named formats beyond jcode.page_snapshot.v1?
- Should provider-specific methods be invokable through the same browser tool or only via debug mode?
- Should setup/install flows themselves be standardized beyond status and diagnostics?
- Review this document and tighten the core method set.
- Decide the exact normalized page.snapshot format.
- Define a Rust trait matching this protocol.
- Implement the first provider adapter for Firefox Agent Bridge.
- Build a conformance test harness.
- Add README browser setup and compatibility documentation after the protocol stabilizes.