Summary
Implement a module that constructs an Action Graph from an interaction log and corresponding scene snapshots. The Action Graph models UI states as nodes and UI actions as edges, capturing transitions between visual states triggered by user or agent interactions.
Supports both real and synthetic data sources.
Motivation
Provides a unified, structured representation of recorded UI behavior over time.
Enables downstream planning, summarization, visualization, and analysis.
Forms the backbone of OmniMCP’s process abstraction stack: Parser → Segments → Tracks → Scene Graph → Action Graph → Plan → Actions → API
Can optionally use an Interaction Log (real or synthetic) to help derive the Action Graph.
Can later be converted into symbolic process logs for use with PM4Py or other process mining tools.
Diagram
graph TD
Parser --> Segments
Segments --> Tracks
Tracks --> SceneGraph
SceneGraph --> ActionGraph
InteractionLog -.-> ActionGraph
ActionGraph --> Plan
Plan --> Actions
Actions --> API
InteractionLog[Interaction Log]
Scope
Inputs
interaction_log (optional): List of structured user/agent interactions (click, type, scroll, etc.), each with:
timestamp or step
action_type
element_id or selector
(optional) element_description, bounding_box, value
scene_snapshots: List of scene graph snapshots (UI state summaries or raw graph objects), aligned with interaction steps.
Outputs
action_graph: A JSON or in-memory object with:
nodes: One per unique UI state (e.g., via hash or semantic description)
edges: One per interaction, with:
source_node_id
target_node_id
action_type, element_id, timestamp
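The output shape described above could be modeled roughly as follows. This is a minimal sketch, not the project's actual data model: the class names `StateNode`, `ActionEdge`, and `ActionGraph`, and the methods `add_node`, `add_edge`, and `to_json`, are hypothetical. Only the edge field names come from the Outputs list.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from typing import Optional, Union


@dataclass
class StateNode:
    """One unique UI state (hypothetical class name)."""
    id: str
    description: str


@dataclass
class ActionEdge:
    """One interaction; field names follow the Outputs list."""
    source_node_id: str
    target_node_id: str
    action_type: str
    element_id: Optional[str] = None
    timestamp: Optional[Union[int, float]] = None


@dataclass
class ActionGraph:
    """In-memory graph that can be exported to JSON."""
    nodes: dict[str, StateNode] = field(default_factory=dict)
    edges: list[ActionEdge] = field(default_factory=list)

    def add_node(self, node: StateNode) -> StateNode:
        # Keep the first node registered under a given id.
        return self.nodes.setdefault(node.id, node)

    def add_edge(self, edge: ActionEdge) -> None:
        # Reject edges whose endpoints were never added as nodes.
        for node_id in (edge.source_node_id, edge.target_node_id):
            if node_id not in self.nodes:
                raise KeyError(f"unknown node id: {node_id}")
        self.edges.append(edge)

    def to_json(self, indent: int = 2) -> str:
        payload = {
            "nodes": [asdict(n) for n in self.nodes.values()],
            "edges": [asdict(e) for e in self.edges],
        }
        return json.dumps(payload, indent=indent)
```

Keeping nodes in a dict keyed by id makes the endpoint check in `add_edge` cheap. A real implementation might instead allow dangling edges while a graph is being assembled.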
Features
Node deduplication: similar scene snapshots map to the same node
Edge labeling with action metadata
Optional use of interaction log for state transition alignment
Support for synthetic logs to bootstrap development and testing
Easy export to JSON for visualization/debugging
Integration-ready for prompt-based planners and optional PM4Py pipeline
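As an illustration of node deduplication and edge labeling, here is a minimal sketch that builds a graph from a synthetic log. It assumes each log entry carries a `scene` string describing the UI state before the action, and that two scenes count as the same state when their descriptions match after case and whitespace normalization. The helper names `state_key` and `build_action_graph` are hypothetical, and a real implementation might compare scene graphs semantically rather than by hash.

```python
from __future__ import annotations

import hashlib
import json


def state_key(scene: str) -> str:
    """Hypothetical dedup key: hash of the normalized scene description."""
    normalized = " ".join(scene.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_action_graph(steps: list[dict]) -> dict:
    """Build {"nodes": [...], "edges": [...]} from step-ordered log entries."""
    node_ids: dict[str, str] = {}  # state key -> node id
    nodes: list[dict] = []
    edges: list[dict] = []

    def node_for(scene: str) -> str:
        key = state_key(scene)
        if key not in node_ids:
            node_ids[key] = f"n{len(nodes)}"
            nodes.append({"id": node_ids[key], "description": scene})
        return node_ids[key]

    ordered = sorted(steps, key=lambda s: s["step"])

    # Register every observed state first so node ids follow log order.
    for entry in ordered:
        node_for(entry["scene"])

    # The action at step i moves the UI from scene i to scene i + 1,
    # so the final entry contributes a node but no outgoing edge.
    for current, following in zip(ordered, ordered[1:]):
        edges.append({
            "source": node_for(current["scene"]),
            "target": node_for(following["scene"]),
            "action": current["action"],
            "element": current.get("element_id"),
            "step": current["step"],
        })

    return {"nodes": nodes, "edges": edges}


if __name__ == "__main__":
    synthetic_log = [
        {"step": 0, "action": "click", "element_id": "settings_link", "scene": "Home"},
        {"step": 1, "action": "click", "element_id": "back_button", "scene": "Settings"},
        {"step": 2, "action": "wait", "scene": "home"},
    ]
    # "Home" and "home" normalize to the same key, so they share one node.
    print(json.dumps(build_action_graph(synthetic_log), indent=2))
```

Hashing a normalized description is the simplest possible dedup rule and will treat any wording difference as a new state. It is used here only to keep the sketch self-contained.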
Example
Given:
[
  { "step": 0, "action": "type", "element_id": "email", "value": "rich@gmail.com", "scene": "Login page with empty fields" },
  { "step": 1, "action": "type", "element_id": "password", "value": "hunter2", "scene": "Login page with email filled" },
  { "step": 2, "action": "click", "element_id": "login_button", "scene": "Login page with both fields filled" },
  { "step": 3, "action": "wait", "duration": 2, "scene": "Dashboard with welcome message" }
]
The resulting Action Graph:
{
  "nodes": [
    { "id": "n0", "description": "Login page with empty fields" },
    { "id": "n1", "description": "Login page with email filled" },
    { "id": "n2", "description": "Login page with both fields filled" },
    { "id": "n3", "description": "Dashboard with welcome message" }
  ],
  "edges": [
    { "source": "n0", "target": "n1", "action": "type", "element": "email", "step": 0 },
    { "source": "n1", "target": "n2", "action": "type", "element": "password", "step": 1 },
    { "source": "n2", "target": "n3", "action": "click", "element": "login_button", "step": 2 }
  ]
}
Tasks
ActionGraph data model (nodes, edges)
Notes