Skip to content

v0.3.0

Choose a tag to compare

@adi-wan-askui adi-wan-askui released this 23 Apr 09:17
· 1071 commits to main since this release

What's Changed

🚀 Features

  • locate ui element using images, prompts, element classes (e.g., "textfield"), relations among ui elements
  • use json schema to extract more complex data (other than strings)
  • new reporters + inject your own custom reporter
  • more flexible model selection + stable ocr model per default (inject through model parameter on initialization of VisionAgent or on call of method, e.g., VisionAgent.click(), as str or ModelComposition)

🐞 Bug Fixes

  • fix reports overriding each other (names not unique enough)

Other

  • improve and add documentation
  • start AskUiControllerServer not on initialization but lazily on entering the VisionAgent context
  • improve configurability/testability by allowing injection of ModelRouter, AgentToolbox and Reporter into VisionAgent on initialization (replaces enable_report, enable_askui_controller parameters)
  • improve input parameter validation by validating arguments of all public function calls
  • add VisionAgent.locate() method for locating (returning center position of) ui elements

🚨 Breaking Changes

  • model_name parameter (e.g., VisionAgent.click() or VisionAgent.mouse_move()) renamed to model
  • model value "claude" for VisionAgent.act() changed to
    "anthropic-claude-3-5-sonnet-20241022"
  • change of order of parameters of VisionAgent.get()
  • removed parameters of VisionAgent() (VisionAgent.__init__()): enable_report, enable_askui_controller
  • "askui" model only chosen as default model if ASKUI_WORKSPACE_ID and ASKUI_TOKEN environment variables are set
  • rename instruction parameter to locator (e.g., VisionAgent.click()) or query (VisionAgent.get())
  • remove PC_AND_MODIFIER_KEY type in favor of PcKey | ModifierKey

Full Changelog: v0.2.5...v0.3.0