v0.3.0
What's Changed
- Optimize docs, naming, UX etc. by @adi-wan-askui in #42
- Model Composition / Selection by @adi-wan-askui in #41
- Get command with "askui" model (including response schema) by @adi-wan-askui in #40
- Add locators by @adi-wan-askui in #36
🚀 Features
- locate UI elements using images, prompts, element classes (e.g., `"textfield"`), and relations among UI elements
- use JSON schemas to extract structured data (not just strings)
- new reporters, plus support for injecting your own custom reporter
- more flexible model selection and a stable OCR model by default: pass the `model` parameter when initializing `VisionAgent` or when calling a method (e.g., `VisionAgent.click()`), either as a `str` or as a `ModelComposition`
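To illustrate what a relational locator such as "the textfield to the right of the text 'Email'" has to resolve, here is a minimal, self-contained sketch. It does not use the askui API: `Box` and `right_of` are hypothetical names, and the sketch assumes elements have already been detected as labeled bounding boxes in screen pixels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical detected UI element: a label plus a pixel bounding box."""
    label: str  # e.g., an element class such as "textfield" or recognized text
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self) -> tuple[int, int]:
        # Integer center point, the kind of position a click would target.
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def right_of(candidates: list[Box], anchor: Box) -> list[Box]:
    """Return candidates to the right of `anchor` on the same row, nearest first."""
    hits = [
        box
        for box in candidates
        if box.left >= anchor.right  # starts where the anchor ends (or later)
        and box.top < anchor.bottom  # overlaps the anchor vertically...
        and box.bottom > anchor.top  # ...so it sits on the same row
    ]
    return sorted(hits, key=lambda box: box.left - anchor.right)


label = Box("Email", 10, 10, 60, 30)
fields = [Box("textfield", 70, 8, 200, 32), Box("textfield", 70, 50, 200, 74)]
print(right_of(fields, label)[0].center)  # (135, 20)
```

Only the first field shares a row with the label, so it alone is returned. A real locator would also have to match elements by class, text, or image before applying the relation.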
🐞 Bug Fixes
- fix reports overwriting each other (report names were not unique enough)
Other
- improve and add documentation
- start `AskUiControllerServer` not on initialization but lazily on entering the `VisionAgent` context
- improve configurability/testability by allowing injection of `ModelRouter`, `AgentToolbox` and `Reporter` into `VisionAgent` on initialization (replaces the `enable_report` and `enable_askui_controller` parameters)
- improve input parameter validation by validating the arguments of all public function calls
- add `VisionAgent.locate()` method for locating UI elements (returns the element's center position)
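The lazy-start and injection changes follow a common Python pattern: the constructor only stores its dependencies, and side effects happen in `__enter__`/`__exit__`. The sketch below shows that pattern with hypothetical `Agent` and `FakeController` classes; it is not `VisionAgent`'s actual implementation.

```python
from typing import Optional


class FakeController:
    """Hypothetical stand-in for a controller server; it only tracks state."""

    def __init__(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def stop(self) -> None:
        self.running = False


class Agent:
    """Hypothetical agent that starts its controller lazily."""

    def __init__(self, controller: Optional[FakeController] = None) -> None:
        # Store the (optionally injected) dependency; do not start it yet.
        self._controller = controller if controller is not None else FakeController()

    def __enter__(self) -> "Agent":
        # The side effect is deferred until the context is entered.
        self._controller.start()
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self._controller.stop()
        return False  # do not suppress exceptions


controller = FakeController()
agent = Agent(controller=controller)
print(controller.running)  # False: constructing the agent started nothing
with agent:
    print(controller.running)  # True inside the context
print(controller.running)  # False after exit
```

Because the controller is injected rather than created behind a flag, a test can pass in a fake and inspect its state directly, which is the testability benefit the notes describe.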
🚨 Breaking Changes
- `model_name` parameter (e.g., `VisionAgent.click()` or `VisionAgent.mouse_move()`) renamed to `model`
- model value `"claude"` for `VisionAgent.act()` changed to `"anthropic-claude-3-5-sonnet-20241022"`
- change of order of parameters of `VisionAgent.get()`
- removed parameters of `VisionAgent()` (`VisionAgent.__init__()`): `enable_report`, `enable_askui_controller`
- `"askui"` model only chosen as default model if `ASKUI_WORKSPACE_ID` and `ASKUI_TOKEN` environment variables are set
- rename `instruction` parameter to `locator` (e.g., `VisionAgent.click()`) or `query` (`VisionAgent.get()`)
- remove `PC_AND_MODIFIER_KEY` type in favor of `PcKey | ModifierKey`
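The switch to `PcKey | ModifierKey`, together with stricter argument validation, can be pictured with `typing.Literal` aliases. This is a sketch only: the key values below are a hypothetical, abbreviated subset (these notes do not list the actual members), and `allowed_keys`/`validate_key` are illustrative helpers, not library functions. `Union[...]` is used so the sketch also runs on Python versions before 3.10.

```python
from typing import Literal, Union, get_args

# Hypothetical, abbreviated key sets; the real members are not listed here.
PcKey = Literal["enter", "tab", "escape", "a"]
ModifierKey = Literal["command", "alt", "control", "shift"]

# A single union of the two aliases instead of one combined key type.
Key = Union[PcKey, ModifierKey]


def allowed_keys(tp) -> set[str]:
    """Flatten a Union of Literal types into the set of permitted values."""
    values: set[str] = set()
    for member in get_args(tp):  # each member is a Literal[...]
        values.update(get_args(member))
    return values


def validate_key(key: str) -> str:
    """Reject values outside the union at runtime, before any action runs."""
    if key not in allowed_keys(Key):
        raise ValueError(f"unsupported key: {key!r}")
    return key


print(validate_key("shift"))  # shift
```

Keeping the two key kinds as separate aliases lets a function accept only modifiers where only modifiers make sense, while the union still covers call sites that take either.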
Full Changelog: v0.2.5...v0.3.0