v0.24.0

@philipph-askui philipph-askui released this 03 Mar 10:30
· 253 commits to main since this release
46d3113

🎉 Overview

v0.24.0 is a major architectural release that reworks how custom models are integrated into the AskUI SDK. The centerpiece is the new Bring-Your-Own-Model-Provider system: instead of configuring models via string identifiers and the ModelRouter abstraction, you now plug typed provider instances directly into AgentSettings. Three interfaces, VlmProvider (for act), ImageQAProvider (for get), and DetectionProvider (for locate), let you swap in your own model backends for acting, querying, and locating UI elements. Built-in providers for Anthropic and Google are included alongside the AskUI defaults.

🚨 Breaking Changes

  • Model Provider Overhaul: We removed the ModelRouter and ModelRegistry and replaced them with a new model_providers architecture. You can now bring your own model providers through three typed interfaces: VlmProvider (for act), ImageQAProvider (for get), and DetectionProvider (for locate). Built-in providers include AnthropicVlmProvider, GoogleImageQAProvider, AskUIVlmProvider, and more. Please see the new Bring Your Own Model Provider docs for detailed instructions.

    Migration: Replace the model/models constructor parameters with the new settings parameter (of type AgentSettings):

    # Before
    from askui import VisionAgent
    
    agent = VisionAgent(model="claude-sonnet-4-20250514")
    
    # After
    from askui import ComputerAgent, AgentSettings
    from askui.model_providers import AnthropicVlmProvider
    
    agent = ComputerAgent(settings=AgentSettings(
        vlm_provider=AnthropicVlmProvider(model_id="claude-sonnet-4-20250514"),
    ))
  • VisionAgent renamed to ComputerAgent: The main agent class is now ComputerAgent. VisionAgent still works but emits a DeprecationWarning. Similarly, AndroidVisionAgent is now AndroidAgent.

  • click()/mouse_move()/locate() model parameter replaced: The model parameter on click(), mouse_move(), and locate() has been replaced by locate_settings: LocateSettings, which controls resolution and other locate options.

  • betas parameter removed from MessageSettings: The Anthropic-specific betas parameter was replaced with a generic provider_options: dict[str, Any] field. To pass betas, use provider_options={"betas": [...]}.

  • Chat API removed: The Chat API (src/askui/chat/) has been removed from the package along with its dependencies (sqlalchemy, alembic, fastapi, uvicorn, apscheduler, etc.).

  • pynput AgentOs backend removed: The PynputAgentOs implementation and the askui[pynput] optional dependency group have been removed. Use the default AskUiControllerClient (gRPC) backend instead.

  • UITars model removed: The UITars model integration (src/askui/models/ui_tars_ep/) has been removed.

  • OpenAI integration removed: The OpenAI-compatible model provider (src/askui/models/openai/) has been removed. Use the new provider interfaces for custom model integrations.

  • ModelComposition and ModelDefinition removed: These classes have been replaced by the new provider system.
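
Because VisionAgent still works but emits a DeprecationWarning, one way to find leftover usages during migration is to escalate deprecation warnings to errors. The sketch below uses only the standard library's warnings module. Note that legacy_agent_factory and its warning message are hypothetical stand-ins for any call that triggers the warning; they are not part of the AskUI SDK.

```python
import warnings
from typing import Callable, Optional


def legacy_agent_factory() -> str:
    """Hypothetical stand-in for constructing a deprecated class such as VisionAgent."""
    warnings.warn(
        "VisionAgent is deprecated, use ComputerAgent instead",  # assumed wording
        DeprecationWarning,
        stacklevel=2,
    )
    return "agent"


def find_deprecated_call(factory: Callable[[], object]) -> Optional[str]:
    """Return the deprecation message raised by factory(), or None if it is clean."""
    with warnings.catch_warnings():
        # Escalate DeprecationWarning to an exception inside this block only.
        warnings.simplefilter("error", DeprecationWarning)
        try:
            factory()
        except DeprecationWarning as exc:
            return str(exc)
    return None


message = find_deprecated_call(legacy_agent_factory)
clean = find_deprecated_call(lambda: "agent")
```

When running a test suite, passing -W error::DeprecationWarning to Python or pytest has a similar effect without code changes.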


✨ New Features

  • AgentSettings for centralized configuration: A new AgentSettings class provides a clean, typed configuration surface for agents with three provider slots: vlm_provider, image_qa_provider, and detection_provider — each with sensible AskUI defaults.

  • Bring-Your-Own-Model-Provider: Three abstract provider interfaces (VlmProvider, ImageQAProvider, DetectionProvider) allow users to plug in their own models. Built-in implementations:

    • AskUIVlmProvider, AskUIImageQAProvider, AskUIDetectionProvider (defaults)
    • AnthropicVlmProvider, AnthropicImageQAProvider (direct Anthropic API)
    • GoogleImageQAProvider (direct Google Gemini API)
  • mouse_move() accepts a duration parameter to control mouse movement speed (in milliseconds, default: 500) by @philipph-askui in #233

  • Time and wait tools added to universal tool store by @mlikasam-askui in #234:

    • GetCurrentTimeTool — returns current date/time for time-aware agent decisions
    • WaitTool — pauses execution for a specified duration
    • WaitWithProgressTool — wait with a visual progress bar
    • WaitUntilConditionTool — polls a condition with configurable interval and timeout
  • LocateSettings and GetSettings exposed in public API: Users can now control per-call locate/get behavior including resolution, max_tokens, temperature, and system_prompt.

  • FallbackLocateModel and FallbackGetModel: New utility classes that try multiple models in sequence until one succeeds, replacing the old ModelComposition pattern.

  • get and locate tools in act loop: The LLM can now use get and locate as tools during act() calls (only when an AgentOs is available).
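
The "try multiple models in sequence until one succeeds" behaviour described for FallbackLocateModel and FallbackGetModel can be illustrated with a small, self-contained sketch. This is not the SDK's implementation, and it does not use the SDK's classes: locate_with_fallback, AllModelsFailedError, and the two toy models are hypothetical names chosen only to show the pattern.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[int, int]


class AllModelsFailedError(Exception):
    """Raised when every candidate in the chain has failed (hypothetical name)."""


def locate_with_fallback(
    candidates: Sequence[Callable[[str], Point]],
    description: str,
) -> Point:
    """Try each locate callable in order and return the first successful result."""
    errors: List[Exception] = []
    for candidate in candidates:
        try:
            return candidate(description)
        except Exception as exc:  # a real chain would catch a narrower error type
            errors.append(exc)
    raise AllModelsFailedError(
        f"{len(errors)} candidate(s) failed for {description!r}"
    )


def flaky_model(description: str) -> Point:
    """Toy model that never finds the element."""
    raise LookupError(f"could not find {description}")


def stable_model(description: str) -> Point:
    """Toy model that always returns a fixed screen coordinate."""
    return (120, 48)


point = locate_with_fallback([flaky_model, stable_model], "Submit button")
```

Ordering matters in such a chain: placing the cheapest or fastest candidate first keeps the common case quick, while later candidates only run after an earlier one has failed.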

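The polling behaviour described for WaitUntilConditionTool (a condition checked at a configurable interval until a timeout) follows a common pattern, sketched here with the standard library only. The function name wait_until and its parameters interval_s and timeout_s are assumptions for illustration and do not reflect the tool's actual signature.

```python
import time
from typing import Callable


def wait_until(
    condition: Callable[[], bool],
    interval_s: float = 0.5,
    timeout_s: float = 10.0,
) -> bool:
    """Poll condition() about every interval_s seconds; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))


calls = {"count": 0}


def ready_on_third_check() -> bool:
    """Toy condition that becomes true on the third poll."""
    calls["count"] += 1
    return calls["count"] >= 3


succeeded = wait_until(ready_on_third_check, interval_s=0.01, timeout_s=1.0)
timed_out = wait_until(lambda: False, interval_s=0.01, timeout_s=0.05)
```

Using time.monotonic() rather than time.time() keeps the timeout correct even if the system clock is adjusted while waiting.
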

🐛 Bug Fixes

  • Fixed agent crash without AgentOs: get and locate tools are now only added to the act loop when agent_os is set. Agents used without an AgentOs (e.g., pure LLM pipelines) no longer crash on act() (by @philipph-askui in #237).

  • Fixed OpenTelemetry import errors: opentelemetry-sdk is now a default dependency. Instrumentor imports (FastAPIInstrumentor, HTTPXClientInstrumentor, etc.) are guarded with try/except, so installing without the [otel] extras no longer causes import failures (by @philipph-askui in #238).

  • Fixed a type-checking issue in not_given.py by adding the @final decorator to resolve a mypy ambiguity.

  • Fixed the default value of Display's name parameter in AgentOs (it previously raised an error when executing from cache).
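
The try/except guard mentioned in the OpenTelemetry fix is a general pattern for optional dependencies. The sketch below shows the idea with the standard library only; it is not the SDK's code, and optional_import, instrument, and the missing module name are hypothetical.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Import module_name if it is installed; return None instead of raising."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def instrument(module: Optional[ModuleType]) -> bool:
    """Report whether instrumentation can run for the given optional module."""
    if module is None:
        return False  # skip silently rather than failing at import time
    return True


# "json" ships with Python; the second name is a deliberately missing module.
present = optional_import("json")
missing = optional_import("hypothetical_instrumentor_package")
```

With this shape, code that depends on the optional package checks for None once and degrades gracefully, instead of every importer failing when the extras are not installed.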


📚 Documentation

  • Complete restructuring of docs (00_overview.md through 10_extracting_data.md)
  • Removed outdated docs for chat API, MCP, and direct tool use
  • New Bring Your Own Model Provider guide
  • Updated reporting docs to distinguish between execution reports and test reports
  • Updated README to reflect new ComputerAgent class name, corrected Python version requirement (>=3.10, <3.14), and fixed broken links

Dependencies

Removed: openai, fastapi, uvicorn, sqlalchemy, alembic, apscheduler, pynput, mss, structlog, asgi-correlation-id, starlette-context, anyio, bson, aiofiles

Added to core: opentelemetry-sdk>=1.38.0 (promoted from optional chat extras)

Optional extras changed:

  • askui[chat] — removed
  • askui[pynput] — removed
  • askui[otel] — now contains only the instrumentor packages (the base SDK is always available)
  • askui[all] — now includes android, bedrock, tracing, vertex, web

📝 Full Changelog: v0.23.1...v0.24.0