CHANGELOG

v0.1.2 (2026-01-29)

Bug Fixes

  • Use filename-based GitHub Actions badge URL (#2, 0d58eee)

The workflow-name-based badge URL was showing "no status" because GitHub requires workflow runs on the specified branch. Using the filename-based URL format (actions/workflows/publish.yml/badge.svg) is more reliable and works regardless of when the workflow last ran.

Co-authored-by: Claude Sonnet 4.5 noreply@anthropic.com
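For illustration, a filename-based badge URL takes this shape in a README (`OWNER/REPO` are placeholders, not the actual repository path):

```
![publish](https://github.com/OWNER/REPO/actions/workflows/publish.yml/badge.svg)
```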

v0.1.1 (2026-01-29)

Bug Fixes

  • ci: Remove build_command from semantic-release config (db01f86)

The python-semantic-release action runs in a Docker container where uv is not available. Let the workflow handle building instead.

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

Continuous Integration

  • Switch to python-semantic-release for automated versioning (a2dd7d5)

Replaces manual tag-triggered publish with python-semantic-release:

  • Automatic version bumping based on conventional commits (feat: -> minor, fix:/perf: -> patch)
  • Creates GitHub releases automatically
  • Publishes to PyPI on release

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com
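As a rough illustration of the bump rules described above (not python-semantic-release's actual parser), a conventional-commit prefix maps to a release level like this:

```python
# Minimal sketch of conventional-commit bump rules; illustrative only.

def bump_level(commit_subject: str) -> str:
    """Map a conventional-commit subject line to a version bump."""
    prefix = commit_subject.split(":", 1)[0]
    # Strip an optional scope, e.g. "fix(ci)" -> "fix"
    prefix = prefix.split("(", 1)[0].strip()
    if prefix.endswith("!"):          # "feat!:" marks a breaking change
        return "major"
    if prefix == "feat":
        return "minor"
    if prefix in ("fix", "perf"):
        return "patch"
    return "none"                     # chore:, docs:, ci:, etc.

print(bump_level("feat: add UI-TARS client"))        # minor
print(bump_level("fix(ci): remove build_command"))   # patch
```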

v0.1.0 (2026-01-16)

Bug Fixes

  • uitars: Fix Dockerfile for vLLM deployment (5d457ed)

  • Fix CMD format: vLLM image has ENTRYPOINT, CMD should be args only
  • Fix --limit-mm-per-prompt format: use KEY=VALUE instead of JSON
  • Reduce max-model-len from 32768 to 8192 to fix CUDA OOM on L4 24GB
  • Remove model pre-download (causes disk space issues; download at runtime)
  • Increase health check start-period to 600s for model download

Also adds CLI commands:

  • cleanup: docker system prune for disk space recovery
  • wait: poll for server health with configurable timeout
  • setup_autoshutdown: create CloudWatch/Lambda infrastructure
  • build --clean: option to clean up before building
  • logs: fix stderr capture

Updates CLAUDE.md with non-interactive operations requirements.

Chores

  • Prepare for PyPI publishing and update Gemini models (7be8b1f)

  • Add PyPI metadata: maintainers, classifiers, keywords, project URLs
  • Create LICENSE file (MIT)
  • Add GitHub workflow for trusted PyPI publishing
  • Update Google provider to use current Gemini models (3.x, 2.5.x)
  • Remove deprecated models (2.0, 1.5) that are retired or retiring
  • Update tests to reflect new model names

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

Documentation

  • Add literature review, experiment plan, and evaluation harness (13e69f8)

  • Add literature review: UI-TARS (61.6%), OmniParser (39.6%), ScreenSeekeR (+254%)
  • Add experiment plan: comparison of 6 methods across 3 datasets
  • Add evaluation harness with metrics and dataset formats
  • Update README with documentation links
  • Add test assets from OmniParser deployment
  • Fix Dockerfile for Conda ToS and PaddleOCR compatibility
  • Add deploy CLI commands: logs, ps, build, run, test

  • Move outdated robust_detection to legacy, update evaluation format (2f32d31)

  • Move robust_detection.md to docs/legacy/ (superseded by the ScreenSeekeR approach)
  • Update evaluation.md Sections 6-7 to align with the new experiment plan
  • Compare OmniParser vs UI-TARS instead of baseline vs robust transforms

  • readme: Add CLI usage examples with output (4d5f392)

  • Add status, ps, logs command output examples
  • Show deploy workflow with .env setup
  • Document all available commands

  • readme: Use uv sync for dev setup (8087ca5)

Features

  • Add UI-TARS deployment and client (da67e66)

  • Add UITarsSettings config class for UI-TARS deployment
  • Create deploy/uitars module with vLLM-based Dockerfile
  • Implement UITarsClient for grounding via an OpenAI-compatible API
  • Add GroundingResult dataclass with coordinate conversion
  • Include smart_resize() for Qwen2.5-VL coordinate scaling
  • Add [uitars] optional dependency group (openai)
  • Update CLAUDE.md with UI-TARS CLI commands
  • Update README.md with usage examples and API docs
  • Add uitars_deployment_design.md with full design spec
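The GroundingResult coordinate conversion mentioned above might look roughly like this sketch (the field names and the 0-1 normalized convention are assumptions, not the actual implementation):

```python
from dataclasses import dataclass

# Hypothetical sketch of a grounding result with coordinate conversion.
# Field names and the normalized-coordinate convention are assumptions;
# the real GroundingResult class may differ.

@dataclass
class GroundingResult:
    x: float  # normalized center x in [0, 1], model coordinate space
    y: float  # normalized center y in [0, 1]

    def to_pixels(self, width: int, height: int) -> tuple[int, int]:
        """Convert normalized model coordinates to screen pixels."""
        return round(self.x * width), round(self.y * height)

result = GroundingResult(x=0.5, y=0.25)
print(result.to_pixels(1920, 1080))  # (960, 270)
```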

  • deploy: Add auto-shutdown and fix PaddleOCR compatibility (ab758df)

  • Pin PaddleOCR to v2.8.1 for API compatibility
  • Add auto-shutdown for cost management
  • Add config.py and .env.example

  • deploy: Add CLI commands and fix transformers version (9e6398b)

  • Add logs, ps, build, run, test CLI commands
  • Add CLAUDE.md with deployment instructions
  • Pin transformers==4.44.2 for Florence-2 compatibility

  • eval: Add evaluation framework for comparing grounding methods (064a431)

Implement comprehensive evaluation framework:

  • Dataset schema with AnnotatedElement, Sample, and Dataset classes
  • Synthetic UI dataset generator (buttons, icons, text, links)
  • Evaluation methods for OmniParser and UI-TARS
  • Cropping strategies: baseline, fixed, ScreenSeekeR-style
  • Metrics: detection rate, IoU, latency by size/type
  • Results storage and multi-method comparison
  • Visualization: charts (matplotlib) and console tables
  • CLI: generate, run, compare, list commands

Usage:

  python -m openadapt_grounding.eval generate --type synthetic --count 500
  python -m openadapt_grounding.eval run --method omniparser --dataset synthetic
  python -m openadapt_grounding.eval compare --charts-dir evaluation/charts

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com
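One of the metrics listed above, IoU, can be computed for axis-aligned boxes as in this sketch (the (x1, y1, x2, y2) box format is an assumption, not necessarily what the framework uses):

```python
# Sketch of intersection-over-union for axis-aligned bounding boxes.
# The (x1, y1, x2, y2) box format is an illustrative assumption.

def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```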

  • eval: Add synthetic_hard evaluation dataset (c97cf70)

Add a more challenging synthetic evaluation dataset with 48 samples for testing VLM API providers and grounding methods. Contains synthetic UI screenshots with annotations for element localization testing.

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

  • providers: Add VLM API providers for Claude, GPT, and Gemini (9a766d1)

Add a unified provider abstraction for Visual Language Model APIs:

  • Base provider class with coordinate normalization and response parsing
  • Anthropic provider for Claude models (claude-sonnet-4-20250514)
  • OpenAI provider for GPT models (gpt-4o)
  • Google provider for Gemini models (gemini-2.0-flash-exp)

Features:

  • Lazy loading with optional dependencies per provider
  • Factory function get_provider() with name aliases
  • Coordinate extraction from model responses with regex fallback
  • Image encoding utilities (base64 conversion)
  • Comprehensive test suite with mocking

Optional dependencies added to pyproject.toml:

  • providers-anthropic, providers-openai, providers-google
  • providers (all providers combined)

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com
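The get_provider() factory with name aliases described above could be sketched like this (the class names, aliases, and behavior here are illustrative assumptions, not the package's actual API):

```python
# Hypothetical sketch of a provider factory with name aliases.
# Class names and alias mappings are illustrative assumptions.

class AnthropicProvider: ...
class OpenAIProvider: ...
class GoogleProvider: ...

_PROVIDERS = {
    "anthropic": AnthropicProvider,
    "openai": OpenAIProvider,
    "google": GoogleProvider,
}
_ALIASES = {"claude": "anthropic", "gpt": "openai", "gemini": "google"}

def get_provider(name: str):
    """Resolve a provider name (or alias) to a provider instance."""
    key = _ALIASES.get(name.lower(), name.lower())
    try:
        # Real code would lazily import the optional SDK here.
        return _PROVIDERS[key]()
    except KeyError:
        raise ValueError(f"Unknown provider: {name}") from None

print(type(get_provider("claude")).__name__)  # AnthropicProvider
```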