# CHANGELOG


## v0.1.1 (2026-01-29)

### Bug Fixes

- **ci**: Remove build_command from semantic-release config
  ([`db01f86`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/db01f86bb0e62df393b93c297374325169f893c0))

The python-semantic-release action runs in a Docker container where uv is not available. Let the
workflow handle building instead.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

### Continuous Integration

- Switch to python-semantic-release for automated versioning
  ([`a2dd7d5`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/a2dd7d5f8dde359396b5c81515f38bbca3c7a33b))

Replaces manual tag-triggered publish with python-semantic-release:

- Automatic version bumping based on conventional commits (feat: -> minor, fix:/perf: -> patch)
- Creates GitHub releases automatically
- Publishes to PyPI on release

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


## v0.1.0 (2026-01-16)

### Bug Fixes

- **uitars**: Fix Dockerfile for vLLM deployment
  ([`5d457ed`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/5d457ed464bb13e10d0fbe8d57a23f85a8f6a31a))

- Fix CMD format: the vLLM image has an ENTRYPOINT, so CMD should be args only
- Fix --limit-mm-per-prompt format: use KEY=VALUE instead of JSON
- Reduce max-model-len from 32768 to 8192 to fix CUDA OOM on L4 24GB
- Remove model pre-download (causes disk space issues; download at runtime instead)
- Increase health check start-period to 600s for model download

Also adds CLI commands:

- cleanup: docker system prune for disk space recovery
- wait: poll for server health with a configurable timeout
- setup_autoshutdown: create CloudWatch/Lambda infrastructure
- build --clean: option to clean up before building
- logs: fix stderr capture

Updates CLAUDE.md with non-interactive operation requirements.

### Chores

- Prepare for PyPI publishing and update Gemini models
  ([`7be8b1f`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/7be8b1f734453ab390cf8188d55ce5cdb9932272))

- Add PyPI metadata: maintainers, classifiers, keywords, project URLs
- Create LICENSE file (MIT)
- Add GitHub workflow for trusted PyPI publishing
- Update Google provider to use current Gemini models (3.x, 2.5.x)
- Remove deprecated models (2.0, 1.5) that are retired or retiring
- Update tests to reflect new model names

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

### Documentation

- Add literature review, experiment plan, and evaluation harness
  ([`13e69f8`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/13e69f8d3d74dbc5eafe72affd46ce61026c86fe))

- Add literature review: UI-TARS (61.6%), OmniParser (39.6%), ScreenSeekeR (+254%)
- Add experiment plan: a 6-method comparison across 3 datasets
- Add evaluation harness with metrics and dataset formats
- Update README with documentation links
- Add test assets from OmniParser deployment
- Fix Dockerfile for Conda ToS and PaddleOCR compatibility
- Add deploy CLI commands: logs, ps, build, run, test

- Move outdated robust_detection to legacy, update evaluation format
  ([`2f32d31`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/2f32d3196212d5cb5c0aaf70cb93190bed0e2205))

- Move robust_detection.md to docs/legacy/ (superseded by the ScreenSeekeR approach)
- Update evaluation.md Sections 6-7 to align with the new experiment plan
- Compare OmniParser vs UI-TARS instead of baseline vs robust transforms

- **readme**: Add CLI usage examples with output
  ([`4d5f392`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/4d5f392e36d14cf7d47dc6139353e93667a0791b))

- Add status, ps, logs command output examples
- Show deploy workflow with .env setup
- Document all available commands

- **readme**: Use uv sync for dev setup
  ([`8087ca5`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/8087ca5a17b3cf9a240fb2886390e41e2b1b5571))

### Features

- Add UI-TARS deployment and client
  ([`da67e66`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/da67e66a789f1c0a6a0bfd8117068fd2921f48b5))

- Add UITarsSettings config class for UI-TARS deployment
- Create deploy/uitars module with vLLM-based Dockerfile
- Implement UITarsClient for grounding via an OpenAI-compatible API
- Add GroundingResult dataclass with coordinate conversion
- Include smart_resize() for Qwen2.5-VL coordinate scaling
- Add [uitars] optional dependency group (openai)
- Update CLAUDE.md with UI-TARS CLI commands
- Update README.md with usage examples and API docs
- Add uitars_deployment_design.md with full design spec

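For context on the coordinate scaling mentioned above: Qwen2.5-VL's published smart_resize snaps image dimensions to multiples of a patch factor while keeping the total pixel count within bounds, and model coordinates come back in that resized space. A sketch consistent with the reference logic (the default values here are assumptions, not this project's configuration):

```python
import math


def smart_resize(height: int, width: int, factor: int = 28,
                 min_pixels: int = 56 * 56,
                 max_pixels: int = 14 * 14 * 4 * 1280) -> tuple[int, int]:
    """Round dimensions to multiples of `factor`, keeping area within bounds."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        # Shrink, preserving aspect ratio, until the area fits.
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        # Grow small images up to the minimum area.
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar
```

A GroundingResult-style conversion then maps model output back to screen space by scaling with original/resized dimension ratios.
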
- **deploy**: Add auto-shutdown and fix PaddleOCR compatibility
  ([`ab758df`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/ab758df747ca1f1924787368b58dce3a5de66655))

- Pin PaddleOCR to v2.8.1 for API compatibility
- Add auto-shutdown for cost management
- Add config.py and .env.example

- **deploy**: Add CLI commands and fix transformers version
  ([`9e6398b`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/9e6398b569918f40356d6c25bdf32ec50b2b3688))

- Add logs, ps, build, run, test CLI commands
- Add CLAUDE.md with deployment instructions
- Pin transformers==4.44.2 for Florence-2 compatibility

- **eval**: Add evaluation framework for comparing grounding methods
  ([`064a431`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/064a4314cef63bd778ef419d4cd646e4afbe6c93))

Implement a comprehensive evaluation framework:

- Dataset schema with AnnotatedElement, Sample, Dataset classes
- Synthetic UI dataset generator (buttons, icons, text, links)
- Evaluation methods for OmniParser and UI-TARS
- Cropping strategies: baseline, fixed, ScreenSeekeR-style
- Metrics: detection rate, IoU, latency by size/type
- Results storage and multi-method comparison
- Visualization: charts (matplotlib) and console tables
- CLI: generate, run, compare, list commands

Usage:

- `python -m openadapt_grounding.eval generate --type synthetic --count 500`
- `python -m openadapt_grounding.eval run --method omniparser --dataset synthetic`
- `python -m openadapt_grounding.eval compare --charts-dir evaluation/charts`

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
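
Of the metrics listed above, IoU is the standard box-overlap measure. A minimal sketch, assuming boxes are laid out as `(x1, y1, x2, y2)` (the harness's actual schema may differ):

```python
def iou(a: tuple, b: tuple) -> float:
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])  # intersection top-left
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])  # intersection bottom-right
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0  # disjoint boxes
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)
```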

- **eval**: Add synthetic_hard evaluation dataset
  ([`c97cf70`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/c97cf70e060ddafcf178ab00169d3d2ee30db52f))

Add a more challenging synthetic evaluation dataset with 48 samples for testing VLM API providers
and grounding methods. Contains synthetic UI screenshots with annotations for element localization
testing.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- **providers**: Add VLM API providers for Claude, GPT, and Gemini
  ([`9a766d1`](https://github.com/OpenAdaptAI/openadapt-grounding/commit/9a766d1d92c4312b917406f61d65a8139180226e))

Add a unified provider abstraction for Visual Language Model APIs:

- Base provider class with coordinate normalization and response parsing
- Anthropic provider for Claude models (claude-sonnet-4-20250514)
- OpenAI provider for GPT models (gpt-4o)
- Google provider for Gemini models (gemini-2.0-flash-exp)

Features:

- Lazy loading with optional dependencies per provider
- Factory function get_provider() with name aliases
- Coordinate extraction from model responses with regex fallback
- Image encoding utilities (base64 conversion)
- Comprehensive test suite with mocking

Optional dependencies added to pyproject.toml:

- providers-anthropic, providers-openai, providers-google
- providers (all providers combined)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>