Prebuild/feat/autonomous agents merge #1588

Open
A-makarim wants to merge 178 commits into cnoe-io:main from A-makarim:prebuild/feat/autonomous-agents-merge

@A-makarim

Description

This PR introduces the autonomous task feature set and the supporting platform changes needed to run it end to end.

The main change is a new autonomous task workflow that lets users create scheduled or webhook-triggered tasks, route those runs through the existing supervisor/dynamic-agent A2A path, and review the resulting execution history from the chat UI. The work includes task persistence, scheduler reload behavior, webhook security, per-task chat context, preflight acknowledgement, and UI flows for creating, editing, deleting, filtering, and continuing autonomous task conversations.

It also includes the related integration work needed for the feature to operate in the current platform shape: supervisor tools for autonomous task management, GitHub webhook setup helpers, dynamic-agent chat timeline rendering fixes, MongoDB-backed task storage, deployment/env wiring, Helm/prebuild workflow updates, and CI fixes found while preparing the merge branch.

Notable areas included in this branch:

  • Autonomous task CRUD, scheduling, hot reload, and MongoDB-backed task storage.
  • Webhook-triggered autonomous runs with signing, replay-window protection, ping handling, and GitHub webhook management tools.
  • Supervisor/deep-agent integration so autonomous runs and follow-up chat share the normal A2A execution pipeline.
  • UI support for an Autonomous tab, task management, task-linked chat threads, AUTO filtering/badges, replayed timelines, and scheduled-run updates.
  • Tests around task models, preflight handling, webhook management, chat rendering, and related UI behavior.
  • Deployment and configuration updates for autonomous agent public/internal URLs and prebuild Helm chart publication.
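
The webhook signing and replay-window protection listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in this branch: the HMAC-SHA256 scheme, the `"<timestamp>.<body>"` message layout, the 300-second window, and the names `sign`, `verify`, and `REPLAY_WINDOW_SECONDS` are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical tolerance; the PR does not state the actual window.
REPLAY_WINDOW_SECONDS = 300


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    """Return a hex HMAC-SHA256 over "<timestamp>.<body>" (assumed layout)."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str,
           now: Optional[float] = None) -> bool:
    """Accept only requests that are fresh and carry a matching signature."""
    now = time.time() if now is None else now
    if abs(now - timestamp) > REPLAY_WINDOW_SECONDS:
        return False  # outside the window: treat as a possible replay
    expected = sign(secret, timestamp, body)
    # compare_digest keeps the comparison time independent of the match position
    return hmac.compare_digest(expected, signature)
```

Binding the timestamp into the signed message means an attacker cannot refresh an old request by editing the timestamp header alone.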

Type of Change

  • Bugfix
  • New Feature
  • Breaking Change
  • Refactor
  • Documentation
  • Other (CI/deployment wiring)

Pre-release Helm Charts (Optional)

This branch includes chart and prebuild workflow changes. Prebuild chart publishing has been exercised from the fork branch.

Checklist

  • I have read the contributing guidelines
  • Existing issues have been referenced (where applicable)
  • I have verified this change is not present in other open pull requests
  • Functionality is documented
  • All code style checks pass
  • New code contribution is covered by automated tests
  • All new and existing tests pass

A-makarim and others added 30 commits April 10, 2026 17:51
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
- Remove unused os import from main.py
- Fix import ordering in health.py and tasks.py (ruff I001)
- Add ruff as dev dependency to pyproject.toml
- Add uv.lock for reproducible installs
- Rewrite README.md with full documentation

Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
- Fix duplicate TaskRun bug: fire_webhook_task now delegates entirely to
  _execute_task which creates the single canonical run record and returns it
- Fix A2A protocol mismatch: change method tasks/send → message/send, parts
  kind type → kind, add messageId/contextId per Google A2A spec
- Forward task.agent and task.llm_provider through invoke_agent to supervisor
  metadata so the supervisor routes to the correct sub-agent
- Move import json from inside function bodies to module level (a2a_client.py,
  webhooks.py)
- Replace assert isinstance with explicit isinstance checks + HTTPException/log
- Use collections.deque(maxlen=500) for O(1) bounded run history
- Fix open CORS default ["*"] → [] (security)
- Add IntervalTrigger model_validator requiring at least one positive field
- Use Field(default_factory=dict) for mutable dict defaults in models
- Remove unused WebhookTrigger.path field (route is always /hooks/{task_id})
- Remove duplicate ruff from [project.optional-dependencies].dev in pyproject.toml
- Fix Dockerfile: COPY uv.lock, remove unused uv venv .venv line

Signed-off-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Agent-Logs-Url: https://github.com/A-makarim/ai-platform-engineering/sessions/03dc3b5a-c94f-4f81-bf5f-531161938700

Co-authored-by: A-makarim <114302821+A-makarim@users.noreply.github.com>
- IntervalTrigger validator: check each field individually for positive values
  rather than summing (so seconds=-60, minutes=1 is correctly rejected)
- Log effective_llm_provider alongside agent in a2a_client invoke log
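
The per-field check described above can be sketched with a plain dataclass. This is an illustration only: the real model is a Pydantic class with a `model_validator`, and the field names, `None` defaults, and the `IntervalSketch` name here are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IntervalSketch:
    """Hypothetical stand-in for IntervalTrigger; fields are assumed."""
    seconds: Optional[int] = None
    minutes: Optional[int] = None
    hours: Optional[int] = None

    def __post_init__(self) -> None:
        provided = {k: v for k, v in vars(self).items() if v is not None}
        if not provided:
            raise ValueError("at least one interval field is required")
        # Validate each field on its own so a negative value cannot be
        # offset by a positive one elsewhere.
        bad = {k: v for k, v in provided.items() if v <= 0}
        if bad:
            raise ValueError(f"interval fields must be positive: {bad}")
```

With this shape, `IntervalSketch(seconds=-60, minutes=1)` raises even though one of its fields is positive.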

Signed-off-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Agent-Logs-Url: https://github.com/A-makarim/ai-platform-engineering/sessions/03dc3b5a-c94f-4f81-bf5f-531161938700

Co-authored-by: A-makarim <114302821+A-makarim@users.noreply.github.com>
…2A client

Add a third extraction fallback that checks the task history for the last
agent message, matching the pattern in a2a_remote_agent_connect.py. Without
this, valid supervisor responses carried in history were treated as failures.

Also add params.configuration with blocking:true and acceptedOutputModes to
ensure the supervisor returns a completed result in a predictable shape.

Signed-off-by: A-makarim <$(git config user.email)>
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Replace `pip install uv` + `uv pip install --system -e .` with the repo's
established pattern: copy uv from ghcr.io/astral-sh/uv:latest and run
`uv sync --locked --no-dev`, which actually enforces the lock file and
keeps the install consistent with every other service Dockerfile in the repo.

Signed-off-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>

Agent-Logs-Url: https://github.com/A-makarim/ai-platform-engineering/sessions/6cc475fd-57ea-4f3b-be49-d66b586733f2

Co-authored-by: A-makarim <114302821+A-makarim@users.noreply.github.com>
fix(autonomous-agents): address critical bugs and review feedback
…ggers

Narrow the trigger `type` fields from plain `TriggerType` defaults to
`Literal[TriggerType.*]` on all three trigger models. Pydantic v2 requires
a `Literal`-typed discriminator field for `Field(..., discriminator="type")`
to resolve correctly at parse time; without it the union falls back to slow
left-to-right probing and can silently mis-classify trigger payloads.

Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
…finition

Cover CronTrigger, IntervalTrigger, WebhookTrigger, and TaskDefinition
construction and field defaults. Aligns with the removal of WebhookTrigger.path
(dropped in the Copilot bug-fix pass) and the Literal discriminator types
added to all trigger models.

Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Without this, pytest cannot resolve the autonomous_agents package because
the source lives under src/ (src layout). Adding pythonpath puts src/ on
sys.path so both test runs and IDE import resolution work correctly.

Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Adds a living checklist that captures the phased plan for evolving the
autonomous_agents service into a production-ready, UI-integrated feature.

Each item carries an ID (IMP-NN), status, rationale ("why it matters"),
suggested approach, and the files it would touch — so any one of them
can be picked up independently without re-deriving the design context.

Phases tracked:
  - Phase 1: hardening (persistence, retries, OTel, CORS, secrets, etc.)
  - Phase 2: production readiness + UI integration (the north star)
  - Phase 3: scale & resilience (jobstore, leader election, multi-replica)

No code changes; documentation only.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Adds scripts/run_supervisor_local.ps1 — a self-contained PowerShell
helper to bring up the CAIPE supervisor in single-node mode on Windows,
purely for end-to-end testing of the autonomous_agents service against
a live supervisor (no Docker, no MongoDB, no other infra required).

The script encapsulates every Windows-specific workaround needed to run
the supervisor natively, so we never have to patch tracked upstream
files outside the autonomous_agents folder:

  1. Sets PYTHONUTF8=1 / PYTHONIOENCODING=utf-8 so prompts.py and the
     supervisor's connectivity table can read/print UTF-8 content
     (emojis, box-drawing) without hitting cp1252 encode/decode errors.

  2. Sets PYTHONPATH to the repo root before changing directory, then
     cds into charts/ai-platform-engineering/data so the supervisor's
     relative-path load of prompt_config.yaml resolves to the real
     config (the repo-root prompt_config.yaml is a Docker-mount stub).

  3. Bootstraps a .pth file inside the active venv exposing every
     ai_platform_engineering/agents/* sub-package, so the single-node
     supervisor's eager imports of agent classes succeed without us
     having to install each agent as an editable dependency or modify
     the root pyproject.toml.

Scope is intentionally limited to the autonomous_agents folder — this
is a developer convenience, not part of the public surface of the
feature.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Introduces a small async Protocol-based abstraction for persisting
TaskRun records, with a default in-memory implementation that mirrors
the legacy deque(maxlen=500) behaviour from scheduler.py.

This is the first slice of IMP-01 (persist run history to MongoDB).
It deliberately ships the abstraction *before* either implementation
is wired into the scheduler so each step lands as a small, reviewable
commit and the scheduler swap (later) becomes a pure refactor.

Protocol surface (services/run_store.py):
  - record(run)              upsert by run_id; same call site for
                             RUNNING -> SUCCESS|FAILED transitions
  - list_by_task(task_id, limit=100)   newest first
  - list_all(limit=500)                newest first

InMemoryRunStore:
  - Bounded by maxlen (default 500), FIFO eviction
  - dict + deque pair: O(1) insert/upsert, O(n) filter
  - Update path never triggers eviction (unlike a naive append)
  - Asyncio-safe under a single-loop driver (FastAPI + APScheduler)

Test coverage (tests/test_run_store.py, 11 tests):
  - Protocol conformance (runtime_checkable)
  - Newest-first ordering for list_all and list_by_task
  - Upsert semantics
  - Filtering and limit honoring (including 0 / negative)
  - Eviction order
  - Eviction does not fire on updates to existing runs

No call sites are modified yet; the new module is introduced behind
its Protocol and will be wired into the scheduler in a follow-up
commit on this branch.
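
The dict + deque design described above might be sketched as follows. This is not the contents of services/run_store.py: the `Run` dataclass stands in for TaskRun, the class name is hypothetical, and returning an empty list for a zero or negative `limit` is an assumption about what "limit honoring" means.

```python
import asyncio
from collections import deque
from dataclasses import dataclass


@dataclass
class Run:
    """Minimal stand-in for TaskRun (assumed fields)."""
    run_id: str
    task_id: str
    status: str = "RUNNING"


class InMemoryRunStoreSketch:
    def __init__(self, maxlen: int = 500) -> None:
        if maxlen < 1:
            raise ValueError("maxlen must be at least 1")
        self._maxlen = maxlen
        self._order: deque[str] = deque()  # run_ids, oldest first
        self._runs: dict[str, Run] = {}

    async def record(self, run: Run) -> None:
        if run.run_id in self._runs:
            self._runs[run.run_id] = run  # update in place, never evicts
            return
        if len(self._order) >= self._maxlen:
            oldest = self._order.popleft()  # FIFO eviction
            del self._runs[oldest]
        self._order.append(run.run_id)
        self._runs[run.run_id] = run

    async def list_all(self, limit: int = 500) -> list[Run]:
        if limit <= 0:
            return []
        newest_first = [self._runs[r] for r in reversed(self._order)]
        return newest_first[:limit]

    async def list_by_task(self, task_id: str, limit: int = 100) -> list[Run]:
        if limit <= 0:
            return []
        # O(n) filter over every stored run, newest first
        matches = [self._runs[r] for r in reversed(self._order)
                   if self._runs[r].task_id == task_id]
        return matches[:limit]
```

Keeping insertion order in the deque and the records in the dict is what lets a RUNNING to SUCCESS update overwrite the record without touching eviction order.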

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Adds the MongoDB-backed RunStore implementation announced by the
Protocol introduced in the previous commit. Like InMemoryRunStore it
is fully self-contained and not yet wired into the scheduler — the
swap is deferred to a later commit on this branch so each step
remains small and reviewable.

Implementation (services/run_store.py):
  - MongoRunStore takes a pre-built motor client (caller-owned
    lifecycle, easy to test by injecting AsyncMongoMockClient).
  - record() uses replace_one(upsert=True) keyed by the run's
    pinned _id so RUNNING -> SUCCESS|FAILED transitions update in
    place rather than producing duplicate documents.
  - list_all / list_by_task return newest-first, capped by `limit`,
    using cursor sort + limit (no in-memory slicing).
  - ensure_indexes() is idempotent and creates:
      * unique index on `run_id`
      * compound index on `(task_id ASC, started_at DESC)` to
        support both list_by_task and list_all without a scan.
  - Schema is intentionally identical to TaskRun.model_dump()
    output so future model fields (prompt, agent, llm_provider,
    duration_ms, etc.) flow through automatically.

Test coverage (tests/test_mongo_run_store.py, 13 tests using
mongomock_motor.AsyncMongoMockClient — no real MongoDB required):
  - Constructor input validation (empty db / collection name).
  - Protocol conformance (runtime_checkable).
  - Default collection name pinned to "autonomous_runs".
  - ensure_indexes() idempotency and resulting key specs.
  - Newest-first ordering for list_all and list_by_task using
    explicitly spaced timestamps (avoids BSON's 1ms sort ambiguity
    in tight insert loops).
  - Upsert-in-place semantics (no duplicate docs after re-record).
  - Limit honoring (including 0 / negative).
  - Collection isolation (two stores on the same client see only
    their own data).

Dependencies (pyproject.toml + uv.lock regenerated):
  - motor==3.7.1 (runtime — async MongoDB driver, brings pymongo)
  - mongomock-motor==0.0.36 (dev — in-memory mock for tests)

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Wires the new RunStore implementations to configuration without yet
swapping the scheduler over. After this commit the service still
behaves exactly as it does today; it just learns *how* to construct
the right store for a given environment.

Settings additions (config.py):
  - mongodb_uri        (env: MONGODB_URI)        — optional
  - mongodb_database   (env: MONGODB_DATABASE)   — optional
  - mongodb_collection (env: MONGODB_COLLECTION) — defaults to
                       "autonomous_runs" so the operator only needs
                       URI + DATABASE for the common case
  - run_history_maxlen (env: RUN_HISTORY_MAXLEN) — bound for the
                       in-memory fallback, defaults to 500

Factory (services/run_store.create_run_store):
  - Returns MongoRunStore iff *both* mongodb_uri and mongodb_database
    are provided; otherwise InMemoryRunStore.
  - Partial Mongo config (URI without DATABASE or vice versa) is
    treated as "Mongo not configured" — silently engaging Mongo on
    half-config would mask typical env-var typos and write to the
    wrong place; silently falling back to in-memory on half-config
    would lose history. Either misbehaviour is operationally worse
    than the current "explicit both-or-neither" rule.
  - No network I/O at construction time (motor's AsyncIOMotorClient
    is lazy), so the factory is safe to call from tests and from
    the FastAPI lifespan startup hook.
  - Settings are passed *explicitly* (not pulled from get_settings()
    inside the factory). This keeps create_run_store reusable
    outside the FastAPI app context and keeps the unit tests free
    of monkeypatching.

Test coverage (tests/test_run_store_factory.py, 8 tests):
  - In-memory fallback when no Mongo settings, when only URI is set,
    when only DATABASE is set, and when URI is the empty string.
  - Mongo store selection when both settings are provided.
  - in_memory_maxlen and mongodb_collection are forwarded correctly.
  - Each call returns a fresh instance (no accidental memoisation).
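
The selection rule can be illustrated with a tiny stand-in. The function below only returns a label; the real `create_run_store` builds store objects, and the name `choose_backend` is hypothetical.

```python
from typing import Optional


def choose_backend(mongodb_uri: Optional[str],
                   mongodb_database: Optional[str]) -> str:
    """Return "mongo" only when both settings are non-empty, else "memory"."""
    # Truthiness treats None and "" alike, matching the tested case where
    # an empty-string URI counts as "not configured".
    if mongodb_uri and mongodb_database:
        return "mongo"
    return "memory"
```

Passing the two values in as arguments, rather than reading global settings inside the function, mirrors the explicit-settings choice described above and keeps the rule testable without monkeypatching.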

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Replaces the scheduler's hard-coded ``deque(maxlen=500)`` with the
RunStore abstraction added in the preceding commits. After this
commit the service uses MongoDB for run history when MONGODB_URI +
MONGODB_DATABASE are set, and the legacy in-memory behaviour
(bounded by RUN_HISTORY_MAXLEN, default 500) when they are not —
*identical* to today's behaviour for any developer who hasn't
opted into Mongo.

Combines what was originally planned as two commits because
``get_run_history()`` is sync but ``RunStore.list_*`` is async; a
sync->async transition can't be split cleanly without leaving the
codebase in a non-working intermediate state for one commit.

scheduler.py:
  - Drops the module-level ``_run_history`` deque and the ``deque``
    import.
  - Adds ``_run_store: RunStore | None`` plus ``get_run_store()``
    (lazy InMemoryRunStore default) and ``set_run_store(store)``
    for the lifespan to inject the configured store.
  - ``_execute_task`` now awaits ``store.record(run)`` twice — once
    when the run starts (RUNNING) and once when it finishes
    (SUCCESS|FAILED). Because RunStore.record is upsert-by-run_id
    this updates the same entry rather than creating duplicates.

routes/tasks.py:
  - ``/tasks/{id}/runs`` and ``/runs`` await store reads instead of
    iterating the in-memory deque. The 404 fallback for
    ``/tasks/{id}/runs`` (only 404 if BOTH unknown task AND no
    historical runs) is preserved verbatim — useful for inspecting
    runs of tasks whose definition was removed from config.yaml.

main.py:
  - The lifespan startup hook builds the appropriate store via
    ``create_run_store(...)``, calls ``ensure_indexes()`` on it
    when it's a MongoRunStore, logs which backend is active, then
    injects it into the scheduler module via ``set_run_store()``.

Test coverage (tests/test_scheduler_run_store.py, 6 tests; mocks
``invoke_agent`` so no live supervisor is needed):
  - ``get_run_store`` lazy default + injection via ``set_run_store``.
  - ``_execute_task`` records exactly one entry on success (upsert
    not duplicate) with the terminal SUCCESS state.
  - Same on failure with the error message captured.
  - The RUNNING state is visible to the store *while* invoke_agent
    is in flight (not only after completion) — this is the value of
    recording twice.
  - The TaskRun returned by ``_execute_task`` is the same object
    as the one in the store; webhook callers depend on this for
    synchronous response payloads.

Behavioural impact:
  - Operators who set MONGODB_URI + MONGODB_DATABASE now get
    persistent, unbounded run history with proper indexes.
  - Operators who don't set them see no change.
  - The /api/v1/runs and /api/v1/tasks/{id}/runs JSON shape is
    unchanged (still ``list[TaskRun]``).

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Closes IMP-01 (Persist run history to MongoDB) by:
  - Adding a "Run History Persistence" section to the README that
    explains the two backends (in-memory default vs MongoDB),
    when each is selected, the fallback rule for partial Mongo
    config, the schema, the indexes, and the startup log lines
    operators can grep for.
  - Listing the four new env vars (`MONGODB_URI`,
    `MONGODB_DATABASE`, `MONGODB_COLLECTION`, `RUN_HISTORY_MAXLEN`)
    in the existing Environment Variables table.
  - Removing the IMP-01 entry from the active queue in
    IMPROVEMENTS.md and recording it under `## Done` with a brief
    summary of what landed (touched files, tests, tooling) so the
    audit trail survives even after the entry is eventually
    deleted.

No code changes.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Removes the unused ``import pytest`` that was tripping the project's
Ruff F401 check (and, transitively, I001 for the now-misordered
import block). The tests in this module use only plain ``assert``
statements and Pydantic constructors, so ``pytest`` was never needed
as a name in the module namespace.

Pre-existing baseline warning surfaced by CI on PR cnoe-io#3; fixing it
unblocks the linter check for the rest of the IMP-01 follow-up
commits on this branch.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
A flaky RunStore (e.g. transient MongoDB outage, network blip) used
to abort the scheduled job entirely because the very first
``await store.record(run)`` ran *outside* any try/except. Worse,
since the same coroutine is awaited synchronously by the webhook
router, a Mongo hiccup would surface to external callers as an
HTTP 500 — turning observability infrastructure into a single point
of failure for the agent execution path.

Wrap both record() calls (start-of-run and finally-block) in a new
``_record_safely`` helper that logs at ERROR but never re-raises.
The task itself remains the source of truth for whether work
happened; persistence is best-effort observability.

Test coverage:
- A pathological RunStore that raises on every record() no longer
  prevents the task from completing successfully.
- Both failed record() attempts are still logged at ERROR so
  operators can react.
- The TaskRun returned from _execute_task remains fully populated
  even when the terminal record() blows up (the webhook router
  echoes this object back to HTTP clients).
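
A best-effort persistence wrapper of this kind could be sketched as below. It is an approximation, not the helper in scheduler.py: the dict-based run, the `FailingStore` class, the boolean return value, and the `asyncio.sleep(0)` stand-in for the agent call are all assumptions made for illustration.

```python
import asyncio
import logging

logger = logging.getLogger("autonomous_agents.sketch")


class FailingStore:
    """Pathological store that raises on every record() call."""

    async def record(self, run: dict) -> None:
        raise ConnectionError("store unavailable")


async def record_safely(store, run: dict) -> bool:
    """Persist the run if possible; log at ERROR and swallow any failure."""
    try:
        await store.record(run)
        return True
    except Exception:
        logger.exception("failed to persist run %s", run.get("run_id"))
        return False


async def execute_task(store, run: dict) -> dict:
    run["status"] = "RUNNING"
    await record_safely(store, run)  # start-of-run record
    try:
        await asyncio.sleep(0)  # stand-in for the real agent invocation
        run["status"] = "SUCCESS"
    except Exception as exc:
        run["status"] = "FAILED"
        run["error"] = str(exc)
    finally:
        await record_safely(store, run)  # terminal record
    return run
```

Even with `FailingStore`, `execute_task` returns a fully populated run, which is the property the webhook path depends on.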

Codex review feedback on PR cnoe-io#3 (P1).

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
…{id}/runs

Pre-IMP-01 the in-memory deque retained up to 500 runs across all
tasks and ``GET /tasks/{task_id}/runs`` returned every match for a
given task. After moving to the ``RunStore`` abstraction, the
router started calling ``list_by_task(task_id)`` with no explicit
``limit``, so it picked up the protocol's default of 100 — silently
truncating history for any task with more past runs.

Pass an explicit ``limit=_MAX_TASK_RUNS`` (500, matching the legacy
in-memory cap) so behaviour is identical regardless of which
RunStore implementation is active. The constant lives in the
router module so the limit is visible at the API boundary, and
trivially raisable if/when the UI asks for deeper history.

Test coverage (new ``tests/test_tasks_route.py``):
- Asserts the router calls ``list_by_task`` with the explicit cap,
  not the protocol default — direct regression test.
- Confirms a stored task with >100 runs round-trips fully.
- Locks in the existing 404 behaviour for genuinely unknown tasks.
- Locks in that runs for tasks removed from config.yaml are still
  inspectable.
- Covers ``/runs`` (list_all) for parity, asserting it uses the
  500-default when called without params.

Copilot review feedback on PR cnoe-io#3 (P2).

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
The compound ``(task_id ASC, started_at DESC)`` index supports the
per-task query just fine, but Mongo will not walk a compound index
for a sort unless the query also constrains the leading prefix. So
``GET /runs`` (``find({}).sort([("started_at", -1)])``) was falling
back to a full collection scan plus an in-memory sort — a latent
hot path for the upcoming UI integration that surfaces recent runs.

Add a dedicated single-field ``started_at DESC`` index in
``ensure_indexes`` to back the global listing query, and update the
docstring + README so operators see an accurate index inventory.

The cost is one extra B-tree per collection (small — runs are tiny
documents) for an unbounded improvement in worst-case latency on
collections of any meaningful size.

Test coverage:
- ``test_ensure_indexes_is_idempotent`` extended to assert the new
  index is present alongside the existing two; running
  ``ensure_indexes`` twice in a row remains a no-op.

Codex review feedback on PR cnoe-io#3 (P2).

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
pymongo (and therefore motor) defaults to ``tz_aware=False``: it
strips tzinfo from BSON dates on read and returns naive
``datetime`` objects. The write path uses
``datetime.now(timezone.utc)``, so before this fix every TaskRun
round-tripped through MongoRunStore as:

    write: started_at = 2026-04-18T10:00:00+00:00  (tz-aware)
    read:  started_at = 2026-04-18T10:00:00        (naive)

The asymmetry is a latent footgun:

  - Comparing a fresh in-memory TaskRun against one read from
    Mongo (e.g. picking the latest of two candidates) raises
    ``TypeError: can't compare offset-naive and offset-aware
    datetimes``.
  - JSON serialisation drops the trailing ``+00:00`` suffix, so
    the API response shape silently differs depending on whether a
    run is hot-from-the-scheduler or fetched from storage.
  - When a non-UTC operator looks at the data through any tool that
    re-attaches a local tz, the timestamps are misinterpreted.

Build the motor client with ``tz_aware=True, tzinfo=timezone.utc``
in ``create_run_store``. UTC is pinned explicitly so a future host
in a non-UTC tz cannot accidentally turn stored timestamps into
local time.

Test coverage:
- New ``test_mongo_client_is_constructed_with_utc_tzinfo`` patches
  ``AsyncIOMotorClient`` and asserts both kwargs are passed.
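
The asymmetry can be reproduced with the standard library alone. In this sketch, stripping `tzinfo` by hand simulates what a `tz_aware=False` read returns; no MongoDB client is involved, and the function name is hypothetical.

```python
from datetime import datetime, timezone


def demonstrate_mismatch() -> tuple:
    written = datetime(2026, 4, 18, 10, 0, tzinfo=timezone.utc)
    # Simulates the naive datetime a tz_aware=False client hands back.
    read_back = written.replace(tzinfo=None)

    try:
        _ = written < read_back
        ordering_failed = False
    except TypeError:
        # Ordering comparisons between naive and aware datetimes raise.
        ordering_failed = True

    return written.isoformat(), read_back.isoformat(), ordering_failed
```

Note that `==` between a naive and an aware datetime does not raise; it simply evaluates to `False`, which makes the mismatch easy to miss until an ordering comparison is attempted.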

Codex review feedback on PR cnoe-io#3.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
…otor import

Two cleanups Codex flagged on PR cnoe-io#3 — both no-ops for runtime
behaviour, but they remove guarantees that mislead future readers
about the schema and dependency model.

1. Redundant run_id unique index
   ``MongoRunStore.record`` pins ``_id = run_id``, and Mongo's
   automatic ``_id_`` index already enforces uniqueness on that
   field. The explicit ``create_index("run_id", unique=True)``
   call duplicated that guarantee at the cost of an extra B-tree
   on every write. Drop it; uniqueness is preserved by the _id
   index. README and docstring updated to reflect the new index
   inventory and call out *why* run_id needs no dedicated index.

2. Misleading "motor optional" comment
   The original local-import comment claimed motor is optional at
   import time, but motor is a hard runtime dependency declared in
   pyproject.toml. The deferred import is still useful — it keeps
   the protocol/in-memory branches free of motor's import cost and
   isolates any motor incompatibility to environments that
   actually try to use Mongo — but the rationale is import-time
   layering, not optionality. Reworded to say so.

Test coverage:
- ``test_ensure_indexes_does_not_create_redundant_run_id_index``
  — explicit regression: a future developer adding the index back
  trips this immediately.
- ``test_id_field_enforces_run_id_uniqueness`` — proves the
  _id-based uniqueness still holds after the dedicated index is
  gone (two ``record()`` calls with the same run_id collapse into
  one document via upsert).
- Existing index idempotency test rewritten for the 2-index
  inventory.

Codex review feedback on PR cnoe-io#3.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
…agents-mongo-store

feat(autonomous-agents): persist run history to MongoDB (IMP-01)
…onential backoff

The A2A client previously hard-coded a 300s timeout and zero retries, so a
single 502 from a restarting supervisor failed the whole run permanently.
Wrap the call in tenacity.AsyncRetrying with wait_exponential_jitter and
classify failures so we only retry the ones replay can actually fix:

  * httpx.TransportError        -> retry (no response was produced)
  * httpx.HTTPStatusError 5xx   -> retry (supervisor unhealthy)
  * httpx.HTTPStatusError 4xx   -> propagate (caller-fault, replay wastes
                                  LLM quota without changing the outcome)
  * anything else               -> propagate (don't mask real bugs)

Total attempts per call = 1 + A2A_MAX_RETRIES. Each retry is logged at
WARNING via tenacity.before_sleep_log so retries stay visible to operators.

New Settings fields, all validated to reject non-positive / inf / NaN:
  - A2A_TIMEOUT_SECONDS                (default 300)
  - A2A_MAX_RETRIES                    (default 3, 0 disables retries)
  - A2A_RETRY_BACKOFF_INITIAL_SECONDS  (default 1.0)
  - A2A_RETRY_BACKOFF_MAX_SECONDS      (default 30.0, caps the backoff)

Coverage: 16 new tests across test_a2a_client.py and test_config.py for
the retry classifier, attempt budget exhaustion, the no-retry-on-4xx
guarantee, the A2A error-envelope path, and Settings validation bounds.
The httpx layer is mocked at _post_once so tests are fast and have no
network dependency.
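
The classification rules can be sketched without tenacity or httpx. The code below is a hand-rolled loop, not the `tenacity.AsyncRetrying` wiring the commit describes: `TransportError` and `StatusError` are stand-ins for the httpx exception types, and the jitter formula is a simplification rather than tenacity's exact behaviour.

```python
import asyncio
import random


class TransportError(Exception):
    """Stand-in for httpx.TransportError."""


class StatusError(Exception):
    """Stand-in for httpx.HTTPStatusError."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_retryable(exc: BaseException) -> bool:
    if isinstance(exc, TransportError):
        return True  # no response was produced
    if isinstance(exc, StatusError):
        return 500 <= exc.status_code < 600  # 5xx only; 4xx propagates
    return False  # anything else propagates so real bugs stay visible


async def call_with_retries(post_once, max_retries: int = 3,
                            initial: float = 1.0, cap: float = 30.0,
                            sleep=asyncio.sleep):
    attempt = 0
    while True:
        try:
            return await post_once()
        except Exception as exc:
            if attempt >= max_retries or not is_retryable(exc):
                raise
            # Exponential backoff plus up to 1s of jitter, capped at `cap`.
            delay = min(cap, initial * 2 ** attempt + random.uniform(0, 1))
            await sleep(delay)
            attempt += 1
```

The loop makes at most `1 + max_retries` calls, matching the attempt budget stated above, and `max_retries=0` disables retries entirely.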

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Service-wide A2A retry/timeout settings are a sensible global default but
not always the right policy per task. A nightly synthesis prompt may
legitimately need a larger timeout than a 30-second status check; an
expensive "best-effort, do not burn quota" task may want max_retries=0
even when the rest of the system is happy to retry.

Add two optional fields to TaskDefinition:
  - timeout_seconds: float | None  (must be > 0 when set)
  - max_retries:    int   | None  (must be >= 0 when set; 0 means "no retries")

When None (the default), the scheduler falls back to the global
A2A_TIMEOUT_SECONDS / A2A_MAX_RETRIES from Settings. The scheduler now
forwards both values into invoke_agent so the existing per-call override
plumbing in a2a_client picks them up unchanged.

Coverage: 6 new tests in test_models.py covering the default-None
behaviour, accepting valid overrides, max_retries=0 being explicitly
allowed (it is a real "no retry" signal, not a bug), and the validators
rejecting negative max_retries and non-positive timeouts.
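
The fallback rule is small but easy to get wrong, so a sketch may help. The helper name `resolve_override` is hypothetical; the point is that the check must be `is None`, not truthiness, or `max_retries=0` would silently fall back to the global default.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_override(task_value: Optional[T], global_default: T) -> T:
    """Prefer the per-task value; fall back only when it is None."""
    # Using `task_value or global_default` here would be a bug:
    # 0 is falsy, so an explicit "no retries" would be ignored.
    return global_default if task_value is None else task_value
```

For example, `resolve_override(0, 3)` keeps the explicit zero, while `resolve_override(None, 3)` yields the global value of 3.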

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
…ides; cut IMP-02

README:
  - Add four new env-var rows: A2A_TIMEOUT_SECONDS, A2A_MAX_RETRIES,
    A2A_RETRY_BACKOFF_INITIAL_SECONDS, A2A_RETRY_BACKOFF_MAX_SECONDS.
  - Show the optional timeout_seconds / max_retries fields in the
    sample task config.yaml so operators see them in context.
  - New "Supervisor call reliability" section with the failure
    classification table (transport + 5xx retried; 4xx propagated; bug
    types propagated) so it is unambiguous what gets retried and why.

IMPROVEMENTS.md:
  - Cut IMP-02 from Phase 1 and move the audit entry to ## Done with
    the shipping branch and the concrete list of what landed (deps,
    Settings, models, scheduler wiring, tests, docs).

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
Pydantic's gt=0 constraint accepts float('inf'), and PyYAML happily parses
.inf / .nan straight from config.yaml. Either would silently propagate
into httpx and break the per-attempt timeout at runtime.

Adds the same finiteness guard to TaskDefinition.timeout_seconds that
Settings.a2a_timeout_seconds already had, plus a parametric test
covering inf, -inf, and nan. Per-task overrides and the global default
now share the same validation contract.
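
The finiteness guard can be sketched in plain Python. This is not the Pydantic validator from the branch; `validate_timeout` is a hypothetical name, and the sketch only shows why a positivity check alone is insufficient.

```python
import math
from typing import Optional


def validate_timeout(value: Optional[float]) -> Optional[float]:
    """Accept None (use the global default) or a finite, positive number."""
    if value is None:
        return None
    # float("inf") > 0 is True, so a bare "> 0" check lets infinity through;
    # math.isfinite rejects inf, -inf, and nan in one call.
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"timeout_seconds must be finite and > 0, got {value!r}")
    return value
```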

Addresses Copilot review on PR cnoe-io#4.

Assisted-by: Claude:claude-opus-4.7
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
Made-with: Cursor
tneverlandz7 and others added 23 commits May 12, 2026 00:38
…les in dynamic agents. Tidy up comments and class structure. Remove redundant code
Standardized the structure of all test files to align with the format used in dynamic agents. This includes tidying up comments, class structures, and removing redundant code to enhance readability and maintainability.

Signed-off-by: Ted Tang <nuotangidle7@gmail.com>
…iments

Revert the test isolation conftest and Settings.__init__ rewrites that
attempted to fix CORS-related test failures. Production config.py is
restored to the pre-experiment state.

Also clean up stale test files superseded by merged versions in the
earlier test-cleanup pass (test_scheduler_*, test_tasks_crud_route)
and remove the unused run_supervisor_local.ps1 dev script.

Signed-off-by: tneverlandz7 <nuotangidle7@gmail.com>
The page previously described a different prototype (WebSocket-based
WDM bot). Rewrite it to document the current in-process inbound
bridge: endpoint at POST /api/v1/hooks/webex/events on the
autonomous-agents service (port 8002), required and optional env
vars, local ngrok setup, end-to-end verification steps, and the
failure-mode contract.

Signed-off-by: tneverlandz7 <nuotangidle7@gmail.com>
… service

Removed the standalone `webex_bot` service and integrated its functionality directly into the `autonomous-agents` service. This change simplifies the architecture by eliminating the need for cross-process communication and HMAC verification, as the dispatcher now operates in-process. Updated relevant documentation and configuration to reflect this integration.

Signed-off-by: Your Name <your.email@example.com>
…merge' into prebuild/feat/autonomous-agents-merge

Signed-off-by: Thun78 <kitichartbcc@gmail.com>
Signed-off-by: A-makarim <syedmakarim0.2@gmail.com>
@A-makarim A-makarim force-pushed the prebuild/feat/autonomous-agents-merge branch from 58c4446 to 19a681d Compare May 27, 2026 15:49
@A-makarim A-makarim marked this pull request as ready for review May 27, 2026 15:54
@A-makarim A-makarim requested a review from sriaradhyula as a code owner May 27, 2026 15:54

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19a681ddac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +349 to +351
history = await get_run_store().list_by_task(task_id, limit=_MAX_TASK_RUNS)
if history:
    return history

P1: Gate run-history reads by task ownership

When the UI proxy forwards any authenticated user to /tasks/{id}/runs, this endpoint returns list_by_task before loading the task or calling _assert_task_access. In the per-user ownership model, a logged-in user who knows or guesses another task id can read that task's run history, including prompts, response previews, errors, and captured events; the /runs endpoint below exposes the same data across all tasks. Please apply the same caller/ownership check used by get/update/delete/trigger before returning history.
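
The ordering the comment asks for (load the task, check access, then read history) could look roughly like the sketch below. Everything here is a simplified stand-in: the in-memory `TASKS` and `RUNS` dictionaries, the `AccessDenied` exception, and the `assert_task_access` signature are assumptions for illustration, not the service's real `_assert_task_access` helper or run store. It also assumes a caller with no identity is denied, which may differ from the actual policy for unproxied requests.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


class AccessDenied(Exception):
    """Stand-in for the HTTP error the real route would raise."""


@dataclass
class Task:
    id: str
    owner_id: Optional[str]


# Hypothetical in-memory stores replacing the task store and run store.
TASKS = {"t1": Task(id="t1", owner_id="alice@example.com")}
RUNS = {"t1": ["run-1", "run-2"]}


def assert_task_access(task: Task, caller_email: Optional[str], is_admin: bool) -> None:
    # Assumption: admins may read any task; everyone else must be the owner.
    if is_admin:
        return
    if caller_email is None or task.owner_id != caller_email:
        raise AccessDenied(task.id)


async def list_task_runs(task_id: str, caller_email: Optional[str],
                         is_admin: bool = False) -> List[str]:
    task = TASKS.get(task_id)
    if task is None:
        # Treat unknown ids like denied ones so callers cannot probe for ids.
        raise AccessDenied(task_id)
    # The ownership check runs BEFORE any history is read or returned.
    assert_task_access(task, caller_email, is_admin)
    return RUNS.get(task_id, [])
```

With this ordering, `asyncio.run(list_task_runs("t1", "bob@example.com"))` raises `AccessDenied` instead of returning another user's runs.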


Comment on lines +176 to +178
caller_email, _ = _get_caller(request)
if caller_email and task.owner_id is None:
    task = task.model_copy(update={"owner_id": caller_email})

P2: Stamp new tasks with the authenticated owner

For proxied requests caller_email is set, but this only stamps owner_id when the client omitted it. A non-admin can POST a task with owner_id set to another user's email, causing the task to be stored under that user's ownership and appear in their task list while the creator avoids ownership/audit attribution. Since ownership is the authorization boundary, authenticated creates should ignore/reject client-supplied owner ids and always set it from the trusted header unless this is a deliberate admin-only path.
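
One possible shape for the suggested fix is sketched below: when a trusted caller identity is present, it overrides whatever `owner_id` the client sent, with an optional admin-only exception. The `Task` dataclass and `stamp_owner` function are hypothetical stand-ins (the real code uses a Pydantic model and `model_copy`), and the `is_admin` flag is an assumption about how an admin path might be signalled.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in for the real task model.
    name: str
    owner_id: Optional[str] = None


def stamp_owner(task: Task, caller_email: Optional[str], is_admin: bool = False) -> Task:
    """Return a copy of the task whose owner comes from the trusted header."""
    if caller_email is None:
        # No trusted identity (assumed: a direct, unproxied call); leave as submitted.
        return task
    if is_admin and task.owner_id is not None:
        # Assumed deliberate admin-only path: admins may create on behalf of others.
        return task
    # Authenticated non-admin: ignore any client-supplied owner_id.
    return replace(task, owner_id=caller_email)
```

Rejecting a mismatched `owner_id` with an error, rather than silently overwriting it, would be an equally reasonable choice; the sketch overwrites only to keep the example short.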


Collaborator

@suwhang-cisco suwhang-cisco left a comment


Code looks good, thanks! A few comments / questions:

  1. Could you please add two new CI files for the new autonomous agent image like these two?
    1. https://github.com/cnoe-io/ai-platform-engineering/blob/main/.github/workflows/ci-dynamic-agents.yml
    2. https://github.com/cnoe-io/ai-platform-engineering/blob/main/.github/workflows/prebuild-dynamic-agents.yml
  2. I see there is a new env var ENABLE_AUTONOMOUS_AGENTS, but is there a way to enable it while allowing only certain user groups / admins to access these autonomous agents?

Collaborator


Could we move this file into build/: https://github.com/cnoe-io/ai-platform-engineering/tree/main/build where other dockerfiles live?

7 participants