Skip to content

feat(BA-6581): wire IdleCheckerRepository into idle-check reconciler source#12398

Open
seedspirit wants to merge 9 commits into
mainfrom
feat/BA-6581
Open

feat(BA-6581): wire IdleCheckerRepository into idle-check reconciler source#12398
seedspirit wants to merge 9 commits into
mainfrom
feat/BA-6581

Conversation

@seedspirit

@seedspirit seedspirit commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add IdleCheckerRepository (with its DB source and result types) that fetches a normalized IdleCheckSnapshot of sessions, their applicable scopes, scope→checker bindings, and typed checker specs.
  • Wire the repository through the orchestration DI (composer / sokovan input) and the reconciler stage factory so IdleCheckSource fetches real data instead of returning a placeholder.
  • Model the checker spec as a single IdleCheckerSpec (a type discriminator plus optional per-kind sub-specs session_lifetime / network / utilization), stored in the idle_checkers.spec JSONB column via the existing PydanticColumn. A model_validator enforces that exactly the matching sub-spec is present for the chosen type. This replaces the previous ABCColumn + hand-written serialize/load dispatch. Concrete spec fields land with the checker-logic stories.
  • Move the spec and its CheckerType discriminator out of the ORM layer to common/data/idle_checker/types.py (pure Pydantic, ORM-agnostic). Manager-internal projections (ScopeType, ScopeRef, IdleCheckSessionView) stay in manager/data.

No DB migration: both ABCColumn and PydanticColumn map to JSONB and the idle_checkers table is unreleased.

Incidental test-infra fix

  • test: eagerly call ensure_all_tables_registered() in the shared tests/unit/manager/repositories/conftest.py.
  • Repository tests register only part of the model graph transitively (e.g. DomainRow via query scopes). SQLAlchemy's global configure_mappers() then fails to resolve string relationships across the rest of the graph (DomainRow → "ScalingGroupForDomainRow" → "AgentRow" → …). On main this passed only by lucky Pants shard composition; adding a new test target in this PR reshuffled the shards and exposed it. Eager full-model registration makes mapper configuration independent of test sharding. (Verified: the affected repositories/base|rbac|ops tests fail on pure main in isolation too.)

Test plan

  • pants fmt / fix / lint / check / test on changed targets
  • Previously-failing repositories/base|rbac|ops tests + idle_checker tests pass locally
  • Full --changed-since suite passes via the pre-push hook
  • CI green

Resolves BA-6581

🤖 Generated with Claude Code

@github-actions github-actions Bot added the size:L 100~500 LoC label Jun 24, 2026
@github-actions github-actions Bot added the comp:manager Related to Manager component label Jun 24, 2026
seedspirit added a commit that referenced this pull request Jun 24, 2026
@seedspirit seedspirit marked this pull request as ready for review June 24, 2026 09:37
@seedspirit seedspirit requested a review from a team as a code owner June 24, 2026 09:37
Copilot AI review requested due to automatic review settings June 24, 2026 09:37

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR replaces the placeholder idle-check reconciler source with a real data path. It introduces an IdleCheckerRepository (backed by a new IdleCheckerDBSource) that fetches a normalized IdleCheckSnapshot — sessions, their applicable scopes (resource group / project / domain), scope→checker bindings, and typed checker specs — and wires this repository through the orchestration DI and the reconciler stage factory so IdleCheckSource.fetch_reconcile_info returns the snapshot instead of an empty placeholder. It also makes IdleCheckerSpecABC.load dispatch by checker_type into concrete (currently field-less) spec classes that land their fields in follow-up stories. The change is read-only; the handler/applier remain placeholders.

Changes:

  • New idle_checker repository layer (repository, repositories, db_source, result types) producing IdleCheckSnapshot, plus new ScopeType/ScopeRef/IdleCheckSessionView data types.
  • DI wiring of idle_checker_repository through Repositories, both orchestration composers, sokovan input, and the reconciler stage factory, with IdleCheckReconcileInfo switched from session_ids to snapshot.
  • IdleCheckerSpecABC.load now dispatches on checker_type to SessionLifetimeSpec / NetworkTimeoutSpec / UtilizationSpec.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
repositories/idle_checker/db_source/db_source.py New DB reads assembling the normalized snapshot (joins + correlated enabled-binding filter).
repositories/idle_checker/repository.py Thin repository delegating to the db_source.
repositories/idle_checker/repositories.py IdleCheckerRepositories container following the per-entity wrapper convention.
repositories/idle_checker/types.py IdleCheckerRef, IdleCheckerBindingRef, IdleCheckSnapshot result types.
repositories/repositories.py Registers idle_checker in the aggregate Repositories.
data/idle_checker/types.py Adds ScopeType, ScopeRef, IdleCheckSessionView.
models/idle_checker/spec.py load discriminated dispatch + concrete spec subclasses.
sokovan/idle_check/source.py Injects repository; fetches real snapshot.
sokovan/idle_check/types.py IdleCheckReconcileInfo carries snapshot; entity_ids derives from it.
sokovan/stages/idle_check.py build_idle_check_stage now takes the repository.
sokovan/stages/factory.py Threads idle_checker_repository into stage assembly.
dependencies/orchestration/composer.py, dependencies/orchestration/sokovan.py, dependencies/composer.py DI plumbing of the new repository.
tests/.../idle_check/test_source.py, test_stage.py, dependencies/orchestration/test_sokovan.py, test_composer.py Update/add tests for the new repository argument and source behavior.
changes/12398.feature.md Changelog entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +31 to +34
async def fetch_idle_check_snapshot(
self, session_statuses: Collection[SessionStatus]
) -> IdleCheckSnapshot:
"""Fetch sessions and scope-bound idle checkers without expanding to session-checker rows."""
@seedspirit seedspirit marked this pull request as draft June 24, 2026 10:14
@seedspirit seedspirit marked this pull request as ready for review June 25, 2026 01:54
@seedspirit seedspirit requested a review from HyeockJinKim June 25, 2026 02:02
@github-actions github-actions Bot added size:XL 500~ LoC comp:common Related to Common component and removed size:L 100~500 LoC labels Jun 29, 2026
seedspirit added a commit that referenced this pull request Jun 29, 2026
Comment thread src/ai/backend/common/data/idle_checker/types.py Outdated
Comment on lines +49 to +59
sa.select(
SessionRow.id.label("session_id"),
SessionRow.created_at.label("session_created_at"),
SessionRow.starts_at.label("session_starts_at"),
ScalingGroupRow.id.label("resource_group_id"),
SessionRow.group_id.label("project_id"),
DomainRow.id.label("domain_id"),
)
.select_from(SessionRow)
.join(DomainRow, SessionRow.domain_name == DomainRow.name)
.join(ScalingGroupRow, SessionRow.scaling_group_name == ScalingGroupRow.name)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason to use a JOIN like this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to look up the ID in Scope. Since the current session row only contains the domain name and resource group name, a join is required to determine their IDs.

Comment thread src/ai/backend/manager/repositories/idle_checker/db_source/db_source.py Outdated
Comment thread src/ai/backend/manager/sokovan/idle_check/types.py Outdated
Comment thread src/ai/backend/manager/repositories/idle_checker/types.py Outdated
seedspirit added a commit that referenced this pull request Jun 30, 2026
@github-actions github-actions Bot added size:L 100~500 LoC size:XL 500~ LoC and removed size:XL 500~ LoC size:L 100~500 LoC labels Jun 30, 2026
Comment thread src/ai/backend/manager/data/idle_checker/types.py

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I don't want to comment on naming, but wasn't it the convention to use the Data suffix for these data pattern types?

Is there a reason for omitting the Data suffix here?

I think it would be clearer if the type's purpose could be identified from its name alone.

jopemachine
jopemachine previously approved these changes Jul 1, 2026
@seedspirit seedspirit force-pushed the feat/BA-6581 branch 5 times, most recently from 1869788 to 9a884c5 Compare July 3, 2026 06:44
fregataa
fregataa previously approved these changes Jul 3, 2026
# today that is RUNNING only. Kept as a parameter so the policy stays in the
# stage's target_statuses rather than baked into this read.
SessionRow.status.in_(session_statuses),
SessionRow.session_type != SessionTypes.INFERENCE,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition (session_type) also can be a reconcile target

seedspirit and others added 9 commits July 3, 2026 17:32
…source

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tainer

Construct it in an IdleCheckerRepositories container and pass it through
OrchestrationInput like the other repositories, instead of building it
inline in the orchestration composer.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…k read

- Read resource_group_id/domain_id from SessionRow columns (BA-6644)
  instead of joining DomainRow/ScalingGroupRow in the batch fetch,
  correlating the enabled-binding EXISTS subquery on those columns too
- Move the session_type filter into the stage's target_statuses as
  session_types, matching how session_statuses is owned by the stage
- Populate the new NOT NULL scope-id columns in the repository fixtures

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment on lines +44 to +49
# Inference sessions follow the deployment/routing lifecycle, not idle termination.
session_types=frozenset({
SessionTypes.INTERACTIVE,
SessionTypes.BATCH,
SessionTypes.SYSTEM,
}),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the system session targeted?

Comment on lines +124 to +143
def _enabled_binding_exists_query(self) -> sa.sql.elements.ColumnElement[bool]:
return sa.exists().where(
sa.and_(
IdleCheckerBindingRow.enabled == sa.true(),
sa.or_(
sa.and_(
IdleCheckerBindingRow.scope_type == ScopeType.RESOURCE_GROUP.value,
IdleCheckerBindingRow.scope_id == SessionRow.resource_group_id,
),
sa.and_(
IdleCheckerBindingRow.scope_type == ScopeType.PROJECT.value,
IdleCheckerBindingRow.scope_id == SessionRow.group_id,
),
sa.and_(
IdleCheckerBindingRow.scope_type == ScopeType.DOMAIN.value,
IdleCheckerBindingRow.scope_id == SessionRow.domain_id,
),
),
)
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little disappointed with the query binding part—wouldn't it make more sense to query based on IdleChecker or the session?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

comp:common Related to Common component comp:manager Related to Manager component size:XL 500~ LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants