
feat(FR-2451): add auto scaling rule management feature spec#6356

Open
agatha197 wants to merge 9 commits into main from 04-03-feat_add_auto_scaling_rule_management_feature_spec

Conversation


@agatha197 agatha197 commented Apr 2, 2026

Note

This is for v26.5.0

Resolves #6357 (FR-2451)

Summary

  • Add full-scope draft spec for Auto Scaling Rule management feature
  • Admin Serving page: Prometheus Query Preset CRUD for superadmins (create/read/update/delete via dedicated tab)
  • Service detail page: single/range condition mode selector, Prometheus preset-based metric selection, query result preview
  • Updated to reflect schema changes:
    • Query APIs are user-accessible (prometheusQueryPresets, prometheusQueryPreset, prometheusQueryPresetResult), mutations remain admin-only
    • AutoScalingRule.queryPreset: QueryDefinition field added (26.4.3) — enables direct preset resolution without ID roundtrip
    • QueryDefinition.category: QueryPresetCategory field added (26.4.3) — preset grouping in UI dropdowns
    • New category-related queries/mutations (prometheusQueryPresetCategories, adminCreatePrometheusQueryPresetCategory, etc.)
    • QueryDefinition extended with description, rank, categoryId fields
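
The schema changes listed above could be mirrored on the client roughly as follows. This is a minimal TypeScript sketch, not the generated Relay types: only the field names `description`, `rank`, `categoryId`, `category`, and `queryPreset` come from the summary. The field types, the `id` and `name` fields, and the helpers `groupPresetsByCategory` and `resolvePresetName` are assumptions made for illustration.

```typescript
// Hypothetical shapes. Only the field names mentioned in the summary are sourced;
// every type annotation and the id/name fields are assumptions.
interface QueryPresetCategory {
  id: string;
  name: string;
}

interface QueryDefinition {
  id: string;
  name: string;
  description: string | null;
  rank: number;
  categoryId: string | null;
  category: QueryPresetCategory | null;
}

interface AutoScalingRule {
  id: string;
  queryPreset: QueryDefinition | null;
}

// Read the preset straight off the rule, without a second lookup by preset ID.
function resolvePresetName(rule: AutoScalingRule): string | null {
  return rule.queryPreset?.name ?? null;
}

// Group presets by category name for a dropdown, ordered by rank inside each group.
function groupPresetsByCategory(
  presets: QueryDefinition[],
  uncategorizedLabel = "Uncategorized",
): Map<string, QueryDefinition[]> {
  const groups = new Map<string, QueryDefinition[]>();
  for (const preset of presets) {
    const key = preset.category?.name ?? uncategorizedLabel;
    const bucket = groups.get(key) ?? [];
    bucket.push(preset);
    groups.set(key, bucket);
  }
  groups.forEach((bucket) => bucket.sort((a, b) => a.rank - b.rank));
  return groups;
}
```

Grouping on the embedded `category` object rather than on `categoryId` is one way the UI could avoid a separate category lookup; whether the spec does this is not stated here.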

Notes

Spec

See `.specs/draft-auto-scaling-rule-management/spec.md`


agatha197 commented Apr 2, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • flow:merge-queue - adds this PR to the back of the merge queue
  • flow:hotfix - for urgent changes, fast-track this PR to the front of the merge queue

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@github-actions github-actions Bot added the size:L 100~500 LoC label Apr 2, 2026
@agatha197 agatha197 changed the title feat: add auto scaling rule management feature spec feat(FR-2451): add auto scaling rule management feature spec Apr 2, 2026
@agatha197 agatha197 marked this pull request as ready for review April 2, 2026 15:25
Copilot AI review requested due to automatic review settings April 2, 2026 15:25
@agatha197 agatha197 marked this pull request as draft April 2, 2026 15:25
@agatha197 agatha197 requested a review from yomybaby April 2, 2026 15:26

Copilot AI left a comment


Pull request overview

Adds a draft feature specification for Auto Scaling Rule management in the Admin Serving area, covering Prometheus Query Preset CRUD (superadmin) and preset-based metric selection + condition mode enhancements in the service detail auto-scaling rule editor.

Changes:

  • Introduces a new spec document describing Admin Serving tab split and Preset CRUD flows.
  • Specifies service-detail modal UX changes (preset metric selection, single vs range condition mode).
  • Documents (current) dependency on an admin-only query result preview API.

6 comment threads on .specs/draft-auto-scaling-rule-management/spec.md (all outdated)
@agatha197 agatha197 force-pushed the 04-03-feat_add_auto_scaling_rule_management_feature_spec branch 3 times, most recently from ad4647a to 1314f5f Compare April 7, 2026 08:12
@agatha197 agatha197 changed the base branch from main to graphite-base/6356 April 7, 2026 08:53
@agatha197 agatha197 force-pushed the 04-03-feat_add_auto_scaling_rule_management_feature_spec branch from 1314f5f to 671fe73 Compare April 7, 2026 08:53
@agatha197 agatha197 changed the base branch from graphite-base/6356 to 04-07-feat_fr-2494_add_auto_scaling_rule_ux_improvements_spec April 7, 2026 08:53
@agatha197 agatha197 changed the base branch from 04-07-feat_fr-2494_add_auto_scaling_rule_ux_improvements_spec to graphite-base/6356 April 7, 2026 09:11
@agatha197 agatha197 force-pushed the graphite-base/6356 branch from a6b71eb to c20cb58 Compare April 7, 2026 09:31
@agatha197 agatha197 force-pushed the 04-03-feat_add_auto_scaling_rule_management_feature_spec branch 2 times, most recently from 52937eb to d147436 Compare April 9, 2026 02:34
@agatha197 agatha197 force-pushed the graphite-base/6356 branch from c20cb58 to 60b5a7a Compare April 9, 2026 02:34
agatha197 and others added 8 commits April 23, 2026 10:31
…6459)

Resolves #6453 ([FR-2483](https://lablup.atlassian.net/browse/FR-2483)) — sub-task of [FR-2470](https://lablup.atlassian.net/browse/FR-2470).

- Fix the root-cause blocker where `EduAppLauncherPage` calls `useApiEndpoint()` but has never gone through login, so `apiEndpoint` resolves to an empty string
- Resolve endpoint from `config.toml` at page level via `useEduAppApiEndpoint()`
- Ensure the Backend.AI client is initialized before any downstream Relay query runs

- `react/src/pages/EduAppLauncherPage.tsx`
- `react/src/components/EduAppLauncher.tsx`

- Technical Requirements #1 — `apiEndpoint` never empty
- Must-Have: endpoint initialization with `wsproxy.proxyURL`
- Out of Scope: `_token_login` untouched

- [ ] Open EduAppLauncher page without prior login — endpoint should resolve from `config.toml`
- [ ] No suspense error from Relay query firing before client is ready

[FR-2483]: https://lablup.atlassian.net/browse/FR-2483?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[FR-2470]: https://lablup.atlassian.net/browse/FR-2470?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…aged state machine (#6460)

Resolves #6454 ([FR-2484](https://lablup.atlassian.net/browse/FR-2484)) — sub-task of [FR-2470](https://lablup.atlassian.net/browse/FR-2470).

- Replace REST-based session lookup (`computeSession.list`) with a Relay `useLazyLoadQuery` over `ComputeSessionNode` so the returned fragment ref can feed `useBackendAIAppLauncher` in the next sub-task
- Refactor `EduAppLauncher` into a staged state machine (auth → session → launch)
- Add explicit session-creation error classification (resource shortage, missing template, 408 timeout, duplicate image session, other) — rendering lands in FR-2487
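
The staged flow and the five error buckets described above can be sketched as a small discriminated union plus a pure transition function. This is an illustrative sketch under stated assumptions, not the code in `EduAppLauncher.tsx`: the type and function names are hypothetical, and apart from the 408 status every matching rule in `classifySessionError` is invented for the example.

```typescript
// The five buckets named in the commit message.
type SessionErrorKind =
  | "resource-shortage"
  | "missing-template"
  | "timeout"
  | "duplicate-image-session"
  | "other";

interface SessionCreationError {
  statusCode?: number;
  message?: string;
}

// ASSUMPTION: only the 408 check is grounded in the text above; the substring
// rules are placeholders for whatever the real classifier inspects.
function classifySessionError(err: SessionCreationError): SessionErrorKind {
  const message = (err.message ?? "").toLowerCase();
  if (err.statusCode === 408) return "timeout";
  if (message.includes("resource")) return "resource-shortage";
  if (message.includes("template")) return "missing-template";
  if (message.includes("already exists")) return "duplicate-image-session";
  return "other";
}

// Stages follow the order auth → session → launch, with a terminal failure state.
type LauncherState =
  | { stage: "auth" }
  | { stage: "session" }
  | { stage: "launch"; sessionId: string }
  | { stage: "failed"; at: "auth" | "session" | "launch"; reason: string };

type LauncherEvent =
  | { type: "authenticated" }
  | { type: "sessionReady"; sessionId: string }
  | { type: "failed"; reason: string };

function advance(state: LauncherState, event: LauncherEvent): LauncherState {
  if (state.stage === "failed") return state; // terminal: nothing moves it on
  if (event.type === "failed") {
    return { stage: "failed", at: state.stage, reason: event.reason };
  }
  if (state.stage === "auth" && event.type === "authenticated") {
    return { stage: "session" };
  }
  if (state.stage === "session" && event.type === "sessionReady") {
    return { stage: "launch", sessionId: event.sessionId };
  }
  return state; // events that do not apply to the current stage are ignored
}
```

Recording the stage at which a failure happened (`at`) is what would let a later UI show the error against the right step.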

- `react/src/components/EduAppLauncher.tsx`
- `react/src/__generated__/EduAppLauncher*.graphql.ts` (generated)

- Technical Requirements #3 and #4 — session lookup via Relay, fragment data available
- Acceptance Criteria: 5-bucket session-creation error classification (rendering is FR-2487)
- Preserves existing token auth / reuse / creation flow

- [ ] Existing session reuse path still works
- [ ] New session creation path still works
- [ ] State transitions traced via debug logs

[FR-2484]: https://lablup.atlassian.net/browse/FR-2484?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[FR-2470]: https://lablup.atlassian.net/browse/FR-2470?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…cation (#6462)

Resolves #6456 ([FR-2486](https://lablup.atlassian.net/browse/FR-2486)) — sub-task of [FR-2470](https://lablup.atlassian.net/browse/FR-2470).

- Remove the `CustomEvent('add-bai-notification')` workaround and the `NotificationForAnonymous` bridge in favor of the native `useSetBAINotification` hook
- `EduAppLauncherPage` is already wrapped by antd `App` via `DefaultProvidersForReactRoot`, so `useSetBAINotification` works directly without a custom event hop

- `react/src/components/EduAppLauncher.tsx`
- `react/src/pages/EduAppLauncherPage.tsx`

- Technical Requirements #5 and #6 — all notifications via `upsertNotification`; no `dispatchEvent('add-bai-notification')` in `EduAppLauncher.tsx`
- Must-Have: remove `NotificationForAnonymous` dependency from `EduAppLauncherPage`

- [ ] Success notifications show correctly
- [ ] Error notifications show correctly (persistent for errors)
- [ ] No stale CustomEvent listeners remain

[FR-2486]: https://lablup.atlassian.net/browse/FR-2486?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[FR-2470]: https://lablup.atlassian.net/browse/FR-2470?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
Resolves #6457 ([FR-2487](https://lablup.atlassian.net/browse/FR-2487)) — sub-task of [FR-2470](https://lablup.atlassian.net/browse/FR-2470).

- Render a centered Ant Design `Card` with `Steps` (Authentication check → Session check/creation → App launch) that visualizes progress, per-step errors, and the final completion state
- Map classified session-creation errors from FR-2484 to per-step messages
- Map service-port-missing errors from the hook to an App launch step error
- Open the launched app in a new window (`window.open(..., '_blank')`), leaving the original page on the completion message

- `react/src/components/EduAppLauncher.tsx`
- `resources/i18n/*.json` (minimal additions)

- UI/UX Acceptance Criteria (all 5)
- Technical Acceptance: error classification rendering
- Technical Acceptance: `service_ports` missing message
- Must-Have: new window + completion state

- [ ] Steps render correctly at each stage
- [ ] Error at each step shows the correct classified message
- [ ] On success, new window opens and page shows completion state

[FR-2487]: https://lablup.atlassian.net/browse/FR-2487?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[FR-2470]: https://lablup.atlassian.net/browse/FR-2470?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…ning (#6498)

Resolves #6497 ([FR-2496](https://lablup.atlassian.net/browse/FR-2496)) — sub-task of [FR-2470](https://lablup.atlassian.net/browse/FR-2470).

Follow-up polish on the EduAppLauncher refactor stack. Three related improvements surfaced during manual testing of the completed FR-2483 to FR-2487 work.

When the launch completes, the original code only auto-opens the app window. Browsers with popup blockers silently drop the `window.open` call. This PR:

- Stores the resolved `appConnectUrl` in the `done` stage so the UI knows where to navigate
- Adds a visible primary button ("If a new window did not open automatically, click here.") on the completion card
- Adds a secondary "Refresh page" action
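
The popup-blocker fallback described above can be sketched as follows. This is a hedged illustration, not the component code: `WindowOpener`, `DoneStage`, `CompletionAction`, `finishLaunch`, and `completionActions` are hypothetical names, and the opener is injected in place of `window.open` so the sketch runs outside a browser. The two i18n keys are the ones this PR adds.

```typescript
// Stand-in for `window.open`, injected so the sketch also runs outside a browser.
type WindowOpener = (url: string, target: string) => unknown;

interface DoneStage {
  stage: "done";
  appConnectUrl: string; // kept so the completion card can link to the app
}

interface CompletionAction {
  kind: "primary" | "secondary";
  labelKey: string;
  href?: string;
}

function finishLaunch(appConnectUrl: string, open: WindowOpener): DoneStage {
  // Best-effort auto-open; a popup blocker may drop this call without any error.
  open(appConnectUrl, "_blank");
  return { stage: "done", appConnectUrl };
}

// The fallback actions are always offered, whether or not the auto-open worked.
function completionActions(done: DoneStage): CompletionAction[] {
  return [
    { kind: "primary", labelKey: "eduAppLauncher.OpenAppInNewWindow", href: done.appConnectUrl },
    { kind: "secondary", labelKey: "eduAppLauncher.RefreshPage" },
  ];
}
```

Always rendering the fallback, rather than trying to detect a blocked popup, matches the test plan item that expects the button to remain available as a recovery option even when the window opened.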

- Remove manual `useCallback` / `useMemoizedFn` wrappers now that the file uses the `'use memo'` directive (React Compiler memoizes automatically)
- Wrap the launch-effect body in `useEffectEvent` so the `useEffect` deps stay precisely `[active, apiEndpoint]` — no more `// eslint-disable-next-line react-hooks/exhaustive-deps`
- The `onLaunchEffect` block is placed **after** `_launch` is declared to respect the temporal-dead-zone rule (caught by `react-hooks/immutability` lint)

Root-cause fix: the Backend.AI client initializes `_config.proxyURL = null` by default, and `config.toml` can ship `wsproxy.proxyURL = ""` for environments that rely on the local default. The previous `!== undefined` check let both null and empty-string pass through, which later crashed on `url.endsWith(...)`. Switch to a truthy check so both fall through to the default `http://127.0.0.1:5050/`.
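
The difference between the two checks can be shown in isolation. A minimal sketch, assuming a configured value that may be `undefined`, `null`, or an empty string; `resolveProxyUrlOld` and `resolveProxyUrl` are hypothetical names, not functions from the Backend.AI client.

```typescript
const DEFAULT_PROXY_URL = "http://127.0.0.1:5050/";

// Previous behavior as described: only `undefined` falls back to the default,
// so `null` and "" are returned unchanged and reach later string handling.
function resolveProxyUrlOld(configured: string | null | undefined): string | null {
  return configured !== undefined ? configured : DEFAULT_PROXY_URL;
}

// Truthy check: `undefined`, `null`, and "" all fall through to the default.
function resolveProxyUrl(configured: string | null | undefined): string {
  return configured ? configured : DEFAULT_PROXY_URL;
}
```

With the truthy check the return type narrows to `string`, so a later call such as `url.endsWith(...)` no longer has to account for `null`.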

Also adds structured debug logging through the wsproxy launch path (`getWSProxyVersion`, `getProxyURL`, `_launchApp`) so the version/URL/wsproxy response chain is observable from the browser console.

New keys `eduAppLauncher.OpenAppInNewWindow` and `eduAppLauncher.RefreshPage` added in all 21 supported languages.

Codifies when to reach for `useEffectEvent` over `useCallback` under the `'use memo'` directive. Captures the temporal-dead-zone gotcha that this PR itself tripped on.

SSH/VS Code Desktop modal apps, Edu API runtime state handling (inherited from FR-2470 out-of-scope list).

- [ ] With a popup-blocker enabled: completion card shows the fallback button and it navigates correctly on click
- [ ] Without a popup-blocker: original `window.open` path still works; completion card still renders the fallback as a recovery option
- [ ] `config.toml` with `wsproxy.proxyURL = ""` no longer crashes; falls through to `http://127.0.0.1:5050/`
- [ ] `config.toml` with a real proxy URL still honors it
- [ ] Browser console shows the new `[wsproxy]` / `[_launchApp]` logs at `info` level
- [ ] No `useCallback` / `useMemoizedFn` regressions in `EduAppLauncher.tsx`

[FR-2496]: https://lablup.atlassian.net/browse/FR-2496?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
[FR-2470]: https://lablup.atlassian.net/browse/FR-2470?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
…6452)

Resolves #6443 (FR-2467)

- Disable commit button when session status is not RUNNING
- Previously the button only checked `isActive()`, which also holds for non-running states such as PREPARING, PENDING, and PULLING, so the button was enabled in those states
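
The tightened condition can be sketched as a pure predicate. This is an assumption-labeled illustration: `isCommitEnabled` and its parameters are hypothetical, and the real component may derive ownership and status differently.

```typescript
// Hypothetical helper: the commit button is enabled only for the owner of a
// session whose status is exactly RUNNING.
function isCommitEnabled(status: string, isOwner: boolean): boolean {
  return isOwner && status === "RUNNING";
}
```

Comparing against `RUNNING` directly, instead of relying on a broader "active" check, is what excludes PREPARING, PENDING, and PULLING.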

- [ ] Verify commit button is disabled when session is in PREPARING state
- [ ] Verify commit button is disabled when session is in PENDING state
- [ ] Verify commit button is enabled when session is in RUNNING state
- [ ] Verify commit button remains disabled for non-owners regardless of session status

🤖 Generated with [Claude Code](https://claude.com/claude-code)
@agatha197 agatha197 force-pushed the 04-03-feat_add_auto_scaling_rule_management_feature_spec branch from 7f6e62e to f2f0030 Compare April 23, 2026 01:42
@agatha197 agatha197 marked this pull request as draft April 23, 2026 01:44
@agatha197 agatha197 force-pushed the 04-03-feat_add_auto_scaling_rule_management_feature_spec branch from f2f0030 to 5dea801 Compare April 23, 2026 01:58
@agatha197 agatha197 marked this pull request as ready for review April 23, 2026 01:59

Labels

size:L 100~500 LoC

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Adding Prometheus Query Preset CRUD page (admin)

4 participants