Commit 4c48972

RHOAIENG-38790: Cursor Rule Files
rh-pre-commit.version: 2.3.2 rh-pre-commit.check-secrets: ENABLED
1 parent f488d3c commit 4c48972

3 files changed

Lines changed: 290 additions & 0 deletions

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
---
2+
description: Codeflare SDK core context, grounding, and user personas — apply to every chat
3+
globs:
4+
alwaysApply: true
5+
---
6+
7+
# Codeflare SDK — Core context & personas
8+
9+
## What this is
10+
- **Codeflare SDK**: Python SDK for requesting batch resources, managing Ray clusters, submitting jobs, and integrating with Kueue. Apache-2.0.
11+
- **Python**: ^3.11 in pyproject.toml; CI runs 3.12.
12+
- **Repo**: [project-codeflare/codeflare-sdk](https://github.com/project-codeflare/codeflare-sdk). Main package: `src/codeflare_sdk/`.
13+
14+
## Project structure
15+
```
16+
src/codeflare_sdk/
17+
__init__.py # Public API — export new public classes/functions here
18+
conftest.py # Global test fixtures (auto-mocks K8s API clients)
19+
common/
20+
kubernetes_cluster/ # Auth, API client, error handling
21+
kueue/ # Local queue listing, default queue resolution
22+
utils/ # Constants, helpers, validation
23+
widgets/ # Jupyter/IPython widgets
24+
ray/
25+
cluster/ # Cluster create/config/status/delete (main entry: Cluster)
26+
rayjobs/ # RayJob submit, tracking, runtime env
27+
client/ # Ray JobSubmissionClient wrapper
28+
vendored/ # DO NOT MODIFY — vendored KubeRay client
29+
tests/
30+
e2e/, e2e_v2/ # E2E (KinD/Kueue/KubeRay); not run in “new code” flow
31+
```
32+
33+
## Public API
34+
- **Export new public classes and functions** in `src/codeflare_sdk/__init__.py`. Do not add public API without listing it there.
35+
36+
## Grounding — avoid hallucination
37+
- **Only use APIs, types, and file paths that exist in this codebase.** If unsure, search the repo (e.g. `codeflare_sdk`, `Cluster`, `ClusterConfiguration`, `get_api_client`) before suggesting code.
38+
- **Do not invent** new modules, env vars, or config keys unless the user explicitly asks to add them.
39+
- **Prefer referencing** existing modules: `codeflare_sdk.ray.cluster`, `codeflare_sdk.common.kueue`, `codeflare_sdk.common.kubernetes_cluster.auth`, etc.
40+
- When adding features, follow existing patterns in the same package (e.g. `kueue.py` for Kueue, cluster code under `ray/cluster/`).
41+
42+
## Conventions (high level)
43+
- New code: type hints, Apache-2.0 header. Tests: pytest; coverage ≥90% project, ≥85% patch. Format: pre-commit.
44+
- **Reuse existing enums and types** for status/state (e.g. `RayClusterStatus`); do not introduce new string-based status fields for concepts already modeled in the codebase.
45+
- **Scope of changes**: Do not refactor, rename, or change code outside the scope of the user's request unless required for the change to work (e.g. a type used by new code). Prefer minimal, targeted edits to keep diffs small and avoid unrelated "improvements" that can break things or waste review time.
46+
47+
---
48+
49+
## User personas
50+
51+
Use these personas to keep code, APIs, and docs user-relevant.
52+
53+
### Cluster / platform admin
54+
- Manages Kubernetes/OpenShift, Kueue, quotas, namespaces.
55+
- Cares about: auth (kubeconfig, OIDC, tokens), RBAC, resource limits, default queues, priority classes.
56+
- Prefer: clear errors, `config_check()`, namespace/queue handling, security and quotas.
57+
58+
### Data scientist / ML engineer
59+
- Runs Ray jobs, uses notebooks, wants minimal YAML/K8s.
60+
- Cares about: `Cluster`/`ClusterConfiguration`, job submission, runtime env, demos (`codeflare_sdk.copy_demo_nbs()`).
61+
- Prefer: simple APIs, good defaults, demos and docs that match notebook workflows.
62+
63+
### Application developer
64+
- Integrates SDK into apps or pipelines.
65+
- Cares about: programmatic API, status checks, timeouts, error handling, idempotency.
66+
- Prefer: stable function signatures, logging, and predictable behavior.
67+
68+
When writing or changing code, consider: "Which persona does this serve?" and keep their use case in mind.
69+
70+
---
71+
72+
## Suggesting rule improvements
73+
74+
At the **end of the conversation**, if there is clear evidence in **this chat** that a new or updated rule would help, suggest one or two concrete improvements for the user to accept or deny.
75+
76+
**Only suggest when:**
77+
- You had to correct the same type of mistake more than once, or
78+
- The user had to explicitly ask for something the existing rules could have enforced, or
79+
- You had to guess on style, structure, or behavior that could be turned into a guardrail.
80+
81+
**How to suggest:**
82+
- Propose **concrete** rule text (a bullet or short paragraph) and say which `.mdc` file it belongs in (e.g. `02-python-standards.mdc`, `03-testing-and-ci.mdc`).
83+
- Do **not** suggest rules that are already covered by the existing `.cursor/rules` content.
84+
- Present suggestions as optional: "You could add the following to …" or "Consider adding a rule: …". The user may accept or deny; do not edit `.mdc` files unless the user explicitly asks you to.
Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
---
2+
description: Python coding standards — quality, style, and matching existing codebase patterns
3+
globs: **/*.py
4+
alwaysApply: false
5+
---
6+
# Python coding standards
7+
8+
When adding or editing code, **match the style and patterns of the surrounding code and the same subpackage**. Prefer consistency with existing files over introducing new conventions.
9+
10+
---
11+
12+
## Style & tooling
13+
- **Formatting / pre-commit**: Use pre-commit for all checks. Run `pre-commit run --show-diff-on-failure --color=always --all-files` before committing.
14+
- **Naming**: snake_case for functions, variables, modules; PascalCase for classes.
15+
- **Apache-2.0 header** at top of new files (see any `src/codeflare_sdk/**/*.py`).
16+
- **Type hints** for function parameters and return types (e.g. `Optional[str]`, `List[...]`). Use `typing`: `Optional`, `List`, `Dict`, `Tuple`. Prefer **dataclasses** for configuration or data-holding types (see `ClusterConfiguration` in config.py).
17+
18+
---
19+
20+
## Docstrings
21+
- **Format**: Google-style. One-line summary; then `Args:` with `name (type):` or `name (type, optional):`; then `Returns:`; add `Note:` or `Raises:` if needed. Required for public functions.
22+
- **Example** (match this style):
23+
```
24+
Args:
25+
namespace (str):
26+
The Kubernetes namespace to query.
27+
Returns:
28+
Optional[str]:
29+
The name of the default queue, or None.
30+
```
31+
- Optional multi-line description before Args for non-trivial behavior (see `get_default_kueue_name` in kueue.py).
32+
- **Module docstring**: Optional but used in submodules (e.g. cluster, config, auth) — short description of what the module provides.
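
A fuller sketch of the docstring format described above, applied to a complete function. The function `pick_default_queue` and its record keys (`name`, `namespace`, `is_default`) are hypothetical and are not part of the SDK; only the docstring layout is the point.

```python
from typing import Dict, List, Optional


def pick_default_queue(queues: List[Dict[str, object]], namespace: str) -> Optional[str]:
    """Return the name of the default queue in a namespace.

    Scans the given queue records and returns the first one that is
    flagged as default and belongs to the requested namespace.

    Args:
        queues (List[Dict[str, object]]):
            Queue records with "name", "namespace" and "is_default" keys.
        namespace (str):
            The Kubernetes namespace to query.

    Returns:
        Optional[str]:
            The name of the default queue, or None.
    """
    for queue in queues:
        name = queue.get("name")
        # Skip records from other namespaces, non-default queues, or bad names.
        if queue.get("namespace") == namespace and queue.get("is_default") and isinstance(name, str):
            return name
    return None
```

The optional multi-line description sits between the one-line summary and `Args:`, which is where non-trivial behavior would be explained.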
33+
34+
---
35+
36+
## Imports
37+
- **Order**: stdlib → third-party → local. Blank line between groups.
38+
- **Local imports**: Use **relative** imports within the same package (e.g. `from ...common.utils import get_current_namespace`, `from .config import ClusterConfiguration`). Use **absolute** `from codeflare_sdk....` when importing from another top-level package or in tests.
39+
- **K8s/auth**: Prefer `from codeflare_sdk.common.kubernetes_cluster.auth import config_check, get_api_client` or `from ...common.kubernetes_cluster.auth import ...`. Use `from codeflare_sdk.common import _kube_api_error_handling` (or relative `...common`) for API error handling.
40+
41+
---
42+
43+
## Structure & naming
44+
- **Private helpers**: Prefix with `_` (e.g. `_fetch_local_queues`, `_find_default_queue_name`). Use for logic that is not part of the public API.
45+
- **Logging**: `logger = logging.getLogger(__name__)` at module level; use `logger` instead of `print` for runtime or debug output. (Some legacy code still uses `print`; new code should use `logger`.)
46+
- **Deprecation**: Use `@deprecated` from `typing_extensions` and `warnings.warn(..., DeprecationWarning, stacklevel=2)`; see `auth.py` for the pattern.
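
The three conventions above might look like this in one module. This is a minimal sketch using only the standard library: all function names are hypothetical, and the `@deprecated` decorator from `typing_extensions` is left out so the block has no third-party dependency; `auth.py` remains the reference for the real pattern.

```python
import logging
import warnings
from typing import Optional

# Module-level logger, used instead of print for runtime and debug output.
logger = logging.getLogger(__name__)


def _normalize_namespace(namespace: Optional[str]) -> str:
    """Private helper (leading underscore): fall back to "default" when empty."""
    cleaned = namespace.strip() if namespace else ""
    return cleaned or "default"


def resolve_namespace(namespace: Optional[str] = None) -> str:
    """Hypothetical public function that logs rather than prints."""
    resolved = _normalize_namespace(namespace)
    logger.debug("Resolved namespace %r to %r", namespace, resolved)
    return resolved


def get_namespace(namespace: Optional[str] = None) -> str:
    """Hypothetical deprecated alias kept for backwards compatibility."""
    warnings.warn(
        "get_namespace() is deprecated; use resolve_namespace() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this wrapper
    )
    return resolve_namespace(namespace)
```

`stacklevel=2` matters because it makes the warning point at the code that still calls the old name, which is the line a user needs to change.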
47+
48+
---
49+
50+
## Kubernetes / API errors
51+
- Use **`_kube_api_error_handling(e)`** for `ApiException` (from `kubernetes.client.exceptions` or `kubernetes.client.rest`). Do not add new ad-hoc exception handling patterns; follow kueue.py and build_ray_cluster.py.
52+
- Call **`config_check()`** before K8s API calls when the code path expects a configured client.
53+
- Use **`get_api_client()`** to obtain the client; do not instantiate new Kubernetes clients directly for the default SDK-configured client.
54+
55+
## Parsing Kubernetes / API response dicts
56+
- Assume CR or list/get response fields can be **missing or wrong type**. Use safe access (e.g. `.get()`, `try/except` for KeyError, IndexError, TypeError) and coerce numeric fields with **`int(...)`** where appropriate (e.g. replicas, counts) so string or missing values do not crash.
57+
- When parsing Kubernetes Custom Resources (dictionaries), use **isolated try/except blocks** for distinct sections (e.g. metadata, spec, status). Do not let a missing status field prevent metadata or spec from being parsed.
58+
- **Never** use raw strings to represent application or cluster states. Always search for and reuse existing Enums (e.g. `RayClusterStatus` in `ray/cluster/status.py`). Map invalid or missing values to the enum’s unknown/default (e.g. `RayClusterStatus.UNKNOWN`) inside a try/except; do not introduce new string constants for the same concept.
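
The parsing rules above can be sketched as follows. `ExampleStatus` is a hypothetical stand-in for an existing enum such as `RayClusterStatus`, and the `status.state` field name is an assumption made for illustration; real code should reuse the SDK's enum and the fields of the actual Custom Resource.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict


class ExampleStatus(Enum):
    """Hypothetical stand-in for an existing status enum."""

    READY = "ready"
    FAILED = "failed"
    UNKNOWN = "unknown"


@dataclass
class ParsedCluster:
    name: str = ""
    namespace: str = ""
    workers: int = 0
    status: ExampleStatus = ExampleStatus.UNKNOWN


def parse_cluster(resource: Dict[str, Any]) -> ParsedCluster:
    """Parse a CR dict so that one bad section cannot break the others."""
    parsed = ParsedCluster()

    # Metadata: isolated, so a broken spec or status does not affect it.
    try:
        metadata = resource.get("metadata") or {}
        parsed.name = str(metadata.get("name", ""))
        parsed.namespace = str(metadata.get("namespace", ""))
    except (AttributeError, TypeError):
        pass

    # Spec: int(...) coerces string replicas; anything unusable becomes 0.
    try:
        groups = resource["spec"]["workerGroupSpecs"]
        parsed.workers = int(groups[0]["replicas"])
    except (KeyError, IndexError, TypeError, ValueError):
        parsed.workers = 0

    # Status: unknown or missing values map to the enum's UNKNOWN member.
    try:
        parsed.status = ExampleStatus(str(resource["status"]["state"]).lower())
    except (KeyError, TypeError, ValueError):
        parsed.status = ExampleStatus.UNKNOWN

    return parsed
```

Each section has its own `try/except`, so a resource with no `status` still yields its name and worker count.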
59+
60+
---
61+
62+
## Canonical examples (copy patterns from these)
63+
- **Kueue / K8s API helpers**: `src/codeflare_sdk/common/kueue/kueue.py` — docstrings, `_helper` functions, ApiException handling, `config_check()`/`get_api_client()`.
64+
- **Cluster / high-level API**: `src/codeflare_sdk/ray/cluster/cluster.py` — class docstrings, relative imports from `...common`, use of `_kube_api_error_handling`.
65+
- **Config / dataclasses**: `src/codeflare_sdk/ray/cluster/config.py` — module docstring, `@dataclass`, Google-style Args with type in parens.
66+
- **Auth**: `src/codeflare_sdk/common/kubernetes_cluster/auth.py` — module docstring, deprecation pattern, abstract base classes.
67+
68+
When in doubt, open the nearest existing file in the same package and mirror its style.
69+
70+
---
71+
72+
## Don’t
73+
- Add new dependencies without updating `pyproject.toml` (Poetry). Use existing test deps: pytest, pytest-mock, pytest-timeout, coverage.
74+
- Change Python version: keep ^3.11 per pyproject.toml.
75+
76+
## Common pitfalls
77+
- **Don’t import from vendored** — use the SDK’s own wrappers (e.g. `RayjobApi` via codeflare_sdk, not raw vendored modules).
78+
- **Don’t hardcode Ray image tags** — use `common/utils/constants.py` (maps Python versions to default images).
79+
- **Don’t skip `config_check()`** — call it before K8s API calls when the code path expects a configured client.
80+
- **Unit test timeout is 900s** — long-running tests are killed after 15 minutes (pyproject.toml).
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
1+
---
2+
description: Testing and coverage when new code is added — pre-commit, unit tests, coverage (no E2E/notebooks)
3+
globs: "**/test_*.py,**/*_test.py,**/tests/**/*.py,src/**/*.py,.github/workflows/*.yml,.github/workflows/*.yaml"
4+
alwaysApply: true
5+
---
6+
7+
# Testing & coverage (new code)
8+
9+
When **new code is added**, run this validation pipeline: **pre-commit**, **unit tests**, and **coverage**. Do **not** run E2E or notebook tests (they take too long).
10+
11+
---
12+
13+
## Cursor: no write/CRUD git commands
14+
15+
- **Cursor MUST NOT run any git commands that modify state** (e.g. no `git commit`, `git push`, `git add`, `git merge`, `git rebase`, `git reset --hard`, etc.). The user runs those themselves.
16+
- **Cursor MAY run read-only git commands** when useful (e.g. `git status`, `git log`, `git diff`, `git show`, `git branch -a`, etc.). `git checkout -- <path>` is also allowed to restore a file, but note that it is not read-only: it discards uncommitted changes to that path.
17+
---
18+
19+
## Validation when adding new code
20+
21+
- **When to run**: If you added or changed code in this session, run the validation pipeline **once, just before ending the conversation**. Do not run pre-commit, unit tests, or coverage after every small edit; run them only at the end so the user sees a single pass/fail before they leave.
22+
23+
**Quick commands** (unit tests only, no E2E/notebooks; then check **patch** coverage only, not full codebase):
24+
```bash
25+
pre-commit run --show-diff-on-failure --color=always --all-files
26+
poetry install --with test
27+
coverage run --omit="src/**/test_*.py,src/codeflare_sdk/common/utils/unit_test_support.py,src/codeflare_sdk/vendored/**" -m pytest --ignore=tests/e2e --ignore=tests/e2e_v2 --ignore=tests/upgrade --ignore=demo-notebooks --ignore=ui-tests
28+
coverage report -m
29+
# Then, for source files changed in this session only: coverage report -m --include="path/to/changed1.py,path/to/changed2.py" (require ≥85% for that subset; ignore overall %)
30+
```
31+
32+
### 1. Pre-commit
33+
- Run before considering changes done: `pre-commit run --all-files` (or the full command above).
34+
- Do not skip or alter pre-commit hooks.
35+
36+
### 2. Unit tests (no E2E, no notebooks)
37+
- Run **only unit tests**. Do **not** run E2E or notebook tests (they take too long). Exclude: `tests/e2e`, `tests/e2e_v2`, `tests/upgrade`, `demo-notebooks`, `ui-tests`.
38+
- Use the coverage + pytest command from **Quick commands** above.
39+
40+
### 3. Coverage requirements (in this session)
41+
- **Check only patch coverage**: In Cursor, only validate coverage for **code added or changed in this chat** (the “patch”), not the full codebase. Full codebase coverage is enforced in GitHub CI and will already be ≥90% on main; local runs often show low overall % (e.g. below 30%) due to vendored code, kuberay client, and other excluded paths—**do not fail the session on that**.
42+
- **Patch target**: For source files you added or modified in this session, require **≥85%** coverage. After running the coverage commands from **Quick commands** above, run `coverage report -m --include="path/to/changed1.py,path/to/changed2.py,..."` with the actual paths you changed and ensure that subset is ≥85%. If you only changed tests or config, there is no patch coverage to check.
43+
- **CI / codecov**: GitHub runs full pytest and enforces ≥90% project coverage; codecov.yml uses patch 85%, threshold 2.5%. Ignore: `**/*.ipynb`, `demo-notebooks/**`, `**/__init__.py`.
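
The patch check above is plain arithmetic over the changed files. A minimal sketch, assuming the data has the shape of coverage.py's JSON report (a `files` mapping whose entries carry a `summary` with `covered_lines` and `num_statements`); the function names are hypothetical and this is not part of the repository's tooling:

```python
from typing import Any, Dict, Iterable


def patch_coverage(report: Dict[str, Any], changed_files: Iterable[str]) -> float:
    """Percent of statements covered, counting only the changed files."""
    covered = 0
    statements = 0
    files = report.get("files", {})
    for path in changed_files:
        summary = files.get(path, {}).get("summary", {})
        covered += int(summary.get("covered_lines", 0))
        statements += int(summary.get("num_statements", 0))
    # No measurable statements (e.g. only tests or config changed): nothing to enforce.
    if statements == 0:
        return 100.0
    return 100.0 * covered / statements


def meets_patch_target(report: Dict[str, Any], changed_files: Iterable[str], target: float = 85.0) -> bool:
    """True when the changed files together reach the patch target."""
    return patch_coverage(report, changed_files) >= target
```

Unchanged files are left out of the sum on purpose, which is why a low overall percentage in a local run does not fail the session.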
44+
45+
---
46+
47+
## Unit test stack (do not replace)
48+
- **Runner**: pytest 9.x. **Extras**: pytest-mock, pytest-timeout (default timeout 900s in pyproject.toml). **Coverage**: coverage 7.x.
49+
50+
### Where unit tests live
51+
- Tests next to code: `src/codeflare_sdk/**/test_*.py`. Ignore: `src/codeflare_sdk/vendored/**`, `unit_test_support.py`.
52+
- **conftest.py**: The global `src/codeflare_sdk/conftest.py` auto-mocks K8s API clients; tests inherit these mocks. Override with `mocker` or `monkeypatch` when testing specific K8s/config behavior. Never make real Kubernetes API calls in unit tests.
53+
54+
### Pytest markers (use when relevant)
55+
- `smoke` — quick validation. `tier1` — standard suite.
56+
- `kind`, `openshift`, `nvidia_gpu` — environment-specific (E2E only; do not run in the “new code” flow).
57+
58+
### Writing tests
59+
- Use **mocker** (pytest-mock) for K8s/API calls; see `test_kueue.py` and `test_auth*.py` for patterns.
60+
- **NEVER** hardcode raw Kubernetes JSON payloads in test files. You **MUST** use or extend the helper functions in `src/codeflare_sdk/common/utils/unit_test_support.py` (e.g. `get_ray_obj_with_status`, `get_obj_none`, `get_local_queue`, `create_cluster_config`, `apply_template`).
61+
- New code in `src/codeflare_sdk` should have corresponding tests so patch coverage (for that new code) stays ≥85%; full project coverage is enforced in CI.
62+
- **Edge-case tests for API/CR parsing**: When adding code that parses Kubernetes Custom Resource or list/get API response dicts, add at least one test that uses **malformed or partial payloads**: empty `items`, missing `spec` or `status`, empty or missing nested lists (e.g. `workerGroupSpecs`). Assert safe defaults (e.g. 0 for counts, UNKNOWN or equivalent for status). Test error-handling by mocking the API to raise (e.g. `ApiException`); assert the handler is invoked (e.g. mock `_kube_api_error_handling` and `assert_called_once()`) rather than asserting exact stdout text.
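
The edge-case guidance above might translate into tests like these. This is a dependency-free sketch: `count_items` is a hypothetical helper, `FakeApiException` stands in for the Kubernetes `ApiException`, and `unittest.mock.MagicMock` is used where a real test in this repo would use the `mocker` fixture to patch `_kube_api_error_handling`.

```python
from typing import Any, Callable, Dict
from unittest.mock import MagicMock


class FakeApiException(Exception):
    """Stand-in for ApiException so the sketch needs no Kubernetes client."""


def count_items(list_call: Callable[[], Dict[str, Any]], on_error: Callable[[Exception], None]) -> int:
    """Hypothetical helper: count items in a list response, defaulting to 0."""
    try:
        response = list_call()
    except FakeApiException as exc:
        on_error(exc)  # delegate to the shared handler, no ad-hoc handling
        return 0
    items = response.get("items") if isinstance(response, dict) else None
    return len(items) if isinstance(items, list) else 0


def test_malformed_payloads_default_to_zero() -> None:
    # Empty, missing, and wrongly typed "items" all yield the safe default.
    assert count_items(lambda: {"items": []}, MagicMock()) == 0
    assert count_items(lambda: {}, MagicMock()) == 0
    assert count_items(lambda: {"items": None}, MagicMock()) == 0


def test_api_error_invokes_handler() -> None:
    error = FakeApiException("boom")
    handler = MagicMock()
    assert count_items(MagicMock(side_effect=error), handler) == 0
    # Assert the handler was called, not what it printed.
    handler.assert_called_once_with(error)
```

Asserting on the handler call keeps the test stable if the handler's message text changes later.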
63+
64+
---
65+
66+
## CI workflows (reference only; E2E/notebooks not in “new code” flow)
67+
68+
### Versions (CI env)
69+
- **KUEUE_VERSION**: v0.13.4. **KUBERAY_VERSION**: v1.4.2 (opendatahub-io/kuberay fork, RHOAI features). **Python**: 3.12. **Common repo**: project-codeflare/codeflare-common @ main (KinD/GPU setup).
70+
71+
### Pre-commit (every PR / workflow_dispatch)
72+
- **Run before pushing**: `pre-commit run --all-files`. Image: `quay.io/project-codeflare/codeflare-sdk-precommit:v0.0.1`. Do not skip or alter pre-commit hooks.
73+
74+
### Unit tests (every PR)
75+
- `poetry install --with test` then pytest with coverage ≥90%. No paths-ignore for this workflow.
76+
77+
### RayJob E2E (PR to main, release-*, ray-jobs-feature)
78+
- **Paths-ignore**: `docs/**`, `**.adoc`, `**.md`, `LICENSE`. **Runner**: gpu-t4-4-core. KinD + NVIDIA GPU operator + Kueue + KubeRay.
79+
- **Command**: `poetry run pytest -v -s ./tests/e2e/rayjob/`
80+
- **RBAC**: sdk-user with limited permissions (rayclusters, rayjobs, localqueues, clusterqueues, resourceflavors, pods, services, secrets, workloads, etc.). Do not assume cluster-admin.
81+
82+
### General E2E (same branches as RayJob E2E)
83+
- **Command**: `poetry run pytest -v -s ./tests/e2e/ -m 'kind and nvidia_gpu'`
84+
- E2E tests in `tests/e2e/` must use `@pytest.mark.kind` and `@pytest.mark.nvidia_gpu` to run in this workflow. Place RayJob e2e in `tests/e2e/rayjob/`; other e2e in `tests/e2e/`. Install for e2e: `poetry install --with test,docs`.
85+
86+
### Guided notebooks (label: test-guided-notebooks)
87+
- KinD + Kueue + KubeRay; no GPU. Notebooks: 0_basic_ray, 4_rayjob_existing_cluster, 5_submit_rayjob_cr. Run with papermill (see Demo notebooks below).
88+
89+
### UI notebooks (labels: test-guided-notebooks or test-ui-notebooks)
90+
- Job: verify-3_widget_example. Playwright (chromium) in `ui-tests/`; notebook `demo-notebooks/guided-demos/3_widget_example.ipynb`.
91+
92+
### Additional notebooks (label: test-additional-notebooks)
93+
- local_interactive.ipynb and ray_job_client.ipynb are **skipped** in CI (mTLS/OpenShift required; not available in KinD).
94+
95+
---
96+
97+
## Demo notebooks & CI
98+
99+
### Notebooks executed in CI
100+
101+
| Notebook | Workflow | Notes |
102+
|----------|----------|--------|
103+
| 0_basic_ray.ipynb | Guided | KinD: namespace='default', dashboard_check=False, remove auth cells |
104+
| 4_rayjob_existing_cluster.ipynb | Guided | KinD: namespace='default', GPU 0, remove oc login cell |
105+
| 5_submit_rayjob_cr.ipynb | Guided | KinD: namespace='default', remove oc login cell |
106+
| 3_widget_example.ipynb | UI | Playwright in ui-tests/; namespace='default', view_clusters('default'), remove auth cells |
107+
108+
**Skipped in CI** (require mTLS/OpenShift): local_interactive.ipynb, ray_job_client.ipynb.
109+
110+
### KinD-specific adaptations (CI applies these)
111+
- **Auth**: Remove cells that do auth/login (e.g. "Create authentication object for user permissions", `auth.logout()`, `oc login`) — KinD doesn't support token auth the same way.
112+
- **Namespace**: Use `namespace='default'` where the SDK needs it. Replace `namespace="your-namespace"` with `namespace="default"` in notebooks.
113+
- **Dashboard**: Use `cluster.wait_ready(dashboard_check=False)` in KinD (no HTTPRoute/Route).
114+
- **GPU**: In KinD jobs without GPU, set GPU requests to 0 (e.g. `head_extended_resource_requests={'nvidia.com/gpu':0}`).
115+
- **Widget**: For 3_widget_example, call `view_clusters('default')` with explicit namespace.
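
To make the adaptations above concrete, here is a rough sketch of the same edits applied to a notebook's JSON in Python. It is an illustration only: the CI workflow's actual mechanism is not shown in these rules, and the function name and the marker list are assumptions drawn from the bullets above.

```python
import json
from typing import Any, Dict, List

# Code cells containing any of these markers are treated as auth/login cells.
AUTH_MARKERS = ("auth.logout()", "oc login", "Create authentication object")


def _cell_text(cell: Dict[str, Any]) -> str:
    """A notebook cell's source may be a string or a list of strings."""
    source = cell.get("source", "")
    return "".join(source) if isinstance(source, list) else str(source)


def adapt_for_kind(notebook: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy with auth cells removed and KinD-friendly arguments."""
    adapted = json.loads(json.dumps(notebook))  # deep copy of plain JSON data
    kept: List[Dict[str, Any]] = []
    for cell in adapted.get("cells", []):
        text = _cell_text(cell)
        if cell.get("cell_type") == "code" and any(m in text for m in AUTH_MARKERS):
            continue  # drop auth/login cells
        text = text.replace('namespace="your-namespace"', 'namespace="default"')
        text = text.replace("cluster.wait_ready()", "cluster.wait_ready(dashboard_check=False)")
        cell["source"] = text
        kept.append(cell)
    adapted["cells"] = kept
    return adapted
```

The input notebook is left untouched, so the same source file can still be run unmodified against a real OpenShift cluster.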
116+
117+
When editing guided demos, keep them runnable on both real OpenShift and KinD; CI runs on KinD with the above edits applied in the workflow.
118+
119+
### How notebooks are run
120+
- **Guided**: `poetry run papermill <notebook>.ipynb <notebook>_out.ipynb --log-output --execution-timeout 600` from `demo-notebooks/guided-demos`. Install: `poetry install --with test,docs`; for papermill also `pip install papermill ipython ipykernel`.
121+
- **UI**: From `ui-tests/`, `poetry run yarn test` (Playwright). Dependencies: `yarn install`, `yarn playwright install chromium`.
122+
123+
### Adding a new notebook that should run in CI
124+
- **Guided**: Add a job in the Guided notebooks workflow (similar to verify-0_basic_ray) and apply the same KinD adaptations in the workflow steps.
125+
- **UI**: Add the notebook to `ui-tests/` and follow the 3_widget_example pattern (explicit namespace, auth cells removed) if it uses cluster/widget APIs.
126+
- Do not rely on mTLS or OpenShift-only features if the notebook should run in current KinD CI.
