Commit d6d4da1

Merge pull request #580 from KhiopsML/577-add-comprehensive-instructions-for-khiops-python-development
577 add comprehensive instructions for khiops python development
2 parents 5257dbe + f007510 commit d6d4da1

7 files changed

Lines changed: 644 additions & 55 deletions

.github/copilot-instructions.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
# Copilot Instructions for khiops-python

Use this file as the shared repository guide. When you work in a path covered by a
scoped instruction file, apply both this document and the matching file in
`.github/instructions/`.

## Scoped Instruction Files

- `.github/instructions/python-changes.instructions.md`: Python source and test
  changes (`**/*.py`)
- `.github/instructions/docker-changes.instructions.md`: development Docker image
  changes (`packaging/docker/khiopspydev/**`)
- `.github/instructions/doc-changes.instructions.md`: documentation source changes
  (`doc/**`)
- `.github/instructions/ci-workflows.instructions.md`: GitHub Actions workflow
  changes (`.github/workflows/**`)

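To make the "apply both" rule concrete, here is a minimal Python sketch that lists the instruction files relevant to a given path. It assumes each scoped file's `applyTo` glob behaves like an ordinary recursive glob, as suggested by the patterns in parentheses above. The names `SCOPED_INSTRUCTIONS` and `instruction_files_for` are hypothetical and belong to neither the repository nor Copilot.

```python
# Hypothetical helpers: which instruction files apply to a repository path,
# assuming the applyTo globs listed above behave like ordinary recursive globs.
SHARED_GUIDE = ".github/copilot-instructions.md"

# (scoped file stem, predicate approximating its applyTo glob)
SCOPED_INSTRUCTIONS = [
    ("python-changes", lambda path: path.endswith(".py")),  # **/*.py
    (
        "docker-changes",
        lambda path: path.startswith("packaging/docker/khiopspydev/"),
    ),
    ("doc-changes", lambda path: path.startswith("doc/")),  # doc/**
    ("ci-workflows", lambda path: path.startswith(".github/workflows/")),
]


def instruction_files_for(path: str) -> list[str]:
    """Return the shared guide plus every scoped file whose glob matches path."""
    files = [SHARED_GUIDE]
    for stem, matches in SCOPED_INSTRUCTIONS:
        if matches(path):
            files.append(f".github/instructions/{stem}.instructions.md")
    return files


print(instruction_files_for("doc/conf.py"))
```

A path such as `doc/conf.py` matches two scoped files (Python and documentation), so under this reading both apply on top of the shared guide.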
## Architecture

Khiops Python is a Python interface to the **Khiops AutoML suite** for building
supervised models (classifiers, regressors, encoders) and unsupervised models
(coclusterings). It provides two ways to use Khiops from Python, `khiops.core` and
`khiops.sklearn`, alongside a few supporting packages:

- **`khiops.core`**: the low-level API, which drives the Khiops binaries via
  dictionary files (`.kdic`, `.kdicj`) and tabular data files. Code implementing
  this API must depend only on the Python standard library.
  - `core.api`: public functions such as `train_predictor` and
    `train_recoder`
  - `core.dictionary`: data classes for Khiops dictionary files (in the
    `.kdic` and JSON `.kdicj` formats)
  - `core.analysis_results`: data classes for Khiops JSON analysis reports
    (`.khj`)
  - `core.coclustering_results`: data classes for Khiops coclustering report
    files (`.khcj`)
  - `core.internals.runner`: backend abstraction for local, Docker, and other
    execution modes, configurable with `get_runner()` and `set_runner()`
  - `core.internals.filesystems`: filesystem abstraction for local, S3, GCS, and
    Azure access
  - `core.internals.task`, `core.internals.tasks`: task definitions for
    Khiops operations
- **`khiops.sklearn`**: scikit-learn-compatible estimators built on top of
  `khiops.core`. Code implementing these estimators may additionally depend on
  pandas and scikit-learn, and on no other external package.
  ```text
  KhiopsEstimator(ABC, BaseEstimator)
  ├── KhiopsCoclustering(ClusterMixin)
  └── KhiopsSupervisedEstimator
      ├── KhiopsPredictor
      │   ├── KhiopsClassifier(ClassifierMixin)
      │   └── KhiopsRegressor(RegressorMixin)
      └── KhiopsEncoder(TransformerMixin)
  ```
  - `sklearn.dataset`: normalizes DataFrames, file paths, and multi-table
    dictionaries into Khiops-compatible datasets
- **`khiops.extras`**: optional integrations such as the Docker runner
- **`khiops.tools`**: miscellaneous utility tools and CLI entry points
- **`khiops.samples`**: sample scripts, also used to generate parts of the
  documentation via `doc/convert-samples-hook`

Keep changes inside these layer boundaries.

## Shared Conventions

### Dependency Rules

- Do not add new external dependencies without discussion. Keep external package
  dependencies to a minimum to reduce installation problems.
- Development and documentation-generation dependencies (e.g., `black`, `isort`,
  `sphinx`, `wrapt`, `furo`) are subject to a more permissive policy, but still
  avoid unnecessary additions.
- Test dependencies are listed in `test-requirements.txt` (`coverage`, `wrapt`).
  Package dependencies are extracted from `pyproject.toml` at CI time via
  `scripts/extract_dependencies_from_pyproject_toml.py`.

### Python Support Policy

- CI tests run against Python 3.10–3.14.

### Versioning

The project uses `MAJOR.MINOR.PATCH.INCREMENT[-PRE_RELEASE]`, where
`MAJOR.MINOR.PATCH` tracks the compatible Khiops native version and `INCREMENT`
tracks the Python package's own evolution.

For pip and Conda packages, the pre-release atom is normalized to comply with
[Python version specifiers](https://packaging.python.org/en/latest/specifications/version-specifiers/#version-specifiers):
both the dash before it and the dot inside it are removed (e.g., `11.0.0.2a1`
instead of `11.0.0.2-a.1`).

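The normalization described above can be sketched as a small conversion function. The sketch assumes the pre-release atom always has the form `<label>.<number>` (as in `a.1`). The function names and the regular expression are illustrative and are not taken from the project's release tooling.

```python
import re

# Assumed shape: MAJOR.MINOR.PATCH.INCREMENT, optionally followed by
# "-<label>.<number>" (e.g. "-a.1"); this regex is illustrative only.
_VERSION_RE = re.compile(
    r"^(?P<release>\d+\.\d+\.\d+\.\d+)(?:-(?P<label>[a-z]+)\.(?P<number>\d+))?$"
)


def to_package_version(version: str) -> str:
    """Convert a repository version to its pip/Conda form (hypothetical helper)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    if match.group("label") is None:
        return match.group("release")  # no pre-release atom to normalize
    # Drop the dash before the pre-release atom and the dot inside it.
    return f"{match.group('release')}{match.group('label')}{match.group('number')}"


def native_khiops_version(version: str) -> str:
    """Return MAJOR.MINOR.PATCH, the compatible Khiops native version."""
    return ".".join(version.split("-")[0].split(".")[:3])


print(to_package_version("11.0.0.2-a.1"))  # 11.0.0.2a1
print(native_khiops_version("11.0.0.2-a.1"))  # 11.0.0
```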
## License

BSD 3-Clause-Clear. See `LICENSE.md`.
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
---
applyTo: ".github/workflows/**"
---

# CI Workflow Changes

Use these rules for files under `.github/workflows/`. Apply the shared guidance
from `.github/copilot-instructions.md` first, then this workflow-specific
guidance.

## Workflow Overview

This repository has six GitHub Actions workflows in `.github/workflows/`. Most
workflows use concurrency groups to cancel in-progress runs when superseded,
except `api-docs.yml` (which uses a `pages` concurrency group that does not
cancel in-progress runs).

### `quick-checks.yml`

Runs pre-commit hooks on every pull request and on `workflow_dispatch`. The
hooks (configured in `.pre-commit-config.yaml`) are: Black, pylint, isort (with a
special no-sections configuration for sample files), yamlfix, shellcheck, GitHub
workflow/action schema validation (`check-github-workflows`,
`check-github-actions`), and a local `samples-generation` hook that regenerates
the reST samples when `khiops/samples/samples.py` or
`khiops/samples/samples_sklearn.py` changes.

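The `samples-generation` trigger condition reduces to a membership test on the changed files. The sketch below illustrates only that condition and is hypothetical. In practice, pre-commit selects which files a hook receives through the hook configuration in `.pre-commit-config.yaml`.

```python
# Hypothetical sketch: the two sample sources whose modification, per the
# description above, causes the reST samples to be regenerated.
SAMPLE_SOURCES = frozenset(
    {"khiops/samples/samples.py", "khiops/samples/samples_sklearn.py"}
)


def needs_samples_regeneration(changed_files: list[str]) -> bool:
    """Return True if any changed file is one of the sample sources."""
    return any(path in SAMPLE_SOURCES for path in changed_files)


print(needs_samples_regeneration(["README.md", "khiops/samples/samples.py"]))
```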
### `tests.yml`

The main test suite. Triggers on PRs that touch `khiops/**/*.py`,
`tests/**/*.py`, `tests/resources/**` (excluding `tests/resources/**/*.md`), or
the workflow file itself. Also supports `workflow_dispatch`.

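The PR trigger rules can be approximated with plain prefix and suffix checks. This sketch assumes that GitHub's `**` matches Python files at any depth under `khiops/` and `tests/`, and that the workflow file is `.github/workflows/tests.yml`. `triggers_tests_workflow` is a hypothetical name. GitHub evaluates the real patterns itself.

```python
# Hand-written approximation of the tests.yml path filters described above
# (hypothetical helper, not the workflow's own logic).
WORKFLOW_FILE = ".github/workflows/tests.yml"


def triggers_tests_workflow(changed_files: list[str]) -> bool:
    """Return True if a PR touching `changed_files` would start tests.yml."""
    for path in changed_files:
        if path == WORKFLOW_FILE:
            return True
        # khiops/**/*.py and tests/**/*.py: Python files at any depth
        if path.endswith(".py") and path.startswith(("khiops/", "tests/")):
            return True
        # tests/resources/**, minus the tests/resources/**/*.md exclusion
        if path.startswith("tests/resources/") and not path.endswith(".md"):
            return True
    return False


print(triggers_tests_workflow(["tests/resources/README.md"]))
```

Under these assumptions, a PR that changes only Markdown under `tests/resources/` does not start the suite.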
The workflow has three job groups:

- **`run`** (Linux matrix): Runs across Python 3.10–3.14 in custom Docker
  containers (`ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04`). Each
  Python version uses a dedicated Conda environment with native Khiops.
  Coverage is collected with `coverage` and reported as XML. Test results use
  JUnit XML via `unittest-xml-reporting`.
- **`check-khiops-integration-on-linux`**: Runs integration tests on multiple
  Linux containers (ubuntu22.04, rocky8, rocky9, debian13). Validates Khiops
  status, runs the samples, tests major-version mismatch detection with a
  `py3_khiops10_conda` environment, and runs the integration test suite.
- **`check-khiops-integration-on-windows`**: Installs Khiops Desktop via its NSIS
  installer on Windows 2022 with Python 3.12. Runs integration tests and
  samples outside a Python virtual environment, then installs khiops-python
  inside a venv and validates the installation status.

**Expensive tests** (remote file access with S3/GCS/Azure): Skipped by default
on feature branches. Enabled on `main`/`main-v10` branches or via the
`run-expensive-tests` workflow dispatch input. These require GCP Workload
Identity Federation, a local fake S3 server, and Azure storage credentials.

**Environment variables**: `KHIOPS_SAMPLES_DIR` points to a checkout of
`khiopsml/khiops-samples`. `KHIOPS_PROC_NUMBER=4` forces MPI multi-process
execution. MPI oversubscribe flags are set for Open MPI 4.x and 5+.

### `pip.yml`

Builds an **sdist** package (no wheel) and tests it in Docker containers
(ubuntu22.04, rocky9, debian13). Triggers on:

- Tag pushes (any tag): the package is automatically published to GitHub Releases
- PRs touching `pyproject.toml`, `LICENSE.md`, or the workflow file
- `workflow_dispatch` with an optional `pypi-target` choice (`None`, `testpypi`,
  `pypi`)

Publishing to TestPyPI/PyPI uses OIDC Trusted Publishing and requires the
corresponding GitHub environment (`testpypi` or `pypi`). Only runs for the
`KhiopsML` org on tag pushes.

### `api-docs.yml`

Builds Sphinx documentation inside a dev Docker container. Triggers on:

- Tag pushes: builds docs and uploads a zip archive to GitHub Releases
- PRs touching `doc/**/*.rst`, `doc/create-doc`, `doc/clean-doc`, `doc/*.py`,
  `khiops/**/*.py`, or the workflow file
- `workflow_dispatch` with optional tutorial and samples revision inputs

Uses the `khiopspydev-ubuntu22.04` Docker image and runs
`./create-doc -t -d -g <revision>`. Uses a `pages` concurrency group that does
**not** cancel in-progress runs (to avoid interrupting production deployments).

### `dev-docker.yml`

Builds development Docker images for multiple OS targets (ubuntu22.04, rocky8,
rocky9, debian13) with configurable Khiops revision, server revision, Python
versions (3.10–3.14), and remote file driver versions (GCS, S3, Azure).
Triggers on PRs touching `packaging/docker/khiopspydev/Dockerfile.*` or the
workflow file, and on `workflow_dispatch`. Images are pushed to
`ghcr.io/khiopsml/khiops-python/khiopspydev-*` only when manually requested via
`push: true`. The `set-latest` flag only works on the `main` or `main-v10`
branches.

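The push rules can be summarized as a small planning function. The workflow's real tag scheme is not described here, so `revision_tag` is a hypothetical placeholder. The sketch also silently ignores `set-latest` on other branches, whereas the actual workflow might fail instead. Only the registry prefix, the OS targets, and the `main`/`main-v10` restriction come from the text above.

```python
# Hypothetical sketch of which image references a dev-docker.yml run would push.
OS_TARGETS = ("ubuntu22.04", "rocky8", "rocky9", "debian13")
REGISTRY_PREFIX = "ghcr.io/khiopsml/khiops-python/khiopspydev-"
LATEST_BRANCHES = frozenset({"main", "main-v10"})


def planned_pushes(
    push: bool, set_latest: bool, branch: str, revision_tag: str
) -> list[str]:
    """Return image references to push; `revision_tag` is a placeholder tag."""
    if not push:
        return []  # images are built but not pushed unless explicitly requested
    tags = [revision_tag]
    if set_latest and branch in LATEST_BRANCHES:
        tags.append("latest")  # set-latest is honored only on main / main-v10
    return [
        f"{REGISTRY_PREFIX}{os_target}:{tag}"
        for os_target in OS_TARGETS
        for tag in tags
    ]


for ref in planned_pushes(True, True, "main", "dev"):
    print(ref)
```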
### `test-conda-forge-package.yml`

Manual-only workflow that tests the released `khiops` Conda package on the
`conda-forge` channel across a broad matrix: Python 3.10–3.14 × multiple OS
environments (Ubuntu 20.04/22.04/24.04, Rocky 8/9, Windows 2022/2025, macOS
14/15/15-Intel). Tests both normal Conda environments and "Conda-based
environments" (where `CONDA_PREFIX` is unset to simulate non-Conda invocation).

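To gauge the size of the matrix, and to show what "unsetting `CONDA_PREFIX`" means for a child process, here is a sketch. It builds the naive Cartesian product of the listed Python versions and OS environments, which yields 5 × 10 = 50 pairs. The real workflow may exclude some combinations or split the two environment modes into separate jobs. The OS labels are descriptive, not runner names, and the helper names are hypothetical.

```python
import itertools

PYTHON_VERSIONS = ("3.10", "3.11", "3.12", "3.13", "3.14")
# OS environments as listed above (descriptive labels, not runner names).
OS_ENVIRONMENTS = (
    "Ubuntu 20.04", "Ubuntu 22.04", "Ubuntu 24.04",
    "Rocky 8", "Rocky 9",
    "Windows 2022", "Windows 2025",
    "macOS 14", "macOS 15", "macOS 15-Intel",
)


def full_matrix() -> list[tuple[str, str]]:
    """Naive Cartesian product; the real workflow may exclude some pairs."""
    return list(itertools.product(PYTHON_VERSIONS, OS_ENVIRONMENTS))


def conda_based_environment(env: dict[str, str]) -> dict[str, str]:
    """Copy `env` without CONDA_PREFIX to simulate a non-Conda invocation."""
    return {key: value for key, value in env.items() if key != "CONDA_PREFIX"}


print(len(full_matrix()))  # 50
```

The returned mapping could be passed as the `env` argument of `subprocess.run` to launch a check in the "Conda-based" mode.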
## Editing Rules

- Workflow YAML files are validated by pre-commit hooks
  (`check-github-workflows`, `check-github-actions`) and formatted by `yamlfix`.
- The dev Docker images are the test environment for both `tests.yml` and
  `pip.yml`. If you need new system dependencies in CI, they go into the
  Dockerfiles under `packaging/docker/khiopspydev/`.
- Test dependencies are in `test-requirements.txt` (`coverage`, `wrapt`).
  Package dependencies are extracted from `pyproject.toml` at CI time via
  `scripts/extract_dependencies_from_pyproject_toml.py`.
