refactor: enhance module structure and update documentation
- Updated `tach.toml` to define new module boundaries for `yt_framework.yt` and `yt_framework.yt.clients`, ensuring stricter import rules.
- Revised `CONTRIBUTING.md` to clarify import rules and link to updated architecture documentation.
- Adjusted `pyproject.toml` to include new linting rules and updated ignored complexity checks.
- Enhanced documentation in `layers.md` to reflect the new architecture and import direction.
- Updated various examples and tests to align with the new module structure and import paths for YQL requests.
- Introduced new tests in `test_architecture_boundaries.py` to validate import restrictions and ensure compliance with the updated architecture.
CONTRIBUTING.md
+1 -1 (1 addition & 1 deletion)
@@ -178,7 +178,7 @@ This helps ensure you haven't broken existing functionality.

 ### Module boundaries (Tach)

-[Tach](https://github.com/tach-org/tach) enforces which subpackages under `yt_framework` and `ytjobs` may import each other. [tach.toml](tach.toml) lists every module with explicit `depends_on`, layer ordering, `layers_explicit_depends_on`, unused-edge detection (`exact`), and no circular first-party cycles. Anything under `tests/`, `examples/`, `docs/`, and `tools/` is excluded from that graph. Layer narrative: [docs/architecture/layers.md](docs/architecture/layers.md).
+[Tach](https://github.com/tach-org/tach) enforces which subpackages under `yt_framework` and `ytjobs` may import each other. [tach.toml](tach.toml) lists every module with explicit `depends_on`, layer ordering, `layers_explicit_depends_on`, unused-edge detection (`exact`), and no circular first-party cycles. Anything under `tests/`, `examples/`, `docs/`, and `tools/` is excluded from that graph. Layer narrative: [docs/architecture/layers.md](docs/architecture/layers.md). A few import rules are duplicated in [tests/test_architecture_boundaries.py](tests/test_architecture_boundaries.py) so CI output names the contract directly (for example, `yt_framework.operations` may only reach YT through `yt_framework.yt.clients`).

 If your change adds or removes imports across those boundaries, update `tach.toml` in the same branch. Run `tach check` after substantive edits; if the graph drifted, run `tach sync` and then trim redundant `depends_on` entries so `exact` stays satisfied. Run `tach check-external` when you touch third-party imports so they stay aligned with `pyproject.toml`.
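For illustration, a grep-style import rule of the kind mentioned above could be sketched with the standard `ast` module. This is only a sketch under stated assumptions, not the contents of `tests/test_architecture_boundaries.py`: the helper names `imported_modules` and `violations` and the `SAMPLE` source string are hypothetical.

```python
import ast

# Assumed rule from the text: operations may reach YT only through this package.
ALLOWED_YT_PREFIX = "yt_framework.yt.clients"


def imported_modules(source: str) -> list:
    """Return every absolute module name imported by a piece of Python source."""
    names = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.append(node.module)
    return names


def _is_within(module: str, package: str) -> bool:
    """True if `module` is `package` itself or one of its submodules."""
    return module == package or module.startswith(package + ".")


def violations(source: str) -> list:
    """List imports of yt_framework.yt that bypass yt_framework.yt.clients."""
    return [
        name
        for name in imported_modules(source)
        if _is_within(name, "yt_framework.yt")
        and not _is_within(name, ALLOWED_YT_PREFIX)
    ]


# Hypothetical module source standing in for a file under yt_framework/operations.
SAMPLE = """
from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
from yt_framework.yt.support import operation_secure_env
import yt_framework.utils
"""

print(violations(SAMPLE))  # ['yt_framework.yt.support']
```

A real test would presumably walk the files of the package rather than a literal string. Parsing the source instead of importing it keeps the check independent of whether the YT dependencies are installed.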
 The examples above chain **`run_map`** and **`run_vanilla`**. The same **sequential** pattern applies to other entry points:

-- **YQL**: call `join_tables_request`, `filter_table_request`, and related helpers on `self.deps.yt_client` with request objects from `yt_framework.yt.clients.yql_requests`—see [YQL operations](../operations/yql.md).
+- **YQL**: call `join_tables_request`, `filter_table_request`, and related helpers on `self.deps.yt_client` with request objects from `yt_framework.yt.clients.yql.yql_requests`—see [YQL operations](../operations/yql.md).
 - **Map-reduce / reduce**: use `run_map_reduce` or `run_reduce` from `yt_framework.operations.command_ops.map_reduce` with `self.context` and `self.config.client.operations.*`—see [TypedJob map-reduce](../operations/map-reduce-typed-jobs.md) and [Command mode](../operations/command-mode-map-reduce.md).
 - **Sort**: use `run_sort` from `yt_framework.operations.command_ops.sort`—see [Sort operations](../operations/sort.md).
docs/architecture/layers.md
+6 -4 (6 additions & 4 deletions)
@@ -7,9 +7,11 @@ The library is split so **pipeline code** sits at the top, **operation drivers**
 Roughly:

 1. **Foundation** (`yt_framework.utils`, `yt_framework.job_command`, `yt_framework.typed_jobs`, and the empty `yt_framework` namespace) — must not import `core`, `operations`, or `yt`.
-2. **`yt_framework.yt`** — factory, `yt_framework.yt.clients` (ports and specs), dev/prod clients, mixins, and runtime helpers. Operation drivers should import **ports and specs** from `yt_framework.yt.clients.*` instead of reaching into mixins when possible.
-3. **`yt_framework.operations`** — map/vanilla/map-reduce drivers, upload, S3 helpers. Depends on `yt_framework.yt` (and `job_command`, `utils`, `ytjobs.s3` per `tach.toml`). Must **not** import `yt_framework.core` (enforced in tests; Tach uses `ignore_type_checking_imports = false` so type-only imports count too).
+2. **`yt_framework.yt`** — factory and package `__init__` only depend on **`yt_framework.yt.clients`**.
+3. **`yt_framework.yt.support`** — max row weight, dev simulator, prod/dev runtime helpers, secure-env splitting, and shared `OperationResources` dataclass. Depends only on `yt_framework` and `ytjobs` (plus third-party libs). Nothing here may import `yt_framework.yt.clients` or the pipeline layers above YT.
+4. **`yt_framework.yt.clients`** — `BaseYTClient`, dev/prod clients, YQL request types under `clients.yql`, mixins under `clients._client_split`, and the public operation specs. Depends on `support`, `job_command`, and `utils`.
+5. **`yt_framework.operations`** — map/vanilla/map-reduce drivers, upload, S3 helpers. Declares **`yt_framework.yt.clients`** only (not `yt.support` or `yt.factory`). Must **not** import `yt_framework.core` (also covered by `tests/test_architecture_boundaries.py`). Type-only imports still count toward Tach (`ignore_type_checking_imports = false`).
+6. **`yt_framework.core`** — `BasePipeline`, stage discovery/registry, `BaseStage`, concrete `PipelineStageDependencies`. Imports `operations`, `utils`, `yt` (factory entry), and `yt.clients` for types used by the pipeline.

 `StageDependencies`, `StageContext`, and related injection types live in `yt_framework.operations.stage_contracts` so operation helpers do not import `core` just for those types.
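The import direction in the new layer list can be pictured as a small dependency graph with no cycles. The sketch below is a simplified reading of that list, not a copy of `tach.toml`: the `DEPENDS_ON` edges omit `ytjobs`, `typed_jobs`, and third-party packages, and `find_cycle` is a hypothetical helper rather than anything Tach exposes.

```python
from typing import Dict, List, Optional, Set

# Simplified first-party edges read off the layer list (assumption, not tach.toml).
DEPENDS_ON: Dict[str, Set[str]] = {
    "yt_framework.utils": set(),
    "yt_framework.job_command": set(),
    "yt_framework.yt.support": set(),
    "yt_framework.yt.clients": {
        "yt_framework.yt.support",
        "yt_framework.job_command",
        "yt_framework.utils",
    },
    "yt_framework.yt": {"yt_framework.yt.clients"},
    "yt_framework.operations": {"yt_framework.yt.clients"},
    "yt_framework.core": {
        "yt_framework.operations",
        "yt_framework.utils",
        "yt_framework.yt",
        "yt_framework.yt.clients",
    },
}


def find_cycle(graph: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of module names, or None if acyclic."""
    path: List[str] = []    # modules on the current depth-first path
    done: Set[str] = set()  # modules whose dependencies are fully explored

    def visit(node: str) -> Optional[List[str]]:
        if node in done:
            return None
        if node in path:
            # Reaching a module already on the path closes a cycle.
            return path[path.index(node):] + [node]
        path.append(node)
        for dep in sorted(graph.get(node, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        done.add(node)
        return None

    for start in sorted(graph):
        cycle = visit(start)
        if cycle:
            return cycle
    return None


print(find_cycle(DEPENDS_ON))  # None
```

Adding an edge from `yt_framework.yt.support` back to `yt_framework.yt.clients` would make `find_cycle` report a loop, which is the kind of regression the rule "nothing here may import `yt_framework.yt.clients`" guards against.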
@@ -19,4 +21,4 @@ Roughly:

 ## Checks beyond Tach

-Pre-commit also runs strict BasedPyright, Ruff, Xenon, Vulture, and small repo policies (file length, directory width, binding-word limits) described in `CONTRIBUTING.md` at the repository root.
+Pre-commit also runs strict BasedPyright, Ruff, Xenon, Vulture, and small repo policies (file length, directory width, binding-word limits) described in `CONTRIBUTING.md` at the repository root. `tests/test_architecture_boundaries.py` adds a few grep-style rules (for example: `operations` imports YT only via `yt_framework.yt.clients`; `yt` must not import `core` or `operations`).
 A short **allowlist** keeps clearly non-secret keys in plain `environment` (for example `YT_STAGE_NAME` and tokenizer artifact path variables). To expose additional non-secret keys in the UI, set `environment_public_keys` on that operation’s config. The insecure rollback is `use_plain_environment_for_secrets: true` (not recommended).

-**TypedJob** legs do not get that automatic shim: call `promote_secure_vault_environment()` from `yt_framework.yt.operation_secure_env` at the start of your job, or use a string command.
+**TypedJob** legs do not get that automatic shim: call `promote_secure_vault_environment()` from `yt_framework.yt.support.operation_secure_env` at the start of your job, or use a string command.

 Do not put secrets in command lines; upstream discussions ([ytsaurus#780](https://github.com/ytsaurus/ytsaurus/issues/780), [ytsaurus#990](https://github.com/ytsaurus/ytsaurus/issues/990)) note that commands are another surface that may leak values in the UI.
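To make the "promotion" idea concrete, here is a minimal sketch of what such a step might do, assuming it copies each `YT_SECURE_VAULT_<NAME>` variable to `<NAME>` without overwriting names that already exist. The function `promote_vault_vars` and the sample keys are hypothetical; the real `promote_secure_vault_environment()` may behave differently.

```python
VAULT_PREFIX = "YT_SECURE_VAULT_"


def promote_vault_vars(env: dict) -> dict:
    """Return a copy of `env` where every YT_SECURE_VAULT_<NAME> is also set as <NAME>.

    Assumption: names already present in `env` win over promoted values.
    """
    promoted = dict(env)
    for key, value in env.items():
        if key.startswith(VAULT_PREFIX):
            plain_name = key[len(VAULT_PREFIX):]
            if plain_name:  # skip a bare prefix with no key name
                promoted.setdefault(plain_name, value)
    return promoted


# Hypothetical job environment; the secret name is made up for the example.
job_env = {"YT_SECURE_VAULT_API_TOKEN": "s3cr3t", "YT_STAGE_NAME": "train"}
print(promote_vault_vars(job_env)["API_TOKEN"])  # s3cr3t
```

Working on a plain dict rather than `os.environ` keeps the sketch side-effect free; a job would presumably apply the same mapping to its process environment before reading any secrets.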
docs/operations/yql.md
+14 -14 (14 additions & 14 deletions)
@@ -28,7 +28,7 @@ YQL expresses set logic declaratively. Map runs your Python on each row stream.

 ## Request API

-Each helper is a `*_request` method on the YT client. It takes a single frozen dataclass from `yt_framework.yt.clients.yql_requests` (for example `JoinTablesRequest`). The same types are what `yt_framework.yt.yql_builder` uses to build SQL strings.
+Each helper is a `*_request` method on the YT client. It takes a single frozen dataclass from `yt_framework.yt.clients.yql.yql_requests` (for example `JoinTablesRequest`). The same types are what `yt_framework.yt.clients.yql.yql_builder` uses to build SQL strings.

 For filter, union, sort, and limit, you can leave `columns` unset on the request; the client fills them from the input table schema when needed.
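The pattern described in this hunk (one frozen request object per call, with `columns` optionally filled from the schema) can be sketched as follows. Everything here is hypothetical: `FilterRequestSketch`, its fields, and `build_filter_sql` are illustrative stand-ins, and the generated SQL is not claimed to match what `yql_builder` emits.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FilterRequestSketch:
    """Hypothetical request type; the real dataclasses may have other fields."""

    input_table: str
    output_table: str
    condition: str
    columns: Optional[Tuple[str, ...]] = None  # None means "use the input schema"


def build_filter_sql(request: FilterRequestSketch, schema_columns: Sequence[str]) -> str:
    """Build an illustrative query, falling back to the schema when columns are unset."""
    columns = request.columns if request.columns is not None else tuple(schema_columns)
    column_list = ", ".join(f"`{name}`" for name in columns)
    return (
        f"INSERT INTO `{request.output_table}` WITH TRUNCATE\n"
        f"SELECT {column_list}\n"
        f"FROM `{request.input_table}`\n"
        f"WHERE {request.condition};"
    )


request = FilterRequestSketch(
    input_table="//tmp/in",
    output_table="//tmp/out",
    condition="score > 0.5",
)
print(build_filter_sql(request, schema_columns=["id", "score"]))
```

Freezing the dataclass means a request cannot be mutated between construction and SQL generation, which is one plausible reason the same types can be shared by the client helpers and the builder.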
@@ -45,7 +45,7 @@ by default.
 Override per call when needed (must not exceed `128M`; larger values raise `ValueError`):

 ```python
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest

 self.deps.yt_client.join_tables_request(
     JoinTablesRequest(
@@ -76,7 +76,7 @@ If the SQL already contains `PRAGMA yt.MaxRowWeight`, that value is checked too;
 Join two tables on a common column.

 ```python
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 - **Packaging** — upload helpers shared by operations
 - **`ytjobs`** — [Job-side reference](ytjobs.md)
@@ -35,6 +35,24 @@ How-to guides under `docs/operations`, `docs/configuration`, and `docs/advanced`
    :show-inheritance:
 ```

+### Pipeline config helpers
+
+```{eval-rst}
+.. automodule:: yt_framework.core.pipeline_config
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
+### Pipeline CLI helpers
+
+```{eval-rst}
+.. automodule:: yt_framework.core.pipeline_cli
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
 ### Stage

 ```{eval-rst}
@@ -112,7 +130,7 @@ How-to guides under `docs/operations`, `docs/configuration`, and `docs/advanced`

 ### YQL Operations

-YQL operations are ``*_request`` methods on the YT client, each taking a frozen request type from ``yt_framework.yt.clients.yql_requests`` (for example ``JoinTablesRequest``). See :doc:`../operations/yql`.
+YQL operations are ``*_request`` methods on the YT client, each taking a frozen request type from ``yt_framework.yt.clients.yql.yql_requests`` (for example ``JoinTablesRequest``). See :doc:`../operations/yql`.
docs/reference/environment-variables.md
+1 -1 (1 addition & 1 deletion)
@@ -9,7 +9,7 @@ In **prod** mode, `YTProdClient` splits the merged operation env:
 - **Plain `environment`** (visible in the YT UI for operations you can read): a small built-in set (`YT_STAGE_NAME`, `YT_ALLOW_HTTP_REQUESTS_TO_YT_FROM_JOB`, tokenizer artifact keys) plus any names listed under `environment_public_keys` for that operation.
 - **`secure_vault`**: everything else from the merged env, plus `docker_auth` when `docker_image` is set, merged with any `secure_vault` you pass through operation kwargs.

-Jobs still receive vaulted keys under their normal names in the process environment when you use **string** mapper/vanilla/reducer commands (the client adds a stdlib promotion step). **TypedJob** code should call `promote_secure_vault_environment()` from `yt_framework.yt.operation_secure_env` early, or rely on `YT_SECURE_VAULT_*` names.
+Jobs still receive vaulted keys under their normal names in the process environment when you use **string** mapper/vanilla/reducer commands (the client adds a stdlib promotion step). **TypedJob** code should call `promote_secure_vault_environment()` from `yt_framework.yt.support.operation_secure_env` early, or rely on `YT_SECURE_VAULT_*` names.

 **Dev mode** does not split: the subprocess gets the full dict, since there is no YT spec UI.
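The prod-mode split described in this hunk can be approximated in a few lines. This is a sketch under assumptions, not `YTProdClient`'s code: `split_environment` is a hypothetical name, the built-in set lists only the two keys named in the text (the tokenizer artifact keys are not spelled out there), and `docker_auth` handling is left out.

```python
# Only the built-in public keys that the text names explicitly.
BUILTIN_PUBLIC_KEYS = frozenset(
    {"YT_STAGE_NAME", "YT_ALLOW_HTTP_REQUESTS_TO_YT_FROM_JOB"}
)


def split_environment(merged: dict, environment_public_keys=()) -> tuple:
    """Split a merged env dict into (plain environment, secure_vault).

    Keys on the allowlist stay visible; everything else goes to the vault.
    """
    allowed = BUILTIN_PUBLIC_KEYS | set(environment_public_keys)
    plain = {key: value for key, value in merged.items() if key in allowed}
    vault = {key: value for key, value in merged.items() if key not in allowed}
    return plain, vault


# Hypothetical merged operation env; key names other than YT_STAGE_NAME are made up.
merged_env = {
    "YT_STAGE_NAME": "train",
    "LOG_LEVEL": "INFO",
    "API_TOKEN": "s3cr3t",
}
plain, vault = split_environment(merged_env, environment_public_keys=["LOG_LEVEL"])
print(sorted(plain))  # ['LOG_LEVEL', 'YT_STAGE_NAME']
print(sorted(vault))  # ['API_TOKEN']
```

Defaulting unknown keys to the vault means a newly added secret is hidden unless someone explicitly lists it as public, which matches the "everything else" wording above.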