Commit e1582d0

refactor: enhance module structure and update documentation
- Updated `tach.toml` to define new module boundaries for `yt_framework.yt` and `yt_framework.yt.clients`, ensuring stricter import rules.
- Revised `CONTRIBUTING.md` to clarify import rules and link to updated architecture documentation.
- Adjusted `pyproject.toml` to include new linting rules and updated ignored complexity checks.
- Enhanced documentation in `layers.md` to reflect the new architecture and import direction.
- Updated various examples and tests to align with the new module structure and import paths for YQL requests.
- Introduced new tests in `test_architecture_boundaries.py` to validate import restrictions and ensure compliance with the updated architecture.
1 parent e02f9d4 commit e1582d0

49 files changed

Lines changed: 602 additions & 369 deletions

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -178,7 +178,7 @@ This helps ensure you haven't broken existing functionality.
 
 ### Module boundaries (Tach)
 
-[Tach](https://github.com/tach-org/tach) enforces which subpackages under `yt_framework` and `ytjobs` may import each other. [tach.toml](tach.toml) lists every module with explicit `depends_on`, layer ordering, `layers_explicit_depends_on`, unused-edge detection (`exact`), and no circular first-party cycles. Anything under `tests/`, `examples/`, `docs/`, and `tools/` is excluded from that graph. Layer narrative: [docs/architecture/layers.md](docs/architecture/layers.md).
+[Tach](https://github.com/tach-org/tach) enforces which subpackages under `yt_framework` and `ytjobs` may import each other. [tach.toml](tach.toml) lists every module with explicit `depends_on`, layer ordering, `layers_explicit_depends_on`, unused-edge detection (`exact`), and no circular first-party cycles. Anything under `tests/`, `examples/`, `docs/`, and `tools/` is excluded from that graph. Layer narrative: [docs/architecture/layers.md](docs/architecture/layers.md). A few import rules are duplicated in [tests/test_architecture_boundaries.py](tests/test_architecture_boundaries.py) so CI output names the contract directly (for example, `yt_framework.operations` may only reach YT through `yt_framework.yt.clients`).
 
 If your change adds or removes imports across those boundaries, update `tach.toml` in the same branch. Run `tach check` after substantive edits; if the graph drifted, run `tach sync` and then trim redundant `depends_on` entries so `exact` stays satisfied. Run `tach check-external` when you touch third-party imports so they stay aligned with `pyproject.toml`.
 
```
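The CONTRIBUTING.md change says some rules are duplicated as grep-style tests. One such rule, "`yt_framework.operations` may only reach YT through `yt_framework.yt.clients`", can be sketched with the standard `ast` module. This is a minimal sketch, not the contents of `tests/test_architecture_boundaries.py`: the function name `find_yt_imports_outside_clients` is hypothetical, and relative imports are deliberately not resolved.

```python
import ast

CLIENTS_PACKAGE = "yt_framework.yt.clients"


def _within(name: str, package: str) -> bool:
    """True if `name` is `package` itself or one of its submodules."""
    return name == package or name.startswith(package + ".")


def find_yt_imports_outside_clients(source: str) -> list[str]:
    """List YT modules that `source` imports without going through the clients package.

    Hypothetical helper; relative imports (``from ..yt import x``) are ignored.
    """
    imported: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            imported.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if node.module == "yt_framework.yt":
                # `from yt_framework.yt import clients` names a submodule,
                # so qualify each imported name before checking it.
                imported.extend(f"{node.module}.{alias.name}" for alias in node.names)
            else:
                imported.append(node.module)
    return [
        name
        for name in imported
        if _within(name, "yt_framework.yt") and not _within(name, CLIENTS_PACKAGE)
    ]


ok_source = "from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest\n"
bad_source = "from yt_framework.yt.support.operation_secure_env import promote_secure_vault_environment\n"
print(find_yt_imports_outside_clients(ok_source))
print(find_yt_imports_outside_clients(bad_source))
```

A test built on this would run it over every file under `yt_framework/operations/` and assert that each result is empty.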

docs/advanced/multiple-operations.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -402,7 +402,7 @@ self.logger.info("Validate operation completed")
 
 The examples above chain **`run_map`** and **`run_vanilla`**. The same **sequential** pattern applies to other entry points:
 
-- **YQL**: call `join_tables_request`, `filter_table_request`, and related helpers on `self.deps.yt_client` with request objects from `yt_framework.yt.clients.yql_requests`—see [YQL operations](../operations/yql.md).
+- **YQL**: call `join_tables_request`, `filter_table_request`, and related helpers on `self.deps.yt_client` with request objects from `yt_framework.yt.clients.yql.yql_requests`—see [YQL operations](../operations/yql.md).
 - **Map-reduce / reduce**: use `run_map_reduce` or `run_reduce` from `yt_framework.operations.command_ops.map_reduce` with `self.context` and `self.config.client.operations.*`—see [TypedJob map-reduce](../operations/map-reduce-typed-jobs.md) and [Command mode](../operations/command-mode-map-reduce.md).
 - **Sort**: use `run_sort` from `yt_framework.operations.command_ops.sort`—see [Sort operations](../operations/sort.md).
 
```

docs/architecture/layers.md

Lines changed: 6 additions & 4 deletions
```diff
@@ -7,9 +7,11 @@ The library is split so **pipeline code** sits at the top, **operation drivers**
 Roughly:
 
 1. **Foundation** (`yt_framework.utils`, `yt_framework.job_command`, `yt_framework.typed_jobs`, and the empty `yt_framework` namespace) — must not import `core`, `operations`, or `yt`.
-2. **`yt_framework.yt`** — factory, `yt_framework.yt.clients` (ports and specs), dev/prod clients, mixins, and runtime helpers. Operation drivers should import **ports and specs** from `yt_framework.yt.clients.*` instead of reaching into mixins when possible.
-3. **`yt_framework.operations`** — map/vanilla/map-reduce drivers, upload, S3 helpers. Depends on `yt_framework.yt` (and `job_command`, `utils`, `ytjobs.s3` per `tach.toml`). Must **not** import `yt_framework.core` (enforced in tests; Tach uses `ignore_type_checking_imports = false` so type-only imports count too).
-4. **`yt_framework.core`**`BasePipeline`, stage discovery/registry, `BaseStage`, concrete `PipelineStageDependencies`. Imports `operations`, `utils`, and `yt`.
+2. **`yt_framework.yt`** — factory and package `__init__` only depend on **`yt_framework.yt.clients`**.
+3. **`yt_framework.yt.support`** — max row weight, dev simulator, prod/dev runtime helpers, secure-env splitting, and shared `OperationResources` dataclass. Depends only on `yt_framework` and `ytjobs` (plus third-party libs). Nothing here may import `yt_framework.yt.clients` or the pipeline layers above YT.
+4. **`yt_framework.yt.clients`**`BaseYTClient`, dev/prod clients, YQL request types under `clients.yql`, mixins under `clients._client_split`, and the public operation specs. Depends on `support`, `job_command`, and `utils`.
+5. **`yt_framework.operations`** — map/vanilla/map-reduce drivers, upload, S3 helpers. Declares **`yt_framework.yt.clients`** only (not `yt.support` or `yt.factory`). Must **not** import `yt_framework.core` (also covered by `tests/test_architecture_boundaries.py`). Type-only imports still count toward Tach (`ignore_type_checking_imports = false`).
+6. **`yt_framework.core`**`BasePipeline`, stage discovery/registry, `BaseStage`, concrete `PipelineStageDependencies`. Imports `operations`, `utils`, `yt` (factory entry), and `yt.clients` for types used by the pipeline.
 
 `StageDependencies`, `StageContext`, and related injection types live in `yt_framework.operations.stage_contracts` so operation helpers do not import `core` just for those types.
 
@@ -19,4 +21,4 @@ Roughly:
 
 ## Checks beyond Tach
 
-Pre-commit also runs strict BasedPyright, Ruff, Xenon, Vulture, and small repo policies (file length, directory width, binding-word limits) described in `CONTRIBUTING.md` at the repository root.
+Pre-commit also runs strict BasedPyright, Ruff, Xenon, Vulture, and small repo policies (file length, directory width, binding-word limits) described in `CONTRIBUTING.md` at the repository root. `tests/test_architecture_boundaries.py` adds a few grep-style rules (for example: `operations` imports YT only via `yt_framework.yt.clients`; `yt` must not import `core` or `operations`).
```
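The new layer list in `layers.md` reads as a table of allowed first-party edges. A minimal sketch that encodes only the edges that list names and resolves each module to its layer by longest matching package prefix. It is a hand-written approximation, not a parser for `tach.toml`: the foundation packages are grouped into one layer, and the `ytjobs` edges for `foundation` and `operations` are assumptions (the list only says what foundation must not import, and the new `operations` entry names `yt.clients` without restating `job_command`, `utils`, or `ytjobs.s3`).

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Package prefixes per layer, copied by hand from the layer list (illustrative only).
LAYERS: Dict[str, Tuple[str, ...]] = {
    "foundation": ("yt_framework.utils", "yt_framework.job_command", "yt_framework.typed_jobs"),
    "yt": ("yt_framework.yt",),  # factory and package __init__
    "yt.support": ("yt_framework.yt.support",),
    "yt.clients": ("yt_framework.yt.clients",),
    "operations": ("yt_framework.operations",),
    "core": ("yt_framework.core",),
    "ytjobs": ("ytjobs",),
}

# Layers each layer may import; imports inside the same layer are always allowed.
ALLOWED: Dict[str, FrozenSet[str]] = {
    "foundation": frozenset({"ytjobs"}),  # assumption; only core/operations/yt are forbidden
    "yt": frozenset({"yt.clients"}),
    "yt.support": frozenset({"foundation", "ytjobs"}),
    "yt.clients": frozenset({"yt.support", "foundation"}),  # support, job_command, utils
    "operations": frozenset({"yt.clients", "foundation", "ytjobs"}),  # assumption beyond yt.clients
    "core": frozenset({"operations", "foundation", "yt", "yt.clients"}),
    "ytjobs": frozenset(),  # not described in layers.md
}


def layer_of(module: str) -> Optional[str]:
    """Resolve a module to its layer using the longest matching package prefix."""
    best: Optional[str] = None
    best_len = -1
    for layer, prefixes in LAYERS.items():
        for prefix in prefixes:
            matches = module == prefix or module.startswith(prefix + ".")
            if matches and len(prefix) > best_len:
                best, best_len = layer, len(prefix)
    return best


def edge_allowed(importer: str, imported: str) -> bool:
    """True if `importer` may import `imported` under the table above."""
    src, dst = layer_of(importer), layer_of(imported)
    if src is None or dst is None:
        return True  # third-party or unmapped modules are out of scope for this sketch
    return src == dst or dst in ALLOWED[src]


print(edge_allowed("yt_framework.operations.stage_contracts", "yt_framework.yt.clients.yql.yql_requests"))
print(edge_allowed("yt_framework.yt.support.operation_secure_env", "yt_framework.yt.clients"))
```

Longest-prefix matching matters because `yt_framework.yt` is a prefix of both `yt.support` and `yt.clients`; without it every YT module would land in the `yt` layer.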

docs/configuration/secrets.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -49,7 +49,7 @@ def run(self, debug: DebugContext) -> DebugContext:
 
 A short **allowlist** keeps clearly non-secret keys in plain `environment` (for example `YT_STAGE_NAME` and tokenizer artifact path variables). To expose additional non-secret keys in the UI, set `environment_public_keys` on that operation’s config. The insecure rollback is `use_plain_environment_for_secrets: true` (not recommended).
 
-**TypedJob** legs do not get that automatic shim: call `promote_secure_vault_environment()` from `yt_framework.yt.operation_secure_env` at the start of your job, or use a string command.
+**TypedJob** legs do not get that automatic shim: call `promote_secure_vault_environment()` from `yt_framework.yt.support.operation_secure_env` at the start of your job, or use a string command.
 
 Do not put secrets in command lines; upstream discussions ([ytsaurus#780](https://github.com/ytsaurus/ytsaurus/issues/780), [ytsaurus#990](https://github.com/ytsaurus/ytsaurus/issues/990)) note that commands are another surface that may leak values in the UI.
 
```
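The promotion step for TypedJob legs can be pictured as copying each `YT_SECURE_VAULT_<NAME>` variable back to `<NAME>`. A minimal sketch under two assumptions: vaulted values arrive under the `YT_SECURE_VAULT_` prefix (the prefix the environment-variables reference names), and a plain `<NAME>` that is already set wins. `promote_vault_vars` is a hypothetical stand-in; it does not describe what `promote_secure_vault_environment()` actually does.

```python
from typing import List, MutableMapping

VAULT_PREFIX = "YT_SECURE_VAULT_"


def promote_vault_vars(environ: MutableMapping[str, str]) -> List[str]:
    """Copy YT_SECURE_VAULT_<NAME> to <NAME> and return the promoted names.

    Hypothetical helper. Assumption: an existing plain <NAME> is left untouched.
    """
    promoted: List[str] = []
    for key in list(environ):  # snapshot the keys because the loop adds entries
        if not key.startswith(VAULT_PREFIX):
            continue
        name = key[len(VAULT_PREFIX):]
        if name and name not in environ:
            environ[name] = environ[key]
            promoted.append(name)
    return promoted


# Inside a TypedJob this would run first, typically against os.environ.
job_env = {"YT_SECURE_VAULT_S3_SECRET_KEY": "s3cr3t", "YT_STAGE_NAME": "join"}
print(promote_vault_vars(job_env))
```

Passing any `MutableMapping` (rather than reading `os.environ` directly) keeps the sketch testable with plain dicts.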

docs/operations/yql.md

Lines changed: 14 additions & 14 deletions
````diff
@@ -28,7 +28,7 @@ YQL expresses set logic declaratively. Map runs your Python on each row stream.
 
 ## Request API
 
-Each helper is a `*_request` method on the YT client. It takes a single frozen dataclass from `yt_framework.yt.clients.yql_requests` (for example `JoinTablesRequest`). The same types are what `yt_framework.yt.yql_builder` uses to build SQL strings.
+Each helper is a `*_request` method on the YT client. It takes a single frozen dataclass from `yt_framework.yt.clients.yql.yql_requests` (for example `JoinTablesRequest`). The same types are what `yt_framework.yt.clients.yql.yql_builder` uses to build SQL strings.
 
 For filter, union, sort, and limit, you can leave `columns` unset on the request; the client fills them from the input table schema when needed.
 
@@ -45,7 +45,7 @@ by default.
 Override per call when needed (must not exceed `128M`; larger values raise `ValueError`):
 
 ```python
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 
 self.deps.yt_client.join_tables_request(
     JoinTablesRequest(
@@ -76,7 +76,7 @@ If the SQL already contains `PRAGMA yt.MaxRowWeight`, that value is checked too;
 Join two tables on a common column.
 
 ```python
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 
 self.deps.yt_client.join_tables_request(
     JoinTablesRequest(
@@ -113,7 +113,7 @@ self.deps.yt_client.join_tables_request(
 Filter rows based on a condition.
 
 ```python
-from yt_framework.yt.clients.yql_requests import FilterTableRequest
+from yt_framework.yt.clients.yql.yql_requests import FilterTableRequest
 
 self.deps.yt_client.filter_table_request(
     FilterTableRequest(
@@ -136,7 +136,7 @@ self.deps.yt_client.filter_table_request(
 Select specific columns from a table.
 
 ```python
-from yt_framework.yt.clients.yql_requests import SelectColumnsRequest
+from yt_framework.yt.clients.yql.yql_requests import SelectColumnsRequest
 
 self.deps.yt_client.select_columns_request(
     SelectColumnsRequest(
@@ -158,7 +158,7 @@ self.deps.yt_client.select_columns_request(
 Group rows and compute aggregations.
 
 ```python
-from yt_framework.yt.clients.yql_requests import GroupByAggregateRequest
+from yt_framework.yt.clients.yql.yql_requests import GroupByAggregateRequest
 
 self.deps.yt_client.group_by_aggregate_request(
     GroupByAggregateRequest(
@@ -196,7 +196,7 @@ self.deps.yt_client.group_by_aggregate_request(
 Combine multiple tables into one.
 
 ```python
-from yt_framework.yt.clients.yql_requests import UnionTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import UnionTablesRequest
 
 self.deps.yt_client.union_tables_request(
     UnionTablesRequest(
@@ -222,7 +222,7 @@ self.deps.yt_client.union_tables_request(
 Get distinct values from columns.
 
 ```python
-from yt_framework.yt.clients.yql_requests import DistinctRequest
+from yt_framework.yt.clients.yql.yql_requests import DistinctRequest
 
 self.deps.yt_client.distinct_request(
     DistinctRequest(
@@ -244,7 +244,7 @@ self.deps.yt_client.distinct_request(
 Sort table by one or more columns.
 
 ```python
-from yt_framework.yt.clients.yql_requests import SortTableRequest
+from yt_framework.yt.clients.yql.yql_requests import SortTableRequest
 
 self.deps.yt_client.sort_table_request(
     SortTableRequest(
@@ -269,7 +269,7 @@ self.deps.yt_client.sort_table_request(
 Limit the number of rows in a table.
 
 ```python
-from yt_framework.yt.clients.yql_requests import LimitTableRequest
+from yt_framework.yt.clients.yql.yql_requests import LimitTableRequest
 
 self.deps.yt_client.limit_table_request(
     LimitTableRequest(
@@ -292,7 +292,7 @@ self.deps.yt_client.limit_table_request(
 All YQL operations support dry run mode to preview queries before execution:
 
 ```python
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 
 # Preview query without executing
 query = self.deps.yt_client.join_tables_request(
@@ -377,7 +377,7 @@ In dev mode, YQL operations are simulated using DuckDB:
 ### Multi-Table Join
 
 ```python
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 
 joined1 = self.deps.yt_client.join_tables_request(
     JoinTablesRequest(
@@ -401,7 +401,7 @@ joined2 = self.deps.yt_client.join_tables_request(
 ### Filtered Aggregation
 
 ```python
-from yt_framework.yt.clients.yql_requests import (
+from yt_framework.yt.clients.yql.yql_requests import (
     FilterTableRequest,
     GroupByAggregateRequest,
 )
@@ -427,7 +427,7 @@ aggregated = self.deps.yt_client.group_by_aggregate_request(
 ### Top N Results
 
 ```python
-from yt_framework.yt.clients.yql_requests import LimitTableRequest, SortTableRequest
+from yt_framework.yt.clients.yql.yql_requests import LimitTableRequest, SortTableRequest
 
 sorted_table = self.deps.yt_client.sort_table_request(
     SortTableRequest(
````
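The request API in `yql.md` pairs a frozen dataclass with a builder that renders it as YQL, and caps per-call row weight at `128M`. A minimal sketch of that pattern under stated assumptions: `FilterSketchRequest`, `build_filter_yql`, and `parse_size` are hypothetical (the real `FilterTableRequest` fields and `yql_builder` output are not shown in this commit), size suffixes are treated as powers of 1024, and the pragma is written as `PRAGMA yt.MaxRowWeight = "...";`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Binary size suffixes are an assumption of this sketch.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3}
MAX_ROW_WEIGHT_CAP = 128 * 1024**2  # the documented "128M" ceiling


def parse_size(value: str) -> int:
    """Parse a size such as '16M' into bytes."""
    text = value.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


@dataclass(frozen=True)
class FilterSketchRequest:
    """Hypothetical request type; field names are illustrative."""

    input_table: str
    output_table: str
    condition: str
    columns: Optional[Tuple[str, ...]] = None
    max_row_weight: Optional[str] = None

    def __post_init__(self) -> None:
        # Validate once at construction; the frozen instance cannot change later.
        if (
            self.max_row_weight is not None
            and parse_size(self.max_row_weight) > MAX_ROW_WEIGHT_CAP
        ):
            raise ValueError(f"max_row_weight {self.max_row_weight!r} exceeds 128M")


def build_filter_yql(req: FilterSketchRequest) -> str:
    """Render the request as YQL text with a pure builder function."""
    # The real client fills unset columns from the table schema; this sketch uses *.
    select = ", ".join(f"`{c}`" for c in req.columns) if req.columns else "*"
    lines = []
    if req.max_row_weight is not None:
        lines.append(f'PRAGMA yt.MaxRowWeight = "{req.max_row_weight}";')
    lines.append(f"INSERT INTO `{req.output_table}` WITH TRUNCATE")
    lines.append(f"SELECT {select} FROM `{req.input_table}` WHERE {req.condition};")
    return "\n".join(lines)


request = FilterSketchRequest("//tmp/in", "//tmp/out", "score > 0.5", max_row_weight="16M")
print(build_filter_yql(request))
```

Freezing the request means validation in `__post_init__` holds for the object's whole life, and the builder can stay a side-effect-free function that is easy to compare in dry-run tests.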

docs/reference/api.md

Lines changed: 20 additions & 2 deletions
````diff
@@ -13,7 +13,7 @@ Use the sidebar headings for modules. Anything not listed is still in the source
 - **Core** — pipeline, stage, registry, discovery, `self.deps` injection
 - **Operations** — map, vanilla, map-reduce/reduce, S3 helpers, table helpers, checkpoint upload, sort, tokenizer artifact wiring; stage contracts (`stage_contracts`) shared with `core` without reversing the dependency
 - **Typed jobs**`StageBootstrapTypedJob`
-- **YT**client factory, dev and prod clients (YQL helpers live on the clients)
+- **YT**`yt.support` (shared runtime helpers), `yt.clients` (public client API and mixins), and `yt` entry (`factory`, package exports)
 - **Utils** — env files, logging setup, ignore patterns
 - **Packaging** — upload helpers shared by operations
 - **`ytjobs`**[Job-side reference](ytjobs.md)
@@ -35,6 +35,24 @@ How-to guides under `docs/operations`, `docs/configuration`, and `docs/advanced`
    :show-inheritance:
 ```
 
+### Pipeline config helpers
+
+```{eval-rst}
+.. automodule:: yt_framework.core.pipeline_config
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
+### Pipeline CLI helpers
+
+```{eval-rst}
+.. automodule:: yt_framework.core.pipeline_cli
+   :members:
+   :undoc-members:
+   :show-inheritance:
+```
+
 ### Stage
 
 ```{eval-rst}
@@ -112,7 +130,7 @@ How-to guides under `docs/operations`, `docs/configuration`, and `docs/advanced`
 
 ### YQL Operations
 
-YQL operations are ``*_request`` methods on the YT client, each taking a frozen request type from ``yt_framework.yt.clients.yql_requests`` (for example ``JoinTablesRequest``). See :doc:`../operations/yql`.
+YQL operations are ``*_request`` methods on the YT client, each taking a frozen request type from ``yt_framework.yt.clients.yql.yql_requests`` (for example ``JoinTablesRequest``). See :doc:`../operations/yql`.
 
 ```{note}
 **YQL Operations Location**
````

docs/reference/environment-variables.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,7 @@ In **prod** mode, `YTProdClient` splits the merged operation env:
 - **Plain `environment`** (visible in the YT UI for operations you can read): a small built-in set (`YT_STAGE_NAME`, `YT_ALLOW_HTTP_REQUESTS_TO_YT_FROM_JOB`, tokenizer artifact keys) plus any names listed under `environment_public_keys` for that operation.
 - **`secure_vault`**: everything else from the merged env, plus `docker_auth` when `docker_image` is set, merged with any `secure_vault` you pass through operation kwargs.
 
-Jobs still receive vaulted keys under their normal names in the process environment when you use **string** mapper/vanilla/reducer commands (the client adds a stdlib promotion step). **TypedJob** code should call `promote_secure_vault_environment()` from `yt_framework.yt.operation_secure_env` early, or rely on `YT_SECURE_VAULT_*` names.
+Jobs still receive vaulted keys under their normal names in the process environment when you use **string** mapper/vanilla/reducer commands (the client adds a stdlib promotion step). **TypedJob** code should call `promote_secure_vault_environment()` from `yt_framework.yt.support.operation_secure_env` early, or rely on `YT_SECURE_VAULT_*` names.
 
 **Dev mode** does not split: the subprocess gets the full dict, since there is no YT spec UI.
 
```
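The prod-mode split described in `environment-variables.md` can be sketched as a pure function over the merged env. This minimal sketch assumes the built-in public set holds only the two keys named there (the tokenizer artifact key names are not listed in this commit, so they are left out), assumes caller-supplied `secure_vault` entries override merged values on conflict, and skips `docker_auth`. `split_operation_env` is a hypothetical name, not `YTProdClient` code.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple

# Built-in public keys named in the reference; tokenizer artifact keys are omitted.
BUILTIN_PUBLIC_KEYS = frozenset({"YT_STAGE_NAME", "YT_ALLOW_HTTP_REQUESTS_TO_YT_FROM_JOB"})


def split_operation_env(
    merged_env: Mapping[str, str],
    environment_public_keys: Iterable[str] = (),
    secure_vault: Optional[Mapping[str, str]] = None,
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a merged env into (plain environment, secure_vault); hypothetical helper."""
    public = BUILTIN_PUBLIC_KEYS | frozenset(environment_public_keys)
    environment = {k: v for k, v in merged_env.items() if k in public}
    vault = {k: v for k, v in merged_env.items() if k not in public}
    # Assumption: values passed explicitly through operation kwargs win on key conflicts.
    vault.update(secure_vault or {})
    return environment, vault


plain, vault = split_operation_env(
    {"YT_STAGE_NAME": "join", "LOG_LEVEL": "INFO", "S3_SECRET_KEY": "s3cr3t"},
    environment_public_keys=["LOG_LEVEL"],
    secure_vault={"EXTRA_TOKEN": "t0k3n"},
)
print(plain)
print(vault)
```

Defaulting every key to the vault and allowlisting the public ones means a newly added variable is hidden from the UI unless someone opts it in. Dev mode, per the same reference, skips this split entirely.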

examples/02_multi_stage_pipeline/stages/join_data/stage.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 from yt_framework.core.pipeline import DebugContext
 from yt_framework.core.stage import BaseStage
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 
 
 class JoinDataStage(BaseStage):
```

examples/03_yql_operations/stages/yql_examples/stage.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 from yt_framework.core.pipeline import DebugContext
 from yt_framework.core.stage import BaseStage
 from yt_framework.utils.logging import log_header
-from yt_framework.yt.clients.yql_requests import (
+from yt_framework.yt.clients.yql.yql_requests import (
     DistinctRequest,
     FilterTableRequest,
     GroupByAggregateRequest,
```

examples/video_gpu/stages/join_tables/stage.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,7 +1,7 @@
 from yt_framework.core.pipeline import DebugContext
 from yt_framework.core.stage import BaseStage
 from yt_framework.utils.logging import log_header
-from yt_framework.yt.clients.yql_requests import JoinTablesRequest
+from yt_framework.yt.clients.yql.yql_requests import JoinTablesRequest
 
 
 class JoinTablesStage(BaseStage):
```
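The path moves in this commit follow a fixed old-to-new mapping, so downstream stage code can be updated mechanically. A minimal regex-based sketch that uses only the three module moves visible in these diffs. It assumes imports are written as full dotted paths; `MOVES` and `rewrite_imports` are hypothetical names, and the mapping covers only what this page shows of a 49-file commit.

```python
import re

# Old -> new module paths, as they appear in the diffs of this commit.
MOVES = {
    "yt_framework.yt.clients.yql_requests": "yt_framework.yt.clients.yql.yql_requests",
    "yt_framework.yt.yql_builder": "yt_framework.yt.clients.yql.yql_builder",
    "yt_framework.yt.operation_secure_env": "yt_framework.yt.support.operation_secure_env",
}

# Match whole dotted paths only: no identifier char or dot before, no identifier char after.
_PATTERN = re.compile(
    r"(?<![\w.])(" + "|".join(re.escape(old) for old in MOVES) + r")(?!\w)"
)


def rewrite_imports(source: str) -> str:
    """Replace moved module paths; idempotent because no new path contains an old one."""
    return _PATTERN.sub(lambda match: MOVES[match.group(1)], source)


print(rewrite_imports("from yt_framework.yt.clients.yql_requests import JoinTablesRequest"))
```

Running it over each `stage.py` and diffing the result is a quick way to review the change before committing it.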
