This file is the maintainer checklist: environment, tests, docs build, and review expectations.
- Code of Conduct
- How to Contribute
- Development Setup
- Project Structure
- Development Workflow
- Code Style and Conventions
- Testing
- Documentation
- Submitting Contributions
- Additional Guidelines
This project adheres to a code of conduct that all contributors are expected to follow. Please be respectful and constructive in all interactions.
Contributions come in many forms:
- Bug Reports: Report issues you encounter
- Feature Requests: Suggest new features or improvements
- Code Contributions: Submit pull requests with bug fixes or new features
- Documentation: Improve or add documentation
- Examples: Add example pipelines demonstrating framework features
- Testing: Add tests or improve test coverage
- Python 3.11 or higher
- Git
- Conda (Miniconda, Mambaforge, or similar) — recommended so you use one shared environment for tests, docs, and formatting
- Access to YTsaurus cluster (for production mode testing)
- YT credentials (for production mode testing)
1. **Fork and clone the repository:**

       git clone https://github.com/GregoryKogan/yt-framework.git
       cd yt-framework
2. **Create and activate the project Conda environment (recommended):**

       conda create -n yt-framework python=3.11
       conda activate yt-framework

   Optionally use `conda-forge` (e.g. `conda create -n yt-framework python=3.11 -c conda-forge`).
3. **Install the package in editable mode with dev and docs extras:**

       pip install -e ".[dev,docs]"

   If `which pip` points outside the Conda env (for example Homebrew), use `python -m pip install -e ".[dev,docs]"` instead. The same applies to one-off commands with `conda run -n yt-framework -- python -m pip ...`.

   This installs runtime dependencies plus `ruff`, `basedpyright`, `vulture`, `xenon`, `tach`, `pytest`, `pytest-cov`, `pre-commit`, and the Sphinx stack needed for `make -C docs html`. You do not need a separate `pip install -e ".[docs]"` step.

   Without Conda: use a Python 3.11+ virtual environment, then `pip install -e ".[dev,docs]"` (or `pip install -e .` and `pip install -e ".[dev]"` if you will not build docs locally).
4. **IDE / Cursor:** set the Python interpreter to the `yt-framework` Conda environment so the editor, integrated terminal, and agents target the same interpreter.
5. **Install Git commit hooks (recommended):**

       pre-commit install

   Run this once per clone. On each commit, pre-commit runs:

   - Ruff (`ruff check --fix` and `ruff format`).
   - Strict BasedPyright.
   - Vulture (dead-code scan for `yt_framework` and `ytjobs`; see `[tool.vulture]` in `pyproject.toml`).
   - Xenon (cyclomatic complexity on those packages; thresholds in `.pre-commit-config.yaml`).
   - Tach, as local hooks `tach-check` and `tach-external` (`tach check` for internal module boundaries and `tach check-external` so third-party imports match `[project] dependencies` in `pyproject.toml`; Tach is pinned to the same version as in the dev extra).
   - The `yt-framework-pre-commit-policy` hook (`python scripts/precommit/run.py`, configured under `[tool.yt_framework.pre_commit]` in `pyproject.toml`). By default it allows at most 550 lines per `*.py` file under the listed roots (same newline count as `wc -l`), at most 10 immediate non-ignored children per directory, and at most 5 underscore-separated segments in each bound name across the AST; see that table for overrides.
   - `python scripts/coverage/run_pytest_line_gate.py` from the `pytest` hook (same as CI: `pytest -m "not yt_cluster"` over `yt_framework` and `ytjobs` with `--cov-report=json`, then `scripts/coverage/check_line_coverage.py`, so commits cannot pass with missed statements).

   The `pytest` hook uses `language: python` with `additional_dependencies` aligned to `[project] dependencies` in `pyproject.toml`, not `language: system` / Conda. You can still run `conda run -n yt-framework -- python scripts/coverage/run_pytest_line_gate.py` locally when you prefer the Conda env.

   To skip hooks in an emergency, use `git commit --no-verify` or `git push --no-verify`.

   To run checks manually before commit, use:

       pre-commit run ruff-check --all-files
       pre-commit run ruff-format --all-files
       pre-commit run basedpyright --all-files
       pre-commit run vulture --all-files
       pre-commit run xenon --all-files
       pre-commit run tach-check --all-files
       pre-commit run tach-external --all-files
       pre-commit run yt-framework-pre-commit-policy --all-files
       pre-commit run pytest --all-files

   Or directly (same Conda env):

       conda run -n yt-framework -- ruff check .
       conda run -n yt-framework -- ruff format .
       conda run -n yt-framework -- basedpyright --pythonpath "$(python -c 'import sys; print(sys.executable)')"
       conda run -n yt-framework -- vulture
       conda run -n yt-framework -- xenon --max-absolute=A --max-modules=A --max-average=A yt_framework ytjobs
       conda run -n yt-framework -- python scripts/precommit/run.py
       conda run -n yt-framework -- tach check
       conda run -n yt-framework -- tach check-external
       conda run -n yt-framework -- python scripts/coverage/run_pytest_line_gate.py

   When you change runtime dependencies under `[project] dependencies` in `pyproject.toml`, update the `pytest` hook's `additional_dependencies` in `.pre-commit-config.yaml` so the hook's venv matches (including `pytest-cov` for the coverage gate).
6. **Set up YT credentials (for production mode testing):**

   Create a `secrets.env` file in any example's `configs/` directory:

       # configs/secrets.env
       YT_PROXY=your-yt-proxy-url
       YT_TOKEN=your-yt-token

   See Configuration Guide for more details.
7. **Verify installation:**

       python -c "import yt_framework; print('YT Framework installed successfully')"
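For orientation, the `secrets.env` file from the credentials step is a plain `KEY=VALUE` file. The sketch below shows one way such a file could be read in Python. `load_env_file` is a hypothetical helper written for this guide, not a function from `yt_framework`, and the framework's own loader may treat quoting or comments differently.

```python
from pathlib import Path


def load_env_file(path: Path) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and `#` comments (hypothetical helper)."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Ignore empty lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values
```

Calling `load_env_file(Path("configs/secrets.env"))` on the file shown above would return a dictionary with the `YT_PROXY` and `YT_TOKEN` entries, which is a quick way to confirm the file is well formed before a production-mode run.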
YT Framework supports a dev mode that simulates YT operations locally using the file system. This is perfect for development and testing without needing YT cluster access.
1. **Set mode to dev** in your pipeline config:

       # configs/config.yaml
       pipeline:
         mode: "dev"
2. **Run your pipeline:**

       python pipeline.py

   In dev mode, tables are stored as `.jsonl` files in the `.dev/` directory, and operations run locally.
See Dev vs Prod Guide for more details.
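Because dev-mode tables are plain `.jsonl` files, they can be inspected with a few lines of Python. This is a minimal sketch: `read_jsonl_table` is a hypothetical helper, not part of `yt_framework`, and the file name used in the usage note is an assumption about how a table might be named under `.dev/`.

```python
import json
from pathlib import Path
from typing import Any


def read_jsonl_table(path: Path) -> list[dict[str, Any]]:
    """Read one JSON object per non-empty line (hypothetical helper)."""
    rows: list[dict[str, Any]] = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # tolerate blank lines between rows
                rows.append(json.loads(line))
    return rows
```

For example, `read_jsonl_table(Path(".dev") / "my_table.jsonl")` would return the rows of a table stored under that (assumed) name; check your example's `.dev/` directory for the actual file names.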
When testing changes, create a test pipeline in the examples/ directory:
1. **Create a new example directory:**

       mkdir -p examples/test_feature/stages/my_stage examples/test_feature/configs
2. **Create `pipeline.py`:**

       from yt_framework.core.pipeline import DefaultPipeline

       if __name__ == "__main__":
           DefaultPipeline.main()
3. **Create stage and config files** following the pattern in existing examples.
4. **Run the example** to verify your changes work correctly.
To verify your changes work with existing examples:
    cd examples/01_hello_world
    python pipeline.py

This helps ensure you haven't broken existing functionality.
Tach enforces which subpackages under yt_framework and ytjobs may import each other. tach.toml lists every module with explicit depends_on, layer ordering, layers_explicit_depends_on, unused-edge detection (exact), and no circular first-party cycles. Anything under tests/, examples/, docs/, and tools/ is excluded from that graph. Layer narrative: docs/architecture/layers.md.
Source of truth: when a rule matters for architecture, it should appear in tach.toml (and usually in docs/architecture/layers.md). tests/test_architecture_boundaries.py repeats a handful of import rules as line-oriented greps over yt_framework/operations and yt_framework/yt. That overlaps Tach for some cases (for example operations must not import core). Keep both in sync: Tach is authoritative for the full dependency graph; the pytest module exists so failures cite concrete file and line, which is easier to read in CI than a graph edge alone.
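To illustrate the line-oriented style, here is a minimal sketch of such a check. It is not the contents of `tests/test_architecture_boundaries.py`: `find_forbidden_imports` and the regular expression are assumptions made for this example, built around the "operations must not import core" rule mentioned above.

```python
import re
from pathlib import Path

# Assumed pattern for the rule "operations must not import core".
FORBIDDEN = re.compile(r"^\s*(from|import)\s+yt_framework\.core\b")


def find_forbidden_imports(root: Path) -> list[str]:
    """Return "file:line: text" entries for each forbidden import under root."""
    hits: list[str] = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if FORBIDDEN.search(line):
                hits.append(f"{path}:{number}: {line.strip()}")
    return hits


def test_operations_do_not_import_core() -> None:
    # The failure message lists every offending file and line.
    hits = find_forbidden_imports(Path("yt_framework/operations"))
    assert not hits, "\n".join(hits)
```

The value of this style is the failure message: each hit names a file and a line number, which complements the graph-level report from `tach check`.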
If your change adds or removes imports across those boundaries, update tach.toml in the same branch. Run tach check after substantive edits; if the graph drifted, run tach sync and then trim redundant depends_on entries so exact stays satisfied. Run tach check-external when you touch third-party imports so they stay aligned with pyproject.toml.
Widening excludes, turning modules into utilities, or disabling checks requires maintainer agreement—do not do that to get a green build.
- Separation of Concerns: Keep core logic, operations, and utilities separate
- Reusability: Write code that can be reused across different stages
- Simplicity: Prefer simple, readable solutions over complex ones
- Consistency: Follow existing patterns and conventions
- Follow PEP 8 style guidelines
- Use Ruff for formatting and linting (line length 88, Python 3.11; see `[tool.ruff]` in `pyproject.toml`). Lint uses `select = ["ALL"]` with a small documented `ignore` list.
- Use BasedPyright for strict static typing (`pyrightconfig.json`). The checked tree is `yt_framework` and `ytjobs` (see `include` / `exclude` there); tests and `examples/` are excluded from that pass.
- Use Vulture for unused code at 100% confidence (`[tool.vulture]` in `pyproject.toml`). If static analysis flags code that is used dynamically, add a small whitelist module and list it under `[tool.vulture]` `paths` (see Vulture's docs).
- Use Xenon (Radon-backed) for cyclomatic complexity on `yt_framework` and `ytjobs`. Thresholds are `--max-absolute=A`, `--max-modules=A`, `--max-average=A` (see `.pre-commit-config.yaml` and the CI lint job). See Xenon and Radon ranks.
- Use Tach (tach-org/tach) so imports between `yt_framework.*` and `ytjobs.*` match `tach.toml`: explicit `depends_on` per module, layers, no `ytjobs` → `yt_framework` edges, and `tach check-external` against declared runtime dependencies. Tests, examples, docs, and `tools/` stay out of the graph.
- Use type hints where appropriate
- Classes: `PascalCase` (e.g., `BaseStage`, `DefaultPipeline`)
- Functions and variables: `snake_case` (e.g., `write_table`, `config_path`)
- Constants: `UPPER_SNAKE_CASE` (e.g., `DEFAULT_MODE`)
- Private methods: Prefix with underscore (e.g., `_internal_method`)
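These naming rules interact with the pre-commit policy's default limit of 5 underscore-separated segments per bound name. The sketch below approximates that check with the standard `ast` module. It is not `scripts/precommit/run.py`: which nodes count as bound names and how leading underscores are counted are assumptions here, so treat the real script and `[tool.yt_framework.pre_commit]` as authoritative.

```python
import ast

MAX_SEGMENTS = 5  # default quoted in this guide


def count_segments(name: str) -> int:
    """Count non-empty underscore-separated parts (assumed counting rule)."""
    return len([part for part in name.split("_") if part])


def long_names(source: str, limit: int = MAX_SEGMENTS) -> list[str]:
    """Return function, class, argument, and assigned names exceeding the limit."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found.add(node.name)
        elif isinstance(node, ast.arg):
            found.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            found.add(node.id)
    return sorted(name for name in found if count_segments(name) > limit)
```

Under these assumptions `write_table` and `_internal_method` each count as two segments, so ordinary names sit well below the limit; only names with six or more parts would be reported.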
Use Google-style docstrings:
    def write_table(self, table_path: str, rows: list) -> None:
        """Write rows to a YT table.

        Args:
            table_path: Path to the YT table (e.g., "//tmp/my_table")
            rows: List of dictionaries representing table rows

        Raises:
            ValueError: If table_path is invalid
        """
        ...

Organize imports in this order:
- Standard library imports
- Third-party imports
- Local application imports
Example:
    import os
    from pathlib import Path
    from typing import Optional

    import ytsaurus_client
    from omegaconf import DictConfig

    from yt_framework.core.stage import BaseStage
    from yt_framework.operations.table import TableOperation

- One class per file (when possible)
- Keep files focused and cohesive
- Use `__init__.py` to expose public API
- Place tests in the `tests/` directory (when test suite exists)
YT Framework uses pytest for testing (available as a dev dependency). Alongside the main suite, you can use:
- Dev Mode Testing: Use dev mode to test changes locally
- Example Pipelines: Run existing examples to verify compatibility
- Manual Testing: Create test pipelines to exercise new features
- Real cluster integration tests (optional): pytest packages under `tests/integration/yt_cluster/` and `tests/integration/examples_cluster/`; see docs/testing/yt-cluster-integration.md and docs/testing/example-pipelines.md
When tests are available, run them with:
    # Run all tests
    conda run -n yt-framework -- pytest

    # Run with coverage (matches CI; excludes real-cluster marker `yt_cluster`)
    conda run -n yt-framework -- pytest -m "not yt_cluster" --cov=yt_framework --cov=ytjobs --cov-report=json:coverage.json
    conda run -n yt-framework -- python scripts/coverage/check_line_coverage.py coverage.json

    # Same checks as the pre-commit `pytest` hook (pytest + JSON report + line gate)
    conda run -n yt-framework -- python scripts/coverage/run_pytest_line_gate.py

    # Run specific test file
    conda run -n yt-framework -- pytest tests/test_stage.py

If `YT_PROXY` and `YT_TOKEN` are available from a repo-root `yt-cluster-test.env` file (see `yt-cluster-test.example.env`), from `YT_FRAMEWORK_CLUSTER_TEST_ENV`, or from the environment, pytest collects `tests/integration/yt_cluster/` and `tests/integration/examples_cluster/` on your machine. Those directories are ignored when `CI=true` (typical on CI hosts), and GitHub Actions runs `pytest -m "not yt_cluster"`, so real-cluster tests never execute there even if secrets were misconfigured. Without credentials and outside CI, the same packages are skipped at collection time.
Run only those tests:
    conda run -n yt-framework -- pytest -m yt_cluster -xvs

Do not commit real tokens; `*.env` is gitignored except `*example.env`. Jobs in these tests rely on the cell default Docker image (no `YT_TEST_DOCKER_IMAGE`).
If you keep yt-cluster-test.env in your clone and run the full pytest command locally, cluster tests will run too. To avoid hitting a cell, unset those credentials for that run, narrow markers, or skip cluster tests as described in docs/testing/yt-cluster-integration.md.
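The collection rule for cluster tests can be summarised as a small predicate. This is only a sketch of the behaviour described in this section, not the project's pytest configuration: `should_collect_cluster_tests` is a hypothetical function, and it deliberately ignores how `yt-cluster-test.env` or `YT_FRAMEWORK_CLUSTER_TEST_ENV` are loaded into the environment.

```python
from collections.abc import Mapping


def should_collect_cluster_tests(env: Mapping[str, str]) -> bool:
    """Approximate the rule: never on CI, otherwise only with both credentials."""
    if env.get("CI", "").lower() == "true":
        return False  # CI hosts have no cell access
    return bool(env.get("YT_PROXY")) and bool(env.get("YT_TOKEN"))
```

Passing `os.environ` to this function gives a quick prediction of whether a plain `pytest` run on your machine would reach a real cell.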
The workflow .github/workflows/ci.yml runs Ruff, Vulture, Xenon, tach check, strict BasedPyright, and pytest -m "not yt_cluster" with coverage over yt_framework and ytjobs on every push to any branch and on pull requests targeting main or dev (Python 3.11, pip install -e ".[dev]"). That pytest run includes dev-tier example pipeline smoke tests under tests/integration/example_pipelines/. After pytest writes coverage.json, CI runs python scripts/coverage/check_line_coverage.py coverage.json, which fails if any statement under those two packages is still marked missing (branch coverage is collected separately and is not part of that gate). Real YT cluster tests are excluded because the runner has no cell access. To require a green check before merging, configure branch protection on GitHub and add lint, typecheck, and test as required status checks.
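The line gate described above can be pictured with a short sketch. It is not `scripts/coverage/check_line_coverage.py`; `files_with_missing_lines` and `gate_exit_code` are hypothetical helpers that rely only on the `files` and `missing_lines` keys of coverage.py's JSON report.

```python
from typing import Any


def files_with_missing_lines(report: dict[str, Any]) -> dict[str, list[int]]:
    """Map each measured file to its missing line numbers, omitting covered files."""
    missing: dict[str, list[int]] = {}
    for filename, data in report.get("files", {}).items():
        lines = data.get("missing_lines", [])
        if lines:
            missing[filename] = lines
    return missing


def gate_exit_code(report: dict[str, Any]) -> int:
    """Print offenders and return 1 if any statement is missing, else 0."""
    missing = files_with_missing_lines(report)
    for filename, lines in sorted(missing.items()):
        print(f"{filename}: missing lines {lines}")
    return 1 if missing else 0
```

Loading `coverage.json` with `json.load` and passing the result to `gate_exit_code` mirrors the idea of the gate: a single uncovered statement is enough for a non-zero exit status.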
The README coverage badge is powered by shields.io endpoint badges reading JSON from a public GitHub Gist. CI on push to main updates that file when repository configuration is present. Forks and PRs from forks do not run the gist step.
Complete these steps in order (each step is one action):
1. Create a classic personal access token with scope `gist` only: GitHub → Settings → Developer settings → Personal access tokens. Store the token securely after creation.
2. Create a public gist (Your gists → create). Add a single file named `yt-framework-coverage.json` with this content: `{"schemaVersion":1,"label":"coverage","message":"0%","color":"lightgrey"}`. Save the gist.
3. Copy the gist ID from the gist URL `https://gist.github.com/<you>/<GIST_ID>` (the hex segment after your username).
4. In the `yt-framework` repository: Settings → Secrets and variables → Actions → Variables → New repository variable. Name: `COVERAGE_GIST_ID`. Value: the gist ID from step 3. Save.
5. In the same repository: Secrets and variables → Actions → Secrets → New repository secret. Name: `COVERAGE_GIST_TOKEN`. Value: the PAT from step 1. Save.
6. Confirm the gist owner and the PAT owner are the same GitHub user (or that the PAT can edit that gist).
7. Edit README.md: in the coverage badge URL, replace `YOUR_GIST_ID` with the gist ID from step 3 (leave the rest of the shields.io URL unchanged).
8. Push or merge to `main` and confirm in Actions that CI succeeded; open the gist and check that `yt-framework-coverage.json` shows the real percentage.
9. If the README badge looks stale, hard-refresh the page or wait briefly; shields.io and proxies may cache images. The badge URL includes `cacheSeconds=60` to limit caching.
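For orientation, the gist file uses the shields.io endpoint schema shown in step 2. The sketch below builds such a payload from a percentage. `badge_payload` and its colour thresholds are illustrative assumptions, not what the CI workflow actually computes.

```python
import json


def badge_payload(percent: float) -> str:
    """Return shields.io endpoint JSON for a coverage percentage (thresholds assumed)."""
    if percent >= 90:
        color = "brightgreen"
    elif percent >= 75:
        color = "yellow"
    else:
        color = "red"
    payload = {
        "schemaVersion": 1,
        "label": "coverage",
        "message": f"{percent:.0f}%",
        "color": color,
    }
    # Compact separators match the single-line format used in the gist file.
    return json.dumps(payload, separators=(",", ":"))
```

The output has the same four keys as the placeholder content from step 2, so a file produced this way can replace `yt-framework-coverage.json` without changing the badge URL.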
Dev mode is ideal for testing because it:
- Simulates YT operations locally
- Doesn't require YT cluster access
- Provides fast feedback
- Creates reproducible test environments
Example:
    cd examples/01_hello_world
    # Ensure config.yaml has mode: "dev"
    python pipeline.py

Catalog and subprocess checks live in examples/manifest.yaml; see docs/testing/example-pipelines.md.
From the repo root (uses the same pytest selection as CI for dev-tier demos):
    conda run -n yt-framework -- pytest tests/integration/example_pipelines/test_smoke.py -m examples --tb=short

Optional prod demos on a real cell (same credential rules as the cluster integration tests) are documented in docs/testing/yt-cluster-integration.md.
You can still run a single tree by hand, for example:
    cd examples/01_hello_world
    python pipeline.py

Documentation lives in the `docs/` directory:
- Main docs: `docs/index.md` (installation and quick start)
- Guides: `docs/pipelines-and-stages.md`, `docs/configuration/index.md`, `docs/operations/`, `docs/advanced/`, etc.
- API reference (framework): `docs/reference/api.md`, covering `yt_framework` modules (Sphinx autodoc)
- YT jobs library: `docs/reference/ytjobs.md`, covering the job-side `ytjobs` package (runs on cluster workers)
- Environment variables: `docs/reference/environment-variables.md`, covering dev, driver, and sandbox vars
- Troubleshooting: `docs/troubleshooting/index.md`
After substantive doc edits, build locally with the same environment: make -C docs html (requires the docs extra — included in pip install -e ".[dev,docs]" above; Python 3.11+).
When adding features:
- Update relevant documentation files
- Add examples if applicable
- Update API reference if adding public APIs
- Add troubleshooting entries for common issues
Examples are valuable for demonstrating features:
- Create a new directory in `examples/` with a descriptive name
- Follow the structure of existing examples
- Include a `README.md` explaining what the example demonstrates
- Add the example to the main `README.md` examples list
- Document all public classes and methods
- Include parameter descriptions and types
- Document return values
- Include usage examples for complex functions
- Document exceptions that may be raised
Before creating an issue:
- Search existing issues to avoid duplicates
- Check documentation to ensure it's not already covered
- Verify it's a bug or clearly describe the feature request
Include:
- Clear description of the issue
- Steps to reproduce
- Expected vs actual behavior
- Environment details (Python version, OS, etc.)
- Error messages or logs (if applicable)
- Minimal example demonstrating the issue (if possible)
Include:
- Clear description of the feature
- Use case and motivation
- Proposed solution (if you have one)
- Alternatives considered (if any)
1. **Fork the repository** on GitHub
2. **Create a branch** for your changes:

       git checkout -b feature/my-feature-name
       # or
       git checkout -b fix/bug-description

3. **Make your changes** following the guidelines above
4. **Commit your changes** (see commit message guidelines below)
5. **Push to your fork:**

       git push origin feature/my-feature-name
Write clear, descriptive commit messages:
- Convention: Follow Conventional Commits specification
Examples:
feat: add support for custom Docker images in Map operations
fix: fix stage discovery for nested directories
docs: update documentation with examples
test: add tests for new feature
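A commit subject in this style can be checked mechanically. The sketch below is an illustration under stated assumptions: the type list mirrors common Conventional Commits usage rather than any hook configured in this repository, and `is_conventional` is a hypothetical helper.

```python
import re

# Assumed type list; the specification itself only defines `feat` and `fix`.
TYPES = ("feat", "fix", "docs", "test", "refactor", "chore", "ci", "build", "perf", "style")
PATTERN = re.compile(rf"^({'|'.join(TYPES)})(\([\w\-./]+\))?!?: \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if subject looks like `type(scope)!: description`."""
    return PATTERN.match(subject) is not None
```

All four example subjects above pass this check, while a subject such as `Update stuff` does not, because it lacks the `type: ` prefix.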
### Pull Request Process
**Pull requests without up-to-date documentation will not be merged.**
If your changes affect user-facing functionality, APIs, configuration, or behavior, you must update the relevant documentation. This includes:
- User guides in `docs/`
- API reference in `docs/reference/api.md`
- Example code and README files
- Inline code documentation (docstrings)
1. **Update your branch** with the latest changes from main (this assumes an `upstream` remote that points at the main repository):

   ```bash
   git checkout main
   git pull upstream main
   git checkout feature/my-feature-name
   git rebase main
   ```

2. **Ensure your code follows style guidelines**
3. **Test your changes** using dev mode and examples
4. **Update documentation** if needed (see Documentation section)
5. **Create a pull request** on GitHub with:
   - Clear title and description
   - Reference to related issues (if any)
   - Summary of changes
   - Testing performed
Before submitting, ensure:
- Code follows style guidelines
- Changes are tested (dev mode and/or examples)
- Documentation is updated
- Commit messages are clear
- No merge conflicts with main
- Examples still work (if applicable)
- No sensitive data (credentials, tokens) in commits
By contributing to YT Framework, you agree that your contributions will be licensed under the same license as the project.
For questions or concerns, contact the maintainers:
- Gregory Koganovsky - g.koganovsky@gmail.com
- Artem Zavarzin - artemutz555@gmail.com
- Repository: https://github.com/GregoryKogan/yt-framework
- Issues: https://github.com/GregoryKogan/yt-framework/issues
- Documentation: see the `docs/` directory
- Check the Troubleshooting Guide for common issues
- Review existing Examples for usage patterns
- Open an issue for bugs or feature requests
- Contact maintainers for questions
Thank you for contributing to YT Framework!