Contributing to YT Framework

This file is the maintainer checklist: environment, tests, docs build, and review expectations.

Code of Conduct

This project adheres to a code of conduct that all contributors are expected to follow. Please be respectful and constructive in all interactions.

How to Contribute

Contributions come in many forms:

  • Bug Reports: Report issues you encounter
  • Feature Requests: Suggest new features or improvements
  • Code Contributions: Submit pull requests with bug fixes or new features
  • Documentation: Improve or add documentation
  • Examples: Add example pipelines demonstrating framework features
  • Testing: Add tests or improve test coverage

Development Setup

Prerequisites

  • Python 3.11 or higher
  • Git
  • Conda (Miniconda, Mambaforge, or similar) — recommended so you use one shared environment for tests, docs, and formatting
  • Access to YTsaurus cluster (for production mode testing)
  • YT credentials (for production mode testing)

Installation

  1. Fork and clone the repository:

    git clone https://github.com/GregoryKogan/yt-framework.git
    cd yt-framework
  2. Create and activate the project Conda environment (recommended):

    conda create -n yt-framework python=3.11
    conda activate yt-framework

    Optionally use conda-forge (e.g. conda create -n yt-framework python=3.11 -c conda-forge).

  3. Install the package in editable mode with dev and docs extras:

    pip install -e ".[dev,docs]"

    If which pip points outside the Conda env (for example Homebrew), use python -m pip install -e ".[dev,docs]" instead. The same applies to one-off commands with conda run -n yt-framework -- python -m pip ....

    This installs runtime dependencies plus ruff, basedpyright, vulture, xenon, tach, pytest, pytest-cov, pre-commit, and the Sphinx stack needed for make -C docs html. You do not need a separate pip install -e ".[docs]" step.

    Without Conda: use a Python 3.11+ virtual environment, then pip install -e ".[dev,docs]" (or pip install -e . and pip install -e ".[dev]" if you will not build docs locally).

  4. IDE / Cursor: set the Python interpreter to the yt-framework Conda environment so the editor, integrated terminal, and agents target the same interpreter.

  5. Install Git commit hooks (recommended):

    pre-commit install

    Run this once per clone. On each commit, pre-commit runs:

      • Ruff (ruff check --fix and ruff format)
      • Strict BasedPyright
      • Vulture (dead-code scan for yt_framework and ytjobs; see [tool.vulture] in pyproject.toml)
      • Xenon (cyclomatic complexity on those packages; thresholds in .pre-commit-config.yaml)
      • Tach as local hooks tach-check and tach-external: tach check for internal module boundaries, and tach check-external so third-party imports match [project] dependencies in pyproject.toml. Tach is pinned to the same version as in the dev extra.
      • The yt-framework-pre-commit-policy hook (python scripts/precommit/run.py, configured under [tool.yt_framework.pre_commit] in pyproject.toml). Defaults: 550 lines max per *.py under the listed roots (same newline count as wc -l), 10 max immediate non-ignored children per directory, and 5 max underscore-separated segments in each bound name across the AST; see that table for overrides.
      • python scripts/coverage/run_pytest_line_gate.py from the pytest hook (same as CI: pytest -m "not yt_cluster" over yt_framework and ytjobs with --cov-report=json, then scripts/coverage/check_line_coverage.py so commits cannot pass with missed statements)

    The pytest hook uses language: python with additional_dependencies aligned to [project] dependencies in pyproject.toml, not language: system / Conda. You can still run conda run -n yt-framework -- python scripts/coverage/run_pytest_line_gate.py locally when you prefer the Conda env.

    To skip hooks in an emergency, use git commit --no-verify or git push --no-verify. To run checks manually before commit, use:

    pre-commit run ruff-check --all-files
    pre-commit run ruff-format --all-files
    pre-commit run basedpyright --all-files
    pre-commit run vulture --all-files
    pre-commit run xenon --all-files
    pre-commit run tach-check --all-files
    pre-commit run tach-external --all-files
    pre-commit run yt-framework-pre-commit-policy --all-files
    pre-commit run pytest --all-files

    Or run the tools directly in the same Conda env:

      • conda run -n yt-framework -- ruff check .
      • conda run -n yt-framework -- ruff format .
      • conda run -n yt-framework -- basedpyright --pythonpath "$(python -c 'import sys; print(sys.executable)')"
      • conda run -n yt-framework -- vulture
      • conda run -n yt-framework -- xenon --max-absolute=A --max-modules=A --max-average=A yt_framework ytjobs
      • conda run -n yt-framework -- python scripts/precommit/run.py
      • conda run -n yt-framework -- tach check
      • conda run -n yt-framework -- tach check-external
      • conda run -n yt-framework -- python scripts/coverage/run_pytest_line_gate.py

    When you change runtime dependencies under [project] dependencies in pyproject.toml, update the pytest hook’s additional_dependencies in .pre-commit-config.yaml so the hook’s venv matches (including pytest-cov for the coverage gate).

  6. Set up YT credentials (for production mode testing):

    Create a secrets.env file in any example's configs/ directory:

    # configs/secrets.env
    YT_PROXY=your-yt-proxy-url
    YT_TOKEN=your-yt-token

    See Configuration Guide for more details.

  7. Verify installation:

    python -c "import yt_framework; print('YT Framework installed successfully')"
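Beyond the import check in step 7, it can help to confirm that the command-line tools from the dev extra are reachable from the active environment. The following is a minimal sketch, not part of the repository: the tool list is copied from the installation notes above, and the helper name missing_tools is hypothetical.

```python
import shutil

# Tool names copied from the dev extra described in the installation steps;
# this list is an assumption and should be adjusted if the extra changes.
DEV_TOOLS = ["ruff", "basedpyright", "vulture", "xenon", "tach", "pytest", "pre-commit"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(DEV_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All dev tools found on PATH")
```

Run it with the interpreter of the yt-framework environment; a non-empty result usually means the wrong environment is active or the editable install was done with a different pip.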

Development Workflow

Using Dev Mode

YT Framework supports a dev mode that simulates YT operations locally using the file system. This is perfect for development and testing without needing YT cluster access.

  1. Set mode to dev in your pipeline config:

    # configs/config.yaml
    pipeline:
      mode: "dev"
  2. Run your pipeline:

    python pipeline.py

    In dev mode, tables are stored as .jsonl files in the .dev/ directory, and operations run locally.

See Dev vs Prod Guide for more details.
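Because dev-mode tables are plain .jsonl files, they can be inspected with a few lines of standard-library Python. This is a minimal sketch that assumes one JSON object per line; the helper name read_jsonl_table and the example path are hypothetical, and the exact layout inside .dev/ is not specified here.

```python
import json
from pathlib import Path


def read_jsonl_table(path: Path) -> list[dict]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    rows = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                rows.append(json.loads(line))
    return rows
```

For example, read_jsonl_table(Path(".dev/my_table.jsonl")) would return the rows of a hypothetical table named my_table as a list of dictionaries.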

Creating Test Pipelines

When testing changes, create a test pipeline in the examples/ directory:

  1. Create a new example directory:

    mkdir -p examples/test_feature/stages/my_stage examples/test_feature/configs
  2. Create pipeline.py:

    from yt_framework.core.pipeline import DefaultPipeline
    
    if __name__ == "__main__":
        DefaultPipeline.main()
  3. Create stage and config files following the pattern in existing examples.

  4. Run the example to verify your changes work correctly.

Running Examples

To verify your changes work with existing examples:

cd examples/01_hello_world
python pipeline.py

This helps ensure you haven't broken existing functionality.

Module boundaries (Tach)

Tach enforces which subpackages under yt_framework and ytjobs may import each other. tach.toml lists every module with explicit depends_on, layer ordering, layers_explicit_depends_on, unused-edge detection (exact), and no circular first-party cycles. Anything under tests/, examples/, docs/, and tools/ is excluded from that graph. Layer narrative: docs/architecture/layers.md.

Source of truth: when a rule matters for architecture, it should appear in tach.toml (and usually in docs/architecture/layers.md). tests/test_architecture_boundaries.py repeats a handful of import rules as line-oriented greps over yt_framework/operations and yt_framework/yt. That overlaps Tach for some cases (for example operations must not import core). Keep both in sync: Tach is authoritative for the full dependency graph; the pytest module exists so failures cite concrete file and line, which is easier to read in CI than a graph edge alone.
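To illustrate what a line-oriented import check looks like, here is a minimal sketch. It is not the code in tests/test_architecture_boundaries.py: the function name find_violations and the single rule (no imports of yt_framework.core) are assumptions chosen to mirror the example in the paragraph above.

```python
import re
from pathlib import Path

# Hypothetical rule for illustration: flag any import of yt_framework.core.
FORBIDDEN = re.compile(r"^\s*(?:from|import)\s+yt_framework\.core\b")


def find_violations(root: Path) -> list[str]:
    """Return 'path:line: source' entries for every forbidden import under root."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if FORBIDDEN.match(line):
                hits.append(f"{path}:{number}: {line.strip()}")
    return hits
```

A test would assert that find_violations(Path("yt_framework/operations")) is empty; on failure the message points at the exact file and line, which is the readability benefit described above.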

If your change adds or removes imports across those boundaries, update tach.toml in the same branch. Run tach check after substantive edits; if the graph drifted, run tach sync and then trim redundant depends_on entries so exact stays satisfied. Run tach check-external when you touch third-party imports so they stay aligned with pyproject.toml.

Widening excludes, turning modules into utilities, or disabling checks requires maintainer agreement—do not do that to get a green build.

Code Organization Principles

  • Separation of Concerns: Keep core logic, operations, and utilities separate
  • Reusability: Write code that can be reused across different stages
  • Simplicity: Prefer simple, readable solutions over complex ones
  • Consistency: Follow existing patterns and conventions

Code Style and Conventions

Python Style

  • Follow PEP 8 style guidelines
  • Use Ruff for formatting and linting (line length 88, Python 3.11; see [tool.ruff] in pyproject.toml). Lint uses select = ["ALL"] with a small documented ignore list.
  • Use BasedPyright for strict static typing (pyrightconfig.json). The checked tree is yt_framework and ytjobs (see include / exclude there); tests and examples/ are excluded from that pass.
  • Use Vulture for unused code at 100% confidence ([tool.vulture] in pyproject.toml). If static analysis flags code that is used dynamically, add a small whitelist module and list it under [tool.vulture] paths (see Vulture’s docs).
  • Use Xenon (Radon-backed) for cyclomatic complexity on yt_framework and ytjobs. Thresholds are --max-absolute=A, --max-modules=A, --max-average=A (see .pre-commit-config.yaml and the CI lint job). See Xenon and Radon ranks.
  • Use Tach (tach-org/tach) so imports between yt_framework.* and ytjobs.* match tach.toml: explicit depends_on per module, layers, no ytjobs → yt_framework edges, and tach check-external against declared runtime dependencies. Tests, examples, docs, and tools/ stay out of the graph.
  • Use type hints where appropriate
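The Xenon thresholds above are expressed as Radon letter ranks. As a rough guide to what rank A permits, the sketch below maps a cyclomatic complexity score to its rank using the ranges from Radon's documentation (A: 1-5, B: 6-10, C: 11-20, D: 21-30, E: 31-40, F: 41 and above). The function name radon_rank is hypothetical; Xenon computes these ranks itself.

```python
def radon_rank(complexity: int) -> str:
    """Map a cyclomatic complexity score to Radon's letter rank."""
    if complexity < 1:
        raise ValueError("cyclomatic complexity starts at 1")
    # Upper bounds of each rank, in ascending order.
    for upper, rank in ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")):
        if complexity <= upper:
            return rank
    return "F"
```

With --max-absolute=A, any single block scoring 6 or more (rank B or worse) would fail the check.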

Naming Conventions

  • Classes: PascalCase (e.g., BaseStage, DefaultPipeline)
  • Functions and variables: snake_case (e.g., write_table, config_path)
  • Constants: UPPER_SNAKE_CASE (e.g., DEFAULT_MODE)
  • Private methods: Prefix with underscore (e.g., _internal_method)
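These conventions interact with the pre-commit policy's default limit of 5 underscore-separated segments per bound name. The sketch below shows one plausible way to find over-long names with the ast module. It is not the logic in scripts/precommit/run.py: which nodes count as "bound names" and how leading underscores are treated are assumptions, and the function names are hypothetical.

```python
import ast


def segment_count(name: str) -> int:
    """Count non-empty underscore-separated segments (leading underscores ignored)."""
    return len([part for part in name.split("_") if part])


def long_names(source: str, max_segments: int = 5) -> list[str]:
    """Return bound names in source whose segment count exceeds max_segments."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            found.add(node.name)  # function and class definitions
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            found.add(node.id)  # assignment targets
        elif isinstance(node, ast.arg):
            found.add(node.arg)  # function parameters
    return sorted(name for name in found if segment_count(name) > max_segments)
```

Under these assumptions, _internal_method counts as two segments, while a name such as load_raw_user_event_batch_rows (six segments) would be reported.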

Docstrings

Use Google-style docstrings:

def write_table(self, table_path: str, rows: list) -> None:
    """Write rows to a YT table.

    Args:
        table_path: Path to the YT table (e.g., "//tmp/my_table")
        rows: List of dictionaries representing table rows

    Raises:
        ValueError: If table_path is invalid
    """
    ...

Import Organization

Organize imports in this order:

  1. Standard library imports
  2. Third-party imports
  3. Local application imports

Example:

import os
from pathlib import Path
from typing import Optional

import yt.wrapper as yt  # provided by the ytsaurus-client package
from omegaconf import DictConfig

from yt_framework.core.stage import BaseStage
from yt_framework.operations.table import TableOperation

File Structure Conventions

  • One class per file (when possible)
  • Keep files focused and cohesive
  • Use __init__.py to expose public API
  • Place tests in the tests/ directory

Testing

Testing Approach

YT Framework uses pytest for testing (available as a dev dependency). Alongside the main suite, you can use:

  1. Dev Mode Testing: Use dev mode to test changes locally
  2. Example Pipelines: Run existing examples to verify compatibility
  3. Manual Testing: Create test pipelines to exercise new features
  4. Real cluster integration tests (optional): pytest packages under tests/integration/yt_cluster/ and tests/integration/examples_cluster/; see docs/testing/yt-cluster-integration.md and docs/testing/example-pipelines.md

Running Tests

Run the suite with:

# Run all tests
conda run -n yt-framework -- pytest

# Run with coverage (matches CI; excludes real-cluster marker `yt_cluster`)
conda run -n yt-framework -- pytest -m "not yt_cluster" --cov=yt_framework --cov=ytjobs --cov-report=json:coverage.json
conda run -n yt-framework -- python scripts/coverage/check_line_coverage.py coverage.json

# Same checks as the pre-commit `pytest` hook (pytest + JSON report + line gate)
conda run -n yt-framework -- python scripts/coverage/run_pytest_line_gate.py

# Run specific test file
conda run -n yt-framework -- pytest tests/test_stage.py
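To make the line gate concrete, the sketch below reads a coverage.py JSON report and lists files that still have missing lines. It is an illustration of the idea, not the contents of scripts/coverage/check_line_coverage.py; the function name is hypothetical, and it relies only on the files and missing_lines keys of the coverage.py JSON report.

```python
import json
from pathlib import Path


def files_with_missing_lines(report_path: Path) -> dict[str, list[int]]:
    """Map each measured file to its uncovered line numbers, omitting fully covered files."""
    report = json.loads(report_path.read_text(encoding="utf-8"))
    return {
        name: data["missing_lines"]
        for name, data in report.get("files", {}).items()
        if data.get("missing_lines")
    }
```

A gate built on this would exit with a non-zero status whenever files_with_missing_lines(Path("coverage.json")) is non-empty.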

Real cluster integration tests (optional)

If YT_PROXY and YT_TOKEN are available from a repo-root yt-cluster-test.env file (see yt-cluster-test.example.env), from YT_FRAMEWORK_CLUSTER_TEST_ENV, or from the environment, pytest collects tests/integration/yt_cluster/ and tests/integration/examples_cluster/ on your machine. Those directories are ignored when CI=true (typical on CI hosts), and GitHub Actions additionally runs pytest -m "not yt_cluster", so real-cluster tests never execute there even if secrets were misconfigured. Without credentials and outside CI, the same packages are skipped at collection time.

Run only those tests:

conda run -n yt-framework -- pytest -m yt_cluster -xvs

Do not commit real tokens; *.env is gitignored except *example.env. Jobs in these tests rely on the cell default Docker image (no YT_TEST_DOCKER_IMAGE).

If you keep yt-cluster-test.env in your clone and run the full pytest command locally, cluster tests will run too. To avoid hitting a cell, unset those credentials for that run, narrow markers, or skip cluster tests as described in docs/testing/yt-cluster-integration.md.
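The collection rules in this section can be summarized as a small decision function. This is a simplified sketch, not the repository's conftest logic: it assumes credentials have already been resolved into a mapping (it does not load yt-cluster-test.env or the file named by YT_FRAMEWORK_CLUSTER_TEST_ENV), and the function name is hypothetical.

```python
from collections.abc import Mapping


def should_collect_cluster_tests(env: Mapping[str, str]) -> bool:
    """Decide whether real-cluster test packages would be collected."""
    # CI hosts never run cluster tests, regardless of credentials.
    if env.get("CI", "").lower() == "true":
        return False
    # Outside CI, both credentials must be present and non-empty.
    return bool(env.get("YT_PROXY")) and bool(env.get("YT_TOKEN"))
```

Passing a copy of os.environ with YT_PROXY and YT_TOKEN removed is one way to see why unsetting the credentials for a run keeps the full pytest command away from the cell.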

CI (GitHub Actions)

The workflow .github/workflows/ci.yml runs Ruff, Vulture, Xenon, tach check, strict BasedPyright, and pytest -m "not yt_cluster" with coverage over yt_framework and ytjobs on every push to any branch and on pull requests targeting main or dev (Python 3.11, pip install -e ".[dev]"). That pytest run includes dev-tier example pipeline smoke tests under tests/integration/example_pipelines/. After pytest writes coverage.json, CI runs python scripts/coverage/check_line_coverage.py coverage.json, which fails if any statement under those two packages is still marked missing (branch coverage is collected separately and is not part of that gate). Real YT cluster tests are excluded because the runner has no cell access. To require a green check before merging, configure branch protection on GitHub and add lint, typecheck, and test as required status checks.

Coverage badge (maintainers)

The README coverage badge is powered by shields.io endpoint badges reading JSON from a public GitHub Gist. CI on push to main updates that file when repository configuration is present. Forks and PRs from forks do not run the gist step.

Complete these steps in order (each step is one action):

  1. Create a classic personal access token with scope gist only: GitHub → Settings → Developer settings → Personal access tokens. Store the token securely after creation.

  2. Create a public gist (Your gists → create). Add a single file named yt-framework-coverage.json with this content: {"schemaVersion":1,"label":"coverage","message":"0%","color":"lightgrey"}. Save the gist.

  3. Copy the gist ID from the gist URL https://gist.github.com/<you>/<GIST_ID> (the hex segment after your username).

  4. In the yt-framework repository: Settings → Secrets and variables → Actions → Variables → New repository variable. Name: COVERAGE_GIST_ID. Value: the gist ID from step 3. Save.

  5. In the same repository: Secrets and variables → Actions → Secrets → New repository secret. Name: COVERAGE_GIST_TOKEN. Value: the PAT from step 1. Save.

  6. Confirm the gist owner and the PAT owner are the same GitHub user (or that the PAT can edit that gist).

  7. Edit README.md: in the coverage badge URL, replace YOUR_GIST_ID with the gist ID from step 3 (leave the rest of the shields.io URL unchanged).

  8. Push or merge to main and confirm in Actions that CI succeeded; open the gist and check that yt-framework-coverage.json shows the real percentage.

  9. If the README badge looks stale, hard-refresh the page or wait briefly; shields.io and proxies may cache images. The badge URL includes cacheSeconds=60 to limit caching.
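The gist file from step 2 uses the shields.io endpoint badge fields schemaVersion, label, message, and color. The sketch below builds such a payload from a coverage percentage. How CI actually computes and uploads the file is not shown here; the function name and the color thresholds are assumptions for illustration.

```python
import json


def badge_payload(percent: float) -> dict:
    """Build a shields.io endpoint payload for a coverage percentage."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # Hypothetical thresholds; pick whatever the project prefers.
    if percent >= 90:
        color = "brightgreen"
    elif percent >= 75:
        color = "yellow"
    else:
        color = "red"
    return {
        "schemaVersion": 1,
        "label": "coverage",
        "message": f"{percent:.0f}%",
        "color": color,
    }


if __name__ == "__main__":
    print(json.dumps(badge_payload(100)))
```

The resulting JSON has the same shape as the placeholder content of yt-framework-coverage.json, with the message and color filled in.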

Testing with Dev Mode

Dev mode is ideal for testing because it:

  • Simulates YT operations locally
  • Doesn't require YT cluster access
  • Provides fast feedback
  • Creates reproducible test environments

Example:

cd examples/01_hello_world
# Ensure config.yaml has mode: "dev"
python pipeline.py

Testing with Examples

Catalog and subprocess checks live in examples/manifest.yaml; see docs/testing/example-pipelines.md.

From the repo root (uses the same pytest selection as CI for dev-tier demos):

conda run -n yt-framework -- pytest tests/integration/example_pipelines/test_smoke.py -m examples --tb=short

Optional prod demos on a real cell (same credential rules as cluster IT) are documented in docs/testing/yt-cluster-integration.md.

You can still run a single tree by hand, for example:

cd examples/01_hello_world
python pipeline.py

Documentation

Updating Documentation

Documentation lives in the docs/ directory:

  • Main docs: docs/index.md - Installation and quick start
  • Guides: docs/pipelines-and-stages.md, docs/configuration/index.md, docs/operations/, docs/advanced/, etc.
  • API reference (framework): docs/reference/api.md - yt_framework modules (Sphinx autodoc)
  • YT jobs library: docs/reference/ytjobs.md — job-side ytjobs package (runs on cluster workers)
  • Environment variables: docs/reference/environment-variables.md — dev, driver, and sandbox vars
  • Troubleshooting: docs/troubleshooting/index.md

After substantive doc edits, build locally with the same environment: make -C docs html (requires the docs extra — included in pip install -e ".[dev,docs]" above; Python 3.11+).

When adding features:

  1. Update relevant documentation files
  2. Add examples if applicable
  3. Update API reference if adding public APIs
  4. Add troubleshooting entries for common issues

Adding Examples

Examples are valuable for demonstrating features:

  1. Create a new directory in examples/ with a descriptive name
  2. Follow the structure of existing examples
  3. Include a README.md explaining what the example demonstrates
  4. Add the example to the main README.md examples list

Docstring Standards

  • Document all public classes and methods
  • Include parameter descriptions and types
  • Document return values
  • Include usage examples for complex functions
  • Document exceptions that may be raised

Submitting Contributions

Creating Issues

Before creating an issue:

  1. Search existing issues to avoid duplicates
  2. Check documentation to ensure it's not already covered
  3. Verify it's a bug or clearly describe the feature request

Bug Reports

Include:

  • Clear description of the issue
  • Steps to reproduce
  • Expected vs actual behavior
  • Environment details (Python version, OS, etc.)
  • Error messages or logs (if applicable)
  • Minimal example demonstrating the issue (if possible)

Feature Requests

Include:

  • Clear description of the feature
  • Use case and motivation
  • Proposed solution (if you have one)
  • Alternatives considered (if any)

Forking and Branching

  1. Fork the repository on GitHub

  2. Create a branch for your changes:

    git checkout -b feature/my-feature-name
    # or
    git checkout -b fix/bug-description
  3. Make your changes following the guidelines above

  4. Commit your changes (see commit message guidelines below)

  5. Push to your fork:

    git push origin feature/my-feature-name

Commit Message Guidelines

Write clear, descriptive commit messages:

Examples:

feat: add support for custom Docker images in Map operations
fix: fix stage discovery for nested directories
docs: update documentation with examples
test: add tests for new feature
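The examples above share a "type: summary" shape. If you want to check a subject line locally, a small pattern match is enough. This is only a sketch: the project does not state that this format is enforced, the allowed types are simply those appearing in the examples, and the function name is hypothetical.

```python
import re

# Types taken from the examples above; extend the group as needed.
SUBJECT_PATTERN = re.compile(r"^(feat|fix|docs|test): \S.*$")


def is_well_formed(subject: str) -> bool:
    """Return True if the subject line looks like 'type: summary'."""
    return SUBJECT_PATTERN.match(subject) is not None
```

For instance, "fix: fix stage discovery for nested directories" passes, while "update stuff" does not.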

Pull Request Process

Pull requests without up-to-date documentation will not be merged.

If your changes affect user-facing functionality, APIs, configuration, or behavior, you must update the relevant documentation. This includes:

  • User guides in docs/
  • API reference in docs/reference/api.md
  • Example code and README files
  • Inline code documentation (docstrings)

  1. Update your branch with the latest changes from main (this assumes a remote named upstream that points at the main repository):

    git checkout main
    git pull upstream main
    git checkout feature/my-feature-name
    git rebase main
  2. Ensure your code follows style guidelines

  3. Test your changes using dev mode and examples

  4. Update documentation if needed (see Documentation section)

  5. Create a pull request on GitHub with:

    • Clear title and description
    • Reference to related issues (if any)
    • Summary of changes
    • Testing performed

Pull Request Checklist

Before submitting, ensure:

  • Code follows style guidelines
  • Changes are tested (dev mode and/or examples)
  • Documentation is updated
  • Commit messages are clear
  • No merge conflicts with main
  • Examples still work (if applicable)
  • No sensitive data (credentials, tokens) in commits

Additional Guidelines

License

By contributing to YT Framework, you agree that your contributions will be licensed under the same license as the project.

Maintainers

For questions or concerns, contact the maintainers:

Links

Getting Help

  • Check the Troubleshooting Guide for common issues
  • Review existing Examples for usage patterns
  • Open an issue for bugs or feature requests
  • Contact maintainers for questions

Thank you for contributing to YT Framework!