Hydra-Python

This package contains the Python coder DSL sources: Python modules that describe how to translate Hydra modules into Python source code. The runnable Python head (hand-written primitives, DSL runtime, pyproject.toml, test runner) lives in heads/python/. The generated Python kernel lives in dist/python/hydra-kernel/.

Hydra is a type-aware data transformation toolkit which aims to be highly flexible and portable. It has its roots in graph databases and type theory, and provides APIs in Haskell, Java, Python, Scala, TypeScript, and Lisp. See the main Hydra README for more details.

Getting started

Hydra-Python requires Python 3.12 or later.

Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh

Create the Python virtual environment in the Python head directory:

cd heads/python
uv venv --python 3.12
source .venv/bin/activate

Install the dependencies:

uv sync

Documentation

For comprehensive documentation about Hydra's architecture and usage, see:

Concepts - Core concepts and type system
Implementation - Implementation guide
Code Organization - The packages/, heads/, dist/ layout
Testing - Common test suite documentation
Developer Recipes - Step-by-step guides
Syncing Hydra-Python - Regenerating the Python kernel from the JSON modules

Testing

Hydra-Python has two types of tests: the common test suite (shared across all Hydra implementations) and Python-specific tests. See the Testing wiki page for comprehensive documentation.

Common test suite

The common test suite (hydra.test.testSuite) ensures parity across all Hydra implementations. Passing all common test suite cases is the criterion for a true Hydra implementation.

To run all tests (pytest runs in heads/python/; the commands below start from the repository root):

cd heads/python && pytest

To run only the common test suite:

cd heads/python && pytest src/test/python/test_suite_runner.py

The test suite is generated from Hydra DSL sources and includes:

Primitive function tests (lists, strings, math, etc.)
Case conversion tests (camelCase, snake_case, etc.)
Type inference tests
Type checking tests
Evaluation tests
JSON coder tests
Rewriting and hoisting tests

Python-specific tests

Python-specific tests validate implementation details and Python-specific functionality. These are located in heads/python/src/test/python/ alongside the common test suite runner.

To run a specific test file:

cd heads/python && pytest src/test/python/test_grammar.py

To match a specific test by name:

cd heads/python && pytest -k test_grammar

To see printed outputs, use the -s flag:

cd heads/python && pytest -s

Code organization

Hydra's Python code is split across several locations (see the Code organization wiki page for the full picture):

This package (packages/hydra-python/src/main/python/hydra/sources/python/) — the Python coder DSL sources (written in Python). These are the source of truth for the hydra.python.* modules (syntax.py, language.py, coder.py, serde.py, names.py, utils.py, environment.py, testing.py, plus the _python_helpers.py / _kernel_refs.py support modules).
Python kernel overlay (overlay/python/hydra-kernel/src/main/python/) — hand-written Python kernel runtime, overlaid onto dist/python/hydra-kernel/ by bin/copy-kernel-runtime.sh so the published hydra-kernel wheel is self-contained
- hydra/lib/ — primitive function implementations
- hydra/dsl/ — DSL utilities (FrozenDict, Maybe, ...)
- hydra/python/util/ — ConsList, Lazy, PersistentMap, PersistentSet
- hydra/sources/libraries.py — primitive registration
Python head (heads/python/src/main/python/) — bootstrap layer above the kernel (bootstrap.py, generation.py, the hydra.python coder package). pyproject.toml lives in heads/python/.
Generated Python kernel (dist/python/hydra-kernel/src/main/python/)
- hydra/core.py — core types (Term, Type, Literal, ...)
- hydra/graph.py, hydra/packaging.py — graph and packaging structures
- hydra/coders.py — type adapters and coder framework
- hydra/reduction.py, hydra/rewriting.py, hydra/hoisting.py — term transformations
- hydra/inference.py, hydra/checking.py — type inference and checking
- Generated from the kernel DSL sources using the Python coder
Generated Python test suite (dist/python/hydra-kernel/src/test/python/)
- Common tests ensuring parity with Haskell, Java, Scala, and Lisp

Generate Python code

Python code generation has two stages: first the Python coder modules' DSL sources are exported to JSON (Phase 1), then the JSON is loaded by the Python host and used to generate dist/python/hydra-kernel/ (Phase 2). The two stages live in different scripts and can be invoked independently.

Phase 1: regenerate `dist/json/hydra-python/` from the Python DSL sources

bin/generate-hydra-python-from-python.sh is the self-hosting entry point: it runs the Python DSL sources in this package through the Python host and writes dist/json/hydra-python/.

# Regenerate hydra-python JSON from packages/hydra-python/src/main/python/hydra/sources/python/
bin/generate-hydra-python-from-python.sh

# Same, with byte-compare against the existing canonical
bin/generate-hydra-python-from-python.sh --compare

# Use PyPy for ~4x faster generation (CPython is the default)
bin/generate-hydra-python-from-python.sh --pypy

# Force a rebuild of the Python host (kernel JSON + dist/python/hydra-kernel) first
bin/generate-hydra-python-from-python.sh --force-rebuild

The script:

Runs bin/sync-python.sh to ensure dist/python/hydra-kernel/ and dist/python/hydra-python/ are current (these are the only trees the Python DSL → JSON driver reads). This step is skipped when HYDRA_IN_SYNC=1 is set, so that bin/sync.sh Phase 5 can invoke the script without recursing.
Runs bin/update-python-json.py, which loads the kernel universe from dist/json/hydra-kernel/, imports the Python DSL source modules, infers types, and writes the resulting JSON.

End-to-end is ~110 seconds under PyPy (faster than the Haskell incremental pipeline) and ~500 seconds under CPython, once dist/ is current. See bin/update-python-json.md for background.

Note: bin/sync.sh Phase 5 invokes generate-hydra-python-from-python.sh automatically — the native Python DSL path is the sole source of truth (the legacy Haskell DSL copies under packages/hydra-python/src/main/haskell/ were deleted in #346). See claude/pitfalls.md for the HYDRA_IN_SYNC convention around wrapper-script self-syncing.

Phase 2: regenerate `dist/python/` from the JSON

The recommended end-to-end script is:

bin/sync-python.sh

(equivalent to bin/sync.sh --hosts python --targets python)

This will:

Generate / refresh dist/json/ from the native Python DSL sources
Generate the Python kernel into dist/python/hydra-kernel/src/main/python
Generate the kernel tests into dist/python/hydra-kernel/src/test/python
Run the pytest suite

Validate generated code

find dist/python/hydra-kernel/src -name "*.py" -exec python3 -m py_compile {} +
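
The same check can be driven from Python when a per-file report is more useful than a single exit status. This is a minimal sketch using only the standard-library py_compile module; the helper name find_syntax_errors is hypothetical and not part of Hydra.

```python
import py_compile
from pathlib import Path


def find_syntax_errors(root: str) -> list[tuple[str, str]]:
    """Byte-compile every .py file under root; return (path, message) per failure."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # doraise=True raises PyCompileError instead of printing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as err:
            failures.append((str(path), err.msg))
    return failures
```

Calling find_syntax_errors("dist/python/hydra-kernel/src") from the repository root would return an empty list when every generated file compiles.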

Formatting, linting, and type checking

Install Ruff, pyright, and pytest, e.g. on macOS:

brew install ruff
brew install pyright
brew install pytest

All of these commands run from the heads/python/ directory (files/directories can also be specified as arguments).

Formatting

Format the hand-written Python code:

ruff format

Linting

Run the linter:

ruff check

Fix fixable linting errors (e.g. removing unused imports):

ruff check --fix

Static type checking

Run the type checker:

pyright

Numeric types

Hydra's decimal type is implemented as Python decimal.Decimal with the default 28-digit context precision. Operations exceeding this precision round per the active context; users requiring higher precision should adjust decimal.getcontext().prec before performing arithmetic. This differs from Haskell Scientific and Java BigDecimal (which are effectively unbounded for exact operations) but matches Python's standard decimal behavior.
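
The precision behavior can be seen with plain decimal.Decimal values. This sketch uses only the standard library; decimal.localcontext is shown as a scoped alternative to mutating decimal.getcontext().prec globally, and it assumes the surrounding code has not already changed the default context.

```python
from decimal import Decimal, getcontext, localcontext

# Under the default context, results carry getcontext().prec significant digits.
default_prec = getcontext().prec
third = Decimal(1) / Decimal(3)

# Raise the precision for one block without touching the global context.
with localcontext() as ctx:
    ctx.prec = 50
    precise_third = Decimal(1) / Decimal(3)

digits_default = len(third.as_tuple().digits)
digits_precise = len(precise_third.as_tuple().digits)
```

Here digits_default equals the active context precision and digits_precise equals 50; once the with block exits, arithmetic rounds at the original precision again.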

Collections

Hydra-Python uses an API/implementation split for list, map, and set values, mirroring Hydra-Java:

At the type level, generated Python uses the standard collections.abc abstract base classes — Sequence[E] for lists, Mapping[K, V] for maps, and Set[E] for sets. Public function signatures stay generic and dependency-free; callers can pass any compatible collection.
At the implementation level, generated term-level literals construct immutable collection classes from hydra.python.util: ConsList (a frozen sequence), PersistentMap (a frozen map), and PersistentSet (a frozen set). Each implements the corresponding collections.abc ABC, so ConsList IS a Sequence, PersistentMap IS a Mapping, and PersistentSet IS a Set.

These classes are thin facades over native tuple, dict, and frozenset. All mutations build a fresh native container via {**self, **other}, tuple(...), or frozenset(...) and freeze it under the immutable wrapper. The cost is a full O(n) copy on every update (no structural sharing); the benefit is C-speed inner loops. Hydra-Python has no third-party runtime dependencies beyond the standard library.

Where ordered iteration matters (hydra.lib.maps.{keys, elems, to_list}, the various *_list() extraction helpers, PersistentSet.__iter__), elements are sorted at extraction time via a fall-through comparator: natural < where it works, structural comparison for Hydra Term/Type and other complex values that don't define ordering in Python.
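
A fall-through comparator of this kind might look like the sketch below. It assumes a crude stand-in for structural comparison (type name, then repr); Hydra's real comparator inspects Term and Type structure. The names fallthrough_compare and sorted_elements are hypothetical.

```python
from functools import cmp_to_key
from typing import Any, Iterable


def fallthrough_compare(a: Any, b: Any) -> int:
    """Try natural ordering first; fall back to a structural key."""
    try:
        if a < b:
            return -1
        if b < a:
            return 1
        return 0
    except TypeError:
        # Stand-in for structural comparison: order by type name, then repr.
        ka = (type(a).__name__, repr(a))
        kb = (type(b).__name__, repr(b))
        return (ka > kb) - (ka < kb)


def sorted_elements(values: Iterable[Any]) -> list[Any]:
    """Sort at extraction time, as an ordered-iteration helper would."""
    return sorted(values, key=cmp_to_key(fallthrough_compare))
```

With this fallback, values that Python cannot compare directly (for example an int and a str) still sort deterministically instead of raising TypeError.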

The classes live under hydra.python.util rather than hydra.util because the latter is already a kernel-generated module (containing Comparison, CaseConvention, etc.) shared across all Hydra implementations. Putting the Python-runtime helpers under hydra.python.util keeps the kernel namespace intact while making the host/kernel separation explicit. hydra.python itself is a pkgutil-style namespace package so heads-side helpers and the kernel-generated hydra.python.{coder,environment,...} modules coexist cleanly.
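
The mechanism behind a pkgutil-style namespace package can be demonstrated in isolation. The sketch below builds two temporary source roots that both contribute modules to one package, using the standard pkgutil.extend_path idiom. The package name demo_ns and the root names are made up, and whether Hydra's own `__init__.py` uses exactly this line is an assumption.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Standard idiom: extend the package search path with every same-named
# package directory found on sys.path.
INIT = "__path__ = __import__('pkgutil').extend_path(__path__, __name__)\n"


def build_split_package(base: Path) -> list[str]:
    """Create two source roots that both contribute to package 'demo_ns'."""
    roots = []
    for root_name, module_name in (("kernel", "coder"), ("head", "util")):
        pkg = base / root_name / "demo_ns"
        pkg.mkdir(parents=True)
        (pkg / "__init__.py").write_text(INIT)
        (pkg / f"{module_name}.py").write_text(f"ORIGIN = {root_name!r}\n")
        roots.append(str(base / root_name))
    return roots


def load_origins() -> tuple[str, str]:
    """Import one module from each root through the shared package name."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = build_split_package(Path(tmp))
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            coder = importlib.import_module("demo_ns.coder")
            util = importlib.import_module("demo_ns.util")
            return coder.ORIGIN, util.ORIGIN
        finally:
            # Undo the sys.path and sys.modules changes made for the demo.
            for root in roots:
                sys.path.remove(root)
            for name in [n for n in sys.modules if n.split(".")[0] == "demo_ns"]:
                del sys.modules[name]
```

Both imports succeed even though demo_ns.coder and demo_ns.util live in different directory trees, which is the property that lets heads-side helpers and kernel-generated modules share one package name.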

CPython vs PyPy

The bootstrap demo and bin/run-bootstrapping-demo.sh prefer pypy3 when available and fall back to CPython 3.12+. Both interpreters pass the full test suite. For most real workloads, PyPy is the better choice — its JIT makes term-level transformation (the dominant Python-host cost) several times faster than CPython.

Rough guide:

| Workload | Faster on |
| --- | --- |
| `bin/run-bootstrapping-demo.sh` codegen | CPython by ~5% |
| Type inference on large modules (`hydra.codegen.infer_modules_given`) | PyPy by ~4× |
| `hydra.lib.*` primitive microbenchmarks | CPython by ~2.5× |

The microbench gap reflects CPython's C-level dict/frozenset/tuple operations beating PyPy's pure-Python equivalents. The inference gap reflects PyPy's JIT amortizing per-call dispatch overhead in long-running term walks. For day-to-day development you can pick whichever is convenient (CPython is usually already installed); PyPy becomes worthwhile when you hit term-level workloads measured in seconds or minutes.

Set HYDRA_PYTHON_INTERPRETER=pypy3 (or any path/name) in the environment to force a specific interpreter for the bootstrap demo.

Future enhancements

Recommendations from #233 that haven't been adopted yet. Recorded here so the design intent survives any future re-evaluation. These are deliberate non-goals today, not bugs.

`__match_args__` for structural pattern matching

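Python 3.10's match/case destructures objects with class patterns. Keyword patterns such as those below already work against any class that exposes the named attribute; declaring __match_args__ additionally enables the shorter positional form (case LiteralString(v):). The Python coder could emit this attribute on every generated dataclass-like type, letting consumers write: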

match lit:
    case LiteralString(value=v):
        return v
    case LiteralInteger(value=iv):
        match iv:
            case IntegerValueInt32(value=n):
                return n

instead of today's isinstance() chains. Cheap codegen tweak; purely additive (existing isinstance code continues to work). Should ship after other in-flight Python work is stable so the coder edit lands on a quiet baseline.

Kwargs syntax for record construction

# Today
record(Name("Person"), [field(Name("name"), string("Alice")),
                        field(Name("age"), int32(30))])

# Proposed (combined with the existing str→Name auto-coercion)
record("Person", name=string("Alice"), age=int32(30))

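Python's **kwargs preserves call-site order (guaranteed since Python 3.6 by PEP 468), so field order is stable. The list-of-field() form would remain as an overload for the programmatic case. Edge case: field names that are reserved Python keywords (such as class) cannot be passed as kwargs and need either a trailing-underscore convention or a mix of kwargs and explicit field(). Note that type is only a soft keyword and is accepted as a keyword argument name.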
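
One way the kwargs form and the trailing-underscore convention could fit together is sketched below. Field and Record are simplified stand-ins for Hydra's term constructors, and the record signature shown here is an assumption rather than an agreed design.

```python
import keyword
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:  # stand-in for the result of Hydra's field(...)
    name: str
    term: object


@dataclass(frozen=True)
class Record:  # stand-in for the result of Hydra's record(...)
    type_name: str
    fields: tuple[Field, ...]


def record(type_name: str, *explicit: Field, **kwargs: object) -> Record:
    """Build a record from explicit fields followed by keyword fields."""
    fields = list(explicit)
    for key, term in kwargs.items():  # kwargs keep call-site order
        # Trailing-underscore convention: class_ becomes class.
        if key.endswith("_") and keyword.iskeyword(key[:-1]):
            key = key[:-1]
        fields.append(Field(key, term))
    return Record(type_name, tuple(fields))
```

The underscore is stripped only when the remainder is a reserved keyword, so a field genuinely named with a trailing underscore (for example total_) is left alone.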

Decorator-based element definitions

@hydra_element(my_module, type_=T.function(T.int32(), T.string()))
def show_number(x):
    return Strings.show_int32(x)

Idiomatic Python metadata mechanism. Risk: users may expect the decorator to "compile" arbitrary Python into Hydra terms, which it can't — the decorated body still has to be a TTerm expression. Clear docs required.
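
The limitation noted above becomes concrete in a sketch. The version below assumes the decorator would call the function once with symbolic variables and record whatever term it returns; terms are plain tuples here, and ModuleSketch plus this particular hydra_element behavior are hypothetical.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ModuleSketch:
    """Hypothetical stand-in for a Hydra module under construction."""

    name: str
    elements: dict[str, tuple[Any, Any]] = field(default_factory=dict)


def hydra_element(module: ModuleSketch, type_: Any) -> Callable[[Callable], Callable]:
    def decorate(fn: Callable) -> Callable:
        params = list(inspect.signature(fn).parameters)
        # Call the body once with symbolic variables; it must *return* a term.
        term = fn(*[("var", p) for p in params])
        # Wrap the returned term in one lambda per parameter, outermost first.
        for p in reversed(params):
            term = ("lambda", p, term)
        module.elements[fn.__name__] = (type_, term)
        return fn

    return decorate
```

Because the body runs exactly once at decoration time, Python control flow inside it (if, loops over runtime data) is evaluated immediately rather than translated, which is the expectation mismatch the paragraph warns about.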

Context managers for module scoping

with hydra_module("myModule", namespace="com.example") as m:
    m.define("Person", T.record(name=T.string(), age=T.int32()))
    m.define("greet", lam("p", Strings.cat2(string("Hello, "),
                                            project("Person", "name") @ var("p"))))

Tension with Hydra's functional philosophy (modules are declarative data, not imperative side-effects). Mitigation: the context manager collects definitions and produces an immutable Module on __exit__.
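
The mitigation can be sketched with contextlib. This is one possible shape, assuming definitions are collected in a plain dict and frozen on a clean exit; Module, the collector class, and the module attribute are all hypothetical names.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Iterator, Mapping


@dataclass(frozen=True)
class Module:  # hypothetical immutable result
    name: str
    namespace: str
    definitions: Mapping[str, Any]


class _Collector:
    def __init__(self) -> None:
        self._pending: dict[str, Any] = {}
        self.module: Module | None = None  # set when the with-block exits

    def define(self, name: str, body: Any) -> None:
        if self.module is not None:
            raise RuntimeError("module is already sealed")
        self._pending[name] = body


@contextmanager
def hydra_module(name: str, namespace: str) -> Iterator[_Collector]:
    collector = _Collector()
    yield collector
    # Reached only on a clean exit: freeze a copy of the collected definitions.
    frozen = MappingProxyType(dict(collector._pending))
    collector.module = Module(name, namespace, frozen)
```

After the with block, m.module holds a frozen dataclass whose definitions mapping rejects item assignment, and further define() calls raise, so the imperative phase is confined to the block.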

Dataclass-style decorator for record type definitions

@hydra_record("com.example.Person")
class Person:
    name: str             # → T.string()
    age: int              # → T.int32()
    email: Optional[str]  # → T.optional(T.string())

Familiar to anyone who's used @dataclass. Significant scope limits: mapping Python annotations to Hydra types only works for records with simple fields; unions, wrapped types, polymorphic types have no natural Python-annotation equivalent. Would cover a subset only.
