CategoricalData/hydra

Hydra

Hydra is a strongly typed intermediate language for data, schemas, and code. Programs and domain models written in Hydra map seamlessly to major programming languages like Java, Scala, and Python, to data exchange formats like Protobuf, Avro, and JSON, and to graph data models like RDF and labeled property graphs.

Hydra has been used in production at Microsoft for data modeling, validation, and transforms; its predecessor Dragon drove data integration and graph construction at Uber. Expressive enough to define and compile its own kernel, Hydra is built on the LambdaGraph data model, which establishes an isomorphism between labeled hypergraphs and typed lambda calculus: in Hydra, programs are graphs, and graphs are programs (see The LambdaGraph isomorphism).

Use cases

  • Translingual programming. Write a program or domain model once in a Hydra DSL, using the host language you are most comfortable with, and the same logic becomes available in every other supported language, with test-driven guarantees of semantic equivalence. The Hydra kernel is the most thoroughly exercised example: a working programming language, with its tests, ported across seven languages from a single source of truth.
  • Graph construction. Hydra supports TinkerPop-style property graphs as well as RDF and SHACL, and has been used in combination with the ISO/IEC GQL standard. Hydra provides DSLs for defining schemas and mappings, as well as tools for validating schemas and data, and moving them seamlessly into and out of the graph formats.
  • Data integration. Hydra includes "coders" (encoders+decoders) for many data and schema languages which you can easily compose together to build data transform pipelines. Some of the currently supported languages and formats include Protobuf, Avro, JSON and JSON Schema, YAML, RDF formats including N-Triples, GraphQL, and simple tabular data (CSV/TSV).
  • Computational graphs. As an ontology language, Hydra has deep support for parametric polymorphism, as well as embedding of computational elements within a graph (sometimes called computational knowledge graphs). This follows naturally from the programs-are-graphs framing. See the KGC 2024 presentation Graphs, logics, and lambda calculus for examples.
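
To make the idea of composable coders from the data integration use case more concrete, here is a minimal sketch in plain Python. It does not use Hydra's actual API: the `Coder` class, its `then` method, and the two example coders are hypothetical names introduced only for illustration. The sketch assumes a coder is a pair of functions (encode and decode) and that a pipeline is built by chaining such pairs.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar
import json

A = TypeVar("A")
B = TypeVar("B")
C = TypeVar("C")


@dataclass(frozen=True)
class Coder(Generic[A, B]):
    """Hypothetical two-way coder: encode maps A to B, decode maps B back to A."""
    encode: Callable[[A], B]
    decode: Callable[[B], A]

    def then(self, other: "Coder[B, C]") -> "Coder[A, C]":
        """Compose two coders into a single coder from A to C."""
        return Coder(
            encode=lambda a: other.encode(self.encode(a)),
            decode=lambda c: self.decode(other.decode(c)),
        )


# Illustrative coders only; these are not Hydra's built-in coders.
# Rows are (name, age) tuples, records are dicts, and the wire format is JSON text.
row_to_record = Coder(
    encode=lambda row: {"name": row[0], "age": row[1]},
    decode=lambda rec: (rec["name"], rec["age"]),
)
record_to_json = Coder(
    encode=lambda rec: json.dumps(rec, sort_keys=True),
    decode=json.loads,
)

# Chain the two stages into one tabular-to-JSON pipeline.
row_to_json = row_to_record.then(record_to_json)

text = row_to_json.encode(("Ada", 36))
assert row_to_json.decode(text) == ("Ada", 36)  # the round trip restores the row
```

Because each stage carries its own inverse, the composed pipeline can be run in either direction without writing a separate decoder for the whole chain.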

Translingual features

Hydra's most distinctive properties come from its graph foundations, and from being a single programming language kernel that runs natively in multiple host languages. Its ability to translate any valid program -- including its own kernel -- into any supported language distinguishes it from conventional polyglot tooling.

Hydra is mutually self-hosting: starting from any one of its current implementations, the kernel can be regenerated into another host language, and that regenerated implementation can in turn regenerate the first — without dependency on the original source language. Seven implementations have this property today — Haskell (Hydra's original bootstrapping language), Java, Python, Scala, and three dialects of Lisp (Clojure, Scheme, and Common Lisp) — and all of them pass the common test suite under every bootstrapping path. Additional ports are in active development; see the Implementations table below for the full set.

The common test suite is what makes translingual programming load-bearing rather than aspirational: it ensures that every program behaves the same when translated into each supported language. This is essential in heterogeneous environments where the same logic must behave identically in more than one programming language. Examples include Gremlin language variants in Apache TinkerPop, where the same queries and programs must produce identical results against the same data in different runtime environments; database clients that expose the same API and validation logic in different languages; and heterogeneous distributed systems.
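
The role of a shared test suite can be sketched as follows. This is an illustration of the general technique only, not Hydra's test harness: the two `capitalize_*` functions are hypothetical stand-ins for the same logic as generated into two different hosts, and in a real setup each would run in its own language runtime.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical stand-ins for one function as generated into two host languages.
def capitalize_reference(s: str) -> str:
    return s[:1].upper() + s[1:]


def capitalize_port(s: str) -> str:
    return s if not s else s[0].upper() + s[1:]


IMPLEMENTATIONS: Dict[str, Callable[[str], str]] = {
    "reference": capitalize_reference,
    "port": capitalize_port,
}

# Shared test cases as (input, expected output), defined once for every implementation.
TEST_CASES: List[Tuple[str, str]] = [
    ("", ""),
    ("hydra", "Hydra"),
    ("Graph", "Graph"),
]


def run_common_suite() -> List[str]:
    """Return a description of every failure; an empty list means all implementations agree."""
    failures = []
    for name, impl in IMPLEMENTATIONS.items():
        for arg, expected in TEST_CASES:
            actual = impl(arg)
            if actual != expected:
                failures.append(f"{name}: {arg!r} -> {actual!r}, expected {expected!r}")
    return failures


failures = run_common_suite()
assert failures == []
```

The test cases are defined once and applied to every implementation, so any divergence between hosts shows up as a named failure rather than as a silent behavioral difference.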

Releases

The latest Hydra release is 0.15.0. Published artifacts:

| Channel | Packages |
| --- | --- |
| Hackage (Haskell) | hydra |
| Maven Central (Java) | hydra-kernel, hydra-java, hydra-pg, hydra-rdf |
| PyPI (Python) | hydra-kernel, hydra-python, hydra-pg, hydra-rdf |
| conda-forge (Python) | hydra-kernel, hydra-python, hydra-pg, hydra-rdf |

All Hydra packages share a single version number; see the CHANGELOG for release history and the release process for how releases are built and published.

Status

Hydra is preparing for its 1.0 release, with the intention of becoming an Apache Incubator project and integrating more directly with Apache TinkerPop and other projects in the Apache ecosystem. The last few releases have focused on production-hardening and forward compatibility.

Implementations

A Hydra head is the mapping of the Hydra kernel into a host language, together with the primitive functions and bootstrapping infrastructure required to make Hydra a complete programming language on that host. Each head is an independent point of entry to Hydra: you can pick the head you're most comfortable with and ignore the others.

| Head | Status | Notes |
| --- | --- | --- |
| Haskell | Complete | Hydra's original bootstrapping language and reference implementation. (Haskell) |
| Java | Complete | (Java) |
| Python | Complete | (Python) |
| Scala | Complete | (Scala) |
| Clojure | Complete | A Lisp dialect on the JVM. (Clojure) |
| Scheme | Complete | (Scheme) |
| Common Lisp | Complete | (Common Lisp) |
| Emacs Lisp | In progress | (Emacs Lisp) |
| TypeScript | In progress | Passes the common test suite as a target, and as a host can bootstrap every other head except Java; see the stack-limit caveat for details. (TypeScript) |
| Go | In progress | (Go) |
| Rust | In progress | Coder lives in hydra-ext; the Rust head has not yet been split into its own package. (Rust) |
| Coq | In progress | Generation-only target; there is no Coq-side runtime. (Coq) |
| WebAssembly | In progress | (WebAssembly) |
| C++ | In progress | Coder lives in hydra-ext; the C++ head has not yet been split into its own package. (C++) |

Packages

A Hydra package is a unit of DSL source code organized around a coherent area of functionality. Most packages are language-independent (described in a DSL, generated into every supported host); a few are domain models that build on the kernel.

| Package | Purpose |
| --- | --- |
| hydra-kernel | Core types, terms, type inference, validation, primitives, and the coder framework. Every other package depends on this. |
| hydra-ext | Extension coders: Avro, Protobuf, GraphQL, Pegasus/PDL, JSON Schema, YAML, SQL, C++, Rust, Csharp, plus domain models. |
| hydra-pg | Property-graph model and coders (GraphSON, Cypher, GQL, TinkerPop, Graphviz, RDF mappings); the PG validator. |
| hydra-rdf | RDF 1.1, SHACL, OWL 2, ShEx, and XML Schema syntax models; N-Triples serialization. |
| hydra-bench | Synthetic inference benchmark workloads. Opt-in: not part of default sync. |
| hydra-haskell, hydra-java, hydra-python, hydra-scala, hydra-lisp, hydra-typescript, hydra-go, hydra-coq, hydra-wasm | Per-language coder packages (DSL sources for translating Hydra modules to each target). See the Implementations table for head status. |

Bindings

A binding is a hand-written, host-specific artifact that connects a Hydra package to an external system. Bindings sit outside the packages/heads/dist pipeline; they have no DSL definition. See Code organization § About bindings/ for the rules.

| Binding | Purpose |
| --- | --- |
| hydra-rdf4j | Eclipse rdf4j integration for hydra-rdf. |
| hydra-neo4j | Cypher and openGQL parsers via ANTLR, converting to hydra.pg.query.*. |
| hydra-pg-dsl | Java fluent builders (Graphs, Queries, Merging) for hydra-pg. |

Resources

Getting started and using Hydra

  • Getting started — using Hydra as a library from your own project.
  • DSL guide — writing Hydra programs and domain models using the embedded DSLs. See also the Java and Python variants.
  • Demos — runnable demos illustrating Hydra's capabilities, with input data and expected output.
  • Troubleshooting — common failure modes and how to diagnose them.

Concepts and design

  • Concepts — core type system, terms, modules, and design principles.
  • Property graphs — Hydra's hypergraph foundation and its relationship to TinkerPop-style property graphs.
  • RDF support — modeling, validation, and emission for RDF, SHACL, and related semantic-web formats.
  • JSON format — the language-neutral interchange format for kernel modules.

For contributors

  • Implementation guide — architectural deep dive into kernel modules, DSLs, primitives, and coders.
  • Code organization — the packages/, heads/, and dist/ layout.
  • Coding style — guiding principles, ordering conventions, common mistakes.
  • Developer recipes — step-by-step guides for adding primitives, extending core types, refactoring, and similar tasks.
  • Build system — pipeline phases, caching layers, and what triggers regeneration.

Talks, demos, and community
