This directory contains a complete Java implementation of Hydra. Hydra-Java passes all tests in the common test suite, ensuring identical behavior with Hydra-Haskell and Hydra-Python.
Hydra is a type-aware data transformation toolkit which aims to be highly flexible and portable. It has its roots in graph databases and type theory, and provides APIs in Haskell, Java, Python, Scala, TypeScript, and Lisp. See the main Hydra README for more details.
JavaDocs for Hydra-Java can be found here, and releases can be found on Maven Central here.
Hydra-Java requires Java 11 or later. The Gradle wrapper lives at
heads/java/gradlew; all subprojects (hydra-java, hydra-rdf4j,
hydra-neo4j, hydra-pg-dsl) are configured from there.
cd heads/java
./gradlew :hydra-java:build
To publish the resulting JAR to your local Maven repository:
cd heads/java
./gradlew :hydra-java:publishToMavenLocal
You may need to set the JAVA_HOME environment variable:
cd heads/java
JAVA_HOME=/path/to/java11/installation ./gradlew :hydra-java:build
On Apple Silicon (M1/M2/M3/M4) Macs, using an x86_64 JDK (which runs under Rosetta 2 translation) can cause ~20x slower code generation and test execution compared to a native arm64 JDK. This applies to any Java version; the critical factor is architecture, not version number.
To check your JDK's architecture:
file "$(which java)"
# arm64 = native (fast), x86_64 = Rosetta (slow)
If you see x86_64, install a native arm64 JDK. Downloads are available from Oracle, Adoptium, or via Homebrew (brew install openjdk@11).
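The same check can be made from inside the JVM. This is a minimal sketch that reads the standard `os.arch` system property; the class name `JdkArchCheck` and its helper are hypothetical, and the exact strings vary by vendor (a native Apple Silicon JDK typically reports `aarch64`, while an x86_64 JDK running under Rosetta typically reports `x86_64` or `amd64`).

```java
class JdkArchCheck {
    /** Returns true when the given os.arch value names a native ARM64 JVM. */
    static boolean isNativeArm64(String osArch) {
        return "aarch64".equals(osArch) || "arm64".equals(osArch);
    }

    public static void main(String[] args) {
        // os.arch describes the architecture the JVM itself was built for,
        // not the architecture of the underlying hardware.
        String arch = System.getProperty("os.arch");
        String verdict = isNativeArm64(arch)
            ? "native arm64"
            : "not arm64 (on Apple Silicon this means Rosetta)";
        System.out.println("os.arch = " + arch + " -> " + verdict);
    }
}
```

Running this with the same `JAVA_HOME` that Gradle uses shows which architecture the build will actually run on.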
To compare Hydra's performance across JDK versions on your machine, use the benchmark
test runner with --tag to label each run and --repeat for statistical reliability:
export JAVA_HOME=$(/usr/libexec/java_home -v 11)
bin/run-benchmark-tests.sh --hosts java --tag java11 --repeat 5
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
bin/run-benchmark-tests.sh --hosts java --tag java17 --repeat 5
# Compare results
bin/run-benchmark-tests.sh dashboard diff --old java11 --new java17
For comprehensive documentation about Hydra's architecture and usage, see:
- Concepts - Core concepts and type system
- Implementation - Implementation guide
- Code Organization - The packages/, heads/, dist/ layout
- Testing - Common test suite documentation
- Developer Recipes - Step-by-step guides
Hydra-Java has two types of tests: the common test suite (shared across all Hydra implementations) and Java-specific tests. See the Testing wiki page for comprehensive documentation.
The common test suite (hydra.test.testSuite) ensures parity across all Hydra implementations.
Passing all common test suite cases is the criterion for a true Hydra implementation.
To run all tests:
cd heads/java
./gradlew :hydra-java:test
The test suite is generated from Hydra DSL sources and includes:
- Primitive function tests (lists, strings, math, etc.)
- Case conversion tests (camelCase, snake_case, etc.)
- Type inference tests
- Type checking tests
- Evaluation tests
- JSON coder tests
- Rewriting and hoisting tests
Java-specific tests validate implementation details and Java-specific functionality.
These are located in src/test/java/ alongside the common test suite runner.
To run a specific test class:
cd heads/java
./gradlew :hydra-java:test --tests "hydra.VisitorTest"
Hydra's Java code is split across three locations (see Code organization wiki page for the full picture):
- This package (packages/hydra-java/src/main/java/hydra/sources/java/): the Java coder DSL sources (written in Java). These are the source of truth for the hydra.java.* modules (Syntax, Language, Coder, Serde, Names, Utils, Environment, Testing, plus the hand-written JavaHelpers and SourceDsl support classes).
  Legacy backup: packages/hydra-java/src/main/haskell/Hydra/Sources/Java/ still contains the older Haskell-DSL versions of these modules. They are kept as a backup through the 0.15 line and produce byte-identical dist/json/hydra-java/ output, but will be dropped before 0.16. Edits should go into the Java sources, not the Haskell ones.
- Java head (heads/java/src/main/java/): hand-written Java runtime
  - hydra/lib/: primitive function implementations
  - hydra/dsl/: Java DSL (Terms, Types, Expect, ...)
  - hydra/util/: core utilities (Either, Maybe, Pair, Lazy) plus the persistent collection helpers ConsList / PersistentMap / PersistentSet (see Collection classes under design notes)
  - hydra/tools/: framework classes (PrimitiveFunction, MapperBase, ...)
  - hydra/UpdateJavaJson.java: driver that updates dist/json/hydra-java/ from the Java DSL sources in this package (see Generate Java code)
- Generated Java kernel (dist/java/hydra-kernel/src/main/java/): code-generated from the kernel DSL sources
  - hydra/core/, hydra/graph/, hydra/packaging/, hydra/coders/, hydra/typing/, ...
  - hydra/reduction/, hydra/rewriting/, hydra/hoisting/
  - hydra/inference/, hydra/checking/
- Generated Java test suite (dist/java/hydra-kernel/src/test/java/): the common test suite compiled into Java.
Java code generation has two stages: first the Java coder modules' DSL sources are
exported to JSON (Phase 1), then the JSON is loaded by the Java host and used to
generate dist/java/hydra-kernel/ (Phase 2). The two stages live in different
scripts and can be invoked independently.
bin/generate-hydra-java-from-java.sh is the self-hosting entry point: it runs
the Java DSL sources in this package through the Java host and writes
dist/json/hydra-java/.
# Regenerate hydra-java JSON from packages/hydra-java/src/main/java/hydra/sources/java/
bin/generate-hydra-java-from-java.sh
# Same, with byte-compare against the existing canonical
bin/generate-hydra-java-from-java.sh --compare
# Force a rebuild of the Java host (kernel JSON + dist/java/hydra-kernel) first
bin/generate-hydra-java-from-java.sh --force-rebuild
The script:
- Runs bin/sync.sh to ensure every per-language dist/java/hydra-* tree is current (the gradle rollup imports hydra.python.*, hydra.haskell.*, etc., so a scoped sync-java.sh is not sufficient). Gated by HYDRA_IN_SYNC=1 so that sync.sh Phase 5 invoking us doesn't recurse. Warm-cache sync is ~3 minutes.
- Compiles the rollup ((cd heads/java && ./gradlew :hydra-java:compileHeadsExtrasJava)).
- Runs hydra.UpdateJavaJson (via bin/update-java-json.sh), which loads the kernel universe from dist/json/hydra-kernel/, discovers the Java DSL source modules via reflection, infers types for those that don't carry pre-computed type schemes (Coder ships its schemes pre-computed; see bin/update-java-json.md for the rationale), and writes the resulting JSON.
End-to-end is ~30 seconds once dist/ is current.
Note: bin/sync.sh Phase 5 invokes generate-hydra-java-from-java.sh automatically; the native Java DSL path is authoritative. The legacy Haskell DSL copy at packages/hydra-java/src/main/haskell/ remains as a bootstrap fallback (used by Phase 1 on a cold checkout) and will be retired before 0.16. See claude/pitfalls.md for the HYDRA_IN_SYNC convention around wrapper-script self-syncing.
The narrowest end-to-end script is:
bin/sync-java.sh
(equivalent to bin/sync.sh --hosts java --targets java)
This will:
- Generate / refresh dist/json/ from the legacy Haskell DSL sources
- Generate the Java kernel into dist/java/hydra-kernel/src/main/java
- Generate the default lib modules
- Generate the kernel tests into dist/java/hydra-kernel/src/test/java
sync-java.sh does not run the Java test suite. To validate against the
generated tests, run heads/java/bin/test-distribution.sh hydra-kernel
afterward (or do a full bootstrap demo via bin/run-bootstrapping-demo.sh --hosts java --targets java, which includes test runs).
Note on Phase 5 (Java self-host). sync-java.sh will also run Phase 5 (generate-hydra-java-from-java.sh), which compiles hydra.Generation and friends. That compile imports hydra.{python,haskell,lisp,typescript}.* from per-language dist/java/ trees, so on a cold checkout Phase 5 will fail until those siblings have been populated by bin/sync.sh (full matrix) or bin/sync.sh --hosts java --targets <every-language>. See claude/pitfalls.md §"gradle :hydra-java:test needs all coder language packages in dist/java/".
The Java coder DSL sources that drive Java code generation live here. A variety of techniques are used in order to materialize Hydra's core language in Java, including a pattern for representing algebraic data types which was originally proposed by Gabriel Garcia, and used in Dragon.
For example, the generated Vertex class represents a property graph vertex, and corresponds to a record type:
public class Vertex<V> {
public final hydra.pg.model.VertexLabel label;
public final V id;
public final java.util.Map<hydra.pg.model.PropertyKey, V> properties;
public Vertex (hydra.pg.model.VertexLabel label, V id, java.util.Map<hydra.pg.model.PropertyKey, V> properties) {
java.util.Objects.requireNonNull((label));
java.util.Objects.requireNonNull((id));
java.util.Objects.requireNonNull((properties));
this.label = label;
this.id = id;
this.properties = properties;
}
@Override
public boolean equals(Object other) {
if (!(other instanceof Vertex)) {
return false;
}
Vertex o = (Vertex) (other);
return label.equals(o.label) && id.equals(o.id) && properties.equals(o.properties);
}
@Override
public int hashCode() {
return 2 * label.hashCode() + 3 * id.hashCode() + 5 * properties.hashCode();
}
// ... with* methods for immutable updates
}
See Vertex.java for the complete class, as well as the Vertex type in Pg/Model.hs for comparison. Both files were generated from the property graph model defined here.
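The `with*` methods elided from the listing above follow the usual immutable-update pattern: each one returns a new instance with a single field replaced and leaves the receiver untouched. The sketch below illustrates the idea on a hypothetical two-field `Point` class that is not part of Hydra; the names and signatures of the generated methods may differ.

```java
import java.util.Objects;

class PointDemo {
    public static void main(String[] args) {
        Point p = new Point("origin", 1);
        Point q = p.withId(2); // p is unchanged; q differs only in id
        System.out.println(p.label + "/" + p.id + " -> " + q.label + "/" + q.id);
    }
}

// Hypothetical record-style class in the shape of the generated code.
class Point {
    public final String label;
    public final Integer id;

    Point(String label, Integer id) {
        Objects.requireNonNull(label);
        Objects.requireNonNull(id);
        this.label = label;
        this.id = id;
    }

    // Each with* method copies the untouched fields into a fresh instance.
    Point withLabel(String label) {
        return new Point(label, this.id);
    }

    Point withId(Integer id) {
        return new Point(this.label, id);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false;
        }
        Point o = (Point) other;
        return label.equals(o.label) && id.equals(o.id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(label, id);
    }
}
```

Because all fields are final, an "update" is always a new object, which keeps instances safe to share between threads and to use as map keys.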
Hydra's term-level lists, maps, and sets get persistent (immutable,
structurally-shared) implementations under hydra.util:
- ConsList<T>: singly-linked list with O(1) cons and tail sharing, matching Haskell's [a].
- PersistentMap<K, V>: ordered red-black tree map with O(log n) insert and delete, plus union, matching Data.Map in Haskell. Iteration is sorted by key. Keys must be Comparable at runtime.
- PersistentSet<T>: wrapper over PersistentMap, matching Data.Set.
These are the concrete instances behind the standard java.util.List,
java.util.Map, and java.util.Set interfaces in generated kernel code:
the kernel's API surfaces (POJO fields, primitive apply signatures, accessors)
all expose the JDK interfaces, but the values flowing through them are persistent.
This matches the Haskell semantics — every "incremental" operation
(cons, insert, delete, union) returns a new collection that shares
structure with the original instead of copying — so an O(n) Haskell algorithm
runs in O(n) on the JVM instead of O(n²) (which is what naively copying with
new ArrayList<>(old) before every add would yield).
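To make the structural sharing concrete, here is a minimal sketch of a persistent cons list. `SimpleConsList` is a hypothetical stand-in written for this illustration; it is not Hydra's `ConsList` and makes no claim about that class's API. The point is only that `cons` allocates one node and reuses the existing list as its tail.

```java
import java.util.ArrayList;
import java.util.List;

class ConsSharingDemo {
    public static void main(String[] args) {
        SimpleConsList<Integer> base = SimpleConsList.<Integer>empty().cons(3).cons(2);
        SimpleConsList<Integer> a = base.cons(1); // one new node, O(1)
        SimpleConsList<Integer> b = base.cons(0); // one new node, O(1)
        System.out.println(a.toList());                        // [1, 2, 3]
        System.out.println(b.toList());                        // [0, 2, 3]
        System.out.println(a.tail == base && b.tail == base);  // true
    }
}

// Hypothetical immutable singly-linked list; nodes are never mutated.
final class SimpleConsList<T> {
    private static final SimpleConsList<?> EMPTY = new SimpleConsList<Object>(null, null);

    final T head;
    final SimpleConsList<T> tail;

    private SimpleConsList(T head, SimpleConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @SuppressWarnings("unchecked")
    static <T> SimpleConsList<T> empty() {
        return (SimpleConsList<T>) EMPTY; // safe: the empty list holds no elements
    }

    boolean isEmpty() {
        return this == EMPTY;
    }

    /** Prepends a value; the receiver becomes the shared tail of the result. */
    SimpleConsList<T> cons(T value) {
        return new SimpleConsList<>(value, this);
    }

    /** Copies the elements into a JDK list, front to back. */
    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (SimpleConsList<T> cur = this; !cur.isEmpty(); cur = cur.tail) {
            out.add(cur.head);
        }
        return out;
    }
}
```

Both `a` and `b` point at the same `base` object, so building n lists by repeated `cons` costs n allocations rather than n full copies.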
Two boundaries exist where plain JDK collections still appear:
- Internal sort scratch buffers in algorithms that need O(1) random access (e.g. Sort, SortOn, Transpose). The buffers never escape the function, which returns a ConsList to the caller.
- LinkedHashMap in JSON output (hydra.json.JsonEncoding.ObjectBuilder), to preserve insertion-order key emission. This is a deliberate user-visible ordering choice, not a bug.
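The ordering difference behind the second boundary can be seen with plain JDK maps. In this small sketch, `TreeMap` stands in for any key-sorted map such as `PersistentMap`; the class and method names are hypothetical and the keys are arbitrary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class KeyOrderDemo {
    /** Inserts the same three keys and reports the map's iteration order. */
    static List<String> keysAfterInserting(Map<String, Integer> map) {
        map.put("name", 1);
        map.put("id", 2);
        map.put("age", 3);
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        // Insertion order is preserved: [name, id, age]
        System.out.println(keysAfterInserting(new LinkedHashMap<String, Integer>()));
        // Keys are sorted: [age, id, name]
        System.out.println(keysAfterInserting(new TreeMap<String, Integer>()));
    }
}
```

For JSON objects, the first behavior keeps fields in the order the encoder wrote them, which is usually what a reader of the output expects.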
The Java coder also emits these helpers automatically when lowering Hydra
term-level list/map/set literals, so generated code in dist/java/ is
consistent with the runtime's choice. See encodeTermInternal's _Term_list,
_Term_map, and _Term_set arms in
Coder.java.
Union types (sum types) are represented using the visitor pattern.
For example, the Element type is a tagged union of Vertex and Edge:
public abstract class Element<V> {
private Element () {}
public abstract <R> R accept(Visitor<V, R> visitor) ;
public interface Visitor<V, R> {
R visit(Vertex<V> instance) ;
R visit(Edge<V> instance) ;
}
public interface PartialVisitor<V, R> extends Visitor<V, R> {
default R otherwise(Element<V> instance) {
throw new IllegalStateException("Non-exhaustive patterns when matching: " + (instance));
}
default R visit(Vertex<V> instance) { return otherwise((instance)); }
default R visit(Edge<V> instance) { return otherwise((instance)); }
}
public static final class Vertex<V> extends hydra.pg.model.Element<V> {
public final hydra.pg.model.Vertex<V> value;
// ... constructor, equals, hashCode, accept
}
public static final class Edge<V> extends hydra.pg.model.Element<V> {
public final hydra.pg.model.Edge<V> value;
// ... constructor, equals, hashCode, accept
}
}
See Element.java for the complete class.
The Visitor class is for pattern matching over the alternatives,
and PartialVisitor is a convenient extension which allows supplying a default value
for alternatives not matched explicitly.
The Rewriting and Reduction classes are good examples of pattern matching in action, and there are simpler examples in VisitorTest.java.
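As a consumer-side illustration, the sketch below mirrors the shape of the generated `Element` class on a hypothetical `Shape` union with `Circle` and `Square` alternatives (neither exists in Hydra). It shows a full `Visitor`, which must handle every alternative, next to a `PartialVisitor`, which overrides `otherwise` to supply a default.

```java
class ShapeDemo {
    /** Exhaustive match: every alternative must be implemented. */
    static double area(Shape shape) {
        return shape.accept(new Shape.Visitor<Double>() {
            @Override public Double visit(Shape.Circle c) { return Math.PI * c.radius * c.radius; }
            @Override public Double visit(Shape.Square s) { return s.side * s.side; }
        });
    }

    /** Partial match: unhandled alternatives fall through to otherwise(). */
    static String describe(Shape shape) {
        return shape.accept(new Shape.PartialVisitor<String>() {
            @Override public String otherwise(Shape instance) { return "some other shape"; }
            @Override public String visit(Shape.Circle c) { return "circle"; }
        });
    }

    public static void main(String[] args) {
        System.out.println(area(new Shape.Square(2.0)));     // 4.0
        System.out.println(describe(new Shape.Circle(1.0))); // circle
        System.out.println(describe(new Shape.Square(2.0))); // some other shape
    }
}

// Hypothetical union type following the generated visitor encoding.
abstract class Shape {
    private Shape() {}

    abstract <R> R accept(Visitor<R> visitor);

    interface Visitor<R> {
        R visit(Circle instance);
        R visit(Square instance);
    }

    interface PartialVisitor<R> extends Visitor<R> {
        default R otherwise(Shape instance) {
            throw new IllegalStateException("Non-exhaustive patterns when matching: " + instance);
        }
        default R visit(Circle instance) { return otherwise(instance); }
        default R visit(Square instance) { return otherwise(instance); }
    }

    static final class Circle extends Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
    }

    static final class Square extends Shape {
        final double side;
        Square(double side) { this.side = side; }
        @Override <R> R accept(Visitor<R> visitor) { return visitor.visit(this); }
    }
}
```

Adding a new alternative to the union breaks every full `Visitor` at compile time, while `PartialVisitor` implementations keep compiling and route the new case to `otherwise`.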
Recommendations from #233 that haven't been adopted yet. Recorded here so the design intent survives any future re-evaluation. These are deliberate non-goals today, not bugs.
The current minimum Java version for hydra-java and its generated code is
Java 11. The visitor pattern shown above is the most ergonomic encoding
of sum types compatible with that floor. Several worthwhile improvements
become available if the minimum is raised to Java 21.
Today's generated union types use an abstract base class with nested
subclasses and a Visitor/PartialVisitor for dispatch. Java 21's sealed
hierarchies combined with pattern-matching switch expressions would let
consumers write:
String label = switch (term) {
case Term.Literal l -> "literal: " + l.value;
case Term.Variable v -> "var: " + v.value;
case Term.Function f -> "function";
case Term.Application a -> "app";
// ... compiler enforces exhaustiveness; missing cases are a compile error
};
Compared to today's PartialVisitor (which throws at runtime on
unhandled cases), this would give compile-time exhaustiveness checking,
remove the accept/visit boilerplate from every consumer site, and
align the Java emission with what equivalent Hydra code looks like in
Haskell, Scala, Python (match/case), and the Lisp dialects. The
generated classes would need sealed/permits keywords; downstream
code would migrate from visitor implementations to switch blocks.
For the trade-off analysis and references to issue #233's
JAVA-SEALED-SWITCH and JAVA-FUNCTIONAL-MATCH recommendations, see the
branch plan in feature_233_edsls-plan.md.
Generated record types currently use explicit fields plus hand-rolled
equals/hashCode/constructors. Java record declarations (previewed in Java 14, final since Java 16) would
collapse those into a single line per type, with structural
deconstruction available in switch patterns:
public record Field(Name name, Term term) { }
// And in a consumer:
case Field(var name, var term) -> ...
Raising the floor to Java 21 affects every downstream consumer of
generated Hydra code (the bindings/java/* adapters, hydrapop,
demo projects, external integrations that haven't been surveyed).
The benefit is substantial but the cost is a coordinated platform
bump that needs explicit buy-in. Until then, hydra-java stays on
Java 11 and the visitor pattern remains the canonical encoding.