Hoptimator's vocabulary is mostly familiar from SQL and Kubernetes, with a few additions specific to multi-hop pipelines. Skim this page once and the rest of the documentation will read more naturally.
Hoptimator's parser, planner, and JDBC layer are built on Apache Calcite. Calcite's reference is the source of truth for SELECT syntax and built-in functions; this page covers what Hoptimator adds on top.
| Concept | What it is |
|---|---|
| Database | A connection to an external system that exposes tables (Kafka, Venice, MySQL, etc.). |
| Catalog | The unified namespace that lets a single SQL statement reference tables across many databases. |
| View | A named SQL query, evaluated lazily. |
| Materialized view | A view backed by a running data pipeline that continuously writes results to a sink. |
| Pipeline | The set of sources, sink, and job that together implement a materialized view. |
| Engine | A runtime Hoptimator can submit queries to (e.g. a Flink SQL gateway). Optional. Pipeline materialization does not require one. |
| Connector | Configuration that tells a runtime how to read from or write to a database. Used by the planner and embedded in template output. |
| Deployer | The component that turns a planned pipeline element into real infrastructure. |
| Validator | Pre-deploy check that rejects SQL, CRDs, or planned pipelines that violate environment policy. |
| TableTemplate | Declarative recipe for materializing a source/sink in a particular database. |
| JobTemplate | Declarative recipe for materializing a job on a particular engine. |
| TableTrigger | Fires a Kubernetes job when an upstream table changes (or on a schedule). |
| LogicalTable | An abstraction model: one named entity that physically lives in many backends (nearline / online / offline). Auto-syncs and auto-backfills its tiers. |
| Subscription | YAML-native way to declare a materialized view; equivalent to CREATE MATERIALIZED VIEW ... AS. |
| Hint | Key/value passed at runtime that templates and connectors can pick up. |
A Database in Hoptimator is a Kubernetes resource that registers an external system with the catalog. Each Database supplies a JDBC URL and a schema name:
```yaml
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Database
metadata:
  name: ads-database
spec:
  schema: ADS
  url: jdbc:demodb://names=ads
  dialect: Calcite
```

Once applied, the tables behind that URL show up in the catalog under the given schema. SQL written against the catalog can join freely across schemas: for example, a query selecting from `ADS.PAGE_VIEWS` and `PROFILE.MEMBERS` can mix data from two completely different storage systems.
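Such a cross-schema join might look like the following. This is an illustrative sketch: only the schema names come from the Database above, and the tables and columns are assumed.

```sql
-- ADS and PROFILE live in different storage systems,
-- but the catalog presents them as one namespace.
SELECT v.page_id, m.first_name
FROM ADS.PAGE_VIEWS v
JOIN PROFILE.MEMBERS m
  ON v.member_id = m.id;
```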
The catalog is not exhaustive — it shows only the Databases registered in your Kubernetes namespace. This is by design: it bounds what a given app can see and act on.
Views and materialized views are both defined with SQL. The difference is what gets deployed.
| Property | View (`CREATE VIEW`) | Materialized view (`CREATE MATERIALIZED VIEW`) |
|---|---|---|
| Stored as | A View Kubernetes resource | A View resource plus a Pipeline resource |
| Evaluated when | Each time it is queried | Continuously, by a running job |
| Produces side effects | No | Yes — provisions topics, jobs, sinks |
| Use it for | Re-using SQL fragments, abstracting joins | Continuously delivering data from sources to a sink |
A materialized view is the headline abstraction: write the SQL once, and Hoptimator figures out the topology, the connectors, the templates, and the job that has to run to keep the sink up to date.
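As a sketch (the sink schema and column names here are illustrative), one statement is enough:

```sql
-- One statement; Hoptimator provisions the sink, connectors, and job.
CREATE MATERIALIZED VIEW VENICE.MY_FEATURE AS
SELECT m.id AS KEY, m.first_name
FROM PROFILE.MEMBERS m;
```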
A Pipeline is the machinery behind a materialized view. It always has three parts:
- Sources — the input tables (one or many), each with the connector config needed to read them.
- Sink — the output table the view writes to, again with connector config.
- Job — the executable that ties them together (today, a Flink SQL job).
Pipelines are first-class Kubernetes objects:
```
$ kubectl get pipelines
NAME     SQL                          STATUS
my-foo   INSERT INTO ... SELECT ...   Ready.
```
Pipelines are also visible in SQL via the built-in k8s schema, which lets
clients (and the MCP server) query their own deployment state.
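For example, a client could inspect its own deployment state with plain SQL. The exact table name and casing under the built-in schema are assumptions here:

```sql
-- Query pipeline state through the built-in k8s schema instead of kubectl.
SELECT * FROM K8S.PIPELINES;
```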
A Connector is the configuration that lets a runtime read from or write to a particular database — for example, a Flink Kafka connector with the right bootstrap servers and topic name. Connectors are produced by the catalog adapter for each database, embedded in the YAML that TableTemplates and JobTemplates emit, and can be customized via hints.
Connectors do not require an Engine to function. In the typical flow, Hoptimator generates a FlinkSessionJob (or similar) with the connector config baked in, and an existing operator, such as the Apache Flink Kubernetes Operator, picks it up and runs the job. Hoptimator is not in the data path.
An Engine CRD registers a runtime Hoptimator can submit queries to —
typically a Flink SQL gateway behind a JDBC URL. This is the path used when
Hoptimator needs to execute SQL itself, e.g. for interactive SELECT
against tables that aren't in-process.
Engines are unrelated to pipeline materialization. You do not need to
register an Engine for CREATE MATERIALIZED VIEW to work — the resulting
pipeline runs on whatever runtime the JobTemplate produces (a Flink session
job, a Beam job, a custom operator, etc.). The Engine surface is mainly used
by interactive query paths and is partially developed today.
A Deployer turns a planned pipeline element into real infrastructure.
Deployers are the actual extension point that decides where pipelines land —
Kubernetes is the default, not a hard requirement. The bundled deployers
in hoptimator-k8s target Kubernetes:
- The source/sink deployer materializes table templates as Kubernetes resources (e.g. a `KafkaTopic` for a Strimzi cluster).
- The job deployer materializes job templates (e.g. a `FlinkSessionJob` for the Apache Flink Kubernetes operator).

Anything implementing `Deployer` from `hoptimator-api` can take their place.
See Extending Hoptimator when those docs land.
A Validator inspects a SQL statement, a CRD, or a planned pipeline
element before it deploys and rejects it if it doesn't meet your
constraints. Where Deployer is "make this real," Validator is "check
this is allowed."
Validators run at three points in the DDL path: on the parsed SQL, on the resolved view/source/sink after planning, and on the deployer collection before any side effects. If any validator emits an error, the whole operation aborts.
The bundled validators handle table-naming and SQL/Avro compatibility checks. Custom validators are typically used for environment-specific policy — naming conventions, ACL enforcement, schema-evolution rules. See Validators for authoring.
Adding a Database to the catalog tells Hoptimator how to read from a system. Templates tell Hoptimator how to deploy to it.
- A TableTemplate describes the YAML to apply when a source or sink in a matching database becomes part of a pipeline. It also supplies the connector config the engine should use.
- A JobTemplate describes the YAML to apply when a job is needed for the
pipeline. The template can embed the auto-generated SQL via
{{flinksql}}.
Templates use {{ }} placeholders. Some placeholders are filled in by the
deployer ({{name}}, {{table}}, {{flinksql}}); others come from
ConfigProviders or hints.
```yaml
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: TableTemplate
metadata:
  name: kafka-template
spec:
  databases:
  - kafka-database
  yaml: |
    apiVersion: kafka.strimzi.io/v1
    kind: KafkaTopic
    metadata:
      name: {{name}}
    spec:
      topicName: {{table}}
      partitions: {{kafka.partitions:1}}
      ...
  connector: |
    connector = kafka
    topic = {{table}}
    properties.bootstrap.servers = ...
```

You can usually adapt Hoptimator to a new system by writing templates rather than code.
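The JobTemplate counterpart has the same shape. A minimal sketch, in which the Flink-side fields are illustrative rather than bundled defaults, shows the auto-generated SQL handed to a session job via `{{flinksql}}`:

```yaml
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: JobTemplate
metadata:
  name: flink-template
spec:
  yaml: |
    apiVersion: flink.apache.org/v1beta1
    kind: FlinkSessionJob
    metadata:
      name: {{name}}
    spec:
      deploymentName: my-session-cluster   # assumed: an existing FlinkDeployment
      job:
        args:
        - {{flinksql}}   # the planner's generated INSERT INTO ... SELECT ...
```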
A TableTrigger runs a Kubernetes job when an upstream table changes — or on a cron schedule. The job spec is arbitrary YAML, so triggers are where Hoptimator carries the imperative side effects (backfills, rETL refreshes, downstream notifications, operational hooks) that don't belong inside the pipeline itself.
The pattern: pipelines stay pure data-flow expressions, triggers carry the imperative side effects, and the two compose at the table level. For operational guidance — cron vs. status-driven firing, common patterns, the pause/resume lifecycle — see Triggers.
A LogicalTable is an abstraction model over physical stores: one named dataset that simultaneously lives in several backends. The same audience table might exist as a Kafka topic for streaming consumers, a Venice store for online lookups, and an HDFS dataset for batch analytics — typically named differently in each place and tied together by hand-rolled sync jobs. A LogicalTable replaces that arrangement with a single declaration that binds each backend to a named tier role.
You declare the entity once. Hoptimator handles the rest — the physical tier resources, the inter-tier pipelines that keep them in sync, the backfill triggers, the schema reconciliation. Your SQL refers to one name; the abstraction handles the fan-out.
Hoptimator recognizes three tier roles:
| Tier | Typical backend | Used for |
|---|---|---|
| nearline | Kafka, Pulsar, Brooklin | Streaming reads/writes; the "source of truth" for schema. |
| online | Venice, Redis, Pinot | Low-latency point lookups for serving (sub-millisecond, key-based). |
| offline | HDFS, Iceberg, S3 | Batch analytics, training data, historical queries. |
A LogicalTable can bind any subset of these to physical Databases:
```yaml
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Database
metadata:
  name: logical
spec:
  url: jdbc:logical://nearline=kafka-database;online=venice;offline=hdfs-database
  schema: LOGICAL
  dialect: Calcite
```

Every table that shows up under the LOGICAL schema with this configuration inherits the three-tier topology automatically.
Declaring a LogicalTable with two or more tiers is not just a schema
alias. The LogicalTableDeployer runs at deploy time and produces real
infrastructure for each binding:
- Physical tier resources. Each tier's backing Database goes through the normal Deployer SPI to create whatever the storage system needs (a Kafka topic, a Venice store, an HDFS dataset).
- Implicit inter-tier pipelines. Hoptimator auto-deploys `nearline → online` and `nearline → offline` Pipeline CRDs to keep the tiers consistent. You don't write the Kafka-to-Venice job; it appears because the LogicalTable says it should.
- Auto-backfill triggers. When an offline tier is present, a `TableTrigger` is created so the offline mirror can be backfilled and refreshed without an external orchestrator. (See TableTriggers for what this enables.)
- One schema, one source of truth. The row type is resolved from the nearline tier (or the first available one) and reused everywhere. You declare columns once, in the place where they're naturally streamed.
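From the query side, the fan-out is invisible. Assuming a hypothetical AUDIENCE table registered under the LOGICAL schema above (the table and columns are illustrative), SQL refers to the one name and never to the per-tier copies:

```sql
-- One logical name; the nearline/online/offline plumbing is implied.
SELECT member_id, segment
FROM LOGICAL.AUDIENCE;
```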
The pattern this most directly enables is the Lambda / Kappa architecture feature stores are built on: a streaming path through nearline, a low-latency serving path through online, and a batch path through offline — all sharing one name. Hoptimator turns the sync mechanisms between them into declared infrastructure rather than hand-rolled jobs.
Mechanically, LogicalTables are exposed via a JDBC driver
(jdbc:logical://…), but most of the value is realized by the deployer at
create time, not by the driver at query time.
A Subscription is a YAML-native way to declare a materialized view. It is
equivalent to running CREATE MATERIALIZED VIEW ... AS ... against the JDBC
driver — useful when you want pipelines to live in the same Git workflow as
the rest of your Kubernetes manifests.
```yaml
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Subscription
metadata:
  name: my-feature
spec:
  database: VENICE
  sql: |
    SELECT m.id AS KEY, m.first_name FROM PROFILE.MEMBERS m
```

Values reach templates through two paths:

- `ConfigProvider` for static, namespace-wide defaults; the bundled one reads them from a `hoptimator-configmap` ConfigMap.
- Hints for per-pipeline overrides, set as JDBC properties or as fields on a Subscription.
For the full mechanics, see Hints and Configuration.
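A sketch of the two paths, using the `kafka.partitions` placeholder from the TableTemplate example. The ConfigMap keys and the `hints` field name are assumptions; consult Hints and Configuration for the real syntax:

```yaml
# Namespace-wide default, read by the bundled ConfigProvider.
apiVersion: v1
kind: ConfigMap
metadata:
  name: hoptimator-configmap
data:
  kafka.partitions: "1"
---
# Per-pipeline override carried as a hint on a Subscription.
apiVersion: hoptimator.linkedin.com/v1alpha1
kind: Subscription
metadata:
  name: my-feature
spec:
  database: VENICE
  hints:
    kafka.partitions: "4"   # overrides the {{kafka.partitions:1}} default
  sql: |
    SELECT m.id AS KEY, m.first_name FROM PROFILE.MEMBERS m
```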
Hoptimator ships with adapters for Kafka, Venice, MySQL, the
logical-table tier model, and a demodb source for local development.
Pipeline jobs target Apache Flink by default through the bundled
JobTemplates. The planner, the catalog, and the deployer model are all
designed so new sources, sinks, and runtimes can be added without changing
core code. See the connector pages (when available) for a per-system
breakdown.
- New to Hoptimator? Run through the Quickstart.
- Curious how it all fits together? See the Architecture overview.
- Want to read the original design ideas? The engineering blog posts are a good companion.