-# Semantic Type Annotations for SQL Schema Generation
+# Protobuf Annotations Used by from-proto

 ## Overview

-The semantic type annotation system allows you to specify high-level semantic meanings for protobuf fields that get automatically mapped to optimal SQL types for each database dialect. This enables support for specialized types like RisingWave's `rw_int256` while maintaining compatibility across PostgreSQL, RisingWave, and ClickHouse.
+The from-proto path derives your SQL schema directly from protobuf descriptors. It recognizes a set of annotations that control table/column naming, relationships, constraints, dialect-specific table options, and semantic typing of fields.

-## Quick Start
+When the package includes `sf/substreams/sink/sql/schema/v1/schema.proto`, from-proto honors these annotations (useProtoOption enabled). Without it, from-proto falls back to best-effort inference (table = message name; simple fields become columns; no explicit constraints unless added by the dialect for system integrity).
+
+This document explains all supported annotations, how each dialect uses them, and why they matter.
+
+---
+
+## Message Options: table
+
+Import the annotations definition:

-1. Import the schema annotations in your protobuf:
+- child_of: Optional. Defines a parent/child relation: `"<parent_table> on <parent_pk_field>"`.
+  - Postgres: Adds a NOT NULL parent reference column to the child table plus a FK to the parent’s PK. Also every table gets a FK to `_blocks_` on `_block_number_` (ON DELETE CASCADE).
+  - RisingWave: Adds the parent reference column (no FK constraints; autocommit system).
+  - ClickHouse: Adds the parent reference column (no FK constraints; column used for modeling joins).
+- clickhouse_table_options: Optional, ClickHouse-only. See “ClickHouse Table Options” below.
+
+Defaults when no table option is present:
+- With proto options (schema.proto present): messages without `(table)` are ignored (no table).
+- Without proto options: every message becomes a table named after the message.
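The two defaults above can be sketched as a small selection function. This is only an illustration in Python, not code from the repository: the `tables_for` name and the dict shape (message name mapped to its table name, or `None` when the message carries no table annotation) are hypothetical, and the sketch assumes the table name is the message name unchanged.

```python
def tables_for(messages, use_proto_option):
    """Return {table_name: message_name} under the default rules.

    `messages` maps a message name to the table name from its table
    annotation, or None when the message is not annotated. This input
    shape is an assumption made for the sketch.
    """
    tables = {}
    for message_name, table_name in messages.items():
        if use_proto_option:
            # schema.proto present: messages without a table annotation are ignored.
            if table_name is not None:
                tables[table_name] = message_name
        else:
            # No proto options: every message becomes a table named after it.
            tables[message_name] = message_name
    return tables
```

With `{"Transfer": "transfers", "Helper": None}` as input, only `transfers` is produced when proto options are enabled, while both messages become tables when they are not.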
+
+System columns added to every table:
+- `_block_number_` (all dialects): tracks the originating block.
+- `_block_timestamp_` (all dialects).
+- `_version_`, `_deleted_` (ClickHouse only): used by ReplacingMergeTree and retraction modeling.
+
+Primary keys when not specified explicitly:
+- Postgres: PK only if specified via column annotation; else no table PK (but FK to `_blocks_`).
+- RisingWave: If no explicit PK is set, a composite PK is created: `(_block_number_, <parent keys...>)` to preserve uniqueness in streaming mode.
+- ClickHouse: PRIMARY KEY/ORDER BY derived from ClickHouse options or defaults (see below).
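As a rough illustration of the Postgres and RisingWave rules above, the fallback can be written as a single function. The `default_primary_key` name and its parameters are hypothetical. ClickHouse is left out on purpose because its key comes from the table options rather than from this rule.

```python
def default_primary_key(dialect, explicit_pk=None, parent_keys=()):
    """Return the primary-key column list for a table (sketch only)."""
    if explicit_pk is not None:
        # A column annotated as primary key always wins.
        return [explicit_pk]
    if dialect == "postgres":
        # No table PK unless one is annotated.
        return []
    if dialect == "risingwave":
        # Composite key: block number followed by the parent keys.
        return ["_block_number_", *parent_keys]
    raise ValueError(f"dialect not covered by this sketch: {dialect!r}")
```

For a RisingWave child table whose parent key is `order_id`, the sketch yields `["_block_number_", "order_id"]`.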
+
+---
+
+## Field Options: column
+
+Annotate fields that should map to specific columns/constraints:
     foreign_key: "accounts on id" // FK to accounts(id); only Postgres enforces it
   }];
 }
 ```

+Fields:
+- name: Optional. Overrides the SQL column name (default is the proto field name).
+- primary_key: Optional. Marks this column as the table’s primary key.
+  - Postgres: Adds a PK constraint.
+  - RisingWave: Declares an inline PK.
+  - ClickHouse: Used for defaults in ORDER BY/PRIMARY KEY where applicable.
+- unique: Optional. Enforces uniqueness.
+  - Postgres: Adds a unique constraint.
+  - RisingWave: Emits `UNIQUE` in the column definition.
+  - ClickHouse: No native unique constraint, so the option is ignored (consider indexes).
+- foreign_key: Optional. Declares a FK to another table: `"<table> on <field>"`.
+  - Postgres: Adds a FK constraint to the referenced table/field.
+  - RisingWave/ClickHouse: The reference is checked for existence, but no constraint is created.
+- semantic_type, format_hint: Optional. See “Semantic Types & Format Hints”. Affects column type selection in each dialect. Value conversion helpers exist but are not applied automatically by from-proto inserts (see “Runtime Conversion” below).
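The `"<table> on <field>"` syntax used by `foreign_key` (and by `child_of`) is simple enough to parse with one regular expression. The sketch below is an assumption about how such a string could be handled, not the parser used by from-proto. Both function names are hypothetical, and the DDL it prints is ordinary PostgreSQL syntax rather than the exact statement the tool emits.

```python
import re

# "<table> on <field>", with identifiers limited to word characters for this sketch.
_RELATION = re.compile(r"^\s*(\w+)\s+on\s+(\w+)\s*$")


def parse_relation(spec):
    """Split '<table> on <field>' into (table, field)."""
    match = _RELATION.match(spec)
    if match is None:
        raise ValueError(f"expected '<table> on <field>', got {spec!r}")
    return match.group(1), match.group(2)


def postgres_fk(child_table, column, spec):
    """Render an illustrative PostgreSQL foreign-key statement."""
    table, field = parse_relation(spec)
    return (
        f"ALTER TABLE {child_table} ADD FOREIGN KEY ({column}) "
        f"REFERENCES {table} ({field})"
    )
```

For the `"accounts on id"` annotation shown earlier, `parse_relation` returns `("accounts", "id")`. On RisingWave and ClickHouse only the parsing and existence check would apply, since no constraint is created there.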
+
+---
+
+## ClickHouse Table Options
+
+ClickHouse engines need explicit ORDER/PARTITION configuration for good performance and correctness. The `clickhouse_table_options` block lets you control this per table.
+
+Fields (repeated lists):
+- order_by_fields: Required for ClickHouse. Defines the ORDER BY tuple. Each item supports:
+  - name: Column to order by (e.g., `_block_number_`, your PK, other fields).
+  - descending: Optional.
+  - function: Optional function wrapper (e.g., `toYYYYMM` for dates).
+- partition_fields: Optional additional partition keys. If none are provided, the dialect adds a default month partition on `_block_timestamp_`.
+- replacing_fields: Optional extra fields in `ReplacingMergeTree(version, <replacing_fields...>)` for conflict resolution.
+- index_fields: Optional skip indexes to accelerate predicates:
+  - name: Index name.
+  - field_name: Column to index.
+  - type: One of `minmax`, `set`, `ngrambf_v1`, `tokenbf_v1`, `bloom_filter`.
+  - granularity: Index granularity.
+  - function: Optional function wrapper.
+
+Defaults when options are omitted:
+- Engine: `ReplacingMergeTree(_version_)`.
+- PARTITION BY: `toYYYYMM(_block_timestamp_)`.
+- ORDER BY: if not provided, dialect defaults to PK or `_block_number_`.
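To make the defaults concrete, here is a minimal sketch that renders the engine, partition, and ordering clauses. The `clickhouse_table_clauses` function and its parameters are hypothetical. It assumes that explicit partition expressions replace the default month partition, which is one reading of the text above, and it does not cover `replacing_fields` or skip indexes.

```python
def clickhouse_table_clauses(order_by=None, partition_by=None, primary_key=None):
    """Render ENGINE / PARTITION BY / ORDER BY using the documented defaults.

    `order_by` and `partition_by` are lists of already-rendered SQL
    expressions, for example ["toYYYYMM(_block_timestamp_)"].
    """
    # Default partition: one partition per month of the block timestamp.
    partitions = partition_by or ["toYYYYMM(_block_timestamp_)"]
    # Default ordering: the primary key if there is one, else the block number.
    order = order_by or ([primary_key] if primary_key else ["_block_number_"])
    return "\n".join(
        [
            "ENGINE = ReplacingMergeTree(_version_)",
            f"PARTITION BY ({', '.join(partitions)})",
            f"ORDER BY ({', '.join(order)})",
        ]
    )
```

Calling it with no arguments shows the all-defaults case. Passing `order_by=["_block_number_", "id"]` shows how an explicit ordering overrides the fallback.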
+
+---
+
+## Semantic Types & Format Hints
+
+Semantic types give the dialect a clue to select the best storage type for a field (e.g., 256-bit integers, addresses, hashes). Format hints help interpret the incoming literal representation when conversion is needed (hex vs decimal, etc.).
+
+Supported semantic types and their column type mappings:
+
+Note: If a semantic type is not supported by a dialect, the dialect falls back to its default mapping for the underlying protobuf type.
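The fallback rule in the note can be pictured as a two-level lookup. In the sketch below, the two semantic types and their column types are taken from this document's mapping table. The `DEFAULT_TYPES` table and the `column_type` function are hypothetical stand-ins, since the dialects' real default mappings are not listed here.

```python
# Two entries copied from the semantic type mapping table in this document.
SEMANTIC_TYPES = {
    "unix_timestamp_ms": {
        "postgres": "TIMESTAMP WITH TIME ZONE",
        "risingwave": "TIMESTAMP WITH TIME ZONE",
        "clickhouse": "DateTime64(3)",
    },
    "block_timestamp": {
        "postgres": "TIMESTAMP WITH TIME ZONE",
        "risingwave": "TIMESTAMP WITH TIME ZONE",
        "clickhouse": "DateTime",
    },
}

# Hypothetical per-dialect defaults for plain protobuf types (assumption).
DEFAULT_TYPES = {
    "postgres": {"string": "TEXT", "int64": "BIGINT"},
    "risingwave": {"string": "VARCHAR", "int64": "BIGINT"},
    "clickhouse": {"string": "String", "int64": "Int64"},
}


def column_type(dialect, proto_type, semantic_type=None):
    """Pick the semantic mapping when the dialect has one, else the default."""
    mapped = SEMANTIC_TYPES.get(semantic_type, {}).get(dialect)
    if mapped is not None:
        return mapped
    return DEFAULT_TYPES[dialect][proto_type]
```

An `int64` field annotated with `unix_timestamp_ms` resolves to `DateTime64(3)` on ClickHouse, while an unknown or missing semantic type falls through to the default for the protobuf type.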
+
 ## Supported Semantic Types

 ### Blockchain/Crypto Types
@@ -63,9 +163,9 @@ message EthereumTransaction {
 | `unix_timestamp_ms` | Unix timestamp (milliseconds) | `TIMESTAMP WITH TIME ZONE` | `TIMESTAMP WITH TIME ZONE` | `DateTime64(3)` |
 | `block_timestamp` | Blockchain timestamp | `TIMESTAMP WITH TIME ZONE` | `TIMESTAMP WITH TIME ZONE` | `DateTime` |

-## Format Hints
+### Format Hints

-Format hints provide additional guidance for value conversion:
+Format hints provide additional guidance for value conversion (when conversions are used):

 | Format Hint | Description | Usage |
 |-------------|-------------|-------|
@@ -74,7 +174,19 @@ Format hints provide additional guidance for value conversion:
 | `base64` | Base64 format | For binary data encoded as base64 |
 | `string` | String format | Default string handling |
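For readers writing their own conversion step, the two hints visible in the table above could be handled as follows. This is a sketch with a hypothetical `decode_with_hint` function. It covers only `base64` and `string`, and it is not the conversion code shipped with the project.

```python
import base64


def decode_with_hint(value, format_hint):
    """Interpret an incoming string according to its format hint (sketch)."""
    if format_hint == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(value, validate=True)
    if format_hint == "string":
        # Default handling: pass the value through unchanged.
        return value
    raise ValueError(f"format hint not covered by this sketch: {format_hint!r}")
```

`decode_with_hint("aGVsbG8=", "base64")` returns the bytes `b"hello"`, and the `string` hint leaves the value untouched.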

-## Complete Example
+---
+
+## Runtime Conversion (advanced)
+
+The codebase contains per-dialect helpers to convert annotated values at insert time (e.g., converting `uint256` hex to a decimal literal for PostgreSQL, or casting to `rw_uint256` in RisingWave). Today, from-proto uses prepared statements and passes values as they appear in your message; it does not automatically apply semantic conversions. The annotations primarily affect column type selection.
+
+Practical guidance:
+- Emit values in the “natural” format for your chosen dialect when possible (e.g., strings for `rw_uint256` or `NUMERIC`).
+- If you require strict conversions, adapt your Substreams output to provide appropriately typed/encoded values. The conversion helpers in `db_proto/sql/*/types.go` show how to transform values if you build a custom inserter.
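As one example of such a conversion, a `uint256` emitted as hex can be turned into a decimal string suitable for a `NUMERIC` column before it reaches the sink. The function below is a hypothetical sketch written in Python for brevity. It is not a port of the Go helpers in `db_proto/sql/*/types.go`, whose exact behavior may differ.

```python
UINT256_MAX = 2**256 - 1


def uint256_hex_to_decimal(value):
    """Convert a hex string (with or without a 0x prefix) to a decimal string."""
    digits = value[2:] if value.lower().startswith("0x") else value
    number = int(digits, 16)  # raises ValueError on non-hex input
    if not 0 <= number <= UINT256_MAX:
        raise ValueError("value does not fit in an unsigned 256-bit integer")
    return str(number)
```

For instance, `uint256_hex_to_decimal("0xde0b6b3a7640000")` returns `"1000000000000000000"`, which is 10^18 written in decimal.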
 - Proper type selection improves index performance
 - Reduced type conversion overhead in queries

-This semantic type system provides a powerful way to leverage database-specific features like RisingWave's `rw_int256` while maintaining broad compatibility across different SQL databases.
+This semantic type system provides a powerful way to leverage database-specific features like RisingWave's `rw_int256` while maintaining broad compatibility across different SQL databases.