Commit 848e471

feat: improve semantic type docs
1 parent c8e618b commit 848e471

1 file changed

Lines changed: 34 additions & 16 deletions

docs/SEMANTIC_TYPES.md

@@ -165,14 +165,30 @@ Note: If a semantic type is not supported by a dialect, the dialect falls back t

 ### Format Hints

-Format hints provide additional guidance for value conversion (when conversions are used):
-
-| Format Hint | Description | Usage |
-|-------------|-------------|-------|
-| `hex` | Hexadecimal format | For `uint256`, `int256` fields containing hex strings |
-| `decimal` | Decimal format | For numeric fields containing decimal strings |
-| `base64` | Base64 format | For binary data encoded as base64 |
-| `string` | String format | Default string handling |
+Format hints provide additional guidance for value conversion. Important: in the default from-proto runtime, these hints do not change schema or insert behavior. They are metadata primarily for documentation and future/custom inserters. The built-in conversion helpers (see db_proto/sql/*/types.go) honor `hex`, but are not wired into from-proto’s insert path today.
+
+| Format Hint | Description | Effect in from-proto today |
+|-------------|-------------|----------------------------|
+| `hex` | Value is hexadecimal (optionally 0x-prefixed) | No effect by default; respected only by conversion helpers if explicitly used |
+| `decimal` | Value is base-10 decimal | No effect by default; serves as documentation/intent |
+| `base64` | Value is base64-encoded | No effect by default |
+| `string` | Treat value as plain string | No effect by default |
+
+format_hint: "decimal" — What it means today
+- Schema selection: none. Column type selection is driven by `semantic_type`, not by `format_hint`.
+- Insert semantics: none. The runtime passes your field value as-is to the database driver.
+- CSV export: none. Values are serialized as they appear in your message.
+- Practical use: declare that the field’s textual representation is base‑10 (not hex). This helps readers and future/custom inserters apply the right conversions.
+
+When to use format_hint: "decimal"
+- Use it for large integer semantic types (e.g., `uint256`, `int256`) when your Substreams output carries decimal strings and your target dialect stores them as a numeric type (Postgres `NUMERIC(78,0)`, RisingWave `rw_uint256`/`rw_int256`).
+- Omit it when storing in ClickHouse: the current mapping stores `uint256`/`int256` as `String`, so the hint has no effect.
+- Do not rely on it to convert hex to decimal automatically — conversion is not applied by default. Emit decimal strings from your Substreams if your destination column is numeric.
+
+Safety notes per dialect for `uint256`/`int256`
+- PostgreSQL: Mapped to `NUMERIC(78,0)`. Passing a hex string like `0x...` will fail to cast to numeric via prepared statements. Provide decimal strings (use `format_hint: "decimal"` to document intent), or implement a custom inserter that calls the dialect’s `ConvertSemanticValue` helper.
+- RisingWave: Mapped to `rw_uint256`/`rw_int256`. RisingWave accepts literals with explicit casts (e.g., `'123...'::rw_uint256`). from-proto does not add casts; ensure your values are accepted as-is by the server (decimal strings are the safest choice), or implement custom conversion.
+- ClickHouse: Currently mapped to `String` for maximum compatibility. `format_hint` has no effect; you can store either decimal or hex and cast at query time when needed. Native Decimal256 cannot represent the full uint256 range (max ~76 digits), so automatic Decimal mapping is intentionally avoided.

 ---

@@ -205,15 +221,17 @@ message EthereumTransaction {
     semantic_type: "hash"
   }];

-  // Large integers - uses rw_int256 in RisingWave
+  // Large integers (emit decimal strings). These map to
+  // NUMERIC(78,0) in Postgres and rw_uint256 in RisingWave.
   string value = 3 [(sf.substreams.sink.sql.schema.v1.field) = {
     semantic_type: "uint256",
     format_hint: "decimal"
   }];

+  // Gas price: also decimal in default from-proto (no automatic hex->decimal conversion)
   string gas_price = 4 [(sf.substreams.sink.sql.schema.v1.field) = {
     semantic_type: "uint256",
-    format_hint: "hex"
+    format_hint: "decimal"
   }];

   // Addresses - validated format
@@ -259,18 +277,18 @@ CREATE TABLE eth_transactions (
   gas_price rw_uint256, -- uint256 → rw_uint256
   from_address CHARACTER VARYING, -- address semantic type
   to_address CHARACTER VARYING, -- address semantic type
-  token_amount rw_uint256, -- uint256 semantic type
+  token_amount NUMERIC(78,0), -- uint256 semantic type
   block_timestamp TIMESTAMP WITH TIME ZONE, -- unix_timestamp
   metadata JSONB, -- json semantic type
   trace_id CHARACTER VARYING -- uuid semantic type
 );

--- Sample insert with rw_int256 casting
+-- Example insert if you craft SQL yourself (from-proto does not add casts)
 INSERT INTO eth_transactions VALUES (
   '0x1234...abcd',
   '0x5678...efab',
   '115792089237316195423570985008687907853269984665640564039457584007913129639935'::rw_uint256,
-  '0x1bc16d674ec80000'::rw_uint256,
+  '20000000000'::rw_uint256,
   '0x742d35cc6636C0532925a3b8D0A3e5A5F2d5De8e',
   '0x8ba1f109551bD432803012645Hac136c5ae5c9e6',
   '1000123456789012345678'::rw_uint256,
@@ -289,7 +307,7 @@ CREATE TABLE eth_transactions (
   gas_price NUMERIC(78,0), -- uint256 → NUMERIC fallback
   from_address CHAR(42), -- address semantic type
   to_address CHAR(42), -- address semantic type
-  token_amount rw_uint256, -- uint256 semantic type
+  token_amount NUMERIC(78,0), -- uint256 semantic type
   block_timestamp TIMESTAMP WITH TIME ZONE, -- unix_timestamp
   metadata JSONB, -- json semantic type
   trace_id UUID -- uuid → PostgreSQL UUID type
@@ -301,8 +319,8 @@ CREATE TABLE eth_transactions (
 CREATE TABLE eth_transactions (
   tx_hash FixedString(66), -- hash semantic type
   block_hash FixedString(66), -- hash semantic type
-  value String, -- uint256 String (no native UInt256)
-  gas_price String, -- uint256 String (no native UInt256)
+  value String, -- uint256 stored as String (compatibility)
+  gas_price String, -- uint256 stored as String (compatibility)
   from_address FixedString(42), -- address semantic type
   to_address FixedString(42), -- address semantic type
   token_amount String, -- uint256 semantic type
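The ClickHouse safety note in the first hunk gives the reason for the `String` mapping: Decimal256 tops out at about 76 digits, which is too few for uint256. The arithmetic can be checked with a short Go sketch using `math/big`; the 76-digit limit is taken from that note, and the 78 comes from the `NUMERIC(78,0)` PostgreSQL mapping:

```go
package main

import (
	"fmt"
	"math/big"
)

// decimal256Precision is the Decimal256 digit limit quoted in the ClickHouse note.
const decimal256Precision = 76

var (
	one        = big.NewInt(1)
	maxUint256 = new(big.Int).Sub(new(big.Int).Lsh(one, 256), one) // 2^256 - 1
	minInt256  = new(big.Int).Neg(new(big.Int).Lsh(one, 255))      // -2^255
)

// digits returns the number of base-10 digits in |v|, ignoring any sign.
func digits(v *big.Int) int {
	return len(new(big.Int).Abs(v).String())
}

func main() {
	u := digits(maxUint256)
	i := digits(minInt256)
	fmt.Printf("uint256 max: %d digits, fits NUMERIC(78,0): %t, fits Decimal256: %t\n",
		u, u <= 78, u <= decimal256Precision)
	fmt.Printf("int256 min: %d digits, fits Decimal256: %t\n", i, i <= decimal256Precision)
}
```

The largest uint256 has 78 digits and the most negative int256 has 77, so both exceed the 76-digit limit, while `NUMERIC(78,0)` is exactly wide enough for 2^256 - 1.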
