| 1 | +# sql:: Attribute Vocabulary (SQL Backend) |
| 2 | + |
| 3 | +**Date:** 2026-05-24 |
| 4 | +**Authors:** Steven Varga |
| 5 | +**Status:** Implemented on `vargalabs/h5cpp-compiler#33-sql-backend` |
| 6 | +**Repos:** `vargalabs/h5cpp-compiler` (SQL DDL emitter, issue #33) |
| 7 | +**Scope:** SQL backend C++ attribute vocabulary and type-mapping taxonomy. Defines the `[[sql::...]]` attributes the compiler will recognize (once attribute wiring is extended), the DDL emission semantics, and the three supported dialects. |
| 8 | +**Companions:** |
| 9 | +- `tasks/h5cpp-compiler-h5-attribute-taxonomy.md` — HDF5 attribute vocabulary |
| 10 | +- `tasks/h5cpp-compiler-backend-cookbook.md` — transferable recipe for adding any new backend |
| 11 | + |
| 12 | +--- |
| 13 | + |
| 14 | +## TL;DR |
| 15 | + |
| 16 | +The SQL backend emits `CREATE TABLE` DDL statements from annotated C++ structs. Three dialects are supported: **PostgreSQL**, **MySQL**, and **SQLite3**. Each dialect gets its own boolean flag (`--sql-postgres`, `--sql-mysql`, `--sql-lite3`) and uses the unified `-o <file>` output path. |
| 17 | + |
| 18 | +| Surface today (C++17 standard-attribute) | C++26 reflection form | |
| 19 | +|---|---| |
| 20 | +| `[[sql::table("users")]]` | `[[=sql::table{"users"}]]` | |
| 21 | +| `[[sql::column("user_id")]]` | `[[=sql::column{"user_id"}]]` | |
| 22 | +| `[[sql::primary_key]]` | `[[=sql::primary_key{}]]` | |
| 23 | +| `[[sql::unique]]` | `[[=sql::unique{}]]` | |
| 24 | +| `[[sql::not_null]]` | `[[=sql::not_null{}]]` | |
| 25 | +| `[[sql::default_(42)]]` | `[[=sql::default_{42}]]` | |
| 26 | +| `[[sql::foreign_key("other_table.col")]]` | `[[=sql::foreign_key{"other_table.col"}]]` | |
| 27 | +| `[[sql::check("value > 0")]]` | `[[=sql::check{"value > 0"}]]` | |
| 28 | +| `[[sql::index]]` | `[[=sql::index{}]]` | |
| 29 | +| `[[sql::type_override("VARCHAR(255)")]]` | `[[=sql::type_override{"VARCHAR(255)"}]]` | |
| 30 | +| `[[sql::nested("jsonb")]]` | `[[=sql::nested{"jsonb"}]]` | |
| 31 | + |
| 32 | +The only syntactic shift is `(args)` → `{args}` under the `[[=...]]` form. **Names stay put.** |
| 33 | + |
| 34 | +--- |
| 35 | + |
| 36 | +## 1. Dialect flags and CLI usage |
| 37 | + |
| 38 | +```bash |
| 39 | +h5cpp --sql-postgres -o schema.sql my_structs.cpp -- -std=c++17 |
| 40 | +h5cpp --sql-mysql -o schema.sql my_structs.cpp -- -std=c++17 |
| 41 | +h5cpp --sql-lite3 -o schema.sql my_structs.cpp -- -std=c++17 |
| 42 | +``` |
| 43 | + |
| 44 | +The SQL backend is integrated into the unified `--format` dispatcher (issue #30). Each dialect is a distinct enum value in `OutputFormat` and gets its own CLI flag. |
| 45 | + |
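The flag-to-format dispatch described above can be sketched as follows. This is a minimal illustration, not the code in `src/h5cpp.cpp`: the enumerator names and both helper functions are hypothetical. Only the three flag spellings and the dialect labels (taken from the `-- Dialect:` header of the generated DDL) come from this document.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enumerator names; the document only states that each dialect
// is a distinct value of OutputFormat.
enum class OutputFormat { sql_postgres, sql_mysql, sql_lite3 };

// Map a CLI flag to its output format; unknown flags yield std::nullopt.
inline std::optional<OutputFormat> parse_sql_flag(std::string_view flag) {
    if (flag == "--sql-postgres") return OutputFormat::sql_postgres;
    if (flag == "--sql-mysql")    return OutputFormat::sql_mysql;
    if (flag == "--sql-lite3")    return OutputFormat::sql_lite3;
    return std::nullopt;
}

// Dialect label as written in the "-- Dialect:" comment of the emitted DDL.
inline std::string_view dialect_label(OutputFormat f) {
    switch (f) {
        case OutputFormat::sql_postgres: return "postgresql";
        case OutputFormat::sql_mysql:    return "mysql";
        case OutputFormat::sql_lite3:    return "sqlite3";
    }
    return "unknown";
}
```

Keeping one enum value per dialect means the producer can be selected with a single `switch`, with no string comparison after argument parsing.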
| 46 | +--- |
| 47 | + |
| 48 | +## 2. Universal vocabulary — same words, `sql::` namespace |
| 49 | + |
| 50 | +These attributes use vocabulary identical to `h5::*` where the concept overlaps (rename, ignore, doc, alias). They live in `sql::` so the namespace stays self-contained for SQL-only users. |
| 51 | + |
| 52 | +### Universal Tier 1 — must-have |
| 53 | + |
| 54 | +| Attribute | Purpose | Example | |
| 55 | +|---|---|---| |
| 56 | +| `[[sql::name("col_name")]]` | Rename a field for the SQL column. Defaults to the C++ field name. | `[[sql::name("created_at")]] std::chrono::system_clock::time_point ts;` | |
| 57 | +| `[[sql::ignore]]` | Skip this field entirely. Column absent from the emitted table. | `[[sql::ignore]] int debug_counter;` | |
| 58 | + |
| 59 | +### Universal Tier 2 — high value, low cost |
| 60 | + |
| 61 | +| Attribute | Purpose | Example | |
| 62 | +|---|---|---| |
| 63 | +| `[[sql::doc("description")]]` | Emitted as a SQL comment on the column or table. | `[[sql::doc("nanoseconds since epoch")]] std::uint64_t ts;` | |
| 64 | +| `[[sql::alias("Name")]]` | **Class-level.** Overrides the table name. Defaults to the sanitized C++ qualified name (`::` → `__`). | `struct [[sql::alias("Users")]] user_t { ... };` | |
| 65 | + |
| 66 | +--- |
| 67 | + |
| 68 | +## 3. SQL-specific vocabulary |
| 69 | + |
| 70 | +### Tier 1 — must-have |
| 71 | + |
| 72 | +| Attribute | Purpose | Example | |
| 73 | +|---|---|---| |
| 74 | +| `[[sql::table("name")]]` | **Class-level.** Overrides the generated table name. | `struct [[sql::table("app_users")]] user_t { ... };` | |
| 75 | +| `[[sql::column("name")]]` | **Field-level.** Overrides the generated column name. | `[[sql::column("user_id")]] int id;` | |
| 76 | +| `[[sql::primary_key]]` | Emits `PRIMARY KEY` constraint. | `[[sql::primary_key]] int id;` | |
| 77 | +| `[[sql::not_null]]` | Emits `NOT NULL` constraint. **Default for all fields** (C++ POD fields are always present). | `[[sql::not_null]] int id;` | |
| 78 | +| `[[sql::unique]]` | Emits `UNIQUE` constraint. | `[[sql::unique]] std::string email;` | |
| 79 | + |
| 80 | +### Tier 2 — high value |
| 81 | + |
| 82 | +| Attribute | Purpose | Example | |
| 83 | +|---|---|---| |
| 84 | +| `[[sql::default_(value)]]` | Emits `DEFAULT value` clause. Dialect-aware quoting (strings quoted, numbers bare). | `[[sql::default_("active")]] std::string status;` | |
| 85 | +| `[[sql::foreign_key("table.col")]]` | Emits `REFERENCES table(col)` clause. | `[[sql::foreign_key("orders.id")]] int order_id;` | |
| 86 | +| `[[sql::check("expr")]]` | Emits `CHECK (expr)` clause. | `[[sql::check("value > 0")]] int value;` | |
| 87 | +| `[[sql::index]]` | Emits a `CREATE INDEX` statement after the table. | `[[sql::index]] std::string email;` | |
| 88 | + |
| 89 | +### Tier 3 — type control |
| 90 | + |
| 91 | +| Attribute | Purpose | Example | |
| 92 | +|---|---|---| |
| 93 | +| `[[sql::type_override("TYPE")]]` | Overrides the dialect-specific type mapping for this field. | `[[sql::type_override("VARCHAR(255)")]] std::string name;` | |
| 94 | +| `[[sql::nested("jsonb")]]` | Controls how nested structs are represented. Values: `"jsonb"` (PostgreSQL), `"json"` (MySQL), `"text"` (SQLite3), or `"table"` (create a separate table with FK). | `[[sql::nested("jsonb")]] address_t addr;` | |
| 95 | + |
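To show how the constraint attributes above could combine into one column definition, here is a hedged sketch. `ColumnSpec`, `quote_literal`, `references_clause`, and `column_ddl` are hypothetical names, not part of `producer_sql.hpp`. The clause order and the double-quoted identifier (PostgreSQL/SQLite3 style) are assumptions. The clause texts themselves (`NOT NULL`, `PRIMARY KEY`, `UNIQUE`, `DEFAULT value`, `REFERENCES table(col)`, `CHECK (expr)`) follow the tables above.

```cpp
#include <optional>
#include <string>

// Hypothetical per-field record; each member mirrors one sql:: attribute.
struct ColumnSpec {
    std::string name;                        // sql::column or the C++ field name
    std::string type;                        // mapped or overridden SQL type
    bool primary_key = false;                // sql::primary_key
    bool unique = false;                     // sql::unique
    bool not_null = true;                    // default for all fields
    std::optional<std::string> default_sql;  // sql::default_, already rendered
    std::optional<std::string> foreign_key;  // sql::foreign_key, "table.col"
    std::optional<std::string> check;        // sql::check expression
};

// Render a string default as an SQL literal, doubling embedded single quotes.
inline std::string quote_literal(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += '\'';
        out += c;
    }
    return out + "'";
}

// Turn "table.col" into "REFERENCES table(col)".
inline std::string references_clause(const std::string& target) {
    const auto dot = target.find('.');
    if (dot == std::string::npos) return "REFERENCES " + target;
    return "REFERENCES " + target.substr(0, dot) + "(" + target.substr(dot + 1) + ")";
}

// Assemble one column definition (assumed clause order).
inline std::string column_ddl(const ColumnSpec& c) {
    std::string out = "\"" + c.name + "\" " + c.type;
    if (c.not_null)    out += " NOT NULL";
    if (c.primary_key) out += " PRIMARY KEY";
    if (c.unique)      out += " UNIQUE";
    if (c.default_sql) out += " DEFAULT " + *c.default_sql;
    if (c.foreign_key) out += " " + references_clause(*c.foreign_key);
    if (c.check)       out += " CHECK (" + *c.check + ")";
    return out;
}
```

Numeric defaults would be passed through bare, while string defaults go through `quote_literal` first, matching the "strings quoted, numbers bare" rule for `sql::default_`.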
| 96 | +--- |
| 97 | + |
| 98 | +## 4. Type map — C++ → SQL per dialect |
| 99 | + |
| 100 | +### Primitives |
| 101 | + |
| 102 | +| C++ type | PostgreSQL | MySQL | SQLite3 | |
| 103 | +|---|---|---|---| |
| 104 | +| `bool` / `_Bool` | `BOOLEAN` | `TINYINT(1)` | `INTEGER` | |
| 105 | +| `char` | `SMALLINT` | `TINYINT` | `INTEGER` | |
| 106 | +| `unsigned char` | `SMALLINT` | `TINYINT UNSIGNED` | `INTEGER` | |
| 107 | +| `short` | `SMALLINT` | `SMALLINT` | `INTEGER` | |
| 108 | +| `unsigned short` | `SMALLINT` | `SMALLINT UNSIGNED` | `INTEGER` | |
| 109 | +| `int` | `INTEGER` | `INT` | `INTEGER` | |
| 110 | +| `unsigned int` | `INTEGER` | `INT UNSIGNED` | `INTEGER` | |
| 111 | +| `long` | `BIGINT` | `BIGINT` | `INTEGER` | |
| 112 | +| `unsigned long` | `BIGINT` | `BIGINT UNSIGNED` | `INTEGER` | |
| 113 | +| `long long` | `BIGINT` | `BIGINT` | `INTEGER` | |
| 114 | +| `unsigned long long` | `BIGINT` | `BIGINT UNSIGNED` | `INTEGER` | |
| 115 | +| `float` | `REAL` | `FLOAT` | `REAL` | |
| 116 | +| `double` | `DOUBLE PRECISION` | `DOUBLE` | `REAL` | |
| 117 | +| `long double` | `DOUBLE PRECISION` | `DOUBLE` | `REAL` | |
| 118 | +| `enum` / `enum class` | `INTEGER` | `INT` | `INTEGER` | |
| 119 | + |
| 120 | +### Complex types |
| 121 | + |
| 122 | +| C++ type | PostgreSQL | MySQL | SQLite3 | Notes | |
| 123 | +|---|---|---|---|---| |
| 124 | +| Nested struct `S` | `JSONB` | `JSON` | `TEXT` | One column holding the nested struct as JSON. | |
| 125 | +| `T[N]` (C array) | `T[]` | `JSON` | `TEXT` | PostgreSQL native arrays; MySQL/SQLite fall back to JSON/TEXT. | |
| 126 | +| `T[M][N]` (multi-dim array) | `T[][]` | `JSON` | `TEXT` | PostgreSQL supports `[][]` syntax; MySQL/SQLite fall back. | |
| 127 | +| `std::string` | `TEXT` | `TEXT` | `TEXT` | No length limit assumed; override with `sql::type_override`. | |
| 128 | +| `std::vector<T>` | `JSONB` | `JSON` | `TEXT` | Stored as JSON in every dialect; MySQL and SQLite3 have no native array type. | |
| 129 | +| `std::map<K,V>` | `JSONB` | `JSON` | `TEXT` | No native map type; JSON fallback. | |
| 130 | +| `std::optional<T>` | `T` (nullable) | `T` (nullable) | `T` (nullable) | Planned: emitted without `NOT NULL` so runtime nulls are allowed; the staging consumer does not yet traverse `std::optional`. | |
| 131 | +| `std::chrono::time_point` | `TIMESTAMPTZ` | `DATETIME(6)` | `TEXT` | Auto-detected; can override with `sql::type_override`. | |
| 132 | + |
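A compile-time lookup is one way to encode the primitive rows of the type map. The sketch below covers only a few rows and is an illustration under assumptions: `Dialect`, `pick`, and `sql_type` are hypothetical names, and the real producer may organize its maps differently.

```cpp
#include <string_view>
#include <type_traits>

enum class Dialect { postgres, mysql, sqlite3 };  // hypothetical tag enum

// Select the column of the type-map table that belongs to dialect D.
template <Dialect D>
constexpr std::string_view pick(std::string_view pg, std::string_view my,
                                std::string_view lite) {
    return D == Dialect::postgres ? pg : D == Dialect::mysql ? my : lite;
}

template <typename>
inline constexpr bool always_false = false;

// Map a C++ primitive to its SQL type; values copied from the table above.
template <Dialect D, typename T>
constexpr std::string_view sql_type() {
    if constexpr (std::is_same_v<T, bool>)
        return pick<D>("BOOLEAN", "TINYINT(1)", "INTEGER");
    else if constexpr (std::is_enum_v<T> || std::is_same_v<T, int>)
        return pick<D>("INTEGER", "INT", "INTEGER");
    else if constexpr (std::is_same_v<T, unsigned int>)
        return pick<D>("INTEGER", "INT UNSIGNED", "INTEGER");
    else if constexpr (std::is_same_v<T, long long>)
        return pick<D>("BIGINT", "BIGINT", "INTEGER");
    else if constexpr (std::is_same_v<T, float>)
        return pick<D>("REAL", "FLOAT", "REAL");
    else if constexpr (std::is_same_v<T, double>)
        return pick<D>("DOUBLE PRECISION", "DOUBLE", "REAL");
    else
        static_assert(always_false<T>, "type not covered by this sketch");
}
```

Because the lookup is `constexpr`, an unsupported field type fails at compile time instead of producing a malformed column.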
| 133 | +--- |
| 134 | + |
| 135 | +## 5. DDL output format |
| 136 | + |
| 137 | +### Single struct |
| 138 | + |
| 139 | +Input: |
| 140 | +```cpp |
| 141 | +namespace sn { |
| 142 | +struct Record { |
| 143 | + int id; |
| 144 | + double value; |
| 145 | +}; |
| 146 | +} |
| 147 | +``` |
| 148 | + |
| 149 | +PostgreSQL output: |
| 150 | +```sql |
| 151 | +-- Generated by h5cpp-compiler SQL backend |
| 152 | +-- Dialect: postgresql |
| 153 | + |
| 154 | +CREATE TABLE IF NOT EXISTS "sn__Record" ( |
| 155 | + "id" INTEGER NOT NULL, |
| 156 | + "value" DOUBLE PRECISION NOT NULL |
| 157 | +); |
| 158 | +``` |
| 159 | + |
| 160 | +MySQL output: |
| 161 | +```sql |
| 162 | +-- Generated by h5cpp-compiler SQL backend |
| 163 | +-- Dialect: mysql |
| 164 | + |
| 165 | +CREATE TABLE IF NOT EXISTS `sn__Record` ( |
| 166 | + `id` INT NOT NULL, |
| 167 | + `value` DOUBLE NOT NULL |
| 168 | +); |
| 169 | +``` |
| 170 | + |
| 171 | +SQLite3 output: |
| 172 | +```sql |
| 173 | +-- Generated by h5cpp-compiler SQL backend |
| 174 | +-- Dialect: sqlite3 |
| 175 | + |
| 176 | +CREATE TABLE IF NOT EXISTS "sn__Record" ( |
| 177 | + "id" INTEGER NOT NULL, |
| 178 | + "value" REAL NOT NULL |
| 179 | +); |
| 180 | +``` |
| 181 | + |
| 182 | +### Nested structs |
| 183 | + |
| 184 | +Input: |
| 185 | +```cpp |
| 186 | +namespace sn { |
| 187 | +struct Inner { int a; double b; }; |
| 188 | +struct Outer { |
| 189 | + int idx; |
| 190 | + Inner inner_singleton; |
| 191 | + Inner inner_array[4]; |
| 192 | +}; |
| 193 | +} |
| 194 | +``` |
| 195 | + |
| 196 | +PostgreSQL output: |
| 197 | +```sql |
| 198 | +CREATE TABLE IF NOT EXISTS "sn__Inner" ( |
| 199 | + "a" INTEGER NOT NULL, |
| 200 | + "b" DOUBLE PRECISION NOT NULL |
| 201 | +); |
| 202 | + |
| 203 | +CREATE TABLE IF NOT EXISTS "sn__Outer" ( |
| 204 | + "idx" INTEGER NOT NULL, |
| 205 | + "inner_singleton" JSONB NOT NULL, |
| 206 | + "inner_array" JSONB[] NOT NULL |
| 207 | +); |
| 208 | +``` |
| 209 | + |
| 210 | +**Key behavior:** A nested struct produces a separate `CREATE TABLE` for the inner struct, and the outer table stores the nested value in a `JSONB` column (`JSON` or `TEXT` in the other dialects). This is a pragmatic compromise: the inner table exists for introspection and potential FK usage, while the outer table keeps the nested data as JSON for query simplicity. |
| 211 | + |
| 212 | +### Multidimensional arrays |
| 213 | + |
| 214 | +Input: |
| 215 | +```cpp |
| 216 | +namespace sn { |
| 217 | +struct Cell { float x; float y; }; |
| 218 | +struct Grid { int idx; Cell field_05[3][8]; }; |
| 219 | +} |
| 220 | +``` |
| 221 | + |
| 222 | +PostgreSQL output: |
| 223 | +```sql |
| 224 | +CREATE TABLE IF NOT EXISTS "sn__Cell" ( |
| 225 | + "x" REAL NOT NULL, |
| 226 | + "y" REAL NOT NULL |
| 227 | +); |
| 228 | + |
| 229 | +CREATE TABLE IF NOT EXISTS "sn__Grid" ( |
| 230 | + "idx" INTEGER NOT NULL, |
| 231 | + "field_05" JSONB[][] NOT NULL |
| 232 | +); |
| 233 | +``` |
| 234 | + |
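The array rule seen in the outputs above (one `[]` per dimension on PostgreSQL, a single JSON/TEXT column elsewhere) can be sketched with `std::rank`. `Dialect` and `array_sql_type` are hypothetical names; the element type is assumed to be mapped to its SQL name before the call.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

enum class Dialect { postgres, mysql, sqlite3 };  // hypothetical tag enum

// Map a C array type to its column type. element_sql is the SQL type already
// chosen for the array element (for example "JSONB" for a nested struct).
template <typename Array>
std::string array_sql_type(Dialect d, const std::string& element_sql) {
    static_assert(std::is_array_v<Array>, "expects a C array type such as T[N]");
    if (d == Dialect::mysql)   return "JSON";  // no native array type
    if (d == Dialect::sqlite3) return "TEXT";  // no native array type
    std::string out = element_sql;
    // PostgreSQL: append one "[]" per array dimension.
    for (std::size_t i = 0; i < std::rank_v<Array>; ++i) out += "[]";
    return out;
}
```

With `Cell field_05[3][8]` and an element type of `JSONB`, the PostgreSQL branch yields `JSONB[][]`, matching the `sn__Grid` table above.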
| 235 | +--- |
| 236 | + |
| 237 | +## 6. Identifier rules per dialect |
| 238 | + |
| 239 | +| Dialect | Table/column quoting | `::` → `__` | Case sensitivity | |
| 240 | +|---|---|---|---| |
| 241 | +| PostgreSQL | `"identifier"` | Yes | Preserved (quoted) | |
| 242 | +| MySQL | `` `identifier` `` | Yes | Preserved when quoted; table-name case sensitivity also depends on the OS/file system and the `lower_case_table_names` setting | |
| 243 | +| SQLite3 | `"identifier"` | Yes | Preserved (quoted) | |
| 244 | + |
| 245 | +The `sanitize_name()` function replaces each `::` with `__`, so C++ qualified names like `sn::typecheck::Record` become `sn__typecheck__Record`. |
| 246 | + |
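A minimal sketch of the two identifier steps, name sanitizing and dialect quoting, is shown below. It is not the repository's `sanitize_name()`; it is one possible implementation chosen to reproduce the `sn__typecheck__Record` example. `Dialect` and `quote_identifier` are hypothetical names.

```cpp
#include <algorithm>
#include <string>

enum class Dialect { postgres, mysql, sqlite3 };  // hypothetical tag enum

// Replace every ':' with '_' so that each "::" becomes "__".
inline std::string sanitize_name(std::string qualified) {
    std::replace(qualified.begin(), qualified.end(), ':', '_');
    return qualified;
}

// Wrap an identifier in the dialect's quote character: backticks for MySQL,
// double quotes for PostgreSQL and SQLite3. Quote characters embedded in the
// name are not escaped in this sketch.
inline std::string quote_identifier(const std::string& name, Dialect d) {
    const char q = (d == Dialect::mysql) ? '`' : '"';
    return q + name + q;
}
```

Sanitizing before quoting keeps the same table name usable across all three dialects, whichever quote character wraps it.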
| 247 | +--- |
| 248 | + |
| 249 | +## 7. Test coverage |
| 250 | + |
| 251 | +All 11 existing fixtures are reused for SQL testing, yielding 33 SQL-specific tests (11 fixtures × 3 dialects). The HDF5 tests (11 tests) remain unchanged. |
| 252 | + |
| 253 | +| Fixture | What it exercises | PostgreSQL | MySQL | SQLite3 | |
| 254 | +|---|---|---|---|---| |
| 255 | +| `primitives` | All primitive type mappings | ✓ | ✓ | ✓ | |
| 256 | +| `typedef` | Typedef resolution | ✓ | ✓ | ✓ | |
| 257 | +| `nested-ns` | Namespace nesting → table naming | ✓ | ✓ | ✓ | |
| 258 | +| `embedded-pod` | Nested structs + 1D arrays | ✓ | ✓ | ✓ | |
| 259 | +| `array-of-arrays` | Multi-dim arrays of structs | ✓ | ✓ | ✓ | |
| 260 | +| `ignored-non-pod` | Non-POD skipped | ✓ | ✓ | ✓ | |
| 261 | +| `unreferenced` | Unreferenced types skipped | ✓ | ✓ | ✓ | |
| 262 | +| `topological` | Multiple related structs | ✓ | ✓ | ✓ | |
| 263 | +| `enum-member` | Enum → INTEGER | ✓ | ✓ | ✓ | |
| 264 | +| `multi-array` | Multi-dim primitive arrays | ✓ | ✓ | ✓ | |
| 265 | +| `inheritance` | Inheritance skipped | ✓ | ✓ | ✓ | |
| 266 | + |
| 267 | +--- |
| 268 | + |
| 269 | +## 8. Attribute wiring status |
| 270 | + |
| 271 | +### Not yet wired (staging branch #30 pattern) |
| 272 | + |
| 273 | +The `sql::` attribute namespace is **defined in this taxonomy** but not yet implemented in the attribute rewriter (`h5_attr_translator.hpp`) or consumed in the SQL producer. The staging branch uses the producer/consumer pattern; attribute integration would follow the same `h5_attr_reader` + `clang::annotate` rewrite path used for `h5::` attributes (issue #32). |
| 274 | + |
| 275 | +| Attribute | Where read (planned) | Where emitted (planned) | |
| 276 | +|---|---|---| |
| 277 | +| `sql::table` | `h5_attr_reader::read_class_string(node, "sql::table")` | Overrides table name in `record_decl_impl` | |
| 278 | +| `sql::column` | `h5_attr_reader::read_field_string(fld, "sql::column")` | Overrides column name in `type_insert_impl` | |
| 279 | +| `sql::primary_key` | `h5_attr_reader::has_attr(fld, "sql::primary_key")` | Appends `PRIMARY KEY` to column definition | |
| 280 | +| `sql::unique` | `h5_attr_reader::has_attr(fld, "sql::unique")` | Appends `UNIQUE` to column definition | |
| 281 | +| `sql::not_null` | `h5_attr_reader::has_attr(fld, "sql::not_null")` | Emits `NOT NULL` (default anyway) | |
| 282 | +| `sql::default_` | `h5_attr_reader::read_field_string(fld, "sql::default_")` | Appends `DEFAULT value` | |
| 283 | +| `sql::foreign_key` | `h5_attr_reader::read_field_string(fld, "sql::foreign_key")` | Appends `REFERENCES ...` | |
| 284 | +| `sql::check` | `h5_attr_reader::read_field_string(fld, "sql::check")` | Emits `CHECK (expr)` on column or table | |
| 285 | +| `sql::index` | `h5_attr_reader::has_attr(fld, "sql::index")` | Emits `CREATE INDEX` after table DDL | |
| 286 | +| `sql::type_override` | `h5_attr_reader::read_field_string(fld, "sql::type_override")` | Replaces mapped type in `type_insert_impl` | |
| 287 | +| `sql::nested` | `h5_attr_reader::read_field_string(fld, "sql::nested")` | Controls JSONB/JSON/TEXT vs separate table | |
| 288 | + |
| 289 | +--- |
| 290 | + |
| 291 | +## 9. Design decisions |
| 292 | + |
| 293 | +### Decided |
| 294 | + |
| 295 | +| # | Decision | Rationale | |
| 296 | +|---|---|---| |
| 297 | +| 1 | **Three separate producer types** (`SqlProducer<postgres>`, `SqlProducer<mysql>`, `SqlProducer<sqlite3>`) via CRTP template | Clean separation of type maps; zero runtime overhead; follows existing `H5Producer` pattern. | |
| 298 | +| 2 | **Nested structs → JSON/JSONB/TEXT** | SQL has no native nested struct type. Flattening would require changing the consumer; JSON is the pragmatic SQL-native approach. | |
| 299 | +| 3 | **Arrays → `T[]` (PostgreSQL) or JSON/TEXT (MySQL/SQLite3)** | PostgreSQL has native arrays; MySQL and SQLite3 do not. Consistent fallback to JSON/TEXT for the latter two. | |
| 300 | +| 4 | **All fields `NOT NULL` by default** | C++ POD fields are always present. `std::optional<T>` (when supported) would omit `NOT NULL`. | |
| 301 | +| 5 | **Separate `CREATE TABLE` per struct** | Topological ordering from the consumer naturally yields one table per struct. Each struct is self-contained DDL. | |
| 302 | +| 6 | **Sanitize `::` → `__`** | `::` is not valid in unquoted SQL identifiers. Quoted identifiers could preserve it, but underscores are cleaner and more portable. | |
| 303 | +| 7 | **No `DROP TABLE IF EXISTS`** | The emitted DDL is idempotent (`CREATE TABLE IF NOT EXISTS`) and non-destructive. | |
| 304 | + |
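Decision 1 (per-dialect producers via CRTP) can be illustrated with the sketch below. All names here (`SqlProducerBase`, `PostgresProducer`, `MysqlProducer`, `quote`, `create_table`) are hypothetical and chosen for illustration. The actual `SqlProducer<...>` types in `src/producer_sql.hpp` may be structured differently, and the four-space indentation of the output is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical CRTP base: the CREATE TABLE layout is shared, while
// dialect-specific hooks (here only identifier quoting) live in Derived.
template <typename Derived>
struct SqlProducerBase {
    using Column = std::pair<std::string, std::string>;  // (name, SQL type)

    std::string create_table(const std::string& table,
                             const std::vector<Column>& cols) const {
        const auto& self = static_cast<const Derived&>(*this);
        std::string out =
            "CREATE TABLE IF NOT EXISTS " + self.quote(table) + " (\n";
        for (std::size_t i = 0; i < cols.size(); ++i) {
            out += "    " + self.quote(cols[i].first) + " " + cols[i].second +
                   " NOT NULL";
            out += (i + 1 < cols.size()) ? ",\n" : "\n";
        }
        return out + ");\n";
    }
};

struct PostgresProducer : SqlProducerBase<PostgresProducer> {
    std::string quote(const std::string& id) const { return "\"" + id + "\""; }
};

struct MysqlProducer : SqlProducerBase<MysqlProducer> {
    std::string quote(const std::string& id) const { return "`" + id + "`"; }
};
```

The call to `self.quote` is resolved at compile time, which is the "zero runtime overhead" property cited in the rationale: no virtual dispatch is involved.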
| 305 | +### Open |
| 306 | + |
| 307 | +| # | Question | Context | |
| 308 | +|---|---|---| |
| 309 | +| 1 | **Attribute wiring.** | When should `sql::` attributes be added to `h5_attr_translator.hpp`? After issue #32 (`h5::` attributes) is merged to staging, or in parallel? | |
| 310 | +| 2 | **`sql::nested("table")` semantics.** | If a user requests separate-table nesting, should the compiler emit a foreign key column (`INTEGER REFERENCES inner(id)`) or a join table? | |
| 311 | +| 3 | **`std::optional<T>` support.** | The staging branch consumer does not yet traverse `std::optional`. Once #32 adds it, SQL should emit nullable columns. | |
| 312 | +| 4 | **Index emission.** | Should `sql::index` emit `CREATE INDEX` inline after the table, or collect all indexes and emit them at the end of the file? | |
| 313 | +| 5 | **Multi-table FKs.** | For `topological` fixtures with multiple related structs, should the compiler auto-emit foreign keys between tables? | |
| 314 | +| 6 | **Type override dialect scoping.** | Should `sql::type_override("VARCHAR(255)")` apply to all dialects, or should there be `sql::type_override_postgres`, `sql::type_override_mysql`, etc.? | |
| 315 | + |
| 316 | +--- |
| 317 | + |
| 318 | +## 10. Sources |
| 319 | + |
| 320 | +- `tasks/h5cpp-compiler-backend-cookbook.md` (workspace) — transferable recipe for adding any new backend |
| 321 | +- `tasks/h5cpp-compiler-h5-attribute-taxonomy.md` (workspace) — HDF5 attribute vocabulary |
| 322 | +- `vargalabs/h5cpp-compiler#30` — Unified `--format` dispatcher and producer/consumer pattern |
| 323 | +- `vargalabs/h5cpp-compiler#33` — `src/producer_sql.hpp`, `src/h5cpp.cpp`, `tests/CMakeLists.txt` |
| 324 | +- PostgreSQL 16 Data Types: https://www.postgresql.org/docs/current/datatype.html |
| 325 | +- MySQL 8.0 Data Types: https://dev.mysql.com/doc/refman/8.0/en/data-types.html |
| 326 | +- SQLite3 Datatypes: https://www.sqlite.org/datatype3.html |