Commit b8f8dfc

Merge branch 'staging' into release
2 parents 754f8b8 + c8c9636 commit b8f8dfc

193 files changed

Lines changed: 11452 additions & 83 deletions

Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
1+
# sql:: Attribute Vocabulary (SQL Backend)
2+
3+
**Date:** 2026-05-24
4+
**Authors:** Steven Varga
5+
**Status:** Implemented on `vargalabs/h5cpp-compiler#33-sql-backend`
6+
**Repos:** `vargalabs/h5cpp-compiler` (SQL DDL emitter, issue #33)
7+
**Scope:** SQL backend C++ attribute vocabulary and type-mapping taxonomy. Defines the `[[sql::...]]` attributes the compiler will recognize (once attribute wiring is extended), the DDL emission semantics, and the three supported dialects.
8+
**Companions:**
9+
- `tasks/h5cpp-compiler-h5-attribute-taxonomy.md` — HDF5 attribute vocabulary
10+
- `tasks/h5cpp-compiler-backend-cookbook.md` — transferable recipe for adding any new backend
11+
12+
---
13+
14+
## TL;DR
15+
16+
The SQL backend emits `CREATE TABLE` DDL statements from annotated C++ structs. Three dialects are supported: **PostgreSQL**, **MySQL**, and **SQLite3**. Each dialect gets its own boolean flag (`--sql-postgres`, `--sql-mysql`, `--sql-lite3`) and writes to the unified `-o <file>` output path.
17+
18+
| Surface today (C++17 standard-attribute) | C++26 reflection form |
19+
|---|---|
20+
| `[[sql::table("users")]]` | `[[=sql::table{"users"}]]` |
21+
| `[[sql::column("user_id")]]` | `[[=sql::column{"user_id"}]]` |
22+
| `[[sql::primary_key]]` | `[[=sql::primary_key{}]]` |
23+
| `[[sql::unique]]` | `[[=sql::unique{}]]` |
24+
| `[[sql::not_null]]` | `[[=sql::not_null{}]]` |
25+
| `[[sql::default_(42)]]` | `[[=sql::default_{42}]]` |
26+
| `[[sql::foreign_key("other_table.col")]]` | `[[=sql::foreign_key{"other_table.col"}]]` |
27+
| `[[sql::check("value > 0")]]` | `[[=sql::check{"value > 0"}]]` |
28+
| `[[sql::index]]` | `[[=sql::index{}]]` |
29+
| `[[sql::type_override("VARCHAR(255)")]]` | `[[=sql::type_override{"VARCHAR(255)"}]]` |
30+
| `[[sql::nested("jsonb")]]` | `[[=sql::nested{"jsonb"}]]` |
31+
32+
The only syntactic shift is `(args)` → `{args}` under the `[[=...]]` form. **Names stay put.**
33+
34+
---
35+
36+
## 1. Dialect flags and CLI usage
37+
38+
```bash
39+
h5cpp --sql-postgres -o schema.sql my_structs.cpp -- -std=c++17
40+
h5cpp --sql-mysql -o schema.sql my_structs.cpp -- -std=c++17
41+
h5cpp --sql-lite3 -o schema.sql my_structs.cpp -- -std=c++17
42+
```
43+
44+
The SQL backend is integrated into the unified `--format` dispatcher (issue #30). Each dialect is a distinct enum value in `OutputFormat` and gets its own CLI flag.
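
The flag-to-format mapping described above can be pictured with a minimal sketch. The enumerator names and the `format_from_flag` helper are assumptions made for illustration; the source only states that each dialect is a distinct `OutputFormat` value with its own flag.

```cpp
#include <optional>
#include <string_view>

// Hypothetical enumerators: the document does not name the OutputFormat values.
enum class OutputFormat { hdf5, sql_postgres, sql_mysql, sql_sqlite3 };

// Map one of the three documented boolean flags to an output format.
// Returns std::nullopt for anything that is not a SQL dialect flag.
inline std::optional<OutputFormat> format_from_flag(std::string_view flag) {
    if (flag == "--sql-postgres") return OutputFormat::sql_postgres;
    if (flag == "--sql-mysql")    return OutputFormat::sql_mysql;
    if (flag == "--sql-lite3")    return OutputFormat::sql_sqlite3;
    return std::nullopt;
}
```

A real dispatcher would presumably also reject command lines that set more than one dialect flag; that check is left out here.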
45+
46+
---
47+
48+
## 2. Universal vocabulary — same words, `sql::` namespace
49+
50+
These attributes use vocabulary identical to `h5::*` where the concept overlaps (rename, ignore, doc, alias). They live in `sql::` so the namespace stays self-contained for SQL-only users.
51+
52+
### Universal Tier 1 — must-have
53+
54+
| Attribute | Purpose | Example |
55+
|---|---|---|
56+
| `[[sql::name("col_name")]]` | Rename a field for the SQL column. Defaults to the C++ field name. | `[[sql::name("created_at")]] std::chrono::system_clock::time_point ts;` |
57+
| `[[sql::ignore]]` | Skip this field entirely. Column absent from the emitted table. | `[[sql::ignore]] int debug_counter;` |
58+
59+
### Universal Tier 2 — high value, low cost
60+
61+
| Attribute | Purpose | Example |
62+
|---|---|---|
63+
| `[[sql::doc("description")]]` | Emitted as a SQL comment on the column or table. | `[[sql::doc("nanoseconds since epoch")]] std::uint64_t ts;` |
64+
| `[[sql::alias("Name")]]` | **Class-level.** Overrides the table name. Defaults to the sanitized C++ qualified name (`::` → `__`). | `struct [[sql::alias("Users")]] user_t { ... };` |
65+
66+
---
67+
68+
## 3. SQL-specific vocabulary
69+
70+
### Tier 1 — must-have
71+
72+
| Attribute | Purpose | Example |
73+
|---|---|---|
74+
| `[[sql::table("name")]]` | **Class-level.** Overrides the generated table name. | `struct [[sql::table("app_users")]] user_t { ... };` |
75+
| `[[sql::column("name")]]` | **Field-level.** Overrides the generated column name. | `[[sql::column("user_id")]] int id;` |
76+
| `[[sql::primary_key]]` | Emits `PRIMARY KEY` constraint. | `[[sql::primary_key]] int id;` |
77+
| `[[sql::not_null]]` | Emits `NOT NULL` constraint. **Default for all fields** (C++ POD fields are always present). | `[[sql::not_null]] int id;` |
78+
| `[[sql::unique]]` | Emits `UNIQUE` constraint. | `[[sql::unique]] std::string email;` |
79+
80+
### Tier 2 — high value
81+
82+
| Attribute | Purpose | Example |
83+
|---|---|---|
84+
| `[[sql::default_(value)]]` | Emits `DEFAULT value` clause. Dialect-aware quoting (strings quoted, numbers bare). | `[[sql::default_("active")]] std::string status;` |
85+
| `[[sql::foreign_key("table.col")]]` | Emits `REFERENCES table(col)` clause. | `[[sql::foreign_key("orders.id")]] int order_id;` |
86+
| `[[sql::check("expr")]]` | Emits `CHECK (expr)` clause. | `[[sql::check("value > 0")]] int value;` |
87+
| `[[sql::index]]` | Emits a `CREATE INDEX` statement after the table. | `[[sql::index]] std::string email;` |
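
The `DEFAULT` and `REFERENCES` rows above can be sketched as two small string builders. This is an illustration under stated assumptions, not the emitter's code: `DefaultValue`, `quote_literal`, `default_clause`, and `references_clause` are hypothetical names, and the sketch uses standard single-quoted literals for every dialect.

```cpp
#include <string>
#include <variant>

// A default value is either an integer (emitted bare) or a string (quoted).
using DefaultValue = std::variant<long long, std::string>;

// Quote a SQL string literal, doubling embedded single quotes.
inline std::string quote_literal(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        out += c;
        if (c == '\'') out += '\'';
    }
    out += "'";
    return out;
}

// Build the DEFAULT clause: numbers bare, strings quoted.
inline std::string default_clause(const DefaultValue& v) {
    if (const auto* i = std::get_if<long long>(&v))
        return "DEFAULT " + std::to_string(*i);
    return "DEFAULT " + quote_literal(std::get<std::string>(v));
}

// Split "table.col" at the first dot and build REFERENCES table(col).
// Returns an empty string when either half is missing.
inline std::string references_clause(const std::string& target) {
    const auto dot = target.find('.');
    if (dot == std::string::npos || dot == 0 || dot + 1 == target.size())
        return {};
    return "REFERENCES " + target.substr(0, dot) + "(" + target.substr(dot + 1) + ")";
}
```

The sketch leaves identifiers in the `REFERENCES` clause unquoted, as in the table's wording; MySQL's backslash escapes inside string literals are not handled.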
88+
89+
### Tier 3 — type control
90+
91+
| Attribute | Purpose | Example |
92+
|---|---|---|
93+
| `[[sql::type_override("TYPE")]]` | Overrides the dialect-specific type mapping for this field. | `[[sql::type_override("VARCHAR(255)")]] std::string name;` |
94+
| `[[sql::nested("jsonb")]]` | Controls how nested structs are represented. Values: `"jsonb"` (PostgreSQL), `"json"` (MySQL), `"text"` (SQLite3), or `"table"` (create a separate table with FK). | `[[sql::nested("jsonb")]] address_t addr;` |
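
To show how the vocabulary reads when combined, here is a small annotated struct that uses only attributes from the tables above. The struct and its field names are illustrative. Compilers that do not know the `sql::` namespace ignore these attributes, usually with a warning, so the code stays valid C++17.

```cpp
#include <string>

// Illustrative struct: every attribute below appears in the tables above,
// but the combination and the field names are assumptions.
struct [[sql::table("app_users")]] user_t {
    [[sql::column("user_id")]] [[sql::primary_key]] int id;
    [[sql::unique]] [[sql::type_override("VARCHAR(255)")]] std::string email;
    [[sql::default_("active")]] std::string status;
    [[sql::check("value > 0")]] [[sql::index]] int value;
    [[sql::ignore]] int debug_counter;  // would be left out of the table
};
```

Since the attribute wiring is described elsewhere in this note as not yet implemented, the annotations document intent and do not currently change the emitted DDL.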
95+
96+
---
97+
98+
## 4. Type map — C++ → SQL per dialect
99+
100+
### Primitives
101+
102+
| C++ type | PostgreSQL | MySQL | SQLite3 |
103+
|---|---|---|---|
104+
| `bool` / `_Bool` | `BOOLEAN` | `TINYINT(1)` | `INTEGER` |
105+
| `char` | `SMALLINT` | `TINYINT` | `INTEGER` |
106+
| `unsigned char` | `SMALLINT` | `TINYINT UNSIGNED` | `INTEGER` |
107+
| `short` | `SMALLINT` | `SMALLINT` | `INTEGER` |
108+
| `unsigned short` | `SMALLINT` | `SMALLINT UNSIGNED` | `INTEGER` |
109+
| `int` | `INTEGER` | `INT` | `INTEGER` |
110+
| `unsigned int` | `INTEGER` | `INT UNSIGNED` | `INTEGER` |
111+
| `long` | `BIGINT` | `BIGINT` | `INTEGER` |
112+
| `unsigned long` | `BIGINT` | `BIGINT UNSIGNED` | `INTEGER` |
113+
| `long long` | `BIGINT` | `BIGINT` | `INTEGER` |
114+
| `unsigned long long` | `BIGINT` | `BIGINT UNSIGNED` | `INTEGER` |
115+
| `float` | `REAL` | `FLOAT` | `REAL` |
116+
| `double` | `DOUBLE PRECISION` | `DOUBLE` | `REAL` |
117+
| `long double` | `DOUBLE PRECISION` | `DOUBLE` | `REAL` |
118+
| `enum` / `enum class` | `INTEGER` | `INT` | `INTEGER` |
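
One way to picture the primitives table is as a constant lookup keyed by the C++ spelling and indexed by dialect. The sketch below copies a subset of the rows verbatim; `Dialect`, `TypeRow`, `kPrimitiveMap`, and `map_primitive` are hypothetical names, and the real producer may organize its map differently.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string_view>

enum class Dialect { postgres, mysql, sqlite3 };

struct TypeRow {
    std::string_view cpp;     // C++ type spelling
    std::string_view sql[3];  // SQL type, indexed by Dialect
};

// A subset of the primitives table, copied row for row.
inline constexpr std::array<TypeRow, 6> kPrimitiveMap{{
    {"bool",           {"BOOLEAN",          "TINYINT(1)",        "INTEGER"}},
    {"unsigned short", {"SMALLINT",         "SMALLINT UNSIGNED", "INTEGER"}},
    {"int",            {"INTEGER",          "INT",               "INTEGER"}},
    {"long long",      {"BIGINT",           "BIGINT",            "INTEGER"}},
    {"float",          {"REAL",             "FLOAT",             "REAL"}},
    {"double",         {"DOUBLE PRECISION", "DOUBLE",            "REAL"}},
}};

// Look up the SQL type for a C++ primitive; std::nullopt if it is not listed.
inline std::optional<std::string_view> map_primitive(Dialect d, std::string_view cpp) {
    for (const auto& row : kPrimitiveMap)
        if (row.cpp == cpp) return row.sql[static_cast<std::size_t>(d)];
    return std::nullopt;
}
```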
119+
120+
### Complex types
121+
122+
| C++ type | PostgreSQL | MySQL | SQLite3 | Notes |
123+
|---|---|---|---|---|
124+
| Nested struct `S` | `JSONB` | `JSON` | `TEXT` | One column holding the nested struct as JSON. |
125+
| `T[N]` (C array) | `T[]` | `JSON` | `TEXT` | PostgreSQL native arrays; MySQL/SQLite fall back to JSON/TEXT. |
126+
| `T[M][N]` (multi-dim array) | `T[][]` | `JSON` | `TEXT` | PostgreSQL supports `[][]` syntax; MySQL/SQLite fall back. |
127+
| `std::string` | `TEXT` | `TEXT` | `TEXT` | No length limit assumed; override with `sql::type_override`. |
128+
| `std::vector<T>` | `JSONB` | `JSON` | `TEXT` | No native variable-length array in standard SQL. |
129+
| `std::map<K,V>` | `JSONB` | `JSON` | `TEXT` | No native map type; JSON fallback. |
130+
| `std::optional<T>` | `T` (nullable) | `T` (nullable) | `T` (nullable) | Emitted without `NOT NULL` so runtime nulls are allowed; depends on consumer support for `std::optional` (see open question 3). |
131+
| `std::chrono::time_point` | `TIMESTAMPTZ` | `DATETIME(6)` | `TEXT` | Auto-detected; can override with `sql::type_override`. |
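
The array rows above follow a simple rule: PostgreSQL appends one `[]` per dimension to the element type, while MySQL and SQLite3 collapse to `JSON` and `TEXT`. A minimal sketch of that rule, with `Dialect` and `array_column_type` as hypothetical names:

```cpp
#include <cstddef>
#include <string>

enum class Dialect { postgres, mysql, sqlite3 };

// Column type for a C array whose element type maps to `element_sql`,
// with `rank` dimensions (1 for T[N], 2 for T[M][N]).
inline std::string array_column_type(Dialect d, const std::string& element_sql,
                                     std::size_t rank) {
    switch (d) {
        case Dialect::postgres: {
            std::string out = element_sql;
            for (std::size_t i = 0; i < rank; ++i) out += "[]";
            return out;
        }
        case Dialect::mysql:   return "JSON";
        case Dialect::sqlite3: return "TEXT";
    }
    return {};  // not reached for valid Dialect values
}
```

The extents (`N`, `M`) are dropped on purpose in this sketch, matching the `T[]` and `T[][]` forms in the table.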
132+
133+
---
134+
135+
## 5. DDL output format
136+
137+
### Single struct
138+
139+
Input:
140+
```cpp
141+
namespace sn {
142+
struct Record {
143+
int id;
144+
double value;
145+
};
146+
}
147+
```
148+
149+
PostgreSQL output:
150+
```sql
151+
-- Generated by h5cpp-compiler SQL backend
152+
-- Dialect: postgresql
153+
154+
CREATE TABLE IF NOT EXISTS "sn__Record" (
155+
"id" INTEGER NOT NULL,
156+
"value" DOUBLE PRECISION NOT NULL
157+
);
158+
```
159+
160+
MySQL output:
161+
```sql
162+
-- Generated by h5cpp-compiler SQL backend
163+
-- Dialect: mysql
164+
165+
CREATE TABLE IF NOT EXISTS `sn__Record` (
166+
`id` INT NOT NULL,
167+
`value` DOUBLE NOT NULL
168+
);
169+
```
170+
171+
SQLite3 output:
172+
```sql
173+
-- Generated by h5cpp-compiler SQL backend
174+
-- Dialect: sqlite3
175+
176+
CREATE TABLE IF NOT EXISTS "sn__Record" (
177+
"id" INTEGER NOT NULL,
178+
"value" REAL NOT NULL
179+
);
180+
```
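
The three outputs above differ only in identifier quoting and in the mapped type names. A minimal sketch of an emitter with that shape follows; `Dialect`, `Column`, `quote_ident`, and `create_table` are hypothetical names, the four-space indentation is an assumption, and the generated-by header comment is omitted.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Dialect { postgres, mysql, sqlite3 };

struct Column {
    std::string name;
    std::string sql_type;  // already mapped for the target dialect
};

// MySQL quotes identifiers with backticks; the other two use double quotes.
inline std::string quote_ident(Dialect d, const std::string& id) {
    const char q = (d == Dialect::mysql) ? '`' : '"';
    return q + id + q;
}

// Emit one CREATE TABLE statement with every column marked NOT NULL.
inline std::string create_table(Dialect d, const std::string& table,
                                const std::vector<Column>& cols) {
    std::string out = "CREATE TABLE IF NOT EXISTS " + quote_ident(d, table) + " (\n";
    for (std::size_t i = 0; i < cols.size(); ++i) {
        out += "    " + quote_ident(d, cols[i].name) + " " + cols[i].sql_type + " NOT NULL";
        out += (i + 1 < cols.size()) ? ",\n" : "\n";
    }
    out += ");\n";
    return out;
}
```

The sketch does not escape quote characters inside identifiers, which a production emitter would need to handle.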
181+
182+
### Nested structs
183+
184+
Input:
185+
```cpp
186+
namespace sn {
187+
struct Inner { int a; double b; };
188+
struct Outer {
189+
int idx;
190+
Inner inner_singleton;
191+
Inner inner_array[4];
192+
};
193+
}
194+
```
195+
196+
PostgreSQL output:
197+
```sql
198+
CREATE TABLE IF NOT EXISTS "sn__Inner" (
199+
"a" INTEGER NOT NULL,
200+
"b" DOUBLE PRECISION NOT NULL
201+
);
202+
203+
CREATE TABLE IF NOT EXISTS "sn__Outer" (
204+
"idx" INTEGER NOT NULL,
205+
"inner_singleton" JSONB NOT NULL,
206+
"inner_array" JSONB[] NOT NULL
207+
);
208+
```
209+
210+
**Key behavior:** Nested structs produce a separate `CREATE TABLE` for the inner struct, AND the outer struct references it as `JSONB` (or `JSON` / `TEXT` depending on dialect). This is a pragmatic compromise: the inner table exists for introspection and potential FK usage, but the outer table stores the nested data as JSON for query simplicity.
211+
212+
### Multidimensional arrays
213+
214+
Input:
215+
```cpp
216+
namespace sn {
217+
struct Cell { float x; float y; };
218+
struct Grid { int idx; Cell field_05[3][8]; };
219+
}
220+
```
221+
222+
PostgreSQL output:
223+
```sql
224+
CREATE TABLE IF NOT EXISTS "sn__Cell" (
225+
"x" REAL NOT NULL,
226+
"y" REAL NOT NULL
227+
);
228+
229+
CREATE TABLE IF NOT EXISTS "sn__Grid" (
230+
"idx" INTEGER NOT NULL,
231+
"field_05" JSONB[][] NOT NULL
232+
);
233+
```
234+
235+
---
236+
237+
## 6. Identifier rules per dialect
238+
239+
| Dialect | Table/column quoting | `::` → `__` | Case sensitivity |
240+
|---|---|---|---|
241+
| PostgreSQL | `"identifier"` | Yes | Preserved (quoted) |
242+
| MySQL | `` `identifier` `` | Yes | Spelling preserved (quoted); table-name matching depends on `lower_case_table_names` and the host file system |
243+
| SQLite3 | `"identifier"` | Yes | Preserved (quoted) |
244+
245+
The `sanitize_name()` function replaces every `::` with a double underscore (`__`), so C++ qualified names like `sn::typecheck::Record` become `sn__typecheck__Record`.
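
A minimal sketch that reproduces the `sn::typecheck::Record` → `sn__typecheck__Record` example is shown below. It is a hypothetical reimplementation named `sanitize_name_sketch`; the real `sanitize_name()` in the compiler sources may be written differently.

```cpp
#include <string>
#include <string_view>

// Hypothetical reimplementation for illustration only.
// Each ':' becomes '_', so every "::" turns into "__".
inline std::string sanitize_name_sketch(std::string_view qualified) {
    std::string out;
    out.reserve(qualified.size());
    for (char c : qualified) out += (c == ':') ? '_' : c;
    return out;
}
```

Names without a namespace qualifier pass through unchanged.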
246+
247+
---
248+
249+
## 7. Test coverage
250+
251+
All 11 existing fixtures are reused for SQL testing, yielding 33 SQL-specific tests (11 fixtures × 3 dialects). The HDF5 tests (11 tests) remain unchanged.
252+
253+
| Fixture | What it exercises | PostgreSQL | MySQL | SQLite3 |
254+
|---|---|---|---|---|
255+
| `primitives` | All primitive type mappings ||||
256+
| `typedef` | Typedef resolution ||||
257+
| `nested-ns` | Namespace nesting → table naming ||||
258+
| `embedded-pod` | Nested structs + 1D arrays ||||
259+
| `array-of-arrays` | Multi-dim arrays of structs ||||
260+
| `ignored-non-pod` | Non-POD skipped ||||
261+
| `unreferenced` | Unreferenced types skipped ||||
262+
| `topological` | Multiple related structs ||||
263+
| `enum-member` | Enum → INTEGER ||||
264+
| `multi-array` | Multi-dim primitive arrays ||||
265+
| `inheritance` | Inheritance skipped ||||
266+
267+
---
268+
269+
## 8. Attribute wiring status
270+
271+
### Not yet wired (staging branch #30 pattern)
272+
273+
The `sql::` attribute namespace is **defined in this taxonomy** but not yet implemented in the attribute rewriter (`h5_attr_translator.hpp`) or consumed in the SQL producer. The staging branch uses the producer/consumer pattern; attribute integration would follow the same `h5_attr_reader` + `clang::annotate` rewrite path used for `h5::` attributes (issue #32).
274+
275+
| Attribute | Where read (planned) | Where emitted (planned) |
276+
|---|---|---|
277+
| `sql::table` | `h5_attr_reader::read_class_string(node, "sql::table")` | Overrides table name in `record_decl_impl` |
278+
| `sql::column` | `h5_attr_reader::read_field_string(fld, "sql::column")` | Overrides column name in `type_insert_impl` |
279+
| `sql::primary_key` | `h5_attr_reader::has_attr(fld, "sql::primary_key")` | Appends `PRIMARY KEY` to column definition |
280+
| `sql::unique` | `h5_attr_reader::has_attr(fld, "sql::unique")` | Appends `UNIQUE` to column definition |
281+
| `sql::not_null` | `h5_attr_reader::has_attr(fld, "sql::not_null")` | Emits `NOT NULL` (default anyway) |
282+
| `sql::default_` | `h5_attr_reader::read_field_string(fld, "sql::default_")` | Appends `DEFAULT value` |
283+
| `sql::foreign_key` | `h5_attr_reader::read_field_string(fld, "sql::foreign_key")` | Appends `REFERENCES ...` |
284+
| `sql::check` | `h5_attr_reader::read_field_string(fld, "sql::check")` | Emits `CHECK (expr)` on column or table |
285+
| `sql::index` | `h5_attr_reader::has_attr(fld, "sql::index")` | Emits `CREATE INDEX` after table DDL |
286+
| `sql::type_override` | `h5_attr_reader::read_field_string(fld, "sql::type_override")` | Replaces mapped type in `type_insert_impl` |
287+
| `sql::nested` | `h5_attr_reader::read_field_string(fld, "sql::nested")` | Controls JSONB/JSON/TEXT vs separate table |
288+
289+
---
290+
291+
## 9. Design decisions
292+
293+
### Decided
294+
295+
| # | Decision | Rationale |
296+
|---|---|---|
297+
| 1 | **Three separate producer types** (`SqlProducer<postgres>`, `SqlProducer<mysql>`, `SqlProducer<sqlite3>`) via CRTP template | Clean separation of type maps; zero runtime overhead; follows existing `H5Producer` pattern. |
298+
| 2 | **Nested structs → JSON/JSONB/TEXT** | SQL has no native nested struct type. Flattening would require changing the consumer; JSON is the pragmatic SQL-native approach. |
299+
| 3 | **Arrays → `T[]` (PostgreSQL) or JSON/TEXT (MySQL/SQLite3)** | PostgreSQL has native arrays; MySQL and SQLite3 do not. Consistent fallback to JSON/TEXT for the latter two. |
300+
| 4 | **All fields `NOT NULL` by default** | C++ POD fields are always present. `std::optional<T>` (when supported) would omit `NOT NULL`. |
301+
| 5 | **Separate `CREATE TABLE` per struct** | Topological ordering from the consumer naturally yields one table per struct. Each struct is self-contained DDL. |
302+
| 6 | **Sanitize `::` → `__`** | `::` is not valid in unquoted SQL identifiers. Quoted identifiers could preserve it, but underscores are cleaner and more portable. |
303+
| 7 | **No `DROP TABLE IF EXISTS`** | The emitted DDL is idempotent (`CREATE TABLE IF NOT EXISTS`) and non-destructive. |
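
Decision 1 can be pictured as a producer template parameterized on a dialect tag, so each dialect's type map and quoting are resolved at compile time. The sketch below uses a plain tag-parameterized template and does not claim to match the actual `SqlProducer` or its CRTP structure; the tag members and `SqlProducerSketch` are assumptions.

```cpp
#include <string>

// Hypothetical dialect tags carrying compile-time constants.
struct postgres {
    static constexpr const char* name = "postgresql";
    static constexpr const char* double_type = "DOUBLE PRECISION";
    static constexpr char quote = '"';
};
struct mysql {
    static constexpr const char* name = "mysql";
    static constexpr const char* double_type = "DOUBLE";
    static constexpr char quote = '`';
};
struct sqlite3 {
    static constexpr const char* name = "sqlite3";
    static constexpr const char* double_type = "REAL";
    static constexpr char quote = '"';
};

template <class DialectTag>
struct SqlProducerSketch {
    // Mirrors the "-- Dialect: ..." header line seen in the emitted DDL.
    static std::string header() {
        return std::string("-- Dialect: ") + DialectTag::name;
    }
    // Column definition for a C++ double field in this dialect.
    static std::string double_column(const std::string& field) {
        return DialectTag::quote + field + DialectTag::quote + " " +
               DialectTag::double_type + " NOT NULL";
    }
};
```

Because the dialect is a template argument, no branch on the dialect is needed when the DDL is emitted.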
304+
305+
### Open
306+
307+
| # | Question | Context |
308+
|---|---|---|
309+
| 1 | **Attribute wiring.** | When should `sql::` attributes be added to `h5_attr_translator.hpp`? After issue #32 (`h5::` attributes) is merged to staging, or in parallel? |
310+
| 2 | **`sql::nested("table")` semantics.** | If a user requests separate-table nesting, should the compiler emit a foreign key column (`INTEGER REFERENCES inner(id)`) or a join table? |
311+
| 3 | **`std::optional<T>` support.** | The staging branch consumer does not yet traverse `std::optional`. Once #32 adds it, SQL should emit nullable columns. |
312+
| 4 | **Index emission.** | Should `sql::index` emit `CREATE INDEX` inline after the table, or collect all indexes and emit them at the end of the file? |
313+
| 5 | **Multi-table FKs.** | For `topological` fixtures with multiple related structs, should the compiler auto-emit foreign keys between tables? |
314+
| 6 | **Type override dialect scoping.** | Should `sql::type_override("VARCHAR(255)")` apply to all dialects, or should there be `sql::type_override_postgres`, `sql::type_override_mysql`, etc.? |
315+
316+
---
317+
318+
## 10. Sources
319+
320+
- `tasks/h5cpp-compiler-backend-cookbook.md` (workspace) — transferable recipe for adding any new backend
321+
- `tasks/h5cpp-compiler-h5-attribute-taxonomy.md` (workspace) — HDF5 attribute vocabulary
322+
- `vargalabs/h5cpp-compiler#30` — Unified `--format` dispatcher and producer/consumer pattern
323+
- `vargalabs/h5cpp-compiler#33`: `src/producer_sql.hpp`, `src/h5cpp.cpp`, `tests/CMakeLists.txt`
324+
- PostgreSQL 16 Data Types: https://www.postgresql.org/docs/current/datatype.html
325+
- MySQL 8.0 Data Types: https://dev.mysql.com/doc/refman/8.0/en/data-types.html
326+
- SQLite3 Datatypes: https://www.sqlite.org/datatype3.html
