docs: README

khvn26 · khvn26 · commit 377aa0eec5b2 · 2026-05-15T19:54:28.000+01:00
beep boop
diff --git a/README.md b/README.md
@@ -1,3 +1,137 @@
 # flagsmith-sql-flag-engine
 
-Placeholder. The initial package scaffold lands via the first pull request.
+SQL translator for Flagsmith segment predicates.
+
+Where the Python and Rust `flag_engine` implementations evaluate
+`is_context_in_segment` against an in-memory `EvaluationContext`, this
+package takes a `SegmentContext` and emits a SQL `WHERE` expression that
+evaluates the segment against an entire `IDENTITIES` table — one row per
+identity, with the identity's full trait map held in a single column
+the translator path-extracts at query time. `PERCENTAGE_SPLIT` and
+`:semver`-marked comparators compile to inline pure-SQL.
+
+## Quickstart
+
+```python
+from flag_engine.context.types import EvaluationContext, SegmentContext
+
+from flagsmith_sql_flag_engine import TranslateContext, translate_segment
+from flagsmith_sql_flag_engine.dialects import ClickHouseDialect
+
+eval_context: EvaluationContext = {
+    "environment": {"key": "n9fbf9...3ngWhb", "name": "Production"},
+}
+ctx = TranslateContext(evaluation_context=eval_context, dialect=ClickHouseDialect())
+
+segment: SegmentContext = {
+    "key": "growth-cohort",
+    "name": "Growth cohort",
+    "rules": [
+        {
+            "type": "ALL",
+            "conditions": [
+                {"operator": "EQUAL", "property": "plan", "value": "growth"},
+            ],
+        },
+    ],
+}
+where_expr = translate_segment(segment, ctx)
+# where_expr is a SQL string. Drop into:
+#   SELECT COUNT(*) FROM IDENTITIES i
+#   WHERE i.environment_id = 'n9fbf9...3ngWhb' AND ({where_expr})
+```
+
+`environment_id` in the `IDENTITIES` table is a string column holding
+`EnvironmentContext.key` directly — the same identifier the engine uses,
+no separate integer PK.
+
+`translate_segment` returns `None` if the segment uses an operator the
+translator can't handle — typically a REGEX pattern the active dialect's
+regex flavour can't compile. Callers should fall back to
+`flag_engine.is_context_in_segment` for those segments.
+
+## Schema
+
+Each dialect publishes the table layout it expects via a `schema_ddl`
+constant. For ClickHouse:
+
+```sql
+CREATE TABLE IF NOT EXISTS IDENTITIES (
+    environment_id String,
+    id UInt64,
+    identifier String,
+    identity_key String,
+    traits JSON
+)
+ENGINE = MergeTree()
+ORDER BY (environment_id, id);
+```
+
+Traits live in a single `JSON` column (CH 24+, GA in 25.x). Each key is
+stored as a typed subcolumn, so trait reads are direct columnar scans
+rather than per-row JSON parses. Trait keys are *data* — new keys appear
+without schema changes — and the translator only sees the abstract path
+extraction.
+
+ClickHouse Cloud requires `SET allow_experimental_json_type = 1` when
+creating a `JSON`-column table (the type is GA on OSS 25.x); the test
+harness applies this setting automatically.
+
+Programmatic access:
+
+```python
+from flagsmith_sql_flag_engine.dialects.clickhouse import SCHEMA_DDL
+```
+
+## Engine parity
+
+Validated against [Flagsmith/engine-test-data](https://github.com/Flagsmith/engine-test-data),
+the test suite every engine implementation is checked against. The
+engine-parity suite loads each test case's identity into a per-dialect
+scratch table, translates the case's segments, runs the generated SQL,
+and compares to `flag_engine.is_context_in_segment`.
+
+To run the engine-parity suite locally:
+
+```bash
+git submodule update --init                 # pull engine-test-data
+docker compose up --detach --wait clickhouse
+uv run pytest tests/test_engine.py
+```
+
+Adding a new dialect's parity coverage is one harness module — see
+`tests/harnesses/` for the shape.
+
+## Dialects
+
+The translator is dialect-aware: a `Dialect` protocol abstracts the
+SQL fragments that differ across SQL engines — MD5 hex, hex-to-int
+parsing, prefix-anchored regex, padded-version comparison, type-aware
+trait predicates, regex flavour. Today `ClickHouseDialect` is the only
+implementation; adding another engine such as Snowflake, DuckDB or
+Postgres means writing one class.
+
+## Operator coverage
+
+| Operator                                     | Translatable | Notes                                                          |
+| -------------------------------------------- | :----------: | -------------------------------------------------------------- |
+| `EQUAL`, `NOT_EQUAL`, `IN`                   |     yes      |                                                                |
+| `IS_SET`, `IS_NOT_SET`                       |     yes      | trait subcolumn `IS NOT NULL` / `IS NULL`                      |
+| `CONTAINS`, `NOT_CONTAINS`                   |     yes      |                                                                |
+| `GREATER_THAN`, `LESS_THAN` plus `_INCLUSIVE`|     yes      |                                                                |
+| `MODULO`                                     |     yes      |                                                                |
+| `PERCENTAGE_SPLIT`                           |     yes      | inlined MD5-mod-9999; ~0.005% diverge on hash==9998            |
+| `REGEX`                                      |   partial    | dialect-flavour gated; unsupported patterns → caller fallback  |
+| `:semver`-marked comparators                 |     yes      | major.minor.patch only; ignores prerelease                     |
+
+## Development
+
+```bash
+make install                  # uv sync + pre-commit install
+make lint                     # run pre-commit hooks across the tree
+make typecheck                # mypy
+make test                     # unit tests
+```
+
+Ruff (lint + format) runs as a pre-commit hook on every commit. Mypy
+runs as a `make typecheck` hook on staged Python files.