| 1 | +# sql-redis |
| 2 | + |
| 3 | +A proof-of-concept SQL-to-Redis translator that converts SQL SELECT statements into Redis `FT.SEARCH` and `FT.AGGREGATE` commands. |
| 4 | + |
| 5 | +## Status |
| 6 | + |
| 7 | +This is an **early POC** demonstrating feasibility, not a production-ready library. The goal is to explore design decisions and validate the approach before committing to a full implementation. |
| 8 | + |
| 9 | +## Quick Example |
| 10 | + |
| 11 | +```python |
| 12 | +from redis import Redis |
| 13 | +from sql_redis import Translator |
| 14 | +from sql_redis.schema import SchemaRegistry |
| 15 | +from sql_redis.executor import Executor |
| 16 | + |
| 17 | +client = Redis() |
| 18 | +registry = SchemaRegistry(client) |
| 19 | +registry.load_all() # Loads index schemas from Redis |
| 20 | + |
| 21 | +executor = Executor(client, registry) |
| 22 | + |
| 23 | +# Simple query |
| 24 | +result = executor.execute(""" |
| 25 | + SELECT title, price |
| 26 | + FROM products |
| 27 | + WHERE category = 'electronics' AND price < 500 |
| 28 | + ORDER BY price ASC |
| 29 | + LIMIT 10 |
| 30 | +""") |
| 31 | + |
| 32 | +for row in result.rows: |
| 33 | +    print(row["title"], row["price"]) |
| 34 | + |
| 35 | +# Vector search with params (vector_bytes is the query embedding serialized to bytes, supplied by the caller) |
| 36 | +result = executor.execute(""" |
| 37 | + SELECT title, vector_distance(embedding, :vec) AS score |
| 38 | + FROM products |
| 39 | + LIMIT 5 |
| 40 | +""", params={"vec": vector_bytes}) |
| 41 | +``` |
| 42 | + |
| 43 | +## Design Decisions |
| 44 | + |
| 45 | +### Why SQL instead of a pandas-like Python DSL? |
| 46 | + |
| 47 | +We considered several interface options: |
| 48 | + |
| 49 | +| Approach | Example | Trade-offs | |
| 50 | +|----------|---------|------------| |
| 51 | +| **SQL** | `SELECT * FROM products WHERE price > 100` | Universal, well-understood, tooling exists | |
| 52 | +| **Pandas-like** | `df[df.price > 100]` | Pythonic but limited to Python, no standard | |
| 53 | +| **Builder pattern** | `query.select("*").where(price__gt=100)` | Type-safe but verbose, learning curve | |
| 54 | + |
| 55 | +**We chose SQL because:** |
| 56 | + |
| 57 | +1. **Universality** — SQL is the lingua franca of data. Developers, analysts, and tools all speak it. |
| 58 | +2. **No new DSL to learn** — Users already know SQL. A pandas-like API requires learning our specific dialect. |
| 59 | +3. **Tooling compatibility** — SQL strings can be generated by ORMs, query builders, or AI assistants. |
| 60 | +4. **Clear mapping** — SQL semantics map reasonably well to RediSearch operations (SELECT→LOAD, WHERE→filter, GROUP BY→GROUPBY). |
| 61 | + |
| 62 | +The downside is losing Python's type checking and IDE support, but for a query interface we judge universality to be worth that cost. |
| 63 | + |
| 64 | +### Why sqlglot instead of writing a custom parser? |
| 65 | + |
| 66 | +**Options considered:** |
| 67 | +- **Custom parser** (regex, hand-rolled recursive descent) |
| 68 | +- **PLY/Lark** (parser generators) |
| 69 | +- **sqlglot** (production SQL parser) |
| 70 | +- **sqlparse** (tokenizer, not a full parser) |
| 71 | + |
| 72 | +**We chose sqlglot because:** |
| 73 | + |
| 74 | +1. **Battle-tested** — Used in production by companies like Tobiko (SQLMesh). Handles edge cases we'd miss. |
| 75 | +2. **Full AST** — Provides a complete abstract syntax tree, not just tokens. We can traverse and analyze queries properly. |
| 76 | +3. **Dialect support** — Handles SQL variations. Users can write MySQL-style or PostgreSQL-style queries. |
| 77 | +4. **Active maintenance** — Regular releases, responsive maintainers, good documentation. |
| 78 | + |
| 79 | +The alternative was writing a custom parser, which would be error-prone and time-consuming for a POC. sqlglot lets us focus on the translation logic rather than parsing edge cases. |
| 80 | + |
| 81 | +### Why schema-aware translation? |
| 82 | + |
| 83 | +Redis field types determine query syntax: |
| 84 | + |
| 85 | +| Field Type | Redis Syntax | Example | |
| 86 | +|------------|--------------|---------| |
| 87 | +| TEXT | `@field:term` | `@title:laptop` | |
| 88 | +| NUMERIC | `@field:[min max]` | `@price:[100 500]` | |
| 89 | +| TAG | `@field:{value}` | `@category:{books}` | |
| 90 | + |
| 91 | +**Without schema knowledge**, we can't translate `category = 'books'` correctly — it could be `@category:books` (TEXT search) or `@category:{books}` (TAG exact match). |
| 92 | + |
| 93 | +**Our approach:** The `SchemaRegistry` fetches index schemas via `FT.INFO` at startup. The translator uses this to generate correct syntax per field type. |
| 94 | + |
| 95 | +This adds Redis round-trips at initialization (`FT.INFO` describes one index per call) but ensures correct query generation. |
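
To make the ambiguity concrete, here is a minimal sketch of how an equality condition could be rendered once the field type is known. The `SCHEMA` dictionary and the `translate_equality` helper are hypothetical names used only for illustration; they are not taken from the project's code, and escaping of special characters in values is deliberately left out.

```python
from typing import Dict

# Hypothetical schema: field name -> RediSearch field type.
# In the real project this information would come from FT.INFO.
SCHEMA: Dict[str, str] = {"title": "TEXT", "price": "NUMERIC", "category": "TAG"}


def translate_equality(field: str, value: object, schema: Dict[str, str]) -> str:
    """Render `field = value` using the syntax required by the field's type."""
    field_type = schema.get(field)
    if field_type == "TAG":
        # Exact match on a tag: @field:{value}
        return f"@{field}:{{{value}}}"
    if field_type == "NUMERIC":
        # Equality expressed as a closed range: @field:[v v]
        return f"@{field}:[{value} {value}]"
    if field_type == "TEXT":
        # Full-text term match: @field:term
        return f"@{field}:{value}"
    raise KeyError(f"unknown field or unsupported type: {field!r}")


print(translate_equality("category", "books", SCHEMA))
print(translate_equality("title", "laptop", SCHEMA))
```

The same SQL predicate yields different query fragments depending only on the schema entry, which is why the translator cannot work from the SQL text alone.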
| 96 | + |
| 97 | +### Architecture: Why this layered design? |
| 98 | + |
| 99 | +``` |
| 100 | +SQL String |
| 101 | + ↓ |
| 102 | +┌─────────────────┐ |
| 103 | +│ SQLParser │ Parse SQL → ParsedQuery dataclass |
| 104 | +└────────┬────────┘ |
| 105 | + ↓ |
| 106 | +┌─────────────────┐ |
| 107 | +│ SchemaRegistry │ Load field types from Redis |
| 108 | +└────────┬────────┘ |
| 109 | + ↓ |
| 110 | +┌─────────────────┐ |
| 111 | +│ Analyzer │ Classify conditions by field type |
| 112 | +└────────┬────────┘ |
| 113 | + ↓ |
| 114 | +┌─────────────────┐ |
| 115 | +│ QueryBuilder │ Generate RediSearch syntax per type |
| 116 | +└────────┬────────┘ |
| 117 | + ↓ |
| 118 | +┌─────────────────┐ |
| 119 | +│ Translator │ Orchestrate pipeline, build command |
| 120 | +└────────┬────────┘ |
| 121 | + ↓ |
| 122 | +┌─────────────────┐ |
| 123 | +│ Executor │ Execute command, parse results |
| 124 | +└────────┬────────┘ |
| 125 | + ↓ |
| 126 | +QueryResult(rows, count) |
| 127 | +``` |
| 128 | + |
| 129 | +**Why separate components?** |
| 130 | + |
| 131 | +1. **Testability** — Each layer has focused unit tests. 100% coverage is achievable because responsibilities are clear. |
| 132 | +2. **Single responsibility** — Parser doesn't know about Redis. QueryBuilder doesn't know about SQL. Changes are localized. |
| 133 | +3. **Extensibility** — Adding a new field type (e.g., GEO) means updating Analyzer and QueryBuilder, not rewriting everything. |
| 134 | + |
| 135 | +**Why not a single monolithic translator?** |
| 136 | + |
| 137 | +Early prototypes combined parsing and translation. This led to: |
| 138 | +- Tests that needed a Redis connection just to check SQL parsing |
| 139 | +- Difficulty testing edge cases in isolation |
| 140 | +- Tangled code that was hard to modify |
| 141 | + |
| 142 | +The layered approach emerged from TDD — writing tests first revealed natural boundaries. |
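
As an illustration of why the separation helps testing, the final command-assembly step can be written as a pure function that never touches Redis. This is a sketch under assumptions: `build_search_command` and its parameters are hypothetical and do not reflect the project's actual `Translator` interface. Only the `FT.SEARCH` argument order (`RETURN`, `SORTBY`, `LIMIT`) follows the Redis command syntax.

```python
from typing import List, Optional, Sequence


def build_search_command(
    index: str,
    query: str,
    fields: Sequence[str] = (),
    sort_by: Optional[str] = None,
    ascending: bool = True,
    offset: int = 0,
    limit: Optional[int] = None,
) -> List[str]:
    """Assemble FT.SEARCH arguments from already-translated query parts."""
    args: List[str] = ["FT.SEARCH", index, query]
    if fields:
        # RETURN takes a count followed by the field names.
        args += ["RETURN", str(len(fields)), *fields]
    if sort_by is not None:
        args += ["SORTBY", sort_by, "ASC" if ascending else "DESC"]
    if limit is not None:
        # LIMIT takes the offset first, then the number of results.
        args += ["LIMIT", str(offset), str(limit)]
    return args


cmd = build_search_command(
    "products",
    "@category:{electronics} @price:[-inf (500]",
    fields=["title", "price"],
    sort_by="price",
    limit=10,
)
print(" ".join(cmd))
```

Because the function only returns a list of strings, a unit test can assert on the exact command without a running server; only the executor layer needs a real connection.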
| 143 | + |
| 144 | +## What's Implemented |
| 145 | + |
| 146 | +- [x] Basic SELECT with field selection |
| 147 | +- [x] WHERE with TEXT, NUMERIC, TAG field types |
| 148 | +- [x] Comparison operators: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IN` |
| 149 | +- [x] Boolean operators: `AND`, `OR` |
| 150 | +- [x] Aggregations: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX` |
| 151 | +- [x] `GROUP BY` with multiple aggregations |
| 152 | +- [x] `ORDER BY` with ASC/DESC |
| 153 | +- [x] `LIMIT` and `OFFSET` pagination |
| 154 | +- [x] Computed fields: `price * 0.9 AS discounted` |
| 155 | +- [x] Vector KNN search: `vector_distance(field, :param)` |
| 156 | +- [x] Hybrid search (filters + vector) |
| 157 | +- [x] Full-text search: `LIKE 'prefix%'` (prefix), `fulltext(field, 'terms')` function |
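
For the comparison operators listed above, one plausible mapping onto RediSearch numeric ranges is sketched below. The `numeric_condition` helper is a hypothetical name, not the project's API. The range notation itself (`-inf`, `+inf`, and `(` for an exclusive bound) is standard RediSearch query syntax.

```python
def numeric_condition(field: str, op: str, value: float) -> str:
    """Translate `field <op> value` on a NUMERIC field into a range filter."""
    if op == "!=":
        # Inequality is the negation of a single-value range.
        return f"-@{field}:[{value} {value}]"
    ranges = {
        "=": f"[{value} {value}]",
        "<": f"[-inf ({value}]",   # "(" marks an exclusive upper bound
        "<=": f"[-inf {value}]",
        ">": f"[({value} +inf]",   # "(" marks an exclusive lower bound
        ">=": f"[{value} +inf]",
    }
    if op not in ranges:
        raise ValueError(f"unsupported operator: {op!r}")
    return f"@{field}:{ranges[op]}"


print(numeric_condition("price", "<", 500))
print(numeric_condition("price", ">=", 100))
```

`BETWEEN a AND b` fits the same scheme as the closed range `@field:[a b]`.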
| 158 | + |
| 159 | +## What's Not Implemented (Yet...) |
| 160 | + |
| 161 | +- [ ] JOINs (Redis doesn't support cross-index joins) |
| 162 | +- [ ] Subqueries |
| 163 | +- [ ] HAVING clause |
| 164 | +- [ ] DISTINCT |
| 165 | +- [ ] GEO field queries |
| 166 | +- [ ] Index creation from SQL (CREATE INDEX) |
| 167 | + |
| 168 | +## Development |
| 169 | + |
| 170 | +```bash |
| 171 | +# Install dependencies |
| 172 | +uv sync --all-extras |
| 173 | + |
| 174 | +# Run tests (requires Docker for testcontainers) |
| 175 | +uv run pytest |
| 176 | + |
| 177 | +# Run with coverage |
| 178 | +uv run pytest --cov=sql_redis --cov-report=html |
| 179 | +``` |
| 180 | + |
| 181 | +## Testing Philosophy |
| 182 | + |
| 183 | +This project uses strict TDD with 100% test coverage as a hard requirement. The approach: |
| 184 | + |
| 185 | +1. **Write failing tests first** — Define expected behavior before implementation |
| 186 | +2. **One test at a time** — Implement just enough to pass each test |
| 187 | +3. **No untestable code** — If we can't test it, we don't write it |
| 188 | +4. **Integration tests mirror raw Redis** — `test_sql_queries.py` verifies that each SQL query produces the same results as the equivalent `FT.AGGREGATE` command in `test_redis_queries.py` |
| 189 | + |
| 190 | +Coverage is enforced in CI. Pragmas (`# pragma: no cover`) are forbidden — if code can't be tested, it shouldn't exist. |
| 191 | + |