Commit 353d031

Add README with decisions
1 parent 440b7ec commit 353d031

3 files changed

Lines changed: 368 additions & 4 deletions

README.md

Lines changed: 191 additions & 0 deletions
@@ -0,0 +1,191 @@
# sql-redis

A proof-of-concept SQL-to-Redis translator that converts SQL SELECT statements into Redis `FT.SEARCH` and `FT.AGGREGATE` commands.

## Status

This is an **early POC** demonstrating feasibility, not a production-ready library. The goal is to explore design decisions and validate the approach before committing to a full implementation.

## Quick Example

```python
from redis import Redis
from sql_redis import Translator
from sql_redis.schema import SchemaRegistry
from sql_redis.executor import Executor

client = Redis()
registry = SchemaRegistry(client)
registry.load_all()  # Loads index schemas from Redis

executor = Executor(client, registry)

# Simple query
result = executor.execute("""
    SELECT title, price
    FROM products
    WHERE category = 'electronics' AND price < 500
    ORDER BY price ASC
    LIMIT 10
""")

for row in result.rows:
    print(row["title"], row["price"])

# Vector search with params
# vector_bytes is assumed to hold the query embedding, already serialized to bytes
result = executor.execute("""
    SELECT title, vector_distance(embedding, :vec) AS score
    FROM products
    LIMIT 5
""", params={"vec": vector_bytes})
```
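
The snippet above passes `vector_bytes` without defining it. A minimal sketch of producing such a value follows. It assumes the index stores `FLOAT32` vectors and that the parameter is expected as a raw little-endian byte string. Both are assumptions, and the helper name `to_float32_bytes` is hypothetical, not part of `sql_redis`.

```python
import struct


def to_float32_bytes(values):
    """Pack a sequence of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


# Hypothetical 4-dimensional query embedding
vector_bytes = to_float32_bytes([0.1, 0.2, 0.3, 0.4])
print(len(vector_bytes))  # 4 floats * 4 bytes each
```

The vector dimension and element type must match whatever the index was created with; a mismatch is typically rejected by Redis at query time.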

## Design Decisions

### Why SQL instead of a pandas-like Python DSL?

We considered several interface options:

| Approach | Example | Trade-offs |
|----------|---------|------------|
| **SQL** | `SELECT * FROM products WHERE price > 100` | Universal, well-understood, tooling exists |
| **Pandas-like** | `df[df.price > 100]` | Pythonic but limited to Python, no standard |
| **Builder pattern** | `query.select("*").where(price__gt=100)` | Type-safe but verbose, learning curve |

**We chose SQL because:**

1. **Universality** — SQL is the lingua franca of data. Developers, analysts, and tools all speak it.
2. **No new DSL to learn** — Users already know SQL. A pandas-like API requires learning our specific dialect.
3. **Tooling compatibility** — SQL strings can be generated by ORMs, query builders, or AI assistants.
4. **Clear mapping** — SQL semantics map reasonably well to RediSearch operations (SELECT→LOAD, WHERE→filter, GROUP BY→GROUPBY).

The downside is losing Python's type checking and IDE support, but for a query interface, the universality trade-off is worth it.
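
To make the GROUP BY→GROUPBY mapping concrete, here is a rough sketch of how a grouped SQL query could be assembled into `FT.AGGREGATE` arguments. The function `build_aggregate` and its reducer tuple format are hypothetical illustrations, not the project's actual API; only the emitted `GROUPBY`/`REDUCE` argument layout follows the RediSearch command syntax.

```python
def build_aggregate(index, group_by, reducers, query="*"):
    """Assemble FT.AGGREGATE arguments for a single-field GROUP BY.

    reducers: list of (function, field_or_None, alias) tuples,
    e.g. ("COUNT", None, "n") or ("AVG", "price", "avg_price").
    """
    args = ["FT.AGGREGATE", index, query, "GROUPBY", "1", f"@{group_by}"]
    for func, field, alias in reducers:
        if field is None:
            # Reducers such as COUNT take zero arguments
            args += ["REDUCE", func, "0", "AS", alias]
        else:
            args += ["REDUCE", func, "1", f"@{field}", "AS", alias]
    return args


# SELECT category, COUNT(*) AS n, AVG(price) AS avg_price
# FROM products GROUP BY category
command = build_aggregate(
    "products",
    "category",
    [("COUNT", None, "n"), ("AVG", "price", "avg_price")],
)
print(" ".join(command))
```

Returning a flat argument list keeps the sketch independent of any client; with redis-py such a list could be sent through `client.execute_command(*command)`.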

### Why sqlglot instead of writing a custom parser?

**Options considered:**
- **Custom parser** (regex, hand-rolled recursive descent)
- **PLY/Lark** (parser generators)
- **sqlglot** (production SQL parser)
- **sqlparse** (tokenizer, not a full parser)

**We chose sqlglot because:**

1. **Battle-tested** — Used in production by companies like Tobiko (SQLMesh). Handles edge cases we'd miss.
2. **Full AST** — Provides a complete abstract syntax tree, not just tokens. We can traverse and analyze queries properly.
3. **Dialect support** — Handles SQL variations. Users can write MySQL-style or PostgreSQL-style queries.
4. **Active maintenance** — Regular releases, responsive maintainers, good documentation.

The alternative was writing a custom parser, which would be error-prone and time-consuming for a POC. sqlglot lets us focus on the translation logic rather than parsing edge cases.

### Why schema-aware translation?

Redis field types determine query syntax:

| Field Type | Redis Syntax | Example |
|------------|--------------|---------|
| TEXT | `@field:term` | `@title:laptop` |
| NUMERIC | `@field:[min max]` | `@price:[100 500]` |
| TAG | `@field:{value}` | `@category:{books}` |

**Without schema knowledge**, we can't translate `category = 'books'` correctly — it could be `@category:books` (TEXT search) or `@category:{books}` (TAG exact match).
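
The ambiguity can be shown in a few lines. The sketch below renders the same SQL equality three different ways depending on the field type; `render_equality` is a hypothetical helper, and escaping of special characters in values is deliberately left out.

```python
def render_equality(field, value, field_type):
    """Render `field = value` in RediSearch syntax for a given field type."""
    if field_type == "TAG":
        return f"@{field}:{{{value}}}"
    if field_type == "NUMERIC":
        # Equality on a numeric field is a closed range with equal bounds
        return f"@{field}:[{value} {value}]"
    if field_type == "TEXT":
        return f"@{field}:{value}"
    raise ValueError(f"unsupported field type: {field_type}")


print(render_equality("category", "books", "TAG"))
print(render_equality("category", "books", "TEXT"))
print(render_equality("price", 100, "NUMERIC"))
```

The first two calls take identical SQL input and differ only in the type passed in, which is the information the translator cannot get from the SQL string alone.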

**Our approach:** The `SchemaRegistry` fetches index schemas via `FT.INFO` at startup. The translator uses this to generate correct syntax per field type.

This adds a Redis round-trip at initialization but ensures correct query generation.
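
A sketch of the schema-loading step, under stated assumptions: it takes the `attributes` section of an `FT.INFO` reply as a list of flat token lists, and it assumes each list starts with the `identifier`, `attribute`, and `type` key/value pairs. The sample reply is hand-written for illustration and the function name is hypothetical; the real `SchemaRegistry` may work differently.

```python
def field_types_from_attributes(attributes):
    """Map field names to their index types from FT.INFO-style attribute lists."""
    field_types = {}
    for attr in attributes:
        items = [x.decode() if isinstance(x, bytes) else str(x) for x in attr]
        # Only the first three key/value pairs are read; trailing flags such as
        # SORTABLE have no value and would break naive pairing of the whole list.
        head = dict(zip(items[0:6:2], items[1:6:2]))
        field_types[head["attribute"]] = head["type"]
    return field_types


# Hand-written sample shaped like the attributes section of an FT.INFO reply
sample_attributes = [
    [b"identifier", b"title", b"attribute", b"title", b"type", b"TEXT", b"WEIGHT", b"1"],
    [b"identifier", b"price", b"attribute", b"price", b"type", b"NUMERIC", b"SORTABLE"],
    [b"identifier", b"category", b"attribute", b"category", b"type", b"TAG", b"SEPARATOR", b","],
]

print(field_types_from_attributes(sample_attributes))
```

Keeping the parsing in a pure function means it can be unit-tested against canned replies, with the single Redis round-trip isolated in the caller.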

### Architecture: Why this layered design?

```
    SQL String
         ▼
┌─────────────────┐
│    SQLParser    │  Parse SQL → ParsedQuery dataclass
└────────┬────────┘
         ▼
┌─────────────────┐
│ SchemaRegistry  │  Load field types from Redis
└────────┬────────┘
         ▼
┌─────────────────┐
│    Analyzer     │  Classify conditions by field type
└────────┬────────┘
         ▼
┌─────────────────┐
│  QueryBuilder   │  Generate RediSearch syntax per type
└────────┬────────┘
         ▼
┌─────────────────┐
│   Translator    │  Orchestrate pipeline, build command
└────────┬────────┘
         ▼
┌─────────────────┐
│    Executor     │  Execute command, parse results
└────────┬────────┘
         ▼
QueryResult(rows, count)
```

**Why separate components?**

1. **Testability** — Each layer has focused unit tests. 100% coverage is achievable because responsibilities are clear.
2. **Single responsibility** — Parser doesn't know about Redis. QueryBuilder doesn't know about SQL. Changes are localized.
3. **Extensibility** — Adding a new field type (e.g., GEO) means updating Analyzer and QueryBuilder, not rewriting everything.

**Why not a single monolithic translator?**

Early prototypes combined parsing and translation. This led to:
- Tests that needed a Redis connection just to check SQL parsing
- Difficulty testing edge cases in isolation
- Tangled code that was hard to modify

The layered approach emerged from TDD — writing tests first revealed natural boundaries.
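
The benefit of these boundaries can be sketched with one stage in isolation. Below, a hypothetical `ParsedQuery` (the field names are assumptions, not the project's actual dataclass) is combined with an already-built query string to assemble `FT.SEARCH` arguments. Nothing in it touches Redis, so it can be tested with plain assertions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ParsedQuery:
    """Hypothetical parser output; the real dataclass may differ."""
    index: str
    fields: Tuple[str, ...] = ()
    order_by: Optional[Tuple[str, str]] = None  # (field, "ASC" or "DESC")
    limit: Optional[int] = None
    offset: int = 0


def build_search_command(parsed, query_string):
    """Assemble FT.SEARCH arguments from a parsed query and a filter string."""
    args = ["FT.SEARCH", parsed.index, query_string]
    if parsed.fields:
        args += ["RETURN", str(len(parsed.fields)), *parsed.fields]
    if parsed.order_by is not None:
        args += ["SORTBY", *parsed.order_by]
    if parsed.limit is not None:
        args += ["LIMIT", str(parsed.offset), str(parsed.limit)]
    return args


parsed = ParsedQuery(
    index="products",
    fields=("title", "price"),
    order_by=("price", "ASC"),
    limit=10,
)
command = build_search_command(parsed, "@category:{electronics} @price:[-inf (500]")
print(command)
```

Because each stage consumes and produces plain data, an edge case such as a missing `LIMIT` can be covered by constructing a `ParsedQuery` directly instead of starting a container.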

## What's Implemented

- [x] Basic SELECT with field selection
- [x] WHERE with TEXT, NUMERIC, TAG field types
- [x] Comparison operators: `=`, `!=`, `<`, `<=`, `>`, `>=`, `BETWEEN`, `IN`
- [x] Boolean operators: `AND`, `OR`
- [x] Aggregations: `COUNT`, `SUM`, `AVG`, `MIN`, `MAX`
- [x] `GROUP BY` with multiple aggregations
- [x] `ORDER BY` with ASC/DESC
- [x] `LIMIT` and `OFFSET` pagination
- [x] Computed fields: `price * 0.9 AS discounted`
- [x] Vector KNN search: `vector_distance(field, :param)`
- [x] Hybrid search (filters + vector)
- [x] Full-text search: `LIKE 'prefix%'` (prefix), `fulltext(field, 'terms')` function
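
For the comparison operators listed above, one plausible mapping onto RediSearch numeric ranges and tag sets is sketched below. The helper names are hypothetical and the project's `QueryBuilder` may emit different but equivalent strings. The range notation itself (`(` for an exclusive bound, `-inf`/`+inf` for open ends, a leading `-` for negation, `|` for tag alternatives) is standard RediSearch query syntax.

```python
def numeric_condition(field, op, value, upper=None):
    """Render a SQL comparison on a NUMERIC field as a RediSearch range."""
    if op == "BETWEEN":
        return f"@{field}:[{value} {upper}]"
    if op == "!=":
        # Negate the equality range
        return f"-@{field}:[{value} {value}]"
    ranges = {
        "=": f"[{value} {value}]",
        "<": f"[-inf ({value}]",
        "<=": f"[-inf {value}]",
        ">": f"[({value} +inf]",
        ">=": f"[{value} +inf]",
    }
    return f"@{field}:{ranges[op]}"


def tag_in(field, values):
    """Render `field IN (...)` on a TAG field as a union of tag values."""
    joined = " | ".join(values)
    return f"@{field}:{{{joined}}}"


print(numeric_condition("price", "<", 500))
print(numeric_condition("price", "BETWEEN", 100, 500))
print(tag_in("category", ["books", "toys"]))
```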

## What's Not Implemented (Yet...)

- [ ] JOINs (Redis doesn't support cross-index joins)
- [ ] Subqueries
- [ ] HAVING clause
- [ ] DISTINCT
- [ ] GEO field queries
- [ ] Index creation from SQL (CREATE INDEX)

## Development

```bash
# Install dependencies
uv sync --all-extras

# Run tests (requires Docker for testcontainers)
uv run pytest

# Run with coverage
uv run pytest --cov=sql_redis --cov-report=html
```

## Testing Philosophy

This project uses strict TDD with 100% test coverage as a hard requirement. The approach:

1. **Write failing tests first** — Define expected behavior before implementation
2. **One test at a time** — Implement just enough to pass each test
3. **No untestable code** — If we can't test it, we don't write it
4. **Integration tests mirror raw Redis** — `test_sql_queries.py` verifies that SQL queries produce the same results as the equivalent `FT.AGGREGATE` commands in `test_redis_queries.py`

Coverage is enforced in CI. Pragmas (`# pragma: no cover`) are forbidden — if code can't be tested, it shouldn't exist.

pyproject.toml

Lines changed: 5 additions & 2 deletions
@@ -3,13 +3,16 @@ name = "sql-redis"
 version = "0.1.0"
 description = "SQL to Redis command translation utility"
 requires-python = ">=3.11"
-dependencies = []
+dependencies = [
+    "redis>=5.0.0",
+    "sqlglot>=26.0.0",
+]
 
 [project.optional-dependencies]
 dev = [
     "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
     "testcontainers[redis]>=4.0.0",
-    "redis>=5.0.0",
 ]
 
 [build-system]
