
Commit 1308aa2

docs: sweep for staleness — mark shipped features, fix version targets
Audit found docs claiming Query/MSSQL/Redact/Graph were planned and integration deep-dives targeting versions one minor behind the renumbered roadmap. Changes:

- COMPETITIVE_ANALYSIS.md: header v1.9.0 -> v1.13.5, add MSSQL as 4th dialect, mark Query (v1.12.0), Graph/Order (v1.11.0), Redact (v1.10.0) shipped, add Enum/Migrate/DBML rows for v1.14-v1.16, fix the dialect count in conversion table (3 -> 4) and the Query/MSSQL rows in the comparison matrix.
- INTEGRATION_OPPORTUNITIES.md: rewrite the DuckDB section to reflect the actually-shipped query engine (was speculative pseudocode), drop the dual wrapper-vs-library implementation analysis (decision was made and shipped), renumber Recommended Integration Roadmap section (v1.16->v1.17 Parquet, +Atlas v1.19, dbt v1.20, GX v1.18), update graph command status to "Implemented in v1.11.0".
- ATLAS_INTEGRATION_DEEP_DIVE.md: v1.18 -> v1.19 (3 references).
- DBT_INTEGRATION_DEEP_DIVE.md: v1.19 -> v1.20 (3 references).
- GREAT_EXPECTATIONS_INTEGRATION_DEEP_DIVE.md: v1.17 -> v1.18 (4 refs).
- ENUM_CONVERSION.md: v1.13.0 -> v1.14.0.
- ADDITIONAL_IDEAS.md: drop Validate from "future ideas" (shipped v1.8.0), update Detect-PII status (partially covered by v1.10.0 redact --generate-config), refresh shared infrastructure table.
1 parent 1ccd6f4 commit 1308aa2

7 files changed

Lines changed: 106 additions & 191 deletions

docs/COMPETITIVE_ANALYSIS.md

Lines changed: 28 additions & 21 deletions
@@ -1,20 +1,21 @@
 # Competitive Analysis
 
-**Last Updated**: 2025-12-26
+**Last Updated**: 2026-05-07
 **Purpose**: Comprehensive competitive landscape and feature opportunity analysis
 
 ## Executive Summary
 
-sql-splitter occupies a **unique position** in the SQL dump processing ecosystem by combining multiple capabilities that currently require separate tools. As of v1.9.0, we offer: **split + merge + analyze + validate + sample (FK-preserving) + shard + convert + diff + redact**.
+sql-splitter occupies a **unique position** in the SQL dump processing ecosystem by combining multiple capabilities that currently require separate tools. As of v1.13.5, we offer: **split + merge + analyze + validate + sample (FK-preserving) + shard + convert + diff + redact + graph + order + query (DuckDB)**.
 
 No existing tool offers this combination in a single, streaming, CLI-first, multi-dialect binary.
 
 **Key differentiators:**
 
 - Works on dump files directly (no database connection required)
 - Streaming architecture handles 10GB+ dumps
-- Multi-dialect support (MySQL, PostgreSQL, SQLite)
+- Multi-dialect support (MySQL, PostgreSQL, SQLite, MSSQL)
 - 600+ MB/s throughput
+- Embedded DuckDB for SQL analytics on dumps without import
 
 ---
 
@@ -35,9 +36,14 @@ No existing tool offers this combination in a single, streaming, CLI-first, mult
 | Dialect conversion | ✅ Implemented | v1.7.0 |
 | Validate (integrity checks) | ✅ Implemented | v1.8.0 |
 | Diff dumps | ✅ Implemented | v1.9.0 |
-| Redaction/anonymization | ✅ Implemented | v1.9.0 |
-| Query/Filter (WHERE-style) | 🟡 Planned ||
-| MSSQL support | 🟡 Planned ||
+| Redaction/anonymization | ✅ Implemented | v1.10.0 |
+| Graph (ERD generation) | ✅ Implemented | v1.11.0 |
+| Order (topological FK ordering) | ✅ Implemented | v1.11.0 |
+| Query (DuckDB SQL analytics) | ✅ Implemented | v1.12.0 |
+| MSSQL support | ✅ Implemented | v1.12.x |
+| Enum type conversion (PG↔MySQL) | 🟡 Planned | v1.14.0 |
+| Migrate (schema migration generation) | 🟡 Planned | v1.15.0 |
+| DBML import/export | 🟡 Planned | v1.16.0 |
 
 ---
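The `order` command marked shipped above topologically sorts tables by FK dependencies so parent tables are emitted before the children that reference them. A stdlib-only sketch of the underlying idea (Kahn's algorithm; the function and table names are illustrative, not the tool's code):

```rust
use std::collections::{BTreeMap, VecDeque};

// Kahn's algorithm over FK edges (child references parent): parents are
// emitted before children so replayed INSERTs satisfy FK constraints.
// Illustrative sketch only, not taken from src/.
fn fk_order(tables: &[&str], fks: &[(&str, &str)]) -> Vec<String> {
    // in-degree = number of parents a table still waits on
    let mut indeg: BTreeMap<&str, usize> = tables.iter().map(|t| (*t, 0)).collect();
    let mut children: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for &(child, parent) in fks {
        *indeg.get_mut(child).unwrap() += 1;
        children.entry(parent).or_default().push(child);
    }
    let mut queue: VecDeque<&str> =
        tables.iter().copied().filter(|t| indeg[t] == 0).collect();
    let mut order = Vec::new();
    while let Some(t) = queue.pop_front() {
        order.push(t.to_string());
        for &c in children.get(t).into_iter().flatten() {
            let d = indeg.get_mut(c).unwrap();
            *d -= 1;
            if *d == 0 {
                queue.push_back(c);
            }
        }
    }
    order // shorter than tables.len() means an FK cycle remains
}
```

A result shorter than the table count signals an FK cycle, which real dumps typically resolve by deferring or temporarily disabling constraints.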

@@ -110,7 +116,7 @@ No existing tool offers this combination in a single, streaming, CLI-first, mult
 
 | Tool | Language | Stars | MySQL | PostgreSQL | SQLite | Streaming | Notes |
 | ----------------------- | -------- | ----- | ----- | ---------- | ------ | --------- | ----------------------------- |
-| **sql-splitter** | Rust |||||| v1.9.0 |
+| **sql-splitter** | Rust |||||| v1.10.0, ~230 MB/s |
 | **nxs-data-anonymizer** | Go | 271 ||||| Go templates + Sprig |
 | **pynonymizer** | Python | 109 ||||| Faker integration, GDPR focus |
 | **myanon** | C | ~30 ||||| stdin/stdout streaming |
@@ -130,7 +136,7 @@ No existing tool offers this combination in a single, streaming, CLI-first, mult
 
 | Tool | Language | Stars | Dialects | COPY↔INSERT | Streaming |
 | ------------------ | ----------- | ----- | --------- | ----------- | --------- |
-| **sql-splitter** | Rust || 3 (✅) |||
+| **sql-splitter** | Rust || 4 (✅) |||
 | **sqlglot** | Python | 7k+ | 31 |||
 | **pgloader** | Common Lisp | 5k+ | → PG only |||
 | **mysql2postgres** | Ruby | 300 | MySQL→PG | Partial ||
@@ -155,27 +161,27 @@ No existing tool offers this combination in a single, streaming, CLI-first, mult
 
 ### Query/Filter Dumps
 
-| Tool | Language | Stars | Notes |
-| ---------------- | -------- | ----- | ----------------------------------- |
-| **sql-splitter** | Rust || 🟡 Planned: WHERE-style filtering |
-| **DuckDB** | C++ | 34.8k | Query SQL/CSV/JSON/Parquet directly |
-| **sqlglot** | Python | 7k+ | Parse/transpile, not filter |
+| Tool | Language | Stars | Notes |
+| ---------------- | -------- | ----- | -------------------------------------------- |
+| **sql-splitter** | Rust || ✅ Embedded DuckDB (v1.12.0), full SQL |
+| **DuckDB** | C++ | 34.8k | Query SQL/CSV/JSON/Parquet directly |
+| **sqlglot** | Python | 7k+ | Parse/transpile, not filter |
 
-**[DuckDB](https://github.com/duckdb/duckdb)** could solve querying but is overkill for simple dump filtering.
+sql-splitter embeds DuckDB to give full SQL analytics on dumps without an import step (in-memory or disk-backed for >2GB dumps), with persistent caching that delivers a 400× speedup on repeat queries.
 
 ---
 
 ### MSSQL Support
 
 | Tool | MSSQL |
 | ---------------- | ----------------- |
-| **sql-splitter** | 🟡 Planned |
+| **sql-splitter** | ✅ (v1.12.x) |
 | Jailer | ✅ (via JDBC) |
 | pynonymizer ||
 | sqlglot | ✅ (parsing only) |
 | pgloader ||
 
-**Gap**: Major gap in ecosystem for MSSQL dump processing CLI tools.
+sql-splitter is now the only **streaming, file-based, multi-dialect** CLI with SQL Server support — Jailer/pynonymizer require live DB connections.
 
 ---
 
@@ -247,13 +253,13 @@ No existing tool offers this combination in a single, streaming, CLI-first, mult
 | Sample + FK |||||||||
 | Tenant sharding |||| Limited | Limited ||| Via SQL |
 | Redaction || Basic |||||||
-| Query/Filter | 🟡 ||| Limited |||||
+| Query/Filter | ✅ ||| Limited |||||
 | Diff |||| Limited |||| Via SQL |
 | Convert dialects ||| → PG | Limited |||||
 | MySQL |||||||||
 | PostgreSQL |||||||||
 | SQLite |||||||||
-| MSSQL | 🟡 ||||||||
+| MSSQL | ✅ ||||||||
 | Streaming |||||||||
 | CLI-first |||||||||
 | Works on dumps |||||||||
@@ -263,15 +269,16 @@ No existing tool offers this combination in a single, streaming, CLI-first, mult
 
 ## Unique Value Proposition
 
-1. **Unified tool** — Split + merge + sample + shard + convert + diff + redact in one binary
+1. **Unified tool** — Split + merge + sample + shard + convert + diff + redact + graph + order + query in one binary
 2. **Works on dump files** — No database connection required (unlike Jailer, Condenser, mydumper)
 3. **Streaming architecture** — Handle 10GB+ dumps without memory issues
 4. **CLI-first** — DevOps/automation friendly, pipe-compatible
-5. **Multi-dialect** — MySQL, PostgreSQL, SQLite in one tool
+5. **Multi-dialect** — MySQL, PostgreSQL, SQLite, MSSQL in one tool
 6. **FK-aware operations** — Sample and shard preserve referential integrity
 7. **Rust performance** — 600+ MB/s, faster than Python/Java alternatives
 8. **Compression support** — gzip, bz2, xz, zstd auto-detected
 9. **Composable** — Split → Sample → Redact → Convert → Merge pipeline
+10. **Embedded analytics** — DuckDB-powered SQL queries on dumps without import (v1.12.0)
 
 ---
 
@@ -424,7 +431,7 @@ sql-splitter test dump.sql --config schema-tests.yaml
 
 ### Priorities
 
-1. **Complete v2.0** — Current roadmap features
+1. **Complete v1.14–v1.16** — Enum, Migrate, DBML (planned core features)
 2. **Quick wins** — Schema drift (16h), size optimization (12h), cost estimation (8h)
 3. **Differentiation** — Data quality profiling, compliance checks
 4. **Future** — AI integration for schema suggestions, natural language queries

docs/INTEGRATION_OPPORTUNITIES.md

Lines changed: 47 additions & 124 deletions
@@ -1,6 +1,6 @@
 # Integration Opportunities & Tool Synergies
 
-**Date**: 2025-12-24
+**Date**: 2025-12-24 (Updated 2026-05-07: DuckDB query engine shipped in v1.12.0)
 **Purpose**: Identify strategic integrations to extend sql-splitter capabilities
 
 ## Philosophy: Build vs Integrate vs Wrap
@@ -13,7 +13,7 @@
 
 ## 🔥 Tier 1: High-Impact Integrations
 
-### 1. DuckDB Integration ⭐⭐⭐⭐⭐
+### 1. DuckDB Integration ⭐⭐⭐⭐⭐ — ✅ Query Engine SHIPPED (v1.12.0)
 
 **What is DuckDB?**
 
@@ -24,47 +24,30 @@
 
 **Synergy:** sql-splitter prepares data → DuckDB queries it
 
-#### Integration Strategy A: Query Engine
+#### Integration Strategy A: Query Engine — ✅ SHIPPED v1.12.0
 
-```bash
-# Load dump into DuckDB, run analytics
-sql-splitter query dump.sql --engine duckdb \
-  --sql "SELECT user_id, COUNT(*) FROM orders GROUP BY user_id LIMIT 10"
-
-# Behind the scenes:
-# 1. sql-splitter imports dump.sql into temp DuckDB file
-# 2. DuckDB executes query
-# 3. Output results
-```
-
-**Implementation:**
-
-```rust
-pub fn query_with_duckdb(dump: &Path, sql: &str) -> Result<Vec<Row>> {
-    // Create temp DuckDB database
-    let temp_db = tempfile::NamedTempFile::new()?;
-    let conn = Connection::open(temp_db.path())?;
+The query engine shipped in v1.12.0. Actual usage:
 
-    // Import dump (convert INSERT → CREATE + INSERT for DuckDB)
-    import_dump_to_duckdb(&conn, dump)?;
+```bash
+# Single query
+sql-splitter query dump.sql "SELECT user_id, COUNT(*) FROM orders GROUP BY user_id LIMIT 10"
 
-    // Execute query
-    let mut stmt = conn.prepare(sql)?;
-    let rows = stmt.query_map([], |row| {
-        // Map row to our Row type
-    })?;
+# Interactive REPL
+sql-splitter query dump.sql --interactive
 
-    Ok(rows)
-}
+# Export results
+sql-splitter query dump.sql "SELECT * FROM orders" -f json -o results.json
 ```
 
-**Benefits:**
+Implementation lives in `src/cmd/query.rs` and `src/duckdb/`. Features delivered:
 
-- ✅ Full SQL analytics without database setup
-- ✅ Aggregations, JOINs, window functions
-- ✅ 100x faster than naive row filtering
+- In-memory and disk-backed modes (>2GB dumps)
+- Multi-dialect import (MySQL, PostgreSQL, SQLite, MSSQL)
+- 5 output formats (table, json, jsonl, csv, tsv)
+- Persistent SHA256-keyed cache (400× speedup on repeat queries)
+- `--tables` filter, `--memory-limit` config
 
-**Effort:** ~16h (wrap DuckDB, import conversion)
+For the full design rationale and remaining Parquet export work, see [DUCKDB_INTEGRATION_DEEP_DIVE.md](features/DUCKDB_INTEGRATION_DEEP_DIVE.md).
 
 ---
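The persistent cache in the feature list above lets repeat queries skip the import step entirely by keying the imported database on the dump's content. A stdlib-only sketch of content-keyed caching, with `DefaultHasher` standing in for the SHA-256 digest the feature list mentions (function and parameter names are illustrative, not the shipped API):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Key the cached DuckDB import by the dump's path and content so a
// repeat query reuses the already-imported database, while any change
// to the dump produces a different key (a cache miss). The shipped
// cache uses SHA-256; DefaultHasher stands in here to stay stdlib-only.
fn cache_key(dump_path: &str, dump_bytes: &[u8]) -> String {
    let mut h = DefaultHasher::new();
    dump_path.hash(&mut h);
    dump_bytes.hash(&mut h); // content hash ⇒ stale entries never match
    format!("{:016x}", h.finish())
}
```

Content keying is what makes the cited 400× repeat-query speedup safe: the cache can never serve results for an edited dump.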

@@ -321,7 +304,7 @@ sql-splitter docs dump.sql -o docs/
 
 ### 6. Graphviz/Mermaid (Already Planned) ✅
 
-**Status:** Already in roadmap for graph command
+**Status:** ✅ Implemented in v1.11.0 (graph command — HTML, DOT, Mermaid, JSON output)
 
 **Additional integration:** Live preview
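The Mermaid output noted in the status line above boils down to emitting an `erDiagram` block from parsed FK edges. A minimal illustration (hypothetical helper, not the shipped implementation):

```rust
// Emit a Mermaid erDiagram from foreign-key edges (child, parent).
// Illustrative only: the shipped `graph` command derives these edges
// from the dump's CREATE TABLE statements.
fn to_mermaid(edges: &[(&str, &str)]) -> String {
    let mut out = String::from("erDiagram\n");
    for (child, parent) in edges {
        // Mermaid relationship syntax: PARENT ||--o{ CHILD : "label"
        out.push_str(&format!("    {} ||--o{{ {} : \"fk\"\n", parent, child));
    }
    out
}
```

The same edge list can feed the DOT and JSON writers, which is presumably why one `graph` command can offer all four formats.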

@@ -679,71 +662,9 @@ terraform apply # Applies changes
 
 ## Integration Architecture
 
-### Wrapper Pattern (Low Effort, High Value)
-
-```rust
-// Simple wrapper around DuckDB CLI
-pub fn query_with_duckdb(dump: &Path, sql: &str) -> Result<String> {
-    // Convert dump to DuckDB-compatible format
-    let temp_dir = tempdir()?;
-    convert_for_duckdb(dump, &temp_dir)?;
-
-    // Shell out to DuckDB
-    let output = Command::new("duckdb")
-        .arg(temp_dir.path().join("db.duckdb"))
-        .arg("-c")
-        .arg(sql)
-        .output()?;
-
-    Ok(String::from_utf8(output.stdout)?)
-}
-```
-
-**Pros:**
-
-- ✅ Quick to implement
-- ✅ Leverage existing tools
-- ✅ No reimplementation
-
-**Cons:**
-
-- ❌ External dependency required
-- ❌ Less control over behavior
-
----
-
-### Library Integration (Medium Effort, More Control)
-
-```rust
-// Use DuckDB as library (via FFI or Rust bindings)
-use duckdb::{Connection, params};
-
-pub fn query_with_duckdb_lib(dump: &Path, sql: &str) -> Result<Vec<Row>> {
-    let conn = Connection::open_in_memory()?;
-
-    // Import dump directly into DuckDB
-    import_dump(&conn, dump)?;
-
-    // Query
-    let mut stmt = conn.prepare(sql)?;
-    let rows = stmt.query_map(params![], |row| {
-        // ...
-    })?;
-
-    Ok(rows.collect()?)
-}
-```
-
-**Pros:**
-
-- ✅ No external binary required
-- ✅ Better error handling
-- ✅ Embedded in sql-splitter binary
-
-**Cons:**
-
-- ❌ More implementation work
-- ❌ Need to keep bindings updated
+> **Historical context:** Two patterns were considered for the DuckDB integration —
+> a CLI wrapper (shell out to `duckdb` binary) and a library integration (Rust FFI bindings).
+> The library path was chosen and shipped in v1.12.0 (`src/duckdb/`).
 
 ---
 
@@ -777,33 +698,38 @@ pub async fn deploy_to_supabase(
 
 ## Recommended Integration Roadmap
 
-### v1.16 — Query & Analytics
+> Note: this section was renumbered on 2026-05-07 to match the updated master roadmap.
+> The DuckDB query engine shipped in v1.12.0; v1.13.x was used for maintenance releases;
+> core features Enum/Migrate/DBML occupy v1.14–v1.16. Integrations follow at v1.17+.
 
-- **DuckDB integration** (16h) — Query engine for dumps
-- **Parquet export** (12h) — Bridge to modern data stack
+### ✅ v1.12.0 — DuckDB Query Engine (SHIPPED)
 
-### v1.17 — Schema Management
+- DuckDB integration as embedded library (16h, completed)
+- See [DUCKDB_INTEGRATION_DEEP_DIVE.md](features/DUCKDB_INTEGRATION_DEEP_DIVE.md)
 
-- **Atlas HCL export** (20h) — Schema-as-code
-- **Liquibase changelog generation** (24h) — Migration tool integration
+### v1.17 — Parquet Export
+
+- **Parquet export** (12h) — Bridge to modern data stack, extends DuckDB query engine
 
 ### v1.18 — Data Quality
 
 - **Great Expectations integration** (16h) — Bootstrap testing
 
-### v1.19 — Documentation
+### v1.19 — Schema Management
 
-- **Self-contained schema browser** (32h) — Interactive docs
-- **tbls format export** (20h) — Compatibility
+- **Atlas HCL export** (20h) — Schema-as-code
+- **Liquibase changelog generation** (24h) — Migration tool integration
 
-### v2.2 — Platform Integrations
+### v1.20 — dbt Integration
 
 - **dbt project generation** (28h) — Data transformation
-- **GitHub Action** (12h) — CI/CD
-- **Airbyte connector** (24h) — ELT pipelines
 
-### v2.3 — Cloud Deployment
+### Future (v2.x) — Documentation & Cloud
 
+- **Self-contained schema browser** (32h) — Interactive docs
+- **tbls format export** (20h) — Compatibility
+- **GitHub Action** (12h) — CI/CD
+- **Airbyte connector** (24h) — ELT pipelines
 - **Supabase deployment** (20h) — Instant database provisioning
 - **Terraform provider** (32h) — IaC integration

@@ -813,12 +739,9 @@ pub async fn deploy_to_supabase(
 
 **Under 20h effort, huge value:**
 
-1. **DuckDB query engine** (16h)
-   - Instant SQL analytics on dumps
-   - No database setup required
-
-2. **Parquet export** (12h)
+1. **Parquet export** (12h, planned v1.17.0)
    - Bridge SQL → data lakes
+   - Extends already-shipped DuckDB query engine
    - Pandas/Spark/DuckDB compatible
 
 3. **GitHub Action** (12h)
@@ -877,10 +800,10 @@ sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
 
 **Top 5 integrations for maximum impact:**
 
-1. **DuckDB** — Query analytics on dumps (game changer)
-2. **Atlas/Liquibase** — Schema management workflows
-3. **dbt** — Bootstrap data transformation projects
-4. **Great Expectations** — Data quality testing
-5. **GitHub Actions** — CI/CD automation
+1. **DuckDB** — Query analytics on dumps (shipped v1.12.0; Parquet export remaining at v1.17.0)
+2. **Atlas/Liquibase** — Schema management workflows (planned v1.19.0)
+3. **dbt** — Bootstrap data transformation projects (planned v1.20.0)
+4. **Great Expectations** — Data quality testing (planned v1.18.0)
+5. **GitHub Actions** — CI/CD automation (future)
 
 These integrations position sql-splitter as the **Swiss Army knife that plays well with others** rather than trying to replace every tool in the ecosystem.
