@@ -24,25 +24,40 @@ PyDeequ 2.0 introduces a new multi-engine architecture with **DuckDB** and **Spa
 ### Architecture
 
 ```mermaid
-flowchart LR
+flowchart TB
 subgraph CLIENT["Python Client"]
-A["Python Code"] --> B["Protobuf<br/>Serialization"]
+A["pydeequ.connect()"] --> B["Engine Auto-Detection"]
 end
-B -- gRPC --> C["Spark Connect (gRPC)"]
-subgraph SERVER["Spark Connect Server"]
-D["DeequRelationPlugin"] --> E["Deequ Core"] --> F["Spark DataFrame API"] --> G["(Data)"]
+
+B --> C{Connection Type}
+
+C -->|DuckDB| D["DuckDBEngine"]
+C -->|SparkSession| E["SparkEngine"]
+
+subgraph DUCKDB["DuckDB Backend (Local)"]
+D --> F["SQL Operators"] --> G["DuckDB"] --> H["Local Files<br/>Parquet/CSV"]
 end
-G --> H["Results"] -- gRPC --> I["Python DataFrame"]
-%% Styling for compactness and distinction
-classDef code fill:#C8F2FB,stroke:#35a7c2,color:#13505B,font-weight:bold;
-class A code;
+
+subgraph SPARK["Spark Connect Backend (Distributed)"]
+E --> I["Protobuf"] -- gRPC --> J["Spark Connect Server"]
+J --> K["DeequRelationPlugin"] --> L["Deequ Core"] --> M["Data Lake"]
+end
+
+H --> N["Results"]
+M --> N
+N --> O["MetricResult / ConstraintResult / ColumnProfile"]
+
+classDef duckdb fill:#FFF4CC,stroke:#E6B800,color:#806600;
+classDef spark fill:#CCE5FF,stroke:#0066CC,color:#003366;
+class D,F,G,H duckdb;
+class E,I,J,K,L,M spark;
 ```
 
 **How it works:**
-1. **Client Side**: PyDeequ 2.0 builds checks and analyzers as Protobuf messages
-2. **Transport**: Messages are sent via gRPC to the Spark Connect server
-3. **Server Side**: The `DeequRelationPlugin` deserializes messages and executes Deequ operations
-4. **Results**: Verification results are returned as a Spark DataFrame
+- **Auto-detection**: `pydeequ.connect()` inspects the connection type and creates the appropriate engine
+- **DuckDB path**: Direct SQL execution in-process, no JVM required
+- **Spark path**: Protobuf serialization over gRPC to Spark Connect server with Deequ plugin
+- **Unified results**: Both engines return the same `MetricResult`, `ConstraintResult`, and `ColumnProfile` types
 
 ### Feature Support Matrix
 
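Note on the hunk above: the auto-detection step it describes (inspect the connection, then pick an engine) can be pictured as a type-based dispatch. The following is only a minimal sketch under stated assumptions, not PyDeequ's actual implementation. `connect_sketch`, `register_engine`, `DuckDBEngineSketch`, `SparkEngineSketch`, and the two placeholder connection classes are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical registry: connection type -> factory that wraps it in an engine.
_ENGINE_REGISTRY: Dict[type, Callable[[Any], Any]] = {}


@dataclass
class DuckDBEngineSketch:
    """Hypothetical stand-in for a local, in-process engine."""
    connection: Any


@dataclass
class SparkEngineSketch:
    """Hypothetical stand-in for a remote, gRPC-backed engine."""
    connection: Any


def register_engine(connection_type: type, factory: Callable[[Any], Any]) -> None:
    """Associate a connection type with the engine that should handle it."""
    _ENGINE_REGISTRY[connection_type] = factory


def connect_sketch(connection: Any) -> Any:
    """Return an engine for the given connection, chosen by its type."""
    for connection_type, factory in _ENGINE_REGISTRY.items():
        if isinstance(connection, connection_type):
            return factory(connection)
    raise TypeError(f"Unsupported connection type: {type(connection).__name__}")


# Placeholder connection classes (assumptions), so the sketch runs
# without DuckDB or Spark installed.
class FakeDuckDBConnection:
    pass


class FakeSparkSession:
    pass


register_engine(FakeDuckDBConnection, DuckDBEngineSketch)
register_engine(FakeSparkSession, SparkEngineSketch)

engine = connect_sketch(FakeDuckDBConnection())
print(type(engine).__name__)
```

In a real setup the registered types would presumably be the DuckDB connection class and the Spark session class. The registry keeps the dispatch in one place, so either backend could be left out without touching the caller.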