Query Compilation 101

Prerequisite videos

Current Compiler Pipeline

%%{init: {"flowchart": {"defaultRenderer": "elk"} } }%%
flowchart LR
    SQL@{ shape: doc, label="SQL" } --> IR
    IR --> dataflow@{ shape: docs }

    subgraph IR["intermediate languages"]
      direction LR
      HIR@{ shape: doc } --> MIR@{ shape: docs } --> LIR@{ shape: docs }
      MIR -. optimizations .-> MIR
    end

    classDef purple fill:#472F85
    class SQL,HIR,MIR,LIR,dataflow purple

Representations:

  • SQL — source language
  • AST — a parsed version of a SQL query.
  • HIR — high-level intermediate representation.
  • MIR — mid-level intermediate representation.
  • LIR — low-level intermediate representation.
  • TDO — target language (timely & differential operators).
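The ordering of these representations can be sketched in a few lines of Python. This is only an illustration of the list above, not code from the compiler: the `Stage` enum and the `lowering_steps` helper are hypothetical names, and the MIR ⇒ MIR optimization loop from the diagram is deliberately left out because it does not change the representation.

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    """Representations in pipeline order (names taken from the list above)."""
    SQL = 0
    AST = 1
    HIR = 2
    MIR = 3
    LIR = 4
    TDO = 5


def lowering_steps(start: Stage, end: Stage) -> List[str]:
    """Return the 'X => Y' lowering steps needed to get from `start` to `end`.

    Hypothetical helper: it only models the order of the stages.
    """
    if start > end:
        raise ValueError("this sketch only models lowering, not going back up")
    stages = list(Stage)
    # Pair each stage in [start, end) with its immediate successor.
    pairs = zip(stages[start:end], stages[start + 1:end + 1])
    return [f"{a.name} => {b.name}" for a, b in pairs]


if __name__ == "__main__":
    for step in lowering_steps(Stage.SQL, Stage.TDO):
        print(step)
```

Calling `lowering_steps(Stage.SQL, Stage.MIR)` yields the three steps that every query goes through before any path decision is made.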

Transformations in the compile-time lifecycle of a dataflow.

For a one-off query, we run all the transformations up to the MIR stage. We then determine whether the query must be served on the "slow path", that is, by creating a temporary dataflow and then deleting it. If the slow path is not needed, we can skip the MIR ⇒ LIR and LIR ⇒ TDO steps. Existing "fast paths" include:

Currently, the optimization team is mostly concerned with the HIR ⇒ MIR and MIR ⇒ MIR stages.
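The one-off query lifecycle described above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `MirPlan`, its `fast_path_eligible` flag, and `remaining_steps` are hypothetical names invented for illustration, and how eligibility is actually decided is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MirPlan:
    """Hypothetical stand-in for an optimized MIR plan."""
    description: str
    # Assumption: some check on the MIR plan decides whether a fast path
    # applies. The real criteria are not modeled in this sketch.
    fast_path_eligible: bool


def remaining_steps(plan: MirPlan) -> List[str]:
    """List the compile-time steps still needed after the MIR stage."""
    if plan.fast_path_eligible:
        # Fast path: MIR => LIR and LIR => TDO are skipped entirely.
        return []
    # Slow path: finish lowering, then build and tear down a temporary dataflow.
    return [
        "MIR => LIR",
        "LIR => TDO",
        "create temporary dataflow",
        "serve query",
        "delete temporary dataflow",
    ]


if __name__ == "__main__":
    print(remaining_steps(MirPlan("hypothetical fast-path query", True)))
    print(remaining_steps(MirPlan("hypothetical slow-path query", False)))
```

The point of the sketch is that the path decision happens only after the MIR stage, so the HIR ⇒ MIR and MIR ⇒ MIR work is done for every query regardless of which path serves it.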

Testing

Integration tests

  • Sqllogictest
    • Philip’s RQG tests will be in this format.
      • Add Philip to any PR where query plans may change.
    • A PR can be merged if it passes Fast SLT.
    • A PR does not need to pass Full SLT tests (test/sqllogictest/sqlite) to be merged.
      • Full SLT tests take 2-3 hours.
      • You can manually initiate full SLT tests on your branch here.
  • Testdrive

Unit tests

Performance tests

Tooling

  • mzt — can be used to create repositories of plans and to write a Markdown document that explains something based on those plans (see Alexander’s mzt-repos for an example).