A guide to understanding the structure of the Postgres abstract syntax tree (AST) produced by pg-parser.
Postgres has its own SQL parser written in C, built on a Bison grammar (gram.y) and a Flex lexer (scan.l). When Postgres parses a SQL string, it produces an internal parse tree made up of C structs - SelectStmt, RangeVar, A_Expr, etc. - all defined in the Postgres source under include/nodes/parsenodes.h and related headers.
libpg_query extracts the Postgres parser into a standalone C library. It also defines a protobuf schema (pg_query.proto) that is auto-generated from the actual Postgres C struct definitions - each C node struct becomes a corresponding protobuf message with a 1:1 field mapping. The original C field names are preserved via json_name annotations on every protobuf field.
pg-parser compiles libpg_query to WebAssembly and converts the protobuf output into JSON. The AST you work with in JavaScript/TypeScript is a direct representation of the same parse tree that Postgres uses internally. The node names (SelectStmt, A_Expr, RangeVar, etc.), the field names (targetList, fromClause, whereClause, etc.), and the overall tree shape all come straight from the Postgres source.
This means the Postgres source code and libpg_query's protobuf schema are the authoritative references for understanding any node type. When you encounter an unfamiliar node, searching the Postgres source for its struct definition will tell you exactly what each field means.
Parsing a SQL string produces a ParseResult:
interface ParseResult {
version: number; // e.g. 170004 for Postgres 17.0.4
stmts: RawStmt[];
}version encodes the major and minor Postgres version. stmts contains one RawStmt per semicolon-delimited statement in the input.
Each RawStmt wraps a statement node with source location metadata:
interface RawStmt {
stmt: Node; // The actual statement (SelectStmt, InsertStmt, etc.)
stmt_location?: number; // Byte offset of the statement start
stmt_len?: number; // Length in bytes (0 = to end of string)
}Every AST node in the JSON output is wrapped in an object with a single key: the node's type name. This is the most important structural pattern to understand.
// A SelectStmt node:
{
SelectStmt: {
targetList: [ ... ],
fromClause: [ ... ]
}
}
// A column reference:
{
ColumnRef: {
fields: [{ String: { sval: "id" } }]
}
}
// A constant:
{
A_Const: {
ival: { ival: 42 }
}
}This wrapping exists because Postgres's parse tree is polymorphic - many fields can hold different kinds of nodes. In the C source, this is handled via a NodeTag enum and void pointers. In the protobuf schema, it's a oneof inside a Node message with 260+ variants. In the JSON output, it becomes an object with a single key that tells you the node type.
This pattern appears everywhere in the tree. The stmt field in RawStmt is a wrapped node. Every element in targetList, fromClause, and name is a wrapped node. Expression fields like whereClause, lexpr, and rexpr are wrapped nodes. When you traverse the tree, you'll unwrap nodes at nearly every level.
Not every nested object is wrapped though. Fields that are typed as a specific message (rather than the generic Node) are embedded directly. For example, RangeVar.alias is typed as Alias (not Node), so it appears as alias: { aliasname: "u" } - no wrapping. The wrapping only happens for fields typed as Node in the protobuf schema, which is how Postgres represents polymorphic tree positions.
You can unwrap manually:
const stmt = rawStmt.stmt;
if ('SelectStmt' in stmt) {
const selectStmt = stmt.SelectStmt;
// work with selectStmt
}Or use the unwrapNode() utility for type-safe unwrapping:
import { unwrapNode } from '@supabase/pg-parser';
const { type, node } = unwrapNode(rawStmt.stmt);
switch (type) {
case 'SelectStmt':
// `node` is narrowed to SelectStmt
console.log(node.targetList);
break;
case 'InsertStmt':
// `node` is narrowed to InsertStmt
console.log(node.relation);
break;
}The ~260 node types in Postgres's parse tree fall into several categories. You don't need to memorize them, but knowing the categories helps you reason about what kind of node to expect in a given position.
These represent top-level SQL commands. They appear as the stmt inside a RawStmt.
| Node | SQL |
|---|---|
SelectStmt |
SELECT |
InsertStmt |
INSERT |
UpdateStmt |
UPDATE |
DeleteStmt |
DELETE |
CreateStmt |
CREATE TABLE |
IndexStmt |
CREATE INDEX |
ViewStmt |
CREATE VIEW |
AlterTableStmt |
ALTER TABLE |
DropStmt |
DROP |
TransactionStmt |
BEGIN, COMMIT, ROLLBACK |
CreateSchemaStmt |
CREATE SCHEMA |
GrantStmt |
GRANT, REVOKE |
ExplainStmt |
EXPLAIN |
There are about 70+ statement node types covering every SQL command that Postgres supports.
Note: some names don't match the SQL keyword exactly. CREATE TABLE produces a CreateStmt (not CreateTableStmt). CREATE INDEX produces an IndexStmt. These names come directly from the Postgres C source.
These represent values, operators, and computations. They appear wherever a value is expected - whereClause, targetList values, HAVING, function arguments, etc.
| Node | Represents | Example |
|---|---|---|
A_Expr |
Operators and comparisons | a + b, x = 1, y BETWEEN 1 AND 10 |
A_Const |
Literal constants | 42, 'hello', 3.14 |
ColumnRef |
Column references | id, users.name, * |
FuncCall |
Function calls | count(*), now(), upper(name) |
BoolExpr |
Boolean logic | x AND y, NOT z |
SubLink |
Subqueries in expressions | WHERE id IN (SELECT ...) |
CaseExpr |
CASE expressions | CASE WHEN ... THEN ... END |
NullTest |
NULL checks | IS NULL, IS NOT NULL |
CoalesceExpr |
COALESCE | COALESCE(a, b, c) |
TypeCast |
Type casts | '2024-01-01'::date |
These represent the building blocks that statements are composed of. They typically don't appear at the top level - they appear as children of statement or expression nodes.
| Node | Represents | Found in |
|---|---|---|
ResTarget |
A SELECT target item | SelectStmt.targetList |
RangeVar |
A table reference | fromClause, InsertStmt.relation |
JoinExpr |
A JOIN clause | fromClause |
SortBy |
An ORDER BY item | SelectStmt.sortClause |
Alias |
An alias (AS name) | RangeVar.alias, RangeSubselect.alias |
WithClause |
A WITH (CTE) wrapper | SelectStmt.withClause |
OnConflictClause |
ON CONFLICT (upsert) | InsertStmt.onConflictClause |
WindowDef |
A WINDOW definition | SelectStmt.windowClause |
TypeName |
A type specification | TypeCast.typeName, column definitions |
Constraint |
A column/table constraint | CREATE TABLE column definitions |
ColumnDef |
A column definition | CREATE TABLE |
RangeSubselect |
A subquery in FROM | fromClause |
These wrap simple scalar values. They appear inside other nodes wherever a bare value is needed (operator names, identifier parts, etc.):
| Node | Contains |
|---|---|
String |
{ sval: "..." } |
Integer |
{ ival: 42 } |
Float |
{ fval: "3.14" } (note: string, not number) |
Boolean |
{ boolval: true } |
A_Star |
Represents * (no fields) |
Note: Float.fval is a string, not a JavaScript number. This preserves the exact precision from the SQL source.
Anywhere Postgres uses a List * in C, the JSON output uses an array of wrapped nodes. For example, SelectStmt.targetList is an array of wrapped ResTarget nodes:
{
SelectStmt: {
targetList: [
{ ResTarget: { name: "id", val: { ColumnRef: { ... } } } },
{ ResTarget: { name: "name", val: { ColumnRef: { ... } } } }
]
}
}Similarly, A_Expr.name is an array of wrapped String nodes representing the operator name:
{
A_Expr: {
kind: "AEXPR_OP",
name: [{ String: { sval: "+" } }],
lexpr: { ... },
rexpr: { ... }
}
}Postgres enums are represented as string values in the JSON. You'll run into them in a lot of places - expression types, join kinds, sort ordering, set operations, etc.
// A_Expr.kind - determines what kind of expression this is
"AEXPR_OP" // a + b, a = b
"AEXPR_OP_ANY" // a = ANY(b)
"AEXPR_IN" // a IN (1, 2, 3)
"AEXPR_LIKE" // a LIKE 'pattern'
"AEXPR_BETWEEN" // a BETWEEN x AND y
// BoolExpr.boolop
"AND_EXPR"
"OR_EXPR"
"NOT_EXPR"
// JoinExpr.jointype
"JOIN_INNER"
"JOIN_LEFT"
"JOIN_FULL"
"JOIN_RIGHT"
// SetOperation (SelectStmt.op) - for UNION/INTERSECT/EXCEPT queries
"SETOP_NONE"
"SETOP_UNION"
"SETOP_INTERSECT"
"SETOP_EXCEPT"
// SortByDir (SortBy.sortby_dir) - ORDER BY direction
"SORTBY_DEFAULT"
"SORTBY_ASC"
"SORTBY_DESC"
// SubLinkType (SubLink.subLinkType) - subquery context
"EXISTS_SUBLINK" // EXISTS (SELECT ...)
"ANY_SUBLINK" // x = ANY (SELECT ...)
"ALL_SUBLINK" // x > ALL (SELECT ...)
"EXPR_SUBLINK" // (SELECT ...)
// ObjectType (DropStmt.removeType, etc.) - what kind of object
"OBJECT_TABLE"
"OBJECT_INDEX"
"OBJECT_SEQUENCE"
"OBJECT_VIEW"
"OBJECT_FUNCTION"
"OBJECT_SCHEMA"Message fields (nested objects) and repeated fields (arrays) are omitted when not present. So a SelectStmt with no WHERE clause won't have a whereClause field, and an unused distinctClause is simply absent.
Scalar fields (booleans, numbers, strings, enums) always appear in the output, even when set to their default value. For example, you'll see groupDistinct: false, op: "SETOP_NONE", and name: "" in the output even though these are the defaults.
Always access message fields with optional chaining or check for undefined.
Most nodes include a location field - a byte offset into the original SQL string pointing to where that construct started. Useful for error reporting, source maps, or syntax highlighting. These are zero-based. When location is -1 or absent, the node doesn't have a meaningful source position (e.g. it was synthesized during parse tree construction).
A_Const uses a nested object to distinguish between value types:
// Integer constant: 42
{ A_Const: { ival: { ival: 42 } } }
// String constant: 'hello'
{ A_Const: { sval: { sval: "hello" } } }
// Float constant: 3.14
{ A_Const: { fval: { fval: "3.14" } } }
// NULL
{ A_Const: { isnull: true } }The double nesting (ival: { ival: 42 }) comes from the protobuf oneof - the outer key selects which variant is active, the inner object contains the actual value. This is a consequence of the wrapping pattern applied consistently.
Multi-part names like schema.table or schema.function are represented as arrays of String nodes:
// Unqualified: users
{ RangeVar: { relname: "users" } }
// Schema-qualified: public.users
{ RangeVar: { schemaname: "public", relname: "users" } }
// Function name: pg_catalog.now
{ FuncCall: { funcname: [{ String: { sval: "pg_catalog" } }, { String: { sval: "now" } }] } }RangeVar is special - it has dedicated schemaname and relname fields. Most other places (function names, operator names, type names) use a list of String nodes.
ColumnRef uses a fields array that can contain String nodes for names and A_Star for *:
// SELECT id
{ ColumnRef: { fields: [{ String: { sval: "id" } }] } }
// SELECT users.id
{ ColumnRef: { fields: [{ String: { sval: "users" } }, { String: { sval: "id" } }] } }
// SELECT *
{ ColumnRef: { fields: [{ A_Star: {} }] } }
// SELECT users.*
{ ColumnRef: { fields: [{ String: { sval: "users" } }, { A_Star: {} }] } }Here's what SELECT u.id, count(*) FROM users u WHERE u.active = true GROUP BY u.id looks like (some fields like location and scalar defaults omitted for clarity):
{
version: 170004,
stmts: [{
stmt: {
SelectStmt: { // Statement type
targetList: [
{
ResTarget: { // First SELECT item
val: {
ColumnRef: { // u.id
fields: [
{ String: { sval: "u" } },
{ String: { sval: "id" } }
]
}
}
}
},
{
ResTarget: { // Second SELECT item
val: {
FuncCall: { // count(*)
funcname: [{ String: { sval: "count" } }],
agg_star: true,
funcformat: "COERCE_EXPLICIT_CALL"
}
}
}
}
],
fromClause: [
{
RangeVar: { // FROM users u
relname: "users",
inh: true,
relpersistence: "p",
alias: { aliasname: "u" }
}
}
],
whereClause: {
A_Expr: { // WHERE u.active = true
kind: "AEXPR_OP",
name: [{ String: { sval: "=" } }],
lexpr: {
ColumnRef: {
fields: [
{ String: { sval: "u" } },
{ String: { sval: "active" } }
]
}
},
rexpr: {
A_Const: { boolval: { boolval: true } }
}
}
},
groupClause: [
{
ColumnRef: { // GROUP BY u.id
fields: [
{ String: { sval: "u" } },
{ String: { sval: "id" } }
]
}
}
],
limitOption: "LIMIT_OPTION_DEFAULT",
op: "SETOP_NONE"
}
}
}]
}pg-parser ships generated TypeScript types for every node in the AST. These are produced using pg-proto-parser, which reads libpg_query's protobuf schema and generates type-safe interfaces for each AST node. The types are version-specific - each supported Postgres version (15, 16, 17) has its own set of type definitions, since node fields can change between Postgres releases.
When you construct a PgParser with a known version, all return types are automatically narrowed:
const parser = new PgParser({ version: 17 });
const tree = await unwrapParseResult(parser.parse(sql));
// tree is ParseResult<17>The Node type is a union of all possible wrapped node types:
type Node =
| { SelectStmt: SelectStmt }
| { InsertStmt: InsertStmt }
| { A_Expr: A_Expr }
| { ColumnRef: ColumnRef }
// ... ~260 more variantsAll node fields are optional in the types (reflecting protobuf 3 semantics where any field can be absent). Use optional chaining or the assertDefined() utility when traversing.
For the full list of node types and their fields, see the generated type definitions or refer to the protobuf schema.