Description
When the WHERE clause wraps a column in a function (length(col), lower(col), coalesce(col, default), extract(field FROM col), date_trunc(unit, col)), ColumnUsage correctly captures the bare column and operator, but drops the wrapping function and any literal arguments. Downstream consumers that need to know "is this predicate col < N or length(col) < N" can't tell.
Reproduction
sql := "SELECT * FROM orders WHERE length(notes) < 100 AND extract(year FROM created_at) = 2025"
parsed, _ := postgresparser.ParseSQL(sql)
for _, u := range parsed.ColumnUsage {
fmt.Printf("col=%s op=%s usage=%v functions=%v\n",
u.Column, u.Operator, u.UsageType, u.Functions)
}
Output today:
col=notes op=< usage=Filter functions=[]
col=created_at op== usage=Filter functions=[]
Expected output
col=notes op=< usage=Filter functions=[length] args=[]
col=created_at op== usage=Filter functions=[extract] args=["year"]
The wrapping function name and any literal args (e.g. the year field from extract, the 'month' unit from date_trunc, the default value from coalesce(col, default)) should be surfaced on the ColumnUsage entry — or in a new FunctionWrapper struct attached to it.
Suggested API
Either extend the existing Functions []string slice with a structured form, or add a sibling field:
type ColumnUsage struct {
TableAlias string
Column string
Operator string
UsageType SQLUsageType
Functions []string // existing — unchanged for back-compat
Function *FunctionWrapper // NEW — populated only when wrapper is on the predicate column
}
type FunctionWrapper struct {
Name string // length, lower, upper, coalesce, extract, date_trunc, char_length, octet_length
Args []FunctionArg // literal args (extract field, date_trunc unit, coalesce default)
IsNested bool // true for lower(lower(col)) etc.
}
type FunctionArg struct {
Literal *string // e.g. "year" for extract, "month" for date_trunc, default for coalesce
}
Suggested test cases
cases := []struct {
name string
sql string
expected map[string]*FunctionWrapper // keyed by column name
}{
{
name: "bare length",
sql: "SELECT * FROM t WHERE length(notes) < 100",
expected: map[string]*FunctionWrapper{"notes": {Name: "length"}},
},
{
name: "lower with table alias",
sql: "SELECT * FROM users u WHERE lower(u.email) = 'a@b.com'",
expected: map[string]*FunctionWrapper{"email": {Name: "lower"}},
},
{
name: "coalesce with default",
sql: "SELECT * FROM t WHERE coalesce(name, '') <> ''",
expected: map[string]*FunctionWrapper{"name": {Name: "coalesce", Args: []FunctionArg{{Literal: ptr("")}}}},
},
{
name: "extract field",
sql: "SELECT * FROM t WHERE extract(year FROM created_at) = 2025",
expected: map[string]*FunctionWrapper{"created_at": {Name: "extract", Args: []FunctionArg{{Literal: ptr("year")}}}},
},
{
name: "date_trunc month",
sql: "SELECT * FROM t WHERE date_trunc('month', created_at) = '2025-03-01'",
expected: map[string]*FunctionWrapper{"created_at": {Name: "date_trunc", Args: []FunctionArg{{Literal: ptr("month")}}}},
},
{
name: "nested LOWER(LOWER(col))",
sql: "SELECT * FROM t WHERE lower(lower(name)) = 'foo'",
expected: map[string]*FunctionWrapper{"name": {Name: "lower", IsNested: true}},
},
{
name: "expression inside function — no wrapper attached",
sql: "SELECT * FROM t WHERE length(name || ' suffix') < 50",
expected: nil, // wraps an expression, not a bare column
},
{
name: "quoted column with parens",
sql: `SELECT * FROM t WHERE length("Customer (legacy)") < 50`,
expected: map[string]*FunctionWrapper{"Customer (legacy)": {Name: "length"}},
},
{
name: "non-WHERE function call must NOT be captured",
sql: "SELECT length(name) FROM t WHERE id = 1",
expected: nil,
},
{
name: "JOIN ON clause — initial scope is WHERE only",
sql: "SELECT * FROM a JOIN b ON lower(a.x) = lower(b.y) WHERE a.x = 1",
expected: nil, // capture only WHERE wrappers in v1
},
}
Use case
Tools that build mock data, generate synthetic test rows for query planners, or build EXPLAIN-aware test harnesses need this metadata. Without it they can't distinguish col < N from length(col) < N and either:
- emit invalid SQL when manufacturing predicate-satisfying values (the column type doesn't match the function output type), or
- skip the predicate entirely, lowering the fidelity of generated data.
Today consumers work around this by re-scanning the original SQL string with regex post-processors, which break on nested calls (lower(lower(col))), expressions inside functions (length(col || 'x')), and quoted column names with parens.
Which layer?
Core parser (extend ColumnUsage / new field on parse tree extraction).
Scope
- Initial scope: WHERE clauses only.
- Out of scope (file as follow-ups): JOIN ON, HAVING, ORDER BY, multi-arg functions like
greatest(a, b) or concat(a, b), user-defined functions.
Backward compatibility
The existing Functions []string field already exists on ColumnUsage and reports function names for projection columns. Adding a new structured Function *FunctionWrapper field for predicate wrappers does not break existing consumers — they keep using Functions. New consumers opt in by reading Function.
Description
When the WHERE clause wraps a column in a function (
length(col),lower(col),coalesce(col, default),extract(field FROM col),date_trunc(unit, col)),ColumnUsagecorrectly captures the bare column and operator, but drops the wrapping function and any literal arguments. Downstream consumers that need to know "is this predicatecol < Norlength(col) < N" can't tell.Reproduction
Output today:
Expected output
The wrapping function name and any literal args (e.g. the
yearfield fromextract, the'month'unit fromdate_trunc, the default value fromcoalesce(col, default)) should be surfaced on theColumnUsageentry — or in a newFunctionWrapperstruct attached to it.Suggested API
Either extend the existing
Functions []stringslice with a structured form, or add a sibling field:Suggested test cases
Use case
Tools that build mock data, generate synthetic test rows for query planners, or build EXPLAIN-aware test harnesses need this metadata. Without it they can't distinguish
col < Nfromlength(col) < Nand either:Today consumers work around this by re-scanning the original SQL string with regex post-processors, which break on nested calls (
lower(lower(col))), expressions inside functions (length(col || 'x')), and quoted column names with parens.Which layer?
Core parser (extend ColumnUsage / new field on parse tree extraction).
Scope
greatest(a, b)orconcat(a, b), user-defined functions.Backward compatibility
The existing
Functions []stringfield already exists onColumnUsageand reports function names for projection columns. Adding a new structuredFunction *FunctionWrapperfield for predicate wrappers does not break existing consumers — they keep usingFunctions. New consumers opt in by readingFunction.