Skip to content

ColumnUsage drops function wrappers (length/lower/coalesce/extract/date_trunc) when a column is wrapped in a WHERE predicate #51

@eitamring

Description

@eitamring

Description

When the WHERE clause wraps a column in a function (length(col), lower(col), coalesce(col, default), extract(field FROM col), date_trunc(unit, col)), ColumnUsage correctly captures the bare column and operator, but drops the wrapping function and any literal arguments. Downstream consumers that need to know "is this predicate col < N or length(col) < N" can't tell.

Reproduction

sql := "SELECT * FROM orders WHERE length(notes) < 100 AND extract(year FROM created_at) = 2025"
parsed, _ := postgresparser.ParseSQL(sql)

for _, u := range parsed.ColumnUsage {
    fmt.Printf("col=%s op=%s usage=%v functions=%v\n",
        u.Column, u.Operator, u.UsageType, u.Functions)
}

Output today:

col=notes      op=<  usage=Filter  functions=[]
col=created_at op==  usage=Filter  functions=[]

Expected output

col=notes      op=<  usage=Filter  functions=[length]            args=[]
col=created_at op==  usage=Filter  functions=[extract]           args=["year"]

The wrapping function name and any literal args (e.g. the year field from extract, the 'month' unit from date_trunc, the default value from coalesce(col, default)) should be surfaced on the ColumnUsage entry — or in a new FunctionWrapper struct attached to it.

Suggested API

Either extend the existing Functions []string slice with a structured form, or add a sibling field:

type ColumnUsage struct {
    TableAlias string
    Column     string
    Operator   string
    UsageType  SQLUsageType
    Functions  []string         // existing — unchanged for back-compat
    Function   *FunctionWrapper // NEW — populated only when wrapper is on the predicate column
}

type FunctionWrapper struct {
    Name     string         // length, lower, upper, coalesce, extract, date_trunc, char_length, octet_length
    Args     []FunctionArg  // literal args (extract field, date_trunc unit, coalesce default)
    IsNested bool           // true for lower(lower(col)) etc.
}

type FunctionArg struct {
    Literal *string // e.g. "year" for extract, "month" for date_trunc, default for coalesce
}

Suggested test cases

cases := []struct {
    name     string
    sql      string
    expected map[string]*FunctionWrapper // keyed by column name
}{
    {
        name: "bare length",
        sql:  "SELECT * FROM t WHERE length(notes) < 100",
        expected: map[string]*FunctionWrapper{"notes": {Name: "length"}},
    },
    {
        name: "lower with table alias",
        sql:  "SELECT * FROM users u WHERE lower(u.email) = 'a@b.com'",
        expected: map[string]*FunctionWrapper{"email": {Name: "lower"}},
    },
    {
        name: "coalesce with default",
        sql:  "SELECT * FROM t WHERE coalesce(name, '') <> ''",
        expected: map[string]*FunctionWrapper{"name": {Name: "coalesce", Args: []FunctionArg{{Literal: ptr("")}}}},
    },
    {
        name: "extract field",
        sql:  "SELECT * FROM t WHERE extract(year FROM created_at) = 2025",
        expected: map[string]*FunctionWrapper{"created_at": {Name: "extract", Args: []FunctionArg{{Literal: ptr("year")}}}},
    },
    {
        name: "date_trunc month",
        sql:  "SELECT * FROM t WHERE date_trunc('month', created_at) = '2025-03-01'",
        expected: map[string]*FunctionWrapper{"created_at": {Name: "date_trunc", Args: []FunctionArg{{Literal: ptr("month")}}}},
    },
    {
        name: "nested LOWER(LOWER(col))",
        sql:  "SELECT * FROM t WHERE lower(lower(name)) = 'foo'",
        expected: map[string]*FunctionWrapper{"name": {Name: "lower", IsNested: true}},
    },
    {
        name: "expression inside function — no wrapper attached",
        sql:  "SELECT * FROM t WHERE length(name || ' suffix') < 50",
        expected: nil, // wraps an expression, not a bare column
    },
    {
        name: "quoted column with parens",
        sql:  `SELECT * FROM t WHERE length("Customer (legacy)") < 50`,
        expected: map[string]*FunctionWrapper{"Customer (legacy)": {Name: "length"}},
    },
    {
        name: "non-WHERE function call must NOT be captured",
        sql:  "SELECT length(name) FROM t WHERE id = 1",
        expected: nil,
    },
    {
        name: "JOIN ON clause — initial scope is WHERE only",
        sql:  "SELECT * FROM a JOIN b ON lower(a.x) = lower(b.y) WHERE a.x = 1",
        expected: nil, // capture only WHERE wrappers in v1
    },
}

Use case

Tools that build mock data, generate synthetic test rows for query planners, or build EXPLAIN-aware test harnesses need this metadata. Without it they can't distinguish col < N from length(col) < N and either:

  • emit invalid SQL when manufacturing predicate-satisfying values (the column type doesn't match the function output type), or
  • skip the predicate entirely, lowering the fidelity of generated data.

Today consumers work around this by re-scanning the original SQL string with regex post-processors, which break on nested calls (lower(lower(col))), expressions inside functions (length(col || 'x')), and quoted column names with parens.

Which layer?

Core parser (extend ColumnUsage / new field on parse tree extraction).

Scope

  • Initial scope: WHERE clauses only.
  • Out of scope (file as follow-ups): JOIN ON, HAVING, ORDER BY, multi-arg functions like greatest(a, b) or concat(a, b), user-defined functions.

Backward compatibility

The existing Functions []string field already exists on ColumnUsage and reports function names for projection columns. Adding a new structured Function *FunctionWrapper field for predicate wrappers does not break existing consumers — they keep using Functions. New consumers opt in by reading Function.

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request
    No fields configured for Feature.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions