Skip to content

fix(dbt): preserve source identifier quoting when it contains dots or spaces#5827

Open
jagannalla wants to merge 2 commits into
SQLMesh:mainfrom
jagannalla:fix/source-canonical-name-quotes
Open

fix(dbt): preserve source identifier quoting when it contains dots or spaces#5827
jagannalla wants to merge 2 commits into
SQLMesh:mainfrom
jagannalla:fix/source-canonical-name-quotes

Conversation

@jagannalla

@jagannalla jagannalla commented Jun 4, 2026

Copy link
Copy Markdown

Problem

When a dbt source table identifier contains a dot (e.g., "FILENAME.CSV") or a space, SourceConfig.canonical_name() unconditionally strips the quoting. This renders the canonical name as database.schema.FILENAME.CSV.

When sqlglot tries to parse this unquoted string back into an AST, it splits the path by . into 4 components, causing a ValueError: too many values to unpack (expected 3) and failing to load the project.

Solution

This PR checks if the resolved table name contains a dot (.) or a space ( ), and if so, preserves the quoting for the identifier component during canonicalization. Standard table names remain unquoted and normalized.

Test Plan

@StuffbyYuki

Copy link
Copy Markdown
Collaborator

@jagannalla Does it make more sense to have logic like:

identifier = relation.identifier or ""
needs_identifier_quoting = "." in identifier or " " in identifier

than

needs_identifier_quoting = bool(
    self.table_name and ("." in self.table_name or " " in self.table_name)
)

?

The former approach might be slightly safer because it checks the final, dbt-resolved name, not just what's in the config file.

Let me know if I'm missing anything!

@noezhiya-dot

Copy link
Copy Markdown

Clean fix. The conditional quoting logic is straightforward and the test coverage is thorough — covers dots, spaces, standard identifiers, and the database-omission edge case. One small observation: the needs_identifier_quoting check runs on every call even when the identifier is already quoted by dbt. Since the downstream relation.quote() handles double-quoting gracefully, this is fine, but worth noting that the check is purely a string heuristic (not semantic). Approved on review.

@jagannalla

Copy link
Copy Markdown
Author

Thanks for the reviews, @StuffbyYuki and @noezhiya-dot!
I have updated the PR to address the feedback:

  • Changed the check to use the resolved relation.identifier instead of self.table_name to ensure we are always checking the final, dbt-resolved name.
  • Verified that all tests are passing.
    Ready for merge!

@StuffbyYuki

Copy link
Copy Markdown
Collaborator

@jagannalla Can you make sure all commits have DCO checks? Thanks!

@noezhiya-dot

Copy link
Copy Markdown

Nice targeted fix. The root cause is clear: unquoting identifiers with dots/spaces breaks sqlglot's AST parsing. The fix is minimal and correct — only preserve quoting when the identifier actually needs it. Tests cover all the key cases (dots, spaces, standard identifiers, target database omission). LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SourceConfig.canonical_name() fails with ValueError: too many values to unpack for source identifiers containing dots

4 participants