SNOW-3473261: Add Iceberg tag time travel via version_tag#4211
Merged
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4211 +/- ##
=======================================
Coverage 95.49% 95.49%
=======================================
Files 171 171
Lines 44219 44243 +24
Branches 7540 7548 +8
=======================================
+ Hits 42226 42252 +26
+ Misses 1227 1226 -1
+ Partials 766 765 -1 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
sfc-gh-aling
reviewed
May 6, 2026
cdd1ba1 to
21807f4
Compare
21807f4 to
05bf4cf
Compare
sfc-gh-igarish
added a commit
that referenced
this pull request
Jun 10, 2026
Address review feedback on PR #4211 from @sfc-gh-yuwang: SQL string literals embedded in the time-travel clause (``statement``, ``stream``, ``version_tag``, and the string form of ``timestamp``) were interpolated raw — a tag name like ``release_'s`` produced broken SQL and an untrusted value could close the literal and inject arbitrary text (e.g. ``x'); DROP TABLE foo; --``). Centralize escaping in a single ``_quote`` helper inside ``generate_sql_clause`` that doubles embedded ``'`` to ``''`` (Snowflake's standard string-literal escape) and apply it to all four string-valued parameters. Numeric parameters (``offset``, ``version``) need no escaping. The pattern was pre-existing for ``statement``, ``stream``, and ``timestamp``; this commit hardens those at the same time so the four parameters share one consistent escape path. Added ``test_time_travel_string_literal_escaping`` to ``tests/unit/test_utils.py`` covering all four parameters with both benign embedded quotes (``release_'s``) and an explicit injection payload, plus the typed ``TO_TIMESTAMP_NTZ`` variant. Co-authored-by: Cursor <cursoragent@cursor.com>
sfc-gh-joshi
approved these changes
Jun 10, 2026
Add ``version_tag`` time-travel parameter that emits Snowflake's released ``AT(VERSION_TAG => '<name>')`` clause — the tag-only form of Iceberg time travel (Spark Iceberg's ``VERSION AS OF '<tag_name>'`` for tag reads). Scope: tag reads only. Branch reads are deferred until Snowflake's unified ``VERSION_REF`` syntax (which subsumes branch + tag) ships; the Python kwarg ``version_tag`` and SQL ``VERSION_TAG`` keyword match Snowflake's already-released grammar. Mirrors the merged ``version=`` (snapshot-id) PR (#4231): * Hidden behind ``**kwargs`` on ``Session.table()``, ``DataFrameReader.table()``, and ``Table.__init__()`` — consumed by Snowpark Connect, not part of the advertised public surface yet. * ``DataFrameReader`` reader option aliases ``version_tag`` and ``version-tag`` (matching the ``snapshot-id`` / ``snapshot_id`` style); both auto-set ``time_travel_mode='at'`` since tag reads are positional (bound to a specific snapshot), not range-of-time. * Validation: ``time_travel_mode='before'`` is rejected; ``version_tag`` is mutually exclusive with the other time-travel parameters (``statement``, ``offset``, ``timestamp``, ``stream``, ``version``); non-string and empty tag names are rejected. * No AST proto changes — the field travels through ``**kwargs`` the same way ``version=`` does. * No CHANGELOG entry — internal kwargs surface, not customer- visible (matches PR #4231). Unit tests cover SQL emission, validation, conflicts, and ``_extract_time_travel_from_options`` extraction for both alias forms. Two integ tests under ``tests/integ/test_dataframe.py`` are added but ``pytest.mark.skip``'d pending a CI account with a CLD-linked unmanaged Iceberg table that has named tags and ``FEATURE_ICEBERG_TIME_TRAVEL`` enabled (manual reproducer lives in the snowflake-eng/sas repo). Co-authored-by: Cursor <cursoragent@cursor.com>
Address review feedback on PR #4211 from @sfc-gh-yuwang: SQL string literals embedded in the time-travel clause (``statement``, ``stream``, ``version_tag``, and the string form of ``timestamp``) were interpolated raw — a tag name like ``release_'s`` produced broken SQL and an untrusted value could close the literal and inject arbitrary text (e.g. ``x'); DROP TABLE foo; --``). Centralize escaping in a single ``_quote`` helper inside ``generate_sql_clause`` that doubles embedded ``'`` to ``''`` (Snowflake's standard string-literal escape) and apply it to all four string-valued parameters. Numeric parameters (``offset``, ``version``) need no escaping. The pattern was pre-existing for ``statement``, ``stream``, and ``timestamp``; this commit hardens those at the same time so the four parameters share one consistent escape path. Added ``test_time_travel_string_literal_escaping`` to ``tests/unit/test_utils.py`` covering all four parameters with both benign embedded quotes (``release_'s``) and an explicit injection payload, plus the typed ``TO_TIMESTAMP_NTZ`` variant. Co-authored-by: Cursor <cursoragent@cursor.com>
193bde0 to
e471446
Compare
sfc-gh-yuwang
approved these changes
Jun 10, 2026
sfc-gh-yuwang
left a comment
Collaborator
There was a problem hiding this comment.
preapproval assuming the comment is addressed
Address review feedback on PR #4211 from @sfc-gh-yuwang: replace the inline ``_quote`` helper added in the previous commit with the existing ``snowflake.snowpark._internal.analyzer.datatype_mapper.str_to_sql`` helper, which is already the canonical SQL string-literal emitter in Snowpark Python (used by ``server_connection`` and ``session``). This keeps the time-travel clause consistent with the rest of the SQL generation and picks up two additional escape behaviors for free: * ``\`` → ``\\`` (backslash doubling) * embedded newline → literal ``\n`` two-char sequence on top of the single-quote doubling that was already in place. Imported lazily inside ``generate_sql_clause`` because ``analyzer.datatype_mapper`` imports from this module at top level — matches the lazy-import pattern already used elsewhere in this file for analyzer cross-references. Extended ``test_time_travel_string_literal_escaping`` to pin the backslash and newline behavior in addition to the existing single-quote and injection-payload cases, so any future change to ``str_to_sql`` that would affect time-travel emission fails here instead of silently producing broken SQL. Co-authored-by: Cursor <cursoragent@cursor.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add Iceberg tag time travel via a
version_tag=parameter that emitsSnowflake's released
AT(VERSION_TAG => '<name>')clause — the tag-onlyform of Iceberg time travel (Spark Iceberg's
VERSION AS OF '<tag_name>'for tag reads).Scope: tag reads only. Branch reads are deferred until Snowflake's
unified
VERSION_REFsyntax (which subsumes branch + tag) ships; thePython kwarg
version_tagand SQLVERSION_TAGkeyword matchSnowflake's already-released grammar, so this PR can land independently.
This is the tag-only companion to the merged snapshot-id PR (#4231) and
mirrors that PR's surface decisions point-for-point:
version_tagis hidden behind**kwargsonSession.table(),DataFrameReader.table(), andTable.__init__()— consumed by SnowparkConnect, not part of the advertised public Snowpark Python surface
yet. Direct callers can still pass it; the doc page just doesn't list it.
version_tagandversion-tag(matching theexisting
snapshot-id/snapshot_idstyle). Both auto-settime_travel_mode='at'since tag reads are positional (bound to aspecific snapshot), not range-of-time.
**kwargsthesame way
version=does — keeps the AST surface minimal and lets themonorepo generate any AST coverage later if/when this becomes a
first-class API.
(matches PR Add Iceberg snapshot-id time travel support (version=) #4231).
Validation
time_travel_mode='before'is rejected — tag reads are positional, notrange-of-time.
version_tagis mutually exclusive withstatement,offset,timestamp,stream, andversion.version_tagvalues are rejected.'at'(same shape asversion=).Test plan
tests/unit/test_utils.py:test_time_travel_version_tag(SQL emission, mutual exclusion,mode validation, type / empty validation) and
test_extract_time_travel_version_tag_option(canonical +hyphen option aliases, before-mode rejection). 17/17 tests in
test_utils.pypass.tests/integ/test_dataframe.py:test_iceberg_version_tag_time_travel_session_tableandtest_iceberg_version_tag_time_travel_dataframe_reader_option.Both are
pytest.mark.skip'd pending a CI account with aCLD-linked unmanaged Iceberg table that has named tags and
FEATURE_ICEBERG_TIME_TRAVELenabled — same posture as themerged snapshot-id integ tests. Manual reproducer lives in
snowflake-eng/sas(tests/sas_tests/test_iceberg_version_tag_sample.py).Future work
version_ref=(or equivalent) once Snowflake ships the unifiedAT(VERSION_REF => '<name>')clause that handles both branches andtags. At that point
version_tagcan either alias toversion_reforremain as a tag-specific spelling for clarity.
version=andversion_tag=to the public surface oncethe Snowpark Connect consumer is stable.
Related
version=for snapshot ids)