Skip to content

SNOW-3473261: Add Iceberg tag time travel via version_tag#4211

Merged
sfc-gh-aling merged 3 commits into
mainfrom
igarish/snowpark-python-iceberg-tags
Jun 11, 2026
Merged

SNOW-3473261: Add Iceberg tag time travel via version_tag#4211
sfc-gh-aling merged 3 commits into
mainfrom
igarish/snowpark-python-iceberg-tags

Conversation

@sfc-gh-igarish

@sfc-gh-igarish sfc-gh-igarish commented May 4, 2026

Copy link
Copy Markdown
Collaborator

Summary

Add Iceberg tag time travel via a version_tag= parameter that emits
Snowflake's released AT(VERSION_TAG => '<name>') clause — the tag-only
form of Iceberg time travel (Spark Iceberg's
VERSION AS OF '<tag_name>' for tag reads).

Scope: tag reads only. Branch reads are deferred until Snowflake's
unified VERSION_REF syntax (which subsumes branch + tag) ships; the
Python kwarg version_tag and SQL VERSION_TAG keyword match
Snowflake's already-released grammar, so this PR can land independently.

This is the tag-only companion to the merged snapshot-id PR (#4231) and
mirrors that PR's surface decisions point-for-point:

  • version_tag is hidden behind **kwargs on Session.table(),
    DataFrameReader.table(), and Table.__init__() — consumed by Snowpark
    Connect, not part of the advertised public Snowpark Python surface
    yet. Direct callers can still pass it; the doc page just doesn't list it.
  • Reader option aliases version_tag and version-tag (matching the
    existing snapshot-id / snapshot_id style). Both auto-set
    time_travel_mode='at' since tag reads are positional (bound to a
    specific snapshot), not range-of-time.
  • No AST proto changes. The field travels through **kwargs the
    same way version= does — keeps the AST surface minimal and lets the
    monorepo generate any AST coverage later if/when this becomes a
    first-class API.
  • No CHANGELOG entry — internal kwargs surface, not customer-visible
    (matches PR Add Iceberg snapshot-id time travel support (version=) #4231).

Validation

  • time_travel_mode='before' is rejected — tag reads are positional, not
    range-of-time.
  • version_tag is mutually exclusive with statement, offset,
    timestamp, stream, and version.
  • Non-string and empty version_tag values are rejected.
  • When provided without a mode, defaults to 'at' (same shape as
    version=).

Test plan

  • Unit tests in tests/unit/test_utils.py:
    test_time_travel_version_tag (SQL emission, mutual exclusion,
    mode validation, type / empty validation) and
    test_extract_time_travel_version_tag_option (canonical +
    hyphen option aliases, before-mode rejection). 17/17 tests in
    test_utils.py pass.
  • Black 22.3.0 + flake8 + pre-commit clean.
  • Integ tests added under tests/integ/test_dataframe.py:
    test_iceberg_version_tag_time_travel_session_table and
    test_iceberg_version_tag_time_travel_dataframe_reader_option.
    Both are pytest.mark.skip'd pending a CI account with a
    CLD-linked unmanaged Iceberg table that has named tags and
    FEATURE_ICEBERG_TIME_TRAVEL enabled — same posture as the
    merged snapshot-id integ tests. Manual reproducer lives in
    snowflake-eng/sas (tests/sas_tests/test_iceberg_version_tag_sample.py).

Future work

  • Add version_ref= (or equivalent) once Snowflake ships the unified
    AT(VERSION_REF => '<name>') clause that handles both branches and
    tags. At that point version_tag can either alias to version_ref or
    remain as a tag-specific spelling for clarity.
  • Promote both version= and version_tag= to the public surface once
    the Snowpark Connect consumer is stable.

Related

@sfc-gh-igarish sfc-gh-igarish requested review from a team as code owners May 4, 2026 22:31
@sfc-gh-igarish sfc-gh-igarish requested review from sfc-gh-aalam, sfc-gh-aling and sfc-gh-yuwang and removed request for sfc-gh-aling May 4, 2026 22:31
@codecov-commenter

codecov-commenter commented May 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.49%. Comparing base (e22bc94) to head (660a324).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4211   +/-   ##
=======================================
  Coverage   95.49%   95.49%           
=======================================
  Files         171      171           
  Lines       44219    44243   +24     
  Branches     7540     7548    +8     
=======================================
+ Hits        42226    42252   +26     
+ Misses       1227     1226    -1     
+ Partials      766      765    -1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread src/snowflake/snowpark/_internal/proto/ast.proto Outdated
Comment thread src/snowflake/snowpark/_internal/utils.py Outdated
Comment thread src/snowflake/snowpark/dataframe_reader.py Outdated
@sfc-gh-igarish sfc-gh-igarish force-pushed the igarish/snowpark-python-iceberg-tags branch from cdd1ba1 to 21807f4 Compare June 10, 2026 21:11
@sfc-gh-igarish sfc-gh-igarish changed the title Add Iceberg tag time travel support SNOW-3473261: Add Iceberg branch / tag time travel via version_ref Jun 10, 2026
@sfc-gh-igarish sfc-gh-igarish changed the title SNOW-3473261: Add Iceberg branch / tag time travel via version_ref SNOW-3473261: Add Iceberg tag time travel via version_ref Jun 10, 2026
@sfc-gh-igarish sfc-gh-igarish force-pushed the igarish/snowpark-python-iceberg-tags branch from 21807f4 to 05bf4cf Compare June 10, 2026 21:22
@sfc-gh-igarish sfc-gh-igarish changed the title SNOW-3473261: Add Iceberg tag time travel via version_ref SNOW-3473261: Add Iceberg tag time travel via version_tag Jun 10, 2026
Comment thread src/snowflake/snowpark/_internal/utils.py
sfc-gh-igarish added a commit that referenced this pull request Jun 10, 2026
Address review feedback on PR #4211 from @sfc-gh-yuwang:
SQL string literals embedded in the time-travel clause (``statement``,
``stream``, ``version_tag``, and the string form of ``timestamp``) were
interpolated raw — a tag name like ``release_'s`` produced broken SQL
and an untrusted value could close the literal and inject arbitrary
text (e.g. ``x'); DROP TABLE foo; --``).

Centralize escaping in a single ``_quote`` helper inside
``generate_sql_clause`` that doubles embedded ``'`` to ``''``
(Snowflake's standard string-literal escape) and apply it to all four
string-valued parameters. Numeric parameters (``offset``, ``version``)
need no escaping.

The pattern was pre-existing for ``statement``, ``stream``, and
``timestamp``; this commit hardens those at the same time so the four
parameters share one consistent escape path.

Added ``test_time_travel_string_literal_escaping`` to
``tests/unit/test_utils.py`` covering all four parameters with both
benign embedded quotes (``release_'s``) and an explicit injection
payload, plus the typed ``TO_TIMESTAMP_NTZ`` variant.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment thread tests/integ/test_dataframe.py
sfc-gh-igarish and others added 2 commits June 10, 2026 16:30
Add ``version_tag`` time-travel parameter that emits Snowflake's
released ``AT(VERSION_TAG => '<name>')`` clause — the tag-only form
of Iceberg time travel (Spark Iceberg's
``VERSION AS OF '<tag_name>'`` for tag reads).

Scope: tag reads only. Branch reads are deferred until Snowflake's
unified ``VERSION_REF`` syntax (which subsumes branch + tag) ships;
the Python kwarg ``version_tag`` and SQL ``VERSION_TAG`` keyword
match Snowflake's already-released grammar.

Mirrors the merged ``version=`` (snapshot-id) PR (#4231):
* Hidden behind ``**kwargs`` on ``Session.table()``,
  ``DataFrameReader.table()``, and ``Table.__init__()`` — consumed
  by Snowpark Connect, not part of the advertised public surface
  yet.
* ``DataFrameReader`` reader option aliases ``version_tag`` and
  ``version-tag`` (matching the ``snapshot-id`` / ``snapshot_id``
  style); both auto-set ``time_travel_mode='at'`` since tag reads
  are positional (bound to a specific snapshot), not range-of-time.
* Validation: ``time_travel_mode='before'`` is rejected;
  ``version_tag`` is mutually exclusive with the other time-travel
  parameters (``statement``, ``offset``, ``timestamp``, ``stream``,
  ``version``); non-string and empty tag names are rejected.
* No AST proto changes — the field travels through ``**kwargs``
  the same way ``version=`` does.
* No CHANGELOG entry — internal kwargs surface, not customer-
  visible (matches PR #4231).

Unit tests cover SQL emission, validation, conflicts, and
``_extract_time_travel_from_options`` extraction for both alias
forms. Two integ tests under ``tests/integ/test_dataframe.py`` are
added but ``pytest.mark.skip``'d pending a CI account with a
CLD-linked unmanaged Iceberg table that has named tags and
``FEATURE_ICEBERG_TIME_TRAVEL`` enabled (manual reproducer lives
in the snowflake-eng/sas repo).

Co-authored-by: Cursor <cursoragent@cursor.com>
Address review feedback on PR #4211 from @sfc-gh-yuwang:
SQL string literals embedded in the time-travel clause (``statement``,
``stream``, ``version_tag``, and the string form of ``timestamp``) were
interpolated raw — a tag name like ``release_'s`` produced broken SQL
and an untrusted value could close the literal and inject arbitrary
text (e.g. ``x'); DROP TABLE foo; --``).

Centralize escaping in a single ``_quote`` helper inside
``generate_sql_clause`` that doubles embedded ``'`` to ``''``
(Snowflake's standard string-literal escape) and apply it to all four
string-valued parameters. Numeric parameters (``offset``, ``version``)
need no escaping.

The pattern was pre-existing for ``statement``, ``stream``, and
``timestamp``; this commit hardens those at the same time so the four
parameters share one consistent escape path.

Added ``test_time_travel_string_literal_escaping`` to
``tests/unit/test_utils.py`` covering all four parameters with both
benign embedded quotes (``release_'s``) and an explicit injection
payload, plus the typed ``TO_TIMESTAMP_NTZ`` variant.

Co-authored-by: Cursor <cursoragent@cursor.com>
@sfc-gh-igarish sfc-gh-igarish force-pushed the igarish/snowpark-python-iceberg-tags branch from 193bde0 to e471446 Compare June 10, 2026 23:30
Comment thread src/snowflake/snowpark/_internal/utils.py Outdated

@sfc-gh-yuwang sfc-gh-yuwang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

preapproval assuming the comment is addressed

Address review feedback on PR #4211 from @sfc-gh-yuwang:
replace the inline ``_quote`` helper added in the previous commit
with the existing ``snowflake.snowpark._internal.analyzer.datatype_mapper.str_to_sql``
helper, which is already the canonical SQL string-literal emitter in
Snowpark Python (used by ``server_connection`` and ``session``).

This keeps the time-travel clause consistent with the rest of the
SQL generation and picks up two additional escape behaviors for
free:
* ``\`` → ``\\`` (backslash doubling)
* embedded newline → literal ``\n`` two-char sequence
on top of the single-quote doubling that was already in place.

Imported lazily inside ``generate_sql_clause`` because
``analyzer.datatype_mapper`` imports from this module at top level —
matches the lazy-import pattern already used elsewhere in this file
for analyzer cross-references.

Extended ``test_time_travel_string_literal_escaping`` to pin the
backslash and newline behavior in addition to the existing
single-quote and injection-payload cases, so any future change to
``str_to_sql`` that would affect time-travel emission fails here
instead of silently producing broken SQL.

Co-authored-by: Cursor <cursoragent@cursor.com>
@sfc-gh-aling sfc-gh-aling merged commit 9975596 into main Jun 11, 2026
27 of 31 checks passed
@sfc-gh-aling sfc-gh-aling deleted the igarish/snowpark-python-iceberg-tags branch June 11, 2026 01:28
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 11, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants