Skip to content

SNOW-3392625: Add artifact_repository support to dbapi udtf_configs#4184

Merged
sfc-gh-aling merged 2 commits into
mainfrom
qding/dbapi-artifact-repository
Apr 20, 2026
Merged

SNOW-3392625: Add artifact_repository support to dbapi udtf_configs#4184
sfc-gh-aling merged 2 commits into
mainfrom
qding/dbapi-artifact-repository

Conversation

@sfc-gh-qding

@sfc-gh-qding sfc-gh-qding commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator
  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-3392625

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I am requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I am adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

    Forward the artifact_repository parameter from dbapi() udtf_configs through to session.udtf.register().

    session.udtf.register() already accepts artifact_repository, but dbapi() udtf_configs never extracted or forwarded it. This is a pure parameter-forwarding change across two layers:

    1. dataframe_reader.py: extract artifact_repository from udtf_configs
    2. base_driver.py: accept and forward to session.udtf.register()

    This enables users to specify a custom artifact repository (e.g. PyPI) for packages used by the internal UDTF created during distributed dbapi ingestion. Without this, drivers not in the Anaconda channel (e.g. pymongo for MongoDB/DocumentDB) cannot be used with udtf_configs, forcing users to fall back to local mode (single SP node, much slower) or upload a pure-Python .zip to a stage as a workaround.

    Usage

    udtf_configs = {
        "external_access_integration": "MY_EAI",
        "artifact_repository": "snowflake.snowpark.pypi_shared_repository",
        "packages": ["pymongo"],
    }
    
    df = session.read.dbapi(
        create_conn,
        table="my_collection",
        custom_schema=schema,
        predicates=predicates,
        fetch_size=100000,
        udtf_configs=udtf_configs,
    )

    Testing

    • No new tests needed: existing dbapi UDTF tests pass unchanged (artifact_repository defaults to None)
    • 3 lines added across 2 files — pure parameter forwarding, no new logic
    • Follows the same pattern as PR SNOW-1708509: Add support for artifact_repository to udxf #3044 which added artifact_repository to udxf registration

.... Generated with Cortex Code

@github-actions

github-actions Bot commented Apr 17, 2026

Copy link
Copy Markdown

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@sfc-gh-qding sfc-gh-qding changed the title Add artifact_repository support to dbapi udtf_configs SNOW-3392625: Add artifact_repository support to dbapi udtf_configs Apr 17, 2026
@codecov-commenter

codecov-commenter commented Apr 17, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.42%. Comparing base (7c853e1) to head (da2fb7c).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4184   +/-   ##
=======================================
  Coverage   95.42%   95.42%           
=======================================
  Files         171      171           
  Lines       43615    43683   +68     
  Branches     7459     7475   +16     
=======================================
+ Hits        41620    41685   +65     
- Misses       1220     1221    +1     
- Partials      775      777    +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@sfc-gh-joshi sfc-gh-joshi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing a test failure in CI:
https://github.com/snowflakedb/snowpark-python/actions/runs/24552702132/job/71782205220?pr=4184

_______________________ test_telemetry_tracking_for_udtf _______________________
[gw9] linux -- Python 3.9.25 /home/runner/work/snowpark-python/snowpark-python/.tox/py39-notdoctest-ci/bin/python3
tests/integ/test_data_source_api.py:542: in test_telemetry_tracking_for_udtf
    df = session.read.dbapi(
.tox/py39-notdoctest-ci/lib/python3.9/site-packages/snowflake/snowpark/_internal/utils.py:1153: in call_wrapper
    return func(*args, **kwargs)
.tox/py39-notdoctest-ci/lib/python3.9/site-packages/snowflake/snowpark/dataframe_reader.py:2300: in dbapi
    df = partitioner._udtf_ingestion(
E   TypeError: _udtf_ingestion() got an unexpected keyword argument 'artifact_repository'

It looks like the argument needs to be propagated somewhere else as well.

Can you also add a new test case verifying that the issue that motivated this ticket is fixed?

@sfc-gh-qding

Copy link
Copy Markdown
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

@sfc-gh-yuwang

Copy link
Copy Markdown
Collaborator

would also need an update of the change log for this change since it is user facing

Comment thread tests/integ/test_data_source_api.py Outdated

return FakeConnection()

with mock.patch(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add a real integration test here instead of a mock. you can ask ai to do that. you can include some requirements like:

  1. the test need to be a integration test that test real end to end workflow
  2. use oracledb(or other db you want to test with), follow other tests in the repo to use the correct EAI and create connection function
  3. run and fix the test until it succeed.

This shall enable AI to create a correct integ test

@sfc-gh-qding sfc-gh-qding force-pushed the qding/dbapi-artifact-repository branch 2 times, most recently from b4ffbf3 to 7da6486 Compare April 17, 2026 23:06
@sfc-gh-qding

Copy link
Copy Markdown
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

Forward the artifact_repository parameter from dbapi() udtf_configs
through to session.udtf.register(), enabling users to specify a
custom artifact repository (e.g. PyPI) for packages used by the
internal UDTF created during distributed dbapi ingestion.

Changes:
- dataframe_reader.py: extract artifact_repository from udtf_configs
- datasource_partitioner.py: forward through _udtf_ingestion wrapper
- base_driver.py: accept and forward to session.udtf.register()
- CHANGELOG.md: add new feature entry
- test_data_source_api.py: add test_dbapi_udtf_artifact_repository

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
@sfc-gh-qding sfc-gh-qding force-pushed the qding/dbapi-artifact-repository branch from 7da6486 to e548ee5 Compare April 18, 2026 00:54
@sfc-gh-qding

Copy link
Copy Markdown
Collaborator Author

I have read the CLA Document and I hereby sign the CLA

@sfc-gh-yuwang sfc-gh-yuwang left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you record the information of artifact repository somewhere so we can refer to it in the future?

@sfc-gh-qding

Copy link
Copy Markdown
Collaborator Author

LGTM, can you record the information of artifact repository somewhere so we can refer to it in the future?

Seems like someone with privilege needs to log in to the test account and run:
DESCRIBE ARTIFACT REPOSITORY testdb_snowpark_python.testschema_snowpark_python.SNOWPARK_PYTHON_TEST_REPOSITORY;

Comment on lines +427 to +438

def create_connection_oracledb():
import oracledb

host = ORACLEDB_CONNECTION_PARAMETERS["host"]
port = ORACLEDB_CONNECTION_PARAMETERS["port"]
service_name = ORACLEDB_CONNECTION_PARAMETERS["service_name"]
username = ORACLEDB_CONNECTION_PARAMETERS["username"]
password = ORACLEDB_CONNECTION_PARAMETERS["password"]
dsn = f"{host}:{port}/{service_name}"
connection = oracledb.connect(user=username, password=password, dsn=dsn)
return connection

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def create_connection_oracledb():
import oracledb
host = ORACLEDB_CONNECTION_PARAMETERS["host"]
port = ORACLEDB_CONNECTION_PARAMETERS["port"]
service_name = ORACLEDB_CONNECTION_PARAMETERS["service_name"]
username = ORACLEDB_CONNECTION_PARAMETERS["username"]
password = ORACLEDB_CONNECTION_PARAMETERS["password"]
dsn = f"{host}:{port}/{service_name}"
connection = oracledb.connect(user=username, password=password, dsn=dsn)
return connection

This method looks to already be defined at the top level of the file.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the create connection function needed to be defined within the test to be correctly pickled.
if using the module level function, inside udtf it will try to import something like: from tests.integ.datasource.test_oracledb import create_connection, which would fail

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for clarifying. Can we add a comment making this clear to make sure it doesn't get accidentally deleted in the future?

Comment thread tests/integ/datasource/test_oracledb.py Outdated
Comment on lines +454 to +461
found = False
for q in his.queries:
if (
"CREATE" in q.sql_text.upper()
and "FUNCTION" in q.sql_text.upper()
and "SNOWPARK_PYTHON_TEST_REPOSITORY" in q.sql_text
):
found = True

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
found = False
for q in his.queries:
if (
"CREATE" in q.sql_text.upper()
and "FUNCTION" in q.sql_text.upper()
and "SNOWPARK_PYTHON_TEST_REPOSITORY" in q.sql_text
):
found = True
found = any(
(
"CREATE" in q.sql_text.upper()
and "FUNCTION" in q.sql_text.upper()
and "SNOWPARK_PYTHON_TEST_REPOSITORY" in q.sql_text
)
for q in his.queries
)

nit: this is more concise (may need to re-run formatter)

Address review comment: replace manual loop with any() generator expression.

I have read the CLA Document and I hereby sign the CLA

.... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code)

Co-Authored-By: Cortex Code <noreply@snowflake.com>
@sfc-gh-aling sfc-gh-aling merged commit 2163baa into main Apr 20, 2026
28 of 29 checks passed
@sfc-gh-aling sfc-gh-aling deleted the qding/dbapi-artifact-repository branch April 20, 2026 21:46
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 20, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants