SNOW-3392625: Add artifact_repository support to dbapi udtf_configs#4184
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4184 +/- ##
=======================================
Coverage 95.42% 95.42%
=======================================
Files 171 171
Lines 43615 43683 +68
Branches 7459 7475 +16
=======================================
+ Hits 41620 41685 +65
- Misses 1220 1221 +1
- Partials 775 777 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
sfc-gh-joshi
left a comment
There was a problem hiding this comment.
I'm seeing a test failure in CI:
https://github.com/snowflakedb/snowpark-python/actions/runs/24552702132/job/71782205220?pr=4184
_______________________ test_telemetry_tracking_for_udtf _______________________
[gw9] linux -- Python 3.9.25 /home/runner/work/snowpark-python/snowpark-python/.tox/py39-notdoctest-ci/bin/python3
tests/integ/test_data_source_api.py:542: in test_telemetry_tracking_for_udtf
df = session.read.dbapi(
.tox/py39-notdoctest-ci/lib/python3.9/site-packages/snowflake/snowpark/_internal/utils.py:1153: in call_wrapper
return func(*args, **kwargs)
.tox/py39-notdoctest-ci/lib/python3.9/site-packages/snowflake/snowpark/dataframe_reader.py:2300: in dbapi
df = partitioner._udtf_ingestion(
E TypeError: _udtf_ingestion() got an unexpected keyword argument 'artifact_repository'
It looks like the argument needs to be propagated somewhere else as well.
Can you also add a new test case verifying that the issue that motivated this ticket is fixed?
|
I have read the CLA Document and I hereby sign the CLA |
|
would also need an update of the change log for this change since it is user facing |
|
|
||
| return FakeConnection() | ||
|
|
||
| with mock.patch( |
There was a problem hiding this comment.
I think we should add a real integration test here instead of a mock. you can ask ai to do that. you can include some requirements like:
- the test need to be a integration test that test real end to end workflow
- use oracledb(or other db you want to test with), follow other tests in the repo to use the correct EAI and create connection function
- run and fix the test until it succeed.
This shall enable AI to create a correct integ test
b4ffbf3 to
7da6486
Compare
|
I have read the CLA Document and I hereby sign the CLA |
Forward the artifact_repository parameter from dbapi() udtf_configs through to session.udtf.register(), enabling users to specify a custom artifact repository (e.g. PyPI) for packages used by the internal UDTF created during distributed dbapi ingestion. Changes: - dataframe_reader.py: extract artifact_repository from udtf_configs - datasource_partitioner.py: forward through _udtf_ingestion wrapper - base_driver.py: accept and forward to session.udtf.register() - CHANGELOG.md: add new feature entry - test_data_source_api.py: add test_dbapi_udtf_artifact_repository .... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) Co-Authored-By: Cortex Code <noreply@snowflake.com>
7da6486 to
e548ee5
Compare
|
I have read the CLA Document and I hereby sign the CLA |
sfc-gh-yuwang
left a comment
There was a problem hiding this comment.
LGTM, can you record the information of artifact repository somewhere so we can refer to it in the future?
Seems like someone with privilege needs to log in to the test account and run: |
|
|
||
| def create_connection_oracledb(): | ||
| import oracledb | ||
|
|
||
| host = ORACLEDB_CONNECTION_PARAMETERS["host"] | ||
| port = ORACLEDB_CONNECTION_PARAMETERS["port"] | ||
| service_name = ORACLEDB_CONNECTION_PARAMETERS["service_name"] | ||
| username = ORACLEDB_CONNECTION_PARAMETERS["username"] | ||
| password = ORACLEDB_CONNECTION_PARAMETERS["password"] | ||
| dsn = f"{host}:{port}/{service_name}" | ||
| connection = oracledb.connect(user=username, password=password, dsn=dsn) | ||
| return connection |
There was a problem hiding this comment.
| def create_connection_oracledb(): | |
| import oracledb | |
| host = ORACLEDB_CONNECTION_PARAMETERS["host"] | |
| port = ORACLEDB_CONNECTION_PARAMETERS["port"] | |
| service_name = ORACLEDB_CONNECTION_PARAMETERS["service_name"] | |
| username = ORACLEDB_CONNECTION_PARAMETERS["username"] | |
| password = ORACLEDB_CONNECTION_PARAMETERS["password"] | |
| dsn = f"{host}:{port}/{service_name}" | |
| connection = oracledb.connect(user=username, password=password, dsn=dsn) | |
| return connection |
This method looks to already be defined at the top level of the file.
There was a problem hiding this comment.
the create connection function needed to be defined within the test to be correctly pickled.
if using the module level function, inside udtf it will try to import something like: from tests.integ.datasource.test_oracledb import create_connection, which would fail
There was a problem hiding this comment.
I see, thanks for clarifying. Can we add a comment making this clear to make sure it doesn't get accidentally deleted in the future?
| found = False | ||
| for q in his.queries: | ||
| if ( | ||
| "CREATE" in q.sql_text.upper() | ||
| and "FUNCTION" in q.sql_text.upper() | ||
| and "SNOWPARK_PYTHON_TEST_REPOSITORY" in q.sql_text | ||
| ): | ||
| found = True |
There was a problem hiding this comment.
| found = False | |
| for q in his.queries: | |
| if ( | |
| "CREATE" in q.sql_text.upper() | |
| and "FUNCTION" in q.sql_text.upper() | |
| and "SNOWPARK_PYTHON_TEST_REPOSITORY" in q.sql_text | |
| ): | |
| found = True | |
| found = any( | |
| ( | |
| "CREATE" in q.sql_text.upper() | |
| and "FUNCTION" in q.sql_text.upper() | |
| and "SNOWPARK_PYTHON_TEST_REPOSITORY" in q.sql_text | |
| ) | |
| for q in his.queries | |
| ) |
nit: this is more concise (may need to re-run formatter)
Address review comment: replace manual loop with any() generator expression. I have read the CLA Document and I hereby sign the CLA .... Generated with [Cortex Code](https://docs.snowflake.com/en/user-guide/cortex-code/cortex-code) Co-Authored-By: Cortex Code <noreply@snowflake.com>
Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.
Fixes SNOW-3392625
Fill out the following pre-review checklist:
Please describe how your code solves the related issue.
Forward the
artifact_repositoryparameter fromdbapi()udtf_configsthrough tosession.udtf.register().session.udtf.register()already acceptsartifact_repository, butdbapi()udtf_configsnever extracted or forwarded it. This is a pure parameter-forwarding change across two layers:dataframe_reader.py: extractartifact_repositoryfromudtf_configsbase_driver.py: accept and forward tosession.udtf.register()This enables users to specify a custom artifact repository (e.g. PyPI) for packages used by the internal UDTF created during distributed
dbapiingestion. Without this, drivers not in the Anaconda channel (e.g.pymongofor MongoDB/DocumentDB) cannot be used withudtf_configs, forcing users to fall back to local mode (single SP node, much slower) or upload a pure-Python.zipto a stage as a workaround.Usage
Testing
artifact_repositorydefaults toNone)artifact_repositoryto udxf registration.... Generated with Cortex Code