4117433: Update _asset_utils.py to fix Performance issue#46572
4117433: Update _asset_utils.py to fix Performance issue#46572sangeetha-cpu wants to merge 2 commits intoAzure:mainfrom
Conversation
|
Thank you for your contribution @sangeetha-cpu! We will review the pull request and get back to you soon. |
There was a problem hiding this comment.
Pull request overview
This PR targets a performance improvement in asset upload file enumeration by avoiding descending into ignored directories during folder traversal. It also includes import/type-related refactors in _asset_utils.py (currently introducing runtime import issues that need correction).
Changes:
- Prune ignored directories during
os.walkinget_upload_files_from_folderto reduce traversal cost (e.g., large.venv-like folders). - Reformat several imports (including
typingand constants imports). - Update restclient model imports and adjust
_get_latesttype handling (currently incorrect/broken).
| from azure.ai.ml._restclient.v2022_05_01.models import ( | ||
| DataVersionBaseData, | ||
| ModelVersionData, | ||
| ModelVersionResourceArmPaginatedResult, | ||
| ) | ||
| from azure.ai.ml._restclient.arm_ml_service.models import PendingUploadRequestDto | ||
| from azure.ai.ml._restclient.v2023_04_01.models import PendingUploadRequestDto |
There was a problem hiding this comment.
The new restclient imports reference modules that don't exist in this package (azure.ai.ml._restclient.v2022_05_01.models and azure.ai.ml._restclient.v2023_04_01.models are not present under _restclient/). This will raise ImportError at runtime. Please switch these imports back to an existing restclient version (e.g., v2022_02_01_preview.models / v2023_04_01_preview.models) or keep using arm_ml_service.models, matching the operations you use in this file.
| if latest and isinstance(latest, ModelVersionResourceArmPaginatedResult): | ||
| # Data list return object doesn't require this since its elements are already DatasetVersionResources | ||
| latest = cast(ModelVersionData, latest) |
There was a problem hiding this comment.
The isinstance(latest, ModelVersionResourceArmPaginatedResult) branch is effectively dead: the ModelVersionsOperations.list() paging implementation yields elements from deserialized.value (typed as List[ModelVersionData]), so result.next() returns a ModelVersionData, not the paginated result wrapper. This check should likely be against ModelVersionData (or removed entirely) to keep the logic accurate and avoid confusion.
| if latest and isinstance(latest, ModelVersionResourceArmPaginatedResult): | |
| # Data list return object doesn't require this since its elements are already DatasetVersionResources | |
| latest = cast(ModelVersionData, latest) |
| TYPE_CHECKING, | ||
| Any, | ||
| Dict, | ||
| Iterable, | ||
| List, | ||
| Optional, | ||
| Tuple, | ||
| Union, |
There was a problem hiding this comment.
There are trailing spaces in the multi-line from typing import (...) block (e.g., after TYPE_CHECKING, / Any, / Dict, etc.). These will trip linters (W291). Please remove the trailing whitespace and keep formatting consistent.
| TYPE_CHECKING, | |
| Any, | |
| Dict, | |
| Iterable, | |
| List, | |
| Optional, | |
| Tuple, | |
| Union, | |
| TYPE_CHECKING, | |
| Any, | |
| Dict, | |
| Iterable, | |
| List, | |
| Optional, | |
| Tuple, | |
| Union, |
| for root, dirs, files in os.walk(path, followlinks=True, topdown=True): | ||
| # Skip ignored directories to avoid descending into them | ||
| # This significantly improves performance when large ignored folders (e.g., .venv) are present | ||
| dirs[:] = [ | ||
| d for d in dirs | ||
| if not ignore_file.is_file_excluded(Path(root).joinpath(d).as_posix()) | ||
| ] |
There was a problem hiding this comment.
The PR description is still the template text and doesn’t describe the actual change or link the relevant issue (the title suggests a performance fix). Please update the PR description to summarize what changed and reference the tracking issue/bug ID.
Description
Please add an informative description that covers that changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines