Skip to content

Commit 17a48ac

Browse files
committed
fix(models): address PR #594 review feedback
Greptile P1: ProviderRepository.load emitted its DeprecationWarning inside a `try/except Exception` block. Under `filterwarnings("error", DeprecationWarning)` the warn would raise, the except would swallow it, and `load()` would silently return None (losing the registry). Move the warn outside the catch-all so the strict-warning path no longer drops valid configs. Greptile P2 / johnnygreco: `_warn_on_implicit_provider` and `_warn_on_explicit_default` use `stacklevel=2`, which lands inside pydantic v2's validator dispatch rather than at the user's `ModelConfig(...)` / `ModelProviderRegistry(...)` call. That broke both attribution (the source line was unhelpful) and Python's once-per-location dedup (every call collapsed to the same pydantic-internal key, suppressing all but the first warning). Introduce `data_designer.config.utils.warning_helpers.warn_at_caller`, which walks past the helper, validator, and any pydantic frames to find the user's call site and emits via `warnings.warn_explicit` with the user frame's `__warningregistry__`. Keeps attribution accurate and dedup keyed on the user's (filename, lineno). johnnygreco: align the `provider_repository.py` warning copy with the sibling site in `default_model_settings.py` ("specify provider= explicitly on each ModelConfig instead") so both YAML-default warning sites give the same migration instruction. The previous wording pointed users at "ModelConfig entries" inside `model_providers.yaml`, where ModelConfig entries don't actually live. johnnygreco: dedup the cascade in `DataDesigner.__init__`. With `model_providers=None` and a YAML `default:`, the user previously saw two DeprecationWarnings for the same root cause — `get_default_provider_name()` warns about the YAML key, then `resolve_model_provider_registry(...)` re-warns from `_warn_on_explicit_default`. Suppress the registry-level duplicate in the YAML-fallback branch via `warnings.catch_warnings()` so users see exactly one warning per user action. johnnygreco: tighten `_warn_on_explicit_default` to fire only when `default is not None`. Passing `default=None` explicitly is semantically equivalent to omitting it (caller is opting *out* of a registry-level default), and shouldn't trigger the deprecation nudge. johnnygreco: add a `model_validate({...})` regression test for `ModelConfig` so the deserialization path (legacy on-disk configs) is pinned alongside the construction path. Tests: - Update `test_load_exists` and `test_save` to omit `default=` so the roundtrip stops exercising the deprecated YAML-default path unguarded (Greptile note). - Wrap `test_resolve_model_provider_registry_with_explicit_default`, `test_get_provider`, and `test_init_user_supplied_providers_preserve_first_wins_over_yaml_default` in `pytest.warns` so the suite stays green under `-W error::DeprecationWarning` (Greptile note). - Add `test_explicit_default_none_does_not_emit_deprecation_warning` to pin the tightened predicate. - Add `test_init_yaml_default_emits_single_deprecation_warning` to pin the cascade-dedup behavior. Refs #589 Made-with: Cursor
1 parent 741c278 commit 17a48ac

9 files changed

Lines changed: 221 additions & 30 deletions

File tree

packages/data-designer-config/src/data_designer/config/models.py

Lines changed: 9 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,6 @@
55

66
import json
77
import logging
8-
import warnings
98
from abc import ABC, abstractmethod
109
from enum import Enum
1110
from pathlib import Path
@@ -32,6 +31,7 @@
3231
load_image_path_to_base64,
3332
)
3433
from data_designer.config.utils.io_helpers import smart_load_yaml
34+
from data_designer.config.utils.warning_helpers import warn_at_caller
3535

3636
logger = logging.getLogger(__name__)
3737

@@ -542,12 +542,17 @@ def _convert_inference_parameters(cls, value: Any) -> Any:
542542
@model_validator(mode="after")
543543
def _warn_on_implicit_provider(self) -> Self:
544544
if self.provider is None:
545-
msg = (
545+
# Use ``warn_at_caller`` so the warning is attributed to the user's
546+
# ``ModelConfig(...)`` / ``model_validate(...)`` call rather than a
547+
# pydantic-internal frame. Without this, every call dedupes to the
548+
# same pydantic line and only the first emission is shown. See
549+
# PR #594 review.
550+
warn_at_caller(
546551
f"ModelConfig.provider=None is deprecated and will be required in a future release. "
547552
f"Specify provider= explicitly on ModelConfig(alias={self.alias!r}, ...). "
548-
"See issue #589."
553+
"See issue #589.",
554+
DeprecationWarning,
549555
)
550-
warnings.warn(msg, DeprecationWarning, stacklevel=2)
551556
return self
552557

553558

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2+
# SPDX-License-Identifier: Apache-2.0
3+
4+
"""Helpers for emitting warnings that attribute correctly to user code.
5+
6+
Pydantic v2 dispatches ``@model_validator`` / ``@field_validator`` callbacks
7+
through several internal frames. ``warnings.warn(stacklevel=N)`` from inside a
8+
validator therefore tends to land inside pydantic's machinery rather than at
9+
the user's ``ModelConfig(...)`` / ``ModelProviderRegistry(...)`` call site.
10+
11+
That breaks two things:
12+
13+
1. Attribution — the displayed source location is unhelpful.
14+
2. Deduplication — Python's default once-per-location dedup key is
15+
``(category, module, lineno)``. When every call resolves to the same
16+
pydantic-internal line, every warning after the first is silently
17+
suppressed regardless of which user file triggered it.
18+
19+
``warn_at_caller`` walks the stack to the first frame outside pydantic (and
20+
outside this helper / the calling validator) and uses
21+
``warnings.warn_explicit`` to attribute the warning there.
22+
"""
23+
24+
from __future__ import annotations
25+
26+
import sys
27+
import warnings
28+
29+
30+
def warn_at_caller(message: str, category: type[Warning]) -> None:
31+
"""Emit ``message`` attributed to the first non-pydantic frame above the caller.
32+
33+
Intended to be invoked from inside a pydantic validator. The walk skips this
34+
helper's own frame and the calling validator's frame, then walks past any
35+
pydantic-internal frames until it finds the user's call site.
36+
37+
The user frame's ``__warningregistry__`` is passed to
38+
``warnings.warn_explicit`` so Python's built-in once-per-location dedup keys
39+
on the *user's* (filename, lineno) rather than an internal pydantic frame.
40+
That matches how ``warnings.warn`` would dedup if ``stacklevel`` could
41+
correctly point at the user.
42+
43+
We deliberately do *not* pass ``module_globals`` — it's only used for
44+
``linecache`` source-line display, and for scripts run with ``python -c``
45+
(where the user frame's ``__loader__`` is ``BuiltinImporter`` for
46+
``__main__``) the lookup raises ``ImportError("'__main__' is not a built-in
47+
module")``. Skipping ``module_globals`` keeps the warning path robust at
48+
the cost of an empty source line in the formatted output.
49+
"""
50+
# Skip frame 0 (warn_at_caller) and frame 1 (the validator that called us).
51+
frame = sys._getframe(2) if hasattr(sys, "_getframe") else None
52+
while frame is not None:
53+
module_name = frame.f_globals.get("__name__", "")
54+
if not module_name.startswith("pydantic"):
55+
warnings.warn_explicit(
56+
message,
57+
category,
58+
frame.f_code.co_filename,
59+
frame.f_lineno,
60+
module=module_name,
61+
registry=frame.f_globals.setdefault("__warningregistry__", {}),
62+
)
63+
return
64+
frame = frame.f_back
65+
66+
# Fallback: never escaped pydantic frames (or no frame access). Use stacklevel.
67+
warnings.warn(message, category, stacklevel=3)

packages/data-designer-config/tests/config/test_models.py

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -494,6 +494,17 @@ def test_model_config_provider_none_emits_deprecation_warning():
494494
ModelConfig(alias="legacy", model="legacy-model", provider=None)
495495

496496

497+
def test_model_config_provider_none_via_model_validate_emits_deprecation_warning():
498+
"""Regression for #589 / PR #594 review: deserialising legacy on-disk configs
499+
via ``ModelConfig.model_validate(...)`` must surface the same
500+
``DeprecationWarning`` as direct construction. Both paths funnel through
501+
the same validator today, so this pin protects against a future refactor
502+
that, e.g., only runs the validator on construction and not on revalidation.
503+
"""
504+
with pytest.warns(DeprecationWarning, match="ModelConfig.provider=None is deprecated"):
505+
ModelConfig.model_validate({"alias": "legacy", "model": "legacy-model"})
506+
507+
497508
def test_model_config_with_provider_does_not_warn():
498509
"""Pin the post-deprecation happy path: specifying ``provider=`` must not
499510
emit any deprecation warning.

packages/data-designer-engine/src/data_designer/engine/model_provider.py

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -3,14 +3,14 @@
33

44
from __future__ import annotations
55

6-
import warnings
76
from functools import cached_property
87

98
from pydantic import BaseModel, field_validator, model_validator
109
from typing_extensions import Self
1110

1211
from data_designer.config.mcp import MCPProviderT
1312
from data_designer.config.models import ModelProvider
13+
from data_designer.config.utils.warning_helpers import warn_at_caller
1414
from data_designer.engine.errors import NoModelProvidersError, UnknownProviderError
1515

1616

@@ -56,17 +56,21 @@ def check_default_exists(self) -> Self:
5656

5757
@model_validator(mode="after")
5858
def _warn_on_explicit_default(self) -> Self:
59-
# Fires only when the caller actually passed ``default=`` (not when it's
60-
# left at the field default of None). ``resolve_model_provider_registry``
61-
# avoids passing ``default=`` in the single-provider case so common
62-
# construction paths stay quiet. See issue #589.
63-
if "default" in self.model_fields_set:
64-
warnings.warn(
59+
# Fires only when the caller actually passed a non-None ``default=``.
60+
# The ``model_fields_set`` guard distinguishes "caller opted into the
61+
# deprecated field" from "field at its default value of None", and the
62+
# ``self.default is not None`` clause additionally lets callers
63+
# explicitly opt *out* via ``default=None`` without tripping the
64+
# warning. ``resolve_model_provider_registry`` avoids passing
65+
# ``default=`` in the single-provider case so common construction paths
66+
# stay quiet. ``warn_at_caller`` keeps attribution and dedup correct
67+
# under pydantic's validator dispatch. See issue #589 / PR #594 review.
68+
if "default" in self.model_fields_set and self.default is not None:
69+
warn_at_caller(
6570
"ModelProviderRegistry.default is deprecated and will be removed in a "
6671
"future release. Specify provider= explicitly on each ModelConfig "
6772
"instead of relying on a registry-level default. See issue #589.",
6873
DeprecationWarning,
69-
stacklevel=2,
7074
)
7175
return self
7276

packages/data-designer-engine/tests/engine/test_model_provider.py

Lines changed: 17 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -58,10 +58,14 @@ def test_no_duplicate_provider_names(stub_foo_provider: ModelProvider):
5858

5959

6060
def test_get_provider(stub_foo_provider: ModelProvider, stub_bar_provider: ModelProvider):
61-
registry = ModelProviderRegistry(
62-
providers=[stub_foo_provider, stub_bar_provider],
63-
default="foo",
64-
)
61+
# Multi-provider construction with an explicit default exercises the #589
62+
# deprecation path; wrap so this test stays green if the project ever runs
63+
# with ``-W error::DeprecationWarning``.
64+
with pytest.warns(DeprecationWarning, match="ModelProviderRegistry.default is deprecated"):
65+
registry = ModelProviderRegistry(
66+
providers=[stub_foo_provider, stub_bar_provider],
67+
default="foo",
68+
)
6569

6670
assert registry.get_provider(None) == stub_foo_provider
6771
assert registry.get_provider("foo") == stub_foo_provider
@@ -82,8 +86,15 @@ def test_resolve_model_provider_registry(stub_foo_provider: ModelProvider) -> No
8286
def test_resolve_model_provider_registry_with_explicit_default(
8387
stub_foo_provider: ModelProvider, stub_bar_provider: ModelProvider
8488
) -> None:
85-
"""Test resolve_model_provider_registry with explicit default."""
86-
registry = resolve_model_provider_registry([stub_foo_provider, stub_bar_provider], default_provider_name="bar")
89+
"""Test resolve_model_provider_registry with explicit default.
90+
91+
The multi-provider/explicit-default path is the deprecated one (see #589),
92+
so the construction emits a ``DeprecationWarning``. Wrap the call in
93+
``pytest.warns`` so this test stays green if the project ever runs under
94+
``-W error::DeprecationWarning``.
95+
"""
96+
with pytest.warns(DeprecationWarning, match="ModelProviderRegistry.default is deprecated"):
97+
registry = resolve_model_provider_registry([stub_foo_provider, stub_bar_provider], default_provider_name="bar")
8798

8899
assert registry.get_default_provider_name() == "bar"
89100

packages/data-designer/src/data_designer/cli/repositories/provider_repository.py

Lines changed: 18 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -36,14 +36,24 @@ def load(self) -> ModelProviderRegistry | None:
3636

3737
try:
3838
config_dict = load_config_file(self.config_file)
39-
if config_dict.get("default") is not None:
40-
warnings.warn(
41-
f"The 'default:' key in {self.config_file} is deprecated and will be "
42-
"removed in a future release. Remove it and refer to providers by name "
43-
"in your ModelConfig entries. See issue #589.",
44-
DeprecationWarning,
45-
stacklevel=2,
46-
)
39+
except Exception:
40+
return None
41+
42+
# Emit the deprecation warning *outside* the validation try/except below.
43+
# ``DeprecationWarning`` is an ``Exception`` subclass, so under
44+
# ``filterwarnings("error", DeprecationWarning)`` a warn raised inside
45+
# the catch-all would be silently swallowed and ``load`` would drop the
46+
# registry. See PR #594 review.
47+
if config_dict.get("default") is not None:
48+
warnings.warn(
49+
f"The 'default:' key in {self.config_file} is deprecated and will "
50+
"be removed in a future release. Remove it and specify provider= "
51+
"explicitly on each ModelConfig instead. See issue #589.",
52+
DeprecationWarning,
53+
stacklevel=2,
54+
)
55+
56+
try:
4757
return ModelProviderRegistry.model_validate(config_dict)
4858
except Exception:
4959
return None

packages/data-designer/src/data_designer/interface/data_designer.py

Lines changed: 16 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@
44
from __future__ import annotations
55

66
import logging
7+
import warnings
78
from pathlib import Path
89
from typing import TYPE_CHECKING
910

@@ -163,7 +164,21 @@ def __init__(
163164
self._model_providers = self._resolve_model_providers(model_providers)
164165
default_provider_name = None
165166
self._mcp_providers = mcp_providers or []
166-
self._model_provider_registry = resolve_model_provider_registry(self._model_providers, default_provider_name)
167+
# When the YAML carries a default, ``get_default_provider_name`` already
168+
# nudged the user with a ``DeprecationWarning``. Building the registry
169+
# below would re-fire ``ModelProviderRegistry._warn_on_explicit_default``
170+
# for the same root cause, so suppress that second warning. See PR #594
171+
# review.
172+
with warnings.catch_warnings():
173+
if default_provider_name is not None:
174+
warnings.filterwarnings(
175+
"ignore",
176+
message="ModelProviderRegistry.default is deprecated",
177+
category=DeprecationWarning,
178+
)
179+
self._model_provider_registry = resolve_model_provider_registry(
180+
self._model_providers, default_provider_name
181+
)
167182
self._seed_reader_registry = SeedReaderRegistry(readers=seed_readers or DEFAULT_SEED_READERS)
168183

169184
@property

packages/data-designer/tests/cli/repositories/test_provider_repository.py

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -23,19 +23,25 @@ def test_load_does_not_exist():
2323

2424

2525
def test_load_exists(tmp_path: Path, stub_model_providers: list[ModelProvider]):
26+
# Roundtrip test for the load/save cycle. We deliberately leave ``default``
27+
# unset so this test does not exercise the deprecated YAML ``default:`` path
28+
# — that path is covered by ``test_load_with_yaml_default_emits_deprecation_warning``
29+
# below. See issue #589.
2630
providers_file_path = tmp_path / MODEL_PROVIDERS_FILE_NAME
2731
save_config_file(
2832
providers_file_path,
29-
ModelProviderRegistry(providers=stub_model_providers, default=stub_model_providers[0].name).model_dump(),
33+
ModelProviderRegistry(providers=stub_model_providers).model_dump(exclude_none=True),
3034
)
3135
repository = ProviderRepository(tmp_path)
3236
assert repository.load() is not None
3337
assert repository.load().providers == stub_model_providers
3438

3539

3640
def test_save(tmp_path: Path, stub_model_providers: list[ModelProvider]):
41+
# As above, leave ``default`` unset so the roundtrip stays clear of the
42+
# YAML-default deprecation. See issue #589.
3743
repository = ProviderRepository(tmp_path)
38-
repository.save(ModelProviderRegistry(providers=stub_model_providers, default=stub_model_providers[0].name))
44+
repository.save(ModelProviderRegistry(providers=stub_model_providers))
3945
assert repository.load() is not None
4046
assert repository.load().providers == stub_model_providers
4147

packages/data-designer/tests/interface/test_data_designer.py

Lines changed: 63 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
import contextlib
77
import json
88
import logging
9+
import warnings
910
from datetime import datetime
1011
from pathlib import Path
1112
from typing import Any
@@ -464,7 +465,13 @@ def test_init_user_supplied_providers_preserve_first_wins_over_yaml_default(
464465
),
465466
]
466467

467-
with patch.object(dd_mod, "get_default_provider_name", return_value="second-provider"):
468+
# Multi-provider construction (user-supplied list of length > 1) still
469+
# passes ``default=`` to ``ModelProviderRegistry`` — that's the deprecated
470+
# path under #589 — so the registry-level deprecation fires here.
471+
with (
472+
patch.object(dd_mod, "get_default_provider_name", return_value="second-provider"),
473+
pytest.warns(DeprecationWarning, match="ModelProviderRegistry.default is deprecated"),
474+
):
468475
data_designer = DataDesigner(
469476
artifact_path=stub_artifact_path,
470477
model_providers=user_providers,
@@ -513,6 +520,61 @@ def test_init_no_user_providers_uses_yaml_default(
513520
assert data_designer.model_provider_registry.get_default_provider_name() == "yaml-second"
514521

515522

523+
def test_init_yaml_default_emits_single_deprecation_warning(
524+
stub_artifact_path: Path,
525+
stub_managed_assets_path: Path,
526+
) -> None:
527+
"""Regression for PR #594 review: when ``DataDesigner()`` falls back to the
528+
YAML's ``providers:`` and ``default:``, the user should see a single
529+
``DeprecationWarning`` (the YAML one) rather than a duplicate cascade where
530+
``ModelProviderRegistry._warn_on_explicit_default`` also fires for the same
531+
root cause. See issue #589.
532+
"""
533+
yaml_providers = [
534+
ModelProvider(
535+
name="yaml-first",
536+
endpoint="https://yaml-first.example.com/v1",
537+
api_key="yaml-first-key",
538+
),
539+
ModelProvider(
540+
name="yaml-second",
541+
endpoint="https://yaml-second.example.com/v1",
542+
api_key="yaml-second-key",
543+
),
544+
]
545+
546+
with warnings.catch_warnings(record=True) as caught:
547+
warnings.simplefilter("always", DeprecationWarning)
548+
with (
549+
patch.object(dd_mod, "get_default_providers", return_value=yaml_providers),
550+
patch.object(dd_mod, "get_default_provider_name") as mock_get_default,
551+
):
552+
mock_get_default.side_effect = lambda: (
553+
warnings.warn(
554+
"The 'default:' key in /fake/path is deprecated and will "
555+
"be removed in a future release. Remove it and specify provider= "
556+
"explicitly on each ModelConfig instead. See issue #589.",
557+
DeprecationWarning,
558+
stacklevel=2,
559+
)
560+
or "yaml-second"
561+
)
562+
DataDesigner(
563+
artifact_path=stub_artifact_path,
564+
secret_resolver=PlaintextResolver(),
565+
managed_assets_path=stub_managed_assets_path,
566+
)
567+
568+
deprecation_messages = [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]
569+
yaml_default_warnings = [m for m in deprecation_messages if "'default:' key" in m]
570+
registry_default_warnings = [m for m in deprecation_messages if "ModelProviderRegistry.default is deprecated" in m]
571+
assert len(yaml_default_warnings) == 1, deprecation_messages
572+
assert registry_default_warnings == [], (
573+
"Registry-level deprecation should be suppressed in the YAML-fallback path "
574+
"to avoid two warnings for the same root cause."
575+
)
576+
577+
516578
def test_run_config_setting_persists(stub_artifact_path, stub_model_providers):
517579
"""Test that run config setting persists across multiple calls."""
518580
data_designer = DataDesigner(artifact_path=stub_artifact_path, model_providers=stub_model_providers)

0 commit comments

Comments
 (0)