Commit 7181db3

chore: lazy 3rd party imports (#222)
1 parent 1ee37bc commit 7181db3

169 files changed

Lines changed: 998 additions & 250 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -93,3 +93,6 @@ docs/notebook_source/*.csv
 docs/**/artifacts/
 
 tests_e2e/uv.lock
+
+# Performance profiling
+perf_*.txt

AGENTS.md

Lines changed: 143 additions & 2 deletions
@@ -158,12 +158,13 @@ Type annotations are REQUIRED for all code in this project. This is strictly enf
 ### Import Style
 
 - **ALWAYS** use absolute imports, never relative imports
-- Place imports at module level, not inside functions
+- Place imports at module level, not inside functions (exception: it is unavoidable for performance reasons)
 - Import sorting is handled by `ruff`'s `isort` - imports should be grouped and sorted:
 1. Standard library imports
-2. Third-party imports
+2. Third-party imports (use `lazy_heavy_imports` for heavy libraries)
 3. First-party imports (`data_designer`)
 - Use standard import conventions (enforced by `ICN`)
+- See [Lazy Loading and TYPE_CHECKING](#lazy-loading-and-type_checking) section for optimization guidelines
 
 ```python
 # Good
@@ -184,6 +185,146 @@ Type annotations are REQUIRED for all code in this project. This is strictly enf
 path = Path(filename)
 ```
 
+### Lazy Loading and TYPE_CHECKING
+
+This project uses lazy loading for heavy third-party dependencies to optimize import performance.
+
+#### When to Use Lazy Loading
+
+**Heavy third-party libraries** (>100ms import cost) should be lazy-loaded via `lazy_heavy_imports.py`:
+
+```python
+# ❌ Don't import directly
+import pandas as pd
+import numpy as np
+
+# ✅ Use lazy loading with IDE support
+from typing import TYPE_CHECKING
+from data_designer.lazy_heavy_imports import pd, np
+
+if TYPE_CHECKING:
+    import pandas as pd # For IDE autocomplete and type hints
+    import numpy as np
+```
+
+This pattern provides:
+- Runtime lazy loading (fast startup)
+- Full IDE support (autocomplete, type hints)
+- Type checker validation
+
+**See [lazy_heavy_imports.py](src/data_designer/lazy_heavy_imports.py) for the current list of lazy-loaded libraries.**
+
+#### Adding New Heavy Dependencies
+
+If you add a new dependency with significant import cost (>100ms):
+
+1. **Add to `lazy_heavy_imports.py`:**
+```python
+_LAZY_IMPORTS = {
+    # ... existing entries ...
+    "your_lib": "your_library_name",
+}
+```
+
+2. **Update imports across codebase:**
+```python
+from typing import TYPE_CHECKING
+from data_designer.lazy_heavy_imports import your_lib
+
+if TYPE_CHECKING:
+    import your_library_name as your_lib # For IDE support
+```
+
+3. **Verify with performance test:**
+```bash
+make perf-import CLEAN=1
+```
+
+#### Using TYPE_CHECKING Blocks
+
+`TYPE_CHECKING` blocks defer imports that are only needed for type hints, preventing circular dependencies and reducing import time.
+
+**For internal data_designer imports:**
+
+```python
+from __future__ import annotations # Always include at top
+
+from typing import TYPE_CHECKING
+
+# Runtime imports
+from pathlib import Path
+from data_designer.config.base import ConfigBase
+
+if TYPE_CHECKING:
+    # Type-only imports - only visible to type checkers
+    from data_designer.engine.models.facade import ModelFacade
+
+def get_model(model: ModelFacade) -> str:
+    return model.name
+```
+
+**For lazy-loaded libraries (see pattern in "When to Use Lazy Loading" above):**
+- Import from `lazy_heavy_imports` for runtime
+- Add full import in `TYPE_CHECKING` block for IDE support
+
+**Rules for TYPE_CHECKING:**
+
+**DO put in TYPE_CHECKING:**
+- Internal `data_designer` imports used **only** in type hints
+- Imports that would cause circular dependencies
+- **Full imports of lazy-loaded libraries for IDE support** (e.g., `import pandas as pd` in addition to runtime `from data_designer.lazy_heavy_imports import pd`)
+
+**DON'T put in TYPE_CHECKING:**
+- **Standard library imports** (`Path`, `Any`, `Callable`, `Literal`, `TypeAlias`, etc.)
+- **Pydantic model types** used in field definitions (needed at runtime for validation)
+- **Types used in discriminated unions** (Pydantic needs them at runtime)
+- **Any import used at runtime** (instantiation, method calls, base classes, etc.)
+
+**Examples:**
+
+```python
+# ✅ CORRECT - Lazy-loaded library with IDE support
+from typing import TYPE_CHECKING
+from data_designer.lazy_heavy_imports import pd
+
+if TYPE_CHECKING:
+    import pandas as pd # IDE gets full type hints
+
+def load_data(path: str) -> pd.DataFrame: # IDE understands pd.DataFrame
+    return pd.read_csv(path)
+
+# ✅ CORRECT - Standard library NOT in TYPE_CHECKING
+from pathlib import Path
+from typing import Any
+
+def process_file(path: Path) -> Any:
+    return path.read_text()
+
+# ✅ CORRECT - Internal type-only import
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from data_designer.engine.models.facade import ModelFacade
+
+def get_model(model: ModelFacade) -> str: # Only used in type hint
+    return model.name
+
+# ❌ INCORRECT - Pydantic field type in TYPE_CHECKING
+from typing import TYPE_CHECKING
+
+if TYPE_CHECKING:
+    from data_designer.config.models import ModelConfig # Wrong!
+
+class MyConfig(BaseModel):
+    model: ModelConfig # Pydantic needs this at runtime!
+
+# ✅ CORRECT - Pydantic field type at runtime
+from data_designer.config.models import ModelConfig
+
+class MyConfig(BaseModel):
+    model: ModelConfig
+```
+
 ### Naming Conventions (PEP 8)
 
 Follow PEP 8 naming conventions:
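
The AGENTS.md text above describes `lazy_heavy_imports.py` only through its `_LAZY_IMPORTS` mapping; the module's implementation is not shown in this part of the diff. The following is a minimal sketch of one way such a mapping could be served lazily, and is not the project's actual code. The `LazyModule` class is hypothetical, and the stdlib module `colorsys` stands in for a heavy library such as pandas.

```python
from __future__ import annotations

import importlib
import sys
from types import ModuleType
from typing import Any


class LazyModule:
    """Hypothetical proxy that imports the wrapped module on first attribute access."""

    def __init__(self, module_name: str) -> None:
        self._module_name = module_name
        self._module: ModuleType | None = None

    def __getattr__(self, attr: str) -> Any:
        # Only reached for names that are not set on the proxy itself.
        if self._module is None:
            self._module = importlib.import_module(self._module_name)
        return getattr(self._module, attr)


# Same shape as the mapping in AGENTS.md: exported name -> importable module name.
# The entry itself is an illustrative assumption.
_LAZY_IMPORTS: dict[str, str] = {"colors": "colorsys"}

colors = LazyModule(_LAZY_IMPORTS["colors"])

loaded_before = "colorsys" in sys.modules  # creating the proxy imports nothing
rgb = colors.hsv_to_rgb(0.0, 1.0, 1.0)     # first attribute access triggers the import
loaded_after = "colorsys" in sys.modules

print(loaded_before, rgb, loaded_after)
```

A proxy of this kind keeps `from some_module import name` cheap, because binding the proxy does not touch the wrapped library. The matching `import ... as ...` inside `if TYPE_CHECKING:` is what lets IDEs and type checkers see the real module's types.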

Makefile

Lines changed: 45 additions & 4 deletions
@@ -45,14 +45,25 @@ help:
 	@echo " check-license-headers - Check if all files have license headers"
 	@echo " update-license-headers - Add license headers to all files"
 	@echo ""
+	@echo "⚡ Performance:"
+	@echo " perf-import - Profile import time and show summary"
+	@echo " perf-import CLEAN=1 - Clean cache, then profile import time"
+	@echo " perf-import NOFILE=1 - Profile without writing to file (for CI)"
+	@echo ""
 	@echo "═════════════════════════════════════════════════════════════"
 	@echo "💡 Tip: Run 'make <command>' to execute any command above"
 	@echo ""
 
-clean:
-	@echo "🧹 Cleaning up coverage reports and cache files..."
-	rm -rf htmlcov .coverage .pytest_cache
+clean-pycache:
+	@echo "🧹 Cleaning up Python cache files..."
 	find . -type d -name __pycache__ -exec rm -rf {} + 2>/dev/null || true
+	find . -type f -name "*.pyc" -delete 2>/dev/null || true
+	@echo "✅ Cache cleaned!"
+
+clean: clean-pycache
+	@echo "🧹 Cleaning up coverage reports and test cache..."
+	rm -rf htmlcov .coverage .pytest_cache
+	@echo "✅ Cleaned!"
 
 coverage:
 	@echo "📊 Running tests with coverage analysis..."
@@ -168,4 +179,34 @@ install-dev-notebooks:
 	$(call install-pre-commit-hooks)
 	@echo "✅ Dev + notebooks installation complete!"
 
-.PHONY: clean coverage format format-check lint lint-fix test test-e2e test-run-tutorials test-run-recipes test-run-all-examples check-license-headers update-license-headers check-all check-all-fix install install-dev install-dev-notebooks generate-colab-notebooks
+perf-import:
+ifdef CLEAN
+	@$(MAKE) clean-pycache
+endif
+	@echo "⚡ Profiling import time for data_designer.essentials..."
+ifdef NOFILE
+	@PERF_OUTPUT=$$(uv run python -X importtime -c "import data_designer.essentials" 2>&1); \
+	echo "$$PERF_OUTPUT"; \
+	echo ""; \
+	echo "Summary:"; \
+	echo "$$PERF_OUTPUT" | tail -1 | awk '{printf " Total: %.3fs\n", $$5/1000000}'; \
+	echo ""; \
+	echo "💡 Top 10 slowest imports:"; \
+	printf "%-12s %-12s %s\n" "Self (s)" "Cumulative (s)" "Module"; \
+	printf "%-12s %-12s %s\n" "--------" "--------------" "------"; \
+	echo "$$PERF_OUTPUT" | grep "import time:" | sort -rn -k5 | head -10 | awk '{printf "%-12.3f %-12.3f %s", $$3/1000000, $$5/1000000, $$7; for(i=8;i<=NF;i++) printf " %s", $$i; printf "\n"}'
+else
+	@PERF_FILE="perf_import_$$(date +%Y%m%d_%H%M%S).txt"; \
+	uv run python -X importtime -c "import data_designer.essentials" > "$$PERF_FILE" 2>&1; \
+	echo "📊 Import profile saved to $$PERF_FILE"; \
+	echo ""; \
+	echo "Summary:"; \
+	tail -1 "$$PERF_FILE" | awk '{printf " Total: %.3fs\n", $$5/1000000}'; \
+	echo ""; \
+	echo "💡 Top 10 slowest imports:"; \
+	printf "%-12s %-12s %s\n" "Self (s)" "Cumulative (s)" "Module"; \
+	printf "%-12s %-12s %s\n" "--------" "--------------" "------"; \
+	grep "import time:" "$$PERF_FILE" | sort -rn -k5 | head -10 | awk '{printf "%-12.3f %-12.3f %s", $$3/1000000, $$5/1000000, $$7; for(i=8;i<=NF;i++) printf " %s", $$i; printf "\n"}'
+endif
+
+.PHONY: clean clean-pycache coverage format format-check lint lint-fix test test-e2e test-run-tutorials test-run-recipes test-run-all-examples check-license-headers update-license-headers check-all check-all-fix install install-dev install-dev-notebooks generate-colab-notebooks perf-import
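
The `perf-import` recipe above post-processes `python -X importtime` output with `tail`, `sort`, and `awk`: the last line's cumulative column is reported as the total, and the ten largest cumulative values are listed. The same summary can be sketched in Python as follows. The function names and the sample text are illustrative assumptions, not part of the commit, and the sample timings are made up.

```python
from __future__ import annotations

import re
from typing import NamedTuple

# Matches data lines such as "import time:       150 |        150 |   json.decoder".
# The header line has no leading digits, so it does not match.
_LINE = re.compile(r"^import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(\S.*)$")


class ImportTiming(NamedTuple):
    self_s: float
    cumulative_s: float
    module: str


def parse_importtime(output: str) -> list[ImportTiming]:
    """Convert -X importtime text (microseconds) into records in seconds."""
    timings: list[ImportTiming] = []
    for line in output.splitlines():
        match = _LINE.match(line)
        if match is None:
            continue  # header line or unrelated output
        self_us, cumulative_us, module = match.groups()
        timings.append(ImportTiming(int(self_us) / 1e6, int(cumulative_us) / 1e6, module))
    return timings


def slowest(timings: list[ImportTiming], n: int = 10) -> list[ImportTiming]:
    """Return the n entries with the largest cumulative time, like sort -rn -k5 | head."""
    return sorted(timings, key=lambda t: t.cumulative_s, reverse=True)[:n]


# Made-up sample in the format CPython prints for -X importtime.
SAMPLE = """\
import time: self [us] | cumulative | imported package
import time:       150 |        150 |   json.decoder
import time:       300 |        450 | json
import time:    200000 |     200450 | demo_pkg
"""

timings = parse_importtime(SAMPLE)
total_s = timings[-1].cumulative_s  # mirrors `tail -1` in the Makefile
top = slowest(timings, n=2)

print(f"Total: {total_s:.3f}s")
for t in top:
    print(f"{t.self_s:<12.3f} {t.cumulative_s:<12.3f} {t.module}")
```

Real input can be captured from the interpreter's stderr, which is where `-X importtime` writes its report.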

src/data_designer/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
 try:
     from data_designer._version import __version__
 except ImportError:

src/data_designer/cli/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
 from data_designer.cli.main import app, main
 
 __all__ = ["app", "main"]

src/data_designer/cli/commands/download.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
 import typer
 
 from data_designer.cli.controllers.download_controller import DownloadController

src/data_designer/cli/commands/list.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
 from rich.table import Table
 
 from data_designer.cli.repositories.model_repository import ModelRepository

src/data_designer/cli/commands/models.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
 from data_designer.cli.controllers.model_controller import ModelController
 from data_designer.config.utils.constants import DATA_DESIGNER_HOME
 

src/data_designer/cli/commands/providers.py

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
 # SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 # SPDX-License-Identifier: Apache-2.0
 
+from __future__ import annotations
+
 from data_designer.cli.controllers.provider_controller import ProviderController
 from data_designer.config.utils.constants import DATA_DESIGNER_HOME
 

src/data_designer/cli/commands/reset.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,8 @@
11
# SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
22
# SPDX-License-Identifier: Apache-2.0
33

4+
from __future__ import annotations
5+
46
import typer
57

68
from data_designer.cli.repositories.model_repository import ModelRepository
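
Each module diff above adds `from __future__ import annotations`, which AGENTS.md pairs with `TYPE_CHECKING` blocks. The sketch below illustrates, using only the standard library, why that combination works for plain type hints and why it fails for anything that resolves annotations at runtime. `Decimal` and the `double` function are stand-ins chosen for illustration, and `typing.get_type_hints` plays the role that Pydantic's runtime validation plays in the guidelines.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, get_type_hints

if TYPE_CHECKING:
    # Never executed at runtime; only type checkers follow this import.
    from decimal import Decimal


def double(value: Decimal) -> Decimal:
    return value * 2


# With postponed evaluation the annotation is stored as a string,
# so the name missing at runtime is harmless here.
raw_annotation = double.__annotations__["value"]

# Code that resolves annotations at runtime needs the real name and fails without it.
try:
    get_type_hints(double)
    resolved = True
except NameError:
    resolved = False

print(raw_annotation, resolved)
```

This is the reason the guidelines keep Pydantic field types out of `TYPE_CHECKING` blocks: a model's field annotations are resolved when the class is built, so the imported name has to exist at runtime.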
