Commit d14c9b3

feat(cli): add plugin catalog core (#618)
* feat(cli): add plugin catalog services
  Add typed catalog and tap models, persistent tap storage, cached catalog loading, compatibility evaluation, install plan generation, and runtime plugin discovery helpers. Refs #617
* feat(cli): add plugins command group
  Wire list, search, info, install, installed, and tap management commands through the existing command-controller CLI pattern. Refs #617
* test(cli): cover plugin catalog workflows
  Add regression coverage for tap caching, catalog compatibility, installer command generation, local path resolution, and Typer command delegation. Refs #617
* fix(cli): align plugin taps with schema v2
  Validate tap catalogs against the schema v2 contract used by NVIDIA-NeMo/DataDesignerPlugins#36, including source union fields, docs URLs, package paths, compatibility metadata, and unique runtime plugin names. Derive Git install targets as package-qualified PEP 508 direct references so git tap entries install the package described by the catalog source metadata. Refs #617
* fix(cli): address plugin review feedback
  - Invalidate import caches before post-install entry point verification
  - Make tap aliases case-insensitive and cache catalogs by alias plus URL
  - Prefer compatible catalog entries before falling back to forced installs
  - Clarify unused --tap behavior and list installed entry points without imports
  - Add direct controller coverage and update CLI plugin documentation

  Refs #617
* fix(cli): gate incompatible plugin installs
  Fetch install targets before compatibility filtering so the controller owns the final --force decision and the incompatible install guard stays reachable. Refs #617
* style(cli): format plugin catalog files
  Apply ruff formatting to the plugin command and tap repository tests so CI format checks pass on the PR merge commit. Refs #617
* fix(cli): reject duplicate plugin entry names
  Key catalog duplicate detection by entry_point.name so distinct catalog entries cannot register the same runtime plugin name. Refs #617
* fix(cli): preserve GitHub tree tap paths
* fix(cli): verify plugin entry point names
* align plugin CLI with catalog schema
  - adopt catalog terminology for plugin source aliases
  - parse package-first plugin catalog metadata from the plugin repo
  - install package requirements with optional catalog indexes
* tidy plugin catalog workflow docs
* align plugin catalog CLI with package contract
* add plugin package uninstall workflow
* test plugin package command targets
* document plugin package aliases
* address plugin catalog review feedback
* prefer runtime plugin lookup matches
* rename plugins command to plugin
* show plugin package descriptions
* rename plugin catalogs command
* add protected plugin package installs
* document plugin package install modes
* avoid building project during plugin installs
* harden plugin package installs
* tighten plugin catalog contracts
* fix no-args help exit code
* make plugin docs links robust
* document plugin CLI catalog workflows
* clarify plugin entry point verification
* simplify plugin CLI docs
* narrow plugin search fields
* hide plugin catalog cache ttl
* remove plugin catalog trust flag
* improve plugin CLI recovery UX
* polish plugin catalog table display
* stabilize plugin catalog table test
* tighten plugin catalog edge cases
* harden plugin catalog verification
  - Escape catalog-provided Rich markup before rendering CLI output
  - Reject runtime plugin names that collide after enum-key normalization
  - Load installed runtime entry points in a subprocess before reporting success
* simplify plugin entry point verification
  Load matching entry points directly after install instead of spawning a separate Python process. This keeps the check package-scoped while still catching broken entry-point targets and non-Plugin objects.
* require newer uv for plugin plans
  Use uv >= 0.10.0 as the single supported uv requirement for plugin package commands. Auto mode now falls back to a pip plan with an upgrade warning when uv is unavailable or too old, while explicit uv selection remains strict.
* verify pip fallback availability
* polish plugin CLI status markers
* clarify plugin compatibility labels
* simplify plugin info install details
* address plugin CLI review nits
* support versioned plugin package installs
* share plugin install metadata rendering
* show installed plugin packages
* harden versioned plugin installs
  - Preserve catalog requirement constraints for versioned installs
  - Remove stale install-plan metadata fields
  - Expand parser, uv, controller, and local-catalog dry-run coverage
* harden plugin help tests
* show plugin package versions
  Add package version metadata support for plugin catalogs and resolve current versions from exact requirements or simple indexes when catalog entries omit them. Update plugin list/info/install metadata to show the plugin package version and Data Designer compatibility requirement while removing the separate Data Designer version line.
* format plugin catalog tests
* harden plugin package metadata checks
* harden plugin CLI test coverage
* add plugin discovery docs (#642)

Signed-off-by: Johnny Greco <jogreco@nvidia.com>
---------
Signed-off-by: Johnny Greco <jogreco@nvidia.com>
1 parent 1d203b1 commit d14c9b3

33 files changed

Lines changed: 8478 additions & 68 deletions

architecture/cli.md

Lines changed: 51 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
# CLI
22

3-
The CLI (`data-designer`) provides an interactive command-line interface for configuring models, providers, tools, and personas, as well as running dataset generation. It uses a layered architecture for config management and delegates generation to the public `DataDesigner` API.
3+
The CLI (`data-designer`) provides an interactive command-line interface for configuring models, providers, MCP providers, and tools; downloading managed persona datasets; discovering, installing, and uninstalling plugin packages from catalogs; and running dataset generation. It uses a layered architecture for setup workflows and delegates generation to the public `DataDesigner` API.
44

55
Source: `packages/data-designer/src/data_designer/cli/`
66

77
## Overview
88

9-
The CLI is built on Typer with lazy command loading to keep startup fast. Config management commands follow a **command → controller → service → repository** layering pattern. Generation commands bypass this stack and use the public `DataDesigner` class directly.
9+
The CLI is built on Typer with lazy command loading to keep startup fast. Config management and plugin catalog commands follow a **command → controller → service → repository** layering pattern. Generation commands bypass this stack and use the public `DataDesigner` class directly.
1010

1111
## Key Components
1212

@@ -20,9 +20,9 @@ The CLI is built on Typer with lazy command loading to keep startup fast. Config
2020

2121
`create_lazy_typer_group` and `_LazyCommand` stubs defer importing command modules until a command is actually invoked. This keeps `data-designer --help` fast — only the command names and descriptions are loaded eagerly; the full module (and its dependencies) loads on first use.
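
The deferred-import idea described above can be illustrated with a minimal, generic sketch. This is not the actual `create_lazy_typer_group` or `_LazyCommand` implementation; the `lazy_callable` helper is hypothetical, and the standard-library `json` module stands in for a heavy command module.

```python
import importlib
from typing import Any, Callable


def lazy_callable(module_name: str, attr_name: str) -> Callable[..., Any]:
    """Return a stub that imports `module_name` only when the stub is called.

    Hypothetical helper for illustration; not part of Data Designer.
    """

    def stub(*args: Any, **kwargs: Any) -> Any:
        # The import happens here, on first use, not at registration time.
        module = importlib.import_module(module_name)
        return getattr(module, attr_name)(*args, **kwargs)

    return stub


# Creating the stub does not import the target module on its own behalf;
# `json` stands in for a command module with heavy dependencies.
dumps = lazy_callable("json", "dumps")
result = dumps({"ok": True})
```

Because only the module and attribute names are stored up front, a help listing can be produced without paying the import cost of any command.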
2222

23-
### Layering Pattern (Config Management)
23+
### Layering Pattern (Setup Workflows)
2424

25-
Config management commands (models, providers, tools, personas) follow a consistent four-layer pattern:
25+
Config management commands (models, providers, MCP providers, tools) follow a consistent four-layer pattern:
2626

2727
| Layer | Role | Example |
2828
|-------|------|---------|
@@ -31,10 +31,22 @@ Config management commands (models, providers, tools, personas) follow a consist
3131
| **Service** | Domain rules: uniqueness, merge, delete-all | `ModelService.add/update/delete` over `ModelRepository` |
3232
| **Repository** | File I/O for typed config registries | `ModelRepository` extends `ConfigRepository[ModelConfigRegistry]` |
3333

34-
Repositories: `ModelRepository`, `ProviderRepository`, `ToolRepository`, `MCPProviderRepository`, `PersonaRepository`.
34+
Repositories: `ModelRepository`, `ProviderRepository`, `MCPProviderRepository`, and `ToolRepository`.
35+
`PersonaRepository` provides read-only locale metadata for managed persona dataset downloads.
3536

3637
Services mirror the repository domains with business logic (validation, conflict resolution).
3738

39+
Plugin catalog commands use the same layering shape:
40+
41+
| Layer | Role | Example |
42+
|-------|------|---------|
43+
| **Command** | Thin Typer entry, wires `DATA_DESIGNER_HOME` and command options | `plugin` subcommands (`list`, `search`, `info`, `install`, `uninstall`, `installed`, `catalog`) → `PluginCatalogController(DATA_DESIGNER_HOME)` |
44+
| **Controller** | UX flow: catalog tables, package metadata, compatibility display, install/uninstall confirmations | `PluginCatalogController` composes catalog + install services |
45+
| **Service** | Domain rules: package listing, compatibility checks, uv/pip install and uninstall commands, runtime entry-point checks | `PluginCatalogService`, `PluginInstallService` |
46+
| **Repository** | File/cache I/O for catalog aliases and catalog documents | `PluginCatalogRepository` |
47+
48+
The built-in `nvidia` catalog points at `https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json`. `NVIDIA-NeMo/DataDesignerPlugins` defines the catalog format. Each catalog entry is an installable package with docs, install metadata, compatibility constraints, and one or more runtime plugins. Users install and uninstall packages, not individual runtime plugins. Commands that take a package name also accept the package alias from the `data-designer-{alias}` package-name pattern; for example, `data-designer-calculator` can be addressed as `calculator`. If a user passes a runtime plugin name where a package is required, the CLI reports the package that owns that runtime plugin.
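
As a rough illustration of the `data-designer-{alias}` pattern described above, the following sketch derives an alias and matches user input against catalog package names. The function names are hypothetical and exact-string matching is an assumption; the real CLI may normalize names differently.

```python
from __future__ import annotations

# Prefix taken from the documented `data-designer-{alias}` package-name pattern.
PACKAGE_PREFIX = "data-designer-"


def package_alias(package_name: str) -> str | None:
    """Return the alias for a package name, or None if it does not fit the pattern."""
    if package_name.startswith(PACKAGE_PREFIX) and len(package_name) > len(PACKAGE_PREFIX):
        return package_name[len(PACKAGE_PREFIX):]
    return None


def resolve_package(user_input: str, catalog_packages: list[str]) -> str | None:
    """Match a full package name or an alias against catalog package names.

    Hypothetical helper; exact matching is an assumption made for this sketch.
    """
    for package in catalog_packages:
        if user_input == package or user_input == package_alias(package):
            return package
    return None


packages = ["data-designer-calculator"]
by_alias = resolve_package("calculator", packages)
by_name = resolve_package("data-designer-calculator", packages)
```

Both lookups return the same package name, which matches the package-first contract: whatever the user types, the install target is the package.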
49+
3850
### Generation Commands
3951

4052
`preview`, `create`, and `validate` commands use `GenerationController`, which:
@@ -62,6 +74,37 @@ User invokes command (e.g., `data-designer config models`)
6274
→ Repository reads/writes config files
6375
```
6476

77+
### Plugin Catalog Discovery
78+
```
79+
User invokes command (e.g., `data-designer plugin list`)
80+
→ Command function wires DATA_DESIGNER_HOME and catalog options
81+
→ PluginCatalogController resolves the catalog alias and chooses table or narrow-terminal layout
82+
→ PluginCatalogService loads packages and filters out incompatible packages by default
83+
→ PluginCatalogRepository reads local config and cached/remote catalog JSON
84+
```
85+
86+
### Plugin Install/Uninstall
87+
```
88+
User invokes command (e.g., `data-designer plugin install calculator`)
89+
→ PluginCatalogController resolves the plugin package name or package alias
90+
→ PluginCatalogService evaluates Python and Data Designer compatibility
91+
→ PluginInstallService chooses uv or pip and builds the command.
92+
In active uv projects it uses `uv add` so the package is recorded in
93+
`pyproject.toml`; otherwise it installs into the current Python environment.
94+
Data Designer itself is already installed, so its packages are not reinstalled
95+
or replaced while installing plugin dependencies.
96+
→ PluginInstallService verifies the package's runtime plugin entry points can load
97+
```
98+
99+
```
100+
User invokes command (e.g., `data-designer plugin uninstall calculator`)
101+
→ PluginCatalogController resolves the plugin package name or package alias
102+
→ PluginInstallService chooses uv or pip and builds the uninstall command.
103+
Active uv projects remove the dependency from project metadata and uninstall
104+
the package from the current environment.
105+
→ PluginInstallService verifies the package's runtime plugin entry-point metadata is removed
106+
```
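
The installer choice in the install flow above can be sketched as a small pure function. This is an assumption-laden illustration, not `PluginInstallService`: the function name and its boolean inputs are hypothetical, and using `uv pip install` when uv is available outside a project is a guess, since the document only states that `uv add` is used in active uv projects.

```python
from __future__ import annotations

import sys


def build_install_command(
    requirement: str, *, uv_available: bool, in_uv_project: bool
) -> list[str]:
    """Build an install command for a plugin package requirement.

    Hypothetical sketch; the real service also handles uv version checks,
    catalog indexes, and protecting the installed Data Designer packages.
    """
    if uv_available and in_uv_project:
        # Records the dependency in pyproject.toml, per the flow above.
        return ["uv", "add", requirement]
    if uv_available:
        # Assumption: install into the current environment through uv's pip interface.
        return ["uv", "pip", "install", requirement]
    # Fallback: pip for the interpreter running the CLI.
    return [sys.executable, "-m", "pip", "install", requirement]


project_cmd = build_install_command(
    "data-designer-calculator", uv_available=True, in_uv_project=True
)
pip_cmd = build_install_command(
    "data-designer-calculator", uv_available=False, in_uv_project=False
)
```

Keeping command construction separate from execution makes the plan easy to show in a dry run and to cover with unit tests.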
107+
65108
### Generation
66109
```
67110
User invokes command (e.g., `data-designer create config.yaml`)
@@ -73,8 +116,9 @@ User invokes command (e.g., `data-designer create config.yaml`)
73116
## Design Decisions
74117

75118
- **Lazy command loading** keeps `data-designer --help` responsive: command modules (and their heavy dependencies, such as the engine and model stacks) load only when a command is invoked, not at process startup.
76-
- **Controller/service/repo for config, direct API for generation** — config management benefits from the layered pattern (testable services, swappable repositories). Generation doesn't need this indirection; it delegates to the same `DataDesigner` class that Python users call directly.
77-
- **`DATA_DESIGNER_HOME`** centralizes all CLI-managed state (model configs, provider configs, tool configs, personas) in a single directory, defaulting to `~/.data_designer/`.
119+
- **Controller/service/repo for setup workflows, direct API for generation** — config and plugin catalog workflows benefit from the layered pattern (testable services, swappable repositories). Generation doesn't need this indirection; it delegates to the same `DataDesigner` class that Python users call directly.
120+
- **`DATA_DESIGNER_HOME`** centralizes CLI-managed state (model configs, provider configs, MCP provider configs, tool configs, managed assets, plugin catalog aliases, and catalog caches) in a single directory, defaulting to `~/.data-designer/`.
121+
- **Package-first plugin catalogs** match how users install plugins: one package can provide one or more runtime plugins, but install and uninstall commands always target the package.
78122
- **Rich-based UI** provides formatted tables, progress bars, and interactive prompts without requiring a web interface.
79123

80124
## Cross-References

docs/plugins/available.md

Lines changed: 0 additions & 20 deletions
This file was deleted.

docs/plugins/discover.md

Lines changed: 101 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,101 @@
1+
# Discover Plugins
2+
3+
The Data Designer CLI is the recommended way to discover and install published plugins. It uses plugin catalogs to show install details and compatibility before installing the selected plugin package into your current environment or active `uv` project.
4+
5+
Plugins are distributed as Python packages. A single package can expose one or more runtime plugins, so the CLI installs and uninstalls packages rather than individual runtime plugin names.
6+
7+
## NVIDIA catalog
8+
9+
The default `nvidia` catalog is maintained in the [DataDesignerPlugins repository](https://github.com/NVIDIA-NeMo/DataDesignerPlugins). You do not need to configure it before using the CLI.
10+
11+
You can also browse the first-party [plugin documentation](https://nvidia-nemo.github.io/DataDesignerPlugins/plugins/) and [plugin package source](https://github.com/NVIDIA-NeMo/DataDesignerPlugins/tree/main/plugins) directly.
12+
13+
## Find a plugin package
14+
15+
When a CLI command requires a plugin package argument, you can pass either the full package name or the package alias. The package alias is the package name without the `data-designer-` prefix. For example, `data-designer-github` can be addressed as `github`.
16+
17+
Start by listing or searching the compatible packages in the default catalog. Search can match package names, package aliases, descriptions, runtime plugin names, and runtime plugin types.
18+
19+
```bash
20+
# List compatible plugin packages from the default NVIDIA catalog
21+
data-designer plugin list
22+
23+
# Search for a package
24+
data-designer plugin search github
25+
26+
# Inspect one package before installing it
27+
data-designer plugin info github
28+
```
29+
30+
## Install a plugin package
31+
32+
Install the package by full package name or package alias:
33+
34+
```bash
35+
data-designer plugin install github
36+
```
37+
38+
After installation, Data Designer discovers the package's `data_designer.plugins` entry points. Use `installed` to see the plugin packages available in the current Python environment and the runtime plugins they expose:
39+
40+
```bash
41+
data-designer plugin installed
42+
```
43+
44+
Uninstall with the same package name or alias:
45+
46+
```bash
47+
data-designer plugin uninstall github
48+
```
49+
50+
!!! note
51+
Plugins are ordinary Python packages. You can still publish a plugin to PyPI or another package index and install it directly with `pip` or `uv`. This is the path we recommend for individual plugin developers from the community. See [Community plugins](#community-plugins) below.
52+
53+
## How catalogs work
54+
55+
A plugin catalog is a JSON file that tells Data Designer which plugin packages are available and how to install them. The catalog can be hosted anywhere that serves raw JSON. Each entry points to an installable Python package and includes its docs URL, Python and Data Designer compatibility requirements, the runtime plugins it exposes after installation, and the installer metadata needed to fetch the package.
56+
57+
The package itself can live in any Python package index, or be referenced with any valid [PEP 508 direct reference](https://packaging.python.org/en/latest/specifications/dependency-specifiers/#direct-references). The package does not have to live in the same repository as the catalog.
58+
59+
The NVIDIA catalog is published at:
60+
61+
```text
62+
https://nvidia-nemo.github.io/DataDesignerPlugins/catalog/plugins.json
63+
```
64+
65+
The NVIDIA plugin packages are served from a PyPI-compatible Python Simple API index published beside that catalog:
66+
67+
```text
68+
https://nvidia-nemo.github.io/DataDesignerPlugins/simple/
69+
```
70+
71+
Catalog discovery and runtime plugin discovery are separate. Reading a catalog lets the CLI show available packages and install plans without importing plugin code. Runtime plugins become available only after their package is installed and Data Designer discovers the package's `data_designer.plugins` entry points.
72+
73+
Other catalogs can follow the same pattern as the NVIDIA plugin repository: publish a raw `catalog/plugins.json` file and, for index-backed packages, a PyPI-compatible hosted package index. Catalog entries can also point to packages on the installer's default index or to direct package references.
74+
75+
## Use another catalog
76+
77+
Add a catalog when a team or community publishes a compatible catalog JSON file. For example, an internal platform team might publish a catalog that lists approved Data Designer plugin packages and points each package at an internal Python package index. Teammates can then add that catalog once and install approved plugins by package name or alias.
78+
79+
Choose a short catalog name and use it with `--catalog`:
80+
81+
```bash
82+
data-designer plugin catalog add <name> <catalog-url-or-path>
83+
data-designer plugin --catalog <name> list
84+
data-designer plugin --catalog <name> install <package-or-alias>
85+
```
86+
87+
For published catalogs, prefer sharing the raw catalog JSON URL. Local catalog files and directories are useful while authoring or testing a catalog before publishing it.
88+
89+
```bash
90+
# See configured catalog names
91+
data-designer plugin catalog list
92+
93+
# Remove a catalog
94+
data-designer plugin catalog remove <name>
95+
```
96+
97+
## Community plugins
98+
99+
We do not have any community plugins to list here yet, but yours could be the first! If you build a plugin that could be useful to other Data Designer users, we would love to hear about it.
100+
101+
To get started, follow the patterns in the [plugin overview](overview.md) and [Build Your Own](build_your_own.md) guides, then publish your plugin package to PyPI. When your plugin is ready, open an issue on the [Data Designer GitHub repository](https://github.com/NVIDIA-NeMo/DataDesigner/issues) with the package name, source repository, documentation link, supported Data Designer versions, and the plugin types it provides. The Data Designer team will review the plugin and add it here if it seems generally useful for the community.

docs/plugins/overview.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ Data Designer supports three plugin types:
1212

1313
## Use an Installed Plugin
1414

15-
Plugin packages register their `Plugin` objects through Python package [entry points](https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata). Data Designer discovers installed plugin entry points automatically, so no extra registration code is required. Simply install the plugin package and use its new object types in your Data Designer workflow.
15+
Plugin packages register their `Plugin` objects through Python package [entry points](https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata). Data Designer discovers installed plugin entry points automatically, so no extra registration code is required. Once a plugin package is installed, use its new object types in your Data Designer workflow.
1616

1717
If you install a plugin after `data_designer` has already been imported, restart the Python process so plugin discovery can rebuild from the new entry points.
1818

@@ -22,7 +22,7 @@ For implementation instructions across all plugin types, see the [Build Your Own
2222

2323
## Find Plugins
2424

25-
NVIDIA-maintained plugin packages live in the [DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins) repository. See [Available Plugins](available.md) for lists of first-party and community-contributed plugins.
25+
Use the Data Designer CLI to discover and install published plugin packages from catalogs. See [Discover Plugins](discover.md) for the catalog workflow, first-party plugin documentation, and source links.
2626

2727
## Discovery troubleshooting
2828

fern/docs.yml

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -340,9 +340,6 @@ redirects:
340340
destination: "/nemo/datadesigner/plugins/file-system-seed-reader-plugins"
341341
- source: "/nemo/datadesigner/plugins/example"
342342
destination: "/nemo/datadesigner/plugins/example-plugin"
343-
- source: "/nemo/datadesigner/plugins/available"
344-
destination: "/nemo/datadesigner/plugins/available-plugin-list"
345-
346343
# Code Reference: mkdocstrings tree -> Fern Topic Overviews subsection.
347344
# Underscored page names get kebab'd at the page-slug level too (Fern's title
348345
# slugifier drops underscores), so the snake_case modules need per-page rules.

fern/versions/latest.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -137,8 +137,8 @@ navigation:
137137
path: ./v0.5.8/pages/plugins/example.mdx
138138
- page: FileSystemSeedReader Plugins
139139
path: ./v0.5.8/pages/plugins/filesystem_seed_reader.mdx
140-
- page: Available Plugin List
141-
path: ./v0.5.8/pages/plugins/available.mdx
140+
- page: Discover
141+
path: ./v0.5.8/pages/plugins/discover.mdx
142142
- section: Code Reference
143143
contents:
144144
- section: Topic Overviews

fern/versions/v0.5.8.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -115,8 +115,8 @@ navigation:
115115
path: ./v0.5.8/pages/plugins/example.mdx
116116
- page: FileSystemSeedReader Plugins
117117
path: ./v0.5.8/pages/plugins/filesystem_seed_reader.mdx
118-
- page: Available Plugin List
119-
path: ./v0.5.8/pages/plugins/available.mdx
118+
- page: Discover
119+
path: ./v0.5.8/pages/plugins/discover.mdx
120120
- section: Code Reference
121121
contents:
122122
- section: Topic Overviews

fern/versions/v0.5.8/pages/plugins/available.mdx

Lines changed: 0 additions & 6 deletions
This file was deleted.
