Commit 8ecacb8

docs: refine plugins dev note
1 parent e606458 commit 8ecacb8

2 files changed

Lines changed: 45 additions & 34 deletions

File tree

docs/devnotes/posts/have-it-your-way.md

Lines changed: 12 additions & 13 deletions
@@ -45,15 +45,15 @@ config_builder.add_column(
4545
config_builder.add_processor(RobotSFTProcessor(output_column="messages"))
4646
```
4747

48-
That is the point of plugins: install a package, import its config classes, and keep the workflow declarative. The Isaac run reader, event labeler, and trainer-format processor own the custom parsing, labeling, validation, and export shape, while Data Designer still handles discovery, dependency ordering, model calls, previews, and output.
48+
That is the point of plugins: install a package, import its config classes, and keep the workflow declarative. The Isaac run reader, event labeler, and trainer-format processor own the project-specific parsing and trainer-facing shape. Data Designer still does the framework work, from component discovery and dependency ordering to model execution and output handling.
4949

5050
---
5151

5252
## **Customization Is the Normal Case**
5353

5454
![A confused engineer trying to fit custom building blocks into the wrong framework slots](assets/have-it-your-way/customization-blocks-confusion.png){ .devnote-section-graphic }
5555

56-
The mess usually starts innocently. A team defines a Data Designer config, then discovers that its seed data lives in an internal layout, its generated column needs a domain simulator, and its trainer expects a slightly different record shape. Someone writes a small reader beside the notebook. Someone patches a generator into a project folder. Someone adds a cleanup script after preview because the final export has one more organization-specific rule. Each choice is reasonable because every project has its own corpus, policy, ontology, simulator, and training stack.
56+
The mess usually starts innocently. A team defines a Data Designer config, then discovers that its seed data lives in an internal layout, its generated column needs a domain simulator, and its trainer expects a slightly different record shape. Someone writes a small reader beside the notebook. Someone patches a generator into a project folder. Someone adds a cleanup script after preview because the final export has one more organization-specific rule. Each choice is reasonable because every project brings a different corpus, policy model, domain vocabulary, or training stack.
5757

5858
The problem is that the custom behavior now lives around Data Designer instead of inside the Data Designer workflow. It is harder to validate, harder to share, harder to version, and easier to lose. Plugins give that bespoke work a clean package boundary – a name, typed config, runtime implementation, entry point, and tests that travel together. Users still declare the dataset they want, but the local reader, domain generator, or trainer-format processor becomes a normal Data Designer component instead of another layer of glue.
5959

@@ -75,7 +75,7 @@ The first plugin boundaries match the places where real projects most often need
7575

7676
</div>
7777

78-
These boundaries are intentionally narrow. A plugin should own the behavior that is specific to your use case. Data Designer should keep owning the pipeline responsibilities: validation, dependency resolution, batching, model calls, logging, previews, output handling. That split lets custom components use the normal workflow without moving orchestration into the project.
78+
These boundaries are intentionally narrow. A plugin should own the behavior that is specific to your use case. Data Designer validates configs and resolves dependencies. It plans batches, runs models, records logs, shows previews, then writes the output. That split lets custom components use the normal workflow without moving orchestration into the project.
7979

8080
What about [custom columns](../../concepts/custom_columns.md)? Start with a custom column when you are prototyping column-generator behavior or need a one-off column that only one project uses. Custom columns keep the logic in a Python function inside the config, with declared dependencies and optional model access. When that logic needs a stable config schema, tests, packaging, docs, or reuse across teams, promote it to a column generator plugin.
8181

@@ -109,7 +109,7 @@ class MarkdownSectionSeedSource(FileSystemSeedSource):
109109
seed_type: Literal["markdown-sections"] = "markdown-sections"
110110
```
111111

112-
The implementation class is where the old helper code should move. For a filesystem seed reader, Data Designer gives you a small interface instead of a blank page: implement `build_manifest(...)` to build a cheap index of candidate inputs, and implement `hydrate_row(...)` to turn each selected manifest row into one or more dataset rows. That split matters because Data Designer can sample, shuffle, partition, and batch against the lightweight manifest before paying the cost of reading files, parsing sections, or calling project-specific libraries. The parser can still be a normal helper function; the reader class is the framework boundary.
112+
The implementation class is where the old helper code should move. For a filesystem seed reader, Data Designer gives you a small interface instead of a blank page: implement `build_manifest(...)` to build a cheap index of candidate inputs, and implement `hydrate_row(...)` to turn each selected manifest row into one or more dataset rows. That split matters because Data Designer can plan work against the lightweight manifest before paying the cost of reading files, parsing sections, or calling project-specific libraries. The parser can still be a normal helper function; the reader class is the framework boundary.
113113

114114
```python
115115
# impl.py
@@ -141,7 +141,6 @@ class MarkdownSectionSeedReader(FileSystemSeedReader[MarkdownSectionSeedSource])
141141
context: SeedReaderFileSystemContext,
142142
) -> list[dict[str, str]]:
143143
# Fast path: enumerate candidate files and return cheap metadata.
144-
# Data Designer can index, sample, shuffle, and batch these rows.
145144
matched_paths = self.get_matching_relative_paths(
146145
context=context,
147146
file_pattern=self.source.file_pattern,
@@ -235,24 +234,24 @@ Reusable plugins also need a discovery layer. Once a plugin is useful beyond one
235234

236235
The NVIDIA catalog is backed by [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated home for first-party plugin packages, packaging examples, and plugin-specific docs. Keeping those packages outside the core repository lets them carry optional dependencies, target narrower use cases, and move at their own pace while still using the same plugin interface once installed.
237236

238-
For users, the catalog makes discovering and installing first-party plugins seamless. The common flow is intentionally short: list the compatible packages, search for what you need, and install the package by name or alias.
237+
For users, the first-party path is short: list what is available, search for what you need, and install by package name or alias.
239238

240239
```bash
241240
data-designer plugin list
242-
data-designer plugin search github
243-
data-designer plugin install github
241+
data-designer plugin search <keyword>
242+
data-designer plugin install <package-name>
244243
```
245244

246-
After installation, normal entry point discovery takes over. Import the plugin's config classes and keep building the same declarative workflow.
245+
After installation, there is no separate registration step. Data Designer discovers the package's entry points, so users import the plugin's config classes and keep building the same declarative workflow.
247246

248-
The same pattern works for teams and communities. A platform group can publish a catalog of approved internal plugins backed by an internal package index or direct package references. A community can publish a catalog for a domain or workflow. The catalog gives users a trusted path to the plugins they prefer, while plugin packages remain independently versioned and distributed.
247+
Catalogs are not limited to NVIDIA plugins. A platform group can publish a catalog of approved internal plugins backed by an internal package index or direct package references. A community can publish a catalog for a domain or workflow. The catalog gives users a trusted path to the plugins they prefer, while plugin packages remain independently versioned and distributed.
249248

250249
```bash
251-
data-designer plugin catalog add internal <catalog-url>
252-
data-designer plugin --catalog internal install <package-or-alias>
250+
data-designer plugin catalog add <catalog-name> <catalog-url>
251+
data-designer plugin --catalog <catalog-name> install <package-name>
253252
```
254253

255-
That is the foundation for a richer Data Designer plugin ecosystem: the core framework provides the stable runtime, plugin authors provide specialized capabilities, and catalogs make those capabilities discoverable. For more information, see [Discover Plugins](../../plugins/discover.md).
254+
This provides a foundation for a rich Data Designer plugin ecosystem: the core framework provides the stable runtime, plugin authors provide specialized capabilities, and catalogs make those capabilities discoverable. For more information, see [Discover Plugins](../../plugins/discover.md).
256255

257256
---
258257
