Commit 9dbb9e5

docs: refine plugin dev note
1 parent e08695c commit 9dbb9e5

3 files changed

Lines changed: 47 additions & 55 deletions

docs/css/style.css

Lines changed: 4 additions & 0 deletions
@@ -162,6 +162,10 @@ h2 {
162162
clear: right;
163163
}
164164

165+
.md-post--excerpt .devnote-hide-in-index {
166+
display: none;
167+
}
168+
165169
@media screen and (max-width: 60em) {
166170
.md-typeset img.devnote-float-right,
167171
.md-typeset img.devnote-section-graphic {
Binary file not shown.

docs/devnotes/posts/have-it-your-way.md

Lines changed: 43 additions & 55 deletions
@@ -9,13 +9,13 @@ authors:
99

1010
<p class="devnote-dek"><em>A plugin framework for the custom pieces every real project ends up needing</em></p>
1111

12-
![Data Designer plugin extensions](assets/have-it-your-way/data-designer-plugins-hero.png){ .devnote-float-right }
12+
![Data Designer plugin extensions](assets/have-it-your-way/data-designer-plugins-hero.png){ .devnote-float-right .devnote-hide-in-index }
1313

1414
Data Designer is built around a simple idea: describe the dataset you want, and let the framework handle execution. A config points to seed data, defines generated columns, picks models, and shapes the final records — no orchestration code required. [Data Designer plugins](../../plugins/overview.md) keep that promise when a project needs something custom.
1515

1616
<!-- more -->
1717

18-
Suppose a robotics team has [Isaac Sim](https://developer.nvidia.com/isaac/sim)-generated warehouse runs and wants to turn robot poses, camera views, and event metadata into instruction data. With an internal simulation-log plugin, the user-facing part can still be this small:
18+
What does "something custom" actually look like? Picture a robotics team sitting on a pile of [Isaac Sim](https://developer.nvidia.com/isaac/sim)-generated warehouse runs, trying to turn robot poses, camera views, and event metadata into instruction data. With an internal simulation-log plugin, the user-facing part can still be this small:
1919

2020
```bash
2121
uv pip install data-designer-isaac-logs
@@ -26,40 +26,25 @@ from data_designer_isaac_logs.config import IsaacRunSeedSource
2626
from data_designer_isaac_logs.config import WarehouseEventLabelColumnConfig
2727
from data_designer_isaac_logs.config import RobotSFTProcessor
2828

29-
builder.with_seed_dataset(
29+
config_builder.with_seed_dataset(
3030
IsaacRunSeedSource(
3131
run_dir="s3://warehouse-sim/rare-events/",
3232
streams=("robot_pose", "overhead_rgb", "event_log"),
3333
max_events=10_000,
3434
)
3535
)
36-
builder.add_column(
36+
config_builder.add_column(
3737
WarehouseEventLabelColumnConfig(
3838
name="safety_instruction",
3939
pose_column="robot_pose",
4040
event_log_column="event_log",
4141
)
4242
)
43-
builder.add_processor(RobotSFTProcessor(output_column="messages"))
43+
config_builder.add_processor(RobotSFTProcessor(output_column="messages"))
4444
```
4545

4646
That is the point of plugins: install a package, import its config classes, and keep the workflow declarative. The Isaac run reader, event labeler, and trainer-format processor own the custom parsing, labeling, validation, and export shape, while Data Designer still handles discovery, dependency ordering, model calls, previews, and output.
4747

48-
49-
<div class="devnote-clear"></div>
50-
51-
!!! tip "TL;DR - What plugins give you"
52-
53-
1. Plugins expose custom behavior through the same Data Designer config and runtime paths as built-in components.
54-
55-
2. A plugin is a Python package discovered through the `data_designer.plugins` entry point group. Once installed, there is no manual registration step in user code.
56-
57-
3. Plugin configs use the same typed config model and serialization behavior as core config types. The engine receives an implementation through the plugin registry.
58-
59-
4. Plugins can start as a local editable install, move to an internal package index, and later be published publicly.
60-
61-
5. NVIDIA-maintained plugins now live in [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), separate from the core repo and installed as packages.
62-
6348
---
6449

6550
## **Customization Is the Normal Case**
@@ -68,7 +53,7 @@ That is the point of plugins: install a package, import its config classes, and
6853

6954
The mess usually starts innocently. A team defines a Data Designer config, then discovers that its seed data lives in an internal layout, its generated column needs a domain simulator, and its trainer expects a slightly different record shape. Someone writes a small reader beside the notebook. Someone patches a generator into a project folder. Someone adds a cleanup script after preview because the final export has one more organization-specific rule. Each choice is reasonable because every project has its own corpus, policy, ontology, simulator, and training stack.
7055

71-
The problem is that the custom behavior now lives around Data Designer instead of inside the Data Designer workflow. It is harder to validate, harder to share, harder to version, and easier to lose. Plugins give that bespoke work a clean package boundary: a name, typed config, runtime implementation, entry point, and tests that travel together. Users still declare the dataset they want, but the local reader, domain generator, or trainer-format processor becomes a normal Data Designer component instead of another layer of glue.
56+
The problem is that the custom behavior now lives around Data Designer instead of inside the Data Designer workflow. It is harder to validate, harder to share, harder to version, and easier to lose. Plugins give that bespoke work a clean package boundary – a name, typed config, runtime implementation, entry point, and tests that travel together. Users still declare the dataset they want, but the local reader, domain generator, or trainer-format processor becomes a normal Data Designer component instead of another layer of glue.
7257

7358
<div class="devnote-clear"></div>
7459

@@ -96,12 +81,29 @@ These boundaries are intentionally narrow. A plugin should own the behavior that
9681

9782
Consider a markdown seed reader. The one-off version might be a helper function that walks a directory, splits files into sections, returns a DataFrame, and then gets copied into the next project that needs it. That can work for one project. It becomes a problem when the reader needs options, tests, documentation, versioning, or reuse across teams. At that point, the helper has become a capability whether or not it is packaged like one.
9883

99-
A plugin packages the same idea as a small Python project:
84+
A plugin packages that same helper as a small Python project:
10085

10186
- A user-facing config class describes the options.
10287
- An implementation class does the work.
10388
- A `Plugin` object connects the config to the implementation.
104-
- A Python entry point exposes the plugin to Data Designer.
89+
- An entry point registers the plugin with Data Designer.
90+
91+
The config class declares the user-facing options. For a directory-backed reader, Data Designer's `FileSystemSeedSource` already has fields for `path`, `file_pattern`, and `recursive`, so the subclass only needs to define the seed type discriminator:
92+
93+
```python
94+
# config.py
95+
from __future__ import annotations
96+
97+
from typing import Literal
98+
99+
from data_designer.config.seed_source import FileSystemSeedSource
100+
101+
102+
class MarkdownSectionSeedSource(FileSystemSeedSource):
103+
"""Configure the markdown sections seed reader."""
104+
105+
seed_type: Literal["markdown-sections"] = "markdown-sections"
106+
```
105107

106108
The implementation class is where the old helper code should move. For a filesystem seed reader, Data Designer gives you a small interface instead of a blank page: implement `build_manifest(...)` to build a cheap index of candidate inputs, and implement `hydrate_row(...)` to turn each selected manifest row into one or more dataset rows. That split matters because Data Designer can sample, shuffle, partition, and batch against the lightweight manifest before paying the cost of reading files, parsing sections, or calling project-specific libraries. The parser can still be a normal helper function; the reader class is the framework boundary.
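To make the manifest/hydrate split concrete, here is a minimal standalone sketch in plain Python. It borrows the names `build_manifest` and `hydrate_row` from the paragraph above, but the signatures, the row fields (`path`, `size_bytes`, `heading`, `text`), and the `split_sections` helper are assumptions for illustration. It is not the `FileSystemSeedReader` interface itself.

```python
from pathlib import Path


def split_sections(text: str) -> list[tuple[str, str]]:
    """Hypothetical parser: split markdown into (heading, body) pairs.

    Simplification: any line starting with '#' counts as a heading,
    including '#' lines inside fenced code blocks.
    """
    sections: list[tuple[str, str]] = []
    heading = ""
    body: list[str] = []

    def flush() -> None:
        # Keep a section only if it has a heading or non-blank body text.
        if heading or any(part.strip() for part in body):
            sections.append((heading, "\n".join(body).strip()))

    for line in text.splitlines():
        if line.startswith("#"):
            flush()
            heading = line.lstrip("#").strip()
            body = []
        else:
            body.append(line)
    flush()
    return sections


def build_manifest(root: Path, pattern: str = "*.md") -> list[dict]:
    """Cheap index: one row per candidate file; file contents are not read."""
    return [
        {"path": str(p), "size_bytes": p.stat().st_size}
        for p in sorted(root.rglob(pattern))
    ]


def hydrate_row(manifest_row: dict) -> list[dict]:
    """Expensive step: read one file and emit one dataset row per section."""
    text = Path(manifest_row["path"]).read_text(encoding="utf-8")
    return [
        {"source": manifest_row["path"], "heading": heading, "text": body}
        for heading, body in split_sections(text)
    ]
```

Because the manifest holds only paths and sizes, a caller can sample or shuffle it first and call `hydrate_row` only on the rows it actually selects.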
107109

@@ -173,17 +175,9 @@ class MarkdownSectionSeedReader(FileSystemSeedReader[MarkdownSectionSeedSource])
173175
]
174176
```
175177

176-
The class should own only the domain-specific behavior: how to find candidate files, how to parse them, and what rows it emits. Let Data Designer keep owning attachment, sampling, shuffling, batching, DuckDB registration, dependency resolution, and execution. The same rule applies to column generators and processors: choose the closest base class, keep options on the config object, implement the narrow runtime method, and leave orchestration out of the plugin.
178+
The same rule applies to column generators and processors: choose the closest base class, keep options on the config object, implement the narrow runtime method, and leave orchestration out of the plugin.
177179

178-
The entry point exposes the plugin package to Data Designer:
179-
180-
```toml
181-
# pyproject.toml
182-
[project.entry-points."data_designer.plugins"]
183-
markdown-sections = "data_designer_markdown_sections.plugin:plugin"
184-
```
185-
186-
The plugin object tells Data Designer what kind of extension this is and where to find the config and implementation:
180+
Two small files connect the plugin to Data Designer — a `Plugin` descriptor that names the config and implementation, and a Python entry point that exposes them at install time:
187181

188182
```python
189183
# plugin.py
@@ -196,6 +190,12 @@ plugin = Plugin(
196190
)
197191
```
198192

193+
```toml
194+
# pyproject.toml
195+
[project.entry-points."data_designer.plugins"]
196+
markdown-sections = "data_designer_markdown_sections.plugin:plugin"
197+
```
198+
199199
After that, users do not import engine internals or run registration code. They import the config class and use it:
200200

201201
```python
@@ -221,7 +221,7 @@ builder.add_column(
221221
results = DataDesigner().preview(builder, num_records=5)
222222
```
223223

224-
No custom orchestration. No separate DataFrame preparation step. The reader is part of the Data Designer workflow. For the same package shape applied to other extension points, see the [Build Your Own plugin guide](../../plugins/build_your_own.md#implementation-patterns), [Column Generators](../../code_reference/engine/column_generators.md), and [Engine Processors](../../code_reference/engine/processors.md) documentation.
224+
No custom orchestration. No separate DataFrame preparation step. The reader is part of the Data Designer workflow.
225225

226226
---
227227

@@ -243,34 +243,22 @@ It is useful for the broader community too. If you build a plugin that should be
243243

244244
## **A Repository for First-Party Plugins**
245245

246-
We also created [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated repository for NVIDIA-maintained plugins. It is where we will publish first-party plugin packages, recommended packaging examples, and plugin-specific docs as the catalog grows.
246+
We recently created [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated repository for NVIDIA-maintained plugins. It is where we will publish first-party plugin packages, recommended packaging examples, and plugin-specific docs as the catalog grows.
247247

248248
The split keeps the core Data Designer repo focused on the framework: the config API, engine execution, model integration, validation behavior, and stable plugin interface. Plugin packages can depend on optional libraries, target narrower use cases, and move at a different release pace, while still installing separately and using the same plugin interface once installed.
249249

250250
---
251251

252-
## **Start with One Capability**
253-
254-
If you have custom Data Designer code that keeps getting copied between projects, it is a strong candidate for a plugin.
255-
256-
Pick one capability. Give it a typed config. Write the implementation behind the matching plugin boundary. Add an `assert_valid_plugin(...)` test so structural problems fail early:
257-
258-
```python
259-
from data_designer.engine.testing import assert_valid_plugin
260-
from data_designer_markdown_sections.plugin import plugin
261-
262-
assert_valid_plugin(plugin)
263-
```
264-
265-
Then run a tiny `preview` before you trust it in a larger generation job.
252+
## **Where to Go Next**
266253

267-
For implementation details, see:
254+
Interested in building your own plugin? Here are some resources to get you started:
268255

269-
- [Plugins overview](../../plugins/overview.md)
270-
- [Build Your Own](../../plugins/build_your_own.md)
271-
- [Using Models in Plugins](../../plugins/models.md)
272-
- [Available Plugins](../../plugins/available.md)
273-
- [Markdown Section Seed Reader recipe](../../recipes/plugin_development/markdown_seed_reader.md)
256+
1. [Plugins overview](../../plugins/overview.md) — learn how plugins fit into Data Designer
257+
2. [Build Your Own](../../plugins/build_your_own.md) — follow the authoring guide for seed readers, column generators, and processors
258+
3. [Using Models in Plugins](../../plugins/models.md) — call configured models from plugin code
259+
4. [Markdown Section Seed Reader recipe](../../recipes/plugin_development/markdown_seed_reader.md) — study the complete version of the example from this post
260+
5. [Available Plugins](../../plugins/available.md) — browse the catalog and learn how to submit your own plugin
261+
6. [DataDesignerPlugins on GitHub](https://github.com/NVIDIA-NeMo/DataDesignerPlugins) — explore first-party plugin packages
274262

275263
Moving plugins out of experimental mode means Data Designer no longer has to predict every customization users will need. The framework provides the pipeline. Plugins supply the custom pieces.
276264
