docs/devnotes/posts/have-it-your-way.md
Lines changed: 21 additions & 16 deletions
@@ -13,6 +13,8 @@ authors:
13
13
14
14
Data Designer is built around a simple idea: describe the dataset you want, and let the framework handle execution. A config points to seed data, defines generated columns, picks models, and shapes the final records — no orchestration code required. [Data Designer plugins](../../plugins/overview.md) keep that promise when a project needs something custom.
15
15
16
+
As of Data Designer v0.6.0, plugins are out of experimental mode and stable. They are the supported path for turning reusable project-specific logic into normal Data Designer components.
17
+
16
18
<!-- more -->
17
19
18
20
What does "something custom" actually look like? Picture a robotics team sitting on a pile of [Isaac Sim](https://developer.nvidia.com/isaac/sim)-generated warehouse runs, trying to turn robot poses, camera views, and event metadata into instruction data. With an internal simulation-log plugin, the user-facing part can still be this small:
@@ -81,7 +83,7 @@ What about [custom columns](../../concepts/custom_columns.md)? Start with a cust
81
83
82
84
## **Author a Plugin: From Glue Code to Seed Reader**
83
85
84
-
Consider a markdown seed reader. The one-off version might be a helper function that walks a directory, splits files into sections, returns a DataFrame, and then gets copied into the next project that needs it. That can work for one project. It becomes a problem when the reader needs options, tests, documentation, versioning, or reuse across teams. At that point, the helper has become a capability whether or not it is packaged like one.
86
+
To make this concrete, let's walk through a full example. Consider a markdown seed reader. The one-off version might be a helper function that walks a directory, splits files into sections, returns a DataFrame, and then gets copied into the next project that needs it. That can work for one project. It becomes a problem when the reader needs options, tests, documentation, versioning, or reuse across teams. At that point, the helper has become a capability whether or not it is packaged like one.
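For illustration, the one-off helper described above might look roughly like the sketch below. It is not the code from the post: the names `split_sections` and `read_markdown_sections`, the column names, and the heading-based splitting rule are all assumptions. It returns plain dicts, which `pandas.DataFrame(rows)` would accept, so the sketch stays dependency-free.

```python
from pathlib import Path


def split_sections(text: str) -> list[tuple[str, str]]:
    """Split markdown text into (heading, body) pairs at lines starting with '#'.

    Simplification: '#' lines inside fenced code blocks are also treated
    as headings.
    """
    sections: list[tuple[str, str]] = []
    heading, body = "", []
    for line in text.splitlines():
        if line.startswith("#"):
            # A new heading closes the previous section, if it has any content.
            if heading or any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = line.lstrip("#").strip(), []
        else:
            body.append(line)
    if heading or any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


def read_markdown_sections(root: str) -> list[dict[str, str]]:
    """Walk `root` and return one row per markdown section (hypothetical columns)."""
    rows = []
    for path in sorted(Path(root).rglob("*.md")):
        for heading, body in split_sections(path.read_text(encoding="utf-8")):
            rows.append({"source_file": str(path), "heading": heading, "body": body})
    return rows
```

A helper like this is easy to copy between projects, which is exactly why its options, tests, and versioning tend to drift once several teams depend on it.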
85
87
86
88
A plugin packages that same helper as a small Python project:
87
89
@@ -227,27 +229,30 @@ No custom orchestration. No separate DataFrame preparation step. The reader is p
227
229
228
230
---
229
231
230
-
## **Start Local, Share When Useful**
232
+
## **Building the Plugin Ecosystem**
231
233
232
-
A plugin does not need to start as a public package. Most should start locally. Start with a local Python package and install it in editable mode:
234
+
Reusable plugins also need a discovery layer. Once a plugin is useful beyond one project, users need a simple way to find the right package, install it, and get back to declaring datasets. That is why Data Designer includes a built-in NVIDIA plugin catalog and a CLI workflow for discovery and installation.
233
235
234
-
```bash
235
-
uv pip install -e .
236
-
```
236
+
The NVIDIA catalog is backed by [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated home for first-party plugin packages, packaging examples, and plugin-specific docs. Keeping those packages outside the core repository lets them carry optional dependencies, target narrower use cases, and move at their own pace while still using the same plugin interface once installed.
237
237
238
-
That is enough for Data Designer to discover the entry point. You can iterate on the config class and implementation while testing the plugin in a real preview loop. When the shape stabilizes, the same package can move to an internal index, a GitHub repo, or PyPI.
238
+
For users, the catalog makes discovering and installing first-party plugins seamless. The common flow is intentionally short: list the compatible packages, search for what you need, and install the package by name or alias.
239
239
240
-
This is useful inside teams. A data platform group can maintain seed readers for internal systems. An applied science group can maintain generators for its domain. A training group can maintain processors that emit exactly the record shapes its trainers consume. Everyone else installs a package and uses typed configs in the same workflow they already know.
241
-
242
-
It is useful for the broader community too. If you build a plugin that should be discoverable by other Data Designer users, publish it and follow the instructions in [Available Plugins](../../plugins/available.md) to request a catalog listing.
240
+
```bash
241
+
data-designer plugin list
242
+
data-designer plugin search github
243
+
data-designer plugin install github
244
+
```
243
245
244
-
---
246
+
After installation, normal entry point discovery takes over. Import the plugin's config classes and keep building the same declarative workflow.
245
247
246
-
## **A Repository for First-Party Plugins**
248
+
The same pattern works for teams and communities. A platform group can publish a catalog of approved internal plugins backed by an internal package index or direct package references. A community can publish a catalog for a domain or workflow. The catalog gives users a trusted path to the plugins they prefer, while plugin packages remain independently versioned and distributed.
247
249
248
-
We recently created [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated repository for NVIDIA-maintained plugins. It is where we will publish first-party plugin packages, recommended packaging examples, and plugin-specific docs as the catalog grows.
The split keeps the core Data Designer repo focused on the framework: the config API, engine execution, model integration, validation behavior, and stable plugin interface. Plugin packages can depend on optional libraries, target narrower use cases, and move at a different release pace, while still installing separately and using the same plugin interface once installed.
255
+
That is the foundation for a richer Data Designer plugin ecosystem: the core framework provides the stable runtime, plugin authors provide specialized capabilities, and catalogs make those capabilities discoverable. For more information, see [Discover Plugins](../../plugins/discover.md).
251
256
252
257
---
253
258
@@ -259,9 +264,9 @@ Interested in building your own plugin? Here are some resources to get you start
259
264
2. [Build Your Own](../../plugins/build_your_own.md) — follow the authoring guide for seed readers, column generators, and processors
260
265
3. [Using Models in Plugins](../../plugins/models.md) — call configured models from plugin code
261
266
4. [Markdown Section Seed Reader recipe](../../recipes/plugin_development/markdown_seed_reader.md) — study the complete version of the example from this post
262
-
5. [Available Plugins](../../plugins/available.md) — browse the catalog and learn how to submit your own plugin
267
+
5. [Discover Plugins](../../plugins/discover.md) — learn how to discover and install plugins
263
268
6. [DataDesignerPlugins on GitHub](https://github.com/NVIDIA-NeMo/DataDesignerPlugins) — explore first-party plugin packages
264
269
265
270
Moving plugins out of experimental mode means Data Designer no longer has to predict every customization users will need. The framework provides the pipeline. Plugins supply the custom pieces.
fern/versions/v0.5.8/pages/devnotes/posts/have-it-your-way.mdx
Lines changed: 20 additions & 15 deletions
@@ -15,6 +15,8 @@ import { Authors } from "@/components/Authors";
15
15
16
16
Data Designer is built around a simple idea: describe the dataset you want, and let the framework handle execution. A config points to seed data, defines generated columns, picks models, and shapes the final records - no orchestration code required. [Data Designer plugins](/plugins/overview) keep that promise when a project needs something custom.
17
17
18
+
As of Data Designer v0.6.0, plugins are out of experimental mode and stable. They are the supported path for turning reusable project-specific logic into normal Data Designer components.
19
+
18
20
{/* more */}
19
21
20
22
What does "something custom" actually look like? Picture a robotics team sitting on a pile of [Isaac Sim](https://developer.nvidia.com/isaac/sim)-generated warehouse runs, trying to turn robot poses, camera views, and event metadata into instruction data. With an internal simulation-log plugin, the user-facing part can still be this small:
@@ -75,7 +77,7 @@ What about [custom columns](/concepts/custom-columns)? Start with a custom colum
75
77
76
78
## Author a Plugin: From Glue Code to Seed Reader
77
79
78
-
Consider a markdown seed reader. The one-off version might be a helper function that walks a directory, splits files into sections, returns a DataFrame, and then gets copied into the next project that needs it. That can work for one project. It becomes a problem when the reader needs options, tests, documentation, versioning, or reuse across teams. At that point, the helper has become a capability whether or not it is packaged like one.
80
+
To make this concrete, let's walk through a full example. Consider a markdown seed reader. The one-off version might be a helper function that walks a directory, splits files into sections, returns a DataFrame, and then gets copied into the next project that needs it. That can work for one project. It becomes a problem when the reader needs options, tests, documentation, versioning, or reuse across teams. At that point, the helper has become a capability whether or not it is packaged like one.
79
81
80
82
A plugin packages that same helper as a small Python project:
81
83
@@ -217,27 +219,30 @@ No custom orchestration. No separate DataFrame preparation step. The reader is p
217
219
218
220
---
219
221
220
-
## Start Local, Share When Useful
222
+
## Building the Plugin Ecosystem
221
223
222
-
A plugin does not need to start as a public package. Most should start locally. Start with a local Python package and install it in editable mode:
224
+
Reusable plugins also need a discovery layer. Once a plugin is useful beyond one project, users need a simple way to find the right package, install it, and get back to declaring datasets. That is why Data Designer includes a built-in NVIDIA plugin catalog and a CLI workflow for discovery and installation.
223
225
224
-
```bash
225
-
uv pip install -e .
226
-
```
226
+
The NVIDIA catalog is backed by [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated home for first-party plugin packages, packaging examples, and plugin-specific docs. Keeping those packages outside the core repository lets them carry optional dependencies, target narrower use cases, and move at their own pace while still using the same plugin interface once installed.
227
227
228
-
That is enough for Data Designer to discover the entry point. You can iterate on the config class and implementation while testing the plugin in a real preview loop. When the shape stabilizes, the same package can move to an internal index, a GitHub repo, or PyPI.
228
+
For users, the catalog makes discovering and installing first-party plugins seamless. The common flow is intentionally short: list the compatible packages, search for what you need, and install the package by name or alias.
229
229
230
-
This is useful inside teams. A data platform group can maintain seed readers for internal systems. An applied science group can maintain generators for its domain. A training group can maintain processors that emit exactly the record shapes its trainers consume. Everyone else installs a package and uses typed configs in the same workflow they already know.
231
-
232
-
It is useful for the broader community too. If you build a plugin that should be discoverable by other Data Designer users, publish it and follow the instructions in [Available Plugin List](/plugins/available-plugin-list) to request a catalog listing.
230
+
```bash
231
+
data-designer plugin list
232
+
data-designer plugin search github
233
+
data-designer plugin install github
234
+
```
233
235
234
-
---
236
+
After installation, normal entry point discovery takes over. Import the plugin's config classes and keep building the same declarative workflow.
235
237
236
-
## A Repository for First-Party Plugins
238
+
The same pattern works for teams and communities. A platform group can publish a catalog of approved internal plugins backed by an internal package index or direct package references. A community can publish a catalog for a domain or workflow. The catalog gives users a trusted path to the plugins they prefer, while plugin packages remain independently versioned and distributed.
237
239
238
-
We recently created [NVIDIA-NeMo/DataDesignerPlugins](https://github.com/NVIDIA-NeMo/DataDesignerPlugins), a dedicated repository for NVIDIA-maintained plugins. It is where we will publish first-party plugin packages, recommended packaging examples, and plugin-specific docs as the catalog grows.
The split keeps the core Data Designer repo focused on the framework: the config API, engine execution, model integration, validation behavior, and stable plugin interface. Plugin packages can depend on optional libraries, target narrower use cases, and move at a different release pace, while still installing separately and using the same plugin interface once installed.
245
+
That is the foundation for a richer Data Designer plugin ecosystem: the core framework provides the stable runtime, plugin authors provide specialized capabilities, and catalogs make those capabilities discoverable. For more information, see [Discover Plugins](/plugins/discover).
241
246
242
247
---
243
248
@@ -248,7 +253,7 @@ Interested in building your own plugin? Here are some resources to get you start
248
253
1. [Plugins overview](/plugins/overview) - learn how plugins fit into Data Designer
249
254
2. [Example Plugin](/plugins/example-plugin) - follow the package shape for a column generator plugin
250
255
3. [Markdown Section Seed Reader recipe](/recipes/plugin-development/markdown-section-seed-reader-plugin) - study the complete version of the example from this post
251
-
4. [Available Plugin List](/plugins/available-plugin-list) - browse the catalog and learn how to submit your own plugin
256
+
4. [Discover Plugins](/plugins/discover) - learn how to discover and install plugins
252
257
5. [DataDesignerPlugins on GitHub](https://github.com/NVIDIA-NeMo/DataDesignerPlugins) - explore first-party plugin packages
253
258
254
259
Moving plugins out of experimental mode means Data Designer no longer has to predict every customization users will need. The framework provides the pipeline. Plugins supply the custom pieces.
The plugin system is currently **experimental** and under active development. The documentation, examples, and plugin interface are subject to significant changes in future releases. If you encounter any issues, have questions, or have ideas for improvement, please consider starting [a discussion on GitHub](https://github.com/NVIDIA-NeMo/DataDesigner/discussions).
9
-
</Warning>
10
6
11
7
`FileSystemSeedReader` is the simplest way to build a seed reader plugin when your source data lives in a directory of files. You describe the files cheaply in `build_manifest(...)`, then optionally read and reshape them in `hydrate_row(...)`.
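The two-phase idea can be pictured with a standalone sketch. The class below does not subclass or reproduce the real `FileSystemSeedReader`; the class name, constructor arguments, method signatures, and row keys are all assumptions chosen to show the split between a cheap manifest pass and a per-row hydration pass.

```python
from pathlib import Path


class TwoPhaseReaderSketch:
    """Standalone illustration only; NOT the real FileSystemSeedReader API."""

    def __init__(self, root: str, pattern: str = "*.md") -> None:
        self.root = Path(root)
        self.pattern = pattern

    def build_manifest(self) -> list[dict[str, object]]:
        # Cheap pass: collect file metadata only; file contents are not read.
        return [
            {"path": str(p), "size_bytes": p.stat().st_size}
            for p in sorted(self.root.rglob(self.pattern))
        ]

    def hydrate_row(self, row: dict[str, object]) -> dict[str, object]:
        # Expensive pass: read the file and attach its text to the row.
        text = Path(str(row["path"])).read_text(encoding="utf-8")
        return {**row, "text": text}
```

Keeping the manifest cheap means a reader can enumerate or sample a large directory before paying the cost of opening any individual file.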
fern/versions/v0.5.8/pages/plugins/overview.mdx
Lines changed: 7 additions & 11 deletions
@@ -1,22 +1,18 @@
1
1
---
2
2
title: "Data Designer Plugins"
3
-
description: ""
3
+
description: "Extend Data Designer with custom column generators, seed readers, and processors."
4
4
position: 1
5
5
---
6
-
<Warning>
7
-
Experimental Feature
8
-
The plugin system is currently **experimental** and under active development. The documentation, examples, and plugin interface are subject to significant changes in future releases. If you encounter any issues, have questions, or have ideas for improvement, please consider starting [a discussion on GitHub](https://github.com/NVIDIA-NeMo/DataDesigner/discussions).
9
-
</Warning>
10
6
11
-
## What are plugins?
7
+
Plugins let you add new object types to Data Designer without modifying the core library. Once installed, plugins behave like native Data Designer objects: they use the same declarative config patterns, builder APIs, discovery flow, and runtime execution paths as the built-in objects.
12
8
13
-
Plugins are Python packages that extend Data Designer's capabilities without modifying the core library. Similar to [VS Code extensions](https://marketplace.visualstudio.com/vscode) and [Pytest plugins](https://docs.pytest.org/en/stable/reference/plugin_list.html), the plugin system empowers you to build specialized extensions for your specific use cases and share them with the community.
9
+
## Supported plugin types
14
10
15
-
**Current capabilities**: Data Designer supports three plugin types:
11
+
Data Designer supports three plugin types:
16
12
17
-
- **Column Generator Plugins**: Custom column types you pass to the config builder's [add_column](/code-reference/topic-overviews/config-builder#data_designer.config.config_builder.DataDesignerConfigBuilder.add_column) method.
18
-
- **Seed Reader Plugins**: Custom seed dataset readers that let you load data from new sources (e.g., databases, cloud storage, custom formats).
19
-
- **Processor Plugins**: Custom processors that transform data before batches, after batches, or after generation completes. Pass them to the config builder's [add_processor](/code-reference/topic-overviews/config-builder#data_designer.config.config_builder.DataDesignerConfigBuilder.add_processor) method.
13
+
- **Column generator plugins**: Custom column types you pass to the config builder's [add_column](/code-reference/topic-overviews/config-builder#data_designer.config.config_builder.DataDesignerConfigBuilder.add_column) method.
14
+
- **Seed reader plugins**: Custom seed dataset readers that let you load data from new sources, such as databases, cloud storage, or custom file formats.
15
+
- **Processor plugins**: Custom processors that transform data before batches, after batches, or after generation completes. Pass them to the config builder's [add_processor](/code-reference/topic-overviews/config-builder#data_designer.config.config_builder.DataDesignerConfigBuilder.add_processor) method.