Commit 8b8d748

docs: graduate plugins out of experimental mode (#603)
* chore: add __init__.py to engine namespace subpackages

Griffe (used by mkdocstrings) skips directories without __init__.py when resolving module paths, which prevented the new plugins code reference from rendering SeedReader, FileSystemSeedReader, and Processor. Adding empty __init__.py files in engine/resources/, engine/processing/, and engine/processing/processors/ aligns with the convention already used in engine/mcp/, engine/models/, etc.

* docs: flesh out docstrings on plugin extension-point classes

Plugin authors now see meaningful descriptions for every field and method on the bases rendered in the plugins code reference:

- Plugin and PluginType: class docstrings + Attributes tables for fields and enum members; fix typo in config_qualified_name field description.
- SingleColumnConfig: document allow_resize.
- ProcessorConfig: document processor_type discriminator.
- SeedSource: document seed_type discriminator.
- FileSystemSeedSource: add class docstring + Attributes table for path / file_pattern / recursive.
- ColumnGeneratorFullColumn and ColumnGeneratorCellByCell: add class docstrings explaining when to use each base, plus method docstrings on the abstract generate() implementations.

* docs: graduate plugins out of experimental mode

Restructures plugin documentation around the now-stable extension points (column generator, seed reader, processor) and treats plugins as a first-class story for customizing Data Designer.

- Add code_reference/plugins.md: single-stop reference for the Plugin object and the config + implementation base classes used by all three plugin types.
- Add code_reference/generators.md: column generator implementation base classes, separated from column configs.
- Surface SingleColumnConfig in code_reference/column_configs.md.
- Add plugins/implement.md ("Build Your Own"): per-type implementation instructions across column generators, seed readers, and processors.
- Add plugins/processor.md: complete processor plugin package example.
- Rewrite plugins/overview.md: open with why plugins exist, drop the internal-helpers note (PluginRegistry / PluginManager), and focus the guide on what plugin builders need.
- Refresh plugins/available.md (Catalog) and plugins/filesystem_seed_reader.md to match the new structure.
- Delete plugins/example.md (replaced by per-type guides).
- Reorder Code Reference nav alphabetically and add the new pages.
- Minor link / wording fixes in concepts/processors.md and concepts/deployment-options.md.

* docs: simplify plugin docs structure

Replace the overview's how-to walkthrough and the per-type plugin guides with a single Build Your Own page that covers all three plugin types side-by-side. Add a dedicated Using Models in Plugins guide and a seed_readers code reference, and trim the overview down to what the plugin types are, how to use one, and how discovery works.

- Rename plugins/implement.md to plugins/build_your_own.md.
- Delete plugins/filesystem_seed_reader.md and plugins/processor.md (their content is now in build_your_own.md and the per-type code references).
- Add plugins/models.md for model-backed column generator authoring.
- Add code_reference/seed_readers.md for seed reader implementation base classes.
- Rewrite plugins/overview.md: shorter intro, type bullets link to the relevant code reference, drop the multi-step "How do you create plugins" walkthrough in favor of a single Build a Plugin pointer, tighten Discovery troubleshooting.
- Refresh plugins/available.md (Available Plugins): point to the DataDesignerPlugins catalog and explain how to request a community listing.
- Update cross-page links in concepts/processors.md, concepts/seed-datasets.md, recipes/plugin_development/markdown_seed_reader.md, code_reference/plugins.md, and code_reference/generators.md to match the new structure.
- Update mkdocs.yml nav: rename to Build Your Own, add Using Models, add seed_readers code reference.

* docs: scroll wide tables horizontally instead of wrapping

Code-heavy reference tables (plugin bases, column generators, etc.) were wrapping aggressively on narrow viewports, breaking long identifiers across multiple lines. Switch the table container to horizontal overflow and prevent code cells from wrapping so identifiers stay readable.

* docs: address PR #603 review feedback

- Add an Implementation base section to code_reference/processors.md rendering the engine-side Processor class. This justifies the engine/processing/__init__.py files added earlier and gives processor plugin authors an auto-rendered API reference, matching the pattern used by code_reference/generators.md and seed_readers.md.
- build_your_own.md: replace the placeholder "x" emoji on the IndexMultiplier example with the actual multiplication sign.
- build_your_own.md: drop the manual `re.compile + apply(lambda)` pattern in the regex-filter processor in favor of the idiomatic `Series.str.contains(..., regex=True)`.
- build_your_own.md: add a kernel-restart caveat after the editable install instructions: PluginRegistry caches discovery on first import, so notebooks need a fresh kernel to pick up freshly installed plugins.
- build_your_own.md: state explicitly what `assert_valid_plugin` checks (config base + plugin-type-appropriate impl base).
- code_reference/plugins.md: link out to the processors code reference alongside generators and seed_readers.

* docs: split code reference by package
* docs: add interface code reference
* docs: add code reference overviews
* docs: refine code reference pages
* docs: improve code reference tables
* docs: correct reference docstrings
* docs: embed plugin catalog table
* docs: note plugin discovery restart caveat
* docs: explain generator base class choice
* docs: mention async cell generator examples
* docs: clarify plugin model usage
* docs: clarify plugin model aliases
* docs: address plugin review feedback
* docs: update available plugins page
1 parent 9214637 commit 8b8d748

73 files changed

Lines changed: 1330 additions & 805 deletions

docs/code_reference/analysis.md

Lines changed: 0 additions & 31 deletions
This file was deleted.

docs/code_reference/column_configs.md

Lines changed: 0 additions & 8 deletions
This file was deleted.
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
# Analysis

Profiling result objects and report helpers returned after generation.

## Column Statistics

`DataDesigner.create()` and `DataDesigner.preview()` run the dataset profiler after generation. The profiler computes statistics for each configured column; side-effect columns are recorded separately in `DatasetProfilerResults.side_effect_column_names`.

Statistics result classes store computed metrics for each column type and format those metrics for reports.

::: data_designer.config.analysis.column_statistics

## Column Profilers

Column profilers are optional analysis tools that provide deeper insights into specific column types. Currently, the only column profiler available is the Judge Score Profiler.

Profiler result classes store computed profiler output and format it for reports.

::: data_designer.config.analysis.column_profilers

## Dataset Profiler

The [DatasetProfilerResults](#data_designer.config.analysis.dataset_profiler.DatasetProfilerResults) class stores profiling results for a generated dataset. It aggregates column-level statistics, side-effect column names, and optional profiler results, and provides methods to:

- Compute dataset-level metrics (completion percentage, column type summary)
- Filter statistics by column type
- Generate formatted analysis reports via the `to_report()` method

Reports can be displayed in the console or exported to HTML/SVG formats.

::: data_designer.config.analysis.dataset_profiler
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
# Column Configurations

Column configs declare Data Designer's built-in column types. Each configuration inherits from [SingleColumnConfig](#data_designer.config.base.SingleColumnConfig), which provides shared arguments like the column `name`, whether to `drop` the column after generation, and the `column_type`.

For column generator implementation classes, see [column_generators](../engine/column_generators.md).

!!! info "`column_type` is a discriminator field"
    The `column_type` argument is used to identify column types when deserializing the [Data Designer Config](data_designer_config.md) from JSON/YAML. It acts as the discriminator in a [discriminated union](https://docs.pydantic.dev/latest/concepts/unions/#discriminated-unions), allowing Pydantic to automatically determine which column configuration class to instantiate.
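
To make the discriminator idea concrete, here is a minimal standard-library sketch of tag-based dispatch, which is the behavior a discriminated union provides. It does not use Pydantic or Data Designer: the classes `SamplerColumn` and `LLMTextColumn`, the registry `COLUMN_CLASSES`, the function `parse_column`, and the tag values are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


# Hypothetical stand-ins for column config classes (not Data Designer's real classes).
@dataclass
class SamplerColumn:
    name: str
    column_type: str = "sampler"
    drop: bool = False


@dataclass
class LLMTextColumn:
    name: str
    prompt: str = ""
    column_type: str = "llm-text"
    drop: bool = False


# Map each discriminator value to the class it selects.
COLUMN_CLASSES: dict[str, Callable[..., Any]] = {
    "sampler": SamplerColumn,
    "llm-text": LLMTextColumn,
}


def parse_column(raw: dict[str, Any]) -> Any:
    """Pick the class named by `column_type` and build it from the payload."""
    try:
        cls = COLUMN_CLASSES[raw["column_type"]]
    except KeyError as exc:
        raise ValueError(f"unknown or missing column_type in {raw!r}") from exc
    return cls(**raw)


column = parse_column({"column_type": "llm-text", "name": "summary", "prompt": "Summarize"})
print(type(column).__name__)
```

With Pydantic the registry and lookup are not written by hand; the union type plus the discriminator field carries the same information.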

## `SingleColumnConfig` {#data_designer.config.base.SingleColumnConfig}

::: data_designer.config.base.SingleColumnConfig
    options:
      show_root_toc_entry: false

## Column configurations

::: data_designer.config.column_configs
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
# Data Designer's Config Builder

Use [DataDesignerConfigBuilder](#data_designer.config.config_builder.DataDesignerConfigBuilder) to construct [DataDesignerConfig](data_designer_config.md#data_designer.config.data_designer_config.DataDesignerConfig) objects. The builder accumulates model configs, tool configs, column configs, constraints, seed settings, processors, and profilers.

Inputs can come from scratch, a `dict`, [BuilderConfig](#data_designer.config.config_builder.BuilderConfig), a local YAML/JSON file, or an HTTP(S) YAML/JSON URL via [`from_config()`](#data_designer.config.config_builder.DataDesignerConfigBuilder.from_config). Use [`build()`](#data_designer.config.config_builder.DataDesignerConfigBuilder.build) to create a [DataDesignerConfig](data_designer_config.md#data_designer.config.data_designer_config.DataDesignerConfig), or [`write_config()`](#data_designer.config.config_builder.DataDesignerConfigBuilder.write_config) to serialize the current builder config to YAML or JSON.

!!! info "Model config loading"
    [DataDesignerConfigBuilder](#data_designer.config.config_builder.DataDesignerConfigBuilder) accepts model configs as a list of [ModelConfig](models.md#data_designer.config.models.ModelConfig) objects, a YAML/JSON config path, or `None`. When `model_configs=None`, the builder loads default model configs if Data Designer can run locally; otherwise initialization raises `BuilderConfigurationError`. Model configs define the aliases referenced by model-backed columns such as [`LLMTextColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMTextColumnConfig), [`LLMCodeColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMCodeColumnConfig), [`LLMStructuredColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMStructuredColumnConfig), [`LLMJudgeColumnConfig`](column_configs.md#data_designer.config.column_configs.LLMJudgeColumnConfig), [`EmbeddingColumnConfig`](column_configs.md#data_designer.config.column_configs.EmbeddingColumnConfig), and [`ImageColumnConfig`](column_configs.md#data_designer.config.column_configs.ImageColumnConfig).

::: data_designer.config.config_builder
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# Data Designer Configuration

[DataDesignerConfig](#data_designer.config.data_designer_config.DataDesignerConfig) is the top-level configuration object passed to Data Designer. It declares the columns to generate and may include model configs, tool configs, seed settings, sampler constraints, processors, and profiler configs.

Prefer [DataDesignerConfigBuilder](config_builder.md#data_designer.config.config_builder.DataDesignerConfigBuilder) for programmatic construction. Direct [DataDesignerConfig](#data_designer.config.data_designer_config.DataDesignerConfig) instantiation is also supported.

::: data_designer.config.data_designer_config
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
# Config Package

The `data-designer-config` package provides `data_designer.config`, the configuration layer of Data Designer. It contains the objects used to describe dataset structure, model access, tool access, seed data, sampler parameters, validators, processors, run settings, plugin registrations, and analysis results.

This package is the base of the dependency chain. Engine and interface code consume these config objects, but config objects do not execute generation directly.

For programmatic configuration work, start with [config_builder](config_builder.md) and [data_designer_config](data_designer_config.md). Use the narrower pages for exact constructor fields for columns, models, MCP tools, seeds, processors, samplers, validators, or profiling results.

docs/code_reference/config/mcp.md

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
# MCP Configuration

MCP config objects tell Data Designer which Model Context Protocol providers exist and which tools an LLM column may use.

[MCPProvider](#data_designer.config.mcp.MCPProvider) configures remote MCP servers via SSE or Streamable HTTP transport. [LocalStdioMCPProvider](#data_designer.config.mcp.LocalStdioMCPProvider) configures local MCP servers as subprocesses via stdio transport. [ToolConfig](#data_designer.config.mcp.ToolConfig) sets which tools are available for LLM columns and how they are constrained.

For MCP execution internals, see [Engine MCP](../engine/mcp.md). Related guides:

- **[MCP Providers](../../concepts/mcp/mcp-providers.md)** - Configure local or remote MCP providers
- **[Tool Configs](../../concepts/mcp/tool-configs.md)** - Define tool permissions and limits
- **[Enabling Tools](../../concepts/mcp/enabling-tools.md)** - Use tools in LLM columns
- **[Traces](../../concepts/traces.md)** - Capture full conversation history

## API Reference

::: data_designer.config.mcp
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# Models

[ModelProvider](#data_designer.config.models.ModelProvider) stores connection and authentication details for model providers. [ModelConfig](#data_designer.config.models.ModelConfig) stores a model alias, model identifier, provider settings, and inference parameters. [Inference Parameters](../../concepts/models/inference-parameters.md) control model behavior. Chat-completion parameters include `temperature`, `top_p`, and `max_tokens`; `temperature` and `top_p` can be fixed values or configured distributions. [ImageContext](#data_designer.config.models.ImageContext) provides image inputs to multimodal models, and [ImageInferenceParams](#data_designer.config.models.ImageInferenceParams) configures image generation models.
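
As a rough illustration of what "fixed values or configured distributions" can mean for a parameter such as `temperature`, the sketch below resolves a parameter that is either a plain number or a uniform range. It uses only the standard library and is not Data Designer's API: `UniformRange`, `ParamValue`, and `resolve_param` are hypothetical names, and the uniform distribution is an assumption made for the example.

```python
import random
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class UniformRange:
    """Hypothetical distribution config: draw uniformly from [low, high]."""

    low: float
    high: float


# A parameter is either a fixed number or a distribution to sample from.
ParamValue = Union[float, UniformRange]


def resolve_param(value: ParamValue, rng: random.Random) -> float:
    """Return the fixed value as-is, or draw one sample from the distribution."""
    if isinstance(value, UniformRange):
        return rng.uniform(value.low, value.high)
    return float(value)


rng = random.Random(0)  # seeded so repeated runs draw the same samples
fixed = resolve_param(0.7, rng)
sampled = resolve_param(UniformRange(0.5, 1.0), rng)
print(fixed, sampled)
```

Sampling per request, rather than fixing one value, is one way to vary model outputs across generated records.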

Related guides:

- **[Model Providers](../../concepts/models/model-providers.md)**
- **[Model Configs](../../concepts/models/model-configs.md)**
- **[Image Context](../../notebooks/4-providing-images-as-context.ipynb)**
- **[Generating Images](../../notebooks/5-generating-images.ipynb)**

::: data_designer.config.models
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
# Plugins

Plugin packages register [Plugin](#data_designer.plugins.plugin.Plugin) objects through entry points in the `data_designer.plugins` group. A plugin registration ties a config class to its implementation class and declares its [PluginType](#data_designer.plugins.plugin.PluginType).

Related pages: [Build Your Own](../../plugins/build_your_own.md), [Column Generators](../engine/column_generators.md), [Seed Readers](../engine/seed_readers.md), [Engine Processors](../engine/processors.md), and [Processor Configurations](processors.md).

## `Plugin` {#data_designer.plugins.plugin.Plugin}

::: data_designer.plugins.plugin.Plugin
    options:
      show_root_toc_entry: false

## `PluginType` {#data_designer.plugins.plugin.PluginType}

::: data_designer.plugins.plugin.PluginType
    options:
      show_root_toc_entry: false
