Commit 4768a36

chore: plan 427, PR 2 of agent-first development plan (#478)
* save progress
* undo review-code skill change
* delete status file
* small tweaks
* Fix 429 info
* update workind on skill info
* updates
* Update architecture/overview.md

Co-authored-by: Johnny Greco <jogreco@nvidia.com>

* fix: correct symbol names and CLI commands in architecture docs

Address review comments:
- models.md: describe clients as native httpx adapters, not SDK wrappers
- agent-introspection.md: use actual family keys (columns, samplers, etc.) not column-types
- cli.md: use correct command `data-designer config models`
- plugins.md: SEED_READER not SEED_SOURCE, inject_into_processor_config_type_union

Made-with: Cursor

---------

Co-authored-by: Johnny Greco <jogreco@nvidia.com>
1 parent d4443d7 commit 4768a36

17 files changed

+863
-113
lines changed

.agents/skills/create-pr/SKILL.md

Lines changed: 59 additions & 21 deletions
@@ -1,6 +1,6 @@
11
---
22
name: create-pr
3-
description: Create a GitHub PR with a well-formatted description including summary, categorized changes, and attention areas
3+
description: Create a GitHub PR with a well-formatted description matching the repository PR template (flat Changes by default; optional Added/Changed/Removed/Fixed grouping)
44
argument-hint: [special instructions]
55
disable-model-invocation: true
66
metadata:
@@ -9,7 +9,7 @@ metadata:
99

1010
# Create Pull Request
1111

12-
Create a well-formatted GitHub pull request for the current branch.
12+
Create a well-formatted GitHub pull request for the current branch. The PR description must conform to the repository's PR template (`.github/PULL_REQUEST_TEMPLATE.md`).
1313

1414
## Arguments
1515

@@ -40,15 +40,21 @@ Run these commands in parallel to understand the changes:
4040

4141
## Step 2: Analyze and Categorize Changes
4242

43-
### By Change Type (from commits and diff)
43+
Use change types below to **decide** how to write the Changes section (flat vs grouped). You still describe testing under **Testing**, not under these buckets.
44+
45+
### By change type (internal checklist)
4446
-**Added**: New files, features, capabilities
4547
- 🔧 **Changed**: Modified existing functionality
4648
- 🗑️ **Removed**: Deleted files or features
4749
- 🐛 **Fixed**: Bug fixes
4850
- 📚 **Docs**: Documentation updates
4951
- 🧪 **Tests**: Test additions/modifications
5052

51-
### Identify Attention Areas 🔍
53+
### When to use optional grouping in **Changes**
54+
- **Flat bullet list** (default): Small PRs, single theme, or when categories would be sparse or redundant.
55+
- **Grouped subheadings** (`### ✨ Added`, `### 🔧 Changed`, `### 🗑️ Removed`, `### 🐛 Fixed`): Large PRs, release-note-style summaries, or clearly distinct fix + feature mixes. **Omit any empty section** — do not leave placeholder headings.
56+
57+
### Identify attention areas
5258
Flag for special reviewer attention:
5359
- Files with significant changes (>100 lines)
5460
- Changes to base classes, interfaces, or public API
@@ -75,35 +81,63 @@ If commits have mixed types, use the primary/most significant type.
7581
git push -u origin <branch-name>
7682
```
7783

78-
2. **Create PR** using this template:
84+
2. **Build the PR body** using the repository's template structure.
85+
86+
**Default — flat Changes** (remove the HTML comment block from the template when filling in, or replace with your bullets only):
7987

8088
```markdown
8189
## 📋 Summary
8290

83-
[1-2 sentence overview of what this PR accomplishes]
91+
[1-3 sentences: what this PR does and why. Focus on the "why".]
92+
93+
## 🔗 Related Issue
94+
95+
[Fixes #NNN or Closes #NNN — link to the issue this addresses]
96+
97+
## 🔄 Changes
98+
99+
- [Bullet list of key changes]
100+
- [Link to key files when helpful for reviewers]
101+
- [Reference commits for specific changes in multi-commit PRs]
102+
103+
## 🧪 Testing
104+
105+
- [x] `make test` passes
106+
- [x] Unit tests added/updated (or: N/A — no testable logic)
107+
- [ ] E2E tests added/updated (if applicable)
108+
109+
## ✅ Checklist
84110

111+
- [x] Follows commit message conventions
112+
- [x] Commits are signed off (DCO)
113+
- [ ] Architecture docs updated (if applicable)
114+
```
115+
116+
**Optional — grouped Changes** (only when Step 2 criteria apply; omit empty sections):
117+
118+
```markdown
85119
## 🔄 Changes
86120

87121
### ✨ Added
88-
- [New features/files - link to key files when helpful]
122+
- [...]
89123

90124
### 🔧 Changed
91-
- [Modified functionality - reference commits for specific changes]
92-
93-
### 🗑️ Removed
94-
- [Deleted items]
125+
- [...]
95126

96127
### 🐛 Fixed
97-
- [Bug fixes - if applicable]
128+
- [...]
129+
```
98130

131+
(Include `### 🗑️ Removed` only when something was deleted.)
132+
133+
If there are genuinely important attention areas for reviewers, add an **Attention Areas** section after Changes:
134+
135+
```markdown
99136
## 🔍 Attention Areas
100137

101138
> ⚠️ **Reviewers:** Please pay special attention to the following:
102139
103-
- [`path/to/critical/file.py`](https://github.com/<owner>/<repo>/blob/<branch>/path/to/critical/file.py) - [Why this needs attention]
104-
105-
---
106-
🤖 *Generated with AI*
140+
- [`path/to/critical/file.py`](https://github.com/<owner>/<repo>/blob/<branch>/path/to/critical/file.py): [Why this needs attention]
107141
```
108142

109143
3. **Execute**:
@@ -118,20 +152,24 @@ If commits have mixed types, use the primary/most significant type.
118152
119153
## Section Guidelines
120154
121-
- **Summary**: Always include - be concise and focus on the "why"
122-
- **Changes**: Group by type, omit empty sections
155+
- **Summary**: Always include — be concise and focus on the "why", not just the "what"
156+
- **Related Issue**: Always include if an issue exists. Use `Fixes #NNN` for bugs, `Closes #NNN` for features/tasks
157+
- **Changes**: Default to a flat list. Use Added/Changed/Removed/Fixed subheadings only for large or mixed PRs; never emit empty subsection headings
158+
- **Testing**: Check off items that apply. Mark N/A items explicitly rather than leaving them unchecked without explanation
159+
- **Checklist**: Check off items that are true. Leave unchecked with a note if something doesn't apply
123160
- **Attention Areas**: Only include if there are genuinely important items; omit for simple PRs
124161
- **Links**: Include links to code and commits where helpful for reviewers:
125-
- **File links require full URLs** - relative paths don't work in PR descriptions
162+
- **File links require full URLs**: relative paths don't work in PR descriptions
126163
- Link to a file: `[filename](https://github.com/<owner>/<repo>/blob/<branch>/path/to/file.py)`
127164
- Link to specific lines: `[description](https://github.com/<owner>/<repo>/blob/<branch>/path/to/file.py#L42-L50)`
128165
- Use the branch name (from Step 1) in the URL so links point to the PR's version of files
129-
- Reference commits: `abc1234` - GitHub auto-links short commit SHAs in PR descriptions
166+
- Reference commits: `abc1234` (GitHub auto-links short commit SHAs in PR descriptions)
130167
- For multi-commit PRs, reference individual commits when describing specific changes
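
The link formats listed above can be produced mechanically. The following is a minimal sketch, not part of the skill itself: `blob_url` and `markdown_link` are hypothetical helper names, and the owner, repo, and branch values are placeholders to be filled in from Step 1.

```python
from urllib.parse import quote


def blob_url(owner, repo, branch, path, start=None, end=None):
    """Build a full GitHub blob URL, optionally anchored to a line range."""
    # Keep "/" unescaped so branch names such as "feature/x" and nested
    # file paths stay readable in the resulting URL.
    url = (
        f"https://github.com/{owner}/{repo}/blob/"
        f"{quote(branch, safe='/')}/{quote(path, safe='/')}"
    )
    if start is not None:
        url += f"#L{start}"
        if end is not None and end != start:
            url += f"-L{end}"
    return url


def markdown_link(text, url):
    """Format a Markdown link suitable for a PR description."""
    return f"[{text}]({url})"
```

For example, `markdown_link("description", blob_url("<owner>", "<repo>", "<branch>", "path/to/file.py", 42, 50))` yields a link whose URL ends in `#L42-L50`, apart from the angle brackets in those placeholders being percent-encoded by `quote`; real values avoid that.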
131168
132169
## Edge Cases
133170
134171
- **No changes**: Inform user there's nothing to create a PR for
135172
- **Uncommitted work**: Warn and ask before proceeding
136-
- **Large PRs** (>20 files): Summarize by directory/module
173+
- **Large PRs** (>20 files): Summarize by directory/module; grouped Changes often helps here
137174
- **Single commit**: PR title can match commit message
175+
- **No related issue**: Note "N/A" in the Related Issue section rather than omitting it

.agents/skills/review-code/SKILL.md

Lines changed: 16 additions & 0 deletions
@@ -94,6 +94,11 @@ Read the following files at the repository root to load the project's standards
9494
- **`STYLEGUIDE.md`** — code style rules (formatting, naming, imports, type annotations), design principles (DRY, KISS, YAGNI, SOLID), common pitfalls, lazy loading and `TYPE_CHECKING` patterns
9595
- **`DEVELOPMENT.md`** — testing patterns and expectations
9696

97+
**Documentation sources (load when the changeset touches matching areas):**
98+
99+
- **`architecture/*.md`** — subsystem maps aligned with `packages/` (e.g. `engine/mcp/` → `architecture/mcp.md`). Use to verify the PR does not leave recorded architecture false relative to new behavior.
100+
- **`docs/`** — published user-facing documentation. Cross-check when public API, CLI behavior, or config surface changes would affect what readers are told.
101+
97102
Use these guidelines as the baseline for the entire review. Project-specific rules take precedence over general best practices.
98103

99104
## Step 3: Understand the Scope
@@ -147,6 +152,17 @@ Re-read the changed files with a focus on **structure and design of the new/modi
147152
- Obvious inefficiencies introduced by this change (N+1 queries, repeated computation, unnecessary copies)
148153
- Appropriate data structures for the access pattern
149154

155+
**Documentation alignment (same pass — scoped, not a full docs audit):**
156+
157+
When **code** under `packages/` changes behavior, structure, or public contracts in a way that a maintainer would reasonably describe in `architecture/` or `docs/`:
158+
159+
1. Identify the closest **`architecture/<topic>.md`** (and any obvious `docs/` pages) for that subsystem.
160+
2. If the PR **also edits** those docs, sanity-check that the edits match the code.
161+
3. If the PR **does not** edit docs but the change **contradicts** what `architecture/` or `docs/` currently asserts, flag it (**Warnings** if contributors rely on that text; **Suggestions** if impact is narrow). Suggest updating the same PR or an explicit follow-up issue.
162+
4. **Skip** this check for pure refactors with no observable behavior change, typo-only PRs, or changes already limited to documentation.
163+
164+
The local **`search-docs`** skill can help locate `docs/` pages by topic when the right file is not obvious.
165+
150166
### Pass 3: Standards, Testing & Polish
151167

152168
Final pass focused on **project conventions and test quality for new/modified code only**:

.github/ISSUE_TEMPLATE/bug-report.yml

Lines changed: 20 additions & 0 deletions
@@ -44,8 +44,28 @@ body:
4444
placeholder: A clear and concise description of what you expected to happen.
4545
validations:
4646
required: true
47+
- type: textarea
48+
id: agent-diagnostic
49+
attributes:
50+
label: Agent Diagnostic / Prior Investigation
51+
description: |
52+
If you used an agent, paste the output from its investigation (for example, what it found in the docs or issue tracker).
53+
If you couldn't or didn't use an agent, briefly say why and include the troubleshooting you already tried.
54+
placeholder: |
55+
Paste agent output here, or describe the manual investigation you performed.
4756
- type: textarea
4857
id: context
4958
attributes:
5059
label: Additional context
5160
placeholder: Add any other context about the problem here (e.g., screenshots, logs, browser version).
61+
- type: checkboxes
62+
id: checklist
63+
attributes:
64+
label: Checklist
65+
options:
66+
- label: I reproduced this issue or provided a minimal example
67+
required: true
68+
- label: I searched the docs/issues myself, or had my agent do so
69+
required: true
70+
- label: If I used an agent, I included its diagnostics above
71+
required: false

.github/ISSUE_TEMPLATE/config.yml

Lines changed: 4 additions & 1 deletion
@@ -2,4 +2,7 @@ blank_issues_enabled: false
22
contact_links:
33
- name: 💬 Ask a Question
44
url: https://github.com/NVIDIA-NeMo/DataDesigner/discussions
5-
about: Please use GitHub Discussions for general questions.
5+
about: >-
6+
Have a question? Try pointing your agent at the repo first — it can search docs,
7+
find issues, and more. See CONTRIBUTING.md for the recommended workflow.
8+
If that doesn't help, use GitHub Discussions.

.github/ISSUE_TEMPLATE/development-task.yml

Lines changed: 15 additions & 1 deletion
@@ -1,5 +1,5 @@
11
name: 🛠️ Development Task
2-
description: Track internal development work, refactoring, or infrastructure
2+
description: Track internal development work, refactoring, or infrastructure changes
33
labels: ["task"]
44
body:
55
- type: dropdown
@@ -25,6 +25,20 @@ body:
2525
attributes:
2626
label: Technical Details & Implementation Plan
2727
placeholder: Describe the technical approach, files affected, or logic changes.
28+
- type: textarea
29+
id: investigation
30+
attributes:
31+
label: Investigation / Context
32+
description: |
33+
Relevant issue links, architecture context, or notes from prior exploration.
34+
placeholder: Link related issues, reference architecture docs, or describe relevant context.
35+
- type: textarea
36+
id: agent-plan
37+
attributes:
38+
label: Agent Plan / Findings
39+
description: |
40+
If an agent investigated this task, paste its findings or proposed plan here.
41+
placeholder: Paste agent output here, if applicable.
2842
- type: input
2943
id: dependencies
3044
attributes:

.github/ISSUE_TEMPLATE/feature-request.yml

Lines changed: 16 additions & 0 deletions
@@ -38,8 +38,24 @@ body:
3838
attributes:
3939
label: Describe alternatives you've considered
4040
placeholder: A clear and concise description of any alternative solutions or features you've considered.
41+
- type: textarea
42+
id: agent-investigation
43+
attributes:
44+
label: Agent Investigation
45+
description: |
46+
If your agent explored the codebase to assess feasibility (for example by searching project documentation or existing issues), paste its findings here.
47+
placeholder: Paste agent output here, if applicable.
4148
- type: textarea
4249
id: context
4350
attributes:
4451
label: Additional context
4552
placeholder: Add any other context or screenshots about the feature request here.
53+
- type: checkboxes
54+
id: checklist
55+
attributes:
56+
label: Checklist
57+
options:
58+
- label: I've reviewed existing issues and the documentation
59+
required: true
60+
- label: This is a design proposal, not a "please build this" request
61+
required: true

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
## 📋 Summary
2+
<!-- 1-3 sentences: what this PR does and why -->
3+
4+
## 🔗 Related Issue
5+
<!-- Link to the issue this addresses: Fixes #NNN or Closes #NNN -->
6+
7+
## 🔄 Changes
8+
<!--
9+
Default: a flat bullet list (delete this comment when you fill in).
10+
11+
Optional — for large or mixed PRs, group under the headings below and omit any empty section:
12+
### ✨ Added
13+
### 🔧 Changed
14+
### 🗑️ Removed
15+
### 🐛 Fixed
16+
-->
17+
18+
## 🧪 Testing
19+
<!-- What testing was done? -->
20+
- [ ] `make test` passes
21+
- [ ] Unit tests added/updated
22+
- [ ] E2E tests added/updated (if applicable)
23+
24+
## ✅ Checklist
25+
- [ ] Follows commit message conventions
26+
- [ ] Commits are signed off (DCO)
27+
- [ ] Architecture docs updated (if applicable)
Lines changed: 70 additions & 9 deletions
@@ -1,21 +1,82 @@
11
# Agent Introspection
22

3-
> Stub — to be populated. See source code at `packages/data-designer/src/data_designer/cli/`.
3+
The agent introspection subsystem provides machine-readable CLI commands that let agents discover DataDesigner's type system, configuration state, and available operations at runtime.
4+
5+
Source: `packages/data-designer/src/data_designer/cli/commands/agent.py` and `packages/data-designer/src/data_designer/cli/utils/agent_introspection.py`
46

57
## Overview
6-
<!-- Agent CLI commands, type discovery, family specs -->
8+
9+
Agent introspection solves a specific problem: when an agent helps someone **author a dataset configuration** (columns, samplers, validators, processors, and related options), it needs an accurate catalog of what is available — including types added by installed plugins. Rather than hardcoding that knowledge or parsing source code, the agent can call `data-designer agent` commands to get structured, up-to-date information.
710

811
## Key Components
9-
<!-- Main classes, modules, entry points -->
12+
13+
### Commands
14+
15+
All commands live under the `data-designer agent` group:
16+
17+
| Command | Purpose |
18+
|---------|---------|
19+
| `data-designer agent context` | Full context dump: version, paths, type catalogs, model aliases, persona state, available operations |
20+
| `data-designer agent types [family]` | Type catalog for one or all families, with descriptions and source file locations |
21+
| `data-designer agent state model-aliases` | Configured model aliases with usability status (missing provider, missing API key, etc.) |
22+
| `data-designer agent state persona-datasets` | Available persona datasets with download status per locale |
23+
24+
### FamilySpec
25+
26+
Maps a **family name** to a **discriminated union type** and its **discriminator field**:
27+
28+
| Family | Union Type | Discriminator |
29+
|--------|-----------|---------------|
30+
| `columns` | `ColumnConfigT` | `column_type` |
31+
| `samplers` | `SamplerParamsT` | `sampler_type` |
32+
| `validators` | `ValidatorParamsT` | `validator_type` |
33+
| `processors` | `ProcessorConfigT` | `processor_type` |
34+
| `constraints` | `ColumnConstraintT` | `constraint_type` |
35+
36+
### Type Discovery
37+
38+
`discover_family_types` walks `typing.get_args(type_union)`, reads each Pydantic model's discriminator field annotation (which must be `Literal[...]`), and builds a map of discriminator string → model class. It also detects and reports duplicate discriminator values.
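
The discovery step can be illustrated with a small standard-library sketch. This is not the project's implementation: plain dataclasses stand in for Pydantic models, `discover_types` and the two parameter classes are hypothetical names, and raising on a duplicate is an assumed way of "reporting" it.

```python
from dataclasses import dataclass
from typing import Literal, Union, get_args, get_origin, get_type_hints


@dataclass
class CategoryParams:  # hypothetical stand-in for a Pydantic model
    sampler_type: Literal["category"] = "category"


@dataclass
class UniformParams:  # hypothetical stand-in for a Pydantic model
    sampler_type: Literal["uniform"] = "uniform"


# Toy union; the real unions are defined in the config layer.
SamplerParams = Union[CategoryParams, UniformParams]


def discover_types(type_union, discriminator):
    """Map each discriminator string to the union member that declares it."""
    found = {}
    for member in get_args(type_union):
        annotation = get_type_hints(member)[discriminator]
        if get_origin(annotation) is not Literal:
            raise TypeError(f"{member.__name__}.{discriminator} must be Literal[...]")
        # Literal["a", "b"] unpacks to ("a", "b").
        for value in get_args(annotation):
            if value in found:
                raise ValueError(f"duplicate discriminator value {value!r}")
            found[value] = member
    return found
```

Calling `discover_types(SamplerParams, "sampler_type")` returns a dict keyed by `"category"` and `"uniform"`, which is the shape of the discriminator-to-class map described above.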
39+
40+
`get_family_catalog` yields the class name and first docstring paragraph for each type — enough for an agent to understand what each type does without reading source code.
41+
42+
`get_family_source_files` uses `inspect.getfile` and normalizes paths under `data_designer/` (absolute path fallback for plugin types outside the tree).
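
Path normalization of this kind might look like the sketch below. The function names are hypothetical, and trimming at the last `data_designer` path component is an assumption about how "under `data_designer/`" is decided; only `inspect.getfile` is taken from the description above.

```python
import inspect
from pathlib import Path


def normalize_source_path(file_path, package_dir="data_designer"):
    """Return a package-relative path, or an absolute path as a fallback."""
    parts = Path(file_path).parts
    if package_dir in parts:
        # Trim everything before the last occurrence of the package directory.
        idx = len(parts) - 1 - parts[::-1].index(package_dir)
        return "/".join(parts[idx:])
    # Types defined outside the package tree (e.g. plugins) keep a full path.
    return str(Path(file_path).absolute())


def source_file_for(cls, package_dir="data_designer"):
    """Resolve the file that defines a class and normalize it."""
    return normalize_source_path(inspect.getfile(cls), package_dir)
```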
43+
44+
### State Commands
45+
46+
The state commands reuse the CLI's repository stack:
47+
- **Model aliases**: `ModelRepository` + `ProviderRepository` + `get_providers_with_missing_api_keys` to report usability status (configured, missing provider, missing API key)
48+
- **Personas**: `PersonaRepository` + `DownloadService` for locale availability and download status
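
The three usability states for a model alias can be sketched as a small classification function. Everything here is illustrative: the function name, its arguments, and the status strings are assumptions, not the actual return values of the repository classes named above.

```python
def alias_status(provider, known_providers, providers_missing_keys):
    """Classify a model alias by the state of the provider it points to."""
    if provider not in known_providers:
        return "missing_provider"
    if provider in providers_missing_keys:
        return "missing_api_key"
    return "configured"


def summarize_aliases(aliases, known_providers, providers_missing_keys):
    """Map alias name -> status for a dict of alias name -> provider name."""
    return {
        name: alias_status(provider, known_providers, providers_missing_keys)
        for name, provider in aliases.items()
    }
```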
49+
50+
### Error Handling
51+
52+
`AgentIntrospectionError` carries a `code`, `message`, and `details` dict. Commands catch these and output structured error information to stderr with exit code 1, making errors parseable by agents.
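
A minimal sketch of this pattern follows. The exception class here is a stand-in that only mirrors the three attributes named above; the JSON payload shape, the `run_command` wrapper, and the `lookup_family` example are assumptions made for illustration.

```python
import json
import sys


class AgentIntrospectionError(Exception):
    """Stand-in error carrying a machine-readable code and details."""

    def __init__(self, code, message, details=None):
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details or {}


def lookup_family(name, families=("columns", "samplers")):
    """Hypothetical command body that fails for unknown family names."""
    if name not in families:
        raise AgentIntrospectionError(
            "unknown_family", f"No such family: {name}", {"family": name}
        )
    return name


def run_command(func, *args, stream=None):
    """Run func; on a structured error, emit JSON and return exit code 1."""
    stream = stream if stream is not None else sys.stderr
    try:
        func(*args)
    except AgentIntrospectionError as err:
        payload = {"code": err.code, "message": err.message, "details": err.details}
        print(json.dumps(payload), file=stream)
        return 1
    return 0
```

An agent consuming this output can branch on the `code` field instead of matching on message text.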
53+
54+
### Command Registration
55+
56+
`AGENT_COMMANDS` in `agent_command_defs.py` drives both the lazy Typer command map in `main.py` and `get_operations()` in introspection. This single source of truth ensures the operations table in `agent context` output stays in sync with the actual commands.
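
The single-source-of-truth idea can be sketched as follows. The `CommandDef` fields, the import-path strings, and both derived functions are hypothetical; the real definitions live in `agent_command_defs.py` and may be shaped differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommandDef:  # hypothetical record type
    name: str
    import_path: str  # "module:function", resolved lazily by the CLI
    help: str


# One list of definitions; both views below are derived from it.
AGENT_COMMANDS = (
    CommandDef("context", "example.commands.agent:get_context", "Full context dump"),
    CommandDef("types", "example.commands.agent:get_types", "Type catalog"),
)


def lazy_command_map(defs):
    """View used by the CLI to import a handler only when it is invoked."""
    return {d.name: d.import_path for d in defs}


def get_operations(defs):
    """View reported to agents in the context output."""
    return [
        {"command": f"data-designer agent {d.name}", "description": d.help}
        for d in defs
    ]
```

Because both views are computed from the same tuple, adding or removing an entry changes the dispatch map and the reported operations together.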
1057

1158
## Data Flow
12-
<!-- How data moves through this subsystem -->
59+
60+
```
61+
Agent calls: data-designer agent types columns
62+
→ Typer dispatches to agent.get_types("columns")
63+
→ FamilySpec maps "columns" → ColumnConfigT union
64+
→ discover_family_types walks union members
65+
→ get_family_catalog extracts names + descriptions
66+
→ get_family_source_files resolves source locations
67+
→ Formatted output returned to agent
68+
```
1369

1470
## Design Decisions
15-
<!-- Why things are the way they are -->
71+
72+
- **Declarative type discovery from Pydantic unions** rather than maintaining a separate type inventory. The discriminated unions are the source of truth for what types exist (including plugins), so introspection reads directly from them.
73+
- **Structured errors with codes** enable agents to handle failures programmatically (retry, report, escalate) rather than parsing human-readable error messages.
74+
- **Single command registration source** (`AGENT_COMMANDS`) prevents the operations table from drifting out of sync with actual CLI commands.
75+
- **Source file resolution** helps agents navigate to implementations when they need to understand a type's behavior, not just its existence.
1676

1777
## Cross-References
18-
<!-- Links to related architecture docs -->
19-
- [System Architecture](overview.md)
20-
- [CLI](cli.md)
21-
- [Config Layer](config.md)
78+
79+
- [System Architecture](overview.md) — where agent introspection fits
80+
- [CLI](cli.md) — the CLI architecture that hosts these commands
81+
- [Config Layer](config.md) — the discriminated unions that introspection reads
82+
- [Plugins](plugins.md) — how plugin types appear in introspection results
