80 changes: 59 additions & 21 deletions .agents/skills/create-pr/SKILL.md
@@ -1,6 +1,6 @@
---
name: create-pr
description: Create a GitHub PR with a well-formatted description including summary, categorized changes, and attention areas
description: Create a GitHub PR with a well-formatted description matching the repository PR template (flat Changes by default; optional Added/Changed/Removed/Fixed grouping)
argument-hint: [special instructions]
disable-model-invocation: true
metadata:
@@ -9,7 +9,7 @@ metadata:

# Create Pull Request

Create a well-formatted GitHub pull request for the current branch.
Create a well-formatted GitHub pull request for the current branch. The PR description must conform to the repository's PR template (`.github/PULL_REQUEST_TEMPLATE.md`).

## Arguments

@@ -40,15 +40,21 @@ Run these commands in parallel to understand the changes:

## Step 2: Analyze and Categorize Changes

### By Change Type (from commits and diff)
Use change types below to **decide** how to write the Changes section (flat vs grouped). You still describe testing under **Testing**, not under these buckets.

### By change type (internal checklist)
- ✨ **Added**: New files, features, capabilities
- 🔧 **Changed**: Modified existing functionality
- 🗑️ **Removed**: Deleted files or features
- 🐛 **Fixed**: Bug fixes
- 📚 **Docs**: Documentation updates
- 🧪 **Tests**: Test additions/modifications

### Identify Attention Areas 🔍
### When to use optional grouping in **Changes**
- **Flat bullet list** (default): Small PRs, single theme, or when categories would be sparse or redundant.
- **Grouped subheadings** (`### ✨ Added`, `### 🔧 Changed`, `### 🗑️ Removed`, `### 🐛 Fixed`): Large PRs, release-note-style summaries, or clearly distinct fix + feature mixes. **Omit any empty section** - do not leave placeholder headings.

### Identify attention areas
Flag for special reviewer attention:
- Files with significant changes (>100 lines)
- Changes to base classes, interfaces, or public API
@@ -75,35 +81,63 @@ If commits have mixed types, use the primary/most significant type.
git push -u origin <branch-name>
```

2. **Create PR** using this template:
2. **Build the PR body** using the repository's template structure.

**Default - flat Changes** (remove the HTML comment block from the template when filling in, or replace with your bullets only):

```markdown
## 📋 Summary

[1-2 sentence overview of what this PR accomplishes]
[1-3 sentences: what this PR does and why. Focus on the "why".]

## 🔗 Related Issue

[Fixes #NNN or Closes #NNN - link to the issue this addresses]

## 🔄 Changes

- [Bullet list of key changes]
- [Link to key files when helpful for reviewers]
- [Reference commits for specific changes in multi-commit PRs]

## 🧪 Testing

- [x] `make test` passes
- [x] Unit tests added/updated (or: N/A - no testable logic)
- [ ] E2E tests added/updated (if applicable)

## ✅ Checklist

- [x] Follows commit message conventions
- [x] Commits are signed off (DCO)
- [ ] Architecture docs updated (if applicable)
```

**Optional - grouped Changes** (only when Step 2 criteria apply; omit empty sections):

```markdown
## 🔄 Changes

### ✨ Added
- [New features/files - link to key files when helpful]
- [...]

### 🔧 Changed
- [Modified functionality - reference commits for specific changes]

### 🗑️ Removed
- [Deleted items]
- [...]

### 🐛 Fixed
- [Bug fixes - if applicable]
- [...]
```

(Include `### 🗑️ Removed` only when something was deleted.)

If there are genuinely important attention areas for reviewers, add an **Attention Areas** section after Changes:

```markdown
## 🔍 Attention Areas

> ⚠️ **Reviewers:** Please pay special attention to the following:

- [`path/to/critical/file.py`](https://github.com/<owner>/<repo>/blob/<branch>/path/to/critical/file.py) - [Why this needs attention]

---
🤖 *Generated with AI*
- [`path/to/critical/file.py`](https://github.com/<owner>/<repo>/blob/<branch>/path/to/critical/file.py) - [Why this needs attention]
```

3. **Execute**:
@@ -118,20 +152,24 @@ If commits have mixed types, use the primary/most significant type.

## Section Guidelines

- **Summary**: Always include - be concise and focus on the "why"
- **Changes**: Group by type, omit empty sections
- **Summary**: Always include - be concise and focus on the "why", not just the "what"
- **Related Issue**: Always include if an issue exists. Use `Fixes #NNN` for bugs, `Closes #NNN` for features/tasks
- **Changes**: Default to a flat list. Use Added/Changed/Removed/Fixed subheadings only for large or mixed PRs; never emit empty subsection headings
- **Testing**: Check off items that apply. Mark N/A items explicitly rather than leaving them unchecked without explanation
- **Checklist**: Check off items that are true. Leave unchecked with a note if something doesn't apply
- **Attention Areas**: Only include if there are genuinely important items; omit for simple PRs
- **Links**: Include links to code and commits where helpful for reviewers:
- **File links require full URLs** - relative paths don't work in PR descriptions
- **File links require full URLs** - relative paths don't work in PR descriptions
- Link to a file: `[filename](https://github.com/<owner>/<repo>/blob/<branch>/path/to/file.py)`
- Link to specific lines: `[description](https://github.com/<owner>/<repo>/blob/<branch>/path/to/file.py#L42-L50)`
- Use the branch name (from Step 1) in the URL so links point to the PR's version of files
- Reference commits: `abc1234` - GitHub auto-links short commit SHAs in PR descriptions
- Reference commits: `abc1234` - GitHub auto-links short commit SHAs in PR descriptions
- For multi-commit PRs, reference individual commits when describing specific changes
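
The link formats above can be assembled mechanically. The following is a minimal sketch, not part of the skill itself: the helper names `blob_url` and `markdown_link` are hypothetical, and the owner, repo, and branch values are whatever Step 1 produced.

```python
from typing import Optional
from urllib.parse import quote


def blob_url(owner: str, repo: str, branch: str, path: str,
             start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Build a full GitHub blob URL, optionally anchored to a line or line range.

    Hypothetical helper for illustration; it only formats the URL pattern
    shown in the guidelines and does not call GitHub.
    """
    # Keep "/" unescaped so branch names like "feature/x" and nested paths survive.
    url = (
        f"https://github.com/{owner}/{repo}/blob/"
        f"{quote(branch, safe='/')}/{quote(path, safe='/')}"
    )
    if start is not None:
        url += f"#L{start}"
        if end is not None and end != start:
            url += f"-L{end}"
    return url


def markdown_link(text: str, url: str) -> str:
    """Wrap a URL in Markdown link syntax."""
    return f"[{text}]({url})"
```

For example, `markdown_link("retry logic", blob_url("<owner>", "<repo>", "<branch>", "path/to/file.py", 42, 50))` yields a link in the "specific lines" form described above.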

## Edge Cases

- **No changes**: Inform user there's nothing to create a PR for
- **Uncommitted work**: Warn and ask before proceeding
- **Large PRs** (>20 files): Summarize by directory/module
- **Large PRs** (>20 files): Summarize by directory/module; grouped Changes often helps here
- **Single commit**: PR title can match commit message
- **No related issue**: Note "N/A" in the Related Issue section rather than omitting it
16 changes: 16 additions & 0 deletions .agents/skills/review-code/SKILL.md
@@ -94,6 +94,11 @@ Read the following files at the repository root to load the project's standards
- **`STYLEGUIDE.md`** - code style rules (formatting, naming, imports, type annotations), design principles (DRY, KISS, YAGNI, SOLID), common pitfalls, lazy loading and `TYPE_CHECKING` patterns
- **`DEVELOPMENT.md`** - testing patterns and expectations

**Documentation sources (load when the changeset touches matching areas):**

- **`architecture/*.md`** - subsystem maps aligned with `packages/` (e.g. `engine/mcp/` ↔ `architecture/mcp.md`). Use these to verify that the PR does not leave the documented architecture out of date relative to the new behavior.
- **`docs/`** - published user-facing documentation. Cross-check when public API, CLI behavior, or config surface changes would affect what readers are told.

Use these guidelines as the baseline for the entire review. Project-specific rules take precedence over general best practices.

## Step 3: Understand the Scope
@@ -147,6 +152,17 @@ Re-read the changed files with a focus on **structure and design of the new/modi
- Obvious inefficiencies introduced by this change (N+1 queries, repeated computation, unnecessary copies)
- Appropriate data structures for the access pattern

**Documentation alignment (same pass - scoped, not a full docs audit):**

When **code** under `packages/` changes behavior, structure, or public contracts in a way that a maintainer would reasonably describe in `architecture/` or `docs/`:

1. Identify the closest **`architecture/<topic>.md`** (and any obvious `docs/` pages) for that subsystem.
2. If the PR **also edits** those docs, sanity-check that the edits match the code.
3. If the PR **does not** edit docs but the change **contradicts** what `architecture/` or `docs/` currently asserts, flag it (**Warnings** if contributors rely on that text; **Suggestions** if impact is narrow). Suggest updating the same PR or an explicit follow-up issue.
4. **Skip** this check for pure refactors with no observable behavior change, typo-only PRs, or changes already limited to documentation.

The local **`search-docs`** skill can help locate `docs/` pages by topic when the right file is not obvious.

### Pass 3: Standards, Testing & Polish

Final pass focused on **project conventions and test quality for new/modified code only**:
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/bug-report.yml
@@ -44,8 +44,28 @@ body:
      placeholder: A clear and concise description of what you expected to happen.
    validations:
      required: true
  - type: textarea
    id: agent-diagnostic
    attributes:
      label: Agent Diagnostic / Prior Investigation
      description: |
        If you used an agent, paste the output from its investigation (for example, what it found in the docs or issue tracker).
        If you couldn't or didn't use an agent, briefly say why and include the troubleshooting you already tried.
      placeholder: |
        Paste agent output here, or describe the manual investigation you performed.
  - type: textarea
    id: context
    attributes:
      label: Additional context
      placeholder: Add any other context about the problem here (e.g., screenshots, logs, browser version).
  - type: checkboxes
    id: checklist
    attributes:
      label: Checklist
      options:
        - label: I reproduced this issue or provided a minimal example
          required: true
        - label: I searched the docs/issues myself, or had my agent do so
          required: true
        - label: If I used an agent, I included its diagnostics above
          required: false
5 changes: 4 additions & 1 deletion .github/ISSUE_TEMPLATE/config.yml
@@ -2,4 +2,7 @@ blank_issues_enabled: false
contact_links:
  - name: 💬 Ask a Question
    url: https://github.com/NVIDIA-NeMo/DataDesigner/discussions
    about: Please use GitHub Discussions for general questions.
    about: >-
      Have a question? Try pointing your agent at the repo first - it can search docs,
      find issues, and more. See CONTRIBUTING.md for the recommended workflow.
      If that doesn't help, use GitHub Discussions.
16 changes: 15 additions & 1 deletion .github/ISSUE_TEMPLATE/development-task.yml
@@ -1,5 +1,5 @@
name: 🛠️ Development Task
description: Track internal development work, refactoring, or infrastructure
description: Track internal development work, refactoring, or infrastructure changes
labels: ["task"]
body:
  - type: dropdown
@@ -25,6 +25,20 @@ body:
    attributes:
      label: Technical Details & Implementation Plan
      placeholder: Describe the technical approach, files affected, or logic changes.
  - type: textarea
    id: investigation
    attributes:
      label: Investigation / Context
      description: |
        Relevant issue links, architecture context, or notes from prior exploration.
      placeholder: Link related issues, reference architecture docs, or describe relevant context.
  - type: textarea
    id: agent-plan
    attributes:
      label: Agent Plan / Findings
      description: |
        If an agent investigated this task, paste its findings or proposed plan here.
      placeholder: Paste agent output here, if applicable.
  - type: input
    id: dependencies
    attributes:
16 changes: 16 additions & 0 deletions .github/ISSUE_TEMPLATE/feature-request.yml
@@ -38,8 +38,24 @@ body:
    attributes:
      label: Describe alternatives you've considered
      placeholder: A clear and concise description of any alternative solutions or features you've considered.
  - type: textarea
    id: agent-investigation
    attributes:
      label: Agent Investigation
      description: |
        If your agent explored the codebase to assess feasibility (for example by searching project documentation or existing issues), paste its findings here.
      placeholder: Paste agent output here, if applicable.
  - type: textarea
    id: context
    attributes:
      label: Additional context
      placeholder: Add any other context or screenshots about the feature request here.
  - type: checkboxes
    id: checklist
    attributes:
      label: Checklist
      options:
        - label: I've reviewed existing issues and the documentation
          required: true
        - label: This is a design proposal, not a "please build this" request
          required: true
27 changes: 27 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,27 @@
## 📋 Summary
<!-- 1-3 sentences: what this PR does and why -->

## 🔗 Related Issue
<!-- Link to the issue this addresses: Fixes #NNN or Closes #NNN -->

## 🔄 Changes
<!--
Default: a flat bullet list (delete this comment when you fill in).

Optional - for large or mixed PRs, group under the headings below and omit any empty section:
### ✨ Added
### 🔧 Changed
### 🗑️ Removed
### 🐛 Fixed
-->

## 🧪 Testing
<!-- What testing was done? -->
- [ ] `make test` passes
- [ ] Unit tests added/updated
- [ ] E2E tests added/updated (if applicable)

## ✅ Checklist
- [ ] Follows commit message conventions
- [ ] Commits are signed off (DCO)
- [ ] Architecture docs updated (if applicable)
79 changes: 70 additions & 9 deletions architecture/agent-introspection.md
@@ -1,21 +1,82 @@
# Agent Introspection

> Stub - to be populated. See source code at `packages/data-designer/src/data_designer/cli/`.
The agent introspection subsystem provides machine-readable CLI commands that let agents discover DataDesigner's type system, configuration state, and available operations at runtime.

Source: `packages/data-designer/src/data_designer/cli/commands/agent.py` and `packages/data-designer/src/data_designer/cli/utils/agent_introspection.py`

## Overview
<!-- Agent CLI commands, type discovery, family specs -->

Agent introspection solves a specific problem: when an agent helps someone **author a dataset configuration** (columns, samplers, validators, processors, and related options), it needs an accurate catalog of what is available, including types added by installed plugins. Rather than hardcoding that knowledge or parsing source code, the agent can call `data-designer agent` commands to get structured, up-to-date information.

## Key Components
<!-- Main classes, modules, entry points -->

### Commands

All commands live under the `data-designer agent` group:

| Command | Purpose |
|---------|---------|
| `data-designer agent context` | Full context dump: version, paths, type catalogs, model aliases, persona state, available operations |
| `data-designer agent types [family]` | Type catalog for one or all families, with descriptions and source file locations |
| `data-designer agent state model-aliases` | Configured model aliases with usability status (missing provider, missing API key, etc.) |
| `data-designer agent state persona-datasets` | Available persona datasets with download status per locale |

### FamilySpec

Maps a **family name** to a **discriminated union type** and its **discriminator field**:

| Family | Union Type | Discriminator |
|--------|-----------|---------------|
| `columns` | `ColumnConfigT` | `column_type` |
| `samplers` | `SamplerParamsT` | `sampler_type` |
| `validators` | `ValidatorParamsT` | `validator_type` |
| `processors` | `ProcessorConfigT` | `processor_type` |
| `constraints` | `ColumnConstraintT` | `constraint_type` |

### Type Discovery

`discover_family_types` walks `typing.get_args(type_union)`, reads each Pydantic model's discriminator field annotation (must be `Literal[...]`), and builds a map of discriminator string → model class. It also detects and reports duplicate discriminator values.
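
The union walk described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's implementation: `discover_types`, `TextColumn`, and `NumberColumn` are hypothetical names, plain annotated classes stand in for Pydantic models, and raising `ValueError` on duplicates is one possible way to report them.

```python
from typing import Dict, Literal, Union, get_args, get_origin, get_type_hints


class TextColumn:
    """Hypothetical stand-in for a column config model."""
    column_type: Literal["text"]


class NumberColumn:
    """Hypothetical stand-in for a second column config model."""
    column_type: Literal["number"]


# Stand-in for a discriminated union such as ColumnConfigT.
ColumnUnion = Union[TextColumn, NumberColumn]


def discover_types(type_union: object, discriminator: str) -> Dict[str, type]:
    """Map each discriminator value to the union member that declares it."""
    discovered: Dict[str, type] = {}
    duplicates = []
    for model in get_args(type_union):
        annotation = get_type_hints(model).get(discriminator)
        # The discriminator must be annotated as Literal[...].
        if get_origin(annotation) is not Literal:
            raise TypeError(f"{model.__name__}.{discriminator} is not a Literal")
        for value in get_args(annotation):
            if value in discovered:
                duplicates.append(value)
            else:
                discovered[value] = model
    if duplicates:
        raise ValueError(f"duplicate discriminator values: {sorted(duplicates)}")
    return discovered
```

Because the map is derived from the union itself, a member added to the union (for example by a plugin) shows up without any separate registry update.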

`get_family_catalog` yields the class name and first docstring paragraph for each type, which is enough for an agent to understand what each type does without reading source code.

`get_family_source_files` uses `inspect.getfile` and normalizes paths under `data_designer/` (absolute path fallback for plugin types outside the tree).
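
A rough sketch of that normalization follows. It assumes the rule is "trim everything before the first `data_designer` path component, otherwise keep the absolute path"; the function names are hypothetical and the real helper may differ in detail.

```python
import inspect
from pathlib import Path


def normalize_source_path(path: str, package_dir: str = "data_designer") -> str:
    """Return the path relative to the package root, or the full path as a fallback."""
    parts = Path(path).parts
    if package_dir in parts:
        # Keep the package directory and everything below it.
        return "/".join(parts[parts.index(package_dir):])
    # Types defined outside the tree (e.g. plugins) keep their absolute path.
    return str(Path(path))


def source_file_for(cls: type, package_dir: str = "data_designer") -> str:
    """Resolve the file that defines a class, then normalize it."""
    return normalize_source_path(inspect.getfile(cls), package_dir)
```

Keeping the string handling separate from `inspect.getfile` makes the normalization rule easy to check without importing real model classes.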

### State Commands

The state commands reuse the CLI's repository stack:
- **Model aliases**: `ModelRepository` + `ProviderRepository` + `get_providers_with_missing_api_keys` to report usability status (configured, missing provider, missing API key)
- **Personas**: `PersonaRepository` + `DownloadService` for locale availability and download status

### Error Handling

`AgentIntrospectionError` carries a `code`, `message`, and `details` dict. Commands catch these and output structured error information to stderr with exit code 1, making errors parseable by agents.
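
The pattern can be sketched as follows. Everything here is an assumption made for illustration: `DemoIntrospectionError` is a stand-in for the real exception, and the JSON shape written to stderr is a guess, since the text above only says the output is structured.

```python
import io
import json
import sys
from typing import Any, Callable, Dict, Optional, TextIO


class DemoIntrospectionError(Exception):
    """Stand-in error carrying a machine-readable code, a message, and details."""

    def __init__(self, code: str, message: str,
                 details: Optional[Dict[str, Any]] = None) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.details = details or {}

    def to_payload(self) -> Dict[str, Any]:
        return {"code": self.code, "message": self.message, "details": self.details}


def run_command(command: Callable[[], None], stderr: Optional[TextIO] = None) -> int:
    """Run a command; on a structured error, write JSON to stderr and return 1."""
    stream = stderr if stderr is not None else sys.stderr
    try:
        command()
    except DemoIntrospectionError as exc:
        # Assumed output format: a single JSON object on one line.
        print(json.dumps({"error": exc.to_payload()}), file=stream)
        return 1
    return 0


def unknown_family() -> None:
    raise DemoIntrospectionError(
        "unknown_family", "Unknown family: 'widgets'", {"family": "widgets"}
    )


# Capture stderr in memory to show what a calling agent would parse.
captured = io.StringIO()
exit_code = run_command(unknown_family, stderr=captured)
```

An agent can branch on the `code` field instead of pattern-matching the human-readable message.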

### Command Registration

`AGENT_COMMANDS` in `agent_command_defs.py` drives both the lazy Typer command map in `main.py` and `get_operations()` in introspection. This single source of truth ensures the operations table in `agent context` output stays in sync with the actual commands.
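
As a sketch of the single-source idea, both views below are derived from one tuple of definitions. The names `CommandDef`, `DEMO_COMMANDS`, `lazy_command_map`, and the `demo_cli.agent:...` import paths are hypothetical; the actual structure of `AGENT_COMMANDS` is not shown in this document.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class CommandDef:
    """One command definition; every derived view reads from this record."""
    name: str
    description: str
    target: str  # "module:function" path, resolved lazily by the CLI


# Hypothetical definitions standing in for the real command table.
DEMO_COMMANDS = (
    CommandDef("context", "Full context dump", "demo_cli.agent:context"),
    CommandDef("types", "Type catalog for one or all families", "demo_cli.agent:types"),
)


def lazy_command_map(commands: Sequence[CommandDef]) -> Dict[str, str]:
    """View 1: command name to import path, for lazy CLI registration."""
    return {c.name: c.target for c in commands}


def get_operations(commands: Sequence[CommandDef],
                   prog: str = "data-designer agent") -> List[Dict[str, str]]:
    """View 2: the operations table reported to agents."""
    return [
        {"command": f"{prog} {c.name}", "description": c.description}
        for c in commands
    ]
```

Since both functions iterate over the same tuple, adding or removing a definition changes the CLI map and the reported operations together.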

## Data Flow
<!-- How data moves through this subsystem -->

```
Agent calls: data-designer agent types columns
  → Typer dispatches to agent.get_types("columns")
  → FamilySpec maps "columns" → ColumnConfigT union
  → discover_family_types walks union members
  → get_family_catalog extracts names + descriptions
  → get_family_source_files resolves source locations
  → Formatted output returned to agent
```

## Design Decisions
<!-- Why things are the way they are -->

- **Declarative type discovery from Pydantic unions** rather than maintaining a separate type inventory. The discriminated unions are the source of truth for what types exist (including plugins), so introspection reads directly from them.
- **Structured errors with codes** enable agents to handle failures programmatically (retry, report, escalate) rather than parsing human-readable error messages.
- **Single command registration source** (`AGENT_COMMANDS`) prevents the operations table from drifting out of sync with actual CLI commands.
- **Source file resolution** helps agents navigate to implementations when they need to understand a type's behavior, not just its existence.

## Cross-References
<!-- Links to related architecture docs -->
- [System Architecture](overview.md)
- [CLI](cli.md)
- [Config Layer](config.md)

- [System Architecture](overview.md) - where agent introspection fits
- [CLI](cli.md) - the CLI architecture that hosts these commands
- [Config Layer](config.md) - the discriminated unions that introspection reads
- [Plugins](plugins.md) - how plugin types appear in introspection results