NVIDIA · mosheabr · Jun 2, 2026 · Jun 2, 2026
@@ -103,6 +103,7 @@ For non-interactive installs, global installs, agent-specific installs, updates,
 | **cuOpt** | GPU-accelerated optimization — vehicle routing, linear programming, quadratic programming, installation, server deployment, and developer tools. | [`cuopt-developer`](skills/cuopt-developer), [`cuopt-install`](skills/cuopt-install), [`cuopt-numerical-optimization-api-c`](skills/cuopt-numerical-optimization-api-c), [`cuopt-numerical-optimization-api-cli`](skills/cuopt-numerical-optimization-api-cli), [`cuopt-numerical-optimization-api-python`](skills/cuopt-numerical-optimization-api-python), [`cuopt-numerical-optimization-formulation`](skills/cuopt-numerical-optimization-formulation), [`cuopt-routing-api-python`](skills/cuopt-routing-api-python), [`cuopt-routing-formulation`](skills/cuopt-routing-formulation), [`cuopt-server-api-python`](skills/cuopt-server-api-python), [`cuopt-server-common`](skills/cuopt-server-common), [`cuopt-skill-evolution`](skills/cuopt-skill-evolution), [`cuopt-user-rules`](skills/cuopt-user-rules) |
 | **cuPyNumeric** | NumPy and SciPy on multi-node multi-GPU systems — skills to help with installing cuPyNumeric, migrating existing NumPy code, and doing parallel I/O | [`cupynumeric-hdf5`](skills/cupynumeric-hdf5), [`cupynumeric-install`](skills/cupynumeric-install), [`cupynumeric-migration-readiness`](skills/cupynumeric-migration-readiness), [`cupynumeric-parallel-data-load`](skills/cupynumeric-parallel-data-load) |
 | **DALI** | GPU-accelerated data loading and processing with NVIDIA DALI. | [`dali-dynamic-mode`](skills/dali-dynamic-mode) |
+| **Data Designer** | Build declarative synthetic dataset generation pipelines with NeMo Data Designer. | [`data-designer`](skills/data-designer) |
 | **DeepStream** | Agentic skills for guided DeepStream development. | [`deepstream-dev`](skills/deepstream-dev), [`deepstream-import-vision-model`](skills/deepstream-import-vision-model) |
 | **Dynamo** | NVIDIA Dynamo deployment bring-up on Kubernetes — pick and deploy recipes, start router modes, validate disagg NIXL/UCX/NCCL interconnect, and triage day-2 failures. | [`dynamo-interconnect-check`](skills/dynamo-interconnect-check), [`dynamo-recipe-runner`](skills/dynamo-recipe-runner), [`dynamo-router-starter`](skills/dynamo-router-starter), [`dynamo-troubleshoot`](skills/dynamo-troubleshoot) |
 | **Earth2Studio** | Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows. | [`earth2studio-data-fetch`](skills/earth2studio-data-fetch), [`earth2studio-deterministic-forecast`](skills/earth2studio-deterministic-forecast), [`earth2studio-discover`](skills/earth2studio-discover), [`earth2studio-install`](skills/earth2studio-install) |
@@ -148,6 +149,7 @@ Per-product source repo links:
 | **cuOpt** | [Issues](https://github.com/NVIDIA/cuopt/issues) | [Discussions](https://github.com/NVIDIA/cuopt/discussions) | [Contributing](https://github.com/NVIDIA/cuopt/blob/main/CONTRIBUTING.md) | [Security](https://github.com/NVIDIA/cuopt/blob/main/SECURITY.md) |
 | **cuPyNumeric** | [Issues](https://github.com/nv-legate/cupynumeric/issues) | — | [Contributing](https://github.com/nv-legate/cupynumeric/blob/main/CONTRIBUTING.md) | — |
 | **DALI** | [Issues](https://github.com/NVIDIA/DALI/issues) | — | [Contributing](https://github.com/NVIDIA/DALI/blob/main/CONTRIBUTING.md) | — |
+| **Data Designer** | [Issues](https://github.com/NVIDIA-NeMo/DataDesigner/issues) | [Discussions](https://github.com/NVIDIA-NeMo/DataDesigner/discussions) | [Contributing](https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/CONTRIBUTING.md) | [Security](https://github.com/NVIDIA-NeMo/DataDesigner/blob/main/SECURITY.md) |
 | **DeepStream** | [Issues](https://github.com/NVIDIA-AI-IOT/DeepStream_Coding_Agent/issues) | — | [Contributing](https://github.com/NVIDIA-AI-IOT/DeepStream_Coding_Agent/blob/main/CONTRIBUTING.md) | [Security](https://github.com/NVIDIA-AI-IOT/DeepStream_Coding_Agent/blob/main/SECURITY.md) |
 | **Dynamo** | [Issues](https://github.com/ai-dynamo/dynamo/issues) | [Discussions](https://github.com/ai-dynamo/dynamo/discussions) | [Contributing](https://github.com/ai-dynamo/dynamo/blob/main/CONTRIBUTING.md) | [Security](https://github.com/ai-dynamo/dynamo/blob/main/SECURITY.md) |
 | **Earth2Studio** | [Issues](https://github.com/NVIDIA/earth2studio/issues) | [Discussions](https://github.com/NVIDIA/earth2studio/discussions) | [Contributing](https://github.com/NVIDIA/earth2studio/blob/main/CONTRIBUTING.md) | — |

@@ -7,7 +7,7 @@ This benchmark summarizes 3-Tier Evaluation from NVSkills-Eval results for the s
 ## Evaluation Summary
 
 - Skill: `cupynumeric-hdf5`
-- Evaluation date: 2026-05-29
+- Evaluation date: 2026-06-02
 - NVSkills-Eval profile: `external`
 - Environment: `local`
 - Dataset: 17 evaluation tasks
@@ -54,11 +54,11 @@ Task composition is derived from the evaluation dataset when possible. Entries w
 
 | Dimension | Num | `claude-code` | `codex` |
 |---|---:|---:|---:|
-| Security | 8 | 100% (+6%) | 100% (+0%) |
-| Correctness | 8 | 90% (+4%) | 93% (+9%) |
-| Discoverability | 8 | 80% (+17%) | 80% (+7%) |
-| Effectiveness | 8 | 90% (+4%) | 92% (+16%) |
-| Efficiency | 8 | 80% (+24%) | 73% (+7%) |
+| Security | 8 | 100% (+3%) | 100% (+0%) |
+| Correctness | 8 | 92% (+9%) | 96% (+12%) |
+| Discoverability | 8 | 88% (+20%) | 85% (+11%) |
+| Effectiveness | 8 | 93% (+12%) | 94% (+20%) |
+| Efficiency | 8 | 86% (+27%) | 79% (+12%) |
 
 Score values show skill-assisted performance. Values in parentheses show uplift versus the no-skill baseline when baseline data is available.
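
The note above implies that each table cell carries two numbers: the skill-assisted score and the uplift over the no-skill baseline. The sketch below shows one way to recover the baseline from a cell such as `92% (+9%)`. It assumes the uplift is expressed in percentage points (the source does not say whether it is absolute or relative), and `parse_cell` is a hypothetical helper, not part of NVSkills-Eval.

```python
import re

# Matches cells like "92% (+9%)" or "100% (+0%)".
CELL = re.compile(r"(\d+)%\s*\(([+-]\d+)%\)")


def parse_cell(cell):
    """Return (score, uplift, baseline) for one results-table cell.

    Assumption: uplift is in percentage points, so baseline = score - uplift.
    """
    match = CELL.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell format: {cell!r}")
    score = int(match.group(1))
    uplift = int(match.group(2))
    return score, uplift, score - uplift


# Correctness / claude-code cell from the updated table.
score, uplift, baseline = parse_cell("92% (+9%)")
print(score, uplift, baseline)
```

Under the percentage-point assumption, a cell with `(+0%)` means the baseline already matched the skill-assisted score.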
 

@@ -64,12 +64,12 @@
             "Unpacks each yield as `(chunk, offsets)` and converts the chunk with `cn.asarray`",
             "Places each chunk by its actual shape/offsets (accounts for clipped boundary chunks)",
             "Ends with a blocking execution fence",
-            "Clarifies that from_file_batched chunks the file read — the preallocated array (`cn.empty(shape)`) still has to fit in distributed memory",
+            "Clarifies that from_file_batched chunks the file read \u2014 the preallocated array (`cn.empty(shape)`) still has to fit in distributed memory",
             "Uses only documented legate.io.hdf5 API and does not invent a streaming-write counterpart"
         ],
         "expected_script": null,
         "expected_skill": "cupynumeric-hdf5",
-        "ground_truth": "The agent uses `from_file_batched(path, dataset_name, chunk_size)`, which yields one `LogicalArray` per chunk plus the offsets where that chunk belongs in the global shape. It preallocates the destination with `cn.empty(shape, dtype)` (reading shape/dtype from h5py first), then for each `(chunk, offsets)` places `cn.asarray(chunk)` at `out[r0:r0+chunk.shape[0], ...]` using each chunk's actual shape because boundary chunks are clipped. It ends with `get_legate_runtime().issue_execution_fence(block=True)`. It clarifies that `from_file_batched` chunks the source-file read, not the result — the preallocated array must still fit in distributed memory. It may note `from_file_batched` raises `ValueError` if `chunk_size` is non-positive or its length differs from the dataset rank.",
+        "ground_truth": "The agent uses `from_file_batched(path, dataset_name, chunk_size)`, which yields one `LogicalArray` per chunk plus the offsets where that chunk belongs in the global shape. It preallocates the destination with `cn.empty(shape, dtype)` (reading shape/dtype from h5py first), then for each `(chunk, offsets)` places `cn.asarray(chunk)` at `out[r0:r0+chunk.shape[0], ...]` using each chunk's actual shape because boundary chunks are clipped. It ends with `get_legate_runtime().issue_execution_fence(block=True)`. It clarifies that `from_file_batched` chunks the source-file read, not the result \u2014 the preallocated array must still fit in distributed memory. It may note `from_file_batched` raises `ValueError` if `chunk_size` is non-positive or its length differs from the dataset rank.",
         "id": "hdf5-005-batched-streaming",
         "question": "I have a very large HDF5 dataset I can't read into host memory in one shot. How do I load it into a distributed cuPyNumeric array a chunk at a time?",
         "should_trigger": true
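
The placement logic described in the `hdf5-005-batched-streaming` ground truth (preallocate the destination, then write each chunk at its offsets using the chunk's actual shape) can be sketched without a GPU or an HDF5 file. The example below uses plain NumPy in place of cuPyNumeric, and `iter_chunks` is a hypothetical stand-in for the batched reader that yields `(chunk, offsets)` pairs. It is not the `legate.io.hdf5` API, and it omits the execution fence that the real workflow ends with.

```python
import numpy as np


def iter_chunks(source, chunk_size):
    """Hypothetical stand-in for a batched reader.

    Yields (chunk, offsets) pairs; chunks at the boundary are clipped,
    so their shape can be smaller than chunk_size.
    """
    rows, cols = source.shape
    for r0 in range(0, rows, chunk_size[0]):
        for c0 in range(0, cols, chunk_size[1]):
            chunk = source[r0:r0 + chunk_size[0], c0:c0 + chunk_size[1]]
            yield chunk, (r0, c0)


def assemble(shape, dtype, chunks):
    """Preallocate the destination and place each chunk at its offsets."""
    out = np.empty(shape, dtype=dtype)
    for chunk, offsets in chunks:
        # Use the chunk's actual shape, not the nominal chunk size,
        # because boundary chunks are clipped.
        region = tuple(slice(o, o + n) for o, n in zip(offsets, chunk.shape))
        out[region] = np.asarray(chunk)
    return out


# A 7x5 array does not divide evenly into 3x2 chunks, so the last
# row and column of chunks are clipped.
source = np.arange(7 * 5, dtype=np.float64).reshape(7, 5)
result = assemble(source.shape, source.dtype, iter_chunks(source, (3, 2)))
assert np.array_equal(result, source)
```

Building the slice from `chunk.shape` rather than the nominal chunk size is what keeps the boundary writes correct. As the ground truth notes, the preallocated destination still has to fit in memory; only the read is chunked.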
@@ -83,7 +83,7 @@
         ],
         "expected_script": null,
         "expected_skill": "cupynumeric-hdf5",
-        "ground_truth": "The agent explains that `legate.io.hdf5` imports `h5py` at module load, so the whole module fails to import until h5py is installed. The fix is `conda install -c conda-forge h5py`. It notes h5py is not part of the default cuPyNumeric environment. It does not run the install command itself.",
+        "ground_truth": "The agent explains that `legate.io.hdf5` imports `h5py` at module load, so the whole module fails to import until h5py is installed. The fix is `conda install -c conda-forge h5py`. It notes h5py is not part of the default cuPyNumeric environment.  It does not run the install command itself.",
         "id": "hdf5-006-h5py-prerequisite",
         "question": "On a fresh cuPyNumeric env, `from legate.io.hdf5 import to_file` raises `ModuleNotFoundError: No module named 'h5py'`. cuPyNumeric and legate import fine. What do I need?",
         "should_trigger": true
@@ -206,7 +206,7 @@
         ],
         "expected_script": null,
         "expected_skill": null,
-        "ground_truth": "Parquet/tabular interchange is outside this single-array HDF5 skill. The useful answer routes to the cupynumeric-parallel-data-load skill — which owns cuPyNumeric's no-built-in-loader paths for Parquet/Arrow/custom layouts — or simply states that HDF5 is not the right API. It does not recommend legate-dataframe (not supported), and does not suggest writing a Parquet column via the HDF5 API.",
+        "ground_truth": "Parquet/tabular interchange is outside this single-array HDF5 skill. The useful answer routes to the cupynumeric-parallel-data-load skill \u2014 which owns cuPyNumeric's no-built-in-loader paths for Parquet/Arrow/custom layouts \u2014 or simply states that HDF5 is not the right API. It does not recommend legate-dataframe (not supported), and does not suggest writing a Parquet column via the HDF5 API.",
         "id": "hdf5-neg-004-parquet-cudf",
         "question": "I have a cuPyNumeric array I want to expose as a column in a Parquet dataset that the cuDF team will load. What's the right path?",
         "should_trigger": false

@@ -9,7 +9,7 @@ NVIDIA <br>
 ### License/Terms of Use: <br>
 CC-BY-4.0 OR Apache-2.0 <br>
 ## Use Case: <br>
-Developers and engineers who need to save or load cuPyNumeric arrays to and from HDF5 files for large-scale distributed HPC and scientific computing workflows. <br>
+Developers and engineers who need to save cuPyNumeric arrays to HDF5 files, load HDF5 datasets into distributed cuPyNumeric arrays, read large datasets in chunks, or accelerate HDF5 disk I/O with GPUDirect Storage for HPC pipelines. <br>
 
 ### Deployment Geography for Use: <br>
 Global <br>
@@ -19,15 +19,15 @@ Risk: Review before execution as proposals could introduce incorrect or misleadi
 Mitigation: Review and scan skill before deployment. <br>
 
 ## Reference(s): <br>
-- [Legate I/O API Documentation](https://docs.nvidia.com/legate/latest/api/python/io/index.html) <br>
-- [cuPyNumeric GitHub](https://github.com/nv-legate/cupynumeric) <br>
-- [HDF5 — The HDF Group](https://www.hdfgroup.org/solutions/hdf5/) <br>
-- [VFD GDS Plugin](https://github.com/nv-legate/vfd-gds) <br>
+- [Legate HDF5 I/O API Documentation](https://docs.nvidia.com/legate/latest/api/python/io/index.html) <br>
+- [cuPyNumeric GitHub Repository](https://github.com/nv-legate/cupynumeric) <br>
+- [HDF5 - The HDF Group](https://www.hdfgroup.org/solutions/hdf5/) <br>
+- [VFD-GDS Plugin (GPUDirect Storage for HDF5)](https://github.com/nv-legate/vfd-gds) <br>
 
 
 ## Skill Output: <br>
-**Output Type(s):** [Code, Configuration instructions] <br>
-**Output Format:** [Markdown with inline Python code blocks] <br>
+**Output Type(s):** [Code, Shell commands, Configuration instructions] <br>
+**Output Format:** [Markdown with inline Python and bash code blocks] <br>
 **Output Parameters:** [1D] <br>
 **Other Properties Related to Output:** [None] <br>
 
@@ -38,7 +38,7 @@ Mitigation: Review and scan skill before deployment. <br>
 
 
 ## Evaluation Tasks: <br>
-Evaluated against 17 tasks (11 positive activation, 6 negative activation) with 2 attempts per task via NVSkills-Eval. <br>
+Evaluated against 17 tasks (11 positive activation, 6 negative activation) with 2 attempts per task and a 50% pass threshold. <br>
 
 ## Evaluation Metrics Used: <br>
 Reported benchmark dimensions: <br>
@@ -62,11 +62,11 @@ Underlying evaluation signals used in this run: <br>
 ## Evaluation Results: <br>
 | Dimension | Num | `claude-code` | `codex` |
 |---|---:|---:|---:|
-| Security | 8 | 100% (+6%) | 100% (+0%) |
-| Correctness | 8 | 90% (+4%) | 93% (+9%) |
-| Discoverability | 8 | 80% (+17%) | 80% (+7%) |
-| Effectiveness | 8 | 90% (+4%) | 92% (+16%) |
-| Efficiency | 8 | 80% (+24%) | 73% (+7%) |
+| Security | 8 | 100% (+3%) | 100% (+0%) |
+| Correctness | 8 | 92% (+9%) | 96% (+12%) |
+| Discoverability | 8 | 88% (+20%) | 85% (+11%) |
+| Effectiveness | 8 | 93% (+12%) | 94% (+20%) |
+| Efficiency | 8 | 86% (+27%) | 79% (+12%) |
 
 ## Skill Version(s): <br>
 2.0.0 (source: frontmatter) <br>