rajbos
diff --git a/.github/instructions/workflows.instructions.md b/.github/instructions/workflows.instructions.md
Lines changed: 113 additions & 0 deletions
diff --git a/.github/skills/azure-storage-loader/package-lock.json b/.github/skills/azure-storage-loader/package-lock.json
Lines changed: 27 additions & 8 deletions
diff --git a/.github/skills/check-urls/README.md b/.github/skills/check-urls/README.md
Lines changed: 28 additions & 0 deletions
diff --git a/.github/skills/check-urls/SKILL.md b/.github/skills/check-urls/SKILL.md
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,113 @@
+---
+applyTo: ".github/workflows/**"
+---
+
+# Workflow Security: Validating Untrusted User Input
+
+## Overview
+
+Workflows in this repository that are triggered by untrusted user input (issue
+bodies, PR descriptions, comments, branch names, etc.) **must** validate that
+input for hidden characters and potential prompt injection before processing it.
+
+This is especially important for workflows that pass user content to AI/LLM
+systems (e.g. GitHub Copilot agents), but also applies to any automated
+processing where a malicious actor could influence the workflow's behavior.
+
+## The Central Validation Script
+
+**`.github/workflows/validate-input.sh`** is the single, authoritative script
+for this check. It detects:
+
+| Threat | Description |
+|--------|-------------|
+| Bidirectional Unicode control characters | Trojan Source attack (CVE-2021-42574) — makes text look different to humans vs. AI |
+| Zero-width / invisible characters | Hidden text injected between visible characters, invisible to human reviewers |
+| Unicode tag characters (U+E0000–U+E007F) | Completely invisible; can encode arbitrary ASCII instructions |
+| Unicode variation selectors | Can steganographically encode hidden data |
+| HTML comment blocks (`<!-- ... -->`) | Stripped by GitHub's renderer but fully visible to LLMs processing raw Markdown |
+| Non-printable control characters | Unexpected control bytes that may confuse parsers |
+
+If any of the above are found, the script:
+1. **Posts a warning comment** to the issue or PR, listing every finding and
+   linking back to the workflow run that caught it.
+2. **Exits with a non-zero code**, failing the workflow job immediately so that
+   no further processing occurs on the untrusted content.
+
+## How to Use the Script in a Workflow
+
+Add a validation step **before** any step that reads or processes the untrusted
+input. The step must run after the repository is checked out (so the script file
+is available), and it needs a `GH_TOKEN` with write access to post comments.
+
+```yaml
+- name: Validate <input source> for hidden content
+  env:
+    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    INPUT_TEXT: ${{ github.event.issue.body }}   # ← the untrusted text
+    ITEM_NUMBER: ${{ github.event.issue.number }} # ← issue or PR number
+    REPO: ${{ github.repository }}
+    RUN_ID: ${{ github.run_id }}
+    SERVER_URL: ${{ github.server_url }}
+    CONTEXT_TYPE: issue          # "issue" or "pr"
+    FINDINGS_FILE: /tmp/validation-findings.txt
+  run: bash .github/workflows/validate-input.sh
+```
+
+For a pull request body, swap the event expressions and set `CONTEXT_TYPE: pr`:
+
+```yaml
+- name: Validate PR body for hidden content
+  env:
+    GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+    INPUT_TEXT: ${{ github.event.pull_request.body }}
+    ITEM_NUMBER: ${{ github.event.pull_request.number }}
+    REPO: ${{ github.repository }}
+    RUN_ID: ${{ github.run_id }}
+    SERVER_URL: ${{ github.server_url }}
+    CONTEXT_TYPE: pr
+    FINDINGS_FILE: /tmp/validation-findings.txt
+  run: bash .github/workflows/validate-input.sh
+```
+
+## Deciding Whether a Workflow Needs Validation
+
+Apply the validation step when **all** of the following are true:
+
+1. The workflow is triggered by a user-controllable event:
+   `issues`, `issue_comment`, `pull_request`, `pull_request_review`,
+   `pull_request_review_comment`, `discussion`, `discussion_comment`, etc.
+2. The workflow reads a **text field** from the event payload that a user wrote:
+   `.body`, `.title`, `.comment.body`, `.review.body`, branch names, etc.
+3. That text is subsequently processed by an automated system (especially an AI).
+
+You do **not** need the script for:
+- Purely numeric fields like `issue.number` or `pull_request.number`.
+- Internal, trusted triggers (`workflow_dispatch` with controlled inputs,
+  `push` to protected branches, `schedule`, etc.).
+- Metadata-only fields like `pull_request.draft` or `label.name`.
+
+## Permissions
+
+The validation step requires the `issues: write` (or `pull-requests: write`)
+permission on the job so the `gh` CLI can post the warning comment:
+
+```yaml
+jobs:
+  my-job:
+    permissions:
+      issues: write      # needed to post the warning comment
+      contents: read
+```
+
+## Keeping the Script Up to Date
+
+If you discover a new class of hidden-character or injection attack that is not
+already covered, add a new detection block to
+`.github/workflows/validate-input.sh`, alongside the existing clearly labelled
+detection sections. Keep detection logic inside the Python heredoc so Unicode
+handling is reliable across all runners.
+
+Document any new threat type with:
+- A short comment explaining the attack and why it is dangerous.
+- An example of the Unicode code points or patterns being detected.
+- A human-readable finding message added to the `findings` list.
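
Note on the section above: `validate-input.sh` itself is not part of the hunks shown here, so the following is only a minimal Python sketch of what a detection pass feeding a `findings` list might look like. The function names `describe` and `find_hidden_content`, the code-point sets, and the message wording are all assumptions made for illustration; the real script may organise its checks differently.

```python
import re
import unicodedata

# Assumed code-point sets for illustration; the real script may use others.
BIDI_CONTROLS = {0x061C, 0x200E, 0x200F,
                 *range(0x202A, 0x202F),   # U+202A..U+202E
                 *range(0x2066, 0x206A)}   # U+2066..U+2069
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def describe(ch):
    """Return a threat label for a suspicious character, or None if it is fine."""
    cp = ord(ch)
    if cp in BIDI_CONTROLS:
        return "bidirectional control character"
    if cp in ZERO_WIDTH:
        return "zero-width or invisible character"
    if 0xE0000 <= cp <= 0xE007F:
        return "Unicode tag character"
    if 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF:
        return "variation selector"
    # Tab, newline and carriage return are ordinary in Markdown bodies.
    if unicodedata.category(ch) == "Cc" and ch not in "\t\n\r":
        return "non-printable control character"
    return None


def find_hidden_content(text):
    """Collect one human-readable finding per distinct suspicious code point."""
    findings = []
    seen = set()
    for ch in text:
        label = describe(ch)
        if label is None or ch in seen:
            continue
        seen.add(ch)
        findings.append(f"{label} U+{ord(ch):04X}")
    if HTML_COMMENT.search(text):
        findings.append("HTML comment block")
    return findings
```

Under these assumptions, clean text yields an empty list, and a caller could post the findings as a comment and exit non-zero whenever the list is non-empty.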
@@ -0,0 +1,28 @@
+---
+title: Check URLs Skill
+description: Scan TypeScript source files for hardcoded URLs and verify they resolve
+lastUpdated: 2026-03-18
+---
+
+# Check URLs Skill
+
+A GitHub Copilot Agent Skill that finds all hardcoded `http(s)://` URLs in the TypeScript source files and verifies that each one still resolves.
+
+## Files in This Directory
+
+- **SKILL.md** — Main skill file with YAML frontmatter and detailed instructions for the agent
+- **check-urls.js** — Node.js script that performs the scan and HTTP resolution checks
+- **README.md** — This file
+
+## Quick Usage
+
+```bash
+node .github/skills/check-urls/check-urls.js
+```
+
+## How to Invoke via Copilot
+
+Ask Copilot something like:
+- "Check that all hardcoded URLs in the source code still resolve"
+- "Are any of the links in the fluency hints broken?"
+- "Validate all URL links in the TypeScript files"
@@ -0,0 +1,52 @@
+---
+name: check-urls
+description: Find all hardcoded URLs in TypeScript source files and verify they resolve (return HTTP 2xx/3xx). Use when you want to validate that links in tips, hints, and documentation strings are still live.
+---
+
+# Check URLs Skill
+
+This skill scans all TypeScript source files for hardcoded `http://` and `https://` URLs and performs HTTP HEAD requests to verify each one resolves without a 4xx/5xx error.
+
+## When to Use This Skill
+
+Use this skill when you need to:
+- Validate links added to fluency hints or tips in `maturityScoring.ts`
+- Check that VS Code docs URLs, tech.hub.ms video links, or any other hardcoded URLs are still live
+- Audit the codebase after bulk URL changes to catch 404s before a release
+- Routinely health-check external references as part of a maintenance pass
+
+## Running the Check
+
+```bash
+node .github/skills/check-urls/check-urls.js
+```
+
+The script will:
+1. Recursively scan every `*.ts` file under `src/`
+2. Extract all unique `https?://...` URLs (strips trailing punctuation, skips template literals)
+3. Send an HTTP HEAD request to each URL (with a 10-second timeout)
+4. If HEAD returns any 4xx status, automatically retry with GET, because some endpoints (e.g. intent URLs, social sharing endpoints) return 404/405 for HEAD but respond correctly to GET
+5. Print a summary showing ✅ OK, ⚠️ REDIRECT, or ❌ BROKEN for every URL
+6. Exit with code `1` if any URL still returns a 4xx or 5xx status after the HEAD request and any GET retry, or times out
+
+## Interpreting Output
+
+| Symbol | Meaning |
+|--------|---------|
+| ✅ OK | 2xx response — URL is live |
+| ⚠️ REDIRECT | 3xx response — URL redirects; consider updating to the final destination |
+| ❌ BROKEN | 4xx/5xx or connection failure — URL must be fixed |
+
+## After Finding Broken URLs
+
+1. **404 on tech.hub.ms**: The slug may have changed or the page may have been removed. Check `https://tech.hub.ms` to find the replacement and update `src/maturityScoring.ts`.
+2. **404 on code.visualstudio.com**: The VS Code docs may have been reorganised. Search [VS Code docs](https://code.visualstudio.com/docs) for the relevant topic and update the link.
+3. **Timeout**: May be a transient network issue. Re-run the script to confirm before changing anything.
+4. After fixing, re-run `node .github/skills/check-urls/check-urls.js` to confirm all URLs resolve.
+5. Run `npm run compile` to confirm the TypeScript build still passes.
+
+## Files in This Directory
+
+- **SKILL.md** — This file; instructions for the skill
+- **check-urls.js** — Node.js script that performs the URL scan and resolution check
+- **README.md** — Short overview of the skill
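
Note on SKILL.md above: `check-urls.js` is a Node.js script whose source is not shown in this diff. The following Python sketch only illustrates the extraction rule from "Running the Check" and the status mapping from "Interpreting Output"; it makes no network requests. The function names, the regular expression, and the set of trailing punctuation characters are assumptions, not the script's actual implementation.

```python
import re

# Assumed pattern: a URL ends at whitespace, a quote, a backtick, or a bracket.
URL_PATTERN = re.compile(r"https?://[^\s\"'`<>)\]]+")


def extract_urls(source):
    """Return unique URLs in first-seen order, skipping template placeholders."""
    urls = []
    for match in URL_PATTERN.findall(source):
        url = match.rstrip(".,;:!?")   # strip trailing punctuation
        if "${" in url:                # skip template-literal interpolation
            continue
        if url not in urls:
            urls.append(url)
    return urls


def should_retry_with_get(head_status):
    """Per step 4, only a 4xx answer to HEAD triggers a GET retry."""
    return 400 <= head_status < 500


def classify(status):
    """Map a final HTTP status (None for timeout or connection failure) to a label."""
    if status is None:
        return "BROKEN"
    if 200 <= status < 300:
        return "OK"
    if 300 <= status < 400:
        return "REDIRECT"
    # 4xx/5xx; treating any other status as broken is an assumption.
    return "BROKEN"
```

A caller would exit with code `1` if `classify` returns `"BROKEN"` for any extracted URL.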