feat(resources): improve CodeQL MaD extensions support #266
Draft
data-douser wants to merge 17 commits into main from dd/mad-ql/1
2f1b6e3 feat(resources): add CodeQL MaD extensions support (data-douser)
e1287a3 Sync server/dist/** (data-douser)
0c68c46 Register MaD library-modeling resources for rust and swift; address r… (Copilot)
228b68f Merge branch 'main' into dd/mad-ql/1 (data-douser)
cffb3f8 Build(deps): bump actions/setup-go from 5.6.0 to 6.4.0 (#265) (dependabot[bot])
b80df0d Update MaD tuple format in overview resource (data-douser)
db62263 Fix invalid JSON Schema for `query_results_cache_retrieve` (#263) (Copilot)
8a63678 Build(deps): bump actions/setup-node from 6.3.0 to 6.4.0 (#264) (dependabot[bot])
d74ee0f Update MaD tuple format in overview resource (data-douser)
46fdc62 Merge branch 'dd/mad-ql/1' of github.com:advanced-security/codeql-dev… (data-douser)
4da5417 feat(resources): add CodeQL MaD extensions support (data-douser)
9787e0a Sync server/dist/** (data-douser)
b17d0f6 Register MaD library-modeling resources for rust and swift; address r… (Copilot)
4a69c81 Update MaD tuple format in overview resource (data-douser)
93ab431 Rebuild server/dist/codeql-development-mcp-server.js.map (data-douser)
b573add Merge branch 'dd/mad-ql/1' of github.com:advanced-security/codeql-dev… (data-douser)
0a37154 wire data-extension-development prompt and fix type errors (data-douser)
149 changes: 149 additions & 0 deletions
server/src/prompts/data-extension-development.prompt.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,149 @@ | ||
| --- | ||
| agent: agent | ||
| --- | ||
|
|
||
| # Data Extension Development Workflow | ||
|
|
||
| Use this workflow to create CodeQL data extensions (Models-as-Data) for third-party libraries and frameworks. Data extensions let you customize taint tracking without writing QL code — you author YAML files that declare which functions are sources, sinks, summaries, barriers, or barrier guards. | ||
|
|
||
| For format reference, read the MCP resource: `codeql://learning/data-extensions` | ||
| For language-specific guidance, read the corresponding `codeql://languages/<language>/library-modeling` resource. Available for: `cpp`, `csharp`, `go`, `java`, `javascript`, `python`, `ruby`, `rust`, `swift`. | ||
|
|
||
| ## Workflow Checklist | ||
|
|
||
| ### Phase 1: Identify the Target | ||
|
|
||
| - [ ] **Confirm the target library and language** | ||
|   - Library name and version: {{libraryName}} | ||
|   - Target language: {{language}} | ||
|   - Determine the model format: | ||
|     - **MaD tuple format** (9–10 column tuples): C/C++ (`codeql/cpp-all`), C# (`codeql/csharp-all`), Go (`codeql/go-all`), Java/Kotlin (`codeql/java-all`), Swift (`codeql/swift-all`) | ||
|     - **API Graph format** (3–5 column tuples): JavaScript/TypeScript (`codeql/javascript-all`), Python (`codeql/python-all`), Ruby (`codeql/ruby-all`) | ||
|     - **Rust format**: Rust (`codeql/rust-all`) uses its own crate-path-based model format; follow `codeql://languages/rust/library-modeling` | ||
|   - Using the wrong format will cause the extension to silently fail to load. | ||
|
|
||
| - [ ] **Locate a CodeQL database** | ||
|   - Tool: #list_codeql_databases | ||
|   - Or create one: #codeql_database_create | ||
|   - The database must contain code that exercises the target library | ||
|
|
||
| - [ ] **Explore the library's API surface** | ||
|   - Tool: #read_database_source (browse source files to identify relevant API calls) | ||
|   - Tool: #codeql_query_run with `queryName="PrintAST"` (visualize how library calls are represented) | ||
|   - Skim the library's public API docs, type stubs, or source code | ||
|
|
||
| ### Phase 2: Classify the API Surface | ||
|
|
||
| For each public function or method in the library, classify it: | ||
|
|
||
| 1. **Does it return data from outside the program** (network, file, env, stdin)? → `sourceModel` with `kind` matching the threat model (usually `"remote"`) | ||
| 2. **Does it consume data in a security-sensitive operation** (SQL, exec, path, redirect, eval, deserialize)? → `sinkModel` with `kind` matching the vulnerability class (e.g. `"sql-injection"`, `"command-injection"`) | ||
| 3. **Does it pass data through opaque library code** (encode, decode, wrap, copy, iterate)? → `summaryModel` with `kind: "taint"` (derived) or `kind: "value"` (identity) | ||
| 4. **Does it sanitize data so its output is safe for a specific sink kind?** → `barrierModel` with `kind` matching the sink kind it neutralizes | ||
| 5. **Does it return a boolean indicating whether data is safe?** → `barrierGuardModel` with the appropriate `acceptingValue` (`"true"` or `"false"`) and matching `kind` | ||
| 6. **Is the type a subclass of something already modeled?** → `typeModel` (API Graph languages) or set the `subtypes` column to `True` (MaD tuple languages) | ||
| 7. **Did the auto-generated model assign a wrong summary?** → `neutralModel` to suppress it | ||
|
|
||
| A complete chain of **source → (summary\*) → sink** is required for end-to-end findings; missing a single hop will cause false negatives. | ||
|
|
||
| ### Phase 3: Choose the Deployment Scope | ||
|
|
||
| Choose between two paths: | ||
|
|
||
| - **Single-repo shortcut** — drop `.model.yml` files under `.github/codeql/extensions/<pack-name>/` in the consuming repo. **No `codeql-pack.yml` is required**; Code Scanning auto-loads extensions from this directory. Use when the models only need to apply to one repo. | ||
| - **Reusable model pack** — create a pack directory with a `codeql-pack.yml` declaring `extensionTargets` and `dataExtensions`. Use when models will be consumed by multiple repos or by org-wide Default Setup. | ||
|
|
||
| ### Phase 4: Author the `.model.yml` File(s) | ||
|
|
||
| - [ ] **Create the model file** | ||
|   - Use naming convention `<library>-<module>.model.yml` (lowercase, hyphen-separated) | ||
|   - Split per logical module rather than putting an entire ecosystem in one file | ||
|   - Read `codeql://languages/{{language}}/library-modeling` for the exact column layout and examples | ||
|
|
||
| - [ ] **Write the YAML with correct extensible predicates** | ||
|
|
||
| ```yaml | ||
| extensions: | ||
|   - addsTo: | ||
|       pack: codeql/{{language}}-all | ||
|       extensible: sinkModel | ||
|     data: | ||
|       # Add tuples here; column count must exactly match the predicate schema | ||
|       - [...] | ||
| ``` | ||
|
|
||
| - Every row must have the **exact column count** for its extensible predicate; a row with the wrong number of columns can be silently ignored or cause a load error | ||
| - Set the `provenance` column to `"manual"` (MaD tuple format) for hand-written rows | ||
| - Ensure `kind` values match across the chain (e.g. a `"sql-injection"` barrier must guard a `"sql-injection"` sink) | ||
|
|
||
| ### Phase 5: Configure `codeql-pack.yml` (Model-Pack Path Only) | ||
|
|
||
| Skip this step if you chose the `.github/codeql/extensions/` shortcut in Phase 3. | ||
|
|
||
| For a reusable pack, create or update `codeql-pack.yml`: | ||
|
|
||
| ```yaml | ||
| name: <org>/<language>-<pack-name> | ||
| version: 0.0.1 | ||
| library: true | ||
| extensionTargets: | ||
|   codeql/<language>-all: '*' | ||
| dataExtensions: | ||
|   - models/**/*.yml | ||
| ``` | ||
|
|
||
| - `library: true` — model packs are always libraries, never queries | ||
| - `extensionTargets` — names the upstream pack the extensions extend | ||
| - `dataExtensions` — a glob that picks up every `.model.yml` you author | ||
|
|
||
| - [ ] **Install pack dependencies** | ||
|   - Tool: #codeql_pack_install (resolve dependencies for the model pack) | ||
|
|
||
| ### Phase 6: Test with `codeql query run` | ||
|
|
||
| Validate the model against a real database: | ||
|
|
||
| - [ ] **Run a relevant security query with the extension applied** | ||
|   - Tool: #codeql_query_run | ||
|   - Pass the model pack directory via the `additionalPacks` parameter | ||
|   - Pick a query whose sink kind matches what you modeled (e.g. a `sql-injection` query when adding SQL sinks) | ||
|   - Decode results: #codeql_bqrs_decode or #codeql_bqrs_interpret | ||
|
|
||
| - [ ] **Verify expected findings appear** | ||
|   - New sources/sinks should produce findings that were absent without the extension | ||
|   - Barriers/barrier guards should suppress findings that were previously reported | ||
|
|
||
| ### Phase 7: Run Unit Tests with `codeql test run` | ||
|
|
||
| - [ ] **Create a test case for the extension** | ||
|   - Write a small test file that exercises the new source/sink/summary chain end-to-end | ||
|   - Include both positive cases (vulnerable code detected) and negative cases (safe code not flagged) | ||
|
|
||
| - [ ] **Run the tests** | ||
|   - Tool: #codeql_test_run | ||
|   - Pass the model pack directory via the `additionalPacks` parameter | ||
|   - Note: `codeql test run` does **not** accept `--model-packs`; extensions must be wired via `codeql-pack.yml` or `--additional-packs` | ||
|
|
||
| - [ ] **Accept correct results** | ||
|   - Tool: #codeql_test_accept (accept the `.actual` output as the `.expected` baseline once you confirm it is correct) | ||
|
|
||
| ### Phase 8: Decide Next Steps | ||
|
|
||
| - If the `.model.yml` lives under `.github/codeql/extensions/` of the consuming repo, you are **done** — Code Scanning will load it on the next analysis. | ||
| - If you authored a reusable model pack and want it to apply across an organization, publish it to the GitHub Container registry (GHCR) with `codeql pack publish` and configure it under org Code security → Global settings → CodeQL analysis → Model packs. | ||
|
|
||
| ## Validation Checklist | ||
|
|
||
| - [ ] Correct tuple format for the language (MaD tuple, API Graph, or Rust) | ||
| - [ ] Every row has the exact column count for its extensible predicate | ||
| - [ ] Sink/barrier `kind` values match across the chain | ||
| - [ ] At least one end-to-end test exercises the new model and produces expected findings | ||
| - [ ] `codeql-pack.yml` `dataExtensions` glob actually matches the new files | ||
| - [ ] No regressions in pre-existing tests under the same pack | ||
|
|
||
| ## Related Resources | ||
|
|
||
| - `codeql://learning/data-extensions` — Common data extensions overview (both model formats) | ||
| - `codeql://languages/{{language}}/library-modeling` — Language-specific library modeling guide | ||
| - `codeql://templates/security` — Security query templates | ||
| - `codeql://learning/test-driven-development` — TDD workflow for CodeQL queries |
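
The prompt's "exact column count" rule (Phase 4 and the Validation Checklist) can be checked mechanically before running a query. Below is a minimal sketch of such a check, not something this PR or the CodeQL CLI provides. The helper name `check_rows` and the two lookup tables are hypothetical, and the column counts are assumptions based on the Java-style MaD layout (9 or 10 columns) and the API Graph layout (3 to 5 columns); confirm them against the matching `codeql://languages/<language>/library-modeling` resource. The input is the dictionary a YAML parser would produce for a `.model.yml` file, so the sketch needs no third-party packages.

```python
# Hypothetical helper: not part of this PR or of the CodeQL CLI.
# Column counts are assumptions; verify them per language before relying on them.
MAD_COLUMNS = {"sourceModel": 9, "sinkModel": 9, "summaryModel": 10, "neutralModel": 6}
API_GRAPH_COLUMNS = {"sourceModel": 3, "sinkModel": 3, "summaryModel": 5, "typeModel": 3}


def check_rows(model, expected_columns):
    """Return a list of human-readable problems found in a parsed .model.yml."""
    problems = []
    for ext_index, ext in enumerate(model.get("extensions", [])):
        predicate = ext.get("addsTo", {}).get("extensible")
        expected = expected_columns.get(predicate)
        if expected is None:
            # The table above is incomplete on purpose (e.g. no barrier predicates).
            problems.append(
                f"extension {ext_index}: no expected column count configured "
                f"for predicate {predicate!r}"
            )
            continue
        for row_index, row in enumerate(ext.get("data", [])):
            if len(row) != expected:
                problems.append(
                    f"extension {ext_index} ({predicate}) row {row_index}: "
                    f"{len(row)} columns, expected {expected}"
                )
    return problems


# What a YAML parser would return for a small Java sink model.
example = {
    "extensions": [
        {
            "addsTo": {"pack": "codeql/java-all", "extensible": "sinkModel"},
            "data": [
                # Well-formed 9-column sink row.
                ["java.sql", "Statement", True, "execute", "(String)", "",
                 "Argument[0]", "sql-injection", "manual"],
                # Same row with the empty `ext` column dropped: one column short.
                ["java.sql", "Statement", True, "execute", "(String)",
                 "Argument[0]", "sql-injection", "manual"],
            ],
        }
    ]
}

for problem in check_rows(example, MAD_COLUMNS):
    print(problem)
```

For JavaScript, Python, or Ruby models, pass `API_GRAPH_COLUMNS` instead. Predicates missing from the chosen table are reported rather than skipped, so a gap in the assumed counts shows up as a problem line instead of passing unnoticed.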