lint: Add ast-grep check for visibility-macro annotations#1053
Draft
mkannwischer wants to merge 1 commit into
Draft
lint: Add ast-grep check for visibility-macro annotations#1053mkannwischer wants to merge 1 commit into
mkannwischer wants to merge 1 commit into
Conversation
Add an ast-grep lint for the MLD_INTERNAL_API / MLD_EXTERNAL_API /
MLD_API_QUALIFIER convention on externally-linked declarations under
mldsa/src/ (functions for now; data definitions not yet covered).
Rule lives in scripts/.ast-grep/mldsa-declarations.yml as a
multi-document YAML with two rules (function definition + function
prototype). Each rule matches on tree-sitter-c node kinds
(function_definition / declaration), excludes the three accepted
annotation macros, static functions, and assembly entry points
(*_asm, empty_cu_*).
scripts/lint gains a check-ast-grep step (placed right after the C
clang-format step); nix/util.nix's linters package picks up the
ast-grep binary. Invocation:
ast-grep scan --rule scripts/.ast-grep/mldsa-declarations.yml .
Rationale for ast-grep over alternatives considered:
- CodeQL: powerful but requires a full build + database; overkill for
a lint rule and pulls in a non-OSS CLI.
- Semgrep: C parser chokes on unknown macros and lacks sibling-
relationship matchers, forcing return-type enumeration and implicit
reliance on parser quirks.
- ast-grep: tree-sitter-c parses reliably; rule matches on AST node
kinds and names required annotations literally via regex.
Known limitations:
- Unlike CodeQL (which only sees compiled code), ast-grep scans all
source text, so declarations inside #if blocks (e.g. MLDSA_DEBUG,
native backend headers) are also flagged. Either annotate them or
scope the `files:` globs narrower in follow-up.
- A `static MLD_INLINE ...` combination parses with `static` outside
the function_definition AST node, so the rule falls back to a
text-regex and a `follows:` sibling check.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
hanno-becker
left a comment
There was a problem hiding this comment.
I don't like the description language as much as CodeQL, and there may be additional use cases where expressivity isn't high enough. But for now, I am happy to go with this lighter weight approach and shelve #1052. Thank you for investigating, @mkannwischer!
Could you also add a check for data tables?
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add an ast-grep lint for the MLD_INTERNAL_API / MLD_EXTERNAL_API / MLD_API_QUALIFIER convention on externally-linked declarations under mldsa/src/ (functions for now; data definitions not yet covered).
Rule lives in scripts/.ast-grep/mldsa-declarations.yml as a multi-document YAML with two rules (function definition + function prototype). Each rule matches on tree-sitter-c node kinds (function_definition / declaration), excludes the three accepted annotation macros, static functions, and assembly entry points (asm, empty_cu).
scripts/lint gains a check-ast-grep step; nix/util.nix's linters package picks up the ast-grep binary. Invocation:
Rationale for ast-grep over alternatives considered in this spike:
Known limitations:
files:globs narrower in follow-up.static MLD_INLINE ...combination parses withstaticoutside the function_definition AST node, so the rule falls back to a text regex to exclude those.Temporary CI trimming: