Skip to content

lint: Add ast-grep check for visibility-macro annotations#1053

Draft
mkannwischer wants to merge 1 commit into
mainfrom
ast-grep
Draft

lint: Add ast-grep check for visibility-macro annotations#1053
mkannwischer wants to merge 1 commit into
mainfrom
ast-grep

Conversation

@mkannwischer
Copy link
Copy Markdown
Contributor

Add an ast-grep lint for the MLD_INTERNAL_API / MLD_EXTERNAL_API / MLD_API_QUALIFIER convention on externally-linked declarations under mldsa/src/ (functions for now; data definitions not yet covered).

Rule lives in scripts/.ast-grep/mldsa-declarations.yml as a multi-document YAML with two rules (function definition + function prototype). Each rule matches on tree-sitter-c node kinds (function_definition / declaration), excludes the three accepted annotation macros, static functions, and assembly entry points (asm, empty_cu).

scripts/lint gains a check-ast-grep step; nix/util.nix's linters package picks up the ast-grep binary. Invocation:

ast-grep scan --rule scripts/.ast-grep/mldsa-declarations.yml .

Rationale for ast-grep over alternatives considered in this spike:

  • CodeQL: powerful but requires a full build + database; overkill for a lint rule and pulls in a non-OSS CLI.
  • Semgrep: C parser chokes on unknown macros and lacks sibling- relationship matchers, forcing return-type enumeration and implicit reliance on parser quirks.
  • ast-grep: tree-sitter-c parses reliably; rule matches on AST node kinds and names required annotations literally via regex.

Known limitations:

  • Unlike CodeQL (which only sees compiled code), ast-grep scans all source text, so declarations inside #if blocks (e.g. MLDSA_DEBUG, native backend headers) are also flagged. Either annotate them or scope the files: globs narrower in follow-up.
  • A static MLD_INLINE ... combination parses with static outside the function_definition AST node, so the rule falls back to a text regex to exclude those.

Temporary CI trimming:

  • .github/workflows/all.yml has every non-base job commented out so this branch validates the lint integration in isolation. Restore before merge.

Add an ast-grep lint for the MLD_INTERNAL_API / MLD_EXTERNAL_API /
MLD_API_QUALIFIER convention on externally-linked declarations under
mldsa/src/ (functions for now; data definitions not yet covered).

Rule lives in scripts/.ast-grep/mldsa-declarations.yml as a
multi-document YAML with two rules (function definition + function
prototype).  Each rule matches on tree-sitter-c node kinds
(function_definition / declaration), excludes the three accepted
annotation macros, static functions, and assembly entry points
(*_asm, empty_cu_*).

scripts/lint gains a check-ast-grep step (placed right after the C
clang-format step); nix/util.nix's linters package picks up the
ast-grep binary.  Invocation:

    ast-grep scan --rule scripts/.ast-grep/mldsa-declarations.yml .

Rationale for ast-grep over alternatives considered:
- CodeQL: powerful but requires a full build + database; overkill for
  a lint rule and pulls in a non-OSS CLI.
- Semgrep: C parser chokes on unknown macros and lacks sibling-
  relationship matchers, forcing return-type enumeration and implicit
  reliance on parser quirks.
- ast-grep: tree-sitter-c parses reliably; rule matches on AST node
  kinds and names required annotations literally via regex.

Known limitations:
- Unlike CodeQL (which only sees compiled code), ast-grep scans all
  source text, so declarations inside #if blocks (e.g. MLDSA_DEBUG,
  native backend headers) are also flagged.  Either annotate them or
  scope the `files:` globs narrower in follow-up.
- A `static MLD_INLINE ...` combination parses with `static` outside
  the function_definition AST node, so the rule falls back to a
  text-regex and a `follows:` sibling check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown
Contributor

@hanno-becker hanno-becker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like the description language as much as CodeQL, and there may be additional use cases where expressivity isn't high enough. But for now, I am happy to go with this lighter weight approach and shelve #1052. Thank you for investigating, @mkannwischer!

Could you also add a check for data tables?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants