hrw4u: Add AST for static analysis and codegen #13126
Open
juanthropic wants to merge 5 commits into apache:master from
Frozen dataclasses representing the semantic AST that a visitor produces from the ANTLR parse tree. Includes Target decomposition (namespace/field/modifier), all statement nodes (Assignment, FunctionCall, BreakStatement, StandaloneOperator), condition expression nodes (Comparison, LogicalOp, Negation), control flow (IfBlock, ElifBranch), and top-level constructs (VarDecl, UseDecl, ProcedureDecl, Section). Type aliases ConditionExpr, BodyNode, and TopLevelNode provide convenience unions. Tests cover Target.from_dotted parsing, node construction, and immutability.
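As a rough illustration of the node style described above, here is a minimal sketch of a frozen `Target` with a `from_dotted` constructor. The field names follow the commit message; the splitting rule (at most three dot-separated parts), the `Assignment` fields, and the sample target string are assumptions made for illustration, not the PR's actual implementation.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    # Field names come from the commit message; defaults are assumed.
    namespace: str
    field: Optional[str] = None
    modifier: Optional[str] = None

    @classmethod
    def from_dotted(cls, dotted: str) -> "Target":
        # Assumed rule: split into at most three parts, so any further
        # dots remain inside the last part.
        parts = dotted.split(".", 2)
        return cls(*parts)


@dataclass(frozen=True)
class Assignment:
    # Hypothetical field layout for an assignment node.
    target: Target
    value: str
    append: bool = False  # assumed flag for "+=" versus "="
    line: int = 0


node = Assignment(Target.from_dotted("inbound.req.X-Foo"), "bar", line=4)

try:
    node.value = "baz"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutation_rejected = True
else:
    mutation_rejected = False
```

Because the nodes are frozen, they compare by value, which keeps test assertions short: a test can build the expected node directly and compare it with `==`.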
ASTVisitor walks the ANTLR parse tree and produces HRW4UAST. Handles named sections, assignments (= and +=), function calls, break statements, standalone operators, condition expressions (comparisons, logical operators, negation, set membership, IP ranges, WITH modifiers), if/elif/else blocks with arbitrary nesting, and top-level var/use/procedure declarations. Only visitProgram is overridden from the ANTLR visitor base class; all other dispatch is internal, keeping the public API surface minimal. Raises ValueError for unhandled grammar alternatives to surface visitor-grammar drift early. Makefile updated to include ast_nodes.py and ast_visitor.py in the build copy step.
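The "raise on unhandled alternatives" behavior can be sketched without the ANTLR runtime. The context classes below are hypothetical stand-ins for the generated parser contexts, and `SketchVisitor` is not the PR's `ASTVisitor`; the sketch only shows internal dispatch through private handlers with a `ValueError` for anything the handler table does not cover.

```python
from dataclasses import dataclass


# Stand-ins for ANTLR-generated context classes (hypothetical; the real
# contexts come from the generated hrw4u parser).
@dataclass
class AssignmentContext:
    line: int
    text: str


@dataclass
class BreakContext:
    line: int


@dataclass
class UnknownContext:
    line: int


class SketchVisitor:
    """Dispatches through private handlers and fails loudly on gaps."""

    def __init__(self) -> None:
        # Only context types listed here are considered handled.
        self._handlers = {
            AssignmentContext: self._build_assignment,
            BreakContext: self._build_break,
        }

    def visit_statement(self, ctx):
        handler = self._handlers.get(type(ctx))
        if handler is None:
            # A new grammar alternative without a handler surfaces here
            # instead of being silently dropped.
            raise ValueError(
                f"line {ctx.line}: unhandled grammar alternative "
                f"{type(ctx).__name__}"
            )
        return handler(ctx)

    def _build_assignment(self, ctx):
        return ("assignment", ctx.line, ctx.text)

    def _build_break(self, ctx):
        return ("break", ctx.line)
```

Failing at the dispatch point means a grammar change that adds an alternative breaks the tests immediately, rather than producing an AST with missing nodes.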
Integration tests covering the full visitor pipeline from source text to AST nodes. Tests are organized by concern: sections and simple statements, condition expressions (all operators, logical combinators, negation, parenthesized grouping), if/elif/else blocks with nesting, real config patterns (nested conditionals, boolean coercion, IP ranges, set membership with modifiers, exact match patterns), line number tracking across all 17 node types, and error handling for unhandled grammar alternatives.
Contributor
[approve ci]
Contributor
Pull request overview
This PR adds a semantic, typed AST layer to hrw4u by introducing immutable dataclass nodes and a dedicated ANTLR parse-tree-to-AST visitor, enabling downstream static analysis and future code generation/linting to operate on domain-level constructs instead of grammar-shaped parse trees.
Changes:
- Added frozen dataclass AST node definitions for core hrw4u constructs (sections, statements, condition expressions, vars, procedures).
- Added an `ASTVisitor` that walks the ANTLR parse tree and produces an `HRW4UAST`.
- Added unit/integration tests covering AST construction, parsing behavior, and line number tracking.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| tools/hrw4u/src/ast_nodes.py | Introduces immutable dataclass AST node model and type aliases. |
| tools/hrw4u/src/ast_visitor.py | Adds parse-tree visitor that builds the semantic AST from ANTLR contexts. |
| tools/hrw4u/tests/test_ast_nodes.py | Unit tests for Target.from_dotted and basic node behavior. |
| tools/hrw4u/tests/test_ast_visitor.py | Integration-style tests for source → AST across many constructs and precedence cases. |
| tools/hrw4u/Makefile | Ensures new AST modules are included in the hrw4u package build/copy process. |
Introduce ValueKind enum and Value dataclass to preserve semantic distinction between string literals, identifiers, param refs, IPs, and regexes in the AST. Without this, the codegen visitor cannot re-emit values correctly since _extract_value was collapsing all string-like values into bare Python str.
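To make the motivation concrete, the sketch below shows why a bare `str` loses information that codegen needs. The enum member names, the `Value` fields, and the `emit` quoting rules are assumptions for illustration; only the five kinds themselves are named in the commit message.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ValueKind(Enum):
    # Member names are assumed; the commit message lists the five kinds.
    STRING = auto()
    IDENT = auto()
    PARAM_REF = auto()
    IP = auto()
    REGEX = auto()


@dataclass(frozen=True)
class Value:
    kind: ValueKind
    text: str


def emit(value: Value) -> str:
    """Re-emit a value as source text (hypothetical quoting rules)."""
    if value.kind is ValueKind.STRING:
        return f'"{value.text}"'
    if value.kind is ValueKind.REGEX:
        return f"/{value.text}/"
    # Identifiers, param refs, and IPs are emitted bare in this sketch.
    return value.text


literal = Value(ValueKind.STRING, "foo")
identifier = Value(ValueKind.IDENT, "foo")
```

Here `literal` and `identifier` carry the same text but are distinct values and emit differently, which is the distinction that collapsing everything into `str` would erase.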
Skip comments intentionally and raise on unrecognized programItem alternatives to catch visitor/grammar drift.
Summary
Adds a semantic AST and visitor for the hrw4u tool, converting
the ANTLR parse tree into typed, immutable Python dataclasses
that downstream visitors can operate on, including static
analysis, code generation, and a future user-configurable
policy linter.
The ANTLR parse tree mirrors the grammar structure, not the
domain, so every consumer must navigate intermediate rule nodes,
punctuation tokens, and context wrappers. The AST strips that
away and exposes typed, semantic nodes (e.g.
`Comparison.operator`, `IfBlock.body`). Only `ASTVisitor` needs to understand the parse
tree, so grammar changes are isolated to one file and all
downstream visitors stay insulated.
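A small sketch of what a downstream consumer might look like once it only sees semantic nodes. The node names match the PR description, but the field layouts, the operator spellings, and the `comparison_operators` helper are hypothetical; the point is that the walk never touches parse-tree contexts or punctuation tokens.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Comparison:
    left: str
    operator: str
    right: str


@dataclass(frozen=True)
class Negation:
    operand: "ConditionExpr"


@dataclass(frozen=True)
class LogicalOp:
    operator: str
    left: "ConditionExpr"
    right: "ConditionExpr"


# Convenience union, mirroring the alias named in the PR description.
ConditionExpr = Union[Comparison, LogicalOp, Negation]


def comparison_operators(expr: ConditionExpr) -> List[str]:
    """Collect comparison operators in left-to-right order."""
    if isinstance(expr, Comparison):
        return [expr.operator]
    if isinstance(expr, Negation):
        return comparison_operators(expr.operand)
    if isinstance(expr, LogicalOp):
        return comparison_operators(expr.left) + comparison_operators(expr.right)
    raise TypeError(f"not a condition expression: {type(expr).__name__}")


condition = LogicalOp(
    "&&",
    Comparison("inbound.method", "==", "GET"),
    Negation(Comparison("inbound.url.path", "~", "^/api")),
)
operators = comparison_operators(condition)
```

A lint rule or code generator written this way depends only on the node classes, so a grammar refactor that preserves the AST shape would not require changes to it.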
- `ast_nodes.py`: Frozen dataclasses for all hrw4u constructs: assignments, function calls, comparisons, logical/negation operators, if/elif/else blocks, var/use/procedure declarations, and sections. Includes `Target` decomposition and type aliases (`ConditionExpr`, `BodyNode`, `TopLevelNode`).
- `ast_visitor.py`: `ASTVisitor` that walks the ANTLR parse tree and produces an `HRW4UAST`. Only `visitProgram` is overridden from the ANTLR base; all other dispatch is internal. Raises `ValueError` on unhandled grammar alternatives to catch visitor-grammar drift early.
Test plan
- `test_ast_nodes.py`: Unit tests for `Target.from_dotted` parsing, node construction, and immutability.
- `test_ast_visitor.py`: Integration tests covering the full source-to-AST pipeline: sections, all statement types, every condition operator, if/elif/else nesting, real config patterns (IP ranges, set membership, WITH modifiers, boolean coercion), line number tracking across all 17 node types, and error handling.