
hrw4u: Add AST for static analysis and codegen #13126

Open
juanthropic wants to merge 5 commits into apache:master from juanthropic:hrw4u-linter-ast

Conversation

@juanthropic

Summary

Adds a semantic AST and visitor for the hrw4u tool, converting
the ANTLR parse tree into typed, immutable Python dataclasses
that downstream visitors can operate on, including static
analysis, code generation, and a future user-configurable
policy linter.

The ANTLR parse tree mirrors the grammar structure, not the
domain, so every consumer must navigate intermediate rule nodes,
punctuation tokens, and context wrappers. The AST strips that
away and exposes typed, semantic nodes (e.g. Comparison.operator,
IfBlock.body). Only ASTVisitor needs to understand the parse
tree, so grammar changes are isolated to one file and all
downstream visitors stay insulated.
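
As a rough illustration of what a downstream consumer gains, the sketch below walks typed nodes directly instead of navigating parse-tree contexts. Only `Comparison.operator` and `IfBlock.body` are named in this PR; the other fields (`left`, `right`, `condition`, `target`, `value`), the `collect_operators` helper, and the sample operands are assumptions made for the example, not the actual definitions in ast_nodes.py.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


# Simplified stand-ins for the real nodes; field names other than
# `operator` and `body` are assumptions for this sketch.
@dataclass(frozen=True)
class Comparison:
    left: str
    operator: str
    right: str


@dataclass(frozen=True)
class Assignment:
    target: str
    value: str


@dataclass(frozen=True)
class IfBlock:
    condition: Comparison
    body: tuple[Union[Assignment, "IfBlock"], ...]


def collect_operators(block: IfBlock) -> list[str]:
    """Return the comparison operators of a block and its nested ifs, outermost first."""
    found = [block.condition.operator]
    for node in block.body:
        if isinstance(node, IfBlock):
            found.extend(collect_operators(node))
    return found


# Hypothetical operands, chosen only to have something to walk.
inner = IfBlock(
    condition=Comparison("inbound.method", "==", "GET"),
    body=(Assignment("x", "1"),),
)
outer = IfBlock(
    condition=Comparison("inbound.url.path", "~", "/api/.*"),
    body=(inner,),
)
operators = collect_operators(outer)
```

A consumer written this way depends only on node types and fields, so a grammar change that adds wrapper rules or punctuation would not touch it.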

  • ast_nodes.py: Frozen dataclasses for all hrw4u constructs:
    assignments, function calls, comparisons, logical/negation
    operators, if/elif/else blocks, var/use/procedure declarations,
    and sections. Includes Target decomposition and type aliases
    (ConditionExpr, BodyNode, TopLevelNode).
  • ast_visitor.py: ASTVisitor that walks the ANTLR parse tree
    and produces an HRW4UAST. Only visitProgram is overridden
    from the ANTLR base; all other dispatch is internal. Raises
    ValueError on unhandled grammar alternatives to catch
    visitor-grammar drift early.

Test plan

  • test_ast_nodes.py — Unit tests for Target.from_dotted
    parsing, node construction, and immutability.
  • test_ast_visitor.py — Integration tests covering the full
    source-to-AST pipeline: sections, all statement types, every
    condition operator, if/elif/else nesting, real config patterns
    (IP ranges, set membership, WITH modifiers, boolean coercion),
    line number tracking across all 17 node types, and error
    handling.

Frozen dataclasses representing the semantic AST that a
visitor produces from the ANTLR parse tree. Includes Target
decomposition (namespace/field/modifier), all statement nodes
(Assignment, FunctionCall, BreakStatement, StandaloneOperator),
condition expression nodes (Comparison, LogicalOp, Negation),
control flow (IfBlock, ElifBranch), and top-level constructs
(VarDecl, UseDecl, ProcedureDecl, Section).

Type aliases ConditionExpr, BodyNode, and TopLevelNode provide
convenience unions. Tests cover Target.from_dotted parsing, node
construction, and immutability.

ASTVisitor walks the ANTLR parse tree and produces HRW4UAST.
Handles named sections, assignments (= and +=), function calls,
break statements, standalone operators, condition expressions
(comparisons, logical operators, negation, set membership, IP
ranges, WITH modifiers), if/elif/else blocks with arbitrary
nesting, and top-level var/use/procedure declarations.

Only visitProgram is overridden from the ANTLR visitor base
class; all other dispatch is internal, keeping the public API
surface minimal. Raises ValueError for unhandled grammar
alternatives to surface visitor-grammar drift early.

Makefile updated to include ast_nodes.py and ast_visitor.py
in the build copy step.

Integration tests covering the full visitor pipeline from
source text to AST nodes. Tests are organized by concern:
sections and simple statements, condition expressions (all
operators, logical combinators, negation, parenthesized
grouping), if/elif/else blocks with nesting, real config
patterns (nested conditionals, boolean coercion, IP ranges,
set membership with modifiers, exact match patterns), line
number tracking across all 17 node types, and error handling
for unhandled grammar alternatives.
@juanthropic changed the title from "hrw4u: Add AST layer for static analysis and code generation" to "hrw4u: Add AST layer for static analysis and codegen" on Apr 29, 2026
@juanthropic changed the title from "hrw4u: Add AST layer for static analysis and codegen" to "hrw4u: Add AST for static analysis and codegen" on Apr 29, 2026
@zwoop requested a review from Copilot on April 29, 2026 19:33
@zwoop
Contributor

zwoop commented Apr 29, 2026

[approve ci]

@zwoop added the hrw4u label on Apr 29, 2026
@zwoop added this to the 11.0.0 milestone on Apr 29, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a semantic, typed AST layer to hrw4u by introducing immutable dataclass nodes and a dedicated ANTLR parse-tree-to-AST visitor, enabling downstream static analysis and future code generation/linting to operate on domain-level constructs instead of grammar-shaped parse trees.

Changes:

  • Added frozen dataclass AST node definitions for core hrw4u constructs (sections, statements, condition expressions, vars, procedures).
  • Added an ASTVisitor that walks the ANTLR parse tree and produces an HRW4UAST.
  • Added unit/integration tests covering AST construction, parsing behavior, and line number tracking.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| tools/hrw4u/src/ast_nodes.py | Introduces immutable dataclass AST node model and type aliases. |
| tools/hrw4u/src/ast_visitor.py | Adds parse-tree visitor that builds the semantic AST from ANTLR contexts. |
| tools/hrw4u/tests/test_ast_nodes.py | Unit tests for Target.from_dotted and basic node behavior. |
| tools/hrw4u/tests/test_ast_visitor.py | Integration-style tests for source → AST across many constructs and precedence cases. |
| tools/hrw4u/Makefile | Ensures new AST modules are included in the hrw4u package build/copy process. |

Comment on lines +45 to +56
@dataclass(frozen=True, kw_only=True)
class Assignment(Node):
    target: Target
    operator: str  # "=" or "+="
    value: str | int | bool | tuple


@dataclass(frozen=True, kw_only=True)
class FunctionCall(Node):
    name: str
    args: tuple[str | int | bool, ...]

Author

@juanthropic juanthropic Apr 29, 2026


Fixed in d7f20b6. Added Value dataclass with ValueKind enum to preserve semantic distinction between string literals, identifiers, param refs, IPs, and regexes.
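
To make the motivation concrete, the following is a minimal sketch of a tagged value, assuming a two-field shape. The enum member names, the `text` field, and the `emit` helper are hypothetical; only the names `Value` and `ValueKind` and the five kinds come from this thread, and the actual definitions in d7f20b6 may differ.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ValueKind(Enum):
    # Member names are assumptions; the kinds themselves are listed in the reply above.
    STRING = auto()
    IDENT = auto()
    PARAM_REF = auto()
    IP = auto()
    REGEX = auto()


@dataclass(frozen=True)
class Value:
    kind: ValueKind
    text: str  # for string literals, the text with the surrounding quotes stripped


def emit(value: Value) -> str:
    """Re-emit a value as source text; in this sketch only string literals get quotes back."""
    if value.kind is ValueKind.STRING:
        return f'"{value.text}"'
    return value.text


literal = Value(ValueKind.STRING, "foo")
identifier = Value(ValueKind.IDENT, "foo")
```

With bare `str` values, `literal` and `identifier` would both be `"foo"` and a codegen pass could not tell whether to quote the result; the tag keeps them distinct.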

Comment thread tools/hrw4u/src/ast_visitor.py Outdated
Comment on lines +144 to +161
    def _extract_value(self, ctx):
        if ctx.number is not None:
            return int(ctx.number.text)
        if ctx.str_ is not None:
            return ctx.str_.text[1:-1]
        if ctx.TRUE():
            return True
        if ctx.FALSE():
            return False
        if ctx.ident is not None:
            return ctx.ident.text
        if ctx.ip():
            return ctx.ip().getText()
        if ctx.iprange():
            return tuple(ip.getText() for ip in ctx.iprange().ip())
        if ctx.paramRef():
            return ctx.paramRef().getText()
        return ctx.getText()
Author


Fixed in the same commit as above — _extract_value now returns tagged Value instances instead of bare str.

Comment on lines +43 to +60
class ASTVisitor(hrw4uVisitor):
    """ANTLR visitor that walks an HRW4U parse tree and produces an AST for HRW4U."""

    # Only visitProgram is overridden from the ANTLR visitor interface;
    # all other traversal uses private _visit_* helpers so that each
    # method has an explicit return type and full control over how
    # child results are assembled into parent AST nodes.

    def visitProgram(self, ctx):
        items = []
        for item in ctx.programItem():
            if item.useDirective() is not None:
                items.append(self._visit_use_directive(item.useDirective()))
            elif item.procedureDecl() is not None:
                items.append(self._visit_procedure_decl(item.procedureDecl()))
            elif item.section() is not None:
                items.append(self._visit_section(item.section()))
        return HRW4UAST(body=tuple(items))
Author


Neither visitor.py nor kg_visitor.py annotates its ANTLR visit methods, and the generated hrw4uVisitor base class is untyped. Adding annotations here would be inconsistent with the rest of the codebase. The private _visit_* helpers have clear return types via the AST node constructors they call.

            elif item.procedureDecl() is not None:
                items.append(self._visit_procedure_decl(item.procedureDecl()))
            elif item.section() is not None:
                items.append(self._visit_section(item.section()))
Author

@juanthropic juanthropic Apr 29, 2026


Fixed in af33b4d. Comments are now explicitly skipped, and unrecognized programItem alternatives raise ValueError to catch grammar drift.
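
The shape of that fix can be sketched roughly as below, using stand-in objects rather than real ANTLR contexts. The `commentLine` accessor, the `make_item` helper, and the `classify` function are hypothetical names for this example; the actual change in af33b4d lives inside `visitProgram` and may be structured differently.

```python
from types import SimpleNamespace
from typing import Optional

# Accessor names: the first three appear in visitProgram; commentLine is assumed.
ACCESSORS = ("useDirective", "procedureDecl", "section", "commentLine")


def make_item(**present):
    """Build a stand-in for a programItem context: each accessor returns a child or None."""
    accessors = {name: (lambda child=present.get(name): child) for name in ACCESSORS}
    return SimpleNamespace(getText=lambda: "<item>", **accessors)


def classify(item) -> Optional[str]:
    """Name the alternative an item matched, None for a skipped comment, or raise."""
    if item.useDirective() is not None:
        return "use"
    if item.procedureDecl() is not None:
        return "procedure"
    if item.section() is not None:
        return "section"
    if item.commentLine() is not None:
        return None  # comments are skipped on purpose
    # Falling through means the grammar gained an alternative this visitor does not know.
    raise ValueError(f"unhandled programItem alternative: {item.getText()!r}")


section_kind = classify(make_item(section=object()))
comment_kind = classify(make_item(commentLine=object()))
```

The explicit final `raise` is what turns a silently dropped program item into a loud failure the first time the grammar and the visitor disagree.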

Introduce ValueKind enum and Value dataclass to preserve
semantic distinction between string literals, identifiers,
param refs, IPs, and regexes in the AST. Without this, the
codegen visitor cannot re-emit values correctly since
_extract_value was collapsing all string-like values into
bare Python str.

Skip comments intentionally and raise on unrecognized
programItem alternatives to catch visitor/grammar drift.