Bug: SCOPE_IDENTIFIER regex silently drops [N] suffixes from generate-block scope names #20

@miguel9554

Summary

The lexer's SCOPE_IDENTIFIER pattern does not allow [ or ], so scope names
produced by Verilog generate blocks (e.g. g_lane[0], g_lane[1]) are
truncated to g_lane. All array instances collapse to the same name, making it
impossible to distinguish them after parsing.

Root cause

File: src/VCDScanner.l, line 76

SCOPE_IDENTIFIER    [a-zA-Z_][a-zA-Z_0-9\(\)]*

The character class allows letters, digits, underscores, and parentheses, but
not square brackets. When the lexer is in the IN_SCOPE state and
encounters a token like g_lane[0], the rule on line 213:

<IN_SCOPE>{SCOPE_IDENTIFIER} {
    return VCDParser::parser::make_TOK_IDENTIFIER(std::string(yytext),loc);
}

matches only g_lane. The remaining [0] is then consumed by the catch-all
silent discard rule at line 383:

<*>.|\n {
    // DO nothing!
}

No warning or error is emitted. The information is silently lost.

Why this is spec non-conformant

IEEE Std 1800-2023, section 21.7.2.1 (page 688 of the standard), defines the
4-state VCD syntax as:

$scope [ scope_type scope_identifier ] $end
...
scope_identifier ::= { ASCII character }

The grammar specifies scope_identifier as any sequence of ASCII characters,
with no restriction on which characters are allowed. Square brackets are
valid ASCII and must be accepted.

Contrast this with SIGNAL_REFERENCE on line 77 of the same file, which
does include square brackets:

SIGNAL_REFERENCE    [a-zA-Z_][a-zA-Z_0-9\.\(\)\[\]]*

The omission in SCOPE_IDENTIFIER is inconsistent with both the spec and the
treatment of signal references in the same lexer.

Impact

Any VCD file generated from a design with generate loops will contain
scopes whose names include an array index, for example:

$scope module g_lane[0] $end
...
$scope module g_lane[1] $end
...
$scope module g_lane[2] $end

After parsing, all three scopes are stored with name == "g_lane". Any
downstream tool that relies on scope names to navigate or compare the
hierarchy (e.g. a VCD diff tool) will see duplicate names and cannot
correctly identify individual instances.

Suggested fix

Add \[\] to the SCOPE_IDENTIFIER character class in src/VCDScanner.l,
line 76:

-SCOPE_IDENTIFIER    [a-zA-Z_][a-zA-Z_0-9\(\)]*
+SCOPE_IDENTIFIER    [a-zA-Z_][a-zA-Z_0-9\(\)\[\]]*

This mirrors how SIGNAL_REFERENCE is already defined and brings the lexer
closer to conformance with the IEEE 1800-2023 grammar, which places no
restriction on the characters in scope_identifier.
