diff --git a/.gitignore b/.gitignore
index 5fbd184..97b7c17 100644
--- a/.gitignore
+++ b/.gitignore
@@ -3,4 +3,7 @@ dist/
 client/out/
 server/out/
 *.vsix
-.env
\ No newline at end of file
+.env
+tests/_*
+docs/
+.vscode/
\ No newline at end of file
diff --git a/.vscodeignore b/.vscodeignore
index 3d5debb..9d47de6 100644
--- a/.vscodeignore
+++ b/.vscodeignore
@@ -24,3 +24,4 @@ package-lock.json
 .gitattributes
 .env
 .env.*
+docs/
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8de8092..71f616b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,22 @@

 All notable changes to the **Bison/Flex Language Support** extension will be documented in this file.

+## [1.5.1] - 2026-04-02
+
+### Fixed
+
+- **Flex — escaped quotes in quoted string patterns** (#30): patterns like `X"\'"` and `Y"\""` no longer trigger false `flex/invalid-pattern` errors. The validator now correctly handles `\"` and `\'` escape sequences inside Flex quoted strings.
+- **Flex — abbreviation refs on rule lines with no inline action** (#31): `{ABBR}` used after a `^` BOL anchor or on a rule line whose action block appears on the following line was not recorded as an abbreviation reference, causing false `flex/unused-abbrev` warnings.
+- **Flex — quoted strings with spaces in `rawPattern`** (audit-A): patterns like `"hello world"` were truncated at the space inside the quoted literal, causing false `flex/unreachable-rule` duplicates for distinct patterns sharing a common word prefix. `rawPattern()` now tracks whether it is inside a quoted string.
+- **Flex — standalone `{` as multi-line action opener** (audit-B): a `{` appearing alone on the line after a rule pattern (valid Flex multi-line action syntax) was pushed as a spurious rule entry with pattern `{`, producing false `flex/unreachable-rule` diagnostics for every subsequent multi-line-action rule.
+- **Flex — lowercase start condition names** (audit-C): all start-condition regex patterns used `[A-Z_][A-Z0-9_]*` (uppercase only). SC names that are valid C identifiers but lowercase (e.g. `%x comment`) were silently ignored, skipping `flex/undefined-sc` and `flex/unused-sc` diagnostics for them entirely.
+- **Flex — single-tab action separator in abbreviation ref scan** (audit-D): the heuristic that separates the pattern from the action used `\s{2,}`, which did not match a single-tab separator. `{identifier}` tokens inside the C action body (e.g. compound literals) were falsely counted as abbreviation references, suppressing `flex/unused-abbrev`.
+- **Cleanup**: removed two dead entries in the catch-all pattern set that contained a literal newline character and could never match a rule line.
+- **Bison — lowercase/mixed-case tokens in precedence declarations** (audit-E): `%left`/`%right`/`%nonassoc` used an uppercase-only regex `[A-Z_][A-Z0-9_]*`, so all-lowercase tokens were dropped from the precedence table and mixed-case tokens like `kPLUS` or `tTOKEN` were recorded only by their uppercase tail (`PLUS`, `TOKEN`). This caused false `bison/undeclared-token` warnings and incorrect shift/reduce heuristic results for such tokens.
+- **Bison — `$N` references after nested sub-blocks in inline actions** (audit-F): the `extractDollarRefs` scanner used `/\{([^}]*)\}/` which stops at the first `}`, missing `$N` references that appear after a nested `{ … }` block inside the same action (e.g. `{ if (cond) { log(); } $$ = $5; }`). Replaced with a brace-depth scanner; the same fix was applied to `extractSymbols`, `getFirstSymbol`, and `extractRuleReferences` for consistency.
+
+---
+
 ## [1.5.0] - 2026-04-01

 ### Added
@@ -11,7 +27,7 @@ All notable changes to the **Bison/Flex Language Support** extension will be doc
 - **`Bison/Flex: Show in Generated File`** — from a `.y` / `.l` source, locates the generated file (using `bisonFlex.buildDirectory` setting, CMake detection, Makefile detection, same-directory fallback, then workspace-wide search) and navigates to the matching line. A QuickPick is shown when multiple candidates are found.
 - New setting `bisonFlex.buildDirectory`: optional path to the build output directory, used by **Show in Generated File** to locate generated files when they are not in the same directory as the source.

---
+---

 ## [1.4.1] - 2026-03-31

diff --git a/README.md b/README.md
index a6925c9..cc1e63d 100644
--- a/README.md
+++ b/README.md
@@ -39,6 +39,44 @@ Real-time error detection as you type:
 | Shift/reduce conflict heuristic | | |
 | Unknown/invalid directive | |

+Every diagnostic carries a **source** field (`bison` / `flex`), a **code slug** (e.g. `bison/unused-token`), and where available a **link** to the GNU documentation — rendered as a clickable `[bison/unused-token]` link in the Problems panel. Unused symbols are rendered greyed-out via `DiagnosticTag.Unnecessary`.
+
+### Fix-it Hints (Quick Fixes)
+
+22 code actions available via the lightbulb (`Ctrl+.`) or directly from the Problems panel:
+
+**Bison** (11 fixes):
+- Insert missing `%%` separator
+- Declare undeclared `%token`
+- Insert `%empty` for empty production
+- Remove unused token declaration
+- Remove unknown directive
+- Add rule stub for missing non-terminal
+- Add `%type` declaration
+- Remove invalid `%start` / Add `%start` directive
+- Close unclosed `%{` block
+- Migrate Yacc legacy directives (`%error-verbose` → `%define parse.error verbose`, `%name-prefix`, `%pure-parser`, `%binary`)
+
+**Flex** (11 fixes):
+- Insert missing `%%` separator
+- Define abbreviation stub
+- Remove unused abbreviation
+- Remove unused start condition
+- Remove unknown directive
+- Declare `%x SC_NAME` for undefined start condition
+- Remove unused `%option`
+- Remove duplicate `<<EOF>>` rule
+- Add `%option noyywrap`
+- Close unclosed `%{` block
+- Remove inaccessible rule
+
+### Source ↔ Generated File Navigation
+
+Jump between Bison/Flex grammar sources and their generated C files using `#line` directives:
+
+- **Bison/Flex: Show in Source** — from a generated `.tab.c` / `lex.yy.c` file, reads the nearest `#line N "file.y"` directive above the cursor and opens the grammar source at the correct line. Appears in the context menu only when a generated file is detected.
+- **Bison/Flex: Show in Generated File** — from a `.y` / `.l` source, locates the generated file and navigates to the matching line. Searches `bisonFlex.buildDirectory`, then CMake/Makefile detection, then the same directory, then a workspace-wide scan. A QuickPick is shown when multiple candidates are found.
+
 ### Autocompletion

 Context-aware suggestions triggered as you type:
@@ -185,6 +223,10 @@ Then press `F5` in VS Code to launch the Extension Development Host.
 | `bisonFlex.showInlayHints` | `boolean` | `true` | Show inline type hints for `$$`/`$1`/`@$` semantic values |
 | `bisonFlex.enableCodeLens` | `boolean` | `true` | Show reference counts and entry-point badges above rules |
 | `bisonFlex.enableCmakeDiagnostics` | `boolean` | `true` | Warn when a `.y`/`.l` file is not referenced in `CMakeLists.txt` |
+| `bisonFlex.minVersionBison` | `string` | `""` | Suppress checks that require a newer Bison version (e.g. `"3.0"`). Fires `bison/feature-requires-version` when a `%define` feature exceeds this version. |
+| `bisonFlex.minVersionFlex` | `string` | `""` | Same as above for Flex. |
+| `bisonFlex.disabledChecks` | `array` | `[]` | Diagnostic code slugs to suppress entirely (e.g. `["bison/shift-reduce", "flex/missing-yywrap"]`). |
+| `bisonFlex.buildDirectory` | `string` | `""` | Path to the build output directory. Used by **Show in Generated File** to locate `.tab.c` / `lex.yy.c` when they are not next to the source. |

 ---

diff --git a/package-lock.json b/package-lock.json
index f7fe2e9..a324e14 100644
--- a/package-lock.json
+++ b/package-lock.json
@@ -1,12 +1,12 @@
 {
   "name": "bison-flex-lang",
-  "version": "1.5.0",
+  "version": "1.5.1",
   "lockfileVersion": 3,
   "requires": true,
   "packages": {
     "": {
       "name": "bison-flex-lang",
-      "version": "1.1.2",
+      "version": "1.5.1",
       "license": "MIT",
       "dependencies": {
         "vscode-languageclient": "^9.0.1",
diff --git a/package.json b/package.json
index 2ee75b5..06bac3b 100644
--- a/package.json
+++ b/package.json
@@ -2,7 +2,7 @@
   "name": "bison-flex-lang",
   "displayName": "Bison/Flex Language Support",
   "description": "Full-featured language support for GNU Bison (.y, .yy) and Flex/RE-flex (.l, .ll) — syntax highlighting with embedded C/C++, real-time diagnostics, intelligent autocompletion, and hover documentation for all directives.",
-  "version": "1.5.0",
+  "version": "1.5.1",
   "publisher": "theodevelop",
   "license": "MIT",
   "repository": {
diff --git a/server/src/parser/bisonParser.ts b/server/src/parser/bisonParser.ts
index 88ee6a6..092db28 100644
--- a/server/src/parser/bisonParser.ts
+++ b/server/src/parser/bisonParser.ts
@@ -166,7 +166,7 @@ export function parseBisonDocument(text: string): BisonDocument {
       const precMatch = trimmed.match(/^%(left|right|nonassoc|precedence)\s+(.*)/);
       if (precMatch) {
         const kind = precMatch[1] as PrecedenceDeclaration['kind'];
-        const rawSymbols = precMatch[2].match(/[A-Z_][A-Z0-9_]*|"[^"]*"/g) || [];
+        const rawSymbols = precMatch[2].match(/[A-Za-z_][A-Za-z0-9_]*|"[^"]*"/g) || [];
         const symbols: string[] = [];
         const symbolRanges: Range[] = [];
         for (const raw of rawSymbols) {
@@ -389,6 +389,33 @@ function replaceStringLiterals(text: string): string {
     .replace(/'((?:[^'\\]|\\.)*)'/g, (_, content) => ` ${strLiteralPlaceholder(`'${content}'`)} `);
 }

+/**
+ * Remove all brace-balanced { ... } blocks from `text`, replacing each with `replacement`.
+ * Handles arbitrarily nested braces, unlike /\{[^}]*\}/ which stops at the first `}`.
+ * An unmatched `{` (e.g. a multi-line action opener on its own line) is dropped together
+ * with everything after it; the Phase 3 brace tracker handles such blocks separately.
+ */
+function removeBalancedBraces(text: string, replacement: string = ' '): string {
+  let result = '';
+  let depth = 0;
+  let pendingOpen = false; // true while inside a block that hasn't been closed yet
+  for (let i = 0; i < text.length; i++) {
+    if (text[i] === '{') {
+      if (depth === 0) pendingOpen = true;
+      depth++;
+    } else if (text[i] === '}') {
+      depth = Math.max(0, depth - 1);
+      if (depth === 0 && pendingOpen) {
+        result += replacement; // only emit placeholder when the block is fully closed
+        pendingOpen = false;
+      }
+    } else if (depth === 0) {
+      result += text[i];
+    }
+  }
+  return result;
+}
+
 /**
  * Extract all grammar symbols (identifiers) from a production RHS in order.
  *
@@ -399,8 +426,7 @@ function replaceStringLiterals(text: string): string {
  * `"("` apart from `"{"` (both have different placeholders).
  */
 function extractSymbols(text: string): string[] {
-  const cleaned = replaceStringLiterals(text)
-    .replace(/\{[^}]*\}/g, ' __midaction__ ') // inline actions count as a symbol ($N position)
+  const cleaned = removeBalancedBraces(replaceStringLiterals(text), ' __midaction__ ') // inline actions count as a symbol ($N position)
     .replace(/%prec\s+\S+/g, ' ') // remove %prec TOKEN
     .replace(/%empty/g, ' ') // remove %empty
     .replace(/\/\/.*$/g, ' ') // remove line comments
@@ -423,8 +449,7 @@ function extractSymbols(text: string): string[] {
  * `__s` (not all-caps) and is therefore not confused with a real terminal.
  */
 function getFirstSymbol(text: string): string | undefined {
-  const cleaned = replaceStringLiterals(text)
-    .replace(/\{[^}]*\}/g, ' ') // remove inline actions
+  const cleaned = removeBalancedBraces(replaceStringLiterals(text)) // remove inline actions
     .replace(/%prec\s+\S+/g, ' ') // remove %prec TOKEN
     .replace(/%empty/g, ' ') // remove %empty
     .replace(/\/\/.*$/g, ' ') // remove line comments
@@ -518,15 +543,27 @@ function parseTokenNames(text: string, type: string | undefined, lineNum: number

 /**
  * Scan the inline action block(s) on a single line for $n references.
- * Only handles single-line { ... } blocks; multi-line actions are not detected here.
+ * Uses a brace-depth scanner so that $n references appearing after a nested
+ * sub-block (e.g. `{ if (x) { foo(); } $$ = $1; }`) are not missed.
+ * Only handles single-line { ... } blocks; multi-line actions are tracked by
+ * the caller (Phase 3 loop in parseBisonDocument).
  * $$ and $n are deliberately skipped.
  */
 function extractDollarRefs(text: string, lineNum: number, fullLine: string): DollarRef[] {
   const refs: DollarRef[] = [];
-  const actionRegex = /\{([^}]*)\}/g;
-  let actionMatch: RegExpExecArray | null;
-  while ((actionMatch = actionRegex.exec(text)) !== null) {
-    const actionContent = actionMatch[1];
+  let i = 0;
+  while (i < text.length) {
+    if (text[i] !== '{') { i++; continue; }
+    // Found the opening brace of an action block — scan to the matching '}'
+    let depth = 1;
+    let j = i + 1;
+    while (j < text.length && depth > 0) {
+      if (text[j] === '{') depth++;
+      else if (text[j] === '}') depth--;
+      j++;
+    }
+    // text[i+1 .. j-2] is the full content of this balanced action block
+    const actionContent = text.substring(i + 1, j - 1);
     const dollarRegex = /\$(\d+)/g;
     let m: RegExpExecArray | null;
     while ((m = dollarRegex.exec(actionContent)) !== null) {
@@ -540,6 +577,7 @@ function extractDollarRefs(text: string, lineNum: number, fullLine: string): Dol
         range: Range.create(lineNum, col >= 0 ? col : 0, lineNum, (col >= 0 ? col : 0) + fullMatch.length),
       });
     }
+    i = j; // advance past the entire balanced block
   }
   return refs;
 }
@@ -579,9 +617,7 @@ function extractRuleReferences(text: string, lineNum: number, fullLine: string,

   // Find identifiers in rule bodies (potential token/nonterminal references)
   // Skip: strings, actions (braces), %prec keyword (but keep its token), %empty, comments
-  const cleaned = text
-    .replace(/"(?:[^"\\]|\\.)*"/g, '') // remove strings
-    .replace(/\{[^}]*\}/g, '') // remove inline actions
+  const cleaned = removeBalancedBraces(text.replace(/"(?:[^"\\]|\\.)*"/g, '')) // remove strings, then inline actions
     .replace(/%prec/g, '') // remove %prec keyword (keep the token name)
     .replace(/%empty/g, '') // remove %empty
     .replace(/\/\/.*$/g, ''); // remove line comments
diff --git a/server/src/parser/flexParser.ts b/server/src/parser/flexParser.ts
index 10af0b4..27652b2 100644
--- a/server/src/parser/flexParser.ts
+++ b/server/src/parser/flexParser.ts
@@ -234,7 +234,7 @@ export function parseFlexDocument(text: string): FlexDocument {
         if (closeIdx >= 0) {
           // Collect any additional SC names before the >
           const before = trimmed.substring(0, closeIdx);
-          const moreConds = before.match(/[A-Z_][A-Z0-9_]*/g);
+          const moreConds = before.match(/[A-Za-z_][A-Za-z0-9_]*/g);
           if (moreConds) pendingScHeader += ',' + moreConds.join(',');
           const conds = pendingScHeader.replace(/^,+/, '').split(',').filter(s => s.length > 0);
           pendingScHeader = null;
@@ -246,7 +246,7 @@ export function parseFlexDocument(text: string): FlexDocument {
           }
         } else {
           // Still accumulating conditions from this line
-          const moreConds = trimmed.match(/[A-Z_][A-Z0-9_]*/g);
+          const moreConds = trimmed.match(/[A-Za-z_][A-Za-z0-9_]*/g);
           if (moreConds) pendingScHeader += ',' + moreConds.join(',');
         }
         continue;
@@ -267,10 +267,19 @@ export function parseFlexDocument(text: string): FlexDocument {
         continue;
       }

+      // ── Multi-line action opener: bare `{` on its own line ────────────────────
+      // In Flex, the action brace may appear on the line after the pattern.
+      // Treat a standalone `{` as the opening of a C action block, not a rule.
+      if (trimmed === '{') {
+        actionDepth = 1;
+        continue;
+      }
+
       // ── SC block opener: <SC>{ ───────────────────────────────────────────
       // Single-line header: <SC>{ or <SC1,SC2>{
+      // SC names may be upper or lower case (any valid C identifier).
       {
-        const scBlockMatch = trimmed.match(/^<([A-Z_][A-Z0-9_]*(?:,[A-Z_][A-Z0-9_]*)*)>\s*\{/);
+        const scBlockMatch = trimmed.match(/^<([A-Za-z_][A-Za-z0-9_]*(?:,[A-Za-z_][A-Za-z0-9_]*)*)>\s*\{/);
         if (scBlockMatch) {
           const conds = scBlockMatch[1].split(',');
           scBlockStack.push(conds);
@@ -284,7 +293,7 @@ export function parseFlexDocument(text: string): FlexDocument {
           continue;
         }
         // Multi-line header start: <SC1,SC2, (no > on this line)
-        const scMultiStart = trimmed.match(/^<([A-Z_][A-Z0-9_]*(?:,[A-Z_][A-Z0-9_]*)*,\s*)$/);
+        const scMultiStart = trimmed.match(/^<([A-Za-z_][A-Za-z0-9_]*(?:,[A-Za-z_][A-Za-z0-9_]*)*,\s*)$/);
         if (scMultiStart) {
           pendingScHeader = scMultiStart[1].replace(/,\s*$/, '');
           continue;
@@ -293,7 +302,8 @@ export function parseFlexDocument(text: string): FlexDocument {

       // ── Extract start condition references: <SC> or <SC1,SC2> ────────────
       // Exclude <<EOF>> which is a special pattern, not a start condition
-      const scRefs = line.matchAll(/(?<!<)<([A-Z_][A-Z0-9_]*(?:,[A-Z_][A-Z0-9_]*)*)>(?!>)/g);
+      // SC names may be upper or lower case (any valid C identifier).
+      const scRefs = line.matchAll(/(?<!<)<([A-Za-z_][A-Za-z0-9_]*(?:,[A-Za-z_][A-Za-z0-9_]*)*)>(?!>)/g);
       for (const m of scRefs) {
         const conditions = m[1].split(',');
         for (const cond of conditions) {
@@ -311,8 +321,11 @@ export function parseFlexDocument(text: string): FlexDocument {
       const abbrRefs = line.matchAll(/\{([a-zA-Z_][a-zA-Z0-9_]*)\}/g);
       for (const m of abbrRefs) {
         const name = m[1];
-        // Only count as abbreviation ref if it appears before any action block on this line
-        const actionStart = line.indexOf('{', (line.match(/\s{2,}\{/) || { index: line.length }).index || line.length);
+        // Only count as abbreviation ref if it appears before any action block on this line.
+        // If there is no action { on this line (multi-line action), treat actionStart as line.length
+        // so all {name} refs on this line are counted.
+        const actionMatch = line.match(/\s+\{/);
+        const actionStart = actionMatch !== null ? line.indexOf('{', actionMatch.index!) : line.length;
         if (m.index !== undefined && m.index < actionStart) {
           const col = m.index;
           const range = Range.create(i, col, i, col + m[0].length);
@@ -327,7 +340,7 @@ export function parseFlexDocument(text: string): FlexDocument {
       // Start conditions: explicit <SC> prefix on this line PLUS any inherited from <SC>{ block
       const inherited = scBlockStack.length > 0 ? scBlockStack[scBlockStack.length - 1] : [];
       const startConditions: string[] = [...inherited];
-      const scMatch = trimmed.match(/^<([A-Z_][A-Z0-9_]*(?:,[A-Z_][A-Z0-9_]*)*)>/);
+      const scMatch = trimmed.match(/^<([A-Za-z_][A-Za-z0-9_]*(?:,[A-Za-z_][A-Za-z0-9_]*)*)>/);
       if (scMatch) {
         for (const c of scMatch[1].split(',')) {
           if (!startConditions.includes(c)) startConditions.push(c);
@@ -375,7 +388,7 @@ function parseOptions(text: string, lineNum: number, fullLine: string, doc: Flex
 }

 function parseStartConditions(text: string, exclusive: boolean, lineNum: number, fullLine: string, doc: FlexDocument): void {
-  const names = text.match(/[A-Z_][A-Z0-9_]*/g);
+  const names = text.match(/[A-Za-z_][A-Za-z0-9_]*/g);
   if (!names) return;
   for (const name of names) {
     const col = fullLine.indexOf(name);
diff --git a/server/src/providers/diagnostics.ts b/server/src/providers/diagnostics.ts
index 954bed4..f5ad159 100644
--- a/server/src/providers/diagnostics.ts
+++ b/server/src/providers/diagnostics.ts
@@ -637,31 +637,35 @@ export function computeFlexDiagnostics(doc: FlexDocument, text: string, settings
    * the first whitespace after the regex.
    */
   const rawPattern = (pattern: string): string => {
-    // Remove optional <SC> or <SC1,SC2> prefix
-    let p = pattern.replace(/^<[A-Z_*][A-Z0-9_,*]*>\s*/, '').trimStart();
-    // Extract the pattern token, tracking [] bracket depth so spaces inside
-    // character classes (e.g. "[ \t\n]") are included, not treated as delimiters.
+    // Remove optional <SC> or <SC1,SC2> prefix (SC names may be upper or lower case; * is the wildcard)
+    let p = pattern.replace(/^<[A-Za-z_*][A-Za-z0-9_,*]*>\s*/, '').trimStart();
+    // Extract the pattern token, tracking [] bracket depth and "..." quoted strings
+    // so spaces inside character classes (e.g. "[ \t\n]") or quoted literals
+    // (e.g. "hello world") are included, not treated as delimiters.
     // Backslash-escape handling: \X consumes both chars as a unit.
     let result = '';
     let depth = 0;
+    let inQuote = false;
     for (let i = 0; i < p.length; i++) {
       const ch = p[i];
       if (ch === '\\') {
-        // Escaped char: consume both as-is (e.g. "\[" or "\\")
+        // Escaped char: consume both as-is (e.g. "\[" or "\\" or "\"")
         result += ch + (p[i + 1] ?? '');
         i++;
         continue;
       }
-      if (ch === '[') { depth++; result += ch; continue; }
-      if (ch === ']' && depth > 0) { depth--; result += ch; continue; }
-      if ((ch === ' ' || ch === '\t') && depth === 0) break;
+      if (ch === '"' && !inQuote && depth === 0) { inQuote = true; result += ch; continue; }
+      if (ch === '"' && inQuote) { inQuote = false; result += ch; continue; }
+      if (ch === '[' && !inQuote) { depth++; result += ch; continue; }
+      if (ch === ']' && depth > 0 && !inQuote) { depth--; result += ch; continue; }
+      if ((ch === ' ' || ch === '\t') && depth === 0 && !inQuote) break;
       result += ch;
     }
     return result || p;
   };

   // Catch-all patterns that would shadow everything after them
-  const CATCHALL_PATTERNS = new Set(['.', '.*', '.+', '.|\\n', '(.|\n)*', '(.|\n)+', '(.|\\n)*', '(.|\\n)+']);
+  const CATCHALL_PATTERNS = new Set(['.', '.*', '.+', '.|\\n', '(.|\\n)*', '(.|\\n)+']);

   // Track: first seen pattern per context (for duplicate detection)
   const seenPatterns = new Map(); // "context|pattern" -> line number of first occurrence
@@ -844,7 +848,7 @@ function validateFlexRegex(pat: string): string | null {
   // Convert Flex-specific syntax → approximate JS regex
   let p = pat
     .replace(/\{[a-zA-Z_][a-zA-Z0-9_]*\}/g, 'x') // {abbr} → placeholder
-    .replace(/"([^"]*)"/g, (_, s) => // "str" → escaped literal
+    .replace(/"((?:[^"\\]|\\.)*)"/g, (_, s) => // "str" → escaped literal (handles \" inside)
       s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
     .replace(/\[:(alpha|upper|lower):\]/g, 'a-zA-Z') // POSIX classes (inside [...])
     .replace(/\[:digit:\]/g, '0-9')
diff --git a/tests/test-diagnostic-codes.ts b/tests/test-diagnostic-codes.ts
index 7198388..e0ce43c 100644
--- a/tests/test-diagnostic-codes.ts
+++ b/tests/test-diagnostic-codes.ts
@@ -364,6 +364,24 @@ console.log('\n=== TEST: Bug regressions ===');
   assert(oob5.length === 1, '#21 $6 IS out-of-bounds (5 symbols: A B {action} D {action2})');
 }

+// Issue #31 — false flex/unused-abbrev when abbreviation is used after ^ (BOL anchor)
+// or in a pattern where the action is on the next line (no action { on the rule line).
+{
+  const src = '%option noyywrap\nAREA_A\t"#AREA_A"\n%%\n<*>^{AREA_A}[ ]* {\n /* ok */\n}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const unused = diags.filter(d => d.code === 'flex/unused-abbrev');
+  assert(unused.length === 0, '#31 abbreviation used after ^ BOL anchor must not produce flex/unused-abbrev');
+}
+{
+  // Multi-line action: action { on next line — abbrev ref on rule line must still be recorded
+  const src2 = '%option noyywrap\nWORD\t[a-z]+\n%%\n{WORD}\n{\n /* ok */\n}\n%%\n';
+  const doc2 = require('../server/src/parser/flexParser').parseFlexDocument(src2);
+  const diags2 = computeFlexDiagnostics(doc2, src2);
+  const unused2 = diags2.filter(d => d.code === 'flex/unused-abbrev');
+  assert(unused2.length === 0, '#31 abbreviation used on rule line with multi-line action must not produce flex/unused-abbrev');
+}
+
 // Issue #23 — rules inside a <SC>{ ... } block inherit the start condition.
 // A catch-all `.` in INITIAL should NOT shadow rules in an exclusive SC block.
 {
@@ -377,6 +395,163 @@ console.log('\n=== TEST: Bug regressions ===');
   assert(unreachable.length === 0, '#23 no false flex/unreachable-rule for exclusive SC block');
 }

+// Issue #30 — quoted strings containing escaped quotes: "\'" and "\"" must NOT
+// produce flex/invalid-pattern. The "([^"]*)" replacement was stopping at the
+// first " inside the escaped sequence, corrupting the character class that follows.
+{
+  // Pattern: X"\'"[^'\n]*"\'" (quoted single-quote literal on both sides)
+  const src1 = '%option noyywrap\n%%\nX"\\\'"\t[^\'\\n]*"\\\'"\t{}\n%%\n';
+  const doc1 = require('../server/src/parser/flexParser').parseFlexDocument(src1);
+  const diags1 = computeFlexDiagnostics(doc1, src1);
+  const inv1 = diags1.filter(d => d.code === 'flex/invalid-pattern');
+  assert(inv1.length === 0, '#30 pattern with escaped single-quote in quoted string must not produce flex/invalid-pattern');
+
+  // Pattern: X"\""[^"\n]*"\"" (quoted double-quote literal on both sides)
+  const src2 = '%option noyywrap\n%%\nX"\\""\t[^"\\n]*"\\""\t{}\n%%\n';
+  const doc2 = require('../server/src/parser/flexParser').parseFlexDocument(src2);
+  const diags2 = computeFlexDiagnostics(doc2, src2);
+  const inv2 = diags2.filter(d => d.code === 'flex/invalid-pattern');
+  assert(inv2.length === 0, '#30 pattern with escaped double-quote in quoted string must not produce flex/invalid-pattern');
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// 6. Audit / proactive checks
+// ─────────────────────────────────────────────────────────────────────────────
+console.log('\n=== TEST: Flex audit checks ===');
+
+// Bug A — rawPattern() was stopping at the space inside a Flex quoted string,
+// causing two rules like `"hello world"` and `"hello there"` to have the same
+// rawPattern = `"hello` and be flagged as duplicate (false flex/unreachable-rule).
+{
+  const src = '%option noyywrap\n%%\n"hello world"\t{}\n"hello there"\t{}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const dup = diags.filter(d => d.code === 'flex/unreachable-rule');
+  assert(dup.length === 0, 'audit-A: two distinct quoted-string patterns with spaces must not produce flex/unreachable-rule');
+}
+{
+  // Same pattern twice → should still be flagged
+  const src = '%option noyywrap\n%%\n"hello world"\t{}\n"hello world"\t{}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const dup = diags.filter(d => d.code === 'flex/unreachable-rule');
+  assert(dup.length === 1, 'audit-A: identical quoted-string patterns with spaces must still produce flex/unreachable-rule');
+}
+
+// Bug B — a standalone `{` on its own line (multi-line action) was pushed as a
+// rule with pattern `{`, causing false flex/unreachable-rule when multiple rules
+// use multi-line action syntax.
+{
+  const src = '%option noyywrap\nWORD1\t[a-z]+\n%%\n{WORD1}\n{\n return 1;\n}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const unreach = diags.filter(d => d.code === 'flex/unreachable-rule');
+  assert(unreach.length === 0, 'audit-B: multi-line action `{` on own line must not produce flex/unreachable-rule');
+}
+{
+  // Multiple rules with multi-line actions: the spurious duplicate `{` rules
+  // would have caused false unreachable-rule diagnostics.
+  const src = '%option noyywrap\nA\t[a-z]+\nB\t[0-9]+\n%%\n{A}\n{\n return 1;\n}\n{B}\n{\n return 2;\n}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const unreach = diags.filter(d => d.code === 'flex/unreachable-rule');
+  assert(unreach.length === 0, 'audit-B: two rules with multi-line actions must not produce flex/unreachable-rule');
+}
+
+// Bug C — SC names are valid C identifiers and can be lowercase.
+// %x comment / <comment> rule {} were silently ignored before this fix.
+{
+  const src = '%option noyywrap\n%x comment\n%%\n[a-z]+\t{}\n<comment>[^\\n]*\t{}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const undef = diags.filter(d => d.code === 'flex/undefined-sc');
+  assert(undef.length === 0, 'audit-C: lowercase SC name declared with %x must not produce flex/undefined-sc');
+  const unused = diags.filter(d => d.code === 'flex/unused-sc');
+  assert(unused.length === 0, 'audit-C: lowercase SC name used in rule must not produce flex/unused-sc');
+}
+{
+  // Declared but never used → should still warn
+  const src = '%option noyywrap\n%x comment\n%%\n[a-z]+\t{}\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const unused = diags.filter(d => d.code === 'flex/unused-sc');
+  assert(unused.length === 1, 'audit-C: lowercase SC declared but unused must produce flex/unused-sc');
+}
+
+// Bug D — abbrRefs: single-tab separator between pattern and action was not
+// detected by the old \s{2,} heuristic, causing C-code {name} tokens inside the
+// action body to be falsely counted as abbreviation refs (false negative for
+// flex/unused-abbrev when the abbreviation is only "used" in action code).
+//
+// The triggering scenario: a C compound literal or array initializer like
+// `{ int arr[] = {N}; }` contains `{N}` which matches the {identifier} regex.
+// With \s{2,}, the single-tab separator was not recognised → actionStart=line.length
+// → {N} falsely counted as a pattern abbreviation ref → no flex/unused-abbrev.
+{
+  // N defined in definitions section, only referenced as {N} inside a C array
+  // initialiser in the action body (single-tab separator) → must still warn
+  const src = '%option noyywrap\nN\t[0-9]+\n%%\n[a-z]+\t{ int arr[] = {N}; }\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const unused = diags.filter(d => d.code === 'flex/unused-abbrev');
+  assert(unused.length === 1, 'Bug-D: {name} only inside single-tab action body must produce flex/unused-abbrev');
+}
+{
+  // N used in pattern (before tab+action) → no unused-abbrev
+  const src = '%option noyywrap\nN\t[0-9]+\n%%\n{N}\t{ return 1; }\n%%\n';
+  const doc = require('../server/src/parser/flexParser').parseFlexDocument(src);
+  const diags = computeFlexDiagnostics(doc, src);
+  const unused = diags.filter(d => d.code === 'flex/unused-abbrev');
+  assert(unused.length === 0, 'Bug-D: abbreviation used in pattern before single-tab action must not produce flex/unused-abbrev');
+}
+
+// ─────────────────────────────────────────────────────────────────────────────
+// Bison audit checks
+// ─────────────────────────────────────────────────────────────────────────────
+console.log('\n=== TEST: Bison audit checks ===');
+
+{
+  // audit-E: lowercase/mixed-case tokens in %left/%right/%nonassoc must be recorded
+  // in doc.precedence so they don't produce false bison/undeclared-token diagnostics.
+  const src = [
+    '%token kPLUS kMINUS kNUM',
+    '%left kPLUS kMINUS',
+    '%%',
+    'start : expr ;',
+    'expr : expr kPLUS expr { $$ = $1 + $3; }',
+    ' | expr kMINUS expr { $$ = $1 - $3; }',
+    ' | kNUM',
+    ' ;',
+    '%%',
+  ].join('\n');
+  const doc = parseBisonDocument(src);
+  assert(
+    doc.precedence.length === 1 && doc.precedence[0].symbols.includes('kPLUS') && doc.precedence[0].symbols.includes('kMINUS'),
+    'audit-E: lowercase tokens kPLUS/kMINUS recorded in doc.precedence',
+  );
+  const diags = computeBisonDiagnostics(doc, src);
+  const undeclared = diags.filter(d => d.code === 'bison/undeclared-token');
+  assert(undeclared.length === 0, 'audit-E: no false bison/undeclared-token for lowercase precedence tokens (got ' + undeclared.length + ')');
+}
+
+{
+  // audit-F: $N after a nested sub-block inside an action must be detected.
+  // Pattern: A { if (cond) { skip(); } $5; } — 1 symbol, $5 is out-of-bounds.
+  // Old /\{[^}]*\}/ missed $5 because it appears after the inner `}`.
+  const src = [
+    '%token A',
+    '%%',
+    'start : stmt ;',
+    'stmt : A { if (1) { int x = 0; } $$ = $5; }',
+    ' ;',
+    '%%',
+  ].join('\n');
+  const doc = parseBisonDocument(src);
+  const diags = computeBisonDiagnostics(doc, src);
+  const oob = diags.filter(d => d.code === 'bison/out-of-bounds' && d.message.includes('$5'));
+  assert(oob.length >= 1, 'audit-F: $5 after nested sub-block in action is detected as out-of-bounds');
+}
+
 // ─────────────────────────────────────────────────────────────────────────────
 // Results
 // ─────────────────────────────────────────────────────────────────────────────
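The `removeBalancedBraces` helper in the `bisonParser.ts` hunk exists because `/\{[^}]*\}/` ends a match at the first `}`. A standalone sketch makes the difference visible. `stripBalancedBlocks` below is a hypothetical re-implementation of the same brace-depth idea (not an API exported by the extension), run next to the naive regex on a production whose action contains a nested block.

```typescript
// Standalone sketch of the brace-depth stripping idea behind removeBalancedBraces().
// `stripBalancedBlocks` is a hypothetical name chosen for this example.
function stripBalancedBlocks(text: string, replacement: string = " "): string {
  let out = "";
  let depth = 0;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      if (depth > 0) {
        depth--;
        // Emit the placeholder only when the outermost block closes.
        if (depth === 0) out += replacement;
      }
      // A stray `}` at depth 0 is simply dropped.
    } else if (depth === 0) {
      out += ch;
    }
  }
  // Anything after an unclosed `{` never reaches `out`.
  return out;
}

const rhs = "expr '+' expr { if (x) { y(); } $$ = $1 + $3; } %prec PLUS";

// The naive regex stops at the first `}` and leaves "$$ = $1 + $3; }" behind.
const naive = rhs.replace(/\{[^}]*\}/g, " ");
// The depth scan removes the whole action, nested block included.
const balanced = stripBalancedBlocks(rhs);

console.log(JSON.stringify(naive));
console.log(JSON.stringify(balanced));
```

Like the patched helper, the sketch drops an unclosed `{` and everything after it without emitting a placeholder.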
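The rewritten `extractDollarRefs` walks each action with a brace-depth counter instead of `/\{([^}]*)\}/g`. The sketch below isolates that idea. `collectDollarIndices`, `naiveDollarIndices` and `dollarIndicesIn` are hypothetical helpers written for this example, and unlike the patch they return bare indices rather than `DollarRef` objects with ranges. The input line is the audit-F case from the test file.

```typescript
// Sketch: collect the N of every `$N` inside the `{ ... }` action blocks of one
// grammar line. All three function names are hypothetical.
function dollarIndicesIn(body: string, found: number[]): void {
  const dollar = /\$(\d+)/g; // `$$` and `$<tag>N` are not matched
  let m: RegExpExecArray | null;
  while ((m = dollar.exec(body)) !== null) found.push(Number(m[1]));
}

function collectDollarIndices(line: string): number[] {
  const found: number[] = [];
  let i = 0;
  while (i < line.length) {
    if (line[i] !== "{") { i++; continue; }
    // Walk forward to the `}` that balances this `{`.
    let depth = 1;
    let j = i + 1;
    while (j < line.length && depth > 0) {
      if (line[j] === "{") depth++;
      else if (line[j] === "}") depth--;
      j++;
    }
    // Closed block: j is one past its `}`. Unclosed block: scan to end of line.
    const end = depth === 0 ? j - 1 : j;
    dollarIndicesIn(line.substring(i + 1, end), found);
    i = j;
  }
  return found;
}

// The pre-patch approach: each regex match ends at the first `}`.
function naiveDollarIndices(line: string): number[] {
  const found: number[] = [];
  const action = /\{([^}]*)\}/g;
  let a: RegExpExecArray | null;
  while ((a = action.exec(line)) !== null) dollarIndicesIn(a[1], found);
  return found;
}

const line = "stmt : A { if (1) { int x = 0; } $$ = $5; }";
console.log(naiveDollarIndices(line));   // the regex never sees $5
console.log(collectDollarIndices(line)); // the depth scan does
```

For an unclosed `{`, this sketch scans to the end of the line.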
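`rawPattern()` in `diagnostics.ts` now treats whitespace inside both `[...]` classes and `"..."` literals as part of the pattern. The sketch below restates that scan as a standalone function. `leadingPattern` is a hypothetical name, and it uses a `/^\s+/` replace and an explicit bounds check in place of `trimStart()` and `??`, so it does not depend on newer `lib` or compiler settings.

```typescript
// Sketch: extract the leading pattern token of a Flex rule line. Whitespace
// inside [...] classes or "..." literals does not end the pattern.
// `leadingPattern` is a hypothetical name mirroring the idea in rawPattern().
function leadingPattern(rule: string): string {
  // Drop an optional <SC>, <SC1,SC2> or <*> prefix, then leading whitespace.
  const p = rule.replace(/^<[A-Za-z_*][A-Za-z0-9_,*]*>\s*/, "").replace(/^\s+/, "");
  let out = "";
  let classDepth = 0;
  let inQuote = false;
  for (let i = 0; i < p.length; i++) {
    const ch = p[i];
    if (ch === "\\") {
      // An escape such as \" or \[ is copied as a two-character unit.
      out += ch + (i + 1 < p.length ? p[i + 1] : "");
      i++;
      continue;
    }
    if (ch === '"' && classDepth === 0) {
      inQuote = !inQuote; // a `"` inside [...] is just a class member
    } else if (!inQuote && ch === "[") {
      classDepth++;
    } else if (!inQuote && ch === "]" && classDepth > 0) {
      classDepth--;
    } else if (!inQuote && classDepth === 0 && (ch === " " || ch === "\t")) {
      break; // first unquoted, unbracketed whitespace ends the pattern
    }
    out += ch;
  }
  return out || p;
}

console.log(leadingPattern('"hello world"\t{ return 1; }'));
console.log(leadingPattern('"hello there"\t{ return 2; }'));
console.log(leadingPattern("<comment>[ \\t]+  ;"));
```

With this scan, the audit-A pair `"hello world"` and `"hello there"` yields two different keys, so the duplicate-pattern map no longer conflates them.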
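In `validateFlexRegex`, the quoted-literal regex changed from `/"([^"]*)"/` to `/"((?:[^"\\]|\\.)*)"/`, which consumes a backslash and the character after it as one unit. A small comparison, using a hypothetical helper `firstQuotedBody`, shows what each regex captures from the Flex text `"say \"hi\""+`:

```typescript
// Sketch: the old and new quoted-literal regexes from validateFlexRegex(),
// applied to a Flex "..." literal that contains \" escapes.
// `firstQuotedBody` is a hypothetical helper for this comparison.
const OLD_QUOTED = /"([^"]*)"/;
const NEW_QUOTED = /"((?:[^"\\]|\\.)*)"/;

function firstQuotedBody(re: RegExp, flexPattern: string): string | null {
  const m = re.exec(flexPattern);
  return m === null ? null : m[1];
}

// Flex source text:  "say \"hi\""+
const pat = '"say \\"hi\\""+';

console.log(firstQuotedBody(OLD_QUOTED, pat)); // stops at the first escaped quote
console.log(firstQuotedBody(NEW_QUOTED, pat)); // spans the whole literal
```

Neither regex tracks `[...]` classes, so a bare `"` inside a character class (as in `[^"\n]`) can still pair with a later quote.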
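The #31 fix comes down to the `actionStart` expression in `flexParser.ts`. `String.prototype.indexOf` returns -1 when the search for `{` starts at the string's length, so the old expression evaluates to -1 on any line without two or more whitespace characters before a `{`; since a reference only counts when `m.index < actionStart`, nothing on such a line was recorded. The sketch copies both expressions into standalone functions (the names `oldActionStart`, `newActionStart` and `abbrevRefsBefore` are hypothetical) and runs them on rule lines modelled on the #31 and Bug D tests.

```typescript
// Sketch: the pattern/action split used when counting {abbrev} references.
// The two expressions are copied from the flexParser.ts hunk; the function
// names are hypothetical.
function oldActionStart(line: string): number {
  return line.indexOf("{", (line.match(/\s{2,}\{/) || { index: line.length }).index || line.length);
}

function newActionStart(line: string): number {
  const actionMatch = line.match(/\s+\{/);
  return actionMatch !== null ? line.indexOf("{", actionMatch.index!) : line.length;
}

// Names of {name} tokens that start before `actionStart`.
function abbrevRefsBefore(line: string, actionStart: number): string[] {
  const names: string[] = [];
  const ref = /\{([a-zA-Z_][a-zA-Z0-9_]*)\}/g;
  let m: RegExpExecArray | null;
  while ((m = ref.exec(line)) !== null) {
    if (m.index < actionStart) names.push(m[1]);
  }
  return names;
}

for (const line of ["<*>^{AREA_A}[ ]* {", "{WORD}", "[a-z]+\t{ int arr[] = {N}; }"]) {
  console.log(
    JSON.stringify(line),
    abbrevRefsBefore(line, oldActionStart(line)),
    abbrevRefsBefore(line, newActionStart(line)),
  );
}
```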
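Several hunks widen `[A-Z_][A-Z0-9_]*` to `[A-Za-z_][A-Za-z0-9_]*`. The sketch below applies the old and new forms of two of those regexes (the `%left` token list and the `<SC>` rule prefix) to made-up inputs. The constant names are hypothetical; the regex literals are copied from the patch.

```typescript
// Sketch: effect of the uppercase-only identifier class on a %left token list
// and on an <SC> rule prefix. Regex literals copied from the patch.
const OLD_PREC = /[A-Z_][A-Z0-9_]*|"[^"]*"/g;
const NEW_PREC = /[A-Za-z_][A-Za-z0-9_]*|"[^"]*"/g;
const OLD_SC_PREFIX = /^<([A-Z_][A-Z0-9_]*(?:,[A-Z_][A-Z0-9_]*)*)>/;
const NEW_SC_PREFIX = /^<([A-Za-z_][A-Za-z0-9_]*(?:,[A-Za-z_][A-Za-z0-9_]*)*)>/;

// Text after `%left` on a declaration line.
const precTail = 'kPLUS kMINUS "+" plus';
console.log(precTail.match(OLD_PREC)); // mixed case truncated, lowercase dropped
console.log(precTail.match(NEW_PREC)); // every token kept whole

// A rule line with a multi-condition prefix.
const rule = "<comment,STR>[^\\n]*";
console.log(OLD_SC_PREFIX.exec(rule)); // one lowercase name rejects the whole prefix
const sc = NEW_SC_PREFIX.exec(rule);
console.log(sc === null ? null : sc[1]);
```

With the old class, a mixed-case token is not dropped outright but truncated to its uppercase tail, so `kPLUS` is recorded as `PLUS`.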