Skip to content

DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER

Repository files navigation

Syntax Engine: CFG Parser & Analyzer

A production-grade Context-Free Grammar (CFG) parser implementing recursive descent with static left-recursion detection, intelligent backtracking, and automated visual topology generation.

Python 3.8+ License: MIT Code Quality


📋 Table of Contents


Overview

Syntax Engine is a zero-dependency, library-agnostic CFG parser framework designed for academic research, compiler construction, and formal language analysis. It implements a top-down recursive descent parser with sophisticated mechanisms for handling ambiguity, detecting structural anomalies, and generating publishable parse tree visualizations.

Key Differentiators

  • No external parser generators: Built from first principles without NLTK, Lark, PLY, or similar frameworks
  • Static cycle detection: Eliminates runtime stack overflow risks through pre-execution DFS analysis
  • Contextual diagnostics: Provides "Where, What, Why" error reporting with token-level precision
  • Dual interfaces: CLI for batch processing and GUI for interactive analysis
  • Production-ready: Comprehensive error handling, logging, and resource management

Use Cases

  • Compiler course projects: Direct BNF grammar parsing without external tools
  • Formal language research: Experimental grammar analysis and derivation studies
  • Syntax validation: Batch sentence validation against arbitrary CFGs
  • Educational visualization: Parse tree generation for teaching formal languages
  • Grammar testing: Rapid iteration on grammar design with visual feedback

Core Features

🎯 Parsing & Recognition

Feature Description
Recursive Descent Parser Deterministic top-down parsing with backtracking support for non-deterministic grammars
BNF Grammar Support Accepts standard Backus-Naur Form with multi-line productions and epsilon (ε) rules
Epsilon Production Handling Correct treatment of null derivations in alternative branches
Production Sorting Length-based heuristic ordering for efficient deterministic path selection

🛡️ Correctness & Safety

Feature Description
Left-Recursion Detection Static DFS-based cycle detection preventing infinite loops
Token Index Management Precise state restoration across backtracking branches
Bounds Checking EOF (End-of-File) protection with graceful error reporting
Grammar Validation Verification of terminal/non-terminal consistency

📊 Visualization & Analysis

Feature Description
Parse Tree Generation Text-based tree visualization with proper indentation and nesting
Graphical Topology Rendering High-resolution PDF/PNG generation via Graphviz integration
JSON Serialization Clean dictionary-based derivation export with linguistic tag stripping
Interactive GUI Multi-tab interface with real-time analysis and topology preview

🔍 Diagnostics

Feature Description
Semantic Error Reports Contextual analysis identifying exact error location and expected tokens
Kernel Logging Timestamped event tracking with error severity classification
Progress Indication Visual feedback during batch processing with animate progress bars
Entry-level Granularity Per-sentence analysis with indexed result correlation

Technical Architecture

System Design Overview

┌─────────────────────────────────────────────────────────┐
│                    User Interfaces                      │
├──────────────────────────┬──────────────────────────────┤
│   CLI (main_display.py)  │  GUI (main_GUI.py)          │
│   - Batch processing     │  - Interactive analysis      │
│   - Terminal output      │  - Visual topology preview   │
│   - PDF generation       │  - Real-time feedback        │
└──────────────────────────┴──────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────┐
│              Parser Kernel (my_parser.py)               │
├─────────────────────────────────────────────────────────┤
│ • Recursive descent implementation                       │
│ • Left-recursion detection (DFS)                        │
│ • Backtracking state management                         │
│ • Error diagnostics engine                              │
└─────────────────────────────────────────────────────────┘
                           ▼
┌──────────────────────────┬──────────────────────────────┐
│   Reader (my_reader.py)  │  Tree Generator              │
│                          │  (parse_tree_generator.py)   │
│ • Grammar parsing        │  • Parse tree rendering      │
│ • Sentence tokenization  │  • Graphviz integration      │
│ • File I/O handling      │  • Image processing          │
└──────────────────────────┴──────────────────────────────┘

Module Responsibilities

my_reader.py — Grammar & Input Processing

Handles all file I/O and data transformation operations.

Reader
├── __init__(grammar_file, sentences_file)
├── read_bnf_grammar() → Dict[str, List[List[str]]]
│   └── Parses BNF format with multi-line support
└── read_sentences() → List[List[str]]
    └── Tokenizes sentences with epsilon handling

Design decisions:

  • Path resolution relative to script location for portability
  • UTF-8 encoding for international character support
  • Empty line skipping for flexible file formatting
  • Fallback empty return on FileNotFoundError for graceful degradation

my_parser.py — Core Parsing Engine

Implements the recursive descent parser with cycle detection.

Parser
├── __init__(grammar_file, sentences_file)
│   └── Initializes grammar, sentences, and parsing state
├── detect_left_recursion() → List[List[str]]
│   └── DFS-based cycle detection
├── parsing_rules(symbol: str) → Union[Dict, str, None]
│   └── Recursive descent with backtracking
└── error_check(sentence, start_symbol) → None
    └── Diagnostic report generation

State variables:

  • current_index: Current token position
  • tokens: Input token sequence
  • max_idx: Furthest successfully consumed position
  • expected_at_max: Set of expected tokens at failure point

parse_tree_generator.py — Visualization Engine

Converts derivation structures to human-readable formats.

Functions:
├── generate_parse_tree(derivation) → str
│   └── Text-based tree with ASCII art
└── generate_graphical_tree(derivation, entry_num, output_format) → str
    └── Graphviz PDF/PNG rendering

main_display.py — Command Line Interface

Batch processing with terminal output and PDF generation.

run_test(grammar_file, sentences_file) → None
├── Left-recursion check
├── Iterative sentence parsing
├── Result accumulation
└── Parse tree visualization (text + graphical)

main_GUI.py — Graphical User Interface

Multi-threaded interactive analysis with real-time feedback.

SyntaxEngine (customtkinter.CTk)
├── Sidebar controls (file loading, execution)
├── Tabbed output (parse tree, JSON, anomaly logs)
├── Kernel logging (timestamped event tracking)
├── Progress animation (visual feedback)
└── Topology preview (graphical tree display)

Algorithm Specifications

1. Recursive Descent Parser

Algorithm: parsing_rules(symbol)

function parsing_rules(symbol):
    if symbol = "ε":
        return "ε"  // epsilon production matches empty string
    
    if symbol ∉ grammar:
        // Terminal symbol (lexeme)
        if current_index < tokens.length AND tokens[current_index] = symbol:
            value ← tokens[current_index]
            current_index ← current_index + 1
            return value
        else:
            // Track error location
            if current_index ≥ max_idx:
                max_idx ← current_index
                expected_at_max ← {symbol}
            return None  // terminal mismatch
    
    // Non-terminal symbol (recursion)
    start_pos ← current_index
    sorted_prods ← sort(grammar[symbol], by: length DESC)
    
    for each production in sorted_prods:
        current_index ← start_pos  // reset on backtrack
        children ← {}
        match_all ← true
        
        for each part in production:
            result ← parsing_rules(part)
            if result ≠ None:
                children[part] ← result
            else:
                match_all ← false
                break  // try next production
        
        if match_all:
            return children  // successful derivation
    
    return None  // all productions failed

Complexity Analysis:

  • Time: O(n × g × p) in worst case
    • n = input length
    • g = grammar size (non-terminals)
    • p = average productions per non-terminal
  • Space: O(d) call stack, where d = maximum derivation depth
  • Backtracking: Automatic state restoration via recursion stack

Key Properties:

  • Deterministic on LL(1) grammars
  • Handles ambiguous grammars via leftmost derivation preference
  • Production ordering heuristic (longest-first) reduces backtracking

2. Left-Recursion Detection

Algorithm: detect_left_recursion()

function detect_left_recursion():
    // Build dependency graph
    dependencies ← {}
    for each (A → β₁ | β₂ | ... | βₙ) in grammar:
        first_symbols ← {}
        for each production β:
            if β[0] ∈ grammar:  // first symbol is non-terminal
                first_symbols.add(β[0])
        dependencies[A] ← first_symbols
    
    // DFS-based cycle detection
    cycles ← []
    visited ← {}
    path ← []
    
    function dfs(node):
        if node in path:
            // Cycle detected
            start ← path.indexOf(node)
            cycles.append(path[start:] + [node])
            return
        
        if visited[node]:
            return  // already processed
        
        path.append(node)
        for neighbor in dependencies[node]:
            dfs(neighbor)
        path.pop()  // backtrack
        visited[node] ← true
    
    for each non_terminal in grammar:
        if not visited[non_terminal]:
            dfs(non_terminal)
    
    return cycles

Complexity Analysis:

  • Time: O(V + E) where V = non-terminals, E = production relationships
  • Space: O(V) for visited tracking + O(d) recursion stack

Correctness:

  • Detects direct left-recursion: A → A α
  • Detects indirect left-recursion: A → B → C → A
  • Cycles involving epsilon are correctly identified as recursive

3. Error Diagnostic System

Error Location Algorithm:

Upon parse failure:

1. Identify failure point: max_idx (furthest consumed token)
2. Determine expected tokens: expected_at_max set
3. Generate contextual message:
   
   if max_idx = 0:
       "Sentence begins with '{token}', but grammar requires {expected}"
   else:
       "After '{prev_token}', grammar expects {expected}, found '{token}'"

System Requirements

Software Prerequisites

Component Minimum Recommended
Python 3.8 3.10+
Graphviz 2.40 14.1.5+
OS Windows 10 / macOS 10.14 / Linux (Ubuntu 18.04+) Windows 11 / macOS 12+ / Linux (Ubuntu 22.04+)

Python Package Dependencies

customtkinter>=5.0          # Modern GUI framework
graphviz>=0.20              # Parse tree visualization
Pillow>=9.0                 # Image processing

Installation Instructions

Step 1: Python Environment Setup

# Create isolated Python environment (recommended)
python -m venv syntax-engine-env

# Activate environment
# Windows:
syntax-engine-env\Scripts\activate
# macOS/Linux:
source syntax-engine-env/bin/activate

Step 2: Install Python Packages

pip install --upgrade pip
pip install customtkinter graphviz Pillow

Step 3: Install Graphviz System Software

Windows (32/64-bit):

  1. Download: Graphviz 14.1.5 Installer
  2. Run installer with Administrator privileges
  3. Critical: Check ✓ "Add Graphviz to the system PATH for all users" during installation
  4. Restart command prompt to reload environment variables
  5. Verify installation:
    dot -V

macOS (Homebrew):

brew install graphviz
dot -V  # Verify installation

Linux (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install graphviz
dot -V  # Verify installation

Step 4: Verify Installation

python -c "import customtkinter, graphviz, PIL; print('All dependencies installed successfully!')"

Installation & Setup

Quick Start

# 1. Clone repository
git clone https://github.com/DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER.git
cd Syntax-Engine-CFG-PARSER-ANALYZER

# 2. Create and activate virtual environment
python -m venv venv
# Windows: venv\Scripts\activate
# macOS/Linux: source venv/bin/activate

# 3. Install dependencies
pip install customtkinter graphviz Pillow

# 4. Run GUI (recommended for first-time users)
python main_GUI.py

# OR run CLI for batch processing
python main_display.py

Project Structure

Syntax-Engine-CFG-PARSER-ANALYZER/
├── my_reader.py                          # Grammar & input file processing
├── my_parser.py                          # Core parsing engine
├── parse_tree_generator.py                # Visualization & rendering
├── main_display.py                       # Command-line interface
├── main_GUI.py                           # Graphical user interface
├── grammar1.txt                          # Example: Arithmetic expressions
├── grammar2.txt                          # Example: Boolean logic
├── grammar3.txt                          # Example: Context-free language
├── sentences.txt                         # Test sentences for grammar1
├── sentences2.txt                        # Test sentences for grammar2
├── sentences3.txt                        # Test sentences for grammar3
├── execution_and_dependency_notes.txt    # Configuration reference
├── requirements.txt                      # Python package manifest
├── README.md                             # This file
└── output/                               # Generated parse trees (PDF/PNG)
    ├── grammar1_entry_1.pdf
    ├── gui_entry_1.png
    └── ...

Usage Guide

Command-Line Interface (CLI)

Execution:

python main_display.py

Workflow:

  1. Parser loads grammar and sentences from configured files
  2. Left-recursion detection runs (halts execution if cycles found)
  3. Iterative parsing of each sentence
  4. Valid sentences: parse tree printed to terminal + PDF generated
  5. Invalid sentences: error diagnostics printed
  6. Summary statistics displayed

Output Example:

__________ ANALYZING: grammar1.txt __________

----------------------------------------
Input: a + b * c
Valid
Parse tree:

E
├── T
│   └── F
│       └── a
├── +
└── T
    ├── F
    │   └── b
    ├── *
    └── F
        └── c

JSON:
{
    "E": {
        "T": {"F": "a"},
        "+": "+",
        "T": {
            "F": "b",
            "*": "*",
            "F": "c"
        }
    }
}

Parse tree saved to: grammar1_entry_1.pdf

----------------------------------------
grammar1.txt Results: 1/1 sentences are syntactically valid.

Graphical User Interface (GUI)

Execution:

python main_GUI.py

Step-by-Step Usage:

  1. Load Grammar:

    • Click "LOAD GRAMMAR" button
    • Select a .txt file containing BNF grammar definition
    • Confirmation appears in kernel log
  2. Load Sentences:

    • Click "LOAD SENTENCES" button
    • Select a .txt file containing test sentences
    • Confirmation appears in kernel log
  3. Execute Engine:

    • Click "EXECUTE ENGINE" button
    • Progress bar animates during analysis
    • Results populate three tabs:

    Tab 1: PARSE TREE

    • Text representation with proper indentation
    • Toggle "GRAPHICAL TOPOLOGY VIEW" to switch to PNG/PDF rendering
    • Individual entry numbering

    Tab 2: JSON STRUCTURE

    • Clean dictionary serialization without linguistic tags
    • Pretty-printed with 4-space indentation
    • Indexed entries for batch results

    Tab 3: ANOMALY LOGS

    • Invalid sentences with detailed error analysis
    • Location identification (token index)
    • Expected vs. actual token report
    • Contextual explanation
  4. Interpret Results:

    • Green kernel log messages = successful operations
    • Red messages with [!] = errors or warnings
    • Sound feedback on operation completion
    • Link to developer repository available

GUI Features:

Feature Description
Dark Theme Eye-friendly interface for extended use
Sound Effects Audio feedback on success/error/completion
Progress Animation Visual indication of processing time
Graphical Preview Embedded topology rendering
Multi-threading Responsive UI during long operations
Event Logging Kernel-style timestamped messages

Grammar Specification

BNF Format

Standard Syntax:

<non-terminal> ::= production1 | production2 | ... | productionN
<multi-line>   ::= alternative1
                 | alternative2
                 | alternative3

Elements:

Element Syntax Example Meaning
Non-terminal <identifier> <Expression> Grammar rule identifier
Terminal (Token) identifier or string id, +, * Literal token to match
Epsilon ε <Optional> ::= item | ε Empty/null production
Alternative | a | b | c Choice between productions
Grouping Whitespace a b c Sequential composition

Example Grammars

Example 1: Simple Arithmetic Expressions

<Expr> ::= <Term> | <Expr> + <Term> | <Expr> - <Term>
<Term> ::= <Factor> | <Term> * <Factor> | <Term> / <Factor>
<Factor> ::= ( <Expr> ) | id | num

Example 2: Boolean Logic

<BoolExpr> ::= <Term> | <BoolExpr> OR <Term>
<Term> ::= <Factor> | <Term> AND <Factor>
<Factor> ::= NOT <Factor> | ( <BoolExpr> ) | true | false | id

Example 3: Context-Free Language (Balanced Parentheses)

<S> ::= ε | ( <S> ) <S>

Example 4: Simple Imperative Language

<Program> ::= <Statement> | <Program> <Statement>
<Statement> ::= <Assignment> | <IfStatement> | <WhileLoop>
<Assignment> ::= id = <Expr> ;
<IfStatement> ::= if ( <Expr> ) { <Program> } else { <Program> }
<WhileLoop> ::= while ( <Expr> ) { <Program> }
<Expr> ::= id | num | <Expr> + <Expr> | ( <Expr> )

File Format Requirements

Grammar File (grammar.txt):

<E> ::= <T> | <E> + <T> | <E> - <T>
<T> ::= <F> | <T> * <F> | <T> / <F>
<F> ::= ( <E> ) | id

# Comments are NOT supported (will cause parsing errors)
# Keep one rule per line or use continuation on next line
# Ensure balanced <> for non-terminals

Sentence File (sentences.txt):

a + b * c
id - num
( a )
a
(a+b)*c
ε

Parsing Rules:

  • Single tokens (no spaces): abc[a, b, c]
  • Space-separated tokens: a + b[a, +, b]
  • Epsilon token: ε → treated as empty production
  • Empty lines: skipped
  • File encoding: UTF-8 required

API Documentation

Module: my_reader.py

Class: Reader

class Reader:
    def __init__(self, grammar_file: str, sentences_file: str) -> None:
        """
        Initialize reader with file paths (relative to script location).
        
        Args:
            grammar_file (str): Relative path to BNF grammar file
            sentences_file (str): Relative path to sentences file
        
        Raises:
            None (graceful fallback on FileNotFoundError)
        """
    def read_bnf_grammar() -> Dict[str, List[List[str]]]:
        """
        Parse BNF grammar file and return production rules.
        
        Returns:
            Dict[str, List[List[str]]]: Grammar dictionary
                {
                    'E': [['T'], ['E', '+', 'T'], ['E', '-', 'T']],
                    'T': [['F'], ['T', '*', 'F']],
                    'F': [['(', 'E', ')'], ['id']]
                }
        
        Raises:
            FileNotFoundError: Returns {} (empty dict) if file not found
        
        Examples:
            >>> reader = Reader('grammar.txt', 'sentences.txt')
            >>> grammar = reader.read_bnf_grammar()
            >>> grammar['E']
            [['T'], ['E', '+', 'T']]
        """
    def read_sentences() -> List[List[str]]:
        """
        Parse sentences file and return tokenized input.
        
        Returns:
            List[List[str]]: Tokenized sentences
                [
                    ['a', '+', 'b'],
                    ['(', 'a', ')'],
                    ['ε']
                ]
        
        Raises:
            FileNotFoundError: Returns [] (empty list) if file not found
        
        Notes:
            - Single-token inputs (no spaces) are split character by character
            - Multi-token inputs (with spaces) are split by whitespace
            - Empty lines are skipped
            - Epsilon token 'ε' is preserved
        
        Examples:
            >>> sentences = reader.read_sentences()
            >>> sentences[0]
            ['a', '+', 'b']
        """

Module: my_parser.py

Class: Parser

class Parser:
    def __init__(self, grammar_file: str, sentences_file: str) -> None:
        """
        Initialize parser with grammar and sentences.
        
        Attributes:
            grammar (Dict[str, List[List[str]]]): Parsed BNF rules
            sentences (List[List[str]]): Tokenized input sentences
            tokens (List[str]): Current sentence token buffer
            current_index (int): Position in token buffer
            max_idx (int): Furthest successfully consumed position
            expected_at_max (List[str]): Expected tokens at failure point
        """
    def detect_left_recursion() -> List[List[str]]:
        """
        Detect left-recursive cycles in grammar using DFS.
        
        Returns:
            List[List[str]]: Cycles found (empty if no cycles)
            [
                ['E', 'T', 'E'],  # Cycle: E→T→E
                ['S', 'S']         # Direct recursion: S→S
            ]
        
        Complexity:
            Time: O(V + E) where V = non-terminals, E = edges
            Space: O(V) visited set + O(d) recursion depth
        
        Notes:
            - Runs before parsing to prevent stack overflow
            - Detects both direct and indirect left-recursion
            - Returns empty list if grammar is left-recursive-free
        
        Examples:
            >>> cycles = parser.detect_left_recursion()
            >>> if cycles:
            ...     print("Grammar has left-recursion! Aborting.")
        """
    def parsing_rules(symbol: str) -> Union[Dict, str, None]:
        """
        Recursive descent parser with backtracking.
        
        Args:
            symbol (str): Non-terminal or terminal to parse
        
        Returns:
            Union[Dict, str, None]:
                - Dict: Successful derivation tree
                - str: Terminal token or epsilon (ε)
                - None: Parse failure
        
        State modifications:
            - Updates current_index on successful terminal match
            - Updates max_idx and expected_at_max on mismatch
            - Backtracking via recursion stack (automatic)
        
        Complexity:
            Time: O(n × g × p) worst case backtracking
            Space: O(d) recursion stack depth
        
        Notes:
            - Productions sorted by length (longest first) for heuristic
            - Epsilon productions handled implicitly
            - Non-determinism resolved via production order
        
        Examples:
            >>> parser.tokens = ['a', '+', 'b']
            >>> parser.current_index = 0
            >>> result = parser.parsing_rules('E')
            >>> print(result)
            {'T': {'F': 'a'}, '+': '+', 'T': {'F': 'b'}}
        """
    def error_check(sentence: List[str], start_symbol: str) -> None:
        """
        Generate detailed error diagnostics for failed parse.
        
        Args:
            sentence (List[str]): Original input sentence
            start_symbol (str): Grammar start symbol
        
        Prints:
            - Token position and label at error location
            - Expected tokens at failure point
            - Contextual explanation (what preceded the error)
        
        Output format:
            Invalid
            
            Error:
            •Where the error occurs: at token 2 ("d")
            •What was expected: "b" or "c"
            •Why the sentence is invalid: after 'a', the grammar requires ...
        
        Examples:
            >>> parser.error_check(['a', 'd'], 'S')
            Invalid
            
            Error:
            •Where the error occurs: at token 2 ("d")
            ...
        """

Module: parse_tree_generator.py

def generate_parse_tree(derivation: Dict) -> str:
    """
    Generate text-based parse tree with proper indentation.
    
    Args:
        derivation (Dict): Tree structure from parser
        {
            'E': {
                'T': {'F': 'a'},
                '+': '+',
                'T': {'F': 'b'}
            }
        }
    
    Returns:
        str: Formatted tree representation
        E
        ├── T
        │   └── F
        │       └── a
        ├── +
        └── T
            └── F
                └── b
    
    Notes:
        - Unicode box-drawing characters for visual hierarchy
        - Proper indentation at each nesting level
        - Terminal values right-aligned with parent
    """
def generate_graphical_tree(derivation: Dict, entry_num: str, 
                           output_format: str = 'pdf', 
                           view: bool = False) -> str:
    """
    Generate high-resolution parse tree visualization using Graphviz.
    
    Args:
        derivation (Dict): Tree structure from parser
        entry_num (str): Identifier for output file (e.g., 'entry_1')
        output_format (str): 'pdf' or 'png' (default: 'pdf')
        view (bool): Open file in default viewer (default: False)
    
    Returns:
        str: Path to generated output file
        'output/entry_1.pdf'
    
    Raises:
        graphviz.ExecutableNotFound: Graphviz not in system PATH
        subprocess.CalledProcessError: Rendering failed
    
    Notes:
        - Requires Graphviz system software installed
        - Creates 'output/' directory if not present
        - High-resolution rendering suitable for publication
        - Color-coded node types (non-terminal vs. terminal)
    
    Examples:
        >>> path = generate_graphical_tree(result, 'entry_1', 'pdf')
        >>> print(f"Tree saved to: {path}")
        Tree saved to: output/entry_1.pdf
    """

Design Patterns & Solutions

Problem 1: Handling Non-Determinism

Problem: Ambiguous grammars generate multiple valid parse trees. Standard recursive descent can get stuck in exponential backtracking.

Solution: Production Heuristic Ordering

# Sort productions by length (longest first)
sorted_productions = sorted(grammar[symbol], key=len, reverse=True)

for production in sorted_productions:
    # Try longest productions first
    # Reduces backtracking for greedy matches

Rationale:

  • Longest productions typically bind more tightly
  • Reduces failed branch exploration
  • Implements leftmost-longest parsing strategy
  • Approximates precedence without explicit precedence rules

Problem 2: Left-Recursion in Recursive Descent

Problem: Recursive descent cannot handle left-recursive grammars:

E → E + T | T    # E appears as first symbol in production

This causes infinite recursion: E → E → E → ...

Solution: Static DFS Cycle Detection

def detect_left_recursion():
    # Build dependency graph of non-terminals
    dependencies = {symbol: {first_symbols_of_productions}}
    
    # Run DFS from each non-terminal
    # If we revisit a node in current path → cycle detected
    # Abort parser before attempting problematic derivation

Workaround for Left-Recursive Grammars:

If your grammar has left-recursion, convert it to equivalent right-recursive form:

Original (left-recursive):
E → E + T | T

Converted (right-recursive):
E → T E'
E' → + T E' | ε

Problem 3: Precise Error Diagnostics

Problem: Generic "Syntax Error" messages don't help users debug invalid sentences.

Solution: Error Context Tracking

# Track maximum progress point
self.max_idx = 0
self.expected_at_max = []

# On each mismatch
if current_index >= max_idx:
    max_idx = current_index
    expected_at_max = expected_tokens

# Error report includes:
# 1. WHERE: Exact token position
# 2. WHAT: Expected tokens at that position
# 3. WHY: Context-aware explanation

Problem 4: Backtracking State Management

Problem: Manual backtracking leads to complex state restoration bugs.

Solution: Recursion Stack Semantics

# Save state before trying production
start_pos = current_index

# Recursively parse production
for part in production:
    result = parsing_rules(part)  # Updates current_index
    if result is None:
        break  # Failed
    
# Backtrack by restoring saved position
current_index = start_pos

Problem 5: Visual Tree Generation

Problem: Parse trees are complex nested structures difficult to visualize as text or images.

Solution: Graphviz Integration

# Convert derivation dictionary to DOT format
def derivation_to_dot(tree, parent_id=0):
    # Each node gets unique ID
    # Create DOT edge for each parent-child relationship
    # Render to PDF/PNG using Graphviz executable

Performance Analysis

Empirical Complexity

Scenario Complexity Notes
Successful Parse (LL(1)) O(n) Linear scan, no backtracking
With Backtracking O(n × p^g) worst case Exponential in worst case (ambiguous grammars)
Left-Recursion Detection O(V + E) DFS on dependency graph
Parse Tree Rendering O(V) V = tree nodes, linear traversal
Graphviz Rendering O(n log n) Neato layout algorithm

Optimization Strategies

  1. Memoization (Future Enhancement):

    # Cache successful (symbol, position) pairs
    memo[(symbol, current_index)] = result
    # Skip re-parsing identical subproblems
    # Would reduce worst-case exponential to polynomial
  2. Early Termination:

    # Stop after first valid parse (if ambiguity allowed)
    if match_all:
        return children  # Don't try remaining productions
  3. Grammar Optimization:

    • Factor out common prefixes
    • Eliminate redundant rules
    • Convert to Greibach Normal Form if needed
  4. Production Ordering:

    • Longest-first heuristic reduces backtracking
    • Profile typical inputs to order frequently-used productions first

Troubleshooting

Installation Issues

Error: ModuleNotFoundError: No module named 'customtkinter'

Solution:

pip install --upgrade customtkinter
# OR
pip install customtkinter graphviz Pillow

Error: graphviz.ExecutableNotFound: "dot" not found

Windows:

  1. Download Graphviz installer
  2. Run installer as Administrator
  3. Critical: Check "Add Graphviz to system PATH"
  4. Restart command prompt
  5. Verify: dot -V

macOS/Linux:

# macOS (Homebrew)
brew install graphviz

# Linux (Ubuntu/Debian)
sudo apt-get install graphviz

# Verify
dot -V

Runtime Errors

Error: FileNotFoundError: [Errno 2] No such file or directory: 'grammar.txt'

Solution:

# Ensure files are in same directory as script
# OR provide absolute paths:
Reader('/absolute/path/to/grammar.txt', '/absolute/path/to/sentences.txt')

Error: SyntaxError when parsing grammar file

Causes:

  • Missing ::= separator
  • Unbalanced <> in non-terminal names
  • Extra whitespace in unexpected places

Solution:

Correct:    <E> ::= <T> | <E> + <T>
Incorrect:  E ::= T | E + T           # Missing <>
Incorrect:  <E> => <T> | <E> + <T>    # Wrong operator (=>)

Error: RecursionError: maximum recursion depth exceeded

Causes:

  • Left-recursion in grammar (should be caught by detection)
  • Grammar with epsilon cycles creating infinite derivations

Solution:

# Increase recursion limit (temporary)
import sys
sys.setrecursionlimit(10000)

# Better: Fix the grammar
# Convert left-recursive rules to right-recursive form

Parsing Issues

Problem: All sentences marked "Invalid" despite correct input

Diagnostic:

# Check grammar parsing
parser = Parser('grammar.txt', 'sentences.txt')
print(parser.grammar)  # Verify structure
print(parser.sentences)  # Verify tokenization

Common Causes:

  1. Start symbol mismatch: Grammar starts with <S> but parser assumes first non-terminal

    Solution: Explicitly set start symbol

    start_symbol = 'E'  # instead of assuming
    result = parser.parsing_rules(start_symbol)
  2. Tokenization mismatch: Input a + b tokenized as ['a', '+', 'b'] but grammar expects ['a+b']

    Solution: Consistent tokenization in both grammar and sentence files

  3. Non-terminal/terminal confusion: <id> in grammar but id in sentences

    Solution: Grammar terminals should NOT have angle brackets


Problem: GUI shows "HALTED: Input/Context parameters required"

Solution:

  1. Click "LOAD GRAMMAR" and select a .txt file
  2. Click "LOAD SENTENCES" and select a .txt file
  3. Verify buttons show green checkmark text
  4. Click "EXECUTE ENGINE"

Problem: Graphical trees not rendering

Cause: Graphviz not in PATH or parse failed

Solution:

# Verify Graphviz
dot -V

# If failing:
# Windows: Reinstall with PATH option
# macOS: brew install graphviz
# Linux: sudo apt-get install graphviz

# Check temp directory has write permissions

Performance Issues

Problem: Parsing takes excessive time

Likely Cause: Exponential backtracking on ambiguous grammar

Optimization:

  1. Check for unintended ambiguity in grammar
  2. Add left-factoring to reduce branching
  3. Profile parser to identify hot paths:
    import cProfile
    cProfile.run('parser.parsing_rules(start_symbol)')

Contributing

Development Guidelines

  1. Code Style:

    • Follow PEP 8
    • Use type hints for new functions
    • Document with docstrings
  2. Testing:

    • Add test grammars to test_grammars/ directory
    • Ensure new features don't break existing tests
    • Test edge cases (epsilon, left-recursion, etc.)
  3. Commits:

    • Clear commit messages
    • Reference issues where applicable
    • Keep atomic, focused changes

Future Enhancements

  • Memoization for polynomial-time ambiguous parsing
  • LL/LR grammar classification
  • Grammar optimization and normalization (CNF, GNF)
  • Support for regular expressions in terminals
  • Parallel batch processing
  • Web-based IDE interface
  • Export parse trees to LaTeX/TikZ

License

This project is licensed under the MIT License - see LICENSE file for details.


Academic References

Theoretical Foundation

  • Hopcroft, Motwani, Ullman (2006). Introduction to Automata Theory, Languages, and Computation. Pearson.

    • Formal definition of context-free grammars
    • Parsing algorithms
  • Aho, Lam, Sethi, Ullman (2006). Compilers: Principles, Techniques, and Tools (2nd ed.). Pearson.

    • Top-down parsing (recursive descent)
    • Left-recursion handling
    • Error recovery
  • Grune, Jacobs (2008). Parsing Techniques: A Practical Guide (2nd ed.). Springer.

    • Comprehensive parsing reference
    • Backtracking strategies

Implementation References


Citation

If you use this project in academic work, please cite:

@software{syntax_engine_2025,
  author = {DioBey7},
  title = {Syntax Engine: {CFG} Parser \& Analyzer},
  year = {2025},
  url = {https://github.com/DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER}
}

Author

DioBey7
GitHub: @DioBey7
Repository: Syntax-Engine-CFG-PARSER-ANALYZER


Acknowledgments

  • Inspiration from classical compiler construction course materials
  • Graphviz team for powerful graph visualization
  • CustomTkinter maintainers for modern Python GUI framework
  • Academic community for formal language theory foundations

Last Updated: 2026-05-10
Version: 1.0 (Academic Build 443728)

About

A high-performance, library-free BNF syntax analyzer featuring a custom Recursive Descent kernel, static DFS left-recursion guard, and automated Graphviz visual topologies.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors