Name	Name	Last commit message	Last commit date
Latest commit History 3 Commits
LaTeX Report	LaTeX Report
.gitignore	.gitignore
LICENSE	LICENSE
README.md	README.md
execution_and_dependency_notes.txt	execution_and_dependency_notes.txt
grammar1.txt	grammar1.txt
grammar2.txt	grammar2.txt
grammar3.txt	grammar3.txt
main_GUI.py	main_GUI.py
main_display.py	main_display.py
my_parser.py	my_parser.py
my_reader.py	my_reader.py
sentences.txt	sentences.txt
sentences2.txt	sentences2.txt
sentences3.txt	sentences3.txt

Syntax Engine: CFG Parser & Analyzer

A production-grade Context-Free Grammar (CFG) parser implementing recursive descent with static left-recursion detection, intelligent backtracking, and automated visual topology generation.

Overview

Syntax Engine is a zero-dependency, library-agnostic CFG parser framework designed for academic research, compiler construction, and formal language analysis. It implements a top-down recursive descent parser with sophisticated mechanisms for handling ambiguity, detecting structural anomalies, and generating publishable parse tree visualizations.

Key Differentiators

No external parser generators: Built from first principles without NLTK, Lark, PLY, or similar frameworks
Static cycle detection: Eliminates runtime stack overflow risks through pre-execution DFS analysis
Contextual diagnostics: Provides "Where, What, Why" error reporting with token-level precision
Dual interfaces: CLI for batch processing and GUI for interactive analysis
Production-ready: Comprehensive error handling, logging, and resource management

Use Cases

Compiler course projects: Direct BNF grammar parsing without external tools
Formal language research: Experimental grammar analysis and derivation studies
Syntax validation: Batch sentence validation against arbitrary CFGs
Educational visualization: Parse tree generation for teaching formal languages
Grammar testing: Rapid iteration on grammar design with visual feedback

Core Features

🎯 Parsing & Recognition

Feature	Description
Recursive Descent Parser	Deterministic top-down parsing with backtracking support for non-deterministic grammars
BNF Grammar Support	Accepts standard Backus-Naur Form with multi-line productions and epsilon (ε) rules
Epsilon Production Handling	Correct treatment of null derivations in alternative branches
Production Sorting	Length-based heuristic ordering for efficient deterministic path selection

🛡️ Correctness & Safety

Feature	Description
Left-Recursion Detection	Static DFS-based cycle detection preventing infinite loops
Token Index Management	Precise state restoration across backtracking branches
Bounds Checking	EOF (End-of-File) protection with graceful error reporting
Grammar Validation	Verification of terminal/non-terminal consistency

📊 Visualization & Analysis

Feature	Description
Parse Tree Generation	Text-based tree visualization with proper indentation and nesting
Graphical Topology Rendering	High-resolution PDF/PNG generation via Graphviz integration
JSON Serialization	Clean dictionary-based derivation export with linguistic tag stripping
Interactive GUI	Multi-tab interface with real-time analysis and topology preview

🔍 Diagnostics

Feature	Description
Semantic Error Reports	Contextual analysis identifying exact error location and expected tokens
Kernel Logging	Timestamped event tracking with error severity classification
Progress Indication	Visual feedback during batch processing with animate progress bars
Entry-level Granularity	Per-sentence analysis with indexed result correlation

Technical Architecture

System Design Overview

┌─────────────────────────────────────────────────────────┐
│                    User Interfaces                      │
├──────────────────────────┬──────────────────────────────┤
│   CLI (main_display.py)  │  GUI (main_GUI.py)          │
│   - Batch processing     │  - Interactive analysis      │
│   - Terminal output      │  - Visual topology preview   │
│   - PDF generation       │  - Real-time feedback        │
└──────────────────────────┴──────────────────────────────┘
                           ▼
┌─────────────────────────────────────────────────────────┐
│              Parser Kernel (my_parser.py)               │
├─────────────────────────────────────────────────────────┤
│ • Recursive descent implementation                       │
│ • Left-recursion detection (DFS)                        │
│ • Backtracking state management                         │
│ • Error diagnostics engine                              │
└─────────────────────────────────────────────────────────┘
                           ▼
┌──────────────────────────┬──────────────────────────────┐
│   Reader (my_reader.py)  │  Tree Generator              │
│                          │  (parse_tree_generator.py)   │
│ • Grammar parsing        │  • Parse tree rendering      │
│ • Sentence tokenization  │  • Graphviz integration      │
│ • File I/O handling      │  • Image processing          │
└──────────────────────────┴──────────────────────────────┘

Module Responsibilities

my_reader.py — Grammar & Input Processing

Handles all file I/O and data transformation operations.

Reader
├── __init__(grammar_file, sentences_file)
├── read_bnf_grammar() → Dict[str, List[List[str]]]
│   └── Parses BNF format with multi-line support
└── read_sentences() → List[List[str]]
    └── Tokenizes sentences with epsilon handling

Design decisions:

Path resolution relative to script location for portability
UTF-8 encoding for international character support
Empty line skipping for flexible file formatting
Fallback empty return on FileNotFoundError for graceful degradation

my_parser.py — Core Parsing Engine

Implements the recursive descent parser with cycle detection.

Parser
├── __init__(grammar_file, sentences_file)
│   └── Initializes grammar, sentences, and parsing state
├── detect_left_recursion() → List[List[str]]
│   └── DFS-based cycle detection
├── parsing_rules(symbol: str) → Union[Dict, str, None]
│   └── Recursive descent with backtracking
└── error_check(sentence, start_symbol) → None
    └── Diagnostic report generation

State variables:

current_index: Current token position
tokens: Input token sequence
max_idx: Furthest successfully consumed position
expected_at_max: Set of expected tokens at failure point

parse_tree_generator.py — Visualization Engine

Converts derivation structures to human-readable formats.

Functions:
├── generate_parse_tree(derivation) → str
│   └── Text-based tree with ASCII art
└── generate_graphical_tree(derivation, entry_num, output_format) → str
    └── Graphviz PDF/PNG rendering

main_display.py — Command Line Interface

Batch processing with terminal output and PDF generation.

run_test(grammar_file, sentences_file) → None
├── Left-recursion check
├── Iterative sentence parsing
├── Result accumulation
└── Parse tree visualization (text + graphical)

main_GUI.py — Graphical User Interface

Multi-threaded interactive analysis with real-time feedback.

SyntaxEngine (customtkinter.CTk)
├── Sidebar controls (file loading, execution)
├── Tabbed output (parse tree, JSON, anomaly logs)
├── Kernel logging (timestamped event tracking)
├── Progress animation (visual feedback)
└── Topology preview (graphical tree display)

Algorithm Specifications

1. Recursive Descent Parser

Algorithm: parsing_rules(symbol)

function parsing_rules(symbol):
    if symbol = "ε":
        return "ε"  // epsilon production matches empty string
    
    if symbol ∉ grammar:
        // Terminal symbol (lexeme)
        if current_index < tokens.length AND tokens[current_index] = symbol:
            value ← tokens[current_index]
            current_index ← current_index + 1
            return value
        else:
            // Track error location
            if current_index ≥ max_idx:
                max_idx ← current_index
                expected_at_max ← {symbol}
            return None  // terminal mismatch
    
    // Non-terminal symbol (recursion)
    start_pos ← current_index
    sorted_prods ← sort(grammar[symbol], by: length DESC)
    
    for each production in sorted_prods:
        current_index ← start_pos  // reset on backtrack
        children ← {}
        match_all ← true
        
        for each part in production:
            result ← parsing_rules(part)
            if result ≠ None:
                children[part] ← result
            else:
                match_all ← false
                break  // try next production
        
        if match_all:
            return children  // successful derivation
    
    return None  // all productions failed

Complexity Analysis:

Time: O(n × g × p) in worst case
- n = input length
- g = grammar size (non-terminals)
- p = average productions per non-terminal
Space: O(d) call stack, where d = maximum derivation depth
Backtracking: Automatic state restoration via recursion stack

Key Properties:

Deterministic on LL(1) grammars
Handles ambiguous grammars via leftmost derivation preference
Production ordering heuristic (longest-first) reduces backtracking

2. Left-Recursion Detection

Algorithm: detect_left_recursion()

function detect_left_recursion():
    // Build dependency graph
    dependencies ← {}
    for each (A → β₁ | β₂ | ... | βₙ) in grammar:
        first_symbols ← {}
        for each production β:
            if β[0] ∈ grammar:  // first symbol is non-terminal
                first_symbols.add(β[0])
        dependencies[A] ← first_symbols
    
    // DFS-based cycle detection
    cycles ← []
    visited ← {}
    path ← []
    
    function dfs(node):
        if node in path:
            // Cycle detected
            start ← path.indexOf(node)
            cycles.append(path[start:] + [node])
            return
        
        if visited[node]:
            return  // already processed
        
        path.append(node)
        for neighbor in dependencies[node]:
            dfs(neighbor)
        path.pop()  // backtrack
        visited[node] ← true
    
    for each non_terminal in grammar:
        if not visited[non_terminal]:
            dfs(non_terminal)
    
    return cycles

Complexity Analysis:

Time: O(V + E) where V = non-terminals, E = production relationships
Space: O(V) for visited tracking + O(d) recursion stack

Correctness:

Detects direct left-recursion: A → A α
Detects indirect left-recursion: A → B → C → A
Cycles involving epsilon are correctly identified as recursive

3. Error Diagnostic System

Error Location Algorithm:

Upon parse failure:

1. Identify failure point: max_idx (furthest consumed token)
2. Determine expected tokens: expected_at_max set
3. Generate contextual message:
   
   if max_idx = 0:
       "Sentence begins with '{token}', but grammar requires {expected}"
   else:
       "After '{prev_token}', grammar expects {expected}, found '{token}'"

System Requirements

Software Prerequisites

Component	Minimum	Recommended
Python	3.8	3.10+
Graphviz	2.40	14.1.5+
OS	Windows 10 / macOS 10.14 / Linux (Ubuntu 18.04+)	Windows 11 / macOS 12+ / Linux (Ubuntu 22.04+)

Python Package Dependencies

customtkinter>=5.0          # Modern GUI framework
graphviz>=0.20              # Parse tree visualization
Pillow>=9.0                 # Image processing

Installation Instructions

Step 1: Python Environment Setup

# Create isolated Python environment (recommended)
python -m venv syntax-engine-env

# Activate environment
# Windows:
syntax-engine-env\Scripts\activate
# macOS/Linux:
source syntax-engine-env/bin/activate

Step 2: Install Python Packages

pip install --upgrade pip
pip install customtkinter graphviz Pillow

Step 3: Install Graphviz System Software

Windows (32/64-bit):

Download: Graphviz 14.1.5 Installer
Run installer with Administrator privileges
Critical: Check ✓ "Add Graphviz to the system PATH for all users" during installation
Restart command prompt to reload environment variables
Verify installation:
```
dot -V
```

macOS (Homebrew):

brew install graphviz
dot -V  # Verify installation

Linux (Ubuntu/Debian):

sudo apt-get update
sudo apt-get install graphviz
dot -V  # Verify installation

Step 4: Verify Installation

python -c "import customtkinter, graphviz, PIL; print('All dependencies installed successfully!')"

Installation & Setup

Quick Start

# 1. Clone repository
git clone https://github.com/DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER.git
cd Syntax-Engine-CFG-PARSER-ANALYZER

# 2. Create and activate virtual environment
python -m venv venv
# Windows: venv\Scripts\activate
# macOS/Linux: source venv/bin/activate

# 3. Install dependencies
pip install customtkinter graphviz Pillow

# 4. Run GUI (recommended for first-time users)
python main_GUI.py

# OR run CLI for batch processing
python main_display.py

Project Structure

Syntax-Engine-CFG-PARSER-ANALYZER/
├── my_reader.py                          # Grammar & input file processing
├── my_parser.py                          # Core parsing engine
├── parse_tree_generator.py                # Visualization & rendering
├── main_display.py                       # Command-line interface
├── main_GUI.py                           # Graphical user interface
├── grammar1.txt                          # Example: Arithmetic expressions
├── grammar2.txt                          # Example: Boolean logic
├── grammar3.txt                          # Example: Context-free language
├── sentences.txt                         # Test sentences for grammar1
├── sentences2.txt                        # Test sentences for grammar2
├── sentences3.txt                        # Test sentences for grammar3
├── execution_and_dependency_notes.txt    # Configuration reference
├── requirements.txt                      # Python package manifest
├── README.md                             # This file
└── output/                               # Generated parse trees (PDF/PNG)
    ├── grammar1_entry_1.pdf
    ├── gui_entry_1.png
    └── ...

Usage Guide

Command-Line Interface (CLI)

Execution:

python main_display.py

Workflow:

Parser loads grammar and sentences from configured files
Left-recursion detection runs (halts execution if cycles found)
Iterative parsing of each sentence
Valid sentences: parse tree printed to terminal + PDF generated
Invalid sentences: error diagnostics printed
Summary statistics displayed

Output Example:

__________ ANALYZING: grammar1.txt __________

----------------------------------------
Input: a + b * c
Valid
Parse tree:

E
├── T
│   └── F
│       └── a
├── +
└── T
    ├── F
    │   └── b
    ├── *
    └── F
        └── c

JSON:
{
    "E": {
        "T": {"F": "a"},
        "+": "+",
        "T": {
            "F": "b",
            "*": "*",
            "F": "c"
        }
    }
}

Parse tree saved to: grammar1_entry_1.pdf

----------------------------------------
grammar1.txt Results: 1/1 sentences are syntactically valid.

Graphical User Interface (GUI)

Execution:

python main_GUI.py

Step-by-Step Usage:

Load Grammar:
- Click "LOAD GRAMMAR" button
- Select a .txt file containing BNF grammar definition
- Confirmation appears in kernel log
Load Sentences:
- Click "LOAD SENTENCES" button
- Select a .txt file containing test sentences
- Confirmation appears in kernel log
Execute Engine:
- Click "EXECUTE ENGINE" button
- Progress bar animates during analysis
- Results populate three tabs:
Tab 1: PARSE TREE
- Text representation with proper indentation
- Toggle "GRAPHICAL TOPOLOGY VIEW" to switch to PNG/PDF rendering
- Individual entry numbering
Tab 2: JSON STRUCTURE
- Clean dictionary serialization without linguistic tags
- Pretty-printed with 4-space indentation
- Indexed entries for batch results
Tab 3: ANOMALY LOGS
- Invalid sentences with detailed error analysis
- Location identification (token index)
- Expected vs. actual token report
- Contextual explanation
Interpret Results:
- Green kernel log messages = successful operations
- Red messages with [!] = errors or warnings
- Sound feedback on operation completion
- Link to developer repository available

GUI Features:

Feature	Description
Dark Theme	Eye-friendly interface for extended use
Sound Effects	Audio feedback on success/error/completion
Progress Animation	Visual indication of processing time
Graphical Preview	Embedded topology rendering
Multi-threading	Responsive UI during long operations
Event Logging	Kernel-style timestamped messages

Grammar Specification

BNF Format

Standard Syntax:

<non-terminal> ::= production1 | production2 | ... | productionN
<multi-line>   ::= alternative1
                 | alternative2
                 | alternative3

Elements:

Element	Syntax	Example	Meaning
Non-terminal	`<identifier>`	`<Expression>`	Grammar rule identifier
Terminal (Token)	`identifier` or string	`id`, `+`, `*`	Literal token to match
Epsilon	`ε`	`<Optional> ::= item \| ε`	Empty/null production
Alternative	`\|`	`a \| b \| c`	Choice between productions
Grouping	Whitespace	`a b c`	Sequential composition

Example Grammars

Example 1: Simple Arithmetic Expressions

<Expr> ::= <Term> | <Expr> + <Term> | <Expr> - <Term>
<Term> ::= <Factor> | <Term> * <Factor> | <Term> / <Factor>
<Factor> ::= ( <Expr> ) | id | num

Example 2: Boolean Logic

<BoolExpr> ::= <Term> | <BoolExpr> OR <Term>
<Term> ::= <Factor> | <Term> AND <Factor>
<Factor> ::= NOT <Factor> | ( <BoolExpr> ) | true | false | id

Example 3: Context-Free Language (Balanced Parentheses)

<S> ::= ε | ( <S> ) <S>

Example 4: Simple Imperative Language

<Program> ::= <Statement> | <Program> <Statement>
<Statement> ::= <Assignment> | <IfStatement> | <WhileLoop>
<Assignment> ::= id = <Expr> ;
<IfStatement> ::= if ( <Expr> ) { <Program> } else { <Program> }
<WhileLoop> ::= while ( <Expr> ) { <Program> }
<Expr> ::= id | num | <Expr> + <Expr> | ( <Expr> )

File Format Requirements

Grammar File (grammar.txt):

<E> ::= <T> | <E> + <T> | <E> - <T>
<T> ::= <F> | <T> * <F> | <T> / <F>
<F> ::= ( <E> ) | id

# Comments are NOT supported (will cause parsing errors)
# Keep one rule per line or use continuation on next line
# Ensure balanced <> for non-terminals

Sentence File (sentences.txt):

a + b * c
id - num
( a )
a
(a+b)*c
ε

Parsing Rules:

Single tokens (no spaces): abc → [a, b, c]
Space-separated tokens: a + b → [a, +, b]
Epsilon token: ε → treated as empty production
Empty lines: skipped
File encoding: UTF-8 required

API Documentation

Module: `my_reader.py`

Class: `Reader`

class Reader:
    def __init__(self, grammar_file: str, sentences_file: str) -> None:
        """
        Initialize reader with file paths (relative to script location).
        
        Args:
            grammar_file (str): Relative path to BNF grammar file
            sentences_file (str): Relative path to sentences file
        
        Raises:
            None (graceful fallback on FileNotFoundError)
        """

    def read_bnf_grammar() -> Dict[str, List[List[str]]]:
        """
        Parse BNF grammar file and return production rules.
        
        Returns:
            Dict[str, List[List[str]]]: Grammar dictionary
                {
                    'E': [['T'], ['E', '+', 'T'], ['E', '-', 'T']],
                    'T': [['F'], ['T', '*', 'F']],
                    'F': [['(', 'E', ')'], ['id']]
                }
        
        Raises:
            FileNotFoundError: Returns {} (empty dict) if file not found
        
        Examples:
            >>> reader = Reader('grammar.txt', 'sentences.txt')
            >>> grammar = reader.read_bnf_grammar()
            >>> grammar['E']
            [['T'], ['E', '+', 'T']]
        """

    def read_sentences() -> List[List[str]]:
        """
        Parse sentences file and return tokenized input.
        
        Returns:
            List[List[str]]: Tokenized sentences
                [
                    ['a', '+', 'b'],
                    ['(', 'a', ')'],
                    ['ε']
                ]
        
        Raises:
            FileNotFoundError: Returns [] (empty list) if file not found
        
        Notes:
            - Single-token inputs (no spaces) are split character by character
            - Multi-token inputs (with spaces) are split by whitespace
            - Empty lines are skipped
            - Epsilon token 'ε' is preserved
        
        Examples:
            >>> sentences = reader.read_sentences()
            >>> sentences[0]
            ['a', '+', 'b']
        """

Module: `my_parser.py`

Class: `Parser`

class Parser:
    def __init__(self, grammar_file: str, sentences_file: str) -> None:
        """
        Initialize parser with grammar and sentences.
        
        Attributes:
            grammar (Dict[str, List[List[str]]]): Parsed BNF rules
            sentences (List[List[str]]): Tokenized input sentences
            tokens (List[str]): Current sentence token buffer
            current_index (int): Position in token buffer
            max_idx (int): Furthest successfully consumed position
            expected_at_max (List[str]): Expected tokens at failure point
        """

    def detect_left_recursion() -> List[List[str]]:
        """
        Detect left-recursive cycles in grammar using DFS.
        
        Returns:
            List[List[str]]: Cycles found (empty if no cycles)
            [
                ['E', 'T', 'E'],  # Cycle: E→T→E
                ['S', 'S']         # Direct recursion: S→S
            ]
        
        Complexity:
            Time: O(V + E) where V = non-terminals, E = edges
            Space: O(V) visited set + O(d) recursion depth
        
        Notes:
            - Runs before parsing to prevent stack overflow
            - Detects both direct and indirect left-recursion
            - Returns empty list if grammar is left-recursive-free
        
        Examples:
            >>> cycles = parser.detect_left_recursion()
            >>> if cycles:
            ...     print("Grammar has left-recursion! Aborting.")
        """

    def parsing_rules(symbol: str) -> Union[Dict, str, None]:
        """
        Recursive descent parser with backtracking.
        
        Args:
            symbol (str): Non-terminal or terminal to parse
        
        Returns:
            Union[Dict, str, None]:
                - Dict: Successful derivation tree
                - str: Terminal token or epsilon (ε)
                - None: Parse failure
        
        State modifications:
            - Updates current_index on successful terminal match
            - Updates max_idx and expected_at_max on mismatch
            - Backtracking via recursion stack (automatic)
        
        Complexity:
            Time: O(n × g × p) worst case backtracking
            Space: O(d) recursion stack depth
        
        Notes:
            - Productions sorted by length (longest first) for heuristic
            - Epsilon productions handled implicitly
            - Non-determinism resolved via production order
        
        Examples:
            >>> parser.tokens = ['a', '+', 'b']
            >>> parser.current_index = 0
            >>> result = parser.parsing_rules('E')
            >>> print(result)
            {'T': {'F': 'a'}, '+': '+', 'T': {'F': 'b'}}
        """

    def error_check(sentence: List[str], start_symbol: str) -> None:
        """
        Generate detailed error diagnostics for failed parse.
        
        Args:
            sentence (List[str]): Original input sentence
            start_symbol (str): Grammar start symbol
        
        Prints:
            - Token position and label at error location
            - Expected tokens at failure point
            - Contextual explanation (what preceded the error)
        
        Output format:
            Invalid
            
            Error:
            •Where the error occurs: at token 2 ("d")
            •What was expected: "b" or "c"
            •Why the sentence is invalid: after 'a', the grammar requires ...
        
        Examples:
            >>> parser.error_check(['a', 'd'], 'S')
            Invalid
            
            Error:
            •Where the error occurs: at token 2 ("d")
            ...
        """

Module: `parse_tree_generator.py`

def generate_parse_tree(derivation: Dict) -> str:
    """
    Generate text-based parse tree with proper indentation.
    
    Args:
        derivation (Dict): Tree structure from parser
        {
            'E': {
                'T': {'F': 'a'},
                '+': '+',
                'T': {'F': 'b'}
            }
        }
    
    Returns:
        str: Formatted tree representation
        E
        ├── T
        │   └── F
        │       └── a
        ├── +
        └── T
            └── F
                └── b
    
    Notes:
        - Unicode box-drawing characters for visual hierarchy
        - Proper indentation at each nesting level
        - Terminal values right-aligned with parent
    """

def generate_graphical_tree(derivation: Dict, entry_num: str, 
                           output_format: str = 'pdf', 
                           view: bool = False) -> str:
    """
    Generate high-resolution parse tree visualization using Graphviz.
    
    Args:
        derivation (Dict): Tree structure from parser
        entry_num (str): Identifier for output file (e.g., 'entry_1')
        output_format (str): 'pdf' or 'png' (default: 'pdf')
        view (bool): Open file in default viewer (default: False)
    
    Returns:
        str: Path to generated output file
        'output/entry_1.pdf'
    
    Raises:
        graphviz.ExecutableNotFound: Graphviz not in system PATH
        subprocess.CalledProcessError: Rendering failed
    
    Notes:
        - Requires Graphviz system software installed
        - Creates 'output/' directory if not present
        - High-resolution rendering suitable for publication
        - Color-coded node types (non-terminal vs. terminal)
    
    Examples:
        >>> path = generate_graphical_tree(result, 'entry_1', 'pdf')
        >>> print(f"Tree saved to: {path}")
        Tree saved to: output/entry_1.pdf
    """

Design Patterns & Solutions

Problem 1: Handling Non-Determinism

Problem: Ambiguous grammars generate multiple valid parse trees. Standard recursive descent can get stuck in exponential backtracking.

Solution: Production Heuristic Ordering

# Sort productions by length (longest first)
sorted_productions = sorted(grammar[symbol], key=len, reverse=True)

for production in sorted_productions:
    # Try longest productions first
    # Reduces backtracking for greedy matches

Rationale:

Longest productions typically bind more tightly
Reduces failed branch exploration
Implements leftmost-longest parsing strategy
Approximates precedence without explicit precedence rules

Problem 2: Left-Recursion in Recursive Descent

Problem: Recursive descent cannot handle left-recursive grammars:

E → E + T | T    # E appears as first symbol in production

This causes infinite recursion: E → E → E → ...

Solution: Static DFS Cycle Detection

def detect_left_recursion():
    # Build dependency graph of non-terminals
    dependencies = {symbol: {first_symbols_of_productions}}
    
    # Run DFS from each non-terminal
    # If we revisit a node in current path → cycle detected
    # Abort parser before attempting problematic derivation

Workaround for Left-Recursive Grammars:

If your grammar has left-recursion, convert it to equivalent right-recursive form:

Original (left-recursive):
E → E + T | T

Converted (right-recursive):
E → T E'
E' → + T E' | ε

Problem 3: Precise Error Diagnostics

Problem: Generic "Syntax Error" messages don't help users debug invalid sentences.

Solution: Error Context Tracking

# Track maximum progress point
self.max_idx = 0
self.expected_at_max = []

# On each mismatch
if current_index >= max_idx:
    max_idx = current_index
    expected_at_max = expected_tokens

# Error report includes:
# 1. WHERE: Exact token position
# 2. WHAT: Expected tokens at that position
# 3. WHY: Context-aware explanation

Problem 4: Backtracking State Management

Problem: Manual backtracking leads to complex state restoration bugs.

Solution: Recursion Stack Semantics

# Save state before trying production
start_pos = current_index

# Recursively parse production
for part in production:
    result = parsing_rules(part)  # Updates current_index
    if result is None:
        break  # Failed
    
# Backtrack by restoring saved position
current_index = start_pos

Problem 5: Visual Tree Generation

Problem: Parse trees are complex nested structures difficult to visualize as text or images.

Solution: Graphviz Integration

# Convert derivation dictionary to DOT format
def derivation_to_dot(tree, parent_id=0):
    # Each node gets unique ID
    # Create DOT edge for each parent-child relationship
    # Render to PDF/PNG using Graphviz executable

Performance Analysis

Empirical Complexity

Scenario	Complexity	Notes
Successful Parse (LL(1))	O(n)	Linear scan, no backtracking
With Backtracking	O(n × p^g) worst case	Exponential in worst case (ambiguous grammars)
Left-Recursion Detection	O(V + E)	DFS on dependency graph
Parse Tree Rendering	O(V)	V = tree nodes, linear traversal
Graphviz Rendering	O(n log n)	Neato layout algorithm

Optimization Strategies

Memoization (Future Enhancement):

# Cache successful (symbol, position) pairs
memo[(symbol, current_index)] = result
# Skip re-parsing identical subproblems
# Would reduce worst-case exponential to polynomial

Early Termination:

# Stop after first valid parse (if ambiguity allowed)
if match_all:
    return children  # Don't try remaining productions

Grammar Optimization:
- Factor out common prefixes
- Eliminate redundant rules
- Convert to Greibach Normal Form if needed
Production Ordering:
- Longest-first heuristic reduces backtracking
- Profile typical inputs to order frequently-used productions first

Troubleshooting

Installation Issues

Error: `ModuleNotFoundError: No module named 'customtkinter'`

Solution:

pip install --upgrade customtkinter
# OR
pip install customtkinter graphviz Pillow

Error: `graphviz.ExecutableNotFound: "dot" not found`

Windows:

Download Graphviz installer
Run installer as Administrator
Critical: Check "Add Graphviz to system PATH"
Restart command prompt
Verify: dot -V

macOS/Linux:

# macOS (Homebrew)
brew install graphviz

# Linux (Ubuntu/Debian)
sudo apt-get install graphviz

# Verify
dot -V

Runtime Errors

Error: `FileNotFoundError: [Errno 2] No such file or directory: 'grammar.txt'`

Solution:

# Ensure files are in same directory as script
# OR provide absolute paths:
Reader('/absolute/path/to/grammar.txt', '/absolute/path/to/sentences.txt')

Error: `SyntaxError` when parsing grammar file

Causes:

Missing ::= separator
Unbalanced <> in non-terminal names
Extra whitespace in unexpected places

Solution:

Correct:    <E> ::= <T> | <E> + <T>
Incorrect:  E ::= T | E + T           # Missing <>
Incorrect:  <E> => <T> | <E> + <T>    # Wrong operator (=>)

Error: `RecursionError: maximum recursion depth exceeded`

Causes:

Left-recursion in grammar (should be caught by detection)
Grammar with epsilon cycles creating infinite derivations

Solution:

# Increase recursion limit (temporary)
import sys
sys.setrecursionlimit(10000)

# Better: Fix the grammar
# Convert left-recursive rules to right-recursive form

Parsing Issues

Problem: All sentences marked "Invalid" despite correct input

Diagnostic:

# Check grammar parsing
parser = Parser('grammar.txt', 'sentences.txt')
print(parser.grammar)  # Verify structure
print(parser.sentences)  # Verify tokenization

Common Causes:

Start symbol mismatch: Grammar starts with <S> but parser assumes first non-terminal

Solution: Explicitly set start symbol
```
start_symbol = 'E'  # instead of assuming
result = parser.parsing_rules(start_symbol)
```
Tokenization mismatch: Input a + b tokenized as ['a', '+', 'b'] but grammar expects ['a+b']

Solution: Consistent tokenization in both grammar and sentence files
Non-terminal/terminal confusion: <id> in grammar but id in sentences

Solution: Grammar terminals should NOT have angle brackets

Problem: GUI shows "HALTED: Input/Context parameters required"

Solution:

Click "LOAD GRAMMAR" and select a .txt file
Click "LOAD SENTENCES" and select a .txt file
Verify buttons show green checkmark text
Click "EXECUTE ENGINE"

Problem: Graphical trees not rendering

Cause: Graphviz not in PATH or parse failed

Solution:

# Verify Graphviz
dot -V

# If failing:
# Windows: Reinstall with PATH option
# macOS: brew install graphviz
# Linux: sudo apt-get install graphviz

# Check temp directory has write permissions

Performance Issues

Problem: Parsing takes excessive time

Likely Cause: Exponential backtracking on ambiguous grammar

Optimization:

Check for unintended ambiguity in grammar
Add left-factoring to reduce branching

Profile parser to identify hot paths:

import cProfile
cProfile.run('parser.parsing_rules(start_symbol)')

Contributing

Development Guidelines

Code Style:
- Follow PEP 8
- Use type hints for new functions
- Document with docstrings
Testing:
- Add test grammars to test_grammars/ directory
- Ensure new features don't break existing tests
- Test edge cases (epsilon, left-recursion, etc.)
Commits:
- Clear commit messages
- Reference issues where applicable
- Keep atomic, focused changes

Future Enhancements

Memoization for polynomial-time ambiguous parsing
LL/LR grammar classification
Grammar optimization and normalization (CNF, GNF)
Support for regular expressions in terminals
Parallel batch processing
Web-based IDE interface
Export parse trees to LaTeX/TikZ

License

This project is licensed under the MIT License - see LICENSE file for details.

Academic References

Theoretical Foundation

Hopcroft, Motwani, Ullman (2006). Introduction to Automata Theory, Languages, and Computation. Pearson.
- Formal definition of context-free grammars
- Parsing algorithms
Aho, Lam, Sethi, Ullman (2006). Compilers: Principles, Techniques, and Tools (2nd ed.). Pearson.
- Top-down parsing (recursive descent)
- Left-recursion handling
- Error recovery
Grune, Jacobs (2008). Parsing Techniques: A Practical Guide (2nd ed.). Springer.
- Comprehensive parsing reference
- Backtracking strategies

Implementation References

Python Graphviz: https://graphviz.readthedocs.io/
customtkinter Documentation: https://github.com/TomSchimansky/CustomTkinter
Graphviz Language: https://graphviz.org/doc/info/lang.html

Citation

If you use this project in academic work, please cite:

@software{syntax_engine_2025,
  author = {DioBey7},
  title = {Syntax Engine: {CFG} Parser \& Analyzer},
  year = {2025},
  url = {https://github.com/DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER}
}

Author

DioBey7
GitHub: @DioBey7
Repository: Syntax-Engine-CFG-PARSER-ANALYZER

Acknowledgments

Inspiration from classical compiler construction course materials
Graphviz team for powerful graph visualization
CustomTkinter maintainers for modern Python GUI framework
Academic community for formal language theory foundations

Last Updated: 2026-05-10
Version: 1.0 (Academic Build 443728)

Folders and files

Latest commit

History

Repository files navigation

Syntax Engine: CFG Parser & Analyzer

📋 Table of Contents

Overview

Key Differentiators

Use Cases

Core Features

🎯 Parsing & Recognition

🛡️ Correctness & Safety

📊 Visualization & Analysis

🔍 Diagnostics

Technical Architecture

System Design Overview

Module Responsibilities

my_reader.py — Grammar & Input Processing

my_parser.py — Core Parsing Engine

parse_tree_generator.py — Visualization Engine

main_display.py — Command Line Interface

main_GUI.py — Graphical User Interface

Algorithm Specifications

1. Recursive Descent Parser

2. Left-Recursion Detection

3. Error Diagnostic System

System Requirements

Software Prerequisites

Python Package Dependencies

Installation Instructions

Step 1: Python Environment Setup

Step 2: Install Python Packages

Step 3: Install Graphviz System Software

Step 4: Verify Installation

Installation & Setup

Quick Start

Project Structure

Usage Guide

Command-Line Interface (CLI)

Graphical User Interface (GUI)

Grammar Specification

BNF Format

Example Grammars

File Format Requirements

API Documentation

Module: my_reader.py

Class: Reader

Module: my_parser.py

Class: Parser

Module: parse_tree_generator.py

Design Patterns & Solutions

Problem 1: Handling Non-Determinism

Problem 2: Left-Recursion in Recursive Descent

Problem 3: Precise Error Diagnostics

Problem 4: Backtracking State Management

Problem 5: Visual Tree Generation

Performance Analysis

Empirical Complexity

Optimization Strategies

Troubleshooting

Installation Issues

Error: ModuleNotFoundError: No module named 'customtkinter'

Error: graphviz.ExecutableNotFound: "dot" not found

Runtime Errors

Error: FileNotFoundError: [Errno 2] No such file or directory: 'grammar.txt'

Error: SyntaxError when parsing grammar file

Error: RecursionError: maximum recursion depth exceeded

Parsing Issues

Problem: All sentences marked "Invalid" despite correct input

Problem: GUI shows "HALTED: Input/Context parameters required"

Problem: Graphical trees not rendering

Performance Issues

Problem: Parsing takes excessive time

Contributing

Development Guidelines

Future Enhancements

License

Academic References

Theoretical Foundation

Implementation References

Module: `my_reader.py`

Class: `Reader`

Module: `my_parser.py`

Class: `Parser`

Module: `parse_tree_generator.py`

Error: `ModuleNotFoundError: No module named 'customtkinter'`

Error: `graphviz.ExecutableNotFound: "dot" not found`

Error: `FileNotFoundError: [Errno 2] No such file or directory: 'grammar.txt'`

Error: `SyntaxError` when parsing grammar file

Error: `RecursionError: maximum recursion depth exceeded`

Packages