A production-grade Context-Free Grammar (CFG) parser implementing recursive descent with static left-recursion detection, intelligent backtracking, and automated visual topology generation.
- Overview
- Core Features
- Technical Architecture
- Algorithm Specifications
- System Requirements
- Installation & Setup
- Usage Guide
- Grammar Specification
- API Documentation
- Design Patterns & Solutions
- Performance Analysis
- Troubleshooting
- Contributing
Syntax Engine is a zero-dependency, library-agnostic CFG parser framework designed for academic research, compiler construction, and formal language analysis. It implements a top-down recursive descent parser with sophisticated mechanisms for handling ambiguity, detecting structural anomalies, and generating publishable parse tree visualizations.
- No external parser generators: Built from first principles without NLTK, Lark, PLY, or similar frameworks
- Static cycle detection: Eliminates runtime stack overflow risks through pre-execution DFS analysis
- Contextual diagnostics: Provides "Where, What, Why" error reporting with token-level precision
- Dual interfaces: CLI for batch processing and GUI for interactive analysis
- Production-ready: Comprehensive error handling, logging, and resource management
- Compiler course projects: Direct BNF grammar parsing without external tools
- Formal language research: Experimental grammar analysis and derivation studies
- Syntax validation: Batch sentence validation against arbitrary CFGs
- Educational visualization: Parse tree generation for teaching formal languages
- Grammar testing: Rapid iteration on grammar design with visual feedback
| Feature | Description |
|---|---|
| Recursive Descent Parser | Deterministic top-down parsing with backtracking support for non-deterministic grammars |
| BNF Grammar Support | Accepts standard Backus-Naur Form with multi-line productions and epsilon (ε) rules |
| Epsilon Production Handling | Correct treatment of null derivations in alternative branches |
| Production Sorting | Length-based heuristic ordering for efficient deterministic path selection |
| Feature | Description |
|---|---|
| Left-Recursion Detection | Static DFS-based cycle detection preventing infinite loops |
| Token Index Management | Precise state restoration across backtracking branches |
| Bounds Checking | EOF (End-of-File) protection with graceful error reporting |
| Grammar Validation | Verification of terminal/non-terminal consistency |
| Feature | Description |
|---|---|
| Parse Tree Generation | Text-based tree visualization with proper indentation and nesting |
| Graphical Topology Rendering | High-resolution PDF/PNG generation via Graphviz integration |
| JSON Serialization | Clean dictionary-based derivation export with linguistic tag stripping |
| Interactive GUI | Multi-tab interface with real-time analysis and topology preview |
| Feature | Description |
|---|---|
| Semantic Error Reports | Contextual analysis identifying exact error location and expected tokens |
| Kernel Logging | Timestamped event tracking with error severity classification |
| Progress Indication | Visual feedback during batch processing with animate progress bars |
| Entry-level Granularity | Per-sentence analysis with indexed result correlation |
┌─────────────────────────────────────────────────────────┐
│ User Interfaces │
├──────────────────────────┬──────────────────────────────┤
│ CLI (main_display.py) │ GUI (main_GUI.py) │
│ - Batch processing │ - Interactive analysis │
│ - Terminal output │ - Visual topology preview │
│ - PDF generation │ - Real-time feedback │
└──────────────────────────┴──────────────────────────────┘
▼
┌─────────────────────────────────────────────────────────┐
│ Parser Kernel (my_parser.py) │
├─────────────────────────────────────────────────────────┤
│ • Recursive descent implementation │
│ • Left-recursion detection (DFS) │
│ • Backtracking state management │
│ • Error diagnostics engine │
└─────────────────────────────────────────────────────────┘
▼
┌──────────────────────────┬──────────────────────────────┐
│ Reader (my_reader.py) │ Tree Generator │
│ │ (parse_tree_generator.py) │
│ • Grammar parsing │ • Parse tree rendering │
│ • Sentence tokenization │ • Graphviz integration │
│ • File I/O handling │ • Image processing │
└──────────────────────────┴──────────────────────────────┘
Handles all file I/O and data transformation operations.
Reader
├── __init__(grammar_file, sentences_file)
├── read_bnf_grammar() → Dict[str, List[List[str]]]
│ └── Parses BNF format with multi-line support
└── read_sentences() → List[List[str]]
└── Tokenizes sentences with epsilon handlingDesign decisions:
- Path resolution relative to script location for portability
- UTF-8 encoding for international character support
- Empty line skipping for flexible file formatting
- Fallback empty return on FileNotFoundError for graceful degradation
Implements the recursive descent parser with cycle detection.
Parser
├── __init__(grammar_file, sentences_file)
│ └── Initializes grammar, sentences, and parsing state
├── detect_left_recursion() → List[List[str]]
│ └── DFS-based cycle detection
├── parsing_rules(symbol: str) → Union[Dict, str, None]
│ └── Recursive descent with backtracking
└── error_check(sentence, start_symbol) → None
└── Diagnostic report generationState variables:
current_index: Current token positiontokens: Input token sequencemax_idx: Furthest successfully consumed positionexpected_at_max: Set of expected tokens at failure point
Converts derivation structures to human-readable formats.
Functions:
├── generate_parse_tree(derivation) → str
│ └── Text-based tree with ASCII art
└── generate_graphical_tree(derivation, entry_num, output_format) → str
└── Graphviz PDF/PNG renderingBatch processing with terminal output and PDF generation.
run_test(grammar_file, sentences_file) → None
├── Left-recursion check
├── Iterative sentence parsing
├── Result accumulation
└── Parse tree visualization (text + graphical)Multi-threaded interactive analysis with real-time feedback.
SyntaxEngine (customtkinter.CTk)
├── Sidebar controls (file loading, execution)
├── Tabbed output (parse tree, JSON, anomaly logs)
├── Kernel logging (timestamped event tracking)
├── Progress animation (visual feedback)
└── Topology preview (graphical tree display)Algorithm: parsing_rules(symbol)
function parsing_rules(symbol):
if symbol = "ε":
return "ε" // epsilon production matches empty string
if symbol ∉ grammar:
// Terminal symbol (lexeme)
if current_index < tokens.length AND tokens[current_index] = symbol:
value ← tokens[current_index]
current_index ← current_index + 1
return value
else:
// Track error location
if current_index ≥ max_idx:
max_idx ← current_index
expected_at_max ← {symbol}
return None // terminal mismatch
// Non-terminal symbol (recursion)
start_pos ← current_index
sorted_prods ← sort(grammar[symbol], by: length DESC)
for each production in sorted_prods:
current_index ← start_pos // reset on backtrack
children ← {}
match_all ← true
for each part in production:
result ← parsing_rules(part)
if result ≠ None:
children[part] ← result
else:
match_all ← false
break // try next production
if match_all:
return children // successful derivation
return None // all productions failed
Complexity Analysis:
- Time: O(n × g × p) in worst case
- n = input length
- g = grammar size (non-terminals)
- p = average productions per non-terminal
- Space: O(d) call stack, where d = maximum derivation depth
- Backtracking: Automatic state restoration via recursion stack
Key Properties:
- Deterministic on LL(1) grammars
- Handles ambiguous grammars via leftmost derivation preference
- Production ordering heuristic (longest-first) reduces backtracking
Algorithm: detect_left_recursion()
function detect_left_recursion():
// Build dependency graph
dependencies ← {}
for each (A → β₁ | β₂ | ... | βₙ) in grammar:
first_symbols ← {}
for each production β:
if β[0] ∈ grammar: // first symbol is non-terminal
first_symbols.add(β[0])
dependencies[A] ← first_symbols
// DFS-based cycle detection
cycles ← []
visited ← {}
path ← []
function dfs(node):
if node in path:
// Cycle detected
start ← path.indexOf(node)
cycles.append(path[start:] + [node])
return
if visited[node]:
return // already processed
path.append(node)
for neighbor in dependencies[node]:
dfs(neighbor)
path.pop() // backtrack
visited[node] ← true
for each non_terminal in grammar:
if not visited[non_terminal]:
dfs(non_terminal)
return cycles
Complexity Analysis:
- Time: O(V + E) where V = non-terminals, E = production relationships
- Space: O(V) for visited tracking + O(d) recursion stack
Correctness:
- Detects direct left-recursion: A → A α
- Detects indirect left-recursion: A → B → C → A
- Cycles involving epsilon are correctly identified as recursive
Error Location Algorithm:
Upon parse failure:
1. Identify failure point: max_idx (furthest consumed token)
2. Determine expected tokens: expected_at_max set
3. Generate contextual message:
if max_idx = 0:
"Sentence begins with '{token}', but grammar requires {expected}"
else:
"After '{prev_token}', grammar expects {expected}, found '{token}'"
| Component | Minimum | Recommended |
|---|---|---|
| Python | 3.8 | 3.10+ |
| Graphviz | 2.40 | 14.1.5+ |
| OS | Windows 10 / macOS 10.14 / Linux (Ubuntu 18.04+) | Windows 11 / macOS 12+ / Linux (Ubuntu 22.04+) |
customtkinter>=5.0 # Modern GUI framework
graphviz>=0.20 # Parse tree visualization
Pillow>=9.0 # Image processing
# Create isolated Python environment (recommended)
python -m venv syntax-engine-env
# Activate environment
# Windows:
syntax-engine-env\Scripts\activate
# macOS/Linux:
source syntax-engine-env/bin/activatepip install --upgrade pip
pip install customtkinter graphviz PillowWindows (32/64-bit):
- Download: Graphviz 14.1.5 Installer
- Run installer with Administrator privileges
- Critical: Check ✓ "Add Graphviz to the system PATH for all users" during installation
- Restart command prompt to reload environment variables
- Verify installation:
dot -V
macOS (Homebrew):
brew install graphviz
dot -V # Verify installationLinux (Ubuntu/Debian):
sudo apt-get update
sudo apt-get install graphviz
dot -V # Verify installationpython -c "import customtkinter, graphviz, PIL; print('All dependencies installed successfully!')"# 1. Clone repository
git clone https://github.com/DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER.git
cd Syntax-Engine-CFG-PARSER-ANALYZER
# 2. Create and activate virtual environment
python -m venv venv
# Windows: venv\Scripts\activate
# macOS/Linux: source venv/bin/activate
# 3. Install dependencies
pip install customtkinter graphviz Pillow
# 4. Run GUI (recommended for first-time users)
python main_GUI.py
# OR run CLI for batch processing
python main_display.pySyntax-Engine-CFG-PARSER-ANALYZER/
├── my_reader.py # Grammar & input file processing
├── my_parser.py # Core parsing engine
├── parse_tree_generator.py # Visualization & rendering
├── main_display.py # Command-line interface
├── main_GUI.py # Graphical user interface
├── grammar1.txt # Example: Arithmetic expressions
├── grammar2.txt # Example: Boolean logic
├── grammar3.txt # Example: Context-free language
├── sentences.txt # Test sentences for grammar1
├── sentences2.txt # Test sentences for grammar2
├── sentences3.txt # Test sentences for grammar3
├── execution_and_dependency_notes.txt # Configuration reference
├── requirements.txt # Python package manifest
├── README.md # This file
└── output/ # Generated parse trees (PDF/PNG)
├── grammar1_entry_1.pdf
├── gui_entry_1.png
└── ...
Execution:
python main_display.pyWorkflow:
- Parser loads grammar and sentences from configured files
- Left-recursion detection runs (halts execution if cycles found)
- Iterative parsing of each sentence
- Valid sentences: parse tree printed to terminal + PDF generated
- Invalid sentences: error diagnostics printed
- Summary statistics displayed
Output Example:
__________ ANALYZING: grammar1.txt __________
----------------------------------------
Input: a + b * c
Valid
Parse tree:
E
├── T
│ └── F
│ └── a
├── +
└── T
├── F
│ └── b
├── *
└── F
└── c
JSON:
{
"E": {
"T": {"F": "a"},
"+": "+",
"T": {
"F": "b",
"*": "*",
"F": "c"
}
}
}
Parse tree saved to: grammar1_entry_1.pdf
----------------------------------------
grammar1.txt Results: 1/1 sentences are syntactically valid.
Execution:
python main_GUI.pyStep-by-Step Usage:
-
Load Grammar:
- Click "LOAD GRAMMAR" button
- Select a
.txtfile containing BNF grammar definition - Confirmation appears in kernel log
-
Load Sentences:
- Click "LOAD SENTENCES" button
- Select a
.txtfile containing test sentences - Confirmation appears in kernel log
-
Execute Engine:
- Click "EXECUTE ENGINE" button
- Progress bar animates during analysis
- Results populate three tabs:
Tab 1: PARSE TREE
- Text representation with proper indentation
- Toggle "GRAPHICAL TOPOLOGY VIEW" to switch to PNG/PDF rendering
- Individual entry numbering
Tab 2: JSON STRUCTURE
- Clean dictionary serialization without linguistic tags
- Pretty-printed with 4-space indentation
- Indexed entries for batch results
Tab 3: ANOMALY LOGS
- Invalid sentences with detailed error analysis
- Location identification (token index)
- Expected vs. actual token report
- Contextual explanation
-
Interpret Results:
- Green kernel log messages = successful operations
- Red messages with [!] = errors or warnings
- Sound feedback on operation completion
- Link to developer repository available
GUI Features:
| Feature | Description |
|---|---|
| Dark Theme | Eye-friendly interface for extended use |
| Sound Effects | Audio feedback on success/error/completion |
| Progress Animation | Visual indication of processing time |
| Graphical Preview | Embedded topology rendering |
| Multi-threading | Responsive UI during long operations |
| Event Logging | Kernel-style timestamped messages |
Standard Syntax:
<non-terminal> ::= production1 | production2 | ... | productionN
<multi-line> ::= alternative1
| alternative2
| alternative3
Elements:
| Element | Syntax | Example | Meaning |
|---|---|---|---|
| Non-terminal | <identifier> |
<Expression> |
Grammar rule identifier |
| Terminal (Token) | identifier or string |
id, +, * |
Literal token to match |
| Epsilon | ε |
<Optional> ::= item | ε |
Empty/null production |
| Alternative | | |
a | b | c |
Choice between productions |
| Grouping | Whitespace | a b c |
Sequential composition |
Example 1: Simple Arithmetic Expressions
<Expr> ::= <Term> | <Expr> + <Term> | <Expr> - <Term>
<Term> ::= <Factor> | <Term> * <Factor> | <Term> / <Factor>
<Factor> ::= ( <Expr> ) | id | num
Example 2: Boolean Logic
<BoolExpr> ::= <Term> | <BoolExpr> OR <Term>
<Term> ::= <Factor> | <Term> AND <Factor>
<Factor> ::= NOT <Factor> | ( <BoolExpr> ) | true | false | id
Example 3: Context-Free Language (Balanced Parentheses)
<S> ::= ε | ( <S> ) <S>
Example 4: Simple Imperative Language
<Program> ::= <Statement> | <Program> <Statement>
<Statement> ::= <Assignment> | <IfStatement> | <WhileLoop>
<Assignment> ::= id = <Expr> ;
<IfStatement> ::= if ( <Expr> ) { <Program> } else { <Program> }
<WhileLoop> ::= while ( <Expr> ) { <Program> }
<Expr> ::= id | num | <Expr> + <Expr> | ( <Expr> )
Grammar File (grammar.txt):
<E> ::= <T> | <E> + <T> | <E> - <T>
<T> ::= <F> | <T> * <F> | <T> / <F>
<F> ::= ( <E> ) | id
# Comments are NOT supported (will cause parsing errors)
# Keep one rule per line or use continuation on next line
# Ensure balanced <> for non-terminals
Sentence File (sentences.txt):
a + b * c
id - num
( a )
a
(a+b)*c
ε
Parsing Rules:
- Single tokens (no spaces):
abc→[a, b, c] - Space-separated tokens:
a + b→[a, +, b] - Epsilon token:
ε→ treated as empty production - Empty lines: skipped
- File encoding: UTF-8 required
class Reader:
def __init__(self, grammar_file: str, sentences_file: str) -> None:
"""
Initialize reader with file paths (relative to script location).
Args:
grammar_file (str): Relative path to BNF grammar file
sentences_file (str): Relative path to sentences file
Raises:
None (graceful fallback on FileNotFoundError)
""" def read_bnf_grammar() -> Dict[str, List[List[str]]]:
"""
Parse BNF grammar file and return production rules.
Returns:
Dict[str, List[List[str]]]: Grammar dictionary
{
'E': [['T'], ['E', '+', 'T'], ['E', '-', 'T']],
'T': [['F'], ['T', '*', 'F']],
'F': [['(', 'E', ')'], ['id']]
}
Raises:
FileNotFoundError: Returns {} (empty dict) if file not found
Examples:
>>> reader = Reader('grammar.txt', 'sentences.txt')
>>> grammar = reader.read_bnf_grammar()
>>> grammar['E']
[['T'], ['E', '+', 'T']]
""" def read_sentences() -> List[List[str]]:
"""
Parse sentences file and return tokenized input.
Returns:
List[List[str]]: Tokenized sentences
[
['a', '+', 'b'],
['(', 'a', ')'],
['ε']
]
Raises:
FileNotFoundError: Returns [] (empty list) if file not found
Notes:
- Single-token inputs (no spaces) are split character by character
- Multi-token inputs (with spaces) are split by whitespace
- Empty lines are skipped
- Epsilon token 'ε' is preserved
Examples:
>>> sentences = reader.read_sentences()
>>> sentences[0]
['a', '+', 'b']
"""class Parser:
def __init__(self, grammar_file: str, sentences_file: str) -> None:
"""
Initialize parser with grammar and sentences.
Attributes:
grammar (Dict[str, List[List[str]]]): Parsed BNF rules
sentences (List[List[str]]): Tokenized input sentences
tokens (List[str]): Current sentence token buffer
current_index (int): Position in token buffer
max_idx (int): Furthest successfully consumed position
expected_at_max (List[str]): Expected tokens at failure point
""" def detect_left_recursion() -> List[List[str]]:
"""
Detect left-recursive cycles in grammar using DFS.
Returns:
List[List[str]]: Cycles found (empty if no cycles)
[
['E', 'T', 'E'], # Cycle: E→T→E
['S', 'S'] # Direct recursion: S→S
]
Complexity:
Time: O(V + E) where V = non-terminals, E = edges
Space: O(V) visited set + O(d) recursion depth
Notes:
- Runs before parsing to prevent stack overflow
- Detects both direct and indirect left-recursion
- Returns empty list if grammar is left-recursive-free
Examples:
>>> cycles = parser.detect_left_recursion()
>>> if cycles:
... print("Grammar has left-recursion! Aborting.")
""" def parsing_rules(symbol: str) -> Union[Dict, str, None]:
"""
Recursive descent parser with backtracking.
Args:
symbol (str): Non-terminal or terminal to parse
Returns:
Union[Dict, str, None]:
- Dict: Successful derivation tree
- str: Terminal token or epsilon (ε)
- None: Parse failure
State modifications:
- Updates current_index on successful terminal match
- Updates max_idx and expected_at_max on mismatch
- Backtracking via recursion stack (automatic)
Complexity:
Time: O(n × g × p) worst case backtracking
Space: O(d) recursion stack depth
Notes:
- Productions sorted by length (longest first) for heuristic
- Epsilon productions handled implicitly
- Non-determinism resolved via production order
Examples:
>>> parser.tokens = ['a', '+', 'b']
>>> parser.current_index = 0
>>> result = parser.parsing_rules('E')
>>> print(result)
{'T': {'F': 'a'}, '+': '+', 'T': {'F': 'b'}}
""" def error_check(sentence: List[str], start_symbol: str) -> None:
"""
Generate detailed error diagnostics for failed parse.
Args:
sentence (List[str]): Original input sentence
start_symbol (str): Grammar start symbol
Prints:
- Token position and label at error location
- Expected tokens at failure point
- Contextual explanation (what preceded the error)
Output format:
Invalid
Error:
•Where the error occurs: at token 2 ("d")
•What was expected: "b" or "c"
•Why the sentence is invalid: after 'a', the grammar requires ...
Examples:
>>> parser.error_check(['a', 'd'], 'S')
Invalid
Error:
•Where the error occurs: at token 2 ("d")
...
"""def generate_parse_tree(derivation: Dict) -> str:
"""
Generate text-based parse tree with proper indentation.
Args:
derivation (Dict): Tree structure from parser
{
'E': {
'T': {'F': 'a'},
'+': '+',
'T': {'F': 'b'}
}
}
Returns:
str: Formatted tree representation
E
├── T
│ └── F
│ └── a
├── +
└── T
└── F
└── b
Notes:
- Unicode box-drawing characters for visual hierarchy
- Proper indentation at each nesting level
- Terminal values right-aligned with parent
"""def generate_graphical_tree(derivation: Dict, entry_num: str,
output_format: str = 'pdf',
view: bool = False) -> str:
"""
Generate high-resolution parse tree visualization using Graphviz.
Args:
derivation (Dict): Tree structure from parser
entry_num (str): Identifier for output file (e.g., 'entry_1')
output_format (str): 'pdf' or 'png' (default: 'pdf')
view (bool): Open file in default viewer (default: False)
Returns:
str: Path to generated output file
'output/entry_1.pdf'
Raises:
graphviz.ExecutableNotFound: Graphviz not in system PATH
subprocess.CalledProcessError: Rendering failed
Notes:
- Requires Graphviz system software installed
- Creates 'output/' directory if not present
- High-resolution rendering suitable for publication
- Color-coded node types (non-terminal vs. terminal)
Examples:
>>> path = generate_graphical_tree(result, 'entry_1', 'pdf')
>>> print(f"Tree saved to: {path}")
Tree saved to: output/entry_1.pdf
"""Problem: Ambiguous grammars generate multiple valid parse trees. Standard recursive descent can get stuck in exponential backtracking.
Solution: Production Heuristic Ordering
# Sort productions by length (longest first)
sorted_productions = sorted(grammar[symbol], key=len, reverse=True)
for production in sorted_productions:
# Try longest productions first
# Reduces backtracking for greedy matchesRationale:
- Longest productions typically bind more tightly
- Reduces failed branch exploration
- Implements leftmost-longest parsing strategy
- Approximates precedence without explicit precedence rules
Problem: Recursive descent cannot handle left-recursive grammars:
E → E + T | T # E appears as first symbol in production
This causes infinite recursion: E → E → E → ...
Solution: Static DFS Cycle Detection
def detect_left_recursion():
# Build dependency graph of non-terminals
dependencies = {symbol: {first_symbols_of_productions}}
# Run DFS from each non-terminal
# If we revisit a node in current path → cycle detected
# Abort parser before attempting problematic derivationWorkaround for Left-Recursive Grammars:
If your grammar has left-recursion, convert it to equivalent right-recursive form:
Original (left-recursive):
E → E + T | T
Converted (right-recursive):
E → T E'
E' → + T E' | ε
Problem: Generic "Syntax Error" messages don't help users debug invalid sentences.
Solution: Error Context Tracking
# Track maximum progress point
self.max_idx = 0
self.expected_at_max = []
# On each mismatch
if current_index >= max_idx:
max_idx = current_index
expected_at_max = expected_tokens
# Error report includes:
# 1. WHERE: Exact token position
# 2. WHAT: Expected tokens at that position
# 3. WHY: Context-aware explanationProblem: Manual backtracking leads to complex state restoration bugs.
Solution: Recursion Stack Semantics
# Save state before trying production
start_pos = current_index
# Recursively parse production
for part in production:
result = parsing_rules(part) # Updates current_index
if result is None:
break # Failed
# Backtrack by restoring saved position
current_index = start_posProblem: Parse trees are complex nested structures difficult to visualize as text or images.
Solution: Graphviz Integration
# Convert derivation dictionary to DOT format
def derivation_to_dot(tree, parent_id=0):
# Each node gets unique ID
# Create DOT edge for each parent-child relationship
# Render to PDF/PNG using Graphviz executable| Scenario | Complexity | Notes |
|---|---|---|
| Successful Parse (LL(1)) | O(n) | Linear scan, no backtracking |
| With Backtracking | O(n × p^g) worst case | Exponential in worst case (ambiguous grammars) |
| Left-Recursion Detection | O(V + E) | DFS on dependency graph |
| Parse Tree Rendering | O(V) | V = tree nodes, linear traversal |
| Graphviz Rendering | O(n log n) | Neato layout algorithm |
-
Memoization (Future Enhancement):
# Cache successful (symbol, position) pairs memo[(symbol, current_index)] = result # Skip re-parsing identical subproblems # Would reduce worst-case exponential to polynomial
-
Early Termination:
# Stop after first valid parse (if ambiguity allowed) if match_all: return children # Don't try remaining productions
-
Grammar Optimization:
- Factor out common prefixes
- Eliminate redundant rules
- Convert to Greibach Normal Form if needed
-
Production Ordering:
- Longest-first heuristic reduces backtracking
- Profile typical inputs to order frequently-used productions first
Solution:
pip install --upgrade customtkinter
# OR
pip install customtkinter graphviz PillowWindows:
- Download Graphviz installer
- Run installer as Administrator
- Critical: Check "Add Graphviz to system PATH"
- Restart command prompt
- Verify:
dot -V
macOS/Linux:
# macOS (Homebrew)
brew install graphviz
# Linux (Ubuntu/Debian)
sudo apt-get install graphviz
# Verify
dot -VSolution:
# Ensure files are in same directory as script
# OR provide absolute paths:
Reader('/absolute/path/to/grammar.txt', '/absolute/path/to/sentences.txt')Causes:
- Missing
::=separator - Unbalanced
<>in non-terminal names - Extra whitespace in unexpected places
Solution:
Correct: <E> ::= <T> | <E> + <T>
Incorrect: E ::= T | E + T # Missing <>
Incorrect: <E> => <T> | <E> + <T> # Wrong operator (=>)
Causes:
- Left-recursion in grammar (should be caught by detection)
- Grammar with epsilon cycles creating infinite derivations
Solution:
# Increase recursion limit (temporary)
import sys
sys.setrecursionlimit(10000)
# Better: Fix the grammar
# Convert left-recursive rules to right-recursive formDiagnostic:
# Check grammar parsing
parser = Parser('grammar.txt', 'sentences.txt')
print(parser.grammar) # Verify structure
print(parser.sentences) # Verify tokenizationCommon Causes:
-
Start symbol mismatch: Grammar starts with
<S>but parser assumes first non-terminalSolution: Explicitly set start symbol
start_symbol = 'E' # instead of assuming result = parser.parsing_rules(start_symbol)
-
Tokenization mismatch: Input
a + btokenized as['a', '+', 'b']but grammar expects['a+b']Solution: Consistent tokenization in both grammar and sentence files
-
Non-terminal/terminal confusion:
<id>in grammar butidin sentencesSolution: Grammar terminals should NOT have angle brackets
Solution:
- Click "LOAD GRAMMAR" and select a
.txtfile - Click "LOAD SENTENCES" and select a
.txtfile - Verify buttons show green checkmark text
- Click "EXECUTE ENGINE"
Cause: Graphviz not in PATH or parse failed
Solution:
# Verify Graphviz
dot -V
# If failing:
# Windows: Reinstall with PATH option
# macOS: brew install graphviz
# Linux: sudo apt-get install graphviz
# Check temp directory has write permissionsLikely Cause: Exponential backtracking on ambiguous grammar
Optimization:
- Check for unintended ambiguity in grammar
- Add left-factoring to reduce branching
- Profile parser to identify hot paths:
import cProfile cProfile.run('parser.parsing_rules(start_symbol)')
-
Code Style:
- Follow PEP 8
- Use type hints for new functions
- Document with docstrings
-
Testing:
- Add test grammars to
test_grammars/directory - Ensure new features don't break existing tests
- Test edge cases (epsilon, left-recursion, etc.)
- Add test grammars to
-
Commits:
- Clear commit messages
- Reference issues where applicable
- Keep atomic, focused changes
- Memoization for polynomial-time ambiguous parsing
- LL/LR grammar classification
- Grammar optimization and normalization (CNF, GNF)
- Support for regular expressions in terminals
- Parallel batch processing
- Web-based IDE interface
- Export parse trees to LaTeX/TikZ
This project is licensed under the MIT License - see LICENSE file for details.
-
Hopcroft, Motwani, Ullman (2006). Introduction to Automata Theory, Languages, and Computation. Pearson.
- Formal definition of context-free grammars
- Parsing algorithms
-
Aho, Lam, Sethi, Ullman (2006). Compilers: Principles, Techniques, and Tools (2nd ed.). Pearson.
- Top-down parsing (recursive descent)
- Left-recursion handling
- Error recovery
-
Grune, Jacobs (2008). Parsing Techniques: A Practical Guide (2nd ed.). Springer.
- Comprehensive parsing reference
- Backtracking strategies
- Python Graphviz: https://graphviz.readthedocs.io/
- customtkinter Documentation: https://github.com/TomSchimansky/CustomTkinter
- Graphviz Language: https://graphviz.org/doc/info/lang.html
If you use this project in academic work, please cite:
@software{syntax_engine_2025,
author = {DioBey7},
title = {Syntax Engine: {CFG} Parser \& Analyzer},
year = {2025},
url = {https://github.com/DioBey7/Syntax-Engine-CFG-PARSER-ANALYZER}
}DioBey7
GitHub: @DioBey7
Repository: Syntax-Engine-CFG-PARSER-ANALYZER
- Inspiration from classical compiler construction course materials
- Graphviz team for powerful graph visualization
- CustomTkinter maintainers for modern Python GUI framework
- Academic community for formal language theory foundations
Last Updated: 2026-05-10
Version: 1.0 (Academic Build 443728)