Commit f865b17

author
Jaco Labuschagne
committed
Add initial project structure with Go modules, parser, and splitter functionality
- Created .gitignore to exclude binaries and IDE files.
- Initialized go.mod and go.sum for dependency management.
- Added Product Requirements Document (PRD) outlining project goals and features.
- Implemented README.md with project overview, installation instructions, and usage examples.
- Developed core splitter functionality in pkg/splitter, including statement models and error handling.
- Integrated ANTLR4 for PL/SQL parsing with generated lexer and parser files.
- Added test cases for splitter functionality and sample SQL scripts for validation.
- Included scripts for generating ANTLR parser files and handling syntax errors.
- Established directory structure for internal parser and generated files.
0 parents  commit f865b17

29 files changed

Lines changed: 435280 additions & 0 deletions

.gitignore

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
1+
# Binaries for programs and plugins
2+
*.exe
3+
*.exe~
4+
*.dll
5+
*.so
6+
*.dylib
7+
8+
# Test binary, built with `go test -c`
9+
*.test
10+
11+
# Output of the go coverage tool, specifically when used with LiteIDE
12+
*.out
13+
14+
# Go workspace file
15+
go.work
16+
17+
# Generated ANTLR files
18+
pkg/parser/gen/
19+
20+
# IDE specific files
21+
.idea/
22+
.vscode/
23+
*.swp
24+
*.swo
25+
26+
# OS specific files
27+
.DS_Store
28+
Thumbs.db
29+
30+
/.cursor
31+
/.memory-bank

PRD.md

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
1+
# Product Requirements Document: go-plsql-splitter
2+
3+
## 1. Overview and Objectives
4+
5+
go-plsql-splitter is a Go library that splits Oracle PL/SQL scripts into individual statements with precise boundary detection. It parses with ANTLR4 rather than regular expressions (regex is not allowed for splitting), and it aims to give developers a reliable way to extract SQL statements that can be validated and executed in deployment pipelines.
6+
7+
### Key Objectives:
8+
- Achieve 100% accurate PL/SQL statement boundary detection
9+
- Provide precise source location tracking for each statement
10+
- Deliver high performance for processing large SQL scripts
11+
- Supply detailed error information for syntax issues
12+
13+
## 2. Target Audience
14+
15+
- Software developers building Oracle database deployment tools
16+
- DevOps engineers implementing CI/CD pipelines for database changes
17+
- Database administrators automating SQL script validation
18+
- Anyone building tools that require precise PL/SQL statement extraction
19+
20+
## 3. Core Features and Functionality
21+
22+
### 3.1 Statement Splitting
23+
- Split PL/SQL scripts into individual statements with 100% boundary accuracy
24+
- Support all Oracle 19c PL/SQL statement types
25+
- Preserve original statement content exactly as it appears in source
26+
27+
### 3.2 Position Tracking
28+
- Track line and column numbers for the start and end of each statement
29+
- Maintain context information for error reporting
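
As a rough illustration of what line and column tracking involves, the sketch below converts a byte offset into a line and column. The `Position` type and the `positionAt` function are hypothetical names that do not appear in this commit, and the 1-based, byte-counted column convention is an assumption; in the library itself the ANTLR token stream would presumably supply these values.

```go
package main

import "fmt"

// Position is a hypothetical line/column pair. Both fields are 1-based
// here, which is an assumption; the PRD does not fix a convention.
type Position struct {
	Line   int
	Column int
}

// positionAt returns the position of the byte at offset in src.
// Columns are counted in bytes, and out-of-range offsets are clamped.
func positionAt(src string, offset int) Position {
	if offset < 0 {
		offset = 0
	}
	if offset > len(src) {
		offset = len(src)
	}
	pos := Position{Line: 1, Column: 1}
	for i := 0; i < offset; i++ {
		if src[i] == '\n' {
			// A newline ends the current line and resets the column.
			pos.Line++
			pos.Column = 1
		} else {
			pos.Column++
		}
	}
	return pos
}

func main() {
	src := "SELECT 1 FROM dual;\nBEGIN\n  NULL;\nEND;\n/"
	fmt.Println(positionAt(src, 0))  // start of the SELECT statement
	fmt.Println(positionAt(src, 20)) // the "B" of BEGIN on the second line
}
```

Counting bytes keeps the sketch short; a real implementation would need to decide whether columns count bytes, runes, or display cells for non-ASCII input.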
30+
31+
### 3.3 Comment Handling
32+
- Properly handle both single-line (--) and multi-line (/* */) comments
33+
- Preserve comments in output when they're part of a statement
34+
35+
### 3.4 Input Processing
36+
- Process input from file paths
37+
- Process input from string content
38+
- Consider a streaming implementation for large files
39+
40+
### 3.5 Error Reporting
41+
- Provide detailed syntax error messages
42+
- Include file, line, and column information in error reports
43+
- Offer context about the statement where the error occurred
44+
45+
### 3.6 Serialization Support
46+
- All output structures must support JSON marshalling
47+
- Consistent field naming for easy integration with other tools
48+
49+
## 4. Technical Requirements
50+
51+
### 4.1 Development
52+
- Go 1.24 or later required
53+
- ANTLR4 grammar for Oracle PL/SQL (supporting Oracle 19c syntax)
54+
- No regex allowed for statement splitting
55+
- Public GitHub repository under MIT license
56+
57+
### 4.2 ANTLR4 Integration
58+
- May require creating or updating ANTLR grammar to support latest language features
59+
- Custom grammar implementation if existing packages are outdated
60+
- Performance optimization for the ANTLR4 parsing process
61+
62+
## 5. Performance Requirements
63+
64+
### 5.1 Processing Speed
65+
- Optimize for parsing speed while maintaining accuracy
66+
- Should handle multi-megabyte SQL scripts efficiently
67+
68+
### 5.2 Memory Efficiency
69+
- Minimize memory usage, especially for large files
70+
- Consider a streaming approach for very large files to avoid loading the entire content into memory
71+
72+
### 5.3 Resource Utilization
73+
- Avoid excessive CPU usage during parsing
74+
- Balance accuracy with performance
75+
76+
## 6. Input/Output Specifications
77+
78+
### 6.1 Input
79+
- File paths to PL/SQL scripts
80+
- String content containing PL/SQL statements
81+
- File encoding defaults to UTF-8
82+
83+
### 6.2 Output
84+
- Structured data containing individual SQL statements
85+
- Location information (line/column) for each statement
86+
- Statement type classification if available from ANTLR parser
87+
- Format that supports JSON marshalling
88+
89+
Example output structure:
90+
```go
type Statement struct {
	Content     string `json:"content"`
	StartLine   int    `json:"startLine"`
	EndLine     int    `json:"endLine"`
	StartColumn int    `json:"startColumn"`
	EndColumn   int    `json:"endColumn"`
	Type        string `json:"type,omitempty"` // If available from ANTLR parser
}
```
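
To show what the JSON requirement means in practice, here is a minimal sketch that marshals one `Statement` with the standard `encoding/json` package. The struct is copied from the example above; the `encodeStatement` helper and the sample values are illustrative assumptions, not part of the commit.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Statement mirrors the example output structure in the PRD.
type Statement struct {
	Content     string `json:"content"`
	StartLine   int    `json:"startLine"`
	EndLine     int    `json:"endLine"`
	StartColumn int    `json:"startColumn"`
	EndColumn   int    `json:"endColumn"`
	Type        string `json:"type,omitempty"`
}

// encodeStatement is a hypothetical helper that returns the JSON form
// of a single statement.
func encodeStatement(s Statement) (string, error) {
	out, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	// Sample values only; the column convention is an assumption.
	stmt := Statement{
		Content:     "SELECT 1 FROM dual",
		StartLine:   1,
		EndLine:     1,
		StartColumn: 1,
		EndColumn:   18,
	}
	out, err := encodeStatement(stmt)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```

Because `Type` carries `omitempty`, the `type` key is left out of the output whenever the parser supplies no classification.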
100+
101+
## 7. Error Handling
102+
103+
### 7.1 Syntax Errors
104+
- Detailed error messages for syntax issues
105+
- Include file, line, and column information
106+
- Clear description of the nature of the error
107+
108+
Example error structure:
109+
```go
type SyntaxError struct {
	Message   string `json:"message"`
	Line      int    `json:"line"`
	Column    int    `json:"column"`
	Statement string `json:"statement,omitempty"`
}
```
117+
118+
### 7.2 File Errors
119+
- Handle file-not-found, permission-denied, and other I/O failures explicitly
120+
- Clear error messages for I/O problems
121+
122+
### 7.3 No Recovery Mechanism
123+
- No error recovery mechanisms required
124+
- Parser should fail with clear error when syntax is invalid
125+
126+
## 8. Integration with External Components
127+
128+
### 8.1 plsql-parser Integration
129+
- Design for compatibility with github.com/zodimo/plsql-parser
130+
- Clear interfaces for integration with this package
131+
132+
### 8.2 Deployment System Integration
133+
- Output format suitable for passing to deployment systems
134+
- Consider common deployment tool requirements
135+
136+
## 9. API Design
137+
138+
### 9.1 Simple Interface
139+
```go
// Basic functions for simple use cases
func SplitFile(filePath string) ([]Statement, error)
func SplitString(content string) ([]Statement, error)
```
144+
145+
### 9.2 Configurable Interface
146+
```go
// More flexible interface with configuration options
type Splitter struct {
	// Configuration options
}

func NewSplitter(options ...Option) *Splitter
func (s *Splitter) SplitFile(filePath string) ([]Statement, error)
func (s *Splitter) SplitString(content string) ([]Statement, error)
```
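
The `options ...Option` signature suggests the functional options pattern, although the PRD does not define `Option`. The sketch below is one possible reading: `Option` as a function that mutates the `Splitter`, with a hypothetical `WithEncoding` option and an `encoding` field chosen only because section 6.1 mentions a UTF-8 default.

```go
package main

import "fmt"

// Splitter holds configuration. The encoding field is an assumption
// made for this sketch.
type Splitter struct {
	encoding string
}

// Option configures a Splitter. Defining it as a function type is an
// assumption; the PRD only names the type.
type Option func(*Splitter)

// WithEncoding is a hypothetical option that overrides the default
// input encoding.
func WithEncoding(name string) Option {
	return func(s *Splitter) {
		s.encoding = name
	}
}

// NewSplitter applies the defaults first, then each option in order.
func NewSplitter(options ...Option) *Splitter {
	s := &Splitter{encoding: "UTF-8"}
	for _, opt := range options {
		opt(s)
	}
	return s
}

func main() {
	fmt.Println(NewSplitter().encoding)
	fmt.Println(NewSplitter(WithEncoding("ISO-8859-1")).encoding)
}
```

With this shape, new settings can be added later without changing the `NewSplitter` signature, which fits the backward compatibility goal in section 11.3.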
156+
157+
## 10. Testing and Quality Assurance
158+
159+
### 10.1 Test Coverage
160+
- Comprehensive test suite with high coverage
161+
- Unit tests for parser components
162+
- Integration tests for full splitting functionality
163+
164+
### 10.2 Test Cases
165+
- Tests for all Oracle 19c statement types
166+
- Edge cases (comments, nested statements, etc.)
167+
- Error cases with invalid syntax
168+
169+
### 10.3 Performance Testing
170+
- Benchmark tests for performance optimization
171+
- Memory usage monitoring
172+
- Load testing with large SQL scripts
173+
174+
## 11. Deployment and Distribution
175+
176+
### 11.1 Packaging
177+
- Standard Go module
178+
- Public GitHub repository under user zodimo
179+
- MIT license
180+
181+
### 11.2 Documentation
182+
- Comprehensive README with usage examples
183+
- Godoc API documentation
184+
- Examples for common use cases
185+
186+
### 11.3 Versioning
187+
- Semantic versioning (SemVer)
188+
- Backward compatibility guarantees
189+
190+
## 12. Implementation Considerations
191+
192+
### 12.1 ANTLR4 Grammar
193+
- Evaluate existing PL/SQL grammars for ANTLR4
194+
- May need to fork and modify an existing grammar to support Oracle 19c
195+
- Consider performance optimizations in the grammar
196+
197+
### 12.2 Parsing Strategy
198+
- Use ANTLR4's parse tree listeners or visitors
199+
- Track statement boundaries during parsing
200+
- Handle special cases like anonymous blocks, stored procedures, etc.
201+
202+
### 12.3 Memory Management
203+
- Avoid loading entire files into memory for large scripts
204+
- Consider a streaming parser implementation for large files
205+
- Efficient string handling
206+
207+
## 13. Development Roadmap
208+
209+
### 13.1 Phase 1: Basic Functionality
210+
- Set up project structure
211+
- Implement basic file/string parsing
212+
- Handle simple statement types
213+
214+
### 13.2 Phase 2: Enhanced Features
215+
- Support all Oracle 19c statement types
216+
- Implement detailed error reporting
217+
- Add position tracking
218+
219+
### 13.3 Phase 3: Performance Optimization
220+
- Benchmark and optimize for speed
221+
- Memory usage optimization
222+
- Handle edge cases
223+
224+
### 13.4 Phase 4: Documentation and Examples
225+
- Comprehensive documentation
226+
- Usage examples
227+
- Integration examples
228+
229+
## 14. Challenges and Risks
230+
231+
### 14.1 ANTLR4 Grammar Complexity
232+
- PL/SQL grammar is complex and may require significant effort to get 100% correct
233+
- Specific Oracle 19c syntax features might be challenging to parse
234+
235+
### 14.2 Performance Optimization
236+
- Balancing parsing accuracy with performance
237+
- ANTLR4 parsers can be memory-intensive for complex grammars
238+
239+
### 14.3 Edge Cases
240+
- Handling non-standard PL/SQL syntax extensions
241+
- Correctly parsing complex nested constructs
242+
243+
## 15. Dependencies
244+
245+
### 15.1 ANTLR4 Go Runtime
246+
- Required for parsing
247+
- May have its own version constraints
248+
249+
### 15.2 plsql-parser Package
250+
- Need to define clear interfaces with this package
251+
- Ensure compatible design decisions
252+
253+
## 16. Future Considerations
254+
255+
### 16.1 Version Support
256+
- Potential support for newer Oracle versions after 19c
257+
- Backward compatibility with older Oracle versions
258+
259+
### 16.2 Feature Extensions
260+
- Potential support for other SQL dialects
261+
- Statement validation capabilities
262+
- SQL transformation features
263+
264+
### 16.3 Performance Enhancements
265+
- Ongoing optimization of parsing speed and memory usage
266+
- Support for concurrent parsing
