| 1 | +# TypesXML AI Agent Guidelines |
| 2 | + |
| 3 | +## Performance & Memory Optimization |
| 4 | + |
| 5 | +### Buffer Management |
| 6 | +- The SAXParser uses a minimum buffer size of 2048 bytes (`SAXParser.MIN_BUFFER_SIZE`) |
| 7 | +- For large files, the parser reads incrementally and expands the buffer as needed |
| 8 | +- **AI Recommendation**: For very large XML files (>100MB), suggest a custom ContentHandler instead of DOMBuilder so the whole document is never held in memory |
| 9 | + |
| 10 | +### Memory Usage Patterns |
| 11 | +```typescript |
| 12 | +// Memory-efficient for large files |
| 13 | +class LargeFileHandler implements ContentHandler { |
| 14 | + // Process elements without storing entire DOM |
| 15 | +} |
| 16 | + |
| 17 | +// Memory-intensive for large files |
| 18 | +const builder = new DOMBuilder(); // Stores entire DOM in memory |
| 19 | +``` |
| 20 | + |
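To make the contrast above concrete, here is a minimal sketch of a handler that keeps only running totals instead of a tree. The `ElementEvents` interface is a hypothetical, cut-down stand-in defined locally for illustration; the real `ContentHandler` interface in TypesXML has more callbacks and its exact signatures are not shown in this guide.

```typescript
// Hypothetical, reduced event interface (NOT the TypesXML ContentHandler).
interface ElementEvents {
  startElement(name: string): void;
  endElement(name: string): void;
}

// Keeps counters only, so memory use depends on the number of distinct
// element names rather than on the size of the document.
class ElementCounter implements ElementEvents {
  private counts = new Map<string, number>();
  private depth = 0;
  private maxDepth = 0;

  startElement(name: string): void {
    this.counts.set(name, (this.counts.get(name) ?? 0) + 1);
    this.depth++;
    this.maxDepth = Math.max(this.maxDepth, this.depth);
  }

  endElement(_name: string): void {
    this.depth--;
  }

  getCount(name: string): number {
    return this.counts.get(name) ?? 0;
  }

  getMaxDepth(): number {
    return this.maxDepth;
  }
}

// Simulated event stream for <root><item/><item/></root>.
const counter = new ElementCounter();
counter.startElement('root');
counter.startElement('item');
counter.endElement('item');
counter.startElement('item');
counter.endElement('item');
counter.endElement('root');
```

A real handler would implement the full library interface and forward only these two events to logic like the above.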
| 21 | +### Performance Considerations |
| 22 | +- File parsing is more efficient than string parsing (string parsing creates temp files) |
| 23 | +- Encoding detection adds a small overhead; specify the encoding when it is known |
| 24 | +- DTD parsing is optional and adds processing time |
| 25 | + |
| 26 | +## XML Standards Compliance |
| 27 | + |
| 28 | +### Supported XML Versions |
| 29 | +- XML 1.0 (default) |
| 30 | +- XML 1.1 (when specified in declaration) |
| 31 | +- Character validation differs between versions |
| 32 | + |
| 33 | +### What's NOT Supported |
| 34 | +- **Schema Validation**: No XSD or RELAX NG validation yet |
| 35 | +- **Namespace Processing**: Limited namespace support |
| 36 | +- **External Entity Resolution**: Requires manual catalog setup |
| 37 | +- **Default Attribute Values**: Not automatically applied from DTD |
| 38 | + |
| 39 | +### Well-formedness vs. Validity |
| 40 | +```typescript |
| 41 | +// Library checks well-formedness, NOT validity |
| 42 | +parser.parseString('<root><unclosed>'); // Throws error - not well-formed |
| 43 | +parser.parseString('<root><child/></root>'); // Parses fine - well-formed |
| 44 | +``` |
| 45 | + |
| 46 | +## Entity Resolution & Catalogs |
| 47 | + |
| 48 | +### When to Use Catalogs |
| 49 | +```typescript |
| 50 | +// Use when XML references external DTDs |
| 51 | +const catalog = new Catalog('catalog.xml'); |
| 52 | +const builder = new DOMBuilder(); |
| 53 | +builder.setCatalog(catalog); |
| 54 | + |
| 55 | +// Without catalog, external references may fail |
| 56 | +``` |
| 57 | + |
| 58 | +### Entity Types Supported |
| 59 | +- Built-in entities (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`) |
| 60 | +- Character references (`&#65;`, `&#x41;`) |
| 61 | +- External entities via catalog resolution |
| 62 | + |
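As an illustration of what the first two entity types expand to, the sketch below decodes predefined entities and numeric character references in a single pass. It is a standalone helper written for this guide, not part of the TypesXML API, and it does not check whether the resulting code point is a legal XML character.

```typescript
// The five predefined entities from the XML specification.
const PREDEFINED_ENTITIES: Record<string, string> = {
  lt: '<',
  gt: '>',
  amp: '&',
  apos: "'",
  quot: '"',
};

// Decodes &lt; style entities plus decimal (&#65;) and hexadecimal (&#x41;)
// character references. A single pass avoids double decoding of input
// such as "&amp;#65;".
function decodeReferences(text: string): string {
  const pattern = /&(#x[0-9a-fA-F]+|#[0-9]+|lt|gt|amp|apos|quot);/g;
  return text.replace(pattern, (match: string, body: string): string => {
    if (body.startsWith('#')) {
      const codePoint = body.startsWith('#x')
        ? parseInt(body.slice(2), 16)
        : parseInt(body.slice(1), 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint > 0x10ffff ? match : String.fromCodePoint(codePoint);
    }
    return PREDEFINED_ENTITIES[body] ?? match;
  });
}
```

Both `&#65;` and `&#x41;` decode to the letter `A`, which is why the two forms are listed together above.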
| 63 | +## Error Handling & Edge Cases |
| 64 | + |
| 65 | +### Common Error Scenarios |
| 66 | +1. **Missing ContentHandler**: "ContentHandler not set" |
| 67 | +2. **Malformed XML**: "unclosed elements", "text found in prolog" |
| 68 | +3. **Encoding Issues**: Specify encoding explicitly when possible |
| 69 | +4. **File Access**: Check file existence before parsing |
| 70 | + |
| 71 | +### Graceful Degradation |
| 72 | +```typescript |
| 73 | +// AI should recommend this pattern |
| 74 | +try { |
| 75 | + parser.parseFile(file, encoding); |
| 76 | +} catch (error) { |
| 77 | + // Fallback strategies based on error type |
| 78 | + if (error instanceof Error && error.message.includes('encoding')) { |
| 79 | + // Try different encoding |
| 80 | + } else if (error instanceof Error && error.message.includes('not found')) { |
| 81 | + // Handle missing file |
| 82 | + } |
| 83 | +} |
| 84 | +``` |
| 85 | + |
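The branching in the pattern above can be pulled into a small classifier so that fallback logic does not depend on scattered string checks. This is a sketch only: the message fragments are taken from the error scenarios listed in this guide, plus Node's `ENOENT` code as an extra assumption, and the library's actual error messages may differ.

```typescript
type ParseFailure = 'encoding' | 'missing-file' | 'malformed' | 'unknown';

// Maps a caught value to a coarse failure category.
// Accepts `unknown` because that is the type of `catch` variables
// under TypeScript strict mode.
function classifyParseError(error: unknown): ParseFailure {
  const message = (error instanceof Error ? error.message : String(error)).toLowerCase();
  if (message.includes('encoding')) {
    return 'encoding';
  }
  if (message.includes('not found') || message.includes('enoent')) {
    return 'missing-file';
  }
  if (message.includes('unclosed') || message.includes('prolog')) {
    return 'malformed';
  }
  return 'unknown';
}
```

A caller would invoke `classifyParseError(error)` inside the `catch` block and switch on the result to pick a fallback.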
| 86 | +## Use Case Decision Matrix |
| 87 | + |
| 88 | +### When to Recommend SAXParser + DOMBuilder |
| 89 | +- **File size**: < 50MB |
| 90 | +- **Need**: DOM manipulation, XPath-like queries |
| 91 | +- **Memory**: Sufficient RAM available |
| 92 | + |
| 93 | +### When to Recommend SAXParser + Custom Handler |
| 94 | +- **File size**: > 50MB or streaming data |
| 95 | +- **Need**: Extract specific data, transform on-the-fly |
| 96 | +- **Memory**: Limited RAM or performance critical |
| 97 | + |
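The two parsing recommendations above can be expressed as a small decision helper. This is a sketch under the thresholds stated in this matrix (50MB); the `ParseJob` shape and the function name are hypothetical and exist only for illustration.

```typescript
type Approach = 'dom-builder' | 'custom-handler';

// Hypothetical description of a parsing task.
interface ParseJob {
  sizeInBytes: number;
  needsDomManipulation: boolean;
  memoryConstrained: boolean;
}

// 50MB threshold taken from the decision matrix.
const DOM_SIZE_LIMIT_BYTES = 50 * 1024 * 1024;

function recommendApproach(job: ParseJob): Approach {
  // Large inputs or tight memory budgets rule out building a full DOM.
  if (job.sizeInBytes > DOM_SIZE_LIMIT_BYTES || job.memoryConstrained) {
    return 'custom-handler';
  }
  // Small inputs only need a DOM when the caller wants to manipulate or query it.
  return job.needsDomManipulation ? 'dom-builder' : 'custom-handler';
}
```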
| 98 | +### When to Recommend XMLWriter |
| 99 | +- **Creating XML**: Always prefer over string concatenation |
| 100 | +- **Encoding**: Automatic BOM handling for UTF-16LE |
| 101 | +- **File output**: Better than manual file writing |
| 102 | + |
| 103 | +## Integration Patterns |
| 104 | + |
| 105 | +### With Node.js Streams |
| 106 | +```typescript |
| 107 | +// AI should suggest this for large files |
| 108 | +class StreamingXMLProcessor { |
| 109 | + processChunks(xmlStream: NodeJS.ReadableStream) { |
| 110 | + // Process XML in chunks rather than loading all at once |
| 111 | + } |
| 112 | +} |
| 113 | +``` |
| 114 | + |
| 115 | +### With Express.js |
| 116 | +```typescript |
| 117 | +// Validate XML in middleware |
| 118 | +app.use('/api/xml', (req, res, next) => { |
| 119 | + try { |
| 120 | + const parser = new SAXParser(); |
| 121 | + // Validate the request body here, then call next() so the request continues |
| 122 | + } catch (error) { |
| 123 | + return res.status(400).json({ error: 'Invalid XML' }); |
| 124 | + } |
| 125 | +}); |
| 126 | +``` |
| 127 | + |
| 128 | +### With TypeScript Strict Mode |
| 129 | +```typescript |
| 130 | +// Always check for undefined/null with strict mode |
| 131 | +const doc = builder.getDocument(); |
| 132 | +if (!doc) return; // Required check |
| 133 | + |
| 134 | +const root = doc.getRoot(); |
| 135 | +if (!root) return; // Required check |
| 136 | +``` |
| 137 | + |
| 138 | +## DTD and Grammar Features |
| 139 | + |
| 140 | +### Current DTD Support |
| 141 | +- Element declarations (`<!ELEMENT>`) |
| 142 | +- Attribute list declarations (`<!ATTLIST>`) |
| 143 | +- Entity declarations (`<!ENTITY>`) |
| 144 | +- Notation declarations (`<!NOTATION>`) |
| 145 | +- Internal subsets |
| 146 | +- External DTD references |
| 147 | + |
| 148 | +### DTD Limitations |
| 149 | +- No validation against DTD rules |
| 150 | +- Parameter entities supported but limited |
| 151 | +- Conditional sections supported |
| 152 | +- No default attribute value application |
| 153 | + |
| 154 | +## Namespace Handling |
| 155 | + |
| 156 | +### Current Support |
| 157 | +```typescript |
| 158 | +// Basic namespace detection |
| 159 | +element.getNamespace(); // Returns prefix before ':' |
| 160 | +element.getName(); // Returns full name including prefix |
| 161 | +``` |
| 162 | + |
| 163 | +### Limitations |
| 164 | +- No namespace URI resolution |
| 165 | +- No namespace context management |
| 166 | +- No namespace validation |
| 167 | + |
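Given these limitations, code that needs the prefix and local part separately can split the qualified name itself. The helper below is a standalone sketch, not a TypesXML API; returning an empty prefix for unprefixed names is an assumption made here, since this guide does not state what `getNamespace()` returns in that case.

```typescript
interface QualifiedName {
  prefix: string;
  localName: string;
}

// Splits "prefix:local" at the first colon.
// This is purely lexical: it does not resolve the prefix to a namespace URI.
function splitQualifiedName(name: string): QualifiedName {
  const index = name.indexOf(':');
  if (index === -1) {
    return { prefix: '', localName: name };
  }
  return {
    prefix: name.substring(0, index),
    localName: name.substring(index + 1),
  };
}
```

Mapping a prefix to its URI would still require tracking `xmlns` attributes manually while walking the tree.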
| 168 | +## AI Agent Recommendations |
| 169 | + |
| 170 | +### Code Quality Checks |
| 171 | +1. **Always check return values** for undefined/null |
| 172 | +2. **Use try-catch** around all parsing operations |
| 173 | +3. **Specify encoding** when known to avoid detection overhead |
| 174 | +4. **Choose appropriate ContentHandler** based on use case |
| 175 | +5. **Use XMLWriter** for XML generation, not string concatenation |
| 176 | + |
| 177 | +### Performance Optimization |
| 178 | +1. **File size assessment**: Recommend streaming for large files |
| 179 | +2. **Memory profiling**: Suggest custom handlers for memory-constrained environments |
| 180 | +3. **Encoding specification**: Reduce parsing overhead |
| 181 | +4. **Incremental processing**: Break large operations into chunks |
| 182 | + |
| 183 | +### Error Prevention |
| 184 | +1. **Input validation**: Check file existence, encoding validity |
| 185 | +2. **Resource cleanup**: Ensure FileReader.closeFile() is called |
| 186 | +3. **Error propagation**: Provide meaningful error messages |
| 187 | +4. **Fallback strategies**: Handle common failure scenarios |
| 188 | + |
| 189 | +### Best Practices Enforcement |
| 190 | +1. **Null safety**: Enforce null checks in TypeScript strict mode |
| 191 | +2. **Resource management**: Proper file handle cleanup |
| 192 | +3. **Encoding consistency**: UTF-8 default, explicit when needed |
| 193 | +4. **Error boundaries**: Isolate XML processing errors |
| 194 | + |
| 195 | +## Common Anti-patterns to Avoid |
| 196 | + |
| 197 | +### Memory Leaks |
| 198 | +```typescript |
| 199 | +// BAD: Parser instance reuse without cleanup |
| 200 | +const parser = new SAXParser(); |
| 201 | +// Process multiple files without proper cleanup |
| 202 | + |
| 203 | +// GOOD: Fresh parser instances or proper cleanup |
| 204 | +for (const file of files) { |
| 205 | + const parser = new SAXParser(); |
| 206 | + // Process file |
| 207 | +} |
| 208 | +``` |
| 209 | + |
| 210 | +### Unsafe Type Assumptions |
| 211 | +```typescript |
| 212 | +// BAD: Assuming non-null returns |
| 213 | +const rootName = doc.getRoot().getName(); // May throw if getRoot() returns undefined |
| 214 | + |
| 215 | +// GOOD: Null checking |
| 216 | +const root = doc.getRoot(); |
| 217 | +if (root) { |
| 218 | + const name = root.getName(); |
| 219 | +} |
| 220 | +``` |
| 221 | + |
| 222 | +### String Concatenation for XML |
| 223 | +```typescript |
| 224 | +// BAD: Manual XML construction |
| 225 | +let xml = '<?xml version="1.0"?><root>'; |
| 226 | +xml += '<child>' + data + '</child>'; |
| 227 | +xml += '</root>'; |
| 228 | + |
| 229 | +// GOOD: Using library classes |
| 230 | +const doc = new XMLDocument(); |
| 231 | +const root = new XMLElement('root'); |
| 232 | +// ... proper construction |
| 233 | +``` |
| 234 | + |
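The BAD example above breaks as soon as `data` contains markup characters. The sketch below shows the escaping that any correct serializer has to perform; it is a standalone illustration of the problem, not TypesXML code, and the function names are hypothetical.

```typescript
// Escapes the characters that are unsafe in element content.
// The ampersand must be replaced first so later replacements are not re-escaped.
function escapeXmlText(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Attribute values additionally need their quote characters escaped.
function escapeXmlAttribute(value: string): string {
  return escapeXmlText(value)
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

const data = 'a < b & c';
const naive = '<child>' + data + '</child>';                // not well-formed
const escaped = '<child>' + escapeXmlText(data) + '</child>'; // well-formed
```

Building the document through library classes avoids having to remember this step at every concatenation site.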
| 235 | +This guide should help AI agents provide more accurate, safe, and efficient recommendations when working with the TypesXML library. |